There are multiple ways to do this in Spark. Two of the approaches are discussed here.
1). The built-in function monotonically_increasing_id generates IDs that are unique and monotonically increasing, but not guaranteed to be consecutive. The code snippet below shows how to use this function.
scala> val data = Seq("AAA","BBB","CCC","DDD","EEE").toDF("names")
data: org.apache.spark.sql.DataFrame = [names: string]

scala> data.withColumn("id", monotonically_increasing_id()).show
+-----+---+
|names| id|
+-----+---+
|  AAA|  0|
|  BBB|  1|
|  CCC|  2|
|  DDD|  3|
|  EEE|  4|
+-----+---+
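The IDs 0 to 4 in the output above are consistent with all five rows sitting in one partition. According to the Spark documentation, the current implementation of monotonically_increasing_id puts the partition ID in the upper 31 bits of the result and the record number within the partition in the lower 33 bits. The following is a minimal plain-Scala sketch of that layout, not Spark's actual source; the object name, the helper monotonicId, and the two-partition row layout are hypothetical and exist only for illustration.

```scala
// Illustrative model of the documented bit layout; this is NOT Spark's source code.
object MonotonicIdSketch {
  // Partition ID goes in the upper 31 bits,
  // record number within the partition in the lower 33 bits.
  def monotonicId(partitionId: Int, rowInPartition: Long): Long =
    (partitionId.toLong << 33) + rowInPartition

  def main(args: Array[String]): Unit = {
    // Hypothetical layout: five rows split across two partitions.
    val rows = Seq((0, 0L), (0, 1L), (0, 2L), (1, 0L), (1, 1L))
    rows.foreach { case (partition, row) =>
      println(s"partition=$partition row=$row id=${monotonicId(partition, row)}")
    }
  }
}
```

Under this model, the first row of partition 1 gets the ID 8589934592 (2^33) rather than 3, so the IDs stay unique and increasing but jump between partitions. If a gap-free index is needed, this function alone may not be enough.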
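2). Window and row_number accomplish the same task and produce consecutive indexes starting at 1. Note that a window with orderBy but no partitionBy moves all rows into a single partition, which can be slow on large data. Below is the code for that: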
scala> import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.expressions.Window

scala> val window = Window.orderBy("names")
window: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@12fa0f93

scala> val indexed = data.withColumn("index", row_number().over(window))
indexed: org.apache.spark.sql.DataFrame = [names: string, index: int]

scala> indexed.show
+-----+-----+
|names|index|
+-----+-----+
|  AAA|    1|
|  BBB|    2|
|  CCC|    3|
|  DDD|    4|
|  EEE|    5|
+-----+-----+