How to add a unique index or unique row number to each row of a DataFrame?

There are multiple ways to do this in Spark. Here we discuss two approaches.

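1). Spark has a built-in function, monotonically_increasing_id, for this. The generated IDs are guaranteed to be unique and monotonically increasing, but not consecutive. The values in this small example happen to be sequential, which is what you get when all rows sit in one partition. The code snippet below shows how to use the function.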

scala> val data = Seq("AAA","BBB","CCC","DDD","EEE").toDF("names")
data: org.apache.spark.sql.DataFrame = [names: string]

scala> data.withColumn("id", monotonically_increasing_id()).show
+-----+---+
|names| id|
+-----+---+
|  AAA|  0|
|  BBB|  1|
|  CCC|  2|
|  DDD|  3|
|  EEE|  4|
+-----+---+
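To see why the IDs can have gaps on multi-partition data, here is a minimal sketch in plain Scala (no Spark required). It assumes the layout described in the Spark documentation for the current implementation: the partition index goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The helper name sketchId is hypothetical and is not part of any Spark API.

```scala
// Illustration only: mimics the ID layout documented for
// monotonically_increasing_id. `sketchId` is a hypothetical helper.
object MonotonicIdSketch {
  // Upper 31 bits: partition index. Lower 33 bits: record number in that partition.
  def sketchId(partitionIndex: Int, recordNumber: Long): Long =
    (partitionIndex.toLong << 33) + recordNumber

  def main(args: Array[String]): Unit = {
    // Two partitions with three records each.
    val ids = for {
      p <- 0 until 2
      r <- 0 until 3
    } yield sketchId(p, r.toLong)

    println(ids.mkString(", "))
    // prints: 0, 1, 2, 8589934592, 8589934593, 8589934594
  }
}
```

The jump between the third and fourth values is the boundary between partitions: the IDs stay unique and increasing, but they cannot be used as a dense 0..n-1 index.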

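2). Using Window and row_number we can accomplish the same task. Note that row_number starts at 1 and numbers the rows in the order defined by the window (here, sorted by names), and that a window without partitionBy moves all rows into a single partition, which can be slow on large data. Below is the code for that: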

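scala> import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.expressions.Window

scala> val window = Window.orderBy("names")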
window: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@12fa0f93

scala> val indexed = data.withColumn("index", row_number().over(window))
indexed: org.apache.spark.sql.DataFrame = [names: string, index: int]

scala> indexed.show
+-----+-----+
|names|index|
+-----+-----+
|  AAA|    1|
|  BBB|    2|
|  CCC|    3|
|  DDD|    4|
|  EEE|    5|
+-----+-----+
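If you need consecutive numbers that follow the existing row order rather than a sort on a data column, the two approaches can be combined. The following is only a sketch under stated assumptions: it assumes a local SparkSession, and the object name ConsecutiveIndexSketch, the helper addIndex, and the temporary column tmp_id are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

object ConsecutiveIndexSketch {
  // Hypothetical helper: adds a 0-based, consecutive "index" column.
  def addIndex(df: DataFrame): DataFrame = {
    // Step 1: tag each row with a unique, increasing (possibly gapped) id.
    val tagged = df.withColumn("tmp_id", monotonically_increasing_id())

    // Step 2: number the rows by that id and shift so the index starts at 0.
    // A window without partitionBy pulls all rows into one partition.
    tagged
      .withColumn("index", row_number().over(Window.orderBy("tmp_id")) - 1)
      .drop("tmp_id")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("consecutive-index-sketch")
      .getOrCreate()
    import spark.implicits._

    val data = Seq("AAA", "BBB", "CCC", "DDD", "EEE").toDF("names")
    addIndex(data).show()

    spark.stop()
  }
}
```

Because the final numbering still runs through an unpartitioned window, this keeps the single-partition cost of approach 2; it only changes which ordering the index reflects.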
