map vs mapValues

mapValues -

This function works only on pair RDDs, that is, RDDs of type RDD[(A, B)]. mapValues applies the given function to the value of each tuple (the second element) and leaves the key unchanged.

map -

The map function works on the elements of an RDD regardless of the RDD's element type. On a pair RDD, each element passed to the function is the whole (key, value) tuple.

Let's look at a code snippet that uses both map and mapValues.

 

val result: RDD[(A, B)] = rdd.map { case (k, v) => (k, f(v)) }

val result: RDD[(A, B)] = rdd.mapValues(f)

In the first snippet we apply an operation to the value of every tuple in the RDD. To do that with map, we pattern-match on each tuple with a case function, apply the function to the value, and rebuild the tuple with the original key. The second snippet performs the same operation with mapValues, which is more straightforward and easier to understand.
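The two snippets can be tried end to end with a minimal sketch like the one below. It assumes Spark is on the classpath and runs with a local master; the object name MapVsMapValues, the sample data, and the function f (multiply by 10) are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapVsMapValues {
  // Hypothetical value transformation, used only for this illustration
  def f(v: Int): Int = v * 10

  // map receives the whole tuple, so the key must be carried along by hand
  def viaMap(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.map { case (k, v) => (k, f(v)) }

  // mapValues receives only the value; the key is kept as it is
  def viaMapValues(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.mapValues(f)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this small example
    val conf = new SparkConf().setAppName("map-vs-mapValues").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      println(viaMap(rdd).collect().toList)       // List((a,10), (b,20), (c,30))
      println(viaMapValues(rdd).collect().toList) // List((a,10), (b,20), (c,30))
    } finally {
      sc.stop()
    }
  }
}
```

Both calls produce the same pairs; the mapValues version simply does not need to mention the key.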

 

When we want to apply a function to both keys and values, mapValues is not suitable; we must use map instead.
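As a rough sketch of that case, the example below uses map to change both parts of each tuple. The object name KeysAndValues, the sample data, and the transformation (upper-case the key, double the value) are hypothetical. It also prints the partitioner of each result, since mapValues keeps the parent RDD's partitioner while map does not, which is one more reason to prefer mapValues when only the values change.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object KeysAndValues {
  // Hypothetical transformation that touches both the key and the value,
  // which is only possible with map
  def upperKeyDoubleValue(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.map { case (k, v) => (k.toUpperCase, v * 2) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this small example
    val conf = new SparkConf().setAppName("keys-and-values").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Give the pair RDD an explicit partitioner so the difference is visible
      val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
        .partitionBy(new HashPartitioner(2))

      val both = upperKeyDoubleValue(rdd)
      val valuesOnly = rdd.mapValues(_ * 2)

      println(both.collect().sorted.toList)     // List((A,2), (B,4))
      println(both.partitioner.isDefined)       // false: keys may have changed
      println(valuesOnly.partitioner.isDefined) // true: keys are untouched
    } finally {
      sc.stop()
    }
  }
}
```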

 

