What is the difference between reduce() and reduceByKey()?
reduce() vs reduceByKey()
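The main difference between reduce() and reduceByKey() is that reduce() operates on an RDD of elements of any type, whereas reduceByKey() operates only on an RDD of key-value pairs. reduce() combines all elements of the RDD into a single value, while reduceByKey() combines the values separately for each key. reduce() is a member of the RDD[T] class, while reduceByKey() is a member of the PairRDDFunctions[K,V] class.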
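As a minimal sketch of where each method applies, the Scala snippet below calls reduce() on an RDD[Int] and reduceByKey() on an RDD[(String, Int)]. It assumes Spark is on the classpath and runs in local mode; the object name, app name, and sample data are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and the data are illustrative only.
object ReduceVsReduceByKey {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is sufficient for this demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reduce-vs-reduceByKey")
      .getOrCreate()
    val sc = spark.sparkContext

    // reduce() is defined on RDD[T]: all elements collapse into one value of type T.
    val numbers: RDD[Int] = sc.parallelize(Seq(1, 2, 3, 4))
    val total: Int = numbers.reduce(_ + _)
    println(total) // 10

    // reduceByKey() is only available on an RDD of (key, value) pairs:
    // the values are combined per key and the result is again an RDD.
    val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    val sums: RDD[(String, Int)] = pairs.reduceByKey(_ + _)
    println(sums.collect().toMap) // contains a -> 4 and b -> 2 (order may vary)

    spark.stop()
  }
}
```

Note the result types: total is a plain Int, while sums is still an RDD and needs an action such as collect() before its contents can be inspected.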
reduce() returns a single value of type T to the driver program rather than an RDD, and calling it triggers execution of the DAG built so far, so it is implemented as an action. reduceByKey() returns another RDD, which may be part of a longer sequence of operations in the DAG and is evaluated lazily, so it is implemented as a transformation.
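The action/transformation distinction can be observed with a rough sketch like the one below, which counts how often the merge function passed to reduceByKey() is invoked. It assumes Spark in local mode; the object name, the accumulator name, and the data are hypothetical, and an accumulator is used here only as a simple way to observe when work actually happens.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object ActionVsTransformation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("action-vs-transformation")
      .getOrCreate()
    val sc = spark.sparkContext

    // Counts invocations of the merge function below.
    val calls = sc.longAccumulator("merge calls")

    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 3)))

    // Transformation: this only records a new step in the DAG.
    val summed = pairs.reduceByKey { (x, y) => calls.add(1); x + y }
    println(calls.sum) // 0, because nothing has been executed yet

    // Action: collect() forces the reduceByKey step to run.
    println(summed.collect().mkString(", ")) // (a,6)
    println(calls.sum > 0) // true, the merge function has now been called

    // Action: reduce() runs immediately and hands back a plain value.
    val total: Int = pairs.values.reduce(_ + _)
    println(total) // 6

    spark.stop()
  }
}
```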