Apache Spark interview questions Set 2

1.Difference between groupByKey() and reduceByKey() in spark?

groupBykey() works on dataset with key value pair(K,V) and groups data based on the key.A lot of shuffling occurs while grouping the dataset if it is not partitioned.

val dataset = sc.parallelize(Array((‘a’,5),(‘b,3),(‘b’,4),(‘c’,7)),3)