Apache Spark Interview Questions, Set 3

1. What is the difference between coalesce and repartition?

Repartition can increase or decrease the number of partitions. It performs a full shuffle and redistributes the data into roughly equal-sized partitions, which makes it comparatively expensive.

Coalesce is used to decrease the number of partitions. It merges existing partitions instead of performing a full shuffle, which minimizes the amount of data moved, but the resulting partitions can be uneven in size.
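The contrast can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's implementation: partitions are modeled as lists, and the functions repartition and coalesce below are hypothetical stand-ins named after the Spark methods df.repartition(n) and df.coalesce(n). The round-robin assignment and the contiguous grouping are simplifying assumptions chosen to make the difference visible.

```python
from typing import List

Partitions = List[List[int]]


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy full shuffle: every record is reassigned, round-robin, to one of n new partitions."""
    out: Partitions = [[] for _ in range(n)]
    records = [rec for part in parts for rec in part]
    for i, rec in enumerate(records):
        out[i % n].append(rec)
    return out


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy merge: whole existing partitions are grouped together; records are never split up."""
    if n >= len(parts):
        # Assumption mirrored from the text: coalesce does not increase the partition count.
        return [list(part) for part in parts]
    out: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Map contiguous input partitions onto the same output partition.
        out[i * n // len(parts)].extend(part)
    return out


# Four skewed partitions holding ten records in total.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

balanced = repartition(skewed, 2)   # sizes 5 and 5: even, but every record was moved
merged = coalesce(skewed, 2)        # sizes 7 and 3: uneven, but partitions stayed intact
unchanged = coalesce(skewed, 8)     # still 4 partitions

print([len(p) for p in balanced])
print([len(p) for p in merged])
print(len(unchanged))
```

In this model, repartition touches every record to get even sizes, while coalesce only concatenates partitions that already exist, which is why it is cheaper but can leave the data skewed.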

2. What are the advantages of the Parquet file format in Spark?

Parquet is the default data source format in Spark, and Parquet with Snappy compression is a well-optimized choice for Spark applications. It carries metadata…