Spark Out Of Memory Error

OOM error at the Spark driver level:

1. The Spark driver is the main controller of a Spark application. If it is configured with too little memory to hold all the data collected from the files, it throws an OOM error.

2. If a table that is to be broadcast is very large, the driver also runs into an OOM error.
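The two driver-side causes above can be sketched as follows. This is a minimal illustration, not a tuning recipe: the object name `DriverMemorySketch`, the row count, and the local master are assumptions made for the example. It avoids pulling a whole dataset to the driver with `collect()`, and it disables automatic broadcast joins through `spark.sql.autoBroadcastJoinThreshold`. Driver memory itself usually has to be set before the driver JVM starts, for example with `spark-submit --driver-memory 4g`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and sizes are illustrative only.
object DriverMemorySketch {

  // Returns (number of sampled rows, total row count) without
  // materializing the full dataset on the driver.
  def sampleAndCount(spark: SparkSession): (Int, Long) = {
    val df = spark.range(0, 1000000)

    // df.collect() would copy every row into driver memory.
    // take(n) brings back only n rows; count() returns a single number.
    val sample = df.take(10)
    val total = df.count()
    (sample.length, total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("driver-oom-sketch")
      .master("local[2]") // assumption: local run for demonstration
      // -1 disables automatic broadcast joins, so a large table is
      // not broadcast without an explicit hint.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    println(sampleAndCount(spark))
    spark.stop()
  }
}
```

Whether to raise driver memory or to avoid collecting and broadcasting large data depends on the job; the sketch only shows the second option.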

OOM error…

1. Difference between coalesce() and repartition()?

repartition() is used to increase or decrease the number of partitions. It produces roughly equal-sized partitions and triggers a full shuffle.

coalesce() is used to decrease the number of partitions by merging existing partitions, which minimizes the amount of data that is shuffled.
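The difference can be seen by checking partition counts on a small RDD. This is a sketch under assumptions: the object name `PartitionSketch`, the partition numbers, and the local master are chosen only for illustration. It also shows that `coalesce()` without its `shuffle` flag does not increase the number of partitions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; partition counts are illustrative only.
object PartitionSketch {

  // Returns partition counts for: original, repartition(16),
  // coalesce(2), and coalesce(16) without a shuffle.
  def partitionCounts(spark: SparkSession): (Int, Int, Int, Int) = {
    val rdd = spark.sparkContext.parallelize(1 to 100, 8)

    val more = rdd.repartition(16)   // full shuffle, can grow or shrink
    val fewer = rdd.coalesce(2)      // merges partitions, no full shuffle
    val notGrown = rdd.coalesce(16)  // no shuffle, so it stays at 8

    (rdd.getNumPartitions, more.getNumPartitions,
      fewer.getNumPartitions, notGrown.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-sketch")
      .master("local[2]") // assumption: local run for demonstration
      .getOrCreate()

    println(partitionCounts(spark)) // prints (8,16,2,8)
    spark.stop()
  }
}
```

A common use of coalesce() is to reduce the number of output files just before writing, where an extra shuffle would be wasted work.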

1. Difference between groupByKey() and reduceByKey() in Spark?

groupByKey() works on a dataset of key-value pairs (K, V) and groups the data by key. A lot of shuffling occurs while grouping if the dataset is not already partitioned by key.

val dataset = sc.parallelize(Array(('a',5),('b',3),('b',4),('c',7)),3)

val groupdataset = dataset.groupByKey().collect()

groupdataset.foreach(println)

reduceByKey() is equivalent to grouping plus aggregation. We can…
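As a sketch of that equivalence, the example below computes per-key sums both ways on the same sample pairs used above. The object name `ByKeySketch` and the choice of summation as the aggregation are assumptions made for illustration. reduceByKey() combines values within each partition before the shuffle, while groupByKey() ships every value across the network first.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; summation is an illustrative aggregation.
object ByKeySketch {

  // Returns per-key sums computed via groupByKey and via reduceByKey.
  def sums(spark: SparkSession): (Map[Char, Int], Map[Char, Int]) = {
    val pairs = spark.sparkContext.parallelize(
      Seq(('a', 5), ('b', 3), ('b', 4), ('c', 7)), 3)

    // Group all values per key, then aggregate the grouped values.
    val viaGroup = pairs.groupByKey().mapValues(_.sum).collect().toMap

    // Aggregate while shuffling; partial sums are built per partition.
    val viaReduce = pairs.reduceByKey(_ + _).collect().toMap

    (viaGroup, viaReduce)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bykey-sketch")
      .master("local[2]") // assumption: local run for demonstration
      .getOrCreate()

    val (grouped, reduced) = sums(spark)
    println(grouped == reduced) // prints true
    spark.stop()
  }
}
```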

Are you worried about getting older?

Here are some tips you must follow.
1. Don't lie straight while sleeping. Always lie on your back; otherwise it will obstruct blood circulation and cause skin sagging.

2. Always eat plenty of green leafy vegetables and lots of fruit. Avoid fried chips and…

A NoSQL database, also called a non-SQL or non-relational database, provides a way to store and retrieve data modeled in a non-tabular format.

Why NoSQL?

A traditional database system prefers more predictable, structured data and has been dominating the database industry for the past few years…

Debashree Gorai

Information and Technology Analyst | Bigdata Developer | Spark | Scala
