First, create an Azure free account and try this simple pipeline.

To create a pipeline that moves a file from one blob container to another, we need 2 resources in Microsoft Azure:

  1. Azure Data Factory

Azure Storage account creation, step by step:

Provide necessary details as…

Spark Out Of Memory Error

OOM error at the Spark driver level:

1. The Spark driver is the main controller of a Spark application. If it is configured with too little memory to hold all the data collected from the files, it throws an OOM error.

2. If the table to be broadcast is huge, the driver also faces an OOM error.
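The two driver-side causes above can be guarded against with configuration and bounded actions. The following is a minimal sketch, not a definitive setup: the object name `DriverOomSketch`, its helper names, and all numeric limits are illustrative assumptions. Driver heap size itself is normally set before the JVM starts (for example with `spark-submit --driver-memory`), so it is not set in the code here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; names and limits are assumptions for illustration.
object DriverOomSketch {
  // Build a local session with limits that protect the driver.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("driver-oom-sketch")
      .master("local[2]")
      // Tables larger than this many bytes are not auto-broadcast (10 MB here).
      .config("spark.sql.autoBroadcastJoinThreshold", (10L * 1024 * 1024).toString)
      // Fail a job whose collected results exceed this size instead of exhausting the driver.
      .config("spark.driver.maxResultSize", "1g")
      .getOrCreate()

  // Bring only a bounded number of rows to the driver instead of collecting everything.
  def previewRows(spark: SparkSession, n: Int): Array[Long] =
    spark.range(0L, 1000000L).limit(n).collect().map(_.longValue)

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    try println(previewRows(spark, 5).mkString(", "))
    finally spark.stop()
  }
}
```

Using `limit(n)` before `collect()` keeps the amount of data returned to the driver fixed, regardless of how large the underlying dataset is.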

OOM error…

1. Difference between coalesce and repartition?

Repartition is used to increase or decrease the number of partitions, producing roughly equal-sized partitions at the cost of a full shuffle.

Coalesce is used to decrease the number of partitions by merging existing ones, which minimizes the amount of data that is shuffled.
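The difference shows up directly in partition counts. Here is a small sketch, assuming a local Spark session; the object name `PartitionSketch` and the partition numbers are hypothetical choices for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; partition counts are arbitrary illustrative values.
object PartitionSketch {
  // Returns partition counts: (initial, repartition(16), coalesce(2), coalesce(16)).
  def partitionCounts(spark: SparkSession): (Int, Int, Int, Int) = {
    val rdd = spark.sparkContext.parallelize(1 to 100, 8)
    val widened = rdd.repartition(16)   // full shuffle, can increase partitions
    val narrowed = rdd.coalesce(2)      // merges existing partitions, no shuffle
    val unchanged = rdd.coalesce(16)    // without a shuffle, coalesce cannot increase partitions
    (rdd.getNumPartitions, widened.getNumPartitions,
      narrowed.getNumPartitions, unchanged.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partition-sketch").master("local[2]").getOrCreate()
    try println(partitionCounts(spark))
    finally spark.stop()
  }
}
```

Asking `coalesce` for more partitions than currently exist leaves the count as it is, which is why `repartition` is the call to reach for when increasing parallelism.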

1. Difference between groupByKey() and reduceByKey() in Spark?

groupByKey() works on a dataset of key-value pairs (K, V) and groups the data by key. A lot of shuffling occurs while grouping if the dataset is not already partitioned by key.

val dataset = sc.parallelize(Array(('a',5),('b',3),('b',4),('c',7)),3)

val groupdataset = dataset.groupByKey().collect()


reduceByKey() is equivalent to grouping + aggregation. We can…
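To compare the two operations side by side, here is a short sketch that computes per-key sums both ways on the same pairs used above. It assumes a local Spark session; the object name `KeyAggregationSketch` and the choice of summation as the aggregation are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; summation is just one possible aggregation.
object KeyAggregationSketch {
  // Returns per-key sums computed via groupByKey and via reduceByKey.
  def sums(spark: SparkSession): (Map[Char, Int], Map[Char, Int]) = {
    val pairs = spark.sparkContext.parallelize(Seq(('a', 5), ('b', 3), ('b', 4), ('c', 7)), 3)
    // groupByKey ships every value across the network, then we aggregate.
    val viaGroup = pairs.groupByKey().mapValues(_.sum).collect().toMap
    // reduceByKey combines values within each partition before shuffling.
    val viaReduce = pairs.reduceByKey(_ + _).collect().toMap
    (viaGroup, viaReduce)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("key-agg-sketch").master("local[2]").getOrCreate()
    try println(sums(spark))
    finally spark.stop()
  }
}
```

Both paths give the same sums; the difference is in how much data crosses the shuffle, since `reduceByKey` pre-aggregates on each partition first.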

Are you worried about getting older?

Here are some tips you must follow.
1. Don't lie straight while sleeping. Always lie on your back; otherwise it will obstruct blood circulation and cause skin sagging.

2. Always eat plenty of green leafy vegetables and lots of fruit. Avoid fried chips and…

Lip balm moisturizes the skin of our lips by enhancing blood circulation over them. Because of the preservatives added to lip balms available in the market, these products are not beneficial to the skin of our lips.

So let’s begin lip balm preparation at home which you can use all the…

HBase is a distributed NoSQL system built on top of HDFS (Hadoop Distributed File System).

It is derived from Google’s Bigtable and stores huge volumes of structured or unstructured data in columns instead of rows, and provides consistent read and write access. …

A NoSQL database, also called a non-SQL or non-relational database, provides a way to store and retrieve data modeled in a non-tabular format.

Why NoSQL?

A traditional database system prefers more predictable, structured data and has been dominating the database industry for the past few years…

Vijayadashami, also known as Dussehra, is one of the major Hindu festivals, celebrated every year at the end of Navaratri. It marks the end of Durga Puja and Ramlila. This Navratri festival is associated with the prominent battle between Maa Durga and the buffalo demon Mahishasura that lasted for nine…

Debashree Gorai

Information and Technology Analyst | Bigdata Developer | Spark | Scala
