1. Difference between groupByKey() and reduceByKey() in Spark?

groupByKey() works on a dataset of key-value pairs (K, V) and groups the data by key. If the dataset is not already partitioned by key, grouping shuffles a lot of data across the cluster.

val dataset = sc.parallelize(Array(('a',5),('b',3),('b',4),('c',7)),3)

val groupdataset = dataset.groupByKey().collect()

groupdataset.foreach(println)

reduceByKey() is equivalent to grouping plus aggregation. It combines the values for each key within each partition, on the same machine, before shuffling, so less data moves across the network.

val data = Array("a","b","c","d")

val combined_data = sc.parallelize(data).map(w => (w,1)).reduceByKey((v, w) => v + w)

combined_data.collect.foreach(println)
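
To make the shuffle difference concrete, here is a minimal sketch that imitates both operations with plain Scala collections, so it runs without a Spark cluster. The `ShuffleSketch` object, the three sample "partitions", and the helper names are assumptions made for illustration; this is not Spark's implementation, only a model of how many records would cross the network in each case.

```scala
object ShuffleSketch {
  type Pair = (String, Int)

  // Hypothetical input: three "partitions" standing in for three machines.
  val partitions: Seq[Seq[Pair]] = Seq(
    Seq(("a", 5), ("b", 3)),
    Seq(("b", 4), ("b", 1)),
    Seq(("c", 7), ("c", 2))
  )

  // groupByKey style: every record is shuffled, values are grouped afterwards.
  def groupStyle(parts: Seq[Seq[Pair]]): (Int, Map[String, Seq[Int]]) = {
    val shuffled = parts.flatten
    val grouped = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    (shuffled.size, grouped)
  }

  // reduceByKey style: each partition is combined locally first, so at most
  // one record per key per partition is shuffled.
  def reduceStyle(parts: Seq[Seq[Pair]])(f: (Int, Int) => Int): (Int, Map[String, Int]) = {
    val combined = parts.map { p =>
      p.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }.toSeq
    }
    val shuffled = combined.flatten
    val reduced = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    (shuffled.size, reduced)
  }
}
```

With this sample data, the group-style path moves all six records, while the reduce-style path moves four, because the two "b" records and the two "c" records that share a partition are merged before the shuffle.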

2. Define lineage graph and DAG in Spark?

Every RDD created in Spark depends on one or more parent RDDs, and each new RDD contains a pointer to its parent…
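
The idea of a child holding a pointer to its parent can be sketched with a toy data structure in plain Scala. The `Node` class and the three sample nodes below are hypothetical stand-ins, not Spark classes; in real Spark, `rdd.toDebugString` prints the lineage of an RDD.

```scala
object LineageSketch {
  // A toy stand-in for an RDD: a name plus pointers to its parents.
  final case class Node(name: String, parents: List[Node] = Nil)

  // Walk the parent pointers back to the source nodes, depth first.
  def lineage(node: Node): List[String] =
    node.name :: node.parents.flatMap(lineage)

  // Hypothetical chain: textFile -> map -> filter
  val source   = Node("textFile")
  val mapped   = Node("map", List(source))
  val filtered = Node("filter", List(mapped))
}
```

Calling `LineageSketch.lineage(LineageSketch.filtered)` walks from the newest node back to the source, which is the same information Spark would need to recompute a lost partition.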


Are you worried about getting older?

Here are some tips you must follow.
1. Don't lie straight while sleeping. Always lie on your back; otherwise it will obstruct blood circulation and cause skin sagging.

2. Always eat plenty of green leafy vegetables and lots of fruit. Avoid fried chips and oily food as much as possible. Drink fruit juice, almond milk, turmeric milk, etc. Turmeric penetrates the skin and makes it glow faster. Almonds and walnuts contain ingredients that give the skin a glow.

3. Try to spend as little time as possible in AC. If one has to stay in…


Lip balm moisturizes the lips and improves their tone by enhancing blood circulation. Because of the preservatives added to the lip balms available in the market, those products are not beneficial for the skin tone of our lips.

So let’s prepare a lip balm at home that you can use all the time.

Ingredients:

1. One beetroot

2. 1 tablespoon glycerin

3. 1/2 tablespoon coconut oil

Method:

  1. First take 1 beetroot, cut it into small pieces, and grind it in a mixer.
  2. Now extract the juice from the ground beetroot and strain it well.
  3. Then take a bowl and pour the juice…

HBase is a distributed NoSQL system built on top of HDFS (Hadoop Distributed File System).

It is modeled on Google’s Bigtable and stores huge volumes of structured or unstructured data by column rather than by row, providing consistent read and write access. This makes HBase a good fit for high-speed requirements.

Data representation in an HBase table:

An HBase table is divided into rows, column families, columns, and cells. …
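
One way to picture this layout is as nested maps, sketched below in plain Scala. The `HBaseModelSketch` object, the family names, and the sample values are all hypothetical, and the sketch leaves out the timestamp versions that real HBase cells carry; it does not use the HBase client API.

```scala
object HBaseModelSketch {
  // rowKey -> columnFamily -> columnQualifier -> cell value
  type Row   = Map[String, Map[String, String]]
  type Table = Map[String, Row]

  // Hypothetical sample data: two rows, two column families.
  val table: Table = Map(
    "row1" -> Map(
      "personal"     -> Map("name" -> "Asha", "city" -> "Pune"),
      "professional" -> Map("role" -> "developer")
    ),
    "row2" -> Map(
      "personal" -> Map("name" -> "Ravi")
    )
  )

  // Look up a single cell; None means the row, family, or column is absent.
  def cell(t: Table, row: String, family: String, qualifier: String): Option[String] =
    for {
      r <- t.get(row)
      f <- r.get(family)
      v <- f.get(qualifier)
    } yield v
}
```

Note that "row2" simply has no "professional" family entries: rows in the same table do not need to carry the same columns.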


A NoSQL database, also called a non-SQL or non-relational database, provides a way to store and retrieve data modeled in a non-tabular format.

Why NoSQL?

A traditional database system prefers predictable, structured data and has dominated the database industry for decades. Nowadays, as businesses grow and social media dominates, there is a need to:

1. Support a large number of concurrent users

2. Handle huge amounts of semi-structured as well as unstructured data

3. Provide a highly available system without any downtime

4. Handle huge amounts of data insertion and population

Hence, relational databases are unable to meet…


Vijayadashami, also known as Dussehra, is one of the major Hindu festivals and is celebrated every year at the end of Navratri. It marks the end of Durga Puja and Ramlila. The Navratri festival is associated with the famous battle between Maa Durga and the buffalo demon Mahishasura, which lasted for nine days. The war ended on the 10th day with the defeat of Mahishasura. Each day of Navratri is dedicated to one of the nine avatars of Maa Durga.


Healthy, shiny hair always creates a good impression. Women especially tend to notice hair first when meeting a person for the first time. It is said that the condition of one's hair can mirror the overall health of the body. So hair care is as important as caring for the rest of the body, especially in winter, when hair becomes prone to falling.

Oil massaging:

Oil your hair 2 or 3 times a week to keep it moisturized and to avoid split ends, hair fall, etc.

Warm up a mix of olive oil, coconut oil and almond…


Cake is a kind of sweet food prepared from flour, sugar and other ingredients.

It is special for every occasion, and that makes it a pleasure.

Be it a birthday party, an engagement party, an anniversary celebration, New Year, or anything else, people hang around until the cake-cutting ceremony is complete. Although everyone gathers for many reasons, the cake is always a top priority for every celebration.

But nowadays most of us look for a good diet plan for balanced health. Hence, we try to eat as little cake as possible, even though it is a priority for all occasions.


I am going to explain how to write a Flume configuration file.

I have often seen people get stuck while configuring a Flume agent with its source, sink and channel. Here, I shall make it easier by providing an example Flume configuration file through which you can move data from a source to a sink via a channel.

Let me demonstrate with an example.

Here I used only the parameters that are mandatory to configure the source, sink and channel for types spool, hdfs and memory respectively. You can add more parameters under the source, sink and channel if needed.
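
A minimal sketch of such a file could look like the following. The agent name `agent1`, the component names `src1`, `ch1` and `sink1`, and both paths are placeholders chosen for illustration, so replace them with your own values; only the required properties of the spooling directory source, memory channel and HDFS sink are set.

```properties
# Name the components of this agent (all names here are placeholders)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: watch a local spool directory for new files
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /tmp/flume/spool
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory
agent1.channels.ch1.type = memory

# Sink: write events to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
```

Note that a source takes `channels` (plural) because it can feed several channels, while a sink takes `channel` (singular) because it drains exactly one.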

Debashree Gorai

Information and Technology Analyst | Bigdata Developer | Spark | Scala
