Introduction to Apache HBase and its commands

Debashree Gorai
6 min read · Oct 31, 2020

HBase is a distributed NoSQL database built on top of HDFS (Hadoop Distributed File System).

It is modeled after Google's Bigtable and stores huge volumes of structured or unstructured data in a column-oriented layout rather than a row-oriented one, while providing consistent read and write access. This makes HBase a good fit for workloads that need fast random reads and writes.

Data representation in an HBase table:

An HBase table is divided into rows, column families, columns, and cells. Row keys are unique keys that identify a row, column families are groups of columns, columns are the fields of the table, and a cell contains the actual value or data.

A timestamp is a combination of date and time, used for versioning data in an HBase table. Whenever data is stored, it is stored with its timestamp.
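To make the data model concrete, here is a small Python sketch of a table as nested maps: row key, then column (family:qualifier), then timestamp. The class name Table, its methods, the explicit timestamps, and the value 'rohit k' are all made up for illustration and are not part of HBase.

```python
from collections import defaultdict


class Table:
    """Toy model: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # A column is addressed as 'family:qualifier'; the family must exist.
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None if empty."""
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]


student = Table(["personal_details", "marks_details"])
student.put("1", "personal_details:name", "rohit", timestamp=100)
student.put("1", "personal_details:name", "rohit k", timestamp=200)
latest = student.get("1", "personal_details:name")
print(latest)  # the version written with the larger timestamp
```

Both versions of the cell stay in the map; a read simply picks the one with the newest timestamp.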

HBase architecture:

There are 3 types of servers in HBase's master-slave architecture:

1. HBase HMaster

2. Region Server

3. ZooKeeper

When a client sends a read or write request, it first asks ZooKeeper where the hbase:meta table lives, looks up the region server that holds the target row, and then talks to that region server directly. The HMaster is not on the read/write path. The region servers serve data for reads and writes, while ZooKeeper maintains live cluster state and coordinates all the components.

HMaster

The HBase HMaster acts as the master server in the HBase architecture. It assigns regions to region servers in the Hadoop cluster for load balancing and handles DDL operations in the cluster. Only one HMaster is active in a cluster at a time; in a multi-master setup, the other masters stay on standby.

ZooKeeper:

ZooKeeper is a centralized coordination service that maintains configuration information and provides distributed synchronization. It provides ephemeral nodes, which represent the different region servers, and uses them to track server failures and network partitions.

Region Server:

A region server is responsible for handling and executing read, write, update, and delete requests from clients.

Each region server (slave) serves a set of regions, and each region holds one store per column family.

HBase tables are partitioned into multiple regions, with each region storing a contiguous range of a table's rows. A region server, which typically runs on an HDFS data node, has the following components:

Block Cache: This is the read cache. It stores frequently read data in memory. When the block cache becomes full, the least recently used data is evicted.

MemStore: This is the write cache. It stores new data that has not yet been written to disk. One MemStore exists per column family per region.

Write Ahead Log (WAL): The Write Ahead Log is a file on the distributed file system that stores new data that has not yet been persisted to permanent storage. It is also used for recovery in case of failure.

HFile: It stores the rows as sorted KeyValues on disk.
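The block cache is easiest to picture as a least-recently-used (LRU) cache, which is the policy HBase's default block cache is based on. Below is a minimal Python sketch of LRU eviction. The BlockCache class, the block names, and the capacity counted in entries are assumptions for illustration; the real cache is sized in bytes and is more elaborate.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache: the least recently used block is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of blocks (hypothetical unit)
        self.blocks = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used


cache = BlockCache(capacity=2)
cache.put("block-a", "A")
cache.put("block-b", "B")
cache.get("block-a")        # touch block-a, so block-b is now the oldest
cache.put("block-c", "C")   # cache is full, block-b gets evicted
```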

HBase write:

1. When a client wants to write data, it first contacts the region server that hosts the target region, and the region server writes the data to the WAL (Write Ahead Log).

The edits are appended to the end of the WAL file.

Every region server maintains a WAL file and uses it to recover data that has not yet been committed to disk.

2. As soon as the data is written to the WAL, it is placed in the MemStore, where it is kept sorted by row key until it is flushed to an HFile. MemStores live in the region server's main memory, while HFiles are written to HDFS. One MemStore exists per column family per region, so updates are sorted per column family.
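The write steps above can be sketched in a few lines of Python. This is only a toy model: the RegionStore class and its flush_threshold parameter are hypothetical, the WAL is a plain list instead of a file on HDFS, and the flush is triggered by entry count, whereas HBase flushes based on MemStore size.

```python
class RegionStore:
    """Toy write path: WAL append, then MemStore, then flush to a sorted HFile."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # hypothetical: entries, not bytes
        self.wal = []        # append-only log of edits
        self.memstore = {}   # in-memory write cache
        self.hfiles = []     # each flushed file is a list sorted by row key

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. log the edit first
        self.memstore[row_key] = value     # 2. then update the write cache
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # Write the MemStore contents out sorted by row key, then clear it.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


store = RegionStore(flush_threshold=2)
store.put("row2", "b")
store.put("row1", "a")   # reaches the threshold and triggers a flush
print(store.hfiles)
```

Note that the rows arrive out of order but land in the flushed file sorted by row key, which is what makes HFiles cheap to search later.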

HBase read:

1. When a client wants to read data from a region, the scanner first looks for the row's cells in the block cache, where recently read key-value pairs are stored.

2. If the data is not found there, the scanner checks the MemStore, which holds the most recent writes.

3. If the data is still not found, the scanner reads it from the HFiles and returns it to the client.

Now you have an idea of the basic architecture of HBase and how data flows through it during inserts, updates, and reads.
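Here is a rough Python sketch of that lookup order: block cache, then MemStore, then HFiles. The function read_cell and the dictionaries standing in for each layer are made up for illustration, and a real region server merges these sources rather than simply stopping at the first hit.

```python
def read_cell(row_key, block_cache, memstore, hfiles):
    """Return (value, source) following the simplified lookup order."""
    if row_key in block_cache:
        return block_cache[row_key], "block cache"
    if row_key in memstore:
        return memstore[row_key], "memstore"
    # Assume later entries in the list are newer files, so check them first.
    for hfile in reversed(hfiles):
        if row_key in hfile:
            block_cache[row_key] = hfile[row_key]  # cache it for next time
            return hfile[row_key], "hfile"
    return None, "not found"


block_cache = {}
memstore = {"row3": "newest"}
hfiles = [{"row1": "old"}, {"row1": "new", "row2": "x"}]

first = read_cell("row1", block_cache, memstore, hfiles)
second = read_cell("row1", block_cache, memstore, hfiles)
third = read_cell("row3", block_cache, memstore, hfiles)
missing = read_cell("row9", block_cache, memstore, hfiles)
```

The first read of row1 comes from an HFile and populates the cache, so the second read of the same row is served from the block cache.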

HBase commands

Like a schema in an RDBMS, HBase has namespaces.

If no namespace is specified, all created HBase tables go into the default namespace.

1. Create a namespace in HBase:

create_namespace '<namespace name>'

Example:

hbase(main):003:0> create_namespace 'student_management'

0 row(s) in 0.0290 seconds

2. Create a table in HBase:

create '<tablename>', '<columnfamilyname>'

If a namespace is defined, the command is:

create '<namespace:tablename>', '<columnfamilyname>'

If there is more than one column family, they should all be declared here, separated by commas (,).

Example:

hbase(main):004:0> create 'student_management:student','personal_details','marks_details'

0 row(s) in 1.2180 seconds

=> Hbase::Table - student_management:student
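To show how a table name maps to a namespace, here is a tiny Python sketch. The helper split_table_name is hypothetical and only mirrors the naming rule described above: a name without a namespace prefix falls into the default namespace.

```python
def split_table_name(name):
    """Split 'namespace:table' into its parts, falling back to 'default'."""
    namespace, separator, table = name.partition(":")
    if not separator:
        # No ':' in the name, so the whole string is the table name.
        return "default", name
    return namespace, table


qualified = split_table_name("student_management:student")
unqualified = split_table_name("student")
print(qualified, unqualified)
```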

3. List all tables:

list

It displays all the tables that are present in HBase.

4. Describe a table:

describe '<table name>'

Example:

describe 'student_management:student'

5. Insert data into a table:

Command to insert data into an HBase table:

put '<hbase table name>', '<row_key>', '<column_family_name:column_name>', '<column_value>'

Example:

put 'student_management:student','1','personal_details:name','rohit'

put 'student_management:student','1','personal_details:age',16

put 'student_management:student','1','marks_details:math',78

put 'student_management:student','1','marks_details:eng',80

put 'student_management:student','2','personal_details:name','rita'

put 'student_management:student','2','personal_details:age',15

put 'student_management:student','2','marks_details:math',83

put 'student_management:student','2','marks_details:eng',90

6. Fetch details from an HBase table:

get '<tablename>', '<row_key>', <other parameters>

Example:

get 'student_management:student','1'

7. Alter a table:

It is used to add or remove column families and to modify the properties of an HBase table.

Command to update the number of versions:

alter '<tablename>', NAME => '<column family name>', VERSIONS => <new version no>

Example:

alter 'student_management:student', NAME => 'personal_details', VERSIONS => 2
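To see what a VERSIONS setting of 2 means for a single cell, here is a small Python sketch. The function keep_newest_versions and the sample timestamps are invented for illustration; it only models the idea that the newest versions are kept and older ones are dropped.

```python
def keep_newest_versions(versions, max_versions):
    """Keep only the max_versions entries with the largest timestamps."""
    newest = sorted(versions, reverse=True)[:max_versions]
    return {timestamp: versions[timestamp] for timestamp in newest}


# One cell with three writes, keyed by timestamp (hypothetical values).
cell = {100: "rohit", 200: "rohit k", 300: "rohit kumar"}
retained = keep_newest_versions(cell, max_versions=2)
print(retained)
```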

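To add and delete a column family (contact_details is a hypothetical family name used here for illustration):

alter 'student_management:student', NAME => 'contact_details'

alter 'student_management:student', NAME => 'contact_details', METHOD => 'delete'

Note: the METHOD syntax must be used with the alter command when deleting a column family.

To enable Snappy compression on an existing HBase column family:

alter 'student_management:student', NAME => 'personal_details', COMPRESSION => 'SNAPPY'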

8. Scan command:

This command scans the entire table and displays its contents.

scan '<tablename>', {<optional parameters>}

Example:

scan 'student_management:student'

9. Delete command:

This command deletes the value of a cell at the specified row and column of a table.

delete '<tablename>', '<row_key>', '<column_family_name:column_name>'

Example:

delete 'student_management:student','2','marks_details:eng'

10. Truncate:

This command removes all records from an HBase table but keeps its schema.

It performs 3 steps:

Disables the table if it is enabled

Drops the table

Recreates the table with the same schema

truncate '<tablename>'

Example:

truncate 'student_management:student'

11. Drop:

To drop a table in HBase, we first have to disable it.

drop '<table name>'

Example:

disable 'student_management:student'

drop 'student_management:student'

Like disable, there is also an enable command, which is used to enable an HBase table.
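The truncate steps and the disable-before-drop rule can be modeled with a toy table catalog in Python. The Catalog class and its dictionary layout are assumptions made for this sketch and have nothing to do with how HBase stores its metadata internally.

```python
class Catalog:
    """Toy catalog: table name -> families, enabled flag, and rows."""

    def __init__(self):
        self.tables = {}

    def create(self, name, families):
        self.tables[name] = {"families": list(families), "enabled": True, "rows": {}}

    def disable(self, name):
        self.tables[name]["enabled"] = False

    def drop(self, name):
        # Mirrors the rule that a table must be disabled before it is dropped.
        if self.tables[name]["enabled"]:
            raise RuntimeError(f"table {name} is enabled; disable it first")
        del self.tables[name]

    def truncate(self, name):
        families = self.tables[name]["families"]  # remember the schema
        self.disable(name)
        self.drop(name)
        self.create(name, families)               # same schema, no rows


catalog = Catalog()
catalog.create("student", ["personal_details", "marks_details"])
catalog.tables["student"]["rows"]["1"] = {"personal_details:name": "rohit"}

catalog.truncate("student")
rows_after_truncate = catalog.tables["student"]["rows"]
families_after_truncate = catalog.tables["student"]["families"]

try:
    catalog.drop("student")      # still enabled, so this is refused
    drop_error = None
except RuntimeError as err:
    drop_error = str(err)

catalog.disable("student")
catalog.drop("student")          # works once the table is disabled
```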

I hope you now have an idea of the basic commands in HBase and how to run them in the HBase shell.

Please follow for more updates.

Originally published at https://itechshree.blogspot.com.
