Tuesday, January 31, 2012

Hadoop, HBase in Amazon Cloud (AWS)

Hadoop and HBase can be configured in the Amazon cloud to take advantage of distributed computing: new instances can be started quickly, at any time, as the requirements or the load of an application change.

Elastic Compute Cloud (EC2): Provides easily accessible, configurable instances, which makes it simple to scale an application.
Elastic MapReduce (EMR): Provides managed Hadoop for running MapReduce jobs on top of EC2, with S3 as data storage.
Simple Storage Service (S3): A persistent, highly available data storage service in the Amazon cloud.
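To make the relationship between the three services concrete, here is a minimal sketch of a request that asks EMR for a Hadoop/HBase cluster on EC2 instances with logs written to S3. The dictionary follows the parameter names of boto3's `run_job_flow` call as I understand them; the function name, bucket name, release label, instance type and role names are assumptions for illustration, not values from this post.

```python
def emr_cluster_request(name, log_bucket, node_count):
    """Build the parameters for an EMR cluster (nothing is sent to AWS here)."""
    return {
        "Name": name,
        # S3: persistent storage, used here for the cluster logs.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-5.36.0",  # assumed release; pick one that exists in your region
        "Applications": [{"Name": "Hadoop"}, {"Name": "HBase"}],
        # EC2: the instances the cluster runs on (one master, the rest slaves).
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": node_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


request = emr_cluster_request("hbase-demo", "my-example-bucket", 4)
# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("emr").run_job_flow(**request)
```

Building the request separately from sending it keeps the example runnable without an AWS account.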

Out of the many running instances launched from an Amazon Machine Image (AMI), one acts as the NameNode and the rest act as DataNodes for Hadoop. The NameNode (master node) also runs the HBase HMaster; the start scripts on the master use 'ssh' with a public/private key pair to reach the slave nodes. The JobTracker asks the NameNode for the location of the data and tells the TaskTrackers to do the actual processing on the DataNodes (slave nodes) that hold it. HBase stores its files in HDFS, which keeps each block on several DataNodes (as many as the replication value in the configuration, 'dfs.replication'), so that data survives a node failure and can be read from more than one node. When a table region grows beyond the configured size, HBase splits it in two so that no single node becomes a bottleneck.
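The splitting behaviour can be illustrated with a toy model. This is only a sketch of the idea, not HBase code: the `Table` class and the row-count threshold are hypothetical, and real HBase decides on region size in bytes rather than on a row count.

```python
import bisect

MAX_ROWS_PER_REGION = 4  # hypothetical limit; HBase uses a byte-size threshold


class Table:
    """Toy table: an ordered list of regions, each a sorted list of row keys."""

    def __init__(self):
        self.regions = [[]]

    def _region_index(self, key):
        # Regions are ordered by key range: use the last region whose
        # first key is <= key, falling back to the first region.
        index = 0
        for i, rows in enumerate(self.regions):
            if rows and rows[0] <= key:
                index = i
        return index

    def put(self, key):
        i = self._region_index(key)
        rows = self.regions[i]
        bisect.insort(rows, key)  # keep row keys sorted inside the region
        if len(rows) > MAX_ROWS_PER_REGION:
            # Split the oversized region in two at its middle key.
            mid = len(rows) // 2
            self.regions[i:i + 1] = [rows[:mid], rows[mid:]]


table = Table()
for key in "abcdefghij":
    table.put(key)
print(table.regions)
```

After the inserts, the ten keys are spread over several small regions, each of which could be served by a different node.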

A locally configured Hadoop and HBase setup can be converted to an Amazon AMI and deployed directly to the cloud as any number of identical instances.
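As a rough sketch of the "one image, many identical instances" step, the function below creates an AMI from an instance and launches copies of it. It assumes the Hadoop/HBase configuration already lives on an EC2 instance, and that `ec2` behaves like `boto3.client("ec2")`; the function name `clone_node` and the default instance type are hypothetical.

```python
def clone_node(ec2, instance_id, image_name, count, instance_type="m5.xlarge"):
    """Create an AMI from a configured instance and launch `count` copies.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the IDs of the newly launched instances.
    """
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    # The AMI cannot be launched until its state becomes "available".
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    reply = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in reply["Instances"]]


# Example call (requires boto3 and AWS credentials; the IDs are placeholders):
#   import boto3
#   clone_node(boto3.client("ec2"), "i-0123456789abcdef0", "hadoop-node", 5)
```

Passing the client in as an argument makes it easy to try the function against a stub before pointing it at a real account.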

1 comment:

  1. This blog nicely explains the Amazon cloud (AWS), EC2, EMR, S3, and Hadoop.
    Thanks for sharing.