Monday, July 23, 2012

How to : Working with HBase Coprocessors

HBase Coprocessor : It allows user code to be executed on the region server, against each region of a table. The client receives only the final response from each region. HBase provides AggregateProtocol to support common aggregation functions (sum, avg, min, max, std).

The coprocessor framework is divided into two parts, Endpoints and Observers.
Endpoint : It lets you write your own pluggable class that extends BaseEndpointCoprocessor and exposes any number of methods to be executed on the region server, against each region of the table. Because the method runs where the data lives, only the results are transmitted to the client, which minimizes network load and is usually much faster than pulling the rows to the client. The client still needs to do the final reduction over the results returned by each region.

Example : The snippet below shows only the client-side call to the HBase coprocessor. You must also create a separate 'GroupByAggregationProtocol' interface that extends 'CoprocessorProtocol' and declares the required methods, plus an implementing class that implements 'GroupByAggregationProtocol' and extends 'BaseEndpointCoprocessor', and deploy them on every region server.
// filterList and scan are assumed to be final variables defined earlier.
// Note that coprocessorExec declares "throws Throwable".
Map<byte[], Map<String, List<Long>>> resultFromCoprocessor = table
        .coprocessorExec(GroupByAggregationProtocol.class,
                <start-RowKey>,  // byte array or can be null
                <end-Rowkey>,    // byte array or can be null
                new Batch.Call<GroupByAggregationProtocol, Map<String, List<Long>>>() {
                    @Override
                    public Map<String, List<Long>> call(GroupByAggregationProtocol aggregation) throws IOException {
                        return aggregation.getGroupBySum(filterList, scan);
                    }
                });
// The returned map is keyed by region name, with one entry per region that ran the call.
for (Map.Entry<byte[], Map<String, List<Long>>> entry : resultFromCoprocessor.entrySet()) {
    Map<String, List<Long>> en = entry.getValue();
    // Iterate through results from each region   ......
}
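The final reduction on the client can be sketched roughly as follows. This is only a minimal sketch under assumptions that the post does not state: each List<Long> is taken to hold one partial sum per aggregated column, in the same order for every region, and GroupBySumReducer and reduce are hypothetical names. A plain map stands in for the value returned by coprocessorExec, so the snippet runs without an HBase cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: merges per-region group-by sums on the client. */
final class GroupBySumReducer {

    /**
     * Assumption: for a given group key, each list holds one partial sum per
     * aggregated column, in the same column order in every region.
     */
    static Map<String, List<Long>> reduce(Map<byte[], Map<String, List<Long>>> perRegion) {
        Map<String, List<Long>> totals = new TreeMap<>();
        for (Map<String, List<Long>> regionResult : perRegion.values()) {
            if (regionResult == null) {
                continue; // a region may have nothing to report
            }
            for (Map.Entry<String, List<Long>> group : regionResult.entrySet()) {
                List<Long> partial = group.getValue();
                List<Long> running = totals.get(group.getKey());
                if (running == null) {
                    // First region to report this group: copy its partial sums.
                    totals.put(group.getKey(), new ArrayList<>(partial));
                    continue;
                }
                if (running.size() != partial.size()) {
                    throw new IllegalStateException("Column count mismatch for group " + group.getKey());
                }
                // Add this region's partial sums column by column.
                for (int i = 0; i < partial.size(); i++) {
                    running.set(i, running.get(i) + partial.get(i));
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Stand-in for the coprocessorExec result: region name -> partial result.
        Map<byte[], Map<String, List<Long>>> perRegion = new LinkedHashMap<>();

        Map<String, List<Long>> region1 = new HashMap<>();
        region1.put("US", Arrays.asList(10L, 2L));
        region1.put("IN", Arrays.asList(5L, 1L));
        perRegion.put("region-1".getBytes(), region1);

        Map<String, List<Long>> region2 = new HashMap<>();
        region2.put("US", Arrays.asList(7L, 3L));
        region2.put("UK", Arrays.asList(4L, 4L));
        perRegion.put("region-2".getBytes(), region2);

        System.out.println(reduce(perRegion)); // prints {IN=[5, 1], UK=[4, 4], US=[17, 5]}
    }
}
```

In the snippet above, the same reduce call would be applied to resultFromCoprocessor instead of the hand-built map.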
Endpoint coprocessors can be thought of as the equivalent of stored procedures in an RDBMS.
Observers : They provide hooks that let you override some of HBase's default behavior when an event occurs.
There are three kinds of observer:
a) RegionObserver : handles/overrides data operations such as Get, Put, Delete and Scan. Each hook is of type pre or post (e.g. preGet, postDelete).
b) MasterObserver : handles table creation, deletion and alteration events, e.g. preCreateTable or postCreateTable.
c) WALObserver : handles write-ahead log (WAL) write events, e.g. preWALWrite or postWALWrite.

Observer coprocessors can be thought of as the equivalent of triggers in an RDBMS.
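To make the trigger analogy concrete, here is a toy model of pre and post hooks wrapped around a write. It is deliberately not the HBase API: PutObserver, ToyTable and AuditingObserver are hypothetical names, the method signatures are invented for illustration, and the boolean veto returned by prePut is a simplification made for this sketch. The real RegionObserver hooks have different signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of observer-style hooks. NOT the HBase API. */
final class ObserverHookDemo {

    /** Hypothetical observer with trigger-like callbacks around a put. */
    interface PutObserver {
        /** Runs before the write; returning false skips the default write (sketch-only convention). */
        boolean prePut(String row, String value);

        /** Runs after the write has been applied. */
        void postPut(String row, String value);
    }

    /** Minimal in-memory "table" that calls registered observers around each put. */
    static final class ToyTable {
        private final Map<String, String> rows = new HashMap<>();
        private final List<PutObserver> observers = new ArrayList<>();

        void register(PutObserver observer) {
            observers.add(observer);
        }

        void put(String row, String value) {
            for (PutObserver observer : observers) {
                if (!observer.prePut(row, value)) {
                    return; // a pre hook vetoed the default operation
                }
            }
            rows.put(row, value); // the default operation
            for (PutObserver observer : observers) {
                observer.postPut(row, value);
            }
        }

        String get(String row) {
            return rows.get(row);
        }
    }

    /** Example observer: rejects empty values and keeps an audit trail of applied writes. */
    static final class AuditingObserver implements PutObserver {
        final List<String> audit = new ArrayList<>();

        @Override
        public boolean prePut(String row, String value) {
            return value != null && !value.isEmpty();
        }

        @Override
        public void postPut(String row, String value) {
            audit.add(row + "=" + value);
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        AuditingObserver observer = new AuditingObserver();
        table.register(observer);

        table.put("row1", "a");
        table.put("row2", ""); // rejected by prePut, so never stored or audited

        System.out.println(table.get("row1") + " " + table.get("row2") + " " + observer.audit);
        // prints: a null [row1=a]
    }
}
```

As with a database trigger, the caller of put does not invoke the hooks itself; they fire as a side effect of the operation.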