Monday, February 20, 2012

How to: Working with the HBase Delete API

  • org.apache.hadoop.hbase.client.Delete
     HBase provides the Delete class to delete one or more columns, one or more column families, or an entire row. A Delete object is instantiated with the row key of the row it applies to.
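The three delete scopes can be pictured with a toy, in-memory model. The sketch below is plain Java with no HBase dependency. `ToyTable` and its methods are hypothetical names that only mimic what a Delete does at each granularity, and the model ignores versions and timestamps entirely. With the real client you would instead build one `Delete` from the row key, optionally narrow it to a family or a column, and pass it to `HTable.delete(Delete)`.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an HBase table: row key -> column family -> qualifier -> value.
// Hypothetical class for illustration only; no versions, timestamps or tombstones.
class ToyTable {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    String get(String row, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families == null) return null;
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    // Column scope: remove one qualifier within one family of one row.
    void deleteColumn(String row, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families == null) return;
        Map<String, String> columns = families.get(family);
        if (columns != null) columns.remove(qualifier);
    }

    // Column-family scope: remove every column of one family in one row.
    void deleteFamily(String row, String family) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families != null) families.remove(family);
    }

    // Row scope: a Delete built from the row key alone removes the whole row.
    void deleteRow(String row) {
        rows.remove(row);
    }
}
```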
     Delete also accepts a long timestamp together with a column family and a qualifier; such a Delete removes all versions of that column whose timestamp is less than or equal to the given one. A Delete does not remove data immediately: it writes a tombstone marker for every column or version it deletes, and HBase physically removes the data later, during a major compaction.
IMPORTANT: If you 'put' data with the same timestamp as data that was recently deleted, you will not see it until a major compaction has removed the tombstone. The 'put' itself raises no error or exception, but neither 'scan' nor 'get' will return that data in the meantime.
   If you don't provide a timestamp, it defaults to the current system time in milliseconds.
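The timestamp and tombstone rules above can be sketched as a toy model of a single column. This is an illustration under simplifying assumptions, not HBase internals: `ToyColumn`, `deleteVersionsUpTo` and `majorCompact` are hypothetical names. A delete only records a tombstone timestamp (the current time when none is given), reads hide every version at or below it, a put at a masked timestamp "succeeds" but stays invisible, and the simulated major compaction drops the masked versions and the tombstone together.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of ONE column (hypothetical): versions keyed by timestamp plus a
// single tombstone. It mimics the rules described above, not HBase's storage code.
class ToyColumn {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Every version with timestamp <= tombstone is masked; MIN_VALUE means "no tombstone".
    private long tombstone = Long.MIN_VALUE;

    // A put never fails, even when its timestamp is masked by the tombstone.
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Delete all versions with timestamp <= ts; a null ts defaults to the current time in ms.
    void deleteVersionsUpTo(Long ts) {
        long t = (ts != null) ? ts : System.currentTimeMillis();
        tombstone = Math.max(tombstone, t);
    }

    // Return the newest version that the tombstone does not mask, or null.
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return (newest != null && newest.getKey() > tombstone) ? newest.getValue() : null;
    }

    // Simulated major compaction: physically drop masked versions, then the tombstone.
    void majorCompact() {
        versions.headMap(tombstone, true).clear();
        tombstone = Long.MIN_VALUE;
    }

    int storedVersionCount() {
        return versions.size();
    }
}
```

In this sketch the compaction also discards a put that was masked by the tombstone, so only puts written after the compaction become visible again; treat that detail as an assumption of the model.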
HBase tables currently do not support an in-place update; a 'Delete' followed by a 'Put' is required to achieve one. If the update targets a column with multiple versions, the timestamp plays a critical role in maintaining the version order, so design your HBase schema accordingly :)
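Here is a toy sketch of that delete-plus-put "update" on a multi-version column. `VersionedCell` and its methods are hypothetical. Deleting one exact version leaves a version-level tombstone, and the timestamp chosen for the replacement put decides where the new value sits in the version order, or whether it is masked at all.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy multi-version column (hypothetical) showing "update = Delete + Put".
class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Version-level tombstones: a put at one of these exact timestamps stays masked.
    private final Set<Long> deletedVersions = new HashSet<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Delete exactly one version, identified by its timestamp.
    void deleteVersion(long ts) {
        deletedVersions.add(ts);
    }

    // The "update": tombstone the old version, then put the new value at newTs.
    // newTs decides where the value lands in the version order.
    void update(long oldTs, long newTs, String value) {
        deleteVersion(oldTs);
        put(newTs, value);
    }

    // Visible versions as "timestamp=value", newest first.
    List<String> visibleNewestFirst() {
        List<String> out = new ArrayList<>();
        for (Long ts : versions.descendingKeySet()) {
            if (!deletedVersions.contains(ts)) {
                out.add(ts + "=" + versions.get(ts));
            }
        }
        return out;
    }
}
```

For example, with versions at 100 and 200, updating the version at 100 using timestamp 150 keeps the original order, timestamp 300 makes the corrected value the newest version, and reusing timestamp 100 leaves it masked by the tombstone.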

     To delete multiple rows in one call (a bulk delete), use this method of the HTable class:
     public void delete(List<Delete> deletes) throws IOException
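A common bulk-delete pattern is to collect the row keys first (for example from a prefix scan) and then submit them in one call. The toy below mimics that with a sorted in-memory map; `ToyRows`, `keysWithPrefix` and `delete(List<String>)` are hypothetical stand-ins. With the real client, each key would become a `new Delete(rowKey)` added to a `List<Delete>` that is passed to `HTable.delete(List<Delete>)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy table of rows sorted by key (hypothetical stand-in for an HTable).
class ToyRows {
    final TreeMap<String, String> rows = new TreeMap<>();

    // Gather keys sharing a prefix, as a prefix scan would, so they can be
    // deleted in one batch instead of one call per row.
    List<String> keysWithPrefix(String prefix) {
        return new ArrayList<>(rows.subMap(prefix, prefix + Character.MAX_VALUE).keySet());
    }

    // Batch delete: apply every row delete in the list in a single call,
    // analogous to handing a List<Delete> to HTable.delete(...). Returns rows removed.
    int delete(List<String> rowKeys) {
        int removed = 0;
        for (String key : rowKeys) {
            if (rows.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```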
