
Moving from HBase 0.94 to HBase 0.98


Version differences between HBase 0.94 and HBase 0.96:

HBase 0.96 was more than a year in the making. Some of the major improvements in this version are:

  • Improved Stability: Testing with configurable node count, data sizing, duration and more turned up additional bugs when scanning or fetching data. These have been fixed, and table locks for cross-cluster alterations and cross-row transaction support have been introduced

  • Mean Time to Recovery: Failure detection and recovery used to take a long time in HBase: the cluster first has to identify that a node is down, then recover the writes in progress, and then reassign the regions. Is there any way to reduce the time taken? Yes, it is possible to lower the default timeout value. HBase is configured with a 3-minute ZooKeeper timeout. In a production environment the reasonable minimum is around 20 seconds. You can find more information in Mean Time to Recovery (MTTR)
  • Operability: Thanks to newly developed tools, HBase operators can identify problems through a new UI and the emitted metrics. It is now even possible to trace lagging calls down through the HBase stack. For more detailed information, look into Migration to New Metrics
  • Freedom to Evolve: HBase supports both Hadoop 1 and Hadoop 2. Standardizing serialization on protobufs, with a well-defined schema, makes it easier to evolve across versions
  • Minimal Disturbance to the API: Despite the radical changes in this release, downstream projects benefit from a cleaned-up API that is now divided into a user API and a developer API
  • New Region Balancer: Previously, region count was the only attribute the balancer considered. Now read/write load, locality and a few more attributes are also taken into account in the balancing decision
  • Support for Namespaces: Table namespaces, i.e. grouping tables in a way similar to MySQL's notion of a database, have been contributed so that operators can manage their multi-tenant deployments
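
To make the New Region Balancer point a little more concrete, here is a minimal sketch of a cost function that looks at more than region count. It is a toy model and not HBase's actual balancer code: the class `BalancerCostSketch`, the `ServerLoad` fields, the `skew` helper and the three weights are all hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class BalancerCostSketch {

    /** Hypothetical per-server snapshot; the field names are illustrative only. */
    static final class ServerLoad {
        final int regionCount;
        final double requestsPerSecond; // combined read/write load
        final double locality;          // fraction of region data stored locally, 0.0 to 1.0

        ServerLoad(int regionCount, double requestsPerSecond, double locality) {
            this.regionCount = regionCount;
            this.requestsPerSecond = requestsPerSecond;
            this.locality = locality;
        }
    }

    // Assumed weights; a real balancer would make these configurable.
    static final double REGION_WEIGHT = 1.0;
    static final double LOAD_WEIGHT = 1.0;
    static final double LOCALITY_WEIGHT = 1.0;

    /** Mean absolute deviation divided by the mean; 0.0 means perfectly even. */
    static double skew(double[] values) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        if (mean == 0.0) {
            return 0.0;
        }
        double deviation = 0.0;
        for (double v : values) {
            deviation += Math.abs(v - mean);
        }
        return deviation / (values.length * mean);
    }

    /** Lower is better; 0.0 means even counts, even load and full locality. */
    static double cost(List<ServerLoad> cluster) {
        double[] regions = cluster.stream().mapToDouble(s -> s.regionCount).toArray();
        double[] requests = cluster.stream().mapToDouble(s -> s.requestsPerSecond).toArray();
        double avgLocality = cluster.stream().mapToDouble(s -> s.locality).average().orElse(1.0);
        return REGION_WEIGHT * skew(regions)
            + LOAD_WEIGHT * skew(requests)
            + LOCALITY_WEIGHT * (1.0 - avgLocality);
    }

    public static void main(String[] args) {
        // Same region count on both servers, but very different request load.
        List<ServerLoad> even = Arrays.asList(
            new ServerLoad(10, 100.0, 1.0), new ServerLoad(10, 100.0, 1.0));
        List<ServerLoad> hot = Arrays.asList(
            new ServerLoad(10, 190.0, 1.0), new ServerLoad(10, 10.0, 1.0));
        System.out.println("even cluster cost: " + cost(even));
        System.out.println("hot-spot cluster cost: " + cost(hot));
    }
}
```

A balancer that only counted regions would see the two clusters in `main` as equally balanced. Once load is part of the cost, the hot-spot cluster scores worse, which is the kind of decision the newer balancer is described as making.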

Version differences between HBase 0.96 and HBase 0.98:

The next major release after HBase 0.96 is HBase 0.98, with the usual bug fixes and some major improvements.

  • Reverse Scans: HBase now allows users to scan in reverse order, with only a small percentage difference in performance compared with a forward scan
  • Cell-Level ACLs and Visibility Tags: By using Access Control Lists and security labels, HBase can provide access control similar to Accumulo's by enforcing it per cell. You can find more information in HBase Cell Security
  • Improved Compaction: With a new compaction policy, “Stripe Compactions”, each region is automatically sharded into sub-ranges, which are compacted separately. You can find more information in Compaction in HBase
  • MapReduce over Snapshots: Similar to short-circuit HDFS reads, the client can bypass the whole HBase server layer and stream the scan results directly into MapReduce. This feature leverages HBase snapshots to implement pure client-side scanning of the data files in HDFS. Through this, scan speed can be increased by 5x. To know more about it, look into MapReduce over Snapshots
  • Transparent Server-Side Encryption: The ability to store HFiles and write-ahead logs in encrypted format is what is called transparent server-side encryption
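
To picture what a reverse scan does, here is a small sketch that walks sorted row keys from a higher start row down to a lower stop row. It is only a toy model built on `java.util.TreeMap`, not the HBase client API: the class `ReverseScanSketch` and the method `reverseScan` are hypothetical, and the inclusive start row and exclusive stop row are assumptions of this sketch. In the real 0.98 client the reverse direction is, as far as we know, requested on the `Scan` object with `setReversed(true)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReverseScanSketch {

    /**
     * Toy reverse scan over an in-memory sorted map (not the HBase client API).
     * Assumes startRow sorts after stopRow; startRow is inclusive, stopRow is exclusive.
     */
    static List<String> reverseScan(NavigableMap<String, String> table,
                                    String startRow, String stopRow) {
        // Take the key range (stopRow, startRow] and iterate it in descending order.
        NavigableMap<String, String> range =
            table.subMap(stopRow, false, startRow, true).descendingMap();
        return new ArrayList<>(range.keySet());
    }

    public static void main(String[] args) {
        // Row keys are kept sorted, much like rows in an HBase table.
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        // Visits row4, row3, row2 in that order.
        System.out.println(reverseScan(table, "row4", "row1"));
    }
}
```

A typical use for this access pattern is fetching the most recent entries first when the row key ends in a timestamp, without having to store an inverted key.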

HBase 0.98 is wire compatible with the 0.96 release, so a rolling upgrade is enough when upgrading from 0.96. Upgrading from the 0.94 release is also supported, but the cluster has to be shut down and migrated.

 

Written by Amudhan, Data Engineer @ DataDotz.

DataDotz is a Chennai-based Big Data team primarily focussed on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.