datadotzweekly

DataDotz Bigdata Weekly

MapR
=================

MapR has a great technical and architectural comparison of MapR-DB with Apache HBase and Apache Cassandra. The article spends some time describing the trade-offs of the log-structured merge (LSM) trees that power HBase and Cassandra, including read amplification and (asynchronous) write amplification.


Deep Learning with Intel’s BigDL and Apache Spark
==================================================

The deep learning landscape is still evolving. On one hand we have veteran frameworks like Theano and Caffe, and more popular ones like TensorFlow; on the other, we see the emergence of JVM-based frameworks that can perform distributed deep learning using GPUs and CPUs. These JVM-based tools can leverage existing Spark clusters to parallelize the training of models. This post discusses Intel’s BigDL, an open source distributed deep learning framework for big data platforms that runs on Apache Spark.


Monitoring Kafka Consumer Offsets
===================================

In this blog post I show how to read Kafka consumer offsets, get them into Prometheus, and visualize them using Grafana. This is very useful if you’re running a streaming application that reads from Kafka and want to know whether it is keeping up or lagging behind.
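As a rough illustration of what "lagging behind" means here, consumer lag per partition is usually taken as the partition's latest (log end) offset minus the group's committed offset. The sketch below is not the code from the linked post: the function names, the metric name `kafka_consumer_group_lag`, and the choice to treat a missing committed offset as 0 are all assumptions made for the example. It computes lag from two plain dictionaries and renders it in the Prometheus text exposition format.

```python
from typing import Dict, Tuple

# (topic, partition) pair used as a dictionary key
TopicPartition = Tuple[str, int]


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return lag per partition: log end offset minus committed offset."""
    lag = {}
    for tp, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(tp, 0)
        # Clamp at 0 in case the two offsets were sampled at different times.
        lag[tp] = max(end - committed, 0)
    return lag


def to_prometheus_text(group: str, lag: Dict[TopicPartition, int]) -> str:
    """Render lag values as gauge samples in the Prometheus text format."""
    # Hypothetical metric name, chosen for this example only.
    lines = ["# TYPE kafka_consumer_group_lag gauge"]
    for (topic, partition), value in sorted(lag.items()):
        lines.append(
            f'kafka_consumer_group_lag{{group="{group}",topic="{topic}",'
            f'partition="{partition}"}} {value}'
        )
    return "\n".join(lines) + "\n"
```

In a real setup the two dictionaries would come from a Kafka client or admin tool, and the rendered text would be served on an HTTP endpoint that Prometheus scrapes; both parts are left out of this sketch.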


Accessing Secure Cluster from Web Applications
==================================================

As customers use Apache Hadoop clusters in ways other than through Hue and the Hadoop command-line interface (CLI), and integrate them closely with the applications they develop, we often get asked how to access a secure Hadoop cluster from within custom applications. Many customers have their application access the cluster with a fixed service account. Others, however, would like to access the cluster as the end users who have authenticated to the application.
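One way to picture the difference between the two approaches is Hadoop's proxy-user mechanism, where a service account authenticates to the cluster and asks to act on behalf of an end user. The sketch below is an assumption about how that could look over the WebHDFS REST API, which accepts a `doas` query parameter for this purpose; the linked post may use a different mechanism. The helper name `webhdfs_url` and the host name are hypothetical, and authentication (Kerberos/SPNEGO or a delegation token) is deliberately left out.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_url(base_url: str, path: str, op: str,
                end_user: Optional[str] = None) -> str:
    """Build a WebHDFS request URL, optionally impersonating an end user.

    Without end_user, the request runs as whichever account authenticates
    (the fixed service account). With end_user, the `doas` parameter asks
    the cluster to run the operation as that user instead.
    """
    params = {"op": op}
    if end_user is not None:
        # Impersonation only works if the cluster allows it for the
        # service account (hadoop.proxyuser.<service>.hosts / .groups).
        params["doas"] = end_user
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(path)}?{urlencode(params)}"


# Fixed service account: every request looks the same to the cluster.
as_service = webhdfs_url("http://namenode.example.com:9870", "/user/alice", "LISTSTATUS")

# End-user access: the same request, executed as the authenticated user "alice".
as_alice = webhdfs_url("http://namenode.example.com:9870", "/user/alice", "LISTSTATUS",
                       end_user="alice")
```

With impersonation, HDFS permissions and audit logs reflect the end user rather than the shared service account, which is the main reason to prefer it when per-user access control matters.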
