datadotzweekly

DataDotz Bigdata Weekly

Databricks
=================

This post describes how a unified analytics platform such as Databricks can serve multiple use cases and developer personas. Using the public Amazon product ratings dataset, it shows how an analyst can build reports and a data scientist can build machine learning prediction models with the same notebook features.

MapR
=================

MapR has published a detailed technical and architectural comparison of MapR-DB with Apache HBase and Apache Cassandra. The article spends some time describing the trade-offs of the Log-Structured Merge (LSM) trees that power HBase and Cassandra, including read amplification and (asynchronous) write amplification.
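
The read and write amplification trade-off is easier to see in a toy model. The sketch below is a deliberately simplified, in-memory LSM tree written for illustration only; it is not how HBase, Cassandra, or MapR-DB are implemented, and the class name `TinyLSM` and its counters are hypothetical. Writes go to a memtable that is flushed into immutable sorted runs; a read may have to probe every run (read amplification), and compaction rewrites entries that were already written once (write amplification).

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Hypothetical toy LSM tree: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}
        self.runs: List[List[Tuple[str, str]]] = []  # oldest run first, newest last
        self.memtable_limit = memtable_limit
        self.user_writes = 0      # puts issued by the application
        self.entries_written = 0  # entries written to runs by flushes and compactions

    def put(self, key: str, value: str) -> None:
        self.user_writes += 1
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Persist the memtable as a new sorted, immutable run.
        run = sorted(self.memtable.items())
        self.runs.append(run)
        self.entries_written += len(run)
        self.memtable = {}

    def get(self, key: str) -> Tuple[Optional[str], int]:
        """Return (value, probes); probes counts structures searched (read amplification)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for run in reversed(self.runs):  # newest first, so the latest version wins
            probes += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], probes
        return None, probes

    def compact(self) -> None:
        """Merge all runs into one, rewriting every live entry (write amplification)."""
        if not self.runs:
            return
        merged: Dict[str, str] = {}
        for run in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.runs = [sorted(merged.items())]
        self.entries_written += len(self.runs[0])


db = TinyLSM(memtable_limit=4)
for i in range(12):
    db.put(f"k{i % 6}", f"v{i}")
print(len(db.runs), db.get("k0"))  # 3 ('v6', 3)
db.compact()
print(len(db.runs), db.get("k0"), db.entries_written / db.user_writes)  # 1 ('v6', 2) 1.5
```

In this toy, compaction is an explicit call; in HBase and Cassandra it runs in the background, so its extra writes happen asynchronously. Real systems also use Bloom filters to skip runs that cannot contain a key, which cuts the number of probes per read.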

Deep Learning with Intel’s BigDL and Apache Spark
=================================================

The deep learning landscape is still evolving. On one hand, there are veteran frameworks such as Theano and Caffe, and more popular ones such as TensorFlow; on the other, JVM-based frameworks are emerging that can perform distributed deep learning on GPUs and CPUs. These JVM-based tools can use existing Spark clusters to parallelize model training. This post discusses Intel’s BigDL, an open source distributed deep learning framework for big data platforms built on Apache Spark.
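
BigDL's own API and its parameter-synchronization mechanism are not shown here. As a hedged illustration of what parallelizing training across a cluster's partitions means, the sketch below runs generic synchronous data-parallel gradient descent for a one-feature linear model in plain Python: list slices stand in for Spark partitions, each slice computes a partial gradient with the same current weights, and the driver averages them before applying a single update. The function names and the synthetic `y = 3x + 1` data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) training example


def local_gradient(partition: List[Point], w: float, b: float) -> Tuple[float, float, int]:
    """Summed squared-error gradient for y ~ w*x + b over one partition (one worker's job)."""
    gw = gb = 0.0
    for x, y in partition:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(partition)


def train_data_parallel(partitions: List[List[Point]], lr: float = 0.1,
                        steps: int = 2000) -> Tuple[float, float]:
    """Synchronous data-parallel gradient descent on a linear model."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # "Map": each partition computes its gradient against the same current weights.
        results = [local_gradient(p, w, b) for p in partitions]
        # "Reduce": sum the partial gradients and average over all examples.
        n = sum(count for _, _, count in results)
        gw = sum(g for g, _, _ in results) / n
        gb = sum(g for _, g, _ in results) / n
        # Synchronous update: every partition uses the new weights on the next step.
        w -= lr * gw
        b -= lr * gb
    return w, b


data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(20)]  # noiseless y = 3x + 1
partitions = [data[k::4] for k in range(4)]  # stand-in for four Spark partitions
w, b = train_data_parallel(partitions)
print(round(w, 4), round(b, 4))  # 3.0 1.0
```

Because the partial gradients are summed and divided by the total example count, each step equals one full-batch gradient step; splitting the data only changes where the arithmetic happens, which is what lets a cluster share the work.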

Monitoring Kafka Consumer Offsets
=================================

In this blog post I show how to read Kafka consumer offsets, export them to Prometheus, and visualize them in Grafana. This is very useful if you run a streaming application that reads from Kafka and want to know whether it is keeping up or lagging behind.
