DataDotz Bigdata Weekly

Apache NiFi
==========

IBM’s Data Science Experience (DSX) comes in multiple flavors: cloud, desktop, and local. In this post we cover an IoT trucking demo on DSX local, i.e. running on top of Hortonworks Data Platform (HDP). We train and deploy a model, and then we use that model to score simulated incoming trucking data in Apache NiFi. We closely follow a data science lifecycle process as we discuss all the steps.
https://hortonworks.com/blog/iot-data-science-trucking-demo-dsx-local-apache-nifi/

Apache Kafka
==========

The Landoop blog has an overview of Lenses, their web- and API-based tool for exploring data in Kafka. Built around the Lenses SQL Engine, it detects data types from streams and supports real-time views, batch queries, functions, and “time traveling.” It offers tabular, tree, and raw views, plus Jupyter integration via the API.

http://www.landoop.com/blog/2017/11/lenses-how-to-view-kafka-topics-data/

Apache Spark
==========

The Qubole blog has a tutorial on using the Dist-Keras framework for deep learning as part of a Spark ML pipeline. While a small part of the post is Qubole-specific, most of it applies to anyone looking to use Spark for deep learning.

https://www.qubole.com/blog/distributed-deep-learning-keras-apache-spark/

Amazon Web Services
==========

A secure design for data access requires making tradeoffs. This post describes how to secure Amazon Redshift across multiple accounts; the setup introduces some apparent inconveniences, but these can be automated. The walk-through covers loading data via Apache Spark on Amazon EMR and moving data across accounts by assuming roles with AWS Security Token Service (AWS STS).

https://aws.amazon.com/blogs/big-data/create-an-amazon-redshift-data-warehouse-that-can-be-securely-accessed-across-accounts/

Cloud-based Relational Database Management Systems
==========

Databricks and Microsoft have jointly developed a new cloud service called Microsoft Azure Databricks, which makes Apache Spark analytics fast, easy, and collaborative on the Azure cloud. Not only does this new service allow data scientists and data engineers to be more productive and work collaboratively with their respective teams, but it also gives them the ability to create and execute complex data pipelines without leaving the platform.

https://databricks.com/blog/2017/11/21/rdms-databricks-cloud-based.html

Apache Flink
==========

The Apache Flink 1.4.0 release is on track to happen in the next couple of weeks. For readers who haven’t been following the release discussion on Flink’s developer mailing list, this post provides details on what’s coming in Flink 1.4.0, as well as a preview of what the Flink community will save for 1.5.0.

https://data-artisans.com/blog/looking-ahead-apache-flink-1-4-1-5