Category Archives: Big Data

datadotzweekly

DataDotz Bigdata Weekly

Using Amazon S3 with Cloudera BDR
=================================

More of you are moving to public cloud services for backup and disaster recovery, and Cloudera has been enhancing Cloudera Manager and CDH to help you do that. Specifically, Cloudera Backup and Disaster Recovery (BDR) now supports backup to and restore from Amazon S3. BDR lets you replicate HDFS data from your on-premises cluster to Amazon S3, or from S3 back to the cluster, with full fidelity (all file and directory metadata is replicated along with the data).

Serverless Delivery with Databricks and AWS CodePipeline
========================================================

The Databricks interactive workspace serves as an ideal environment for collaborative development and interactive analysis. The platform supports all the features needed to make creating a continuous delivery pipeline not only possible but simple.

Server-Side Encryption for Amazon Kinesis Streams
=================================================

Customers use Amazon Kinesis Streams to ingest, process, and deliver data in real time from millions of devices or applications. Use cases for Kinesis Streams vary, but a few common ones include IoT data ingestion and analytics, log processing, clickstream analytics, and enterprise data bus architectures. Within milliseconds of data arrival, applications attached to a stream are continuously mining value or delivering data to downstream destinations.

Apache Kafka
============

Spark Streaming integration with Kafka allows users to read messages from a single Kafka topic or from multiple Kafka topics. A Kafka topic receives messages across a distributed set of partitions, where they are stored. Each partition keeps the messages it has received in sequential order, and each message is identified by an offset, also known as a position.
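The partition and offset model described above can be illustrated with a toy in-memory sketch. This is not a Kafka client: the `Topic` class, its `send` and `read` methods, and the byte-sum placement rule are all hypothetical simplifications chosen for illustration (real Kafka clients use a different partitioner).

```python
class Topic:
    """Toy in-memory stand-in for a Kafka topic (hypothetical, not a Kafka client)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition; a message's list index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append a message and return its (partition, offset)."""
        # Simplified placement rule (an assumption): byte sum of the key
        # modulo the partition count, so equal keys share a partition.
        partition = sum(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return every message in a partition, starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("clicks", num_partitions=2)
first = topic.send("user-a", "page1")
second = topic.send("user-a", "page2")
```

Because both messages share a key, they land in the same partition with consecutive offsets, and a reader that remembers the last offset it processed can resume from exactly that position.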

AMAZON KINESIS VS APACHE KAFKA FOR BIG DATA ANALYSIS
====================================================

Data processing today is done in the form of pipelines that include steps such as aggregation, sanitization, and filtering, and that finally generate insights by applying statistical models. Amazon Kinesis is a platform for building pipelines for streaming data at the scale of terabytes per hour.

Reading data securely from Apache Kafka
=======================================

The Cloudera Distribution of Apache Kafka 2.0.0 (based on Apache Kafka 0.9.0) introduced a new Kafka consumer API that allows consumers to read data from a secure Kafka cluster. This lets administrators lock down their Kafka clusters and require clients to authenticate via Kerberos.

Apache Spark’s Structured Streaming
===================================

This post from Databricks shows how powerful Spark’s Structured Streaming APIs are for doing windowed aggregations with support for late data and watermark calculations. The post describes and visualizes, at a high level, the logic that these APIs abstract.
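To give a feel for the kind of logic being abstracted, here is a minimal pure-Python sketch of tumbling-window counting with a watermark. It is not Spark code and does not reproduce Spark's exact semantics: the function name `windowed_counts`, its parameters, and the per-event watermark update are assumptions made for illustration.

```python
from collections import Counter


def windowed_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, dropping events that arrive too late.

    events: event timestamps (integer seconds) in arrival order.
    Returns (counts keyed by window start, list of dropped timestamps).
    """
    counts = Counter()
    dropped = []
    max_event_time = None
    for event_time in events:
        # Track the latest event time seen so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        # Simplified watermark: latest event time minus the allowed lateness.
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        # A window that ends at or before the watermark is treated as closed.
        if window_end <= watermark:
            dropped.append(event_time)
            continue
        counts[window_start] += 1
    return dict(counts), dropped


counts, dropped = windowed_counts([1, 4, 12, 3, 25, 8], window_size=10, allowed_lateness=10)
```

In this run the out-of-order event at time 3 is still counted, because its window is open when it arrives, while the event at time 8 is dropped, because the watermark has already passed the end of its window.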
