Qubole & Snowflake with Spark
The blog series covers the use cases directly served by the Qubole–Snowflake integration. The first blog discussed how to get started with ML in Apache Spark using data stored in Snowflake. Continue reading
Apache Spot (incubating) is used to analyze network data to detect infosec threats. This post provides a good overview of the architecture, which is built on Apache Kafka (for ingestion), Apache Spark (for ingestion and ML analysis), Apache Hadoop (for ingestion and storage), and more. Continue reading
It may be counterintuitive at first, but there are some pretty compelling reasons to store multiple different types of events on the same Kafka topic. In particular, when implementing an event sourcing strategy, the order of events is key for correctness. Continue reading
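The ordering argument can be sketched in plain Python. Kafka only guarantees order within a partition, so mixed event types stay correctly ordered as long as events for the same entity share a partition key. The partitioner below is a stand-in for Kafka's real key hashing (clients use murmur2, not MD5); event names and the partition count are illustrative.

```python
import hashlib

NUM_PARTITIONS = 6  # illustrative topic partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for Kafka's key-based partitioner (real clients use murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Different event types on ONE topic, keyed by the aggregate they belong to.
events = [
    {"type": "CustomerCreated", "key": "customer-42"},
    {"type": "AddressChanged",  "key": "customer-42"},
    {"type": "CustomerCreated", "key": "customer-7"},
]

partitions = [partition_for(e["key"]) for e in events]
# Both events for customer-42 hash to the same partition, so a consumer
# replays them in the order they were produced.
assert partitions[0] == partitions[1]
```

Splitting the event types across separate topics would lose this per-entity ordering, which is exactly the event-sourcing correctness concern the post raises.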
Data ingestion into Splunk
Amazon Web Services (AWS) and Splunk jointly announced that Amazon Kinesis Data Firehose now supports Splunk Enterprise and Splunk Cloud as a delivery destination. This native integration between Splunk Enterprise, Splunk Cloud, and Amazon Kinesis Data Firehose is designed to make AWS data ingestion setup seamless, while offering a secure and fault-tolerant delivery mechanism. Continue reading
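A rough sketch of what the integration looks like from the boto3 side: a Firehose delivery stream is created with a Splunk HTTP Event Collector (HEC) destination. The endpoint URL and token below are placeholders, and the configuration is trimmed for illustration (the real `SplunkDestinationConfiguration` also requires an `S3Configuration` for backing up undeliverable events).

```python
# Placeholder values only; substitute your own Splunk HEC endpoint and token.
splunk_destination = {
    "HECEndpoint": "https://splunk.example.com:8088",          # placeholder HEC URL
    "HECEndpointType": "Raw",                                  # or "Event"
    "HECToken": "00000000-0000-0000-0000-000000000000",        # placeholder token
    "RetryOptions": {"DurationInSeconds": 300},
    "S3BackupMode": "FailedEventsOnly",  # fault tolerance: failed events go to S3
}

def create_splunk_stream(firehose_client, name: str):
    """Shape of the boto3 call; shown for illustration, not executed here."""
    return firehose_client.create_delivery_stream(
        DeliveryStreamName=name,
        SplunkDestinationConfiguration=splunk_destination,
    )
```

The `S3BackupMode` setting is what gives the fault-tolerant delivery the post mentions: events Firehose cannot deliver to Splunk are preserved in S3 rather than dropped.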
A data lake is an increasingly popular way to store and analyze data that addresses the challenges of dealing with massive volumes of heterogeneous data. A data lake allows organizations to store all their data, structured and unstructured, in one centralized repository. Continue reading
Using Amazon S3 with Cloudera BDR
More of you are moving to public cloud services for backup and disaster recovery purposes, and Cloudera has been enhancing the capabilities of Cloudera Manager and CDH to help you do that. Specifically, Cloudera Backup and Disaster Recovery (BDR) now supports backup to and restore from Amazon S3. BDR lets you replicate HDFS data from your on-premises cluster to or from Amazon S3 with full fidelity (all file and directory metadata is replicated along with the data). Continue reading
Serverless Delivery with Databricks and AWS CodePipeline
Databricks interactive workspace serves as an ideal environment for collaborative development and interactive analysis. The platform supports all the necessary features to make the creation of a continuous delivery pipeline not only possible but simple. Continue reading
Apache NiFi Installation
* Apache NiFi is a software project from the Apache Software Foundation which enables the automation of data flow between systems.
* It can be described as data logistics.
* Just as parcel services move and track packages, Apache NiFi helps move and track data.
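The "data logistics" idea above can be illustrated with a toy model (this is not NiFi's actual API): in NiFi, data moves as flowfiles, each carrying content plus attributes, and every processor records a provenance event, so data can be tracked end to end much like a parcel's tracking history.

```python
# Toy illustration of the flowfile concept; names are invented for this sketch.

def make_flowfile(content: bytes) -> dict:
    """A flowfile pairs a payload with attributes and a provenance trail."""
    return {"content": content, "attributes": {}, "provenance": []}

def process(flowfile: dict, processor_name: str, transform) -> dict:
    """Apply a transformation and record who touched the data (tracking)."""
    flowfile["content"] = transform(flowfile["content"])
    flowfile["provenance"].append(processor_name)
    return flowfile

ff = make_flowfile(b"hello")
ff = process(ff, "UpdateContent", bytes.upper)
# The provenance trail now shows every hop, like a package scan history.
assert ff["content"] == b"HELLO"
assert ff["provenance"] == ["UpdateContent"]
```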
Server-Side Encryption for Amazon Kinesis Streams
Amazon Kinesis Streams is used to ingest, process, and deliver data in real time from millions of devices or applications. Use cases for Kinesis Streams vary, but a few common ones include IoT data ingestion and analytics, log processing, clickstream analytics, and enterprise data bus architectures. Within milliseconds of data arrival, applications attached to a stream are continuously mining value or delivering data to downstream destinations. Continue reading
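A minimal sketch of the producer side with boto3, assuming an illustrative stream name and record shape. Building the record separately from the `put_record` call lets the payload be inspected without a live AWS connection; server-side encryption for the stream itself can be turned on separately via `start_stream_encryption` with a KMS key.

```python
import json

def build_record(stream_name: str, device_id: str, payload: dict) -> dict:
    """Records sharing a PartitionKey land on the same shard,
    preserving per-device ordering."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": device_id,
    }

def send(kinesis_client, record: dict):
    # e.g. kinesis_client = boto3.client("kinesis"); shown for shape only.
    return kinesis_client.put_record(**record)

rec = build_record("iot-events", "sensor-17", {"temp_c": 21.5})
assert rec["PartitionKey"] == "sensor-17"
```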
Copyright © 2014. DataDotz All rights reserved.