DataDotz BigData Weekly

Data Management Strategies for Computer Vision
============================================

Computer vision (CV) developers often find that the biggest barrier to success is data management, yet most of what is written about CV covers the algorithms, not the data. In this post, I’ll describe three data management strategies I’ve used with applications that process images.

Replication Guide On HDFS and Amazon Web Services
=======================

Hortonworks’ Data Lifecycle Manager (DLM), an extensible service built on the Hortonworks DataPlane Service (DPS), provides a complete solution for replicating HDFS and Hive data, metadata, and security policies between on-premises clusters and Amazon S3.

Spark core concepts visualized
=======================

Learning Spark is not easy for someone with little background in distributed systems. Even though I have been using Spark for quite some time, I still find it time-consuming to get a comprehensive grasp of all its core concepts. The official Spark documentation is very detailed, yet it focuses more on the practical programming side.

History of High Availability
=======================

In the days of yore, databases ran on single machines. There was only one node, and it handled all reads and all writes. There was no such thing as a “partial failure”; the database was either up or down. Total failure of a single database was a two-fold problem for the internet. First, computers were being accessed around the clock, so downtime was more likely to directly impact users.

Apache Pulsar Tiered Storage with Amazon S3
========================================

Apache Pulsar’s tiered storage feature enables Pulsar to offload older messages on a topic to a long-term storage system, freeing up space in Apache BookKeeper and taking advantage of scalable low-cost storage options such as cloud storage.
