DataDotz BigData Weekly

Cloudera Manager
================

Cloudera Enterprise Backup and Disaster Recovery (BDR) enables you to replicate data across data centers for disaster recovery scenarios. As a lower-cost solution for geographical redundancy, or as a means to perform an on-premises-to-cloud migration, BDR can also replicate HDFS and Hive data to and from Amazon S3 or Microsoft Azure Data Lake Store.

http://blog.cloudera.com/blog/2018/06/how-to-automate-replications-with-cloudera-manager-api/

Tracing Distributed JVM
=====================

Computing frameworks like Apache Spark have been widely adopted to build large-scale data applications. For Uber, data is at the heart of strategic decision-making and product development. To help us better leverage this data, we manage massive deployments of Spark across our global engineering offices.

https://eng.uber.com/jvm-profiler/

Uber’s Hadoop Distributed File System
==================================

Uber Engineering adopted Hadoop as the storage (HDFS) and compute (YARN) infrastructure for our organization’s big data analysis. This analysis powers our services and enables the delivery of more seamless and reliable user experiences.

https://eng.uber.com/scaling-hdfs/

Big Data
=======

Big data comes in a variety of shapes. Extract-Transform-Load (ETL) workflows are more or less stripe-shaped (left panel of the figure in the linked post) and produce an output of a similar size to the input. Reporting workflows are funnel-shaped (middle panel of the same figure) and progressively reduce the data size by filtering and aggregating.
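To make the two shapes concrete, here is a minimal Python sketch. It is not taken from the LinkedIn post; the record fields (`user`, `country`) and both function names are hypothetical, chosen only for illustration. The ETL step emits one output record per input record, while the reporting step filters and aggregates, so its output is smaller than its input.

```python
from collections import Counter
from typing import Dict, List, Set

Event = Dict[str, str]  # hypothetical record layout: {"user": ..., "country": ...}


def etl_normalize(events: List[Event]) -> List[Event]:
    """Stripe-shaped step: one cleaned output record per input record."""
    return [
        {"user": e["user"].strip().lower(), "country": e["country"].upper()}
        for e in events
    ]


def report_by_country(events: List[Event], countries: Set[str]) -> Dict[str, int]:
    """Funnel-shaped step: filter first, then aggregate to one row per country."""
    kept = [e for e in events if e["country"] in countries]
    return dict(Counter(e["country"] for e in kept))


if __name__ == "__main__":
    raw = [
        {"user": " Alice ", "country": "us"},
        {"user": "BOB", "country": "in"},
        {"user": "carol", "country": "US"},
        {"user": "dave", "country": "de"},
    ]
    clean = etl_normalize(raw)
    print(len(raw), len(clean))                    # same number of records in and out
    print(report_by_country(clean, {"US", "IN"}))  # fewer rows than input records
```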

https://engineering.linkedin.com/blog/2017/06/managing--exploding--big-data

Apache Kafka
============

The main goal of this post is to demonstrate the concepts of multiple brokers, partitioning, and replication in Apache Kafka. The post ends with steps to set up multiple brokers along with partitioning and replication. If you are new to Apache Kafka, you can refer to the posts below to understand Kafka quickly.
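As a rough illustration of how partitions and replicas spread across brokers, here is a minimal Python sketch. It is a simplified round-robin model, not Kafka's actual assignment logic, which is more involved (for example, it can take racks into account). The function names and broker ids are hypothetical. The first replica in each list is treated as the partition leader, and a broker failure promotes the next surviving replica.

```python
from typing import Dict, List, Optional


def assign_replicas(
    num_partitions: int, brokers: List[int], replication_factor: int
) -> Dict[int, List[int]]:
    """Place each partition's replicas on consecutive brokers, round-robin.

    Simplified model for illustration only; not Kafka's real algorithm.
    """
    if num_partitions < 1 or replication_factor < 1:
        raise ValueError("num_partitions and replication_factor must be >= 1")
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def leaders_after_failure(
    assignment: Dict[int, List[int]], failed_broker: int
) -> Dict[int, Optional[int]]:
    """Pick the first surviving replica of each partition as its leader."""
    result: Dict[int, Optional[int]] = {}
    for partition, replicas in assignment.items():
        alive = [b for b in replicas if b != failed_broker]
        result[partition] = alive[0] if alive else None  # None: partition offline
    return result


if __name__ == "__main__":
    layout = assign_replicas(num_partitions=3, brokers=[0, 1, 2], replication_factor=2)
    print(layout)
    print(leaders_after_failure(layout, failed_broker=1))
```

With a replication factor of 2 on three brokers, every partition in this model still has a live replica after any single broker fails, which is the property replication is meant to provide.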

http://blog.cask.co/2018/05/cask-is-joining-google-cloud/
http://nverma-tech-blog.blogspot.com/2015/12/apache-kafka-multibroker-partitioning.html

Dynamic Machine Translation in the LinkedIn Feed
============================================

The need for economic opportunity is global, and that is represented by the fact that more than half of LinkedIn’s active members live outside of the U.S. Engagement across language barriers and borders comes with a certain set of challenges—one of which is providing a way for members to communicate in their native language. In fact, translation of member posts has been one of our most requested features, and now it’s finally here.

https://engineering.linkedin.com/blog/2018/06/dynamic-machine-translation-in-the-linkedin-feed-

Hulu: Migrating Hadoop Clusters from One Data Center to Another
==================================================================

Hulu, which provides a subscription video-on-demand service, recently migrated its Hadoop clusters from one data center to another. In this tech blog post, the team describes the challenges of the migration and the solutions they considered. It is definitely an interesting read.

https://medium.com/hulu-tech-blog/migrating-hulus-hadoop-clusters-to-a-new-data-center-part-one-extending-our-hadoop-instance-b88c4bda61bc