DataDotz BigData Weekly

DataDotz Bigdata Weekly

This entry was posted in Uncategorized on by .   0 Comment[s]

Best Practices for Securing Amazon EMR
============================================

Amazon EMR is a managed Hadoop framework that you use to process vast amounts of data. One of the reasons that customers choose Amazon EMR is its security features. For example, customers like FINRA in regulated industries such as financial services, and in healthcare, choose Amazon EMR as part of their data strategy.They do so to adhere to strict regulatory requirements from entities such as the Payment Card Industry Data Security Standard (PCI) and the Health Insurance Portability and Accountability Act (HIPAA).
https://aws.amazon.com/blogs/big-data/best-practices-for-securing-amazon-emr/

ActiveMQ architecture and key metrics
======================================

Apache ActiveMQ is message-oriented middleware (MOM), a category of software that sends messages between applications. Using standards-based, asynchronous communication, ActiveMQ allows loose coupling of the elements in an IT environment, which is often foundational to enterprise messaging and distributed applications.

https://www.datadoghq.com/blog/activemq-architecture-and-metrics/
Built-in Image Data Source in Apache Spark 2.4
====================

Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post Image Data Support in Apache Spark), which was originally developed in the MMLSpark library. In Apache Spark 2.4, it’s much easier to use because it is now a built-in data source. Using the image data source, you can load images from directories and get a DataFrame with a single image column.

https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html
Teardown, Rebuild:Migrating from Hive to PySpark
================================================================

Machine Learning (ML) engineering and software development are both fundamentally about writing correct and robust algorithms. In ML engineering we have the extra difficulty of ensuring mathematical correctness and avoiding propagation of round-off errors in the calculations when working with floating-point representations of a number.

https://medium.com/@trivagotech/teardown-rebuild-migrating-from-hive-to-pyspark-324176a7ce5
Getting Started with Apache Pulsar and Data Collector
========================================================

https://streamsets.com/blog/getting-started-apache-pulsar-streamsets-data-collector/
Apache Kafka Security | Need and Components of Kafka
======================

There are a number of features added in the Kafka community, in release 0.9.0.0. There is a flexibility for their usage also, like either separately or together, that also enhances security in a Kafka cluster.

https://medium.com/@rinu.gour123/apache-kafka-security-need-and-components-of-kafka-52b417d3ca77