Google Cloud Data Engineering

GCP is broadly divided into the following categories: Networking, Compute, Storage, and Big Data and ML products. Compute: GCP offers the following compute services: Compute Engine, Kubernetes Engine, App Engine, Cloud Functions, Anthos, and Cloud Run. Compute Engine is Google’s infrastructure-as-a-service (IaaS) offering. It is also the building block for other services that run on top of this compute resource. …

Java and Spark Memory Management

Memory usage in Spark largely falls under one of two categories: execution and storage. Because Spark jobs run inside the JVM, it is important to understand JVM memory management first. Java Memory Management: The Java architecture has three basic components: 1. Java Development Kit (JDK): a software development environment used for Java applications. It …
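As a rough illustration of the execution/storage split mentioned above, here is a minimal Python sketch that estimates the size of each region from an executor heap size. It assumes Spark's unified memory manager with its documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, and about 300 MB of reserved memory). The function name `unified_memory_regions` and the 4096 MB heap are hypothetical, chosen only for the example.

```python
# Assumption: Spark's unified memory manager reserves roughly 300 MB of heap
# for its own internal objects before splitting the rest.
RESERVED_MB = 300


def unified_memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate Spark memory regions (in MB) for a given executor heap.

    memory_fraction mirrors spark.memory.fraction and storage_fraction
    mirrors spark.memory.storageFraction (default values assumed).
    """
    usable_mb = heap_mb - RESERVED_MB
    if usable_mb <= 0:
        raise ValueError("heap must be larger than the reserved memory")

    # Region shared by execution (shuffles, joins, sorts) and storage (caching).
    unified_mb = usable_mb * memory_fraction
    # Portion of the unified region where cached blocks are protected from eviction.
    storage_mb = unified_mb * storage_fraction
    execution_mb = unified_mb - storage_mb
    # Remainder is left for user data structures and Spark metadata.
    user_mb = usable_mb - unified_mb

    return {
        "unified_mb": unified_mb,
        "storage_mb": storage_mb,
        "execution_mb": execution_mb,
        "user_mb": user_mb,
    }


# Hypothetical executor with a 4 GB heap.
regions = unified_memory_regions(4096)
for name, size in regions.items():
    print(f"{name}: {size:.1f} MB")
```

The boundary between storage and execution is soft: either side can borrow free space from the other, so these figures are starting estimates and not hard limits.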

Apache Spark: Performance Tuning

Apache Spark is a compute engine, and it is important to use it efficiently. Before moving forward, let us discuss a few basic terms used in performance tuning. Spark performance can be improved at two levels: the job level and the Spark SQL level. Spark job optimizations: We can optimize Spark jobs by following …

Apache Hadoop

Apache Hadoop is the backbone of all Hadoop-based environments. Its latest release includes the following projects: Hadoop Common: The common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. Hadoop YARN: A framework for job scheduling and cluster resource …

Cloudera-Hortonworks: Versions

CDH is Cloudera’s 100% open-source platform distribution, which includes Apache Hadoop and is built specifically to meet enterprise demands. Cloudera Manager is available in the following releases: Cloudera Manager 5.16.2 is the current release of Cloudera Manager 5.16. Cloudera Manager 5.15.2, 5.14.4, 5.13.3, 5.12.2, 5.11.2, 5.10.2, 5.9.3, 5.8.5, 5.7.6, 5.6.1, 5.5.6, 5.4.10, 5.3.10, 5.2.7, 5.1.6, and 5.0.7 are previous stable releases of Cloudera Manager 5.15, 5.14, 5.13, 5.12, 5.11, 5.10, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, …