Tag «spark»

Java and Spark Memory Management

Memory usage in Spark largely falls under one of two categories: execution and storage. Since Spark jobs run inside the JVM, it is important to understand JVM memory management first. Java Memory Management: in the Java architecture there are three basic components: 1. Java Development Kit (JDK) – a software development environment used for Java applications. It …
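
To make the execution/storage split concrete, here is a minimal sketch (not taken from the post itself) of how the unified memory region can be tuned through Spark's standard configuration keys. The application name, master URL, and concrete values are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: tuning the unified memory region that Spark splits
// between execution (shuffles, joins, sorts) and storage (cached data).
// The app name and the concrete values below are illustrative only.
object MemoryTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-tuning-sketch")                // hypothetical app name
      .master("local[*]")
      .config("spark.executor.memory", "4g")          // JVM heap per executor
      .config("spark.memory.fraction", "0.6")         // share of heap for execution + storage
      .config("spark.memory.storageFraction", "0.5")  // portion of that share reserved for storage
      .getOrCreate()

    // Caching a DataFrame is what actually consumes the storage portion;
    // cached blocks compete with execution memory inside the unified region.
    val df = spark.range(0, 1000000).toDF("id")
    df.cache()
    println(df.count())

    spark.stop()
  }
}
```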

Apache Hadoop

Apache Hadoop is the backbone of all Hadoop-based environments. Its latest release includes the following projects: Hadoop Common: the common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS™): a distributed file system that provides high-throughput access to application data. Hadoop YARN: a framework for job scheduling and cluster resource …
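
As a small illustration of the HDFS access mentioned above, here is a sketch of reading a file through the Hadoop FileSystem API. The namenode address and file path are hypothetical placeholders, not values from the post.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.io.{BufferedReader, InputStreamReader}

// Minimal sketch: reading a file from HDFS via the Hadoop FileSystem API.
// The namenode URI and file path are illustrative assumptions.
object HdfsReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:8020") // hypothetical namenode

    val fs = FileSystem.get(conf)
    val path = new Path("/data/sample.txt")          // hypothetical file

    val in = new BufferedReader(new InputStreamReader(fs.open(path)))
    try {
      Iterator.continually(in.readLine()).takeWhile(_ != null).foreach(println)
    } finally {
      in.close()
      fs.close()
    }
  }
}
```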