Category «Big Data»

AWS Products

Columns: Product Name, Category, ServiceName, Ref link, Description, Version, Pricing, Pricing link.
Apache MXNet on AWS (Machine Learning), https://aws.amazon.com/api-gateway: a lean, flexible, and ultra-scalable deep learning framework that supports state-of-the-art deep learning models, including convolutional neural networks (CNNs) and long short-term memory networks (LSTMs).
API Gateway (Network and Content Delivery), https://aws.amazon.com/api-gateway: enables …

Apache Spark: Performance Tuning

Apache Spark is a compute engine, and it is important to use it efficiently. Before moving forward, let us discuss a few basic terms used in performance tuning. Spark performance can be improved at two levels: the job level and the Spark SQL level. Spark job optimizations: we can optimize Spark jobs by following …

Apache Hadoop

Apache Hadoop is the backbone of all Hadoop-based environments. Its latest release includes the following projects. Hadoop Common: the common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS™): a distributed file system that provides high-throughput access to application data. Hadoop YARN: a framework for job scheduling and cluster resource …

Cloudera-Hortonworks –> Versions

CDH is Cloudera’s 100% open-source platform distribution, including Apache Hadoop, built specifically to meet enterprise demands. Cloudera Manager is available in the following releases: Cloudera Manager 5.16.2 is the current release of Cloudera Manager 5.16. Cloudera Manager 5.15.2, 5.14.4, 5.13.3, 5.12.2, 5.11.2, 5.10.2, 5.9.3, 5.8.5, 5.7.6, 5.6.1, 5.5.6, 5.4.10, 5.3.10, 5.2.7, 5.1.6, and 5.0.7 are previous stable releases of Cloudera Manager 5.14, 5.13, 5.12, 5.11, 5.10, 5.9, 5.8, 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, 5.1, …

Spark Basics

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework and a fast, unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009. A cluster, or group of machines, pools the resources of many machines together, allowing us to use all the cumulative resources as …