Monthly archives: May, 2020

Recommendation System

Humans need recommendations from childhood onward: we need to know what is good for us, such as where and how to spend our holidays, which food suits us, which subjects are best for us, which colleges are best, and so on. In terms of implementation, we can divide recommendation systems into two categories: Memory-based: uses statistical tools like …
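As a rough illustration of the memory-based idea, here is a minimal sketch of user-based collaborative filtering. The excerpt is cut off before naming its statistical tool, so the choice of cosine similarity is an assumption, and the `ratings` data and the `recommend` helper are hypothetical. Unseen items are scored by a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, made up for illustration.
ratings = {
    "alice": {"pizza": 5, "sushi": 3, "salad": 1},
    "bob":   {"pizza": 4, "sushi": 3, "salad": 1, "tacos": 5},
    "carol": {"pizza": 1, "sushi": 5, "salad": 4, "tacos": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Return (item, predicted_rating) pairs for items `user` has not rated."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, r in their.items():
            if item in ratings[user]:
                continue  # only predict items the user has not seen
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    predictions = [(item, scores[item] / weights[item]) for item in scores]
    return sorted(predictions, key=lambda p: p[1], reverse=True)

print(recommend("alice", ratings))  # roughly [('tacos', 3.84)]
```

Bob's tastes are closest to Alice's, so his high rating for tacos dominates the weighted average; Carol's low rating pulls it down only a little.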

Clustering

There are billions of stars in the galaxy, and we are on the way to finding new constellations, but how can we find them when there are no labels? Clustering is the solution. To solve unsupervised problems in machine learning, we use clustering algorithms. We can classify clustering as: Partition-based clustering …
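To make the partition-based category concrete, here is a minimal sketch of k-means, a common partition-based algorithm. The excerpt is truncated before naming specific algorithms, so this choice is an assumption; the `kmeans` function and the sample points are hypothetical. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster, until nothing changes.

```python
import math

def kmeans(points, k, iterations=100):
    # Naive deterministic initialisation with the first k points (an assumption;
    # random restarts or k-means++ are common in practice).
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, 2)
print(centroids)  # [[1.0, 1.5], [8.5, 8.5]]
```

No labels are supplied anywhere; the grouping emerges only from distances between points, which is what makes this an unsupervised method.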

PCA Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction technique. When data has many dimensions, it is difficult to find the dimensions that are responsible for the results. Consider a scenario in which you have an army of 10,000 soldiers. How can we determine that the army will win …
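The core of PCA can be sketched in a few steps: center the data, build its covariance matrix, and find the direction of maximum variance. The minimal sketch below is an assumption-laden illustration, not the post's method: it computes only the first principal component, using power iteration instead of a full eigendecomposition (a library routine such as `numpy.linalg.eigh` is the usual choice). The `first_principal_component` helper and the sample data are hypothetical, and power iteration assumes the data is not constant and that the largest eigenvalue is strictly larger than the next.

```python
import math

def first_principal_component(data, iterations=200):
    n, d = len(data), len(data[0])
    # 1. Center each feature at zero mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # 2. Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # 3. Power iteration: repeated multiplication by cov converges to the
    #    eigenvector with the largest eigenvalue, i.e. the max-variance direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))  # assumes non-constant data
        v = [x / norm for x in w]
    # Variance captured along v (Rayleigh quotient, since v has unit length).
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    # 4. Project each centered sample onto v: d dimensions become 1.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, variance, scores

# Hypothetical 2-D samples lying on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
v, variance, scores = first_principal_component(data)
print(v)  # roughly [0.447, 0.894], the direction (1, 2)
```

Because these points lie exactly on a line, the single component captures all of the variance (the covariance trace is 25/3, the same as `variance`), so the two original dimensions can be replaced by one score per sample.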

Spark Basics

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework: a lightning-fast, unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009. When a cluster, or group of machines, pools the resources of many machines together, allowing us to use all the cumulative resources as …