Upon completion of the week 4 lesson, you will be able to:
This week reading assignment is the course textbook chapter 4 (EMC Education Service (Eds). (2015) Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing, and Presenting Data, Indianapolis, IN: John Wiley & Sons, Inc).
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.
Let’s understand this with an example. Suppose, you are the head of a rental store and wish to understand preferences of your costumers to scale up your business. Is it possible for you to look at details of each costumer and devise a unique business strategy for each one of them? Definitely not. But, what you can do is to cluster all of your costumers into say 10 groups based on their purchasing habits and use a separate strategy for costumers in each of these 10 groups. And this is what we call clustering.
Read Chapter 4 on Clustering Methods in Textbook:
EMC Education Service (Eds). (2015) Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing, and Presenting Data, Indianapolis, IN: John Wiley & Sons, Inc.
Also I recommend reading up on the Clustering methods from the following book:
An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani,
http://www-bcf.usc.edu/~gareth/ISL/index.html