Introduction to Unsupervised Learning

Things to Learn

  1. Definitions

    • Unsupervised Learning

    • K-means: what is k in K-means?

    • Hierarchical Clustering

    • Agglomerative Clustering

    • DBSCAN

  2. Dimensionality Reduction (Algorithms) and Benefits

  3. Problems

    • What is Sentiment Analysis?

    • Can decision trees be used for clustering?

  4. Algorithm Specifics

    • Data Cleaning and Missing Value Treatment

      • Most appropriate strategy for Data Cleaning before applying K-means

    • Number of Features (to use)

    • Initialization for K-means

    • Termination Condition

    • Optimizing Number of Clusters

      • Elbow Method

      • Manhattan Distance

      • Gap Statistic

      • Silhouette Analysis

    • Convergence

      • Local vs. Global Minima (K-means)

    • Sensitivity to outliers

    • When will the algorithm fail?

    • Metrics to be used

    • Why feature scaling is necessary for K-means

  5. Visualization

    • Dendrogram

    • Agglomeration

  6. Soft Assignment

    • Fuzzy K-means

    • Gaussian Mixture Models

    • Multinomial Mixture Models and Expectation Maximization (EM)

  7. Distances

    • Statistical Distances

  8. Problem Domains and Examples

    • Google News

    • Gene Expression: how strongly is a gene expressed?

    • Organizing Computer Clusters

    • Social Network Analysis

    • Market Segmentation

    • Astronomical Data Analysis

    • Cocktail Party Problem
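Several of the K-means items in section 4 (initialization, termination condition, convergence to a local minimum, and the elbow method) can be tied together in one small sketch. The following is a minimal, dependency-free Python version of Lloyd's algorithm for K-means. It assumes plain random initialization from the data points (not k-means++), squared Euclidean distance, and a fixed seed. The names `kmeans` and `squared_distance` are hypothetical helpers, not from any library. Because the result depends on the starting centroids, different seeds can settle into different local minima.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two points of equal dimension
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm, returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    # Assumption: plain random initialization from k distinct data points (not k-means++)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Termination condition: stop once no point changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Inertia: within-cluster sum of squared distances (what the elbow method plots)
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    # Elbow method: inertia for increasing k; look for the bend in the curve
    for k in range(1, 5):
        _, _, inertia = kmeans(data, k)
        print(k, round(inertia, 2))
```

Plotting the printed inertia against k and picking the point where the curve bends is the elbow method. For real work, a library implementation such as scikit-learn's `KMeans` (which defaults to k-means++ initialization) is the usual choice.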