Cookie Problem

Imports

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns
import empiricaldist
from empiricaldist import Pmf, Distribution

Basics Pmf

d6 = Pmf(); d6
probs
for i in range(6):
#     print(i+1)
    d6[i+1] = 1
    
d6
probs
1 1
2 1
3 1
4 1
5 1
6 1
# Pmf??
# Distribution??
d6.normalize(); d6
probs
1 0.166667
2 0.166667
3 0.166667
4 0.166667
5 0.166667
6 0.166667
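
Normalizing divides every weight by the sum of all weights so that the probabilities add up to 1. The following is a minimal sketch in plain Python, not the empiricaldist implementation; the helper name `normalize_weights` is hypothetical, and exact fractions are used only to keep the arithmetic readable.

```python
from fractions import Fraction

def normalize_weights(weights):
    """Return (total, pmf), where pmf maps each outcome to weight / total.

    Hypothetical helper for illustration; not part of empiricaldist.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("cannot normalize: total weight is zero")
    pmf = {q: Fraction(w) / total for q, w in weights.items()}
    return total, pmf

# Six equally weighted faces, mirroring the loop that fills d6
weights = {face: 1 for face in range(1, 7)}
total, d6_pmf = normalize_weights(weights)

print(total)      # 6
print(d6_pmf[1])  # 1/6
```

Each face ends up with probability 1/6, which is the 0.166667 shown in the output above.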
# d6
d6.mean()
3.5
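
The mean of a PMF is the sum of each outcome multiplied by its probability. As a rough check of the 3.5 above, here is a sketch in plain Python; `pmf_mean` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

def pmf_mean(pmf):
    """Expected value: sum of outcome * probability over all outcomes."""
    return sum(q * p for q, p in pmf.items())

# Fair six-sided die with exact probabilities
d6_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(pmf_mean(d6_pmf))         # 7/2
print(float(pmf_mean(d6_pmf)))  # 3.5
```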
d6.choice(size=10)
array([5, 5, 5, 1, 6, 2, 6, 1, 6, 2])
def decorate_dice(title):
    """Labels the axes
    title: string
    """
    
    plt.xlabel('Outcome')
    plt.ylabel('PMF')
    plt.title(title)
# d6.bar(xlabel='Outcome')
d6.bar()
decorate_dice('One die')
[Figure: bar chart of the PMF for one die, titled 'One die' (x: Outcome, y: PMF)]
twice = d6.add_dist(d6)
twice
probs
2 0.027778
3 0.055556
4 0.083333
5 0.111111
6 0.138889
7 0.166667
8 0.138889
9 0.111111
10 0.083333
11 0.055556
12 0.027778
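
The table above is the distribution of the sum of two independent dice. One way to arrive at it, sketched here in plain Python, is to enumerate every pair of outcomes, add the outcomes, and accumulate the product of their probabilities. The helper `add_independent` is hypothetical and is not claimed to match how `add_dist` is implemented in empiricaldist.

```python
from collections import defaultdict
from fractions import Fraction

def add_independent(pmf_a, pmf_b):
    """PMF of the sum of two independent draws (a discrete convolution)."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for qa, pa in pmf_a.items():
        for qb, pb in pmf_b.items():
            result[qa + qb] += pa * pb
    return dict(result)

d6_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
twice_pmf = add_independent(d6_pmf, d6_pmf)

print(twice_pmf[2])  # 1/36
print(twice_pmf[7])  # 1/6
```

Here 1/36 is the 0.027778 listed for a sum of 2, and 1/6 is the 0.166667 listed for a sum of 7.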
twice.bar()
decorate_dice('Two Dice')
[Figure: bar chart of the PMF for the sum of two dice, titled 'Two Dice' (x: Outcome, y: PMF)]
d6.add_dist??
d6.ps, d6.qs
(array([0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667,
        0.16666667]),
 array([1, 2, 3, 4, 5, 6]))
d6
probs
1 0.166667
2 0.166667
3 0.166667
4 0.166667
5 0.166667
6 0.166667
twice.mean()
7.000000000000002
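# Note: the result below is the average of the nine selected probabilities,
# (33/36) / 9, not the mean of the outcomes greater than 3.
twice[twice.qs > 3].mean()
0.10185185185185187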
twice[twice.qs >3].plot.bar()
<AxesSubplot:>
[Figure: bar chart of the probabilities of the sums 4 through 12]
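# Zeroing outcomes 1 and 2 removes only the sum 2, so this conditions on a
# sum greater than 2. To condition on a sum greater than 3, zero out 2 and 3.
twice[1] = 0
twice[2] = 0
twice.normalize()
twice.mean()
7.142857142857141
twice.bar()
decorate_dice('Two dice, greater than 2')
[Figure: bar chart of the PMF for two dice after conditioning, titled 'Two dice, greater than 2' (x: Outcome, y: PMF)]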
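
Conditioning on an event amounts to dropping (or zeroing) the excluded outcomes and renormalizing what is left. The sketch below, in plain Python with hypothetical helpers `condition` and `mean`, compares the mean of two dice given a sum greater than 2 with the mean given a sum greater than 3. The first case reproduces the 7.142857 computed above, since zeroing outcomes 1 and 2 removes only the sum 2.

```python
from fractions import Fraction
from itertools import product

def condition(pmf, keep):
    """Keep only outcomes where keep(q) is True, then renormalize."""
    kept = {q: p for q, p in pmf.items() if keep(q)}
    total = sum(kept.values())
    return {q: p / total for q, p in kept.items()}

def mean(pmf):
    """Expected value of a PMF given as {outcome: probability}."""
    return sum(q * p for q, p in pmf.items())

# Exact PMF of the sum of two fair dice
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, Fraction(0)) + Fraction(1, 36)

gt2 = mean(condition(two_dice, lambda q: q > 2))
gt3 = mean(condition(two_dice, lambda q: q > 3))

print(gt2)  # 50/7
print(gt3)  # 244/33
```

50/7 is roughly 7.143 and 244/33 is roughly 7.394, so the two conditions give noticeably different means.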
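  • Pmf => holds the prior probability of each hypothesis

  • Likelihood => multiply each prior probability by the likelihood of the data under that hypothesis

  • Normalize => add up the products and divide each one by the total, which gives the posterior probabilities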
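
The three steps above can be sketched as a single function in plain Python. The name `bayes_update` is hypothetical, and the numbers are the prior (0.5, 0.5) and likelihoods (0.75, 0.5) used in the cookie cells that follow, written as exact fractions.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """One Bayesian update over a finite set of hypotheses.

    prior:      {hypothesis: P(hypothesis)}
    likelihood: {hypothesis: P(data | hypothesis)}
    Returns (posterior, total), where total is P(data).
    """
    # Step 2: multiply each prior by the likelihood of the data
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Step 3: add everything up and divide by the total
    total = sum(unnormalized.values())
    posterior = {h: p / total for h, p in unnormalized.items()}
    return posterior, total

# Step 1: the prior
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}
likelihood = {"B1": Fraction(3, 4), "B2": Fraction(1, 2)}

posterior, total = bayes_update(prior, likelihood)
print(total)                           # 5/8
print(posterior["B1"], posterior["B2"])  # 3/5 2/5
```

The total 5/8 equals 0.625, and the posterior 3/5 and 2/5 equals 0.6 and 0.4.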

Cookie Problem

cookie = Pmf.from_seq(['B1', 'B2']); cookie
probs
B1 0.5
B2 0.5
# First update: multiply each prior by the likelihood of the observed data
cookie['B1'] *= 0.75
cookie['B2'] *= 0.5
cookie
probs
B1 0.375
B2 0.250
# normalize() returns the total before normalizing (the probability of the data)
cookie.normalize()
0.625
cookie
probs
B1 0.6
B2 0.4
# Second update: the posterior above serves as the new prior
cookie['B1'] *= 0.25
cookie['B2'] *= 0.5

cookie.normalize()
0.35
cookie
probs
B1 0.428571
B2 0.571429
# Same result in a single update: multiply by the product of both likelihoods
cookie2 = Pmf.from_seq(["B1", "B2"])
cookie2['B1'] *= (0.75 * 0.25)
cookie2['B2'] *= (0.5 * 0.5)
cookie2.normalize()
0.21875
cookie2
probs
B1 0.428571
B2 0.571429
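
The two routes above give the same posterior because multiplication is commutative: updating twice in sequence is equivalent to updating once with the product of the likelihoods. The sketch below checks this with exact fractions in plain Python. The helper `update` is hypothetical, and the likelihood values are simply the numbers used in the cells above.

```python
from fractions import Fraction

def update(prior, likelihood):
    """Multiply prior by likelihood, normalize, return (posterior, total)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}, total

prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}
first = {"B1": Fraction(3, 4), "B2": Fraction(1, 2)}   # 0.75 and 0.5
second = {"B1": Fraction(1, 4), "B2": Fraction(1, 2)}  # 0.25 and 0.5

# Route 1: two sequential updates
step1, total1 = update(prior, first)
step2, total2 = update(step1, second)

# Route 2: one update with the product of both likelihoods
combined = {h: first[h] * second[h] for h in prior}
batch, total_batch = update(prior, combined)

print(step2["B1"], step2["B2"])        # 3/7 4/7
print(step2 == batch)                  # True
print(total1 * total2 == total_batch)  # True
```

3/7 and 4/7 are the 0.428571 and 0.571429 shown above. The product of the two sequential totals (0.625 and 0.35) also equals the single-update total 0.21875.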
# cookie['B1B1']*=0.75
# cookie['B1B2']*=0.
d6.normalize??

By Rahul Saraf
© Copyright 2020.