ML/AI: Probability and statistics-II. Discrete distributions

Binomial probability mass function
For a discrete random variable X define the probability mass function

    \[p(a)=\wp(X=a)\]

If X can be only one of \{x_1,x_2,\cdots, x_n\} then p(x_i)\ge 0 for i=1,2,\cdots n and p(x)=0 if x\ne x_i. We also must have \sum_{i=1}^n p(x_i)=1.
A Bernoulli random variable takes the value 1 (success) with probability p(1)=p and the value 0 (failure) with probability p(0)=1-p.
We construct the generating function (in statistical physics this is the partition function) for multiple trials of a single Bernoulli process (such as shooting arrows at a bullseye), using x as a simple place-holder or enumerator

    \[G_1(x)=((1-p)+xp), \qquad G_1(1)=1\]

and

    \[{dG_1\over dx}\Big{|}_{x=1}=p=\wp(success)\]

Then since all trials are independent, all probabilities are multiplicative and

    \[G_n(x)=\Big((1-p)+xp\Big)^n=\sum_{m=0}^n x^m \wp(m\, \,  success)\]

    \[=\sum_m {n\choose m} p^m(1-p)^{n-m} x^m\]

This should look very familiar; this is the binomial distribution. Note that in a generating function the coefficient of x^m is the probability of the event characterized by m.
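
A quick way to see this concretely: the sketch below (Python with sympy; the values n=4 and p=1/3 are illustrative, not from the text) expands G_n(x) and checks that the coefficient of x^m is exactly {n\choose m}p^m(1-p)^{n-m}.

    # Sketch: expand the generating function and read off binomial probabilities.
    import sympy as sp

    x = sp.symbols('x')
    n, p = 4, sp.Rational(1, 3)            # illustrative values
    G = sp.expand(((1 - p) + x * p) ** n)  # G_n(x) = ((1-p) + x p)^n

    for m in range(n + 1):
        coeff = G.coeff(x, m)                               # coefficient of x^m
        pmf = sp.binomial(n, m) * p**m * (1 - p)**(n - m)   # binomial mass function
        assert sp.simplify(coeff - pmf) == 0

    print(G)             # the full expansion
    print(G.subs(x, 1))  # G_n(1) = 1, total probability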

Bernoulli-binomial random variable. The number of successes i (a non-negative integer) in n independent Bernoulli trials is called a binomial random variable. Calling the probability mass function p(b)=\wp\{X=b\}, and F(a)=\sum_{b\le a} p(b),

    \[p(i)={n\choose i} \, p^i \, (1-p)^{n-i}\]

gives the probability of i successes and n-i failures in n independent trials with individual success probability p. Note that, by the binomial theorem,

    \[\sum_{i=0}^n p(i)=\sum_{i=0}^n {n\choose i} \, p^i \, (1-p)^{n-i}=(1-p)^n\sum_{i=0}^n {n\choose i} \, \Big({p\over 1-p}\Big)^i\]

    \[=(1-p)^n\Big(1+{p\over 1-p}\Big)^n=1\]

Example. An archer has a probability of making a bull’s eye of 1/3 with each shot. What is her average score in a set of six shots?
The probability of m bull’s eyes in the set is

    \[\wp(m)= {6\choose m} (1/3)^m(2/3)^{6-m}\]

so her average will be

    \[\bar{m}=\sum_{m=0}^6 m \cdot \wp(m)= \sum_{m=0}^6 m \cdot{6\choose m} (1/3)^m(2/3)^{6-m}=2\]

You can see that performing the sum \bar{m}=\sum_m m\cdot\wp(m) could be mathematically difficult if \wp(m) is complicated.
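
For the binomial the brute-force sum is easy to carry out numerically; a minimal check of the archer example (plain Python, standard library only):

    # Mean number of bull's eyes in six shots, each with probability 1/3.
    from math import comb

    n, p = 6, 1/3
    pmf = [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]
    mean = sum(m * pm for m, pm in enumerate(pmf))
    print(mean)   # ~2.0, in agreement with n*p = 2
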
Example The probability that a given atom out of N trapped in volume V will be in subvolume v is p(1)=\wp=v/V, p(0)=1-\wp=1-v/V. This was part of a qualifier problem.

Geometric random variable Consider independent trials, each with success probability \wp(1)=p, performed until a success occurs. X is the number of trials required until the first success,

    \[\wp(X=n)=p(1-p)^{n-1}, \quad n=1,2,\cdots\]

Note that

    \[\wp(X\ge n)=p\sum_{m=n}^\infty (1-p)^{m-1}=p\sum_{q=0}^\infty (1-p)^{n-1+q}\]

    \[=p(1-p)^{n-1}\sum_{q=0}^\infty (1-p)^q=(1-p)^{n-1}\]

X is a geometric random variable. A distribution is memory-less if \wp(X\ge n+m|X>m)=\wp(X\ge n). The only distribution taking values in the positive integers that is memory-less is the geometric, since

    \[\wp(X\ge n+m|X>m)={\wp(X\ge n+m)\over \wp(X>m)}=\wp(X\ge n)\]

    \[ \wp(X\ge n+m)= \wp(X\ge n)\wp(X>m)\]

and clearly \wp(X\ge n)=(1-p)^{n-1} satisfies this.
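
A small numerical sanity check of the memoryless property (plain Python; the values of p, n and m are illustrative):

    # Verify P(X >= n+m | X > m) = P(X >= n), using P(X >= n) = (1-p)^(n-1).
    p = 0.3
    P_ge = lambda n: (1 - p) ** (n - 1)   # P(X >= n)
    P_gt = lambda m: (1 - p) ** m         # P(X > m) = P(X >= m+1)

    for n in range(1, 6):
        for m in range(1, 6):
            lhs = P_ge(n + m) / P_gt(m)   # conditional probability
            assert abs(lhs - P_ge(n)) < 1e-12
    print("memoryless property holds for the sampled n, m")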

Poisson random variable The Poisson random variable i (a non-negative integer) has mass function

    \[p(i)={\lambda^i\over i!} \, e^{-\lambda}\]

This is the n\rightarrow\infty, p\rightarrow 0 limit, with np=\lambda held fixed, of the binomial variable.
Consider the simplest experiment that can be conceived of; within a vanishingly small time interval dt, a process either occurs (such as a nuclear decay), with probability p, or does not, with probability (1-p). The probability \wp(M) that M such events occur within a macroscopic time t=N \, dt is generated (with t now serving as the formal enumerator, playing the role of x above) by

    \[(pt +(1-p))^N =\phi(t)=\sum_{M=0}^N {N\choose M} p^M (1-p)^{N-M} \, t^M\]

    \[=\sum_{M=0}^N \wp(M) \, t^M\]

resulting in the binomial distribution

    \[\wp(M)={N\choose M} p^M (1-p)^{N-M}\]

Notice that we say \wp(M) rather than {d\wp(M)\over dM} for a discrete distribution.
We call \phi(t)=(pt +(1-p))^N the generating function of the distribution, which is very useful in obtaining moments.
The expectation or mean of this distribution is

    \[\mu=\sum_{M=0}^N M \, \wp(M) =\Big(\sum_{M=0}^N M \, \wp(M) \, t^M\Big)_{t=1}\]

    \[=\Big(t{d\over dt}\sum_{M=0}^N \wp(M) \, t^M\Big)_{t=1}\]

    \[=t{d\over dt}\phi(t)\Big{|}_{t=1}=t  \, p \, N (pt +(1-p))^{N-1}\Big{|}_{t=1}=p \, N\]
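
The same derivative trick can be checked symbolically; a sketch with sympy (N and p kept symbolic):

    # Mean of the binomial via the generating function: t d(phi)/dt at t = 1.
    import sympy as sp

    t, p = sp.symbols('t p', positive=True)
    N = sp.symbols('N', positive=True, integer=True)
    phi = (p * t + (1 - p)) ** N             # generating function
    mean = (t * sp.diff(phi, t)).subs(t, 1)  # t phi'(t) evaluated at t = 1
    print(sp.simplify(mean))                 # N*p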

This mass function has two important limits.
\bullet First if we consider the case of p\rightarrow 0 but with N\rightarrow\infty such that pN=\mu remains constant, we get

    \[\lim_{N\rightarrow\infty,\ pN=\mu}(pt +(1-p))^N =\lim_{N\rightarrow\infty}\Big(1+(t-1){\mu\over N}\Big)^N=e^{\mu t} e^{-\mu}\]

and then

    \[\sum_{M=0}^\infty \wp(M) \, t^M=e^{\mu t} e^{-\mu}=\sum_{M=0}^\infty {\mu^M\over M!} e^{-\mu} \, t^M\]

results in the Poisson distribution

    \[\wp(M)={\mu^M\over M!} e^{-\mu}=P(\mu, M)\]

with parameter \mu.
Example The probability that an ISP gets a call from a subscriber in any given hour is 0.01. The ISP has 300 subscribers. What is the probability that the ISP will get 4 calls in the next hour?
First we see that the mean (expected) number of calls is

    \[\mu=(300)(0.01)=Np=3\]

in an hour. Then

    \[\wp(4)={3^4\over 4!}e^{-3}=0.168\]
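
A numerical check (plain Python), comparing the Poisson answer with the exact binomial it approximates:

    # P(4 calls): Poisson with mu = 3 versus binomial with n = 300, p = 0.01.
    from math import exp, factorial, comb

    mu, n, p = 3.0, 300, 0.01
    poisson4 = mu**4 / factorial(4) * exp(-mu)
    binom4 = comb(n, 4) * p**4 * (1 - p) ** (n - 4)
    print(poisson4, binom4)   # ~0.168 and ~0.169; the Poisson limit is an excellent approximation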

\bullet The second important limit of the binomial distribution is used when we want to explore the region around the maximum value \mu=np, which is the most probable value. We will expand the natural log of the distribution around \mu=np by treating the discrete variable M as if it were continuous, calling it \xi for consistency with our previous examples:

    \[\ln \wp(\xi)=\ln \wp(\mu)+{d \ln \wp(\xi)\over d\xi}\Big{|}_{\xi=\mu}(\xi-np)\]

    \[+{1\over 2}{d^2 \ln \wp(\xi)\over d \xi^2}\Big{|}_{\xi=\mu}(\xi-np)^2+\cdots\]

however the second (linear) term vanishes since \mu=np is an extremum (the maximum) of the distribution. Take the logarithm of the probability

    \[\ln \wp(\xi)=\ln n! -\ln \xi! -\ln(n-\xi)! +\xi\ln p +(n-\xi)\ln(1-p)\]

To evaluate the derivatives we use Stirling’s approximation

    \[{d \ln \xi!\over d\xi}={d\over d\xi} \Big(-\xi+\xi\ln \xi+\ln\sqrt{2\pi\xi}\Big)=\ln \xi+{1\over 2\xi}\approx \ln \xi, \qquad  {d^2 \ln \xi!\over d\xi^2}\approx{1\over \xi}\]

    \[ {d^2 \ln \xi!\over d\xi^2}\Big{|}_{\xi=np}={1\over np}\]

and so

    \[\ln \wp(\xi)\approx \ln \wp(\mu) +{1\over 2}\Big(-{1\over \xi}-{1\over n-\xi}\Big)\Big{|}_{\xi=np}(\xi-np)^2+\cdots, \]

    \[ \wp(\xi)\approx \wp(\mu)  \, e^{-{(\xi-np)^2\over 2np(1-p)}}\]

which we can normalize via

    \[1=\int_{-\infty}^{\infty} \wp(\xi)  \, d\xi, \qquad  \wp(\xi)={1\over \sqrt{2\pi np(1-p)}} \, e^{-{(\xi-np)^2\over 2np(1-p)}}\]

which we recognize as the Bell curve (the Normal distribution), peaked around \xi=\mu=np with standard deviation \sigma=\sqrt{np(1-p)}. Notice that the Maxwell-Boltzmann distribution is a normal distribution in velocity.
Fun fact: if p\ll 1, then \sigma=\sqrt{np(1-p)}\approx \sqrt{np}=\sqrt{\mu}.
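
A minimal numerical look at this limit (plain Python; n=1000 and p=0.3 are illustrative): the binomial mass near its peak is compared with the normal density of mean np and variance np(1-p).

    # de Moivre-Laplace: binomial pmf versus the approximating Gaussian.
    from math import comb, exp, pi, sqrt

    n, p = 1000, 0.3
    mu, sigma = n * p, sqrt(n * p * (1 - p))

    def binom_pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    def normal_pdf(x):
        return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

    for k in (280, 290, 300, 310, 320):
        print(k, binom_pmf(k), normal_pdf(k))   # the two columns agree closely near the peak
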
Example (old format qualifier problem) A researcher performs counting of nuclear decays from a sample, making ten counts for ten seconds each, getting

    \[\{2,\quad 3, \quad 2,\quad 4, \quad 1,\quad 2, \quad 5, \quad 3, \quad 1, \quad 2\}\]

counts. For how many seconds should she count in order to measure the decay rate accurate to 5\%?

Nuclear decays are Poissonian random deviates. We want {\sigma\over \mu}={\sqrt{\mu}\over \mu}={1\over \sqrt{\mu}}=0.05, so we should count long enough that the average number of counts collected is {1\over (0.05)^2}=400. We are currently getting 2.5 counts per 10 seconds, so we require {2.5\over 10}={400\over T}, giving T=1600 \, s.
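
The same arithmetic as a short script (plain Python, using the counts listed above):

    # Estimate the rate from the ten 10-second counts, then find the time needed
    # for a 5% relative (1/sqrt(mu)) uncertainty.
    counts = [2, 3, 2, 4, 1, 2, 5, 3, 1, 2]     # counts per 10 s interval
    rate = sum(counts) / (10 * len(counts))     # 0.25 counts per second
    target = 1 / 0.05**2                        # sigma/mu = 1/sqrt(mu) = 0.05  ->  mu = 400
    T = target / rate
    print(rate, T)                              # 0.25 counts/s, T = 1600 s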

Example (Gibbs distribution in Physics 415/715).
Let the universe U consist of a system s and a reservoir r, so U=s\bigcup r. The reservoir r has N_r boxes that can each hold 0 or 1 particle, and the system s has N_s such boxes. How many states does a universe containing N<N_s+N_r particles have?
The number of ways that N objects can be put into N_s+N_r locations is

    \[\Omega(N)={N_s+N_r\choose N}={(N_s+N_r)!\over N!(N_s+N_r-N)!}\]

which is the coefficient of t^N in

    \[(1+t)^{N_r}(1+t)^{N_s}=(1+t)^{N_r+N_s}=\sum_{m=0}^{N_s+N_r}{N_s+N_r\choose m}t^m\]

Carefully expand the left side;

    \[(1+t)^{N_r}(1+t)^{N_s}=\Big(\sum_{p=0}^{N_r}{N_r\choose p}t^p\Big)\Big(\sum_{q=0}^{N_s}{N_s\choose q}t^q\Big)\]

    \[=\sum_{p,q}{N_r\choose p}{N_s\choose q}t^{p+q}\]

By comparing these two expressions and picking out the coefficient of t^N on each side we obtain

    \[\Omega(N)={(N_s+N_r)!\over N!(N_s+N_r-N)!}=\sum_{p+q=N}{N_r\choose p}{N_s\choose q}\]

    \[=\sum_{q=0}^N{N_r\choose N-q}{N_s\choose q}=\sum_{q=0}^N \Omega_r(N-q)\Omega_s(q)\]

where \Omega_s(q) is the number of states of the system s when populated with q particles and \Omega_r(N-q) is the number of states of the reservoir r when populated with N-q particles. These are mass functions since N is a discrete variable; we would use {d\Omega\over dN} if N were continuous.

What is the probability that s contains exactly m particles?
This will be the ratio of the number of states with m particles in s and N-m in r to the total number of states

    \[\wp(m)={{N_r\choose N-m}{N_s\choose m}\over {N_s+N_r\choose N}}\]

The Boltzmann entropy of the system containing m particles is S=k\ln\wp(m), and {\bf Boltzmann’s maximal entropy principle} says that the equilibrium population m is the most probable. Use Stirling’s formula to evaluate this: maximize the entropy with respect to m

    \[S=k\ln \wp(m), \qquad 0={\partial \ln\wp(m)\over \partial m}=\ln\Big({N_s-m\over m}\Big)-\ln\Big({N_r-N+m\over N-m}\Big)\]

taking anti-logs

    \[{N_s\over m}-1={N_r\over N-m}-1, \qquad {m\over N_s}=\rho_s={N-m\over N_r}=\rho_r\]

we obtain the result that the most likely distribution of particles between system s and reservoir r is the one that makes their {\bf densities} \rho_r and \rho_s equal! Define the {\bf chemical potential}

    \[-\mu=T\Big({\partial S\over \partial m}\Big)\]

which represents the work needed to move a particle from r to s when r and s are in thermal and matter equilibrium, and r is at temperature T. Then for s

    \[-{\mu_s\over T}=k \ln\Big({N_s-m\over m}\Big)=k\ln\Big({1\over \rho_s}-1\Big)\]

    \[\rho_s={1\over e^{-\mu_s/kT}+1}\]

This is an interesting and exact result. Notice that the equilibrium condition says equilibrium between r and s occurs when

    \[\mu_r=\mu_s\]

then there is no energy cost to move particles from r to s. Another observation is that if \mu_s<0, then the density of s (the number of particles in s) is low. Under those conditions matter would migrate from r into s until \rho_r=\rho_s; matter migrates from high to low \mu.
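
Incidentally, the mass function \wp(m) above is a hypergeometric distribution, and its peak can be located numerically; a sketch (plain Python; the sizes N_s=50, N_r=200, N=100 are illustrative) confirming that the most probable m makes the densities equal:

    # Hypergeometric P(m) = C(N_r, N-m) C(N_s, m) / C(N_s+N_r, N); find its peak.
    from math import comb

    Ns, Nr, N = 50, 200, 100
    Z = comb(Ns + Nr, N)                 # total number of states
    pmf = {m: comb(Nr, N - m) * comb(Ns, m) / Z for m in range(0, min(Ns, N) + 1)}

    m_star = max(pmf, key=pmf.get)
    print(m_star, m_star / Ns, (N - m_star) / Nr)   # the two densities agree (~0.4)
    print(sum(pmf.values()))                        # ~1.0, as it must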

Stirling’s approximation
The gamma function is an analytic continuation of the factorial to non-integer values

    \[\Gamma(s)=\int_0^\infty e^{-x} \, x^{s-1} \, dx\]

If s=n>0 is an integer, then \Gamma(n)=(n-1)!, which is easy to prove by integration by parts or by induction.
Recall

    \[I=\int_{-\infty}^\infty e^{-a x^2} \, dx=\sqrt{{\pi\over a}},\]

    \[ \Gamma(s)=\int_0^\infty e^{-x+(s-1) \, \ln x} \, dx=\int_0^\infty e^{-f(x)} \, dx\]

find the value x_* that minimizes f(x) (it contributes the most to the integral) and expand f around it

    \[f(x)=f(x_*)+(x-x_*) \, f'(x_*)+{1\over 2} (x-x_*)^2 \, f''(x_*)+\cdots\]

    \[=f(x_*)+{1\over 2} (x-x_*)^2 \, f''(x_*)+\cdots\]

put this back into \Gamma(s), and use the integral I to obtain an approximate formula for \Gamma(s)

    \[\Gamma(s)=(s-1)!\approx \sqrt{2\pi} \, s^{s-{1\over 2}} \, e^{-s}\]

when s is very large. Finally you will need to use

    \[\int_0^\infty G(x-x_*) \, dx=\int_{-x_*}^\infty G(y) \, dy\approx \int_{-\infty}^\infty G(y) \, dy\]

for x_* very large.
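
A quick numerical check of the result (plain Python, comparing against the exact log-gamma):

    # Stirling: Gamma(s) ~ sqrt(2 pi) s^(s-1/2) e^(-s); compare logarithms with lgamma.
    from math import lgamma, log, pi

    def log_stirling(s):
        return 0.5 * log(2 * pi) + (s - 0.5) * log(s) - s

    for s in (5, 10, 50, 100):
        exact = lgamma(s)               # log Gamma(s) = log (s-1)!
        print(s, exact, log_stirling(s), exact - log_stirling(s))   # error shrinks roughly like 1/(12 s)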

Cumulative distribution functions
Let \{X\le b\} denote the event that X\le b. The cumulative distribution function

    \[F(b)=\wp\{X\le b\}\]

is non-decreasing, approaches 0 for b\rightarrow -\infty and 1 for b\rightarrow \infty and is right-continuous

    \[\lim_{\delta\rightarrow 0}F(b+\delta)=F(b), \qquad \delta \ge 0\]

Note that

    \[\{X\le b\}=\{X\le a\}\cup\{a<X\le b\}\implies \wp\{X\le b\}\]

    \[=\wp\{X\le a\}+\wp\{a<X\le b\}\]

or

    \[\wp\{a<X\le b\}=F(b)-F(a)\]

and the calculation of \wp\{X<b\} is a simple limit

    \[\wp\{X<b\}=\lim_{n\rightarrow \infty}\wp\{-\infty<X\le b-{1\over n}\}\]

    \[=\lim_{n\rightarrow \infty}\Big(F(b-{1\over n})-F(-\infty)\Big)=\lim_{n\rightarrow \infty}F(b-{1\over n})\]
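
A small illustration of these identities (plain Python; the binomial parameters n=10, p=0.4 are illustrative): \wp\{a<X\le b\}=F(b)-F(a) agrees with summing the mass function directly.

    # Cumulative distribution of a binomial and the interval probability F(b) - F(a).
    from math import comb

    n, p = 10, 0.4
    pmf = lambda k: comb(n, k) * p**k * (1 - p) ** (n - k)
    F = lambda b: sum(pmf(k) for k in range(0, b + 1))   # F(b) = P(X <= b)

    a, b = 2, 6
    print(F(b) - F(a), sum(pmf(k) for k in range(a + 1, b + 1)))   # identical values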

Example (Problem 4.1)
Two balls are randomly chosen from an urn containing 8 white, 4 black and 2 orange balls. You win 2 dollars for each black ball selected, and lose 1 dollar for each white. Let X denote your winnings. What are the possible values of X, and find the mass function for the distribution of this variable.
Probability of 2 white (X=-2): \wp(2w)=\Big({8\over 14}\Big)\Big({7\over 13}\Big)={28\over 91}; probability of 2 black (X=4): \wp(2b)=\Big({4\over 14}\Big)\Big({3\over 13}\Big)={6\over 91}; probability of 1 white and 1 orange (X=-1): \wp(1w,1o)={2\choose 1}\Big({8\over 14}\Big)\Big({2\over 13}\Big)={16\over 91}; probability of 1 black and 1 orange (X=2): \wp(1b,1o)={2\choose 1}\Big({4\over 14}\Big)\Big({2\over 13}\Big)={8\over 91}; probability of 2 orange (X=0): \wp(2o)=\Big({2\over 14}\Big)\Big({1\over 13}\Big)={1\over 91}; probability of 1 white and 1 black (X=1): \wp(1w,1b)={2\choose 1}\Big({8\over 14}\Big)\Big({4\over 13}\Big)={32\over 91}. These probabilities sum to one.
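
The same mass function by brute-force enumeration (plain Python, exact fractions):

    # Enumerate all C(14,2) = 91 equally likely pairs and tally the winnings.
    from fractions import Fraction
    from itertools import combinations
    from collections import Counter

    balls = ['w'] * 8 + ['b'] * 4 + ['o'] * 2
    value = {'w': -1, 'b': 2, 'o': 0}                 # dollars per ball drawn
    wins = Counter(value[balls[i]] + value[balls[j]]
                   for i, j in combinations(range(len(balls)), 2))

    total = sum(wins.values())                        # 91
    pmf = {x: Fraction(c, total) for x, c in sorted(wins.items())}
    print(pmf)   # {-2: 28/91, -1: 16/91, 0: 1/91, 1: 32/91, 2: 8/91, 4: 6/91}
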
Example
Two fair dice are rolled. Let X equal the product of the values. Compute \wp(X=i) for each i.
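
The table of values originally rendered here did not survive; the probabilities are easy to recover by enumerating the 36 equally likely outcomes (plain Python, exact fractions):

    # P(X = i) for X the product of two fair dice.
    from collections import Counter
    from fractions import Fraction

    counts = Counter(a * b for a in range(1, 7) for b in range(1, 7))
    pmf = {i: Fraction(c, 36) for i, c in sorted(counts.items())}
    for i, prob in pmf.items():
        print(i, prob)
    # e.g. P(X=1)=1/36, P(X=4)=3/36, P(X=6)=4/36, P(X=12)=4/36, P(X=36)=1/36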

Example
Suppose that the number of accidents occurring on a highway each day is a Poisson random variable with parameter \lambda=3.
A. Find the probability of 3 or more accidents per day.
B. Find the probability of 3 or more accidents per day provided at least one accident has occurred.

    \[\wp(3)={3^3\over 3!} e^{-3}, \quad \wp\{X\ge 3\}=1-\wp\{X\le 2\}=1-\Big({3^0\over 0!}e^{-3}+{3^1\over 1!}e^{-3}+{3^2\over 2!}e^{-3}\Big)\]

    \[=1-{17\over 2}e^{-3}\]

Let E be the event of 3 or more accidents, and F the event of 1 or more accidents

    \[\wp(E)=1-{17\over 2}e^{-3}=\wp(EF), \qquad \wp(F)=1-\wp(0)=1-e^{-3}\]

    \[\wp(E|F)={\wp(EF)\over \wp(F)}={1-{17\over 2}e^{-3}\over 1-e^{-3}}\]
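
Numeric values for this example (plain Python):

    # Poisson accident counts with lambda = 3.
    from math import exp, factorial

    lam = 3.0
    pois = lambda k: lam**k / factorial(k) * exp(-lam)

    P_ge3 = 1 - sum(pois(k) for k in range(3))   # P(X >= 3) = 1 - (17/2) e^-3
    P_ge1 = 1 - pois(0)                          # P(X >= 1) = 1 - e^-3
    print(P_ge3, P_ge3 / P_ge1)                  # ~0.577 and ~0.607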

Example
If you buy a lottery ticket in 50 lotteries, in each of which your chance of winning a prize is {1\over 100}, what is the probability that you will win a prize at least once, exactly once, at least twice?
This is the Poisson limit of a binomial (Bernoulli) random variable: many trials, each with small probability

    \[\wp(i \, wins)={50\choose i}\Big({1\over 100}\Big)^i\Big(1-{1\over 100}\Big)^{50-i}\]

    \[\rightarrow {\lambda^i\over i!}e^{-\lambda}, \qquad \lambda=(50)({1\over 100})={1\over 2}\]

    \[\wp(i\ge 1)=1-\wp(0)=1-{(1/2)^0\over 0!} e^{-1/2}=1-e^{-1/2}\]

    \[\wp(1)={(1/2)^1\over 1!} e^{-1/2}={e^{-1/2}\over 2}\]

    \[\wp(i\ge 2)=1-\wp(0)-\wp(1)=1-e^{-1/2}-{e^{-1/2}\over 2}=1-{3\over 2}e^{-1/2}\]
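
How good is the Poisson approximation here? A direct comparison with the exact binomial (plain Python):

    # Lottery example: exact binomial (n = 50, p = 1/100) versus Poisson (lambda = 1/2).
    from math import comb, exp, factorial

    n, p, lam = 50, 0.01, 0.5
    binom = lambda i: comb(n, i) * p**i * (1 - p) ** (n - i)
    pois = lambda i: lam**i / factorial(i) * exp(-lam)

    print(1 - binom(0), 1 - pois(0))                       # at least one win
    print(binom(1), pois(1))                               # exactly one win
    print(1 - binom(0) - binom(1), 1 - pois(0) - pois(1))  # at least two wins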

Example
From the generic grand canonical partition function for identical particles with single-particle partition function Z, determine the distribution function for the number of particles n in the system when in equilibrium with a heat and particle reservoir. What is \langle n\rangle, the average value?

    \[\mathcal{Z}=\sum_n {Z^n\over n!} e^{\beta n\mu}=e^{Z e^{\beta\mu}}, \qquad \Omega=-{1\over \beta}\ln\mathcal{Z}, \qquad \bar{n}=-{\partial \Omega\over \partial \mu}= Z e^{\beta\mu}\]

    \[1=\sum_{n}\wp(n)=\sum_n \Big({(Ze^{\beta\mu})^n\over n!}e^{-Z e^{\beta\mu}}\Big)=\sum_n {\bar{n}^n\over n!} e^{-\bar{n}}\]

    \[\mbox{so}\quad \wp(n)={\bar{n}^n\over n!} e^{-\bar{n}}, \quad \mbox{a Poisson distribution}\]

This is a common qualifier problem these days.
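
A minimal numerical check (plain Python; the values of Z, \beta and \mu are illustrative): the occupation probabilities \wp(n) sum to one and have mean \bar{n}=Ze^{\beta\mu}.

    # Poisson occupation statistics from the grand canonical ensemble.
    from math import exp, factorial

    Z, beta, mu = 2.0, 1.0, -0.5          # illustrative values
    nbar = Z * exp(beta * mu)             # mean particle number

    P = [nbar**n * exp(-nbar) / factorial(n) for n in range(60)]
    print(sum(P))                                      # ~1.0
    print(sum(n * p for n, p in enumerate(P)), nbar)   # mean matches nbar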
