ML/AI: Probability and statistics-III. Continuous random variables

Let X be a random variable with the property that there exists a non-negative function f(x) defined on (-\infty, \infty) such that

    \[\wp(X\in B)=\int_B f(x)\, dx\]

then X is a continuous random variable and f(x) is its probability density function or PDF (f(x)={d\wp(X\le x)\over dx}). For example, in quantum mechanics |\psi(x)|^2\, dx is the probability that a position measurement returns a value between x and x+dx.
We have some simple properties

    \[\wp(a\le X\le b)=\int_a^b f(x)\, dx, \qquad \wp(X=a)=\int_a^a f(x)\, dx=0\]

and

    \[\wp(X\le a)=\int_{-\infty}^a f(x)\, dx\]

The cumulative distribution function (CDF) is a very useful construction

    \[F(a)=\wp(-\infty<X<a)=\int_{-\infty}^a f(x)\, dx\]

Uniform random variable

    \[f(x)=\left\{\begin{array}{ll} 1 & 0<x<1\\ 0 & \mbox{otherwise}\end{array}\right., \qquad \wp(a\le X\le b)=\int_a^b f(x)\, dx=b-a\]
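The property \wp(a\le X\le b)=b-a is easy to verify by simulation. A minimal Python sketch (the endpoints 0.2 and 0.7 are arbitrary choices, not from the text):

```python
import random

random.seed(0)

def uniform_prob(a, b, n=100_000):
    """Estimate P(a <= X <= b) for X ~ Uniform(0, 1) by Monte Carlo."""
    hits = sum(1 for _ in range(n) if a <= random.random() <= b)
    return hits / n

# For the uniform density, P(a <= X <= b) = b - a; try a = 0.2, b = 0.7.
est = uniform_prob(0.2, 0.7)
print(est)  # close to 0.5
```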

Microcanonical distribution
In statistical mechanics the microcanonical distribution applies to a system of constant energy and assumes that all states (specifications of the coordinate q and conjugate momentum p for which H(q,p)=E) are equally likely, so f(E)={1\over \Omega(E)}. For a harmonic oscillator H={p^2\over 2m}+{m\omega^2\over 2}q^2 and the number of states with energy between E and E+dE is

    \[{d\Omega\over dE}=\int \delta\Big(E-{p^2\over 2m}-{m\omega^2\over 2}q^2\Big)\, {dp\, dq\over h}\]

Restrict attention to q\ge 0 and p\ge 0; then for fixed E there is only a single independent degree of freedom, either q or p, which is not uniformly distributed. Using

    \[\int \, f(x)\, \delta(ax)\, dx=\int \, f(x)\, \delta(x)\, {dx\over |a|}\]

    \[ \int \, f(x)\, \delta(x^2-y^2)\, dx=\int \, f(x)\, \delta(x-|y|)\, {dx\over 2|y|}\]

the number of states accessible per unit energy is

    \[{d\Omega\over dE}=\int \delta\Big(E-{p^2\over 2m}-{m\omega^2\over 2}q^2\Big)\, {dp\, dq\over h}\]

    \[={2m\over h}\int dp dq \delta(p^2-(2mE-m^2\omega^2q^2))\]

    \[={m\over h}\int dp dq {\delta(p-\sqrt{(2mE-m^2\omega^2q^2)})\over \sqrt{(2mE-m^2\omega^2q^2)}}\]

    \[={m\over h}\int {dq\over \sqrt{(2mE-m^2\omega^2q^2)}}={\pi\over 2h\omega}\]

For the full phase space (all four quadrants) we have \Omega(E)=4\,{\pi\over 2h\omega}\,E={E\over \hbar\omega}, which increases with energy (what we call a normal system).
If we sample the position of an oscillator at a random time we are most likely to get a q value where the oscillator spends most of its time. With E={m\over 2}\dot{q}^2+{m\omega^2\over 2} q^2, and noting that the oscillator passes through each q twice per period T=2\pi/\omega,

    \[f_q(q)\, dq={2\, dt\over T}={2\over T}{dq\over |\dot{q}|}={2m\over T}{dq\over \sqrt{2mE-m^2\omega^2q^2}}\]

    \[={m\omega\over \pi}{dq\over \sqrt{2mE-m^2\omega^2q^2}}\]
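As a check, sample the position q(t)=q_{max}\sin(\omega t) at uniformly random times and compare the empirical distribution with the arcsine-law CDF F(q)={1\over 2}+{1\over\pi}\sin^{-1}(q/q_{max}) implied by the density above. A Python sketch (units m=\omega=1, E=1/2 are my choices, so q_{max}=1):

```python
import math
import random

random.seed(1)

# Harmonic oscillator q(t) = q_max sin(w t), sampled at uniformly random times.
m, w, E = 1.0, 1.0, 0.5                    # illustrative units
q_max = math.sqrt(2 * E / (m * w * w))     # = 1.0 here
T = 2 * math.pi / w                        # period

samples = [q_max * math.sin(w * random.uniform(0.0, T)) for _ in range(200_000)]

def F(q):
    """Arcsine-law CDF implied by f_q(q) ~ 1/sqrt(2mE - m^2 w^2 q^2)."""
    return 0.5 + math.asin(q / q_max) / math.pi

emp = sum(1 for s in samples if s <= 0.5) / len(samples)
print(emp, F(0.5))  # empirical and analytic CDF at q = 0.5 should agree
```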

Normal random variable

    \[f(x)={1\over \sqrt{2\pi}\, \sigma}e^{-(x-\mu)^2/2\sigma^2}\]

One of its most important properties is that if X is normally distributed with parameters \mu, \sigma^2, then Y=aX+b is normally distributed with parameters a\mu+b, a^2\sigma^2. This illustrates the use of the CDF (take a>0):

    \[F_Y(c)=\wp(Y\le c)=\wp(X\le {(c-b)\over a})=F_X({(c-b)\over a})\]

    \[=\int_{-\infty}^{(c-b)/a}{1\over \sqrt{2\pi}\, \sigma}e^{-(x-\mu)^2/2\sigma^2}\, dx\]

    \[=\int_{-\infty}^c {1\over \sqrt{2\pi}\, a\sigma}e^{-(y-(a\mu+b))^2/2a^2\sigma^2}\, dy\]

Exponential and Gamma distributions
The exponential distribution has a parameter \lambda>0, and PDF and CDF

    \[f(x)=\left\{\begin{array}{ll} \lambda e^{-\lambda x} & x\ge 0\\ 0 & x<0\end{array}\right.\]

    \[ F(a)=\wp(X\le a)=\int_0^a \lambda e^{-\lambda x}\, dx=1-e^{-\lambda a}\]

The exponential distribution is memoryless: if X represents a “waiting time” until an event occurs, the remaining wait does not depend on how much time has already elapsed (i.e. \wp(X>a+b|X>a)=\wp(X>b)). The exponential is the only memoryless continuous distribution.
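Memorylessness is easy to see numerically; a Python sketch comparing \wp(X>a+b|X>a) with \wp(X>b) (the values of \lambda, a, b are arbitrary choices):

```python
import math
import random

random.seed(3)

lam, a, b = 1.5, 0.4, 0.7   # rate and waiting times (arbitrary choices)
n = 200_000
xs = [random.expovariate(lam) for _ in range(n)]

survivors = [x for x in xs if x > a]                      # condition on X > a
cond = sum(1 for x in survivors if x > a + b) / len(survivors)
uncond = sum(1 for x in xs if x > b) / n

# Both estimates should match exp(-lam*b), the memoryless prediction.
print(cond, uncond, math.exp(-lam * b))
```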
The Gamma distribution has two parameters \lambda>0, t>0,

    \[f(x)=\left\{\begin{array}{ll} {\lambda\over \Gamma(t)} (\lambda x)^{t-1}\, e^{-\lambda x} & x\ge 0\\ 0 & x<0\end{array}\right.\]

An example is the canonical distribution for a normal system of N particles, one whose entropy increases with energy as a power of the energy

    \[f(E)={1\over Z} e^{S(E)/k_B}e^{-E/k_BT}={C\over Z} E^{{3N\over 2}-1} e^{-E/k_BT}\]

\lambda=1/k_BT, t={3N\over 2}

Normal random variable CDF
Begin with the PDF (probability density function)

    \[f(x)={1\over \sqrt{2\pi\sigma^2}} \, e^{-{(x-\mu)^2\over 2\sigma^2}}\]

for which (let \xi=(x-\mu)/\sigma, dx=\sigma \, d\xi)

    \[F(a)=\wp\{x\le a\}={1\over \sqrt{2\pi\sigma^2}} \int_{-\infty}^a \, e^{-{(x-\mu)^2\over 2\sigma^2}} \, dx\]

    \[={1\over \sqrt{2\pi}} \, \int_{-\infty}^{{a-\mu\over \sigma}} \, e^{-\xi^2/2} \, d\xi=\Phi({a-\mu\over \sigma})\equiv\Phi_{\mu,\sigma}(a)\]

The error function erf(z) is defined by

    \[erf(z)={2\over\sqrt{\pi}} \, \int_0^z e^{-\xi^2} \, d\xi\]

    \[ erfc(z)={2\over\sqrt{\pi}} \, \int_z^\infty e^{-\xi^2} \, d\xi=1-erf(z)\]

Note that

    \[\int_{-\infty}^\infty e^{-\xi^2} \, d\xi=2\int_{-\infty}^0 e^{-\xi^2} \, d\xi=\sqrt{\pi}\]

The relation between the two is

    \[\Phi(z)={1\over 2}\Big(1+erf(z/\sqrt{2})\Big), \qquad erf(z)=2\Phi(\sqrt{2} \, z)-1\]

    \[F(a)=\Phi({a-\mu\over \sigma})={1\over 2}\Big(1+erf({a-\mu\over \sqrt{2} \, \sigma})\Big)\]

The error function is included in the function libraries of most CAS systems such as REDUCE, Maple and Mathematica.
Example
X is normally distributed with \mu=3, \sigma=3. Compute \wp(2<X<5). We will get \Phi(2/3)-(1-\Phi(1/3)).

% In REDUCE...
on rounded;
load_package specfn;
% Check table on Ross p. 131
0.5*(1+erf(3.4/sqrt(2)));
%example 3a in Chapter 5 of Ross
0.5*(1+erf((5-3)/(3*sqrt(2))))-0.5*(1+erf((2-3)/(3*sqrt(2))));
% returns 0.37807 (Ross's table gives 0.3779)
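The same computation can be cross-checked in Python, whose standard math library also includes erf:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.0, 3.0
p = Phi((5 - mu) / sigma) - Phi((2 - mu) / sigma)
print(p)  # about 0.3781 (Ross's table value is 0.3779)
```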

Cauchy distribution
The cumulative probability for the Cauchy distribution is simple

    \[F(a)={1\over \pi}\int_{-\infty}^a {dx\over 1+(x-\theta)^2} ={1\over \pi}\int_{-\infty}^{a-\theta} {d\xi\over 1+\xi^2}\]

    \[={1\over \pi}\Big(\tan^{-1}(a-\theta)-\tan^{-1}(-\infty)\Big)={1\over 2}+{1\over \pi} \tan^{-1}(a-\theta)\]

Beta distribution
X is beta distributed if

    \[f(x)={1\over \beta(a,b)} x^{a-1}(1-x)^{b-1}, \qquad 0<x<1\]

For the record, the Beta distribution comes up in the “q-model” of the distribution of loads in a column of beads. Note that

    \[\beta(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1} \, dx={\Gamma(a)\Gamma(b)\over \Gamma(a+b)}\]

CAS systems also provide the beta and gamma functions, but the integral for the cumulative probability (an incomplete beta function) must generally be computed numerically

load_package specfn;
procedure betacumulative(a,b,xmax);
begin scalar dx, retval;
dx:=xmax/1000;
retval:=for n:=1:1000 sum (n*dx)^(a-1)*(1-n*dx)^(b-1)*dx;
return(retval/beta(a,b));
end;

betacumulative(3,3,1);
betacumulative(3,3,0.3);
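An equivalent numerical beta CDF in Python, mirroring the REDUCE procedure; a midpoint rule is my choice here, which also avoids the endpoint singularities when a<1 or b<1:

```python
import math

def beta_cumulative(a, b, xmax, n=100_000):
    """Integrate the Beta(a, b) pdf from 0 to xmax with a midpoint rule."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    dx = xmax / n
    total = sum(((k + 0.5) * dx) ** (a - 1) * (1.0 - (k + 0.5) * dx) ** (b - 1)
                for k in range(n)) * dx
    return total / B

print(beta_cumulative(3, 3, 1.0))  # full range: 1.0
print(beta_cumulative(3, 3, 0.3))  # about 0.16308 = 10x^3 - 15x^4 + 6x^5 at x=0.3
```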

Cauchy/Student’s distributions
Let X and Y be independent standard normal random variables, and let T=X/Y. Substituting X=uY (so dX=|Y|\,du),

(1)   \begin{eqnarray*}f_t(T)&=&\int \delta(T-X/Y) \, f_x(X) \, f_y(Y) \, dX \, dY\nonumber\\ &=&\int \delta(T-u) {1\over 2\pi}e^{-u^2Y^2/2} \, e^{-Y^2/2} \, |Y| \, du \, dY\nonumber\\ &=& 2\int_0^\infty Y \, e^{-(1+T^2)Y^2/2} \, {dY\over 2\pi}={1\over\pi(1+T^2)}\nonumber\end{eqnarray*}
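A Monte Carlo check that the ratio of two standard normals is standard Cauchy: the Cauchy CDF {1\over 2}+{1\over\pi}\tan^{-1}(a) puts the quartiles at a=\pm 1, which the sampled ratios should reproduce (Python sketch; the sample size is arbitrary):

```python
import random

random.seed(4)

# Ratio of two independent standard normals; for Cauchy(0, 1) the CDF
# 1/2 + arctan(a)/pi puts the quartiles at a = -1 and a = +1.
ts = sorted(random.gauss(0, 1) / random.gauss(0, 1) for _ in range(200_000))
q1 = ts[len(ts) // 4]
q3 = ts[3 * len(ts) // 4]
print(q1, q3)  # close to -1 and +1
```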

Student’s t-distribution may be a little less obvious. The t-statistic can be written as t={x\over \sqrt{y/n}} in which x is normally distributed (being a sum/difference of sample means) and y has an independent \chi^2 distribution with n degrees of freedom

    \[f_x(X)={1\over \sqrt{2\pi}} \, e^{-X^2/2}, \qquad f_y(Y)={1\over 2^{n/2}\Gamma(n/2)} \, Y^{n/2-1} \, e^{-Y/2}\]

so that t is distributed as

(2)   \begin{eqnarray*}f_t(T)&=&\int \delta(T-{X\over \sqrt{Y/n}}) \, {1\over \sqrt{2\pi}} \, e^{-X^2/2} \, {1\over 2^{n/2}\Gamma(n/2)} \, Y^{n/2-1} \, e^{-Y/2} \, dX \, dY\nonumber\\ &=&{1\over \sqrt{2\pi}}{1\over 2^{n/2}\Gamma(n/2)}\int \sqrt{{Y\over n}} \, e^{-YT^2/2n} \, Y^{n/2-1} \, e^{-Y/2} \,  dY\nonumber\\ &=& {1\over \sqrt{n\pi} \Gamma(n/2)}{1\over (1+T^2/n)^{(n+1)/2}} \int_0^\infty v^{(n+1)/2-1} \, e^{-v} \, dv\nonumber\\ &=& {\Gamma((n+1)/2)\over \sqrt{n\pi} \Gamma(n/2)}{1\over (1+T^2/n)^{(n+1)/2}}\nonumber\end{eqnarray*}

This is a modified Cauchy distribution; for n=1 it reduces to the Cauchy distribution exactly.
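A numerical sanity check of the result (Python sketch; n=5 and the integration grid are arbitrary choices): the density reduces to the Cauchy form at n=1, and it integrates to one.

```python
import math

def t_pdf(T, n):
    """Student's t density with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1.0 + T * T / n) ** (-(n + 1) / 2)

# n = 1 recovers the Cauchy density 1/(pi (1 + T^2)).
print(t_pdf(0.0, 1), 1.0 / math.pi)

# The density integrates to 1 (Riemann sum on [-50, 50], n = 5 as an example).
h = 0.001
total = sum(t_pdf(-50.0 + k * h, 5) for k in range(100_001)) * h
print(total)  # close to 1
```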

Functions of random variables

    \[F(y)=\wp_x(x\le y)=\int_{-\infty}^y f_x(\xi) \, d\xi, \qquad f_x(y)={d\over dy}\wp_x(x\le y)={dF\over dy}(y)\]

This is used to obtain the PDF of a function g(x) of a random variable x from the PDF of x; consider that

    \[\wp_g(g\le \eta)=\int_{-\infty}^\eta f_g(\xi) \, d\xi\]

however this is also the probability that x will be less than g^{-1}(\eta) (for monotonically increasing g);

    \[\wp_g(g\le \eta)=\wp_x(x\le g^{-1}(\eta))=\int_{-\infty}^{g^{-1}(\eta)} f_x(\xi) \, d\xi\]

Now just take the derivative using the chain rule to get the PDF of g (and thus proving Theorem 6.1);

    \[f_g(\eta)={d\over d\eta}\wp_g(g\le \eta)={d\over d\eta}\int_{-\infty}^{g^{-1}(\eta)} f_x(\xi) \, d\xi=\Big{|}{d g^{-1}(\eta)\over d\eta}\Big{|} \, f_x(g^{-1}(\eta))\]

This is the basis of the so-called inverse method for computing random deviates with some particular PDF.
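A minimal sketch of the inverse method in Python, applied to the exponential distribution, where F^{-1}(u)=-\ln(1-u)/\lambda in closed form (\lambda=2 is an arbitrary choice):

```python
import math
import random
import statistics

random.seed(5)

def exp_deviate(lam):
    """Inverse method: solve F(x) = 1 - exp(-lam*x) = u for x."""
    u = random.random()
    return -math.log(1.0 - u) / lam

lam = 2.0
xs = [exp_deviate(lam) for _ in range(100_000)]
print(statistics.mean(xs))  # close to 1/lam = 0.5
```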
Example
Let x be a uniform deviate on the interval 0\le x\le 1, meaning that f_x(\xi)=1. This is the type of built-in random number generator that computer operating systems provide. Find the PDF for a random number y=x^n.
First we get the CDF

    \[\wp_y(y\le \eta)=\wp_x(x\le \eta^{{1\over n}})=\int_0^{\eta^{{1\over n}}} 1 \cdot d\xi=\eta^{{1\over n}}\]

and finally

    \[f_y(\eta)={d\over d\eta} \wp_y(y\le \eta)={1\over n} \eta^{{1\over n}-1}\]

if 0\le \eta\le 1.
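Checking the result by simulation (Python sketch; n=3 and the test point \eta=0.2 are arbitrary choices), the empirical CDF of x^n should match \eta^{1/n}:

```python
import random

random.seed(6)

n = 3                         # exponent (arbitrary choice)
ys = [random.random() ** n for _ in range(100_000)]

eta = 0.2                     # test point
emp = sum(1 for y in ys if y <= eta) / len(ys)
print(emp, eta ** (1 / n))    # empirical CDF vs eta^(1/n)
```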
Example
If X is uniformly distributed over [0,1], find the probability density function of Y=e^X.
Let y=e^x, so that 1\le y\le e,

    \[\wp\{y\le a\}=\wp\{x\le \ln a\}=\int_0^{\ln a} 1 \, dx=\ln a, \qquad f_y(\xi)={d\over d\xi}\ln \xi={1\over \xi}, \qquad 1\le \xi\le e\]

Example
Find the distribution of R=A\sin\theta, where A is a fixed constant and \theta is uniformly distributed on [-\pi/2,\pi/2]. This arises in ballistics: if a projectile is launched at speed v and angle \theta from the horizontal, its range is R={v^2\over g}\sin(2\theta).
Use the example of the inverse method, for g(x);

    \[f_g(\eta)=\Big{|} {dg^{-1}(\eta)\over d\eta}\Big{|} \, f_x(g^{-1}(\eta))\]

but we have x=\theta, f_x(\eta)={1\over \pi}, and

    \[g(\eta)=R(\eta)={v_0^2\over g} \, \sin 2\eta\]

and so

    \[g^{-1}(\eta)={1\over 2} \sin^{-1}\Big({g\eta\over v_0^2}\Big), \qquad {dg^{-1}(\eta)\over d\eta}={1\over 2} {1\over\sqrt{({v_0^2\over g})^2-\eta^2}}\]

and so the ranges are distributed according to

    \[f_R(\eta)=f_g(\eta)={1\over 2\pi} {1\over\sqrt{({v_0^2\over g})^2-\eta^2}}\]

The cumulative probability F(R) will be an inverse sine function (verify).

Example (Problem 5.16 in Ross)
In 10,000 independent tosses of a coin the coin landed heads 5800 times. Is it reasonable to assume that the coin is fair?
What is the probability of landing within one \sigma of \mu the mean for a normal distribution?

    \[\wp\{X<\mu+\sigma\}-\wp\{X<\mu-\sigma\}\]

    \[={1\over 2}\Big(erf\Big({(\mu+\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)-erf\Big({(\mu-\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)\Big)=0.682689492137\]

The probability of landing within two \sigma of \mu the mean is

    \[\wp\{X<\mu+2\sigma\}-\wp\{X<\mu-2\sigma\}\]

    \[={1\over 2}\Big(erf\Big({(\mu+2\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)-erf\Big({(\mu-2\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)\Big)=0.954499736104\]

For 10,000 coin tosses the number of heads X is approximately normally distributed about the mean \mu=(10{,}000)(1/2)=5000 with \sigma=\sqrt{(10000)(1/2)(1/2)}=50. This coin is turning up heads 16 standard deviations beyond the mean; this is not a fair coin.
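The arithmetic of the example in a few lines of Python:

```python
import math

n, p, heads = 10_000, 0.5, 5800
mu = n * p                            # 5000.0
sigma = math.sqrt(n * p * (1 - p))    # 50.0
z = (heads - mu) / sigma
print(z)  # 16.0 standard deviations above the mean
```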
