ML/AI: Probability and statistics-III. Continuous random variables

Let X be a random variable with the property that there exists a non-negative function f(x) defined on (-\infty, \infty) such that

    \[\wp(X\in B)=\int_B f(x)\, dx\]

then X is a continuous random variable and f(x) is its probability density function or PDF (f(x)={d\wp(X\le x)\over dx}). For example, in quantum mechanics |\psi(x)|^2\, dx is the probability that a position measurement returns a value between x and x+dx.
We have some simple properties

    \[\wp(a\le X\le b)=\int_a^b f(x)\, dx, \qquad \wp(X=a)=\int_a^a f(x)\, dx=0\]

and

    \[\wp(X\le a)=\int_{-\infty}^a f(x)\, dx\]

The cumulative distribution function (CDF) is a very useful construction

    \[F(a)=\wp(-\infty<X<a)=\int_{-\infty}^a f(x)\, dx\]

Uniform random variable

    \[f(x)=\left\{\begin{array}{ll} 1 & 0<x<1\\ 0 & \mbox{otherwise}\end{array}\right., \qquad \wp(a\le X\le b)=\int_a^b f(x)\, dx=b-a\]
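The property \wp(a\le X\le b)=b-a is easy to verify by simulation. A minimal Python sketch (the endpoints 0.2 and 0.7 are arbitrary choices, not from the text):

```python
import random

random.seed(0)

def uniform_prob(a, b, n=100_000):
    """Estimate P(a <= X <= b) for X ~ Uniform(0, 1) by Monte Carlo."""
    hits = sum(1 for _ in range(n) if a <= random.random() <= b)
    return hits / n

# For the uniform density, P(a <= X <= b) = b - a; try a = 0.2, b = 0.7.
est = uniform_prob(0.2, 0.7)
print(est)  # close to 0.5
```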

Microcanonical distribution
In statistical mechanics the microcanonical distribution applies to a system of constant energy and assumes that all states (specifications of the coordinate q and conjugate momentum p for which H(q,p)=E) are equally likely, so f(E)={1\over \Omega(E)}. For a harmonic oscillator H={p^2\over 2m}+{m\omega^2\over 2}q^2 and the number of states with energy between E and E+dE is

    \[{d\Omega\over dE}=\int \delta\Big(E-{p^2\over 2m}-{m\omega^2\over 2}q^2\Big)\, {dp\, dq\over h}\]

Restrict attention to q\ge 0 and p\ge 0; then for fixed E there is only a single independent degree of freedom, either q or p, which is not uniformly distributed. Using

    \[\int \, f(x)\, \delta(ax)\, dx=\int \, f(x)\, \delta(x)\, {dx\over |a|}\]

    \[ \int \, f(x)\, \delta(x^2-y^2)\, dx=\int \, f(x)\, \delta(x-|y|)\, {dx\over 2|y|}\]

the number of states accessible per unit energy is

    \[{d\Omega\over dE}=\int \delta\Big(E-{p^2\over 2m}-{m\omega^2\over 2}q^2\Big)\, {dp\, dq\over h}\]

    \[={2m\over h}\int dp dq \delta(p^2-(2mE-m^2\omega^2q^2))\]

    \[={m\over h}\int dp dq {\delta(p-\sqrt{(2mE-m^2\omega^2q^2)})\over \sqrt{(2mE-m^2\omega^2q^2)}}\]

    \[={m\over h}\int {dq\over \sqrt{(2mE-m^2\omega^2q^2)}}={\pi\over 2h\omega}\]

For the full phase space (all four quadrants) we have \Omega(E)=4\,{\pi\over 2h\omega}\,E={E\over \hbar\omega}, which increases with energy (what we call a normal system).
If we sample the position of an oscillator at a random time we are most likely to get a q value where the oscillator spends most of its time. With E={m\over 2}\dot{q}^2+{m\omega^2\over 2} q^2, and noting that the oscillator passes through each q twice per period T=2\pi/\omega,

    \[f_q(q)\, dq={2\, dt\over T}={2\over T}{dq\over |\dot{q}|}={2m\over T}{dq\over \sqrt{2mE-m^2\omega^2q^2}}\]

    \[={m\omega\over \pi}{dq\over \sqrt{2mE-m^2\omega^2q^2}}\]
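As a check, sample the position q(t)=q_{max}\sin(\omega t) at uniformly random times and compare the empirical distribution with the arcsine-law CDF F(q)={1\over 2}+{1\over\pi}\sin^{-1}(q/q_{max}) implied by the density above. A Python sketch (units m=\omega=1, E=1/2 are my choices, so q_{max}=1):

```python
import math
import random

random.seed(1)

# Harmonic oscillator q(t) = q_max sin(w t), sampled at uniformly random times.
m, w, E = 1.0, 1.0, 0.5                    # illustrative units
q_max = math.sqrt(2 * E / (m * w * w))     # = 1.0 here
T = 2 * math.pi / w                        # period

samples = [q_max * math.sin(w * random.uniform(0.0, T)) for _ in range(200_000)]

def F(q):
    """Arcsine-law CDF implied by f_q(q) ~ 1/sqrt(2mE - m^2 w^2 q^2)."""
    return 0.5 + math.asin(q / q_max) / math.pi

emp = sum(1 for s in samples if s <= 0.5) / len(samples)
print(emp, F(0.5))  # empirical and analytic CDF at q = 0.5 should agree
```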

Normal random variable

    \[f(x)={1\over \sqrt{2\pi}\, \sigma}e^{-(x-\mu)^2/2\sigma^2}\]

One of its most important properties is that if X is normally distributed with parameters \mu, \sigma^2, then Y=aX+b is normally distributed with parameters a\mu+b, a^2\sigma^2. This illustrates the use of the CDF (take a>0):

    \[F_Y(c)=\wp(Y\le c)=\wp(X\le {(c-b)\over a})=F_X({(c-b)\over a})\]

    \[=\int_{-\infty}^{(c-b)/a}{1\over \sqrt{2\pi}\, \sigma}e^{-(x-\mu)^2/2\sigma^2}\, dx\]

    \[=\int_{-\infty}^c {1\over \sqrt{2\pi}\, a\sigma}e^{-(y-(a\mu+b))^2/2a^2\sigma^2}\, dy\]

Exponential and Gamma distributions
The exponential distribution has a parameter \lambda>0, and PDF and CDF

    \[f(x)=\left\{\begin{array}{ll} \lambda e^{-\lambda x} & x\ge 0\\ 0 & x<0\end{array}\right.\]

    \[ F(a)=\wp(X\le a)=\int_0^a \lambda e^{-\lambda x}\, dx=1-e^{-\lambda a}\]

The exponential distribution is memoryless: if X represents a “waiting time” until an event occurs, the remaining wait does not depend on how much time has already elapsed (i.e. \wp(X>a+b|X>a)=\wp(X>b)). The exponential is the only memoryless continuous distribution.
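Memorylessness is easy to see numerically; a Python sketch comparing \wp(X>a+b|X>a) with \wp(X>b) (the values of \lambda, a, b are arbitrary choices):

```python
import math
import random

random.seed(3)

lam, a, b = 1.5, 0.4, 0.7   # rate and waiting times (arbitrary choices)
n = 200_000
xs = [random.expovariate(lam) for _ in range(n)]

survivors = [x for x in xs if x > a]                      # condition on X > a
cond = sum(1 for x in survivors if x > a + b) / len(survivors)
uncond = sum(1 for x in xs if x > b) / n

# Both estimates should match exp(-lam*b), the memoryless prediction.
print(cond, uncond, math.exp(-lam * b))
```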
The Gamma distribution has two parameters \lambda>0, t>0,

    \[f(x)=\left\{\begin{array}{ll} {\lambda\over \Gamma(t)} (\lambda x)^{t-1}\, e^{-\lambda x} & x\ge 0\\ 0 & x<0\end{array}\right.\]

An example is the canonical distribution for a normal system of N particles, one whose entropy increases with energy as a power of the energy

    \[f(E)={1\over Z} e^{S(E)/k_B}e^{-E/k_BT}={C\over Z} E^{{3N\over 2}-1} e^{-E/k_BT}\]

\lambda=1/k_BT, t={3N\over 2}

Normal random variable CDF
Begin with the PDF (probability density function)

    \[f(x)={1\over \sqrt{2\pi\sigma^2}} \, e^{-{(x-\mu)^2\over 2\sigma^2}}\]

for which (let \xi=(x-\mu)/\sigma, dx=\sigma \, d\xi)

    \[F(a)=\wp\{x\le a\}={1\over \sqrt{2\pi\sigma^2}} \int_{-\infty}^a \, e^{-{(x-\mu)^2\over 2\sigma^2}} \, dx\]

    \[={1\over \sqrt{2\pi}} \, \int_{-\infty}^{{a-\mu\over \sigma}} \, e^{-\xi^2/2} \, d\xi=\Phi({a-\mu\over \sigma})\equiv\Phi_{\mu,\sigma}(a)\]

The error function erf(z) is defined by

    \[erf(z)={2\over\sqrt{\pi}} \, \int_0^z e^{-\xi^2} \, d\xi\]

    \[ erfc(z)={2\over\sqrt{\pi}} \, \int_z^\infty e^{-\xi^2} \, d\xi=1-erf(z)\]

Note that

    \[\int_{-\infty}^\infty e^{-\xi^2} \, d\xi=2\int_{-\infty}^0 e^{-\xi^2} \, d\xi=\sqrt{\pi}\]

The relation between the two is

    \[\Phi(z)={1\over 2}\Big(1+erf(z/\sqrt{2})\Big), \qquad erf(z)=2\Phi(\sqrt{2} \, z)-1\]

    \[F(a)=\Phi({a-\mu\over \sigma})={1\over 2}\Big(1+erf({a-\mu\over \sqrt{2} \, \sigma})\Big)\]

The error function is included in the function libraries of most CAS systems such as REDUCE, Maple and Mathematica.
Example
X is normally distributed with \mu=3, \sigma=3. Compute \wp(2<X<5). We will get \Phi(2/3)-(1-\Phi(1/3)).

% In REDUCE...
on rounded;
load_package specfn;
% Check table on Ross p. 131
0.5*(1+erf(3.4/sqrt(2)));
%example 3a in Chapter 5 of Ross
0.5*(1+erf((5-3)/(3*sqrt(2))))-0.5*(1+erf((2-3)/(3*sqrt(2))));
% returns 0.37807 (Ross's table gives 0.3779)
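The same computation can be cross-checked in Python, whose standard math library also includes erf:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.0, 3.0
p = Phi((5 - mu) / sigma) - Phi((2 - mu) / sigma)
print(p)  # about 0.3781 (Ross's table value is 0.3779)
```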

Cauchy distribution
The cumulative probability for the Cauchy distribution is simple

    \[F(a)={1\over \pi}\int_{-\infty}^a {dx\over 1+(x-\theta)^2} ={1\over \pi}\int_{-\infty}^{a-\theta} {d\xi\over 1+\xi^2}\]

    \[={1\over \pi}\Big(\tan^{-1}(a-\theta)-\tan^{-1}(-\infty)\Big)={1\over 2}+{1\over \pi} \tan^{-1}(a-\theta)\]

Beta distribution
X is beta distributed if

    \[f(x)={1\over \beta(a,b)} x^{a-1}(1-x)^{b-1}, \qquad 0<x<1\]

For the record, the Beta distribution comes up in the “q-model” of the distribution of loads in a column of beads. Note that

    \[\beta(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1} \, dx={\Gamma(a)\Gamma(b)\over \Gamma(a+b)}\]

CAS systems also provide the beta and gamma functions, but the integral for the cumulative probability (an incomplete beta function) must generally be computed numerically

load_package specfn;
procedure betacumulative(a,b,xmax);
begin scalar dx, retval;
dx:=xmax/1000;
retval:=for n:=1:1000 sum (n*dx)^(a-1)*(1-n*dx)^(b-1)*dx;
return(retval/beta(a,b));
end;

betacumulative(3,3,1);
betacumulative(3,3,0.3);
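An equivalent numerical beta CDF in Python, mirroring the REDUCE procedure; a midpoint rule is my choice here, which also avoids the endpoint singularities when a<1 or b<1:

```python
import math

def beta_cumulative(a, b, xmax, n=100_000):
    """Integrate the Beta(a, b) pdf from 0 to xmax with a midpoint rule."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    dx = xmax / n
    total = sum(((k + 0.5) * dx) ** (a - 1) * (1.0 - (k + 0.5) * dx) ** (b - 1)
                for k in range(n)) * dx
    return total / B

print(beta_cumulative(3, 3, 1.0))  # full range: 1.0
print(beta_cumulative(3, 3, 0.3))  # about 0.16308 = 10x^3 - 15x^4 + 6x^5 at x=0.3
```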

Cauchy/Student’s distributions
Let X and Y be independent standard normal random variables, and let T=X/Y. Substituting X=uY (so dX=|Y|\,du),

(1)   \begin{eqnarray*}f_t(T)&=&\int \delta(T-X/Y) \, f_x(X) \, f_y(Y) \, dX \, dY\nonumber\\ &=&\int \delta(T-u) {1\over 2\pi}e^{-u^2Y^2/2} \, e^{-Y^2/2} \, |Y| \, du \, dY\nonumber\\ &=& 2\int_0^\infty Y \, e^{-(1+T^2)Y^2/2} \, {dY\over 2\pi}={1\over\pi(1+T^2)}\nonumber\end{eqnarray*}
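A Monte Carlo check that the ratio of two standard normals is standard Cauchy: the Cauchy CDF {1\over 2}+{1\over\pi}\tan^{-1}(a) puts the quartiles at a=\pm 1, which the sampled ratios should reproduce (Python sketch; the sample size is arbitrary):

```python
import random

random.seed(4)

# Ratio of two independent standard normals; for Cauchy(0, 1) the CDF
# 1/2 + arctan(a)/pi puts the quartiles at a = -1 and a = +1.
ts = sorted(random.gauss(0, 1) / random.gauss(0, 1) for _ in range(200_000))
q1 = ts[len(ts) // 4]
q3 = ts[3 * len(ts) // 4]
print(q1, q3)  # close to -1 and +1
```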

Student’s t-distribution may be a little less obvious. The t-statistic can be written as t={x\over \sqrt{y/n}} in which x is normally distributed (being a sum/difference of sample means) and y has an independent \chi^2 distribution with n degrees of freedom

    \[f_x(X)={1\over \sqrt{2\pi}} \, e^{-X^2/2}, \qquad f_y(Y)={1\over 2^{n/2}\Gamma(n/2)} \, Y^{n/2-1} \, e^{-Y/2}\]

so that t is distributed as

(2)   \begin{eqnarray*}f_t(T)&=&\int \delta(T-{X\over \sqrt{Y/n}}) \, {1\over \sqrt{2\pi}} \, e^{-X^2/2} \, {1\over 2^{n/2}\Gamma(n/2)} \, Y^{n/2-1} \, e^{-Y/2} \, dX \, dY\nonumber\\ &=&{1\over \sqrt{2\pi}}{1\over 2^{n/2}\Gamma(n/2)}\int \sqrt{{Y\over n}} \, e^{-YT^2/2n} \, Y^{n/2-1} \, e^{-Y/2} \,  dY\nonumber\\ &=& {1\over \sqrt{n\pi} \Gamma(n/2)}{1\over (1+T^2/n)^{(n+1)/2}} \int_0^\infty v^{(n+1)/2-1} \, e^{-v} \, dv\nonumber\\ &=& {\Gamma((n+1)/2)\over \sqrt{n\pi} \Gamma(n/2)}{1\over (1+T^2/n)^{(n+1)/2}}\nonumber\end{eqnarray*}

This is a modified Cauchy distribution; for n=1 it reduces to the Cauchy distribution exactly.
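A numerical sanity check of the result (Python sketch; n=5 and the integration grid are arbitrary choices): the density reduces to the Cauchy form at n=1, and it integrates to one.

```python
import math

def t_pdf(T, n):
    """Student's t density with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1.0 + T * T / n) ** (-(n + 1) / 2)

# n = 1 recovers the Cauchy density 1/(pi (1 + T^2)).
print(t_pdf(0.0, 1), 1.0 / math.pi)

# The density integrates to 1 (Riemann sum on [-50, 50], n = 5 as an example).
h = 0.001
total = sum(t_pdf(-50.0 + k * h, 5) for k in range(100_001)) * h
print(total)  # close to 1
```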

Functions of random variables

    \[F(y)=\wp_x(x\le y)=\int_{-\infty}^y f_x(\xi) \, d\xi, \qquad f_x(y)={d\over dy}\wp_x(x\le y)={dF\over dy}(y)\]

This is used to obtain the PDF of a function g(x) of a random variable x from the PDF of x; consider that

    \[\wp_g(g\le \eta)=\int_{-\infty}^\eta f_g(\xi) \, d\xi\]

however this is also the probability that x will be less than g^{-1}(\eta) (for monotonically increasing g);

    \[\wp_g(g\le \eta)=\wp_x(x\le g^{-1}(\eta))=\int_{-\infty}^{g^{-1}(\eta)} f_x(\xi) \, d\xi\]

Now just take the derivative using the chain rule to get the PDF of g (and thus proving Theorem 6.1);

    \[f_g(\eta)={d\over d\eta}\wp_g(g\le \eta)={d\over d\eta}\int_{-\infty}^{g^{-1}(\eta)} f_x(\xi) \, d\xi=\Big{|}{d g^{-1}(\eta)\over d\eta}\Big{|} \, f_x(g^{-1}(\eta))\]

This is the basis of the so-called inverse method for computing random deviates with some particular PDF.
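A minimal sketch of the inverse method in Python, applied to the exponential distribution, where F^{-1}(u)=-\ln(1-u)/\lambda in closed form (\lambda=2 is an arbitrary choice):

```python
import math
import random
import statistics

random.seed(5)

def exp_deviate(lam):
    """Inverse method: solve F(x) = 1 - exp(-lam*x) = u for x."""
    u = random.random()
    return -math.log(1.0 - u) / lam

lam = 2.0
xs = [exp_deviate(lam) for _ in range(100_000)]
print(statistics.mean(xs))  # close to 1/lam = 0.5
```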
Example
Let x be a uniform deviate on the interval 0\le x\le 1, meaning that f_x(\xi)=1. This is the type of built-in random number generator that computer operating systems provide. Find the PDF for a random number y=x^n.
First we get the CDF

    \[\wp_y(y\le \eta)=\wp_x(x\le \eta^{{1\over n}})=\int_0^{\eta^{{1\over n}}} 1 \cdot d\xi=\eta^{{1\over n}}\]

and finally

    \[f_y(\eta)={d\over d\eta} \wp_y(y\le \eta)={1\over n} \eta^{{1\over n}-1}\]

if 0\le \eta\le 1.
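Checking the result by simulation (Python sketch; n=3 and the test point \eta=0.2 are arbitrary choices), the empirical CDF of x^n should match \eta^{1/n}:

```python
import random

random.seed(6)

n = 3                         # exponent (arbitrary choice)
ys = [random.random() ** n for _ in range(100_000)]

eta = 0.2                     # test point
emp = sum(1 for y in ys if y <= eta) / len(ys)
print(emp, eta ** (1 / n))    # empirical CDF vs eta^(1/n)
```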
Example
If X is uniformly distributed over [0,1], find the probability density function of Y=e^X.
Let y=e^x, so that 1\le y\le e,

    \[\wp\{y\le a\}=\wp\{x\le \ln a\}=\int_0^{\ln a} 1 \, dx=\ln a, \qquad f_y(\xi)={d\over d\xi}\ln \xi={1\over \xi}, \qquad 1\le \xi\le e\]

Example
Find the distribution of R=A\sin\theta, where A is a fixed constant and \theta is uniformly distributed on [-\pi/2,\pi/2]. This arises in ballistics: if a projectile is launched at speed v and angle \theta from the horizontal, its range is R={v^2\over g}\sin(2\theta).
Use the example of the inverse method, for g(x);

    \[f_g(\eta)=\Big{|} {dg^{-1}(\eta)\over d\eta}\Big{|} \, f_x(g^{-1}(\eta))\]

but we have x=\theta, f_x(\eta)={1\over \pi}, and

    \[g(\eta)=R(\eta)={v_0^2\over g} \, \sin 2\eta\]

and so

    \[g^{-1}(\eta)={1\over 2} \sin^{-1}\Big({g\eta\over v_0^2}\Big), \qquad {dg^{-1}(\eta)\over d\eta}={1\over 2} {1\over\sqrt{({v_0^2\over g})^2-\eta^2}}\]

and so the ranges are distributed according to

    \[f_R(\eta)=f_g(\eta)={1\over 2\pi} {1\over\sqrt{({v_0^2\over g})^2-\eta^2}}\]

The cumulative probability F(R) will be an inverse sine function (verify).

Example (Problem 5.16 in Ross)
In 10,000 independent tosses of a coin the coin landed heads 5800 times. Is it reasonable to assume that the coin is fair?
What is the probability of landing within one \sigma of \mu the mean for a normal distribution?

    \[\wp\{X<\mu+\sigma\}-\wp\{X<\mu-\sigma\}\]

    \[={1\over 2}\Big(erf\Big({(\mu+\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)-erf\Big({(\mu-\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)\Big)=0.682689492137\]

The probability of landing within two \sigma of \mu the mean is

    \[\wp\{X<\mu+2\sigma\}-\wp\{X<\mu-2\sigma\}\]

    \[={1\over 2}\Big(erf\Big({(\mu+2\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)-erf\Big({(\mu-2\sigma-\mu)\over \sqrt{2} \, \sigma}\Big)\Big)=0.954499736104\]

For 10,000 coin tosses the number of heads X is approximately normally distributed about the mean \mu=(10{,}000)(1/2)=5000 with \sigma=\sqrt{(10000)(1/2)(1/2)}=50. This coin is turning up heads 16 standard deviations beyond the mean; this is not a fair coin.
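The arithmetic of the example in a few lines of Python:

```python
import math

n, p, heads = 10_000, 0.5, 5800
mu = n * p                            # 5000.0
sigma = math.sqrt(n * p * (1 - p))    # 50.0
z = (heads - mu) / sigma
print(z)  # 16.0 standard deviations above the mean
```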
