Expectation (also called mean or average) is computed as follows:
$$E[X] = \sum_x x\,p(x) \ \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \ \text{(continuous)}.$$
To compute this sum for the standard discrete distributions we introduce generating functions.
Binomial:
$$E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np$$
Geometric:
$$E[X] = \sum_{n=1}^{\infty} n\,p(1-p)^{n-1} = \frac{1}{p}$$
Note that $p(1-p)^{n-1}$ is the probability of success on the $n$th try (failure up until then) for independent trials.
Poisson:
$$E[X] = \sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda$$
For the continuous distributions we perform integrations to compute expectations.
Exponential:
$$E[X] = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}$$
Beta:
$$E[X] = \int_0^1 x\,\frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}\,dx = \frac{a}{a+b}$$
Normal:
$$E[X] = \int_{-\infty}^{\infty} x\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}\,dx = \mu$$
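As a quick sanity check on these closed forms, here is a minimal Monte Carlo sketch (assuming NumPy; the parameter values are illustrative) comparing sample means of draws from each distribution to the expectations above:

import numpy as np

rng = np.random.default_rng(0)
N = 200_000

n, p = 10, 0.3          # binomial parameters
lam = 2.5               # Poisson / exponential rate
a, b = 2.0, 5.0         # beta parameters
mu, sigma = 1.0, 2.0    # normal parameters

checks = {
    "binomial    (np)":        (rng.binomial(n, p, N).mean(),       n * p),
    "geometric   (1/p)":       (rng.geometric(p, N).mean(),         1 / p),
    "poisson     (lambda)":    (rng.poisson(lam, N).mean(),         lam),
    "exponential (1/lambda)":  (rng.exponential(1 / lam, N).mean(), 1 / lam),
    "beta        (a/(a+b))":   (rng.beta(a, b, N).mean(),           a / (a + b)),
    "normal      (mu)":        (rng.normal(mu, sigma, N).mean(),    mu),
}
for name, (sample, exact) in checks.items():
    print(f"{name:24s} sample={sample:8.4f}  exact={exact:8.4f}")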
Expectations of functions
If $Y = g(X)$ we could work out the mass function of $Y$ and of course compute
$$E[Y] = \sum_y y\,p_Y(y),$$
but we have the Law of the Unconscious Statistician
$$E[g(X)] = \sum_x g(x)\,p(x) \qquad \Big(\text{or } \int g(x) f(x)\,dx \text{ in the continuous case}\Big).$$
Note that the expectation is linear,
$$E[aX + b] = a\,E[X] + b,$$
and for sums of jointly distributed random variables
$$E\Big[\sum_i X_i\Big] = \sum_i E[X_i].$$
Variance
Variance (expectation of the squared deviation from the expected value) is defined as
$$\operatorname{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2.$$
For example, for the binomial mass function
$$\operatorname{Var}(X) = np(1-p),$$
and for the Poisson
$$\operatorname{Var}(X) = \lambda.$$
The properties are simple:
$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X), \qquad \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \ \text{for independent } X, Y.$$
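A quick numerical illustration of both properties (a sketch assuming NumPy; the distributions chosen are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
X = rng.poisson(2.5, 100_000).astype(float)
Y = rng.exponential(2.0, 100_000)     # independent of X
a, b = 3.0, 7.0

print(np.var(a * X + b), a**2 * np.var(X))    # ~ equal
print(np.var(X + Y), np.var(X) + np.var(Y))   # ~ equal since X, Y independent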
Relevance to quantum computation
Expectations and unbiased estimators of the mean are important because of the
Law of large numbers: let $X_1, X_2, \ldots, X_n$ be a time series of i.i.d. random numbers (independently perform the same experiment), each with expectation $\mu$; then the sample mean
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
converges to $\mu$, and the rate of convergence is influenced by a large or small variance of the $X_i$.
Weak law of large numbers: for any specified margin $\varepsilon > 0$, for a sufficiently large sample the probability that the mean is within the margin around the expectation tends to one,
$$\lim_{n \to \infty} P\big(|\bar{X}_n - \mu| < \varepsilon\big) = 1.$$
Proof. Let $E[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$ for all $i$; then
$$E[\bar{X}_n] = \mu, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$
Chebyshev's inequality (a weak bound, see Ross)
$$P\big(|X - \mu| \ge k\big) \le \frac{\operatorname{Var}(X)}{k^2}$$
then says
$$P\big(|\bar{X}_n - \mu| \ge \varepsilon\big) \le \frac{\sigma^2}{n \varepsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty.$$
Strong law of large numbers: the probability that the mean converges on the expectation as $n \to \infty$ is one,
$$P\Big(\lim_{n \to \infty} \bar{X}_n = \mu\Big) = 1.$$
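A small simulation makes both statements concrete (a sketch assuming NumPy; $\mu$, $\sigma$, and $\varepsilon$ are illustrative): the running mean settles onto $\mu$, and the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ controls how fast the miss probability decays.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps = 1.0, 2.0, 0.1
X = rng.normal(mu, sigma, 100_000)
running_mean = np.cumsum(X) / np.arange(1, X.size + 1)

for n in (100, 1_000, 10_000, 100_000):
    bound = sigma**2 / (n * eps**2)   # Chebyshev bound on P(|mean - mu| >= eps)
    print(f"n={n:6d}  mean={running_mean[n - 1]:.4f}  Chebyshev bound={bound:.3f}")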
To extract an answer from a quantum computer something must be measured, and the outcome of a quantum measurement is a random number, so the answer obtained is “correct” only with some probability. The strategy then, since such computers can be very fast, is to repeat the experiment to get to a high probability of getting a correct answer.
Suppose that measurements are made (independently), each with probability $p$ of being correct within some tolerance. If the outcome is verifiable to be correct, such as the case of a factorization of a number, the probability that the algorithm must be run exactly $n$ times in order to get a correct answer (within the acceptable tolerance) is
$$P(N = n) = (1-p)^{n-1}\,p$$
(the number of repetitions needed is a discrete geometric random variable), and the cumulative probability, the probability that we will get a correct answer within $n$ trials, is
$$P(N \le n) = \sum_{k=1}^{n} (1-p)^{k-1}\,p = 1 - (1-p)^n.$$
In order to attain a correct answer with a specified probability from an algorithm that produces a verifiably correct answer with probability $p$ per run, we take the smallest $n$ for which $1 - (1-p)^n$ reaches the target; in the example considered we need 5 runs.
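The computation is a one-liner (a sketch; the numeric values of $p$ and the target below are hypothetical stand-ins, chosen only to illustrate a case where 5 runs suffice):

import math

def runs_needed(p: float, target: float) -> int:
    # Smallest n with 1 - (1-p)^n >= target, i.e. (1-p)^n <= 1 - target.
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(runs_needed(0.75, 0.999))   # -> 5 for these illustrative numbers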
If the answer can’t be verified, we make many measurements and work under the assumption that the average of these is an estimator of the “correct” value $\mu$. Apply an alpha-trimmed mean filter, as in digital signal processing (DSP), to get rid of outliers (a minimal implementation follows the list):
Place a window over an element;
Pick up the $N$ elements in the window;
Order the $N$ elements;
Discard $\alpha$ elements at the beginning and $\alpha$ elements at the end of the ordered set;
Take the unweighted average: sum up the remaining $N - 2\alpha$ elements and divide the sum by their number.
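Here is a minimal sketch of the filter (assuming NumPy; the function name and the test data are illustrative):

import numpy as np

def alpha_trimmed_mean(samples, alpha: int) -> float:
    ordered = np.sort(np.asarray(samples, dtype=float))
    kept = ordered[alpha : ordered.size - alpha]   # discard alpha at each end
    return kept.mean()

rng = np.random.default_rng(3)
# 90 "good" measurements near 5, 10 gross outliers near 50:
data = np.concatenate([rng.normal(5.0, 0.1, 90), rng.normal(50.0, 1.0, 10)])
print(np.mean(data), alpha_trimmed_mean(data, alpha=15))   # outliers suppressed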
In our case, then, we order the set from smallest to largest and discard the first $\alpha$ elements and the last $\alpha$ elements. The distribution of the number $k$ of correct results among the $N$ is binomial,
$$P(k) = \binom{N}{k} p^k (1-p)^{N-k},$$
so the probability that more than half of the $N$ measurements are within tolerance is
$$P = \sum_{k > N/2} \binom{N}{k} p^k (1-p)^{N-k}.$$
To obtain a correct answer with a specified probability from a quantum algorithm in this case, with a given probability $p$ of correct results per run, we must choose $N$ large enough that this tail sum exceeds the target, and run the algorithm about that many times (a sketch of the computation follows).
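A sketch of that search using the exact binomial tail (the numeric $p$ and target are hypothetical; $N$ is kept odd so that “more than half” is unambiguous):

from math import comb

def majority_success(N: int, p: float) -> float:
    # P(more than half of N independent runs are correct)
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(N // 2 + 1, N + 1))

def runs_for_majority(p: float, target: float) -> int:
    N = 1
    while majority_success(N, p) < target:
        N += 2
    return N

print(runs_for_majority(0.75, 0.999))   # illustrative numbers only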
Moment generators
In many examples of counting problems used to find the size of a sample space, or to construct a simple means of computing probabilities or expectations, we used generating functions such as
$$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k,$$
which is a standard combinatorial method. Ross's section 7.7 introduces such a gadget as the generating function for moments,
$$M(t) = E\big[e^{\pm tX}\big]$$
(we select the sign needed to make the sum or integral converge). Then
$$\frac{d^n M}{dt^n}\bigg|_{t=0} = E[X^n].$$
This expectation $E[X^n]$ is called the $n$th moment of $X$. The third moment is called the skew, and the fourth moment the kurtosis of the distribution. A list of moment generating functions is given between pages 223 and 229 of Ross; it is very useful in the problems, and you should try to make use of it to compute variances.
Bernoulli:
$$M(t) = 1 - p + p e^t$$
Binomial:
$$M(t) = \big(1 - p + p e^t\big)^n$$
Poisson:
$$M(t) = e^{\lambda (e^t - 1)}$$
Exponential:
$$M(t) = \frac{\lambda}{\lambda - t}, \qquad t < \lambda$$
Normal:
$$M(t) = e^{\mu t + \sigma^2 t^2 / 2}$$
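Differentiating at $t = 0$ recovers the moments; here is a sketch using SymPy for the normal MGF from the list above:

import sympy as sp

t, mu, sigma = sp.symbols("t mu sigma", real=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)   # normal moment generating function

m1 = sp.diff(M, t).subs(t, 0)              # E[X]   = mu
m2 = sp.diff(M, t, 2).subs(t, 0)           # E[X^2] = mu^2 + sigma^2
print(m1, sp.simplify(m2 - m1**2))         # mean and variance: mu, sigma^2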
Properties
If $X$ and $Y$ are independent,
$$M_{X+Y}(t) = M_X(t)\,M_Y(t).$$
This generalizes to sums of independent random variables:
$$M_{X_1 + \cdots + X_n}(t) = \prod_{i=1}^{n} M_{X_i}(t).$$
The generating function lets us say something sensible about the Cauchy distribution: when it was introduced in the text it was shown that you can't compute its expectation by standard methods, but consider
$$\phi(t) = E\big[e^{itX}\big] = \int_{-\infty}^{\infty} \frac{e^{itx}}{\pi (1 + x^2)}\,dx = e^{-|t|}$$
(using Cauchy's theorem from complex variables), which clearly converges, and any moment can be computed from it; for example
$$E[X] = \frac{1}{i} \frac{d\phi}{dt}\bigg|_{t=0} = 0,$$
which in fact should be obvious from a graph of the distribution: it is symmetric and sharply maximal at $x = 0$. If you need to work with the Cauchy distribution, the moment generating function lets you do it.
You can see that the moment generating function is either a Laplace or a Fourier transformation of the distribution function,
$$M(t) = \int e^{tx} f(x)\,dx \qquad \text{or} \qquad \phi(t) = \int e^{itx} f(x)\,dx,$$
which are invertible transformations (as you discover in your DEQ courses). If you are given $M(t)$, you can recover $f(x)$ by inverting the transform. If you are given only the integer moments $E[X^n]$ you {\bf cannot in general} reconstruct $f$ from them unless $f$ has finite support: only for $f(x) = 0$ outside some finite interval. This is called the classical moment problem.
Correlation
Recall the definitions of variance (standard deviation squared) and covariance,
$$\operatorname{Var}(X) = \sigma_X^2 = E\big[(X - E[X])^2\big], \qquad \operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y],$$
and correlation
$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
$\rho$ measures the extent to which $Y$ is a linear function of $X$, so that an increase/decrease in $X$ results in an increase/decrease in $Y$.
Example (Ross theoretical exercise 7.20). Let $Y = a + bX$; then
$$\operatorname{Cov}(X, Y) = b \operatorname{Var}(X), \qquad \sigma_Y = |b|\,\sigma_X,$$
therefore
$$\rho(X, Y) = \frac{b \operatorname{Var}(X)}{|b|\,\sigma_X^2} = \frac{b}{|b|} = \pm 1.$$
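A numerical check of this (a sketch assuming NumPy; the values of $a$ and $b$ are arbitrary): the sample correlation of $X$ and $Y = a + bX$ is $+1$ or $-1$ with the sign of $b$.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, 10_000)
for b in (2.5, -2.5):
    Y = 1.0 + b * X
    print(b, np.corrcoef(X, Y)[0, 1])   # +1.0 for b > 0, -1.0 for b < 0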
For two variables $X, Y$: if $|\rho(X,Y)| \approx 1$ then they are strongly linearly related ($Y$ an increasing or decreasing function of $X$ according to the sign), and if $\rho(X,Y) \approx 0$ they are not so related. In statistical and quantum physics we call the covariance the correlation function (we don't use the denominator $\sigma_X \sigma_Y$).
An important example
For two jointly distributed variables
$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y],$$
and if they are Bernoulli, then $E[XY] = P(X = 1, Y = 1)$ and so
$$\operatorname{Cov}(X, Y) = P(X = 1, Y = 1) - P(X = 1)\,P(Y = 1).$$
and of course in the previous pairing example we saw how the pool is diminished by two with each pairing, and if a pair is a like-type pair, the pool of that type is diminished as well; so for our pairing problem we can write down $P(X = 1, Y = 1)$ directly, and we can use this to get the covariance of the pair indicators.
Another important example
The covariance is linear in each argument,
$$\operatorname{Cov}(X + Y, Z) = \operatorname{Cov}(X, Z) + \operatorname{Cov}(Y, Z). \tag{1}$$
This linearity extends,
$$\operatorname{Cov}\Big(\sum_i X_i, \sum_j Y_j\Big) = \sum_i \sum_j \operatorname{Cov}(X_i, Y_j);$$
substitute one into the other (take the covariance of a sum with itself) to get
$$\operatorname{Var}\Big(\sum_i X_i\Big) = \sum_i \operatorname{Var}(X_i) + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j).$$
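A numerical check of the last identity for correlated variables (a sketch assuming NumPy; the shared component $Z$ is just a way to manufacture correlation):

import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(0.0, 1.0, 100_000)
X = Z + rng.normal(0.0, 0.5, Z.size)   # X and Y share the component Z,
Y = Z + rng.normal(0.0, 0.5, Z.size)   # so they are positively correlated

cov = np.cov(X, Y, bias=True)[0, 1]
print(np.var(X + Y), np.var(X) + np.var(Y) + 2 * cov)   # ~ equal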
Ross problem 7.32
Let $X$ be the indicator of a die roll producing a one and $Y$ the indicator of a die roll producing a six; then $E[X] = E[Y] = 1/6$, both are Bernoulli variables, and they are independent (uncorrelated) if the two indicators refer to different rolls. If they refer to the same roll they cannot both equal one, so
$$\operatorname{Cov}(X, Y) = P(X = 1, Y = 1) - P(X = 1)\,P(Y = 1) = 0 - \frac{1}{36} = -\frac{1}{36}.$$
This is a very good example of simply correlated variables.
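A simulation of both cases (a sketch assuming NumPy, under the same-roll/different-roll reading above):

import numpy as np

rng = np.random.default_rng(6)
roll1 = rng.integers(1, 7, 1_000_000)
roll2 = rng.integers(1, 7, 1_000_000)

X = (roll1 == 6).astype(float)   # Bernoulli(1/6)
Y = (roll2 == 6).astype(float)   # Bernoulli(1/6), independent roll
W = (roll1 == 1).astype(float)   # same roll as X: can't both be 1

print(np.cov(X, Y, bias=True)[0, 1])   # ~ 0      (different rolls)
print(np.cov(X, W, bias=True)[0, 1])   # ~ -1/36  (mutually exclusive faces)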