# Confidence Intervals

An **interval** is a set of (real) numbers between two values. So, for example, (0, 1) is all of the real numbers x such that 0 < x < 1.

If we have an unknown parameter, we may find an estimator for it and use the resulting value as an estimate of the parameter. However, a single estimate tells us nothing about how reliable it is. This is where confidence intervals come in. Instead of giving one estimated value, we give an interval and say that there is a 95% (or some other percentage) chance that the interval contains the parameter.

For example, suppose we have an unknown parameter q, and that the endpoints a and b are calculated from a random sample. If the probability that the interval (a,b) contains q is 0.95, i.e. if P(a < q < b) = 0.95, then (a,b) is known as a 95% confidence interval for q.

N.B. q is fixed and it is the interval which varies. It is therefore __incorrect__ to say that there is a 95% chance that q lies in the interval. Rather, there is a 95% chance the interval contains q (a subtle, but important difference!).
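The point that the interval varies while q stays fixed can be checked with a small simulation. The sketch below is illustrative only: the function name `coverage` and the values chosen for the true mean, standard deviation and sample size are assumptions, and it uses the interval for a normal mean with known standard deviation that is constructed later in this section.

```python
import random
from math import sqrt

def coverage(true_mean=10.0, sd=2.0, n=25, trials=5_000, seed=0):
    """Fraction of simulated 95% intervals that contain the fixed true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sd / sqrt(n)  # same width every time; only the centre moves
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample, so the interval changes from trial to trial
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        xbar = sum(sample) / n
        # The parameter (true_mean) never changes; we ask whether this interval caught it
        if xbar - half_width < true_mean < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # expected to be close to 0.95
```

Each trial produces a different interval, and roughly 95% of them should contain the fixed parameter.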

**Constructing Confidence Intervals**

If the random variable X has a normal distribution with mean m and variance s^{2}, then the sample mean $\bar{X}$ of a random sample of size n also has a normal distribution with mean m, but with variance s^{2}/n (see random samples). In other words,

$$\bar{X} \sim N(m, s^{2}/n).$$

In fact, if we have a random variable X which has **any** distribution (not necessarily normal) with mean m and variance s^{2}, by the central limit theorem the distribution of the sample mean $\bar{X}$ will be approximately normal with mean m and with variance s^{2}/n, for large n.
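As a rough check of the central limit theorem claim, the following sketch draws many samples from a skewed (exponential) distribution and looks at how the sample means behave. The function name `sample_means`, the choice of distribution and the sample sizes are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

def sample_means(n=50, trials=5_000, seed=1):
    """Means of `trials` random samples of size n from an exponential distribution."""
    rng = random.Random(seed)
    # expovariate(1.0) is skewed, with mean m = 1 and variance s^2 = 1
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

n = 50
means = sample_means(n=n)
sd_of_mean = sqrt(1.0 / n)  # s / sqrt(n)
# Proportion of sample means within 1.96 standard deviations of m
inside = sum(abs(x - 1.0) < 1.96 * sd_of_mean for x in means) / len(means)

print(fmean(means))      # expected to be near m = 1
print(pvariance(means))  # expected to be near s^2 / n = 0.02
print(inside)            # expected to be near 0.95
```

Even though the underlying distribution is far from normal, the sample means should cluster around m with variance close to s^{2}/n.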

So standardising the sample mean $\bar{X}$, we get:

$$Z = \frac{\bar{X} - m}{s/\sqrt{n}} \sim N(0, 1)$$

(since the mean is m and the standard deviation is $s/\sqrt{n}$).

From the Normal Distribution section, we know that P(-1.96 < Z < 1.96) = 0.95.
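The value 1.96 can be checked numerically. This short sketch uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that Z lies between -1.96 and 1.96
prob = z.cdf(1.96) - z.cdf(-1.96)

# Going the other way: the value that leaves 2.5% in the upper tail
crit = z.inv_cdf(0.975)

print(round(prob, 4))  # 0.95
print(round(crit, 2))  # 1.96
```

For a different confidence level the same idea applies: for example, `z.inv_cdf(0.995)` gives the multiplier for a 99% interval.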

Rearranging this we get

$$P\left[\, \bar{X} - 1.96\,(s/\sqrt{n}) < m < \bar{X} + 1.96\,(s/\sqrt{n}) \,\right] = 0.95$$
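The rearrangement can be spelled out step by step. Writing $\bar{X}$ for the sample mean, start from the statement about the standardised variable, multiply through by $s/\sqrt{n}$ (which is positive), subtract $\bar{X}$, and finally multiply by -1, which reverses the inequalities:

```latex
\begin{align*}
0.95 &= P\left(-1.96 < \frac{\bar{X} - m}{s/\sqrt{n}} < 1.96\right) \\
     &= P\left(-1.96\,\frac{s}{\sqrt{n}} < \bar{X} - m < 1.96\,\frac{s}{\sqrt{n}}\right) \\
     &= P\left(-\bar{X} - 1.96\,\frac{s}{\sqrt{n}} < -m < -\bar{X} + 1.96\,\frac{s}{\sqrt{n}}\right) \\
     &= P\left(\bar{X} - 1.96\,\frac{s}{\sqrt{n}} < m < \bar{X} + 1.96\,\frac{s}{\sqrt{n}}\right)
\end{align*}
```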

Hence the 95% confidence interval for m is:

$$\left(\, \bar{x} - 1.96\,(s/\sqrt{n}) \;,\; \bar{x} + 1.96\,(s/\sqrt{n}) \,\right)$$

N.B. in the confidence interval we use $\bar{x}$, the sample mean actually observed, not the random variable $\bar{X}$.
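As a worked illustration, the interval can be computed from an observed sample as follows. This is a minimal sketch assuming the standard deviation s is known; the function name `mean_confidence_interval`, its `level` parameter and the data values are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_confidence_interval(sample, sd, level=0.95):
    """Confidence interval for the mean m when the standard deviation s is known."""
    n = len(sample)
    xbar = fmean(sample)                       # observed sample mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0, 5.1]  # made-up observations
low, high = mean_confidence_interval(data, sd=0.2)
print(round(low, 3), round(high, 3))
```

Note that the half-width shrinks in proportion to $1/\sqrt{n}$, so quadrupling the sample size halves the width of the interval.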