Central Limit Theorem: Using the C.L.T. (modified R. Bloom)
Version 1.4; 2008/11/25 12:48:58 US/Central; 2008/12/16 12:32:27.240 US/Central
Roberta Bloom, bloomroberta@deanza.edu
elementary statistics

This module has examples illustrating how the Central Limit Theorem is used. This revision of the original module in the collection Collaborative Statistics by S. Dean and Dr. B. Illowsky includes examples only for the CLT for means and omits their material on the CLT for sums. The second example in this section has been changed to correct errors in the earlier versions of this module.

It is important to understand when to use the CLT. Use the CLT for means or averages when you are asked to find the probability for a sample average or mean, or when working with percentiles for sample averages. (If you are asked to find the probability or percentile of a sum or total, use the CLT for sums.) If you are asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable.
Law of Large Numbers

The Law of Large Numbers says that if you take samples of larger and larger size from any population, then the mean $\bar{x}$ of the sample gets closer and closer to $\mu$. From the Central Limit Theorem, we know that as $n$ gets larger and larger, the sample averages follow a normal distribution. The larger $n$ gets, the smaller the standard deviation gets. (Remember that the standard deviation for $\bar{X}$ is $\sigma/\sqrt{n}$.) This means that the sample mean $\bar{x}$ must be close to the population mean $\mu$. We can say that $\mu$ is the value that the sample averages approach as $n$ gets larger. The Central Limit Theorem illustrates the Law of Large Numbers.

Example 1

A study involving stress is done on a college campus among the students. The stress scores follow a continuous uniform distribution with the lowest stress score equal to 1 and the highest equal to 5. Using a sample of 75 students, find:
a. The probability that the average stress score for the 75 students is less than 2.
b. The 90th percentile for the average stress score for the 75 students.

Let $X$ = the stress score for one individual student. The individual stress scores follow a continuous uniform distribution, $X \sim U(1, 5)$, where $a = 1$ and $b = 5$ (see the chapter on Continuous Random Variables).

$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$

$\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} = 1.15$

Problems a and b ask you to find a probability or a percentile for an average or mean. The sample size, $n$, is equal to 75. Let $\bar{X}$ = the average stress score for the 75 students. For the average stress score, use the CLT, which tells us that $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$, so $\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$ where $n = 75$.

Part a. Find $P(\bar{X} < 2)$. Draw the graph.
$P(\bar{X} < 2) = 0$
The probability that the average stress score is less than 2 is about 0.
normalcdf(1, 2, 3, 1.15/√75) = 0
The smallest stress score is 1. Therefore, the smallest average for 75 stress scores is 1, which is why 1 is used as the lower bound.

Part b. Find the 90th percentile for the sample average of 75 stress scores. Draw a graph. Let $k$ = the 90th percentile. Find $k$ where $P(\bar{X} < k) = 0.90$.
$k = 3.17$ using invNorm(.90, 3, 1.15/√75) = 3.17
The 90th percentile for the sample average of 75 scores is about 3.17. This means that 90% of all the averages of samples of 75 stress scores are at most 3.17 and 10% of the sample averages are at least 3.17.

Example 2

Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included in their basic cell phone contract. The analyst finds that, for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes. Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.

Let $X$ = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance. $X \sim Exp\left(\frac{1}{22}\right)$. From Chapter 5, we know that $\mu = 22$ and $\sigma = 22$.

Let $\bar{X}$ = the AVERAGE excess time used by a sample of $n = 80$ customers who exceed their contracted time allowance. $\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means or Averages.

Using the CLT to find Probability:
a. Find the probability that the average excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P(\bar{X} > 20)$. Draw the graph.
b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find $P(X > 20)$.
c. Explain why the probabilities in (a) and (b) are different.

Part a. Find $P(\bar{X} > 20)$.
$P(\bar{X} > 20) = 0.7919$ using normalcdf(20, 1E99, 22, 22/√80)
The probability is 0.7919 that the average excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.
Note: 1E99 = $10^{99}$ and -1E99 = $-10^{99}$. Press the EE key for E, or just use 10^99 instead of 1E99.

Part b. Find $P(X > 20)$.
Remember to use the exponential distribution for an individual: $X \sim Exp\left(\frac{1}{22}\right)$.
$P(X > 20) = e^{-(1/22)(20)} = e^{-(0.04545)(20)} = 0.4029$

Part c. Explain why the probabilities in (a) and (b) are different.
$P(X > 20) = 0.4029$ but $P(\bar{X} > 20) = 0.7919$
The probabilities are not equal because we use different distributions to calculate the probability for individuals and for averages. When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are asked to find the probability for an average.

Using the CLT to find Percentiles:
Find the 95th percentile for the sample average excess time for samples of 80 customers who exceed their basic contract time allowances. Draw a graph. Let $k$ = the 95th percentile. Find $k$ where $P(\bar{X} < k) = 0.95$.
$k = 26.0$ using invNorm(.95, 22, 22/√80) = 26.0
The 95th percentile for the sample average excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time. 95% of such samples would have averages under 26 minutes; only 5% of such samples would have averages above 26 minutes.
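The calculator results in the two examples can be cross-checked in software. The following is a minimal sketch, assuming Python 3.8 or later, where `statistics.NormalDist` stands in for the calculator's normalcdf and invNorm. The variable names are illustrative and are not part of the original module.

```python
from math import exp, sqrt
from statistics import NormalDist

# Example 1: stress scores X ~ U(1, 5), sample size n = 75.
mu_stress = (1 + 5) / 2                  # population mean, 3
sigma_stress = sqrt((5 - 1) ** 2 / 12)   # population standard deviation, about 1.15
stress_mean_dist = NormalDist(mu_stress, sigma_stress / sqrt(75))

p_stress_below_2 = stress_mean_dist.cdf(2)   # P(Xbar < 2), essentially 0
k90 = stress_mean_dist.inv_cdf(0.90)         # 90th percentile of Xbar

# Example 2: excess time X ~ Exp(1/22), sample size n = 80.
mu_time = 22.0
sigma_time = 22.0   # for an exponential distribution, sigma equals the mean
time_mean_dist = NormalDist(mu_time, sigma_time / sqrt(80))

# Part a: P(Xbar > 20), the counterpart of normalcdf(20, 1E99, 22, 22/sqrt(80)).
p_mean_over_20 = 1 - time_mean_dist.cdf(20)

# Part b: P(X > 20) for one customer, from the exponential distribution itself.
p_individual_over_20 = exp(-20 / mu_time)

# 95th percentile of Xbar, the counterpart of invNorm(.95, 22, 22/sqrt(80)).
k95 = time_mean_dist.inv_cdf(0.95)

print(round(k90, 2), round(p_mean_over_20, 4),
      round(p_individual_over_20, 4), round(k95, 1))
```

The sketch uses the unrounded standard deviation for Example 1 rather than 1.15, so intermediate digits may differ slightly from the calculator steps, although the rounded answers should agree with those in the text.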
Average
A number that describes the central tendency of the data. There are a number of specialized averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean.

Central Limit Theorem
Given a random variable (RV) with known mean $\mu$ and known variance $\sigma^2$, we are sampling with size $n$ and we are interested in two new RVs: the sample mean, $\bar{X}$, and the sample sum, $\Sigma X$. If the size $n$ of the sample is sufficiently large, then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\Sigma X \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$, where the second parameter is the standard deviation, as in the examples in this module (the corresponding variances are $\frac{\sigma^2}{n}$ and $n\sigma^2$). In words, if the size $n$ of the sample is sufficiently large, then the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. Moreover, the mean of the sample means will equal the population mean, and the mean of the sample sums will equal $n$ times the population mean. The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.

Exponential Distribution
A continuous random variable (RV) that appears when we are interested in the intervals of time between some random events, for example, the length of time between emergency arrivals at a hospital. Notation: $X \sim Exp(m)$. The mean is $\mu = \frac{1}{m}$ and the variance is $\sigma^2 = \frac{1}{m^2}$. The probability density function is $f(x) = m e^{-mx}$, $x \geq 0$, and the cumulative distribution function is $P(X \leq x) = 1 - e^{-mx}$.

Mean
A number that measures the central tendency (average); short for arithmetic mean.
By definition, the mean for a sample (usually denoted by $\bar{X}$) is $\bar{X} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (usually denoted by $\mu$) is $\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.
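The standard error of the mean described in the Central Limit Theorem entry can also be checked by simulation. The following is a rough sketch, assuming the U(1, 5) stress-score population and n = 75 from the first example. The number of simulated samples and the random seed are arbitrary choices made for illustration only.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable

n = 75               # sample size, as in the stress-score example
num_samples = 2000   # number of simulated samples (an arbitrary choice)

# Draw many samples of size n from U(1, 5) and record each sample mean.
sample_means = [
    mean([random.uniform(1, 5) for _ in range(n)])
    for _ in range(num_samples)
]

sigma = sqrt((5 - 1) ** 2 / 12)      # population standard deviation of U(1, 5)
theoretical_se = sigma / sqrt(n)     # standard error of the mean, sigma / sqrt(n)
simulated_se = stdev(sample_means)   # spread of the simulated sample means

print(mean(sample_means), simulated_se, theoretical_se)
```

If the simulation behaves as the CLT suggests, the average of the sample means should be close to 3 and the simulated spread should be close to the theoretical standard error.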