Practice Test 2
4.1: Probability Distribution Function (PDF) for a Discrete Random Variable

Use the following information to answer the next five exercises. You conduct a survey among a random sample of students at a particular university. The data collected includes their major, the number of classes they took the previous semester, and the amount of money they spent on books purchased for classes in the previous semester.

If X = student’s major, then what is the domain of X? The domain of X = {English, Mathematics, …}, i.e., a list of all the majors offered at the university, plus “undeclared.”

If Y = the number of classes taken in the previous semester, what is the domain of Y? The domain of Y = {0, 1, 2, …}, i.e., the integers from 0 to the upper limit of classes allowed by the university.

If Z = the amount of money spent on books in the previous semester, what is the domain of Z? The domain of Z = any amount of money from 0 upwards.

Why are X, Y, and Z in the previous example random variables? Because they can take any value within their domain, and their value for any particular case is not known until the survey is completed.

After collecting data, you find that for one case, z = –7. Is this a possible value for Z? No, because the domain of Z includes only nonnegative numbers (you can’t spend a negative amount of money). Possibly the value –7 is a data entry error, or a special code to indicate that the student did not answer the question.

What are the two essential characteristics of a discrete probability distribution? The probabilities must sum to 1.0, and the probability of each event must be between 0 and 1, inclusive.

Use the discrete probability distribution represented in this table to answer the following six questions. The university library records the number of books checked out by each patron over the course of one day, with the following result:

x    P(x)
0    0.20
1    0.45
2    0.20
3    0.10
4    0.05

Define the random variable X for this example.
Let X = the number of books checked out by a patron.
What is P(x > 2)? P(x > 2) = 0.10 + 0.05 = 0.15

What is the probability that a patron will check out at least one book? P(x ≥ 1) = 1 – 0.20 = 0.80

What is the probability a patron will take out no more than three books? P(x ≤ 3) = 1 – 0.05 = 0.95

If the table listed P(x = 4) as 0.15, how would you know that there was a mistake? The probabilities would sum to 1.10, and the total probability in a distribution must always equal 1.0.

What is the average number of books taken out by a patron? $\mu = 0(0.20) + 1(0.45) + 2(0.20) + 3(0.10) + 4(0.05) = 1.35$
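The checks in these exercises can be reproduced in a few lines of Python. This is only an illustrative sketch: the dictionary `pmf` and the helper names are assumptions introduced here, and the values are the library table from the exercises.

```python
import math

# Library table from the exercises: x = books checked out, P(x) = probability.
pmf = {0: 0.20, 1: 0.45, 2: 0.20, 3: 0.10, 4: 0.05}

def is_valid_pmf(dist, tol=1e-9):
    """Each probability lies in [0, 1] and the probabilities sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)

def prob(dist, condition):
    """Sum P(x) over the values of x that satisfy the condition."""
    return sum(p for x, p in dist.items() if condition(x))

def expected_value(dist):
    """Mean of a discrete distribution: the sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

print(is_valid_pmf(pmf))                       # validity check
print(round(prob(pmf, lambda x: x > 2), 2))    # P(x > 2)
print(round(prob(pmf, lambda x: x >= 1), 2))   # P(x >= 1)
print(round(expected_value(pmf), 2))           # mean number of books
```

Passing the condition as a function keeps one helper usable for every "more than", "at least", and "no more than" question on the same table.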
4.2: Mean or Expected Value and Standard Deviation

Use the following information to answer the next four exercises. Three jobs are open in a company: one in the accounting department, one in the human resources department, and one in the sales department. The accounting job receives 30 applicants, the human resources job receives 40 applicants, and the sales job receives 60 applicants.

If X = the number of applications for a job, use this information to fill in the table below.

x     P(x)    xP(x)
x     P(x)    xP(x)
30    0.33    9.90
40    0.33    13.20
60    0.33    19.80
What is the mean number of applicants? $\mu = 9.90 + 13.20 + 19.80 = 42.90$

What is the PDF for X? P(x = 30) = 0.33, P(x = 40) = 0.33, P(x = 60) = 0.33

Add a fourth column to the table, for $(x - \mu)^2 P(x)$.

x     P(x)    xP(x)    $(x - \mu)^2 P(x)$
30    0.33    9.90     $(30 - 42.90)^2(0.33) = 54.91$
40    0.33    13.20    $(40 - 42.90)^2(0.33) = 2.78$
60    0.33    19.80    $(60 - 42.90)^2(0.33) = 96.49$
What is the standard deviation of X? $\sigma_x = \sqrt{54.91 + 2.78 + 96.49} = 12.42$
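The table arithmetic can be checked with a short script. This sketch assumes the rounded probability 0.33 used in the table, so the three probabilities sum to 0.99 rather than 1; using exactly 1/3 would give slightly different values. The variable names are illustrative.

```python
import math

# Applicant counts and the rounded probability 0.33 used in the table.
values = [30, 40, 60]
p = 0.33

# Mean: the sum of x * P(x).
mu = sum(x * p for x in values)

# Variance: the sum of (x - mu)^2 * P(x); the standard deviation is its square root.
variance = sum((x - mu) ** 2 * p for x in values)
sigma = math.sqrt(variance)

print(round(mu, 2), round(sigma, 2))
```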
4.3: Binomial Distribution

In a binomial experiment, if p = 0.65, what does q equal? q = 1 – 0.65 = 0.35

What are the required characteristics of a binomial experiment? There are a fixed number of trials. There are only two possible outcomes, and their probabilities add up to 1. The trials are independent and conducted under identical conditions.

Joe conducts an experiment to see how many times he has to flip a coin before he gets four heads in a row. Does this qualify as a binomial experiment? No, because there is not a fixed number of trials.

Use the following information to answer the next three exercises. In a particular community, 65 percent of households include at least one person who has graduated from college. You randomly sample 100 households in this community. Let X = the number of households including at least one college graduate.

Describe the probability distribution of X. X ~ B(100, 0.65)

What is the mean of X? $\mu = np = 100(0.65) = 65$

What is the standard deviation of X? $\sigma_x = \sqrt{npq} = \sqrt{100(0.65)(0.35)} = 4.77$

Use the following information to answer the next four exercises. Joe is the star of his school’s baseball team. His batting average is 0.400, meaning that, on average, for every ten times he comes to bat (an at-bat), four of those times he gets a hit. You decide to track his batting performance over his next 20 at-bats.

Define the random variable X in this experiment. X = the number of hits Joe gets in his next 20 at-bats (each at-bat, i.e., each occasion of his coming to bat, is one trial).

Assuming Joe’s probability of getting a hit is independent and identical across all 20 at-bats, describe the distribution of X. X ~ B(20, 0.4)

Given this information, what number of hits do you predict Joe will get? $\mu = np = 20(0.4) = 8$

What is the standard deviation of X? $\sigma_x = \sqrt{npq} = \sqrt{20(0.40)(0.60)} = 2.19$
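The binomial formulas used in this section, μ = np and σ = √(npq), are easy to check numerically. The following is a minimal sketch; the function names are assumptions made for illustration, and `binomial_pmf` is included only to show where the individual probabilities come from.

```python
import math

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of X ~ B(n, p)."""
    q = 1 - p
    return n * p, math.sqrt(n * p * q)

def binomial_pmf(k, n, p):
    """P(X = k): choose which k of the n trials succeed."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Households example: X ~ B(100, 0.65)
print(binomial_mean_sd(100, 0.65))

# Batting example: X ~ B(20, 0.4)
print(binomial_mean_sd(20, 0.4))
```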
4.4: Geometric Distribution

What are the three major characteristics of a geometric experiment? A series of Bernoulli trials is conducted until one is a success, and then the experiment stops. At least one trial is conducted, but there is no upper limit to the number of trials. The probability of success or failure is the same for each trial.

You decide to conduct a geometric experiment by flipping a coin until it comes up heads. This takes five trials. Represent the outcomes of this experiment, using H for heads and T for tails. T T T T H

You are conducting a geometric experiment by drawing cards from a normal 52-card pack, with replacement, until you draw the Queen of Hearts. What is the domain of X for this experiment? The domain of X = {1, 2, 3, 4, 5, …}. Because you are drawing with replacement, there is no upper bound to the number of draws that may be necessary.

You are conducting a geometric experiment by drawing cards from a normal 52-card deck, without replacement, until you draw a red card. What is the domain of X for this experiment? The domain of X = {1, 2, 3, …, 27}. Because you are drawing without replacement, and 26 of the 52 cards are red, you have to draw a red card within the first 27 draws.

Use the following information to answer the next three exercises. In a particular university, 27 percent of students are engineering majors. You decide to select students at random until you choose one who is an engineering major. Let X = the number of students you select until you find one who is an engineering major.

What is the probability distribution of X? X ~ G(0.27)

What is the mean of X? $\mu = \frac{1}{p} = \frac{1}{0.27} = 3.70$

What is the standard deviation of X? $\sigma = \sqrt{\frac{1 - p}{p^2}} = \sqrt{\frac{1 - 0.27}{0.27^2}} = 3.16$
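The geometric mean and standard deviation for the engineering-major example can be verified with a small sketch. The function names are illustrative assumptions; `geometric_pmf` shows the probability that the first success arrives on trial k.

```python
import math

def geometric_mean_sd(p):
    """Return (mean, standard deviation) of X ~ G(p)."""
    return 1 / p, math.sqrt((1 - p) / p ** 2)

def geometric_pmf(k, p):
    """P(X = k): k - 1 failures followed by the first success."""
    return (1 - p) ** (k - 1) * p

mean, sd = geometric_mean_sd(0.27)
print(round(mean, 2), round(sd, 2))

# Probability that a fair coin first shows heads on the fifth flip (T T T T H).
print(geometric_pmf(5, 0.5))
```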
4.5: Hypergeometric Distribution You draw a random sample of ten students to participate in a survey, from a group of 30, consisting of 16 boys and 14 girls. You are interested in the probability that seven of the students chosen will be boys. Does this qualify as a hypergeometric experiment? List the conditions and whether or not they are met. Yes, because you are sampling from a population composed of two groups (boys and girls), have a group of interest (boys), and are sampling without replacement (hence, the probabilities change with each pick, and you are not performing Bernoulli trials). You draw five cards, without replacement, from a normal 52-card deck of playing cards, and are interested in the probability that two of the cards are spades. What are the group of interest, size of the group of interest, and sample size for this example? The group of interest is the cards that are spades, the size of the group of interest is 13, and the sample size is five.
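The exercises above only identify the pieces of a hypergeometric experiment, but the same three numbers (size of the group of interest, population size, sample size) are enough to compute the probability itself. The sketch below uses the standard counting formula with Python's `math.comb`; the function and argument names are assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(k, group_size, population_size, sample_size):
    """P(exactly k members of the group of interest appear in the sample)."""
    others = population_size - group_size
    ways_interest = comb(group_size, k)          # choose k from the group of interest
    ways_others = comb(others, sample_size - k)  # fill the rest from everyone else
    return ways_interest * ways_others / comb(population_size, sample_size)

# Two spades in five cards drawn from a 52-card deck (13 spades).
p_two_spades = hypergeom_pmf(2, 13, 52, 5)

# Seven boys in a sample of ten drawn from 16 boys and 14 girls.
p_seven_boys = hypergeom_pmf(7, 16, 30, 10)

print(p_two_spades, p_seven_boys)
```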
4.6: Poisson Distribution

What are the key characteristics of the Poisson distribution? A Poisson distribution models the number of events occurring in a fixed interval of time or space, when the events are independent and the average rate of the events is known.

Use the following information to answer the next three exercises. The number of drivers to arrive at a toll booth in an hour can be modeled by the Poisson distribution.

If X = the number of drivers, and the average number of drivers per hour is four, how would you express this distribution? X ~ P(4)

What is the domain of X? The domain of X = {0, 1, 2, 3, …}, i.e., any integer from 0 upwards.

What are the mean and standard deviation of X? $\mu = 4$ and $\sigma = \sqrt{4} = 2$
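For the toll booth example, the mean and standard deviation follow directly from the rate, and the individual probabilities come from the Poisson formula. The sketch below is illustrative; `poisson_pmf` and the variable names are assumptions introduced here.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 4                   # average number of drivers per hour
mean = lam                # the mean equals the rate
sd = math.sqrt(lam)       # the standard deviation is the square root of the rate

print(mean, sd)
print(poisson_pmf(0, lam))  # probability that no drivers arrive in an hour
```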
5.1: Continuous Probability Functions

You conduct a survey of students to see how many books they purchased the previous semester, the total amount they paid for those books, the number they sold after the semester was over, and the amount of money they received for the books they sold. Which variables in this survey are discrete, and which are continuous? The discrete variables are the number of books purchased, and the number of books sold after the end of the semester. The continuous variables are the amount of money spent for the books, and the amount of money received when they were sold.

With continuous random variables, we never calculate the probability that X has a particular value, but always speak in terms of the probability that X has a value within a particular range. Why is this? Because for a continuous random variable, P(x = c) = 0, where c is any single value. Instead, we calculate P(c < x < d), i.e., the probability that the value of x is between the values c and d.

For a continuous random variable, why are P(x < c) and P(x ≤ c) equivalent statements? Because P(x = c) = 0 for any continuous random variable.

For a continuous probability function, P(x < 5) = 0.35. What is P(x > 5), and how do you know? P(x > 5) = 1 – 0.35 = 0.65, because the total probability of a continuous probability function is always 1.

Describe how you would draw the continuous probability distribution described by the function $f(x) = \frac{1}{10}$ for $0 \le x \le 10$. What type of a distribution is this? This is a uniform probability distribution. You would draw it as a rectangle with the vertical sides at 0 and 10, and the horizontal sides at $\frac{1}{10}$ and 0.

For the continuous probability distribution described by the function $f(x) = \frac{1}{10}$ for $0 \le x \le 10$, what is P(0 < x < 4)? $P(0 < x < 4) = (4 - 0)\left(\frac{1}{10}\right) = 0.4$
5.2: The Uniform Distribution

For the continuous probability distribution described by the function $f(x) = \frac{1}{10}$ for $0 \le x \le 10$, what is P(2 < x < 5)? $P(2 < x < 5) = (5 - 2)\left(\frac{1}{10}\right) = 0.3$

Use the following information to answer the next four exercises. The number of minutes that a patient waits at a medical clinic to see a doctor is represented by a uniform distribution between zero and 30 minutes, inclusive.

If X equals the number of minutes a person waits, what is the distribution of X? X ~ U(0, 30)

Write the probability density function for this distribution. $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, so $f(x) = \frac{1}{30}$ for $0 \le x \le 30$

What are the mean and standard deviation for waiting time? $\mu = \frac{a + b}{2} = \frac{0 + 30}{2} = 15.0$ and $\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(30 - 0)^2}{12}} = 8.66$

What is the probability that a patient waits less than ten minutes? $P(x < 10) = (10)\left(\frac{1}{30}\right) = 0.33$
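The uniform-distribution answers all come from the same rectangle: the probability of an interval is its width times the height 1/(b − a). A minimal sketch under that reading follows; the function names are assumptions made for illustration.

```python
import math

def uniform_mean_sd(a, b):
    """Return (mean, standard deviation) of X ~ U(a, b)."""
    return (a + b) / 2, math.sqrt((b - a) ** 2 / 12)

def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X ~ U(a, b): overlap width times the height 1 / (b - a)."""
    lo, hi = max(lo, a), min(hi, b)   # clip the interval to the support [a, b]
    return max(hi - lo, 0) / (b - a)

# Waiting time at the clinic: X ~ U(0, 30)
mean, sd = uniform_mean_sd(0, 30)
print(mean, round(sd, 2))
print(round(uniform_prob(0, 30, 0, 10), 2))   # P(x < 10)

# f(x) = 1/10 on [0, 10]
print(uniform_prob(0, 10, 2, 5))              # P(2 < x < 5)
```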
5.3: The Exponential Distribution

The distribution of the variable X, representing the time to failure for an automobile battery, can be written as: X ~ Exp(m). Describe this distribution in words. X has an exponential distribution with decay parameter m, and its mean and standard deviation are both $\frac{1}{m}$. In this distribution, there will be a relatively large number of small values, with values becoming less common as they become larger.

If the value of m for an exponential distribution is ten, what are the mean and standard deviation for the distribution? $\mu = \sigma = \frac{1}{m} = \frac{1}{10} = 0.1$

Write the probability density function for a variable distributed as: X ~ Exp(0.2). $f(x) = 0.2e^{-0.2x}$ where x ≥ 0.
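As a quick check on the exponential answers, the density and the shared mean and standard deviation can be written as two small functions. This is a sketch only; the names `exp_pdf` and `exp_mean_sd` are assumptions introduced for illustration.

```python
import math

def exp_pdf(x, m):
    """Density f(x) = m * e^(-m * x) for x >= 0, and 0 for negative x."""
    return m * math.exp(-m * x) if x >= 0 else 0.0

def exp_mean_sd(m):
    """For X ~ Exp(m), the mean and standard deviation are both 1 / m."""
    return 1 / m, 1 / m

print(exp_mean_sd(10))     # decay parameter m = 10
print(exp_pdf(0, 0.2))     # the density of Exp(0.2) starts at height m = 0.2
print(exp_pdf(5, 0.2))     # and decays as x grows
```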
6.1: The Standard Normal Distribution

Translate this statement about the distribution of a random variable X into words: X ~ N(100, 15). The random variable X has a normal distribution with a mean of 100 and a standard deviation of 15.

If the variable X has the standard normal distribution, express this symbolically. X ~ N(0, 1)

Use the following information for the next six exercises. According to the World Health Organization, height in centimeters for girls aged five years and no months has the distribution: X ~ N(109, 4.5).

What is the z-score for a height of 112 centimeters? $z = \frac{x - \mu}{\sigma}$ so $z = \frac{112 - 109}{4.5} = 0.67$

What is the z-score for a height of 100 centimeters? $z = \frac{x - \mu}{\sigma}$ so $z = \frac{100 - 109}{4.5} = -2.00$

Find the z-score for a height of 105 centimeters and explain what that means in the context of the population. $z = \frac{105 - 109}{4.5} = -0.89$. This girl is shorter than average for her age, by 0.89 standard deviations.

What height corresponds to a z-score of 1.5 in this population? 109 + (1.5)(4.5) = 115.75 cm

Using the empirical rule, we expect about 68 percent of the values in a normal distribution to lie within one standard deviation above or below the mean. What does this mean, in terms of a specific range of values, for this distribution? We expect about 68 percent of the heights of girls of age five years and zero months to be between 104.5 cm and 113.5 cm.

Using the empirical rule, about what percent of heights in this distribution do you expect to be between 95.5 cm and 122.5 cm? We expect about 99.7 percent of the heights in this distribution to be between 95.5 cm and 122.5 cm, because that range represents the values three standard deviations above and below the mean.

Use the following information to answer the next four exercises. The distributor of lotto tickets claims that 20 percent of the tickets are winners. You draw a sample of 500 tickets to test this proposition.
6.2: Using the Normal Distribution

Can you use the normal approximation to the binomial for your calculations? Why or why not? Yes, because both np and nq are greater than five. np = (500)(0.20) = 100 and nq = 500(0.80) = 400

What are the expected mean and standard deviation for your sample, assuming the distributor’s claim is true? $\mu = np = (500)(0.20) = 100$ and $\sigma = \sqrt{npq} = \sqrt{500(0.20)(0.80)} = 8.94$

What is the probability that your sample will have a mean greater than 100? Fifty percent, because in a normal distribution, half the values lie above the mean.

If the z-score for your sample result is –2.00, explain what this means, using the empirical rule. The results of our sample were two standard deviations below the mean, suggesting it is unlikely that 20 percent of the lotto tickets are winners, as claimed by the distributor, and that the true percent of winners is lower. Applying the empirical rule, if that claim were true, we would expect to see a result this far below the mean only about 2.5 percent of the time.
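The z-score conversions and the np, nq rule of thumb from the two sections above can be sketched in Python. The function names are assumptions made for illustration, and the threshold of five is the one stated in the answer above, not a universal rule.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def value_from_z(z, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma."""
    return mu + z * sigma

def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the text: both np and nq must exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Heights of five-year-old girls: X ~ N(109, 4.5)
for height in (112, 100, 105):
    print(height, round(z_score(height, 109, 4.5), 2))
print(value_from_z(1.5, 109, 4.5))

# Lotto tickets: n = 500, p = 0.20
n, p = 500, 0.20
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
print(normal_approx_ok(n, p), mu, round(sigma, 2))
```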
7.1: The Central Limit Theorem for Sample Means (Averages)

What does the central limit theorem state with regard to the distribution of sample means? The central limit theorem states that if samples of sufficient size are drawn from a population, the distribution of sample means will be approximately normal, even if the distribution of the population is not normal.

The distribution of results from flipping a fair coin is uniform: heads and tails are equally likely on any flip, and over a large number of trials, you expect about the same number of heads and tails. Yet if you conduct a study by flipping 30 coins and recording the number of heads, and repeat this 100 times, the distribution of the mean number of heads will be approximately normal. How is this possible? The sample size of 30 is sufficiently large in this example to apply the central limit theorem. This theorem states that for samples of sufficient size drawn from a population, the sampling distribution of the sample mean will approach normality, regardless of the distribution of the population from which the samples were drawn.

The mean of a normally-distributed population is 50, and the standard deviation is four. If you draw 100 samples of size 40 from this population, describe what you would expect to see in terms of the sampling distribution of the sample mean. You would not expect each sample to have a mean of 50, because of sampling variability. However, you would expect the sampling distribution of the sample means to cluster around 50, with an approximately normal distribution, so that values close to 50 are more common than values further removed from 50.

X is a random variable with a mean of 25 and a standard deviation of two. Write the distribution for the sample mean of samples of size 100 drawn from this population. $\bar{X} \sim N(25, 0.2)$ because $\bar{X} \sim N\left(\mu_x, \frac{\sigma_x}{\sqrt{n}}\right)$

Your friend is doing an experiment drawing samples of size 50 from a population with a mean of 117 and a standard deviation of 16.
This sample size is large enough to allow use of the central limit theorem, so he says the standard deviation of the sampling distribution of sample means will also be 16. Explain why this is wrong, and calculate the correct value. The standard deviation of the sampling distribution of the sample means can be calculated using the formula $\frac{\sigma_x}{\sqrt{n}}$, which in this case is $\frac{16}{\sqrt{50}}$. The correct value for the standard deviation of the sampling distribution of the sample means is therefore 2.26.

You are reading a research article that refers to “the standard error of the mean.” What does this mean, and how is it calculated? The standard error of the mean is another name for the standard deviation of the sampling distribution of the sample mean. Given samples of size n drawn from a population with standard deviation $\sigma_x$, the standard error of the mean is $\frac{\sigma_x}{\sqrt{n}}$.

Use the following information to answer the next six exercises. You repeatedly draw samples of n = 100 from a population with a mean of 75 and a standard deviation of 4.5.

What is the expected distribution of the sample means? $\bar{X} \sim N(75, 0.45)$

One of your friends tries to convince you that the standard error of the mean should be 4.5. Explain what error your friend made. Your friend forgot to divide the standard deviation by the square root of n.

What is the z-score for a sample mean of 76? $z = \frac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}} = \frac{76 - 75}{0.45} = 2.2$

What is the z-score for a sample mean of 74.7? $z = \frac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}} = \frac{74.7 - 75}{0.45} = -0.67$

What sample mean corresponds to a z-score of 1.5? 75 + (1.5)(0.45) = 75.675

If you decrease the sample size to 50, will the standard error of the mean be smaller or larger? What would be its value? The standard error of the mean will be larger, because you will be dividing by a smaller number. The standard error of the mean for samples of size n = 50 is: $\frac{\sigma_x}{\sqrt{n}} = \frac{4.5}{\sqrt{50}} = 0.64$

Use the following information to answer the next two questions.
We use the empirical rule to analyze data for samples of size 60 drawn from a population with a mean of 70 and a standard deviation of 9.

What range of values would you expect to include 68 percent of the sample means? You would expect this range to include values up to one standard deviation above or below the mean of the sample means. In this case: $70 + \frac{9}{\sqrt{60}} = 71.16$ and $70 - \frac{9}{\sqrt{60}} = 68.84$, so you would expect 68 percent of the sample means to be between 68.84 and 71.16.

If you increased the sample size to 100, what range would you expect to contain 68 percent of the sample means, applying the empirical rule? $70 + \frac{9}{\sqrt{100}} = 70.9$ and $70 - \frac{9}{\sqrt{100}} = 69.1$, so you would expect 68 percent of the sample means to be between 69.1 and 70.9. Note that this is a narrower interval due to the increased sample size.
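The standard error of the mean and the empirical-rule ranges in this section can be reproduced with the sketch below. The function names are illustrative assumptions; `k` is the number of standard errors on each side of the mean (1, 2, or 3 for about 68, 95, or 99.7 percent).

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def empirical_range(mu, sigma, n, k=1):
    """Interval from mu - k standard errors to mu + k standard errors."""
    se = standard_error(sigma, n)
    return mu - k * se, mu + k * se

print(round(standard_error(16, 50), 2))    # the friend's experiment
print(round(standard_error(4.5, 100), 2))  # n = 100
print(round(standard_error(4.5, 50), 2))   # n = 50 gives a larger standard error

low, high = empirical_range(70, 9, 60)
print(round(low, 2), round(high, 2))       # 68 percent range for n = 60
```

Calling `empirical_range(70, 9, 100)` shows the narrower interval obtained from the larger sample size.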
7.2: The Central Limit Theorem for Sums

How does the central limit theorem apply to sums of random variables? For a random variable X, the random variable ΣX will tend to become normally distributed as the size n of the samples used to compute the sum increases.

Explain how the rules applying the central limit theorem to sample means, and to sums of a random variable, are similar. Both rules state that the distribution of a quantity (the mean or the sum) calculated on samples drawn from a population will tend to have a normal distribution, as the sample size increases, regardless of the distribution of the population from which the samples are drawn.

If you repeatedly draw samples of size 50 from a population with a mean of 80 and a standard deviation of four, and calculate the sum of each sample, what is the expected distribution of these sums? $\Sigma X \sim N\left(n\mu_x, (\sqrt{n})(\sigma_x)\right)$ so $\Sigma X \sim N(4000, 28.3)$

Use the following information to answer the next four exercises. You draw one sample of size 40 from a population with a mean of 125 and a standard deviation of seven.

Compute the sum. What is the probability that the sum for your sample will be less than 5,000? The probability is 0.50, because 5,000 is the mean of the sampling distribution of sums of size 40 from this population. Sums of random variables computed from a sample of sufficient size are approximately normally distributed, and in a normal distribution, half the values lie below the mean.

If you drew samples of this size repeatedly, computing the sum each time, what range of values would you expect to contain 95 percent of the sample sums? Using the empirical rule, you would expect 95 percent of the values to be within two standard deviations of the mean. Using the formula for the standard deviation of a sample sum: $(\sqrt{n})(\sigma_x) = (\sqrt{40})(7) = 44.3$, so you would expect 95 percent of the values to be between 5,000 – (2)(44.3) and 5,000 + (2)(44.3), or between 4,911.4 and 5,088.6.

What value is one standard deviation below the mean?
$\mu - (\sqrt{n})(\sigma_x) = 5000 - (\sqrt{40})(7) = 4955.7$

What value corresponds to a z-score of 2.2? $5000 + (2.2)(\sqrt{40})(7) = 5097.4$
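The parameters of the distribution of sums can be computed the same way as in the exercises above. This is a sketch with an illustrative function name. It carries the unrounded standard deviation through each step, so a bound computed from it can differ in the last digit from a hand calculation that first rounds the standard deviation to 44.3.

```python
import math

def sum_distribution(n, mu, sigma):
    """Return (mean, standard deviation) of the sum of n draws: n * mu and sqrt(n) * sigma."""
    return n * mu, math.sqrt(n) * sigma

# Samples of size 50 from a population with mean 80 and standard deviation 4.
print(sum_distribution(50, 80, 4))

# One sample of size 40 from a population with mean 125 and standard deviation 7.
mean_sum, sd_sum = sum_distribution(40, 125, 7)
print(mean_sum, round(sd_sum, 1))
print(round(mean_sum - sd_sum, 1))        # one standard deviation below the mean
print(round(mean_sum + 2.2 * sd_sum, 1))  # value at a z-score of 2.2
```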
7.3: Using the Central Limit Theorem

What does the law of large numbers say about the relationship between the sample mean and the population mean? The law of large numbers says that as sample size increases, the sample mean tends to get nearer and nearer to the population mean.

Applying the law of large numbers, which sample mean would you expect to be closer to the population mean, a sample of size ten or a sample of size 100? You would expect the mean from a sample of size 100 to be nearer to the population mean, because the law of large numbers says that as sample size increases, the sample mean tends to approach the population mean.

Use this information for the next three questions. A manufacturer makes screws with a mean diameter of 0.15 cm (centimeters) and a range of 0.10 cm to 0.20 cm; within that range, the distribution is uniform.

If X = the diameter of one screw, what is the distribution of X? X ~ U(0.10, 0.20)

Suppose you repeatedly draw samples of size 100 and calculate their mean. Applying the central limit theorem, what is the distribution of these sample means? $\bar{X} \sim N\left(\mu_x, \frac{\sigma_x}{\sqrt{n}}\right)$ and the standard deviation of a uniform distribution is $\frac{b - a}{\sqrt{12}}$. In this example, the standard deviation of the distribution is $\frac{b - a}{\sqrt{12}} = \frac{0.10}{\sqrt{12}} = 0.03$, so $\bar{X} \sim N(0.15, 0.003)$

Suppose you repeatedly draw samples of size 60 and calculate their sum. Applying the central limit theorem, what is the distribution of these sample sums? $\Sigma X \sim N\left((n)(\mu_x), (\sqrt{n})(\sigma_x)\right)$ so $\Sigma X \sim N(9.0, 0.23)$
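A short simulation can illustrate the central limit theorem for the screw example: sample means of size 100 drawn from U(0.10, 0.20) should cluster around 0.15 with a spread close to σ/√n. The sketch below uses only the Python standard library. The function name, the fixed seed, and the 2,000 repetitions are arbitrary choices made for illustration, and the simulated values will only approximate the theoretical ones.

```python
import math
import random
import statistics

def simulate_sample_means(a, b, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from U(a, b) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.uniform(a, b) for _ in range(n)])
        for _ in range(repetitions)
    ]

a, b, n = 0.10, 0.20, 100

# Theoretical values: sigma = (b - a) / sqrt(12), standard error = sigma / sqrt(n).
sigma = (b - a) / math.sqrt(12)
theoretical_se = sigma / math.sqrt(n)

means = simulate_sample_means(a, b, n, repetitions=2000)
print(statistics.fmean(means))   # should be close to 0.15
print(statistics.stdev(means))   # should be close to theoretical_se
print(theoretical_se)
```

The unrounded σ is about 0.029, so the theoretical standard error is about 0.0029, which rounds to the 0.003 given in the answer above.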