Practice Test 4
12.1: Linear Equations
Which of the following equations is/are linear?
a. y = –3x
b. y = 0.2 + 0.74x
c. y = –9.4 – 2x
d. A and B
e. A, B, and C
e. A, B, and C. All three are linear equations of the form y = mx + b.
A painting job requires four hours of setup time plus one hour per 1,000 square feet. How would you express this information in a linear equation?
Let y = the total number of hours required, and x the square footage, measured in units of 1,000. The equation is: y = x + 4
A statistics instructor is paid a per-class fee of $2,000 plus $100 for each student in the class. How would you express this information in a linear equation?
Let y = the total payment, and x the number of students in a class. The equation is: y = 100(x) + 2,000
A tutoring school requires students to pay a one-time enrollment fee of $500 plus tuition of $3,000 per year. Express this information in an equation.
Let y = the total cost of attendance, and x the number of years enrolled. The equation is: y = 3,000(x) + 500
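The three word problems above all have the form y = mx + b. As a minimal sketch, the Python below evaluates each equation at one sample input. The helper name `linear` and the sample inputs (a 3,000 square foot job, 25 students, 2 years) are assumptions chosen for illustration; they are not part of the exercises.

```python
def linear(slope, intercept):
    """Return a function that computes y = slope * x + intercept."""
    def f(x):
        return slope * x + intercept
    return f

# Each equation from the exercises, written as (slope, intercept).
painting_hours = linear(1, 4)        # x = square footage in units of 1,000
instructor_pay = linear(100, 2000)   # x = number of students in the class
tutoring_cost = linear(3000, 500)    # x = number of years enrolled

# Hypothetical sample inputs, for illustration only.
print(painting_hours(3))    # hours for a 3,000 square foot job
print(instructor_pay(25))   # payment for a class of 25 students
print(tutoring_cost(2))     # cost of 2 years of enrollment
```

In each case the intercept is the fixed amount (setup time, per-class fee, enrollment fee) and the slope is the amount added per unit of x.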
12.2: Slope and Y-intercept of a Linear Equation
Use the following information to answer the next four exercises. For the labor costs of doing repairs, an auto mechanic charges a flat fee of $75 per car, plus an hourly rate of $55.
What are the independent and dependent variables for this situation?
The independent variable is the hours worked on a car. The dependent variable is the total labor charges to fix a car.
Write the equation and identify the slope and intercept.
Let y = the total charge, and x the number of hours required. The equation is: y = 55x + 75. The slope is 55 and the intercept is 75.
What is the labor charge for a job that takes 3.5 hours to complete?
y = 55(3.5) + 75 = 267.50
One job takes 2.4 hours to complete, while another takes 6.3 hours. What is the difference in labor costs for these two jobs?
Because the intercept is included in both equations, and you are only interested in the difference in costs, you do not need to include the intercept in the solution. The difference in number of hours required is: 6.3 – 2.4 = 3.9. Multiply this difference by the cost per hour: 55(3.9) = 214.5. The difference in cost between the two jobs is $214.50.
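The labor-charge calculations above can be checked with a short Python sketch. The names `FLAT_FEE`, `HOURLY_RATE`, and `labor_charge` are hypothetical; the dollar amounts come from the exercise. The sketch also shows that the intercept cancels when two charges are subtracted.

```python
import math

FLAT_FEE = 75.0      # dollars per car (the intercept)
HOURLY_RATE = 55.0   # dollars per hour (the slope)

def labor_charge(hours):
    """Total labor charge y = 55x + 75 for a job taking `hours` hours."""
    return HOURLY_RATE * hours + FLAT_FEE

charge = labor_charge(3.5)
difference = labor_charge(6.3) - labor_charge(2.4)

# The flat fee appears in both charges, so the difference depends
# only on the slope and the difference in hours.
slope_only = HOURLY_RATE * (6.3 - 2.4)

print(f"{charge:.2f}")
print(f"{difference:.2f}")
print(math.isclose(difference, slope_only))
```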
12.3: Scatter Plots
Describe the pattern in this scatter plot, and decide whether the X and Y variables would be good candidates for linear regression.
The X and Y variables have a strong linear relationship. These variables would be good candidates for analysis with linear regression.
Describe the pattern in this scatter plot, and decide whether the X and Y variables would be good candidates for linear regression.
The X and Y variables have a strong negative linear relationship. These variables would be good candidates for analysis with linear regression.
Describe the pattern in this scatter plot, and decide whether the X and Y variables would be good candidates for linear regression.
There is no clear linear relationship between the X and Y variables, so they are not good candidates for linear regression.
Describe the pattern in this scatter plot, and decide whether the X and Y variables would be good candidates for linear regression.
The X and Y variables have a strong positive relationship, but it is curvilinear rather than linear. These variables are not good candidates for linear regression.
12.4: The Regression Equation
Use the following information to answer the next four exercises. Height (in inches) and weight (in pounds) in a sample of college freshman men have a linear relationship with the following summary statistics:
$\bar{x} = 68.4$, $\bar{y} = 141.6$, $s_x = 4.0$, $s_y = 9.6$, $r = 0.73$
Let Y = weight and X = height, and write the regression equation in the form: $\hat{y} = a + bx$
What is the value of the slope?
$b = r\left(\frac{s_y}{s_x}\right) = 0.73\left(\frac{9.6}{4.0}\right) = 1.752 \approx 1.75$
What is the value of the y intercept?
$a = \bar{y} - b\bar{x} = 141.6 - 1.752(68.4) = 21.7632 \approx 21.76$
Write the regression equation predicting weight from height in this data set, and calculate the predicted weight for someone 68 inches tall.
$\hat{y} = 21.76 + 1.75(68) = 140.76$
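The slope and intercept above follow directly from the five summary statistics. The Python sketch below reproduces the arithmetic; the function name `regression_from_summary` is hypothetical. Note that predicting with the unrounded coefficients gives about 140.90, while the worked answer of 140.76 uses the coefficients after rounding to two decimal places.

```python
def regression_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    b = r * s_y / s_x          # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar
    return a, b

a, b = regression_from_summary(x_bar=68.4, y_bar=141.6, s_x=4.0, s_y=9.6, r=0.73)

# Prediction for a height of 68 inches, with and without rounding
# the coefficients to two decimal places first.
predicted_unrounded = a + b * 68
predicted_rounded = round(a, 2) + round(b, 2) * 68

print(round(b, 3), round(a, 4))
print(round(predicted_unrounded, 2), round(predicted_rounded, 2))
```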
12.5: Correlation Coefficient and Coefficient of Determination
The correlation between body weight and fuel efficiency (measured as miles per gallon) for a sample of 2,012 model cars is –0.56. Calculate the coefficient of determination for this data and explain what it means.
The coefficient of determination is the square of the correlation, or $r^2$. For this data, $r^2 = (-0.56)^2 = 0.3136 \approx 0.31$, or 31%. This means that 31 percent of the variation in fuel efficiency can be explained by the body weight of the automobile.
The correlation between high school GPA and freshman college GPA for a sample of 200 university students is 0.32. How much variation in freshman college GPA is not explained by high school GPA?
The coefficient of determination = $0.32^2 = 0.1024$. This is the amount of variation in freshman college GPA that can be explained by high school GPA. The amount that cannot be explained is 1 – 0.1024 = 0.8976 ≈ 0.90. So about 90 percent of the variance in freshman college GPA in this data is not explained by high school GPA.
Rounded to two decimal places, what correlation between two variables is necessary to have a coefficient of determination of at least 0.50?
$r = \sqrt{r^2} = \sqrt{0.5} = 0.707106781 \approx 0.71$
You need a correlation of 0.71 or higher (in absolute value) to have a coefficient of determination of at least 0.5.
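The three answers above all come from squaring a correlation or taking a square root. A minimal Python sketch of that arithmetic follows; the helper name `r_squared` and the variable names are hypothetical.

```python
import math

def r_squared(r):
    """Coefficient of determination for a correlation r."""
    return r * r

# Share of variation in fuel efficiency explained by body weight.
explained_mpg = r_squared(-0.56)

# Share of variation in college GPA NOT explained by high school GPA.
unexplained_gpa = 1 - r_squared(0.32)

# Smallest |r| whose square is at least 0.50.
min_abs_r = math.sqrt(0.5)

print(round(explained_mpg, 4))
print(round(unexplained_gpa, 4))
print(round(min_abs_r, 2))
```

Because squaring removes the sign, a correlation of –0.56 and one of 0.56 give the same coefficient of determination.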
12.6: Testing the Significance of the Correlation Coefficient
Write the null and alternative hypotheses for a study to determine if two variables are significantly correlated.
H0: ρ = 0 Ha: ρ ≠ 0
In a sample of 30 cases, two variables have a correlation of 0.33. Do a t-test to see if this result is significant at the α = 0.05 level. Use the formula: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.33\sqrt{30-2}}{\sqrt{1-0.33^2}} = 1.85$
The test statistic has n – 2 = 28 degrees of freedom. The critical value for α = 0.05 for a two-tailed test using the t28 distribution is 2.048. Your value is less than this, so you fail to reject the null hypothesis and conclude that the study produced no evidence that the variables are significantly correlated. Using the calculator function tcdf, the p-value is 2tcdf(1.85, 10^99, 28) ≈ 0.075. Because the p-value is greater than α = 0.05, this leads to the same conclusion: do not reject the null hypothesis.
In a sample of 25 cases, two variables have a correlation of 0.45. Do a t-test to see if this result is significant at the α = 0.05 level. Use the formula: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.45\sqrt{25-2}}{\sqrt{1-0.45^2}} = 2.417$
The test statistic has n – 2 = 23 degrees of freedom. The critical value for α = 0.05 for a two-tailed test using the t23 distribution is 2.069. Your value is greater than this, so you reject the null hypothesis and conclude that the study produced evidence that the variables are significantly correlated. Using the calculator function tcdf, the p-value is 2tcdf(2.417, 10^99, 23) ≈ 0.024. Because the p-value is less than α = 0.05, this leads to the same conclusion: reject the null hypothesis.
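The test statistic in both exercises is the correlation multiplied by the square root of n – 2 and divided by the square root of 1 – r². As a minimal sketch, the Python below computes only that statistic; the function name `correlation_t` is hypothetical, and the comparison against a critical value or p-value is left to a t table or calculator, as in the exercises.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_first = correlation_t(0.33, 30)    # sample of 30 cases, r = 0.33
t_second = correlation_t(0.45, 25)   # sample of 25 cases, r = 0.45

print(round(t_first, 2))
print(round(t_second, 3))
```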
12.7: Prediction
Use this information for the next two questions. A study relating the grams of potassium (Y) to the grams of fiber (X) per serving in enriched flour products (bread, rolls, etc.) produced the equation: $\hat{y} = 25 + 16x$
For a product with five grams of fiber per serving, what are the expected grams of potassium per serving?
$\hat{y} = 25 + 16(5) = 105$
Comparing two products, one with three grams of fiber per serving and one with six grams of fiber per serving, what is the expected difference in grams of potassium per serving?
Because the intercept appears in both predicted values, you can ignore it in calculating a predicted difference score. The difference in grams of fiber per serving is 6 – 3 = 3 and the predicted difference in grams of potassium per serving is (16)(3) = 48.
12.8: Outliers
In the context of regression analysis, what is the definition of an outlier, and what is a rule of thumb to evaluate if a given value in a data set is an outlier?
An outlier is an observed value that is far from the least squares regression line. A rule of thumb is that a point more than two standard deviations of the residuals from its predicted value on the least squares regression line is an outlier.
In the context of regression analysis, what is the definition of an influential point, and how does an influential point differ from an outlier?
An influential point is an observed value in a data set that is far from other points in the data set, in a horizontal direction. Unlike an outlier, an influential point is determined by its relationship with other values in the data set, not by its relationship to the regression line.
The least squares regression line for a data set is $\hat{y} = 5 + 0.3x$ and the standard deviation of the residuals is 0.4. Does a case with the values x = 2, y = 6.2 qualify as an outlier?
The predicted value for y is: $\hat{y} = 5 + 0.3(2) = 5.6$. The value of 6.2 is less than two standard deviations from the predicted value, so it does not qualify as an outlier. Residual for (2, 6.2): 6.2 – 5.6 = 0.6 (0.6 < 2(0.4) = 0.8)
The least squares regression line for a data set is $\hat{y} = 2.3 - 0.1x$ and the standard deviation of the residuals is 0.13. Does a case with the values x = 4.1, y = 2.34 qualify as an outlier?
The predicted value for y is: $\hat{y} = 2.3 - 0.1(4.1) = 1.89$. The value of 2.34 is more than two standard deviations from the predicted value, so it qualifies as an outlier. Residual for (4.1, 2.34): 2.34 – 1.89 = 0.45 (0.45 > 2(0.13) = 0.26)
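The two-standard-deviation rule of thumb can be written as a short check. The Python below is a sketch under that rule only; the function name `check_outlier` and its parameter names are hypothetical, and the second case uses the y value stated in the question (2.34).

```python
def check_outlier(x, y, intercept, slope, resid_sd, k=2.0):
    """Return (is_outlier, residual) using the rule |residual| > k * resid_sd."""
    predicted = intercept + slope * x
    residual = y - predicted
    return abs(residual) > k * resid_sd, residual

# First case: y-hat = 5 + 0.3x, residual standard deviation 0.4.
flag_first, resid_first = check_outlier(2, 6.2, intercept=5, slope=0.3, resid_sd=0.4)

# Second case: y-hat = 2.3 - 0.1x, residual standard deviation 0.13.
flag_second, resid_second = check_outlier(4.1, 2.34, intercept=2.3, slope=-0.1, resid_sd=0.13)

print(flag_first, round(resid_first, 2))
print(flag_second, round(resid_second, 2))
```

The check uses the absolute value of the residual, so points far below the line are flagged the same way as points far above it.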
13.1: One-Way ANOVA
What are the five basic assumptions to be met if you want to do a one-way ANOVA?
Each sample is drawn from a normally distributed population.
All samples are independent and randomly selected.
The populations from which the samples are drawn have equal standard deviations.
The factor is a categorical variable.
The response is a numerical variable.
You are conducting a one-way ANOVA comparing the effectiveness of four drugs in lowering blood pressure in hypertensive patients. What are the null and alternative hypotheses for this study?
H0: μ1 = μ2 = μ3 = μ4 Ha: At least two of the group means μ1, μ2, μ3, μ4 are not equal.
What is the primary difference between the independent samples t-test and one-way ANOVA?
The independent samples t-test can only compare means from two groups, while one-way ANOVA can compare means of more than two groups.
You are comparing the results of three methods of teaching geometry to high school students. The final exam scores X1, X2, X3 for the samples taught by the different methods have the following distributions: X1 ~ N(85, 3.6) X2 ~ N(82, 4.8) X3 ~ N(79, 2.9) Each sample includes 100 students, and the final exam scores have a range of 0–100. Assuming the samples are independent and randomly selected, have the requirements for conducting a one-way ANOVA been met? Explain why or why not for each assumption.
Each sample appears to have been drawn from a normally distributed population, the factor is a categorical variable (method), the outcome is a numerical variable (test score), and you were told the samples were independent and randomly selected, so those requirements are met. However, each sample has a different standard deviation, and this suggests that the populations from which they were drawn also have different standard deviations, which is a violation of an assumption for one-way ANOVA. Further statistical testing will be necessary to test the assumption of equal variance before proceeding with the analysis.
You conduct a study comparing the effectiveness of four types of fertilizer to increase crop yield on wheat farms. When examining the sample results, you find that two of the samples have an approximately normal distribution, and two have an approximately uniform distribution. Is this a violation of the assumptions for conducting a one-way ANOVA?
One of the assumptions for a one-way ANOVA is that the samples are drawn from normally distributed populations. Since two of your samples have an approximately uniform distribution, this casts doubt on whether this assumption has been met. Further statistical testing will be necessary to determine if you can proceed with the analysis.
13.2: The F Distribution
Use the following information to answer the next seven exercises. You are conducting a study of three types of feed supplements for cattle to test their effectiveness in producing weight gain among calves whose feed includes one of the supplements. You have four groups of 30 calves (one is a control group receiving the usual feed, but no supplement). You will conduct a one-way ANOVA after one year to see if there are differences in the mean weight for the four groups.
What is SSwithin in this experiment, and what does it mean?
SSwithin is the sum of squares within groups, representing the variation in outcome that cannot be attributed to the different feed supplements, but is due to individual or chance factors among the calves in each group.
What is SSbetween in this experiment, and what does it mean?
SSbetween is the sum of squares between groups, representing the variation in outcome that can be attributed to the different feed supplements.
What are k and i for this experiment?
k = the number of groups = 4
n1 = the number of cases in group 1 = 30
n = the total number of cases = 4(30) = 120
If SSwithin = 374.5 and SStotal = 621.4 for this data, what is SSbetween?
SStotal = SSwithin + SSbetween, so SSbetween = SStotal – SSwithin = 621.4 – 374.5 = 246.9
What are MSbetween and MSwithin for this experiment?
The mean squares in an ANOVA are found by dividing each sum of squares by its respective degrees of freedom (df). For SStotal, df = n – 1 = 120 – 1 = 119. For SSbetween, df = k – 1 = 4 – 1 = 3. For SSwithin, df = n – k = 120 – 4 = 116.
$MS_{\text{between}} = \frac{246.9}{3} = 82.3$
$MS_{\text{within}} = \frac{374.5}{116} \approx 3.23$
What is the F Statistic for this data?
$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{82.3}{3.23} = 25.48$
If there had been 35 calves in each group, instead of 30, with the sums of squares remaining the same, would the F Statistic be larger or smaller?
It would be larger, because you would be dividing by a smaller number. The value of MSbetween would not change with a change of sample size, but the value of MSwithin would be smaller, because you would be dividing SSwithin by a larger number (dfwithin would be 136, not 116). Dividing a constant by a smaller number produces a larger result.
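The chain from sums of squares to the F Statistic can be sketched in a few lines of Python. The function name `anova_f` is hypothetical; the inputs are the values given in the exercises. Because this sketch does not round MSwithin to 3.23 before dividing, it gives about 25.49 rather than the 25.48 shown in the worked answer.

```python
def anova_f(ss_within, ss_total, k, n):
    """Return (ms_between, ms_within, F) for a one-way ANOVA with k groups and n cases."""
    ss_between = ss_total - ss_within    # SStotal = SSwithin + SSbetween
    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Four groups of 30 calves.
msb_30, msw_30, f_30 = anova_f(374.5, 621.4, k=4, n=120)

# Four groups of 35 calves, same sums of squares.
msb_35, msw_35, f_35 = anova_f(374.5, 621.4, k=4, n=140)

print(round(msb_30, 1), round(msw_30, 2), round(f_30, 2))
print(round(msw_35, 2), round(f_35, 2))
```

With the larger groups, MSbetween is unchanged, MSwithin shrinks, and F grows, which matches the reasoning in the last answer.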
13.3: Facts About the F Distribution
Which of the following numbers are possible F Statistics?
a. 2.47
b. 5.95
c. –3.61
d. 7.28
e. 0.97
All but choice c, –3.61. F Statistics are always greater than or equal to 0.
Histograms F1 and F2 below display the distribution of cases from samples from two populations, one distributed F3,15 and one distributed F5,500. Which sample came from which population?
As the degrees of freedom increase in an F distribution, the distribution becomes more nearly normal. Histogram F2 is closer to a normal distribution than histogram F1, so the sample displayed in histogram F1 was drawn from the F3,15 population, and the sample displayed in histogram F2 was drawn from the F5,500 population.
The F Statistic from an experiment with k = 3 and n = 50 is 3.67. At α = 0.05, will you reject the null hypothesis?
Using the calculator function Fcdf, p-value = Fcdf(3.67, 1E99, 3, 50) = 0.0182. Reject the null hypothesis.
The F Statistic from an experiment with k = 4 and n = 100 is 4.72. At α = 0.01, will you reject the null hypothesis?
Using the calculator function Fcdf, p-value = Fcdf(4.72, 1E99, 4, 100) = 0.0016. Reject the null hypothesis.
13.4: Test of Two Variances
What assumptions must be met to perform the F test of two variances?
The samples must be drawn from populations that are normally distributed, and must be drawn from independent populations.
You believe there is greater variance in grades given by the math department at your university than in the English department. You collect all the grades for undergraduate classes in the two departments for a semester, compute the variance of each, and conduct an F test of two variances. What are the null and alternative hypotheses for this study?
Let $\sigma_M^2$ = variance in math grades, and $\sigma_E^2$ = variance in English grades.
H0: $\sigma_M^2 \le \sigma_E^2$ Ha: $\sigma_M^2 > \sigma_E^2$