Before we take up the discussion of linear regression and correlation, we need to examine a way to display the relation between two variables *x* and *y*. The most common and easiest way is a **scatter plot**. The following example illustrates a scatter plot.

In Europe and Asia, m-commerce is popular. M-commerce users have special mobile phones that work like electronic wallets as well as provide phone and Internet services. Users can do everything from paying for parking to buying a TV set or soda from a machine to banking to checking sports scores on the Internet. For the years 2000 through 2004, was there a relationship between the year and the number of m-commerce users? Construct a scatter plot. Let *x* = the year and let *y* = the number of m-commerce users, in millions.

| $x$ (year) | $y$ (# of users, in millions) |
|---|---|
| 2000 | 0.5 |
| 2002 | 20.0 |
| 2003 | 33.0 |
| 2004 | 47.0 |
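As a rough illustration of what constructing the scatter plot involves, the sketch below draws the table's four points on a crude text grid using only the Python standard library. The function name `text_scatter` and the choice of ten rows are assumptions made for this example; in practice a calculator or a plotting library would normally be used.

```python
# m-commerce data from the table above (users in millions)
years = [2000, 2002, 2003, 2004]
users = [0.5, 20.0, 33.0, 47.0]

def text_scatter(xs, ys, rows=10):
    """Return the lines of a crude text scatter plot.

    Hypothetical helper: assumes integer x values (one column per year)
    and splits the y range into `rows` horizontal bands.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    span = (y_max - y_min) or 1  # avoid dividing by zero if all y are equal
    width = x_max - x_min + 1
    grid = [[" "] * width for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale y into a band index; band 0 is the bottom of the plot
        band = round((y - y_min) / span * (rows - 1))
        grid[rows - 1 - band][x - x_min] = "*"
    return ["|" + " ".join(line) for line in grid]

plot_lines = text_scatter(years, users)
print("\n".join(plot_lines))
print("+" + "-" * (2 * (max(years) - min(years)) + 1))
```

The points climb from the lower left to the upper right, which suggests that the number of users increased with the year over this period.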

A scatter plot shows the **direction** of a relationship between the variables. A clear direction happens when there is either:

- High values of one variable occurring with high values of the other variable or low values of one variable occurring with low values of the other variable (a positive direction).
- High values of one variable occurring with low values of the other variable (a negative direction).
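The two cases in the list can be checked numerically. The sketch below is one possible approach, not a method given in this section: it sums the products of each point's deviations from the mean of *x* and the mean of *y*. A positive sum means high values tend to occur with high values, and a negative sum means high values tend to occur with low values. The function name `direction` is a hypothetical helper.

```python
def direction(xs, ys):
    """Classify the direction of a relationship from paired data.

    Uses the sign of the sum of products of deviations from the means
    (an assumption of this sketch, not a rule stated in the text).
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"   # high with high, low with low
    if co_movement < 0:
        return "negative"   # high with low
    return "no clear direction"

# m-commerce data from the example (users in millions)
years = [2000, 2002, 2003, 2004]
users = [0.5, 20.0, 33.0, 47.0]
print(direction(years, users))  # prints "positive"
```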

You can determine the **strength** of the relationship by looking at the scatter plot and seeing how close the points are to a line, a power function, an exponential function, or to some other type of function. For a linear relationship there is an exception. Consider a scatter plot where all the points fall on a horizontal line providing a "perfect fit." The horizontal line would in fact show no relationship, because *y* stays the same no matter how *x* changes.
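One common way to put a number on the strength of a linear relationship is the correlation coefficient *r*, which is not defined in this section and is used here only as an illustration. The sketch below computes it directly from its usual formula and shows the horizontal-line exception: when every *y* value is the same, the denominator is zero and *r* is undefined, so the "perfect fit" cannot be read as a strong relationship. The function name `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r, or None if it is undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # All points share one x or one y value (e.g. a horizontal line)
        return None
    return sxy / math.sqrt(sxx * syy)

# m-commerce data from the example: points lie close to a rising line
r_users = pearson_r([2000, 2002, 2003, 2004], [0.5, 20.0, 33.0, 47.0])
print(round(r_users, 3))

# Every point on a horizontal line: r is undefined
r_flat = pearson_r([1, 2, 3, 4], [5, 5, 5, 5])
print(r_flat)  # prints None
```

Values of *r* near 1 or -1 correspond to points that lie close to a straight line, and values near 0 correspond to points that do not.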

When you look at a scatter plot, you want to notice the **overall pattern** and any **deviations** from the pattern. The following scatter plot examples illustrate these concepts.

In this chapter, we are interested in scatter plots that show a linear pattern. Linear patterns are quite common. The linear relationship is strong if the points are close to a straight line, except in the case of a horizontal line where there is no relationship. If we think that the points show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated through a process called linear regression. However, we only calculate a regression line if one of the variables helps to explain or predict the other variable. If *x* is the independent variable and *y* the dependent variable, then we can use a regression line to predict *y* for a given value of *x*.
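To make the idea of predicting *y* from *x* concrete, the sketch below fits a line to the m-commerce data and uses it to estimate the number of users in 2001, a year missing from the table. It assumes the usual least-squares formulas for the slope and intercept, which this section does not spell out, and the function name `least_squares_line` is a hypothetical helper.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # assumes the x values are not all equal
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    return slope, intercept

# m-commerce data from the example (users in millions)
years = [2000, 2002, 2003, 2004]
users = [0.5, 20.0, 33.0, 47.0]

slope, intercept = least_squares_line(years, users)
predicted_2001 = slope * 2001 + intercept
print(f"slope: {slope:.2f} million users per year")
print(f"predicted users in 2001: {predicted_2001:.1f} million")
```

With only four data points, the prediction is a rough estimate rather than a reliable figure.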

# Chapter Review

Scatter plots are particularly helpful graphs when we want to see if there is a linear relationship among data points. They indicate both the direction of the relationship between the *x* variables and the *y* variables, and the strength of the relationship. We calculate the strength of the relationship between an independent variable and a dependent variable using linear regression.

Does the scatter plot appear linear? Strong or weak? Positive or negative?

The data appear to be linear with a strong, positive correlation.

Does the scatter plot appear linear? Strong or weak? Positive or negative?

Does the scatter plot appear linear? Strong or weak? Positive or negative?

The data appear to have no correlation.

# Homework

The Gross Domestic Product Purchasing Power Parity is an indication of a country’s currency value compared to another country. The following table shows the GDP PPP of Cuba as compared to US dollars. Construct a scatter plot of the data.

| Year | Cuba’s PPP | Year | Cuba’s PPP |
|---|---|---|---|
| 1999 | 1,700 | 2006 | 4,000 |
| 2000 | 1,700 | 2007 | 11,000 |
| 2002 | 2,300 | 2008 | 9,500 |
| 2003 | 2,900 | 2009 | 9,700 |
| 2004 | 3,000 | 2010 | 9,900 |
| 2005 | 3,500 | | |

Check student’s solution.

The following table shows the poverty rates and cell phone usage in the United States. Construct a scatter plot of the data.

| Year | Poverty Rate | Cellular Usage per Capita |
|---|---|---|
| 2003 | 12.7 | 54.67 |
| 2005 | 12.6 | 74.19 |
| 2007 | 12 | 84.86 |
| 2009 | 12 | 90.82 |

Does the higher cost of tuition translate into higher-paying jobs? The table lists the top ten colleges based on mid-career salary and the associated yearly tuition costs. Construct a scatter plot of the data.

| School | Mid-Career Salary (in thousands) | Yearly Tuition |
|---|---|---|
| Princeton | 137 | 28,540 |
| Harvey Mudd | 135 | 40,133 |
| CalTech | 127 | 39,900 |
| US Naval Academy | 122 | 0 |
| West Point | 120 | 0 |
| MIT | 118 | 42,050 |
| Lehigh University | 118 | 43,220 |
| NYU-Poly | 117 | 39,565 |
| Babson College | 117 | 40,400 |
| Stanford | 114 | 54,506 |

For graph: check student’s solution. Note that tuition is the independent variable and salary is the dependent variable.

If the level of significance is 0.05 and the *p*-value is 0.06, what conclusion can you draw?

If there are 15 data points in a set of data, what is the number of degrees of freedom?

13
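This answer is consistent with the usual rule for linear regression with one independent variable, assuming that is the setting the exercise intends: the degrees of freedom equal the number of data points $n$ minus 2, because two quantities (the slope and the intercept) are estimated from the data.

$$
df = n - 2 = 15 - 2 = 13
$$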
