Textbook

# Introductory Statistics

Mathematics and Statistics

## Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs

Author: OpenStaxCollege

One simple graph, the stem-and-leaf graph or stemplot, comes from the field of exploratory data analysis. It is a good choice when the data sets are small. To create the plot, divide each observation of data into a stem and a leaf. The leaf consists of a final significant digit. For example, 23 has stem two and leaf three. The number 432 has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two. The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from smallest to largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing order next to their corresponding stem.
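The split into stem and leaf can be sketched in a few lines of Python. This is only an illustration: the function name `split_stem_leaf` and its `leaf_unit` argument are assumptions introduced here, not part of the text.

```python
from decimal import Decimal


def split_stem_leaf(value, leaf_unit="1"):
    """Split a value into (stem, leaf), where the leaf is the final digit.

    leaf_unit is the place value of the leaf: "1" for whole numbers,
    "0.1" for values recorded to one decimal place.
    """
    # Going through Decimal(str(...)) avoids binary floating-point
    # surprises when a decimal such as 9.3 is rescaled.
    scaled = int(Decimal(str(value)) / Decimal(leaf_unit))
    return divmod(scaled, 10)


print(split_stem_leaf(23))          # (2, 3)
print(split_stem_leaf(432))         # (43, 2)
print(split_stem_leaf(5432))        # (543, 2)
print(split_stem_leaf(9.3, "0.1"))  # (9, 3)
```

The four calls reproduce the four examples in the paragraph above.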

For Susan Dean's spring pre-calculus class, scores for the first exam were as follows (smallest to largest):
33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94; 96; 100

| Stem | Leaf |
|---:|:---|
| 3 | 3 |
| 4 | 2 9 9 |
| 5 | 3 5 5 |
| 6 | 1 3 7 8 8 9 9 |
| 7 | 2 3 4 8 |
| 8 | 0 3 8 8 8 |
| 9 | 0 2 4 4 4 4 6 |
| 10 | 0 |

The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of the 31 scores or approximately 26% $\left(\frac{8}{31}\right)$ were in the 90s or 100, a fairly high number of As.
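As a rough check on the exam-score stemplot and the 26% figure, the following Python sketch groups the scores by stem and counts the scores of 90 or above. The helper name `stemplot` is hypothetical, and the sketch assumes whole-number data.

```python
from collections import defaultdict

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]


def stemplot(values):
    """Return {stem: sorted leaves} for whole-number data."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(rows)


rows = stemplot(scores)
for stem in range(min(rows), max(rows) + 1):
    leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
    print(f"{stem:>2} | {leaves}")

# Share of scores in the 90s or equal to 100.
high = sum(1 for s in scores if s >= 90)
print(high, len(scores), round(100 * high / len(scores)))  # 8 31 26
```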

The stemplot is a quick way to graph data and gives an exact picture of the data. You want to look for an overall pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500) while others may indicate that something unusual is happening. It takes some background information to explain outliers, so we will cover them in more detail later.

The data are the distances (in kilometers) from a home to local supermarkets. Create a stemplot using the data:
1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5; 4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3

Do the data seem to have any concentration of values?

The leaves are to the right of the decimal.

The value 12.3 may be an outlier. Values appear to concentrate at three and four kilometers.

| Stem | Leaf |
|---:|:---|
| 1 | 1 5 |
| 2 | 3 5 7 |
| 3 | 2 3 3 5 8 |
| 4 | 0 2 5 5 7 8 |
| 5 | 5 6 |
| 6 | 5 7 |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | 3 |
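A hedged sketch of the same construction for one-decimal data, where the whole-number part is the stem and the tenths digit is the leaf. The function name `decimal_stemplot` is an assumption, and the distances are stored as strings only to keep the decimal digits exact.

```python
from decimal import Decimal

distances = ["1.1", "1.5", "2.3", "2.5", "2.7", "3.2", "3.3", "3.3", "3.5",
             "3.8", "4.0", "4.2", "4.5", "4.5", "4.7", "4.8", "5.5", "5.6",
             "6.5", "6.7", "12.3"]


def decimal_stemplot(values):
    """Map each whole-number stem to its sorted tenths-digit leaves."""
    rows = {}
    for v in sorted(Decimal(x) for x in values):
        stem, leaf = divmod(int(v * 10), 10)
        rows.setdefault(stem, []).append(leaf)
    # Keep stems with no leaves so that gaps in the data stay visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


plot = decimal_stemplot(distances)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")

# Stem with the most leaves, as a crude indicator of concentration.
busiest = max(plot, key=lambda s: len(plot[s]))
print(busiest)  # 4
```

Listing the empty stems 7 through 11 is what makes 12.3 stand apart from the rest of the data.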

A side-by-side stem-and-leaf plot allows a comparison of the two data sets in two columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same stem. The leaves are to the left and the right of the stems. [link] and [link] show the ages of presidents at their inauguration and at their death. Construct a side-by-side stem-and-leaf plot using this data.

| Ages at Inauguration | Stem | Ages at Death |
|---:|:---:|:---|
| 9 9 8 7 7 7 6 3 2 | 4 | 6 9 |
| 8 7 7 7 7 6 6 6 5 5 5 5 4 4 4 4 4 2 2 1 1 1 1 1 0 | 5 | 3 6 6 7 8 |
| 9 8 5 4 4 2 1 1 1 0 | 6 | 0 0 3 3 4 4 5 6 7 7 7 8 |
| | 7 | 0 1 1 1 2 3 4 7 8 8 9 |
| | 8 | 0 1 3 5 8 |
| | 9 | 0 0 3 3 |

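| President | Age at inauguration | President | Age at inauguration | President | Age at inauguration |
|---|---:|---|---:|---|---:|
| Washington | 57 | Lincoln | 52 | Hoover | 54 |
| J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51 |
| Jefferson | 57 | Grant | 46 | Truman | 60 |
| Madison | 57 | Hayes | 54 | Eisenhower | 62 |
| Monroe | 58 | Garfield | 49 | Kennedy | 43 |
| J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55 |
| Jackson | 61 | Cleveland | 47 | Nixon | 56 |
| Van Buren | 54 | B. Harrison | 55 | Ford | 61 |
| W. H. Harrison | 68 | Cleveland | 55 | Carter | 52 |
| Tyler | 51 | McKinley | 54 | Reagan | 69 |
| Polk | 49 | T. Roosevelt | 42 | G.H.W. Bush | 64 |
| Taylor | 64 | Taft | 51 | Clinton | 47 |
| Fillmore | 50 | Wilson | 56 | G. W. Bush | 54 |
| Pierce | 48 | Harding | 55 | Obama | 47 |
| Buchanan | 65 | Coolidge | 51 | | |

| President | Age at death | President | Age at death | President | Age at death |
|---|---:|---|---:|---|---:|
| Washington | 67 | Lincoln | 56 | Hoover | 90 |
| J. Adams | 90 | A. Johnson | 66 | F. Roosevelt | 63 |
| Jefferson | 83 | Grant | 63 | Truman | 88 |
| Madison | 85 | Hayes | 70 | Eisenhower | 78 |
| Monroe | 73 | Garfield | 49 | Kennedy | 46 |
| J. Q. Adams | 80 | Arthur | 56 | L. Johnson | 64 |
| Jackson | 78 | Cleveland | 71 | Nixon | 81 |
| Van Buren | 79 | B. Harrison | 67 | Ford | 93 |
| W. H. Harrison | 68 | Cleveland | 71 | Reagan | 93 |
| Tyler | 71 | McKinley | 58 | | |
| Polk | 53 | T. Roosevelt | 60 | | |
| Taylor | 65 | Taft | 72 | | |
| Fillmore | 74 | Wilson | 67 | | |
| Pierce | 64 | Harding | 57 | | |
| Buchanan | 77 | Coolidge | 60 | | |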
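One possible way to build a side-by-side stemplot from the two tables of ages is sketched below in Python. The function names `leaves_by_stem` and `side_by_side` are illustrative assumptions. The ages are copied from the tables column by column.

```python
inauguration = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
                52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
                54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47]
death = [67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77,
         56, 66, 63, 70, 49, 56, 71, 67, 71, 58, 60, 72, 67, 57, 60,
         90, 63, 88, 78, 46, 64, 81, 93, 93]


def leaves_by_stem(values):
    """Return {stem: leaves in increasing order} for whole-number data."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(leaf)
    return rows


def side_by_side(left_values, right_values):
    """Return (left leaves, stem, right leaves) rows sharing one stem column."""
    left, right = leaves_by_stem(left_values), leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    rows = []
    for stem in stems:
        # Left-hand leaves increase moving away from the stem, so reverse them.
        lhs = " ".join(str(x) for x in reversed(left.get(stem, [])))
        rhs = " ".join(str(x) for x in right.get(stem, []))
        rows.append((lhs, stem, rhs))
    return rows


rows = side_by_side(inauguration, death)
width = max(len(lhs) for lhs, _, _ in rows)
for lhs, stem, rhs in rows:
    print(f"{lhs:>{width}} | {stem} | {rhs}")
```

Sharing one column of stems makes it easy to see that inauguration ages cluster in the 50s, while ages at death spread from the 40s to the 90s.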

Another type of graph that is useful for specific data values is a line graph. In the particular line graph shown in [link], the x-axis (horizontal axis) consists of data values and the y-axis (vertical axis) consists of frequency points. The frequency points are connected using line segments.

In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do his or her chores. The results are shown in [link] and in [link].

| Number of times teenager is reminded | Frequency |
|---:|---:|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| 3 | 14 |
| 4 | 7 |
| 5 | 4 |
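The frequency points of the line graph can be prepared as below. This is a sketch rather than a prescribed method: the drawing step assumes the third-party matplotlib library is installed, and the function name `draw_line_graph` and the output file name are hypothetical.

```python
reminders = {0: 2, 1: 5, 2: 8, 3: 14, 4: 7, 5: 4}

# Points for the line graph: data value on the x-axis, frequency on the y-axis.
points = sorted(reminders.items())
total = sum(freq for _, freq in points)
peak = max(reminders, key=reminders.get)  # data value with the highest frequency
print(total, peak)  # 40 3


def draw_line_graph(path="reminders.png"):
    """Save the line graph to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")  # frequency points joined by line segments
    ax.set_xlabel("Number of times teenager is reminded")
    ax.set_ylabel("Frequency")
    fig.savefig(path)
    plt.close(fig)
```

The frequencies sum to 40, matching the number of mothers surveyed.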

Bar graphs consist of bars that are separated from each other. The bars can be rectangles or they can be rectangular boxes (used in three-dimensional plots), and they can be vertical or horizontal. The bar graph shown in [link] has age groups represented on the x-axis and proportions on the y-axis.

By the end of 2011, Facebook had over 146 million users in the United States. [link] shows three age groups, the number of users in each age group, and the proportion (%) of users in each age group. Construct a bar graph using this data.

| Age groups | Number of Facebook users | Proportion (%) of Facebook users |
|---|---:|---:|
| 13–25 | 65,082,280 | 45% |
| 26–44 | 53,300,200 | 36% |
| 45–64 | 27,885,100 | 19% |
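The proportions in the table can be recomputed from the user counts, as in this Python sketch. The text bars are only a stand-in for a drawn bar graph. The computed shares agree with the table's percentages to within one percentage point, because the table reports rounded values.

```python
users = {"13-25": 65_082_280, "26-44": 53_300_200, "45-64": 27_885_100}

total = sum(users.values())
shares = {group: 100 * n / total for group, n in users.items()}

for group, pct in shares.items():
    bar = "#" * round(pct / 2)  # one '#' per two percentage points
    print(f"{group:>5} | {bar} {pct:.1f}%")
```

Each age group gets its own separate bar, and the bar length is proportional to that group's share of users.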

The columns in [link] contain: the race or ethnicity of students in U.S. public schools for the class of 2011, percentages for the Advanced Placement examinee population for that class, and percentages for the overall student population. Create a bar graph with the student race or ethnicity (qualitative data) on the x-axis, and the Advanced Placement examinee population percentages on the y-axis.

| Race/Ethnicity | AP Examinee Population | Overall Student Population |
|---|---:|---:|
| 1 = Asian, Asian American or Pacific Islander | 10.3% | 5.7% |
| 2 = Black or African American | 9.0% | 14.7% |
| 3 = Hispanic or Latino | 17.0% | 17.6% |
| 4 = American Indian or Alaska Native | 0.6% | 1.1% |
| 5 = White | 57.1% | 59.2% |
| 6 = Not reported/other | 6.0% | 1.7% |
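A minimal sketch of how the requested bar graph might be drawn in Python, with the qualitative categories on the x-axis and the AP examinee percentages on the y-axis. It assumes the third-party matplotlib library is installed. The function name `draw_bar_graph` and the output file name are hypothetical.

```python
ap_examinees = {
    "Asian, Asian American or Pacific Islander": 10.3,
    "Black or African American": 9.0,
    "Hispanic or Latino": 17.0,
    "American Indian or Alaska Native": 0.6,
    "White": 57.1,
    "Not reported/other": 6.0,
}

# Use the table's numeric codes 1-6 as short x-axis labels.
codes = [str(i) for i in range(1, len(ap_examinees) + 1)]
heights = list(ap_examinees.values())
print(round(sum(heights), 1))  # 100.0


def draw_bar_graph(path="ap_examinees.png"):
    """Save a vertical bar graph to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(codes, heights, width=0.6)  # width below 1 keeps the bars separated
    ax.set_xlabel("Race/ethnicity (code from the table)")
    ax.set_ylabel("AP examinee population (%)")
    fig.savefig(path)
    plt.close(fig)
```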

# References

Burbary, Ken. Facebook Demographics Revisited – 2011 Statistics, 2011. Available online at http://www.kenburbary.com/2011/03/facebook-demographics-revisited-2011-statistics-2/ (accessed August 21, 2013).

“9th Annual AP Report to the Nation.” CollegeBoard, 2013. Available online at http://apreport.collegeboard.org/goals-and-findings/promoting-equity (accessed September 13, 2013).

“Overweight and Obesity: Adult Obesity Facts.” Centers for Disease Control and Prevention. Available online at http://www.cdc.gov/obesity/data/adult.html (accessed September 13, 2013).

# Chapter Review

A stem-and-leaf plot is a way to plot data and look at the distribution. In a stem-and-leaf plot, all data values within a class are visible. The advantage of a stem-and-leaf plot is that all values are listed, unlike a histogram, which gives only classes of data values. A line graph is often used to represent a set of data values in which a quantity varies with time. These graphs are useful for finding trends, that is, general patterns in data sets such as temperature, sales, employment, and company profit or cost over a period of time. A bar graph is a chart that uses either horizontal or vertical bars to show comparisons among categories. One axis of the chart shows the specific categories being compared, and the other axis represents a discrete value. Some bar graphs present bars clustered in groups of more than one (grouped bar graphs), and others show the bars divided into subparts to show cumulative effect (stacked bar graphs). Bar graphs are especially useful when the data are categorical.

# Homework

Student grades on a chemistry exam were: 77, 78, 76, 81, 86, 51, 79, 82, 84, 99

1. Construct a stem-and-leaf plot of the data.
2. Are there any potential outliers? If so, which scores are they? Why do you consider them outliers?

[link] contains the 2010 obesity rates in U.S. states and Washington, DC.

| State | Percent (%) | State | Percent (%) | State | Percent (%) |
|---|---:|---|---:|---|---:|
| Alabama | 32.2 | Kentucky | 31.3 | North Dakota | 27.2 |
| Alaska | 24.5 | Louisiana | 31.0 | Ohio | 29.2 |
| Arizona | 24.3 | Maine | 26.8 | Oklahoma | 30.4 |
| Arkansas | 30.1 | Maryland | 27.1 | Oregon | 26.8 |
| California | 24.0 | Massachusetts | 23.0 | Pennsylvania | 28.6 |
| Colorado | 21.0 | Michigan | 30.9 | Rhode Island | 25.5 |
| Connecticut | 22.5 | Minnesota | 24.8 | South Carolina | 31.5 |
| Delaware | 28.0 | Mississippi | 34.0 | South Dakota | 27.3 |
| Washington, DC | 22.2 | Missouri | 30.5 | Tennessee | 30.8 |
| Florida | 26.6 | Montana | 23.0 | Texas | 31.0 |
| Georgia | 29.6 | Nebraska | 26.9 | Utah | 22.5 |
| Hawaii | 22.7 | Nevada | 22.4 | Vermont | 23.2 |
| Idaho | 26.5 | New Hampshire | 25.0 | Virginia | 26.0 |
| Illinois | 28.2 | New Jersey | 23.8 | Washington | 25.5 |
| Indiana | 29.6 | New Mexico | 25.1 | West Virginia | 32.5 |
| Iowa | 28.4 | New York | 23.9 | Wisconsin | 26.3 |
| Kansas | 29.4 | North Carolina | 27.8 | Wyoming | 25.1 |

1. Use a random number generator to randomly pick eight states. Construct a bar graph of the obesity rates of those eight states.
2. Construct a bar graph for all the states beginning with the letter "A."
3. Construct a bar graph for all the states beginning with the letter "M."

Example solution for part 1, using the random number generator on the TI-84+ to generate a simple random sample of 8 states. Instructions are as follows.

1. Number the entries in the table 1–51 (including Washington, DC; numbered vertically).
2. Press MATH.
3. Arrow over to PRB.
4. Press 5:randInt(
5. Enter 51,1,8)
6. Eight numbers are generated (use the right arrow key to scroll through the numbers). The numbers correspond to the numbered states; for this example they are {47 21 9 23 51 13 25 4}. If any numbers are repeated, generate a different number by using 5:randInt(51,1). Here, the states (and Washington DC) are {Arkansas, Washington DC, Idaho, Maryland, Michigan, Mississippi, Virginia, Wyoming}.

Corresponding percents are {30.1, 22.2, 26.5, 27.1, 30.9, 34.0, 26.0, 25.1}.
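The same sampling procedure can be sketched in Python as an alternative to the calculator. This is an illustration under stated assumptions: the helper name `pick` is hypothetical, and `random.sample` is swapped in for `randInt` because it draws without replacement, so no redraw for repeated numbers is needed.

```python
import random

# (state, obesity rate in %), numbered 1-51 down the columns of the table.
rates = [
    ("Alabama", 32.2), ("Alaska", 24.5), ("Arizona", 24.3),
    ("Arkansas", 30.1), ("California", 24.0), ("Colorado", 21.0),
    ("Connecticut", 22.5), ("Delaware", 28.0), ("Washington, DC", 22.2),
    ("Florida", 26.6), ("Georgia", 29.6), ("Hawaii", 22.7),
    ("Idaho", 26.5), ("Illinois", 28.2), ("Indiana", 29.6),
    ("Iowa", 28.4), ("Kansas", 29.4), ("Kentucky", 31.3),
    ("Louisiana", 31.0), ("Maine", 26.8), ("Maryland", 27.1),
    ("Massachusetts", 23.0), ("Michigan", 30.9), ("Minnesota", 24.8),
    ("Mississippi", 34.0), ("Missouri", 30.5), ("Montana", 23.0),
    ("Nebraska", 26.9), ("Nevada", 22.4), ("New Hampshire", 25.0),
    ("New Jersey", 23.8), ("New Mexico", 25.1), ("New York", 23.9),
    ("North Carolina", 27.8), ("North Dakota", 27.2), ("Ohio", 29.2),
    ("Oklahoma", 30.4), ("Oregon", 26.8), ("Pennsylvania", 28.6),
    ("Rhode Island", 25.5), ("South Carolina", 31.5), ("South Dakota", 27.3),
    ("Tennessee", 30.8), ("Texas", 31.0), ("Utah", 22.5),
    ("Vermont", 23.2), ("Virginia", 26.0), ("Washington", 25.5),
    ("West Virginia", 32.5), ("Wisconsin", 26.3), ("Wyoming", 25.1),
]


def pick(numbers):
    """Look up 1-based entry numbers in the table."""
    return [rates[n - 1] for n in numbers]


# The sample worked through above.
example = pick([47, 21, 9, 23, 51, 13, 25, 4])
print(example[0])  # ('Virginia', 26.0)

# A fresh simple random sample of eight entries, shown as a text bar graph.
numbers = random.sample(range(1, len(rates) + 1), 8)
for state, pct in pick(numbers):
    print(f"{state:<15} {'#' * round(pct)} {pct}")
```

Because the second sample is random, its eight states differ from run to run.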