Eclectic Data

Here are some interesting data sets. Look at them. Play with them. See what they tell us.
All of these data are in text files that are either tab-delimited (separated by tabs) or comma-delimited (separated by commas). After you click on the format you want, the data will appear in your browser window. You can then use your browser's Save or Save As command to save the data. You can open this saved file with the Macintosh version of Smith's Statistical Package and with virtually all commercial statistical, spreadsheet, and graphics programs.
The 1993 Philadelphia Election    tab    comma
In a 1993 election in the Second Senatorial District in Philadelphia, which would determine which party controlled the Pennsylvania State Senate, the Republican candidate won among the votes cast on election day, 19,691 to 19,127. However, the Democrat won the absentee ballots, 1,391 to 366, thereby winning the election by 461 votes, and Republicans charged that many absentee ballots had been illegally solicited or cast. A federal judge ruled that the Democrats had engaged in a "civil conspiracy" to win the election and declared the Republican the winner. Among the evidence presented was an analysis of the statistical relationship between absentee votes and election-day votes in 21 other state senatorial elections in Philadelphia during the period 1982-1993. Estimate the simple regression model with absentee votes as the dependent variable and election-day votes as the explanatory variable. Is the estimated slope plausible? Is the relationship statistically significant? In the disputed 1993 election, the difference in the election-day votes was -564. Find a 95 percent prediction interval for the absentee votes. Is 1025, the actual value for the absentee votes, inside this prediction interval? This data set contains the absentee and election-day votes in the 21 other elections:
election day
absentee
These data are the differences (Democrat minus Republican) between the number of votes cast for the Democrat and Republican candidates in 21 elections other than the disputed one.
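One way to carry out the regression and prediction-interval steps is sketched below in plain Python. The function names and the idea of passing the t critical value in by hand are assumptions made for illustration; any statistics package will do the same job.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for a single new y at x_new.

    t_crit is the two-sided t critical value with n - 2 degrees of
    freedom, supplied by the caller (looked up in a t table).
    """
    n = len(x)
    a, b = fit_line(x, y)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))  # standard error of estimate
    half_width = t_crit * see * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / sxx)
    center = a + b * x_new
    return center - half_width, center + half_width
```

With 21 elections there are 19 degrees of freedom, for which the two-sided 95 percent critical value is about 2.093. Assuming election_day and absentee are lists read from the data file, the call would be prediction_interval(election_day, absentee, -564, 2.093).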
The Political Business Cycle    tab    comma
Do economic events affect presidential elections? Do presidential elections affect economic policies? Estimate a simple regression model with the percentage of the total vote for major-party presidential candidates received by the incumbent party as the dependent variable and the change in the unemployment rate during the presidential election year as the explanatory variable. Are the values of the estimated slope and intercept plausible? Is the relationship statistically significant? Use the unemployment data to test these null hypotheses: (a) the unemployment rate is equally likely to increase or decrease in a presidential election year; and (b) the average change in the unemployment rate in a presidential election year is 0. This data set contains unemployment and voting data for 25 presidential elections, 1900 through 1996:
year
unemployment
incumbent
The year is the year that the presidential election was held. Unemployment is the percentage-point difference between the civilian unemployment rate during the presidential election year and the preceding year. In the 1996 election, for example, the unemployment rate was 5.6% in 1995 and 5.4% in 1996, giving a difference of -0.2 percentage points. Incumbent is the percentage of the total vote for the major-party presidential candidates that was received by the incumbent party in that election. The incumbent party is the political party of the U.S. President at the time of the election.
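The two unemployment hypotheses can be checked with a sign test and a one-sample t statistic. The sketch below is one possible approach, not a prescribed method; the function names are hypothetical and the input is assumed to be the list of unemployment changes from the data file.

```python
import math

def sign_test_p_value(changes):
    """Two-sided exact binomial p-value for H0: increases and decreases
    are equally likely. Changes of exactly zero are dropped."""
    ups = sum(1 for c in changes if c > 0)
    downs = sum(1 for c in changes if c < 0)
    n = ups + downs
    k = min(ups, downs)
    # Probability of k or fewer of the rarer sign when p = 0.5
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def t_statistic(values, mu0=0.0):
    """One-sample t statistic for H0: population mean equals mu0."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu0) / math.sqrt(variance / n)
```

The t statistic is compared with a t table using n - 1 degrees of freedom (24 for the 25 elections).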
Interest Rates and Inflation    tab    comma
Are interest rates related to the rate of inflation? Is the real interest rate (the nominal interest rate minus the rate of inflation) constant? This data set contains the following U.S. annual data, 1926-1997:
year
T-bill rate
rate of inflation
The T-bill rate is the actual return on 1-year Treasury bills (T-bills). The financial press traditionally calculates the returns on T-bills on a discount basis, relative to the face value rather than the purchase price. If you buy a 1-year $10,000 T-bill for $9,400, this $600 discount is a 6 percent discount from the $10,000 face value and the T-bill rate is conventionally reported as 6 percent. But from the standpoint of the investor, this is a $600 return on an investment of $9,400, not $10,000, and the actual rate of return is $600/$9,400 = 0.0638 (6.38 percent). The data here are the actual percentage rates of return on 1-year T-bills purchased at the beginning of each year. The rate of inflation is the annual percentage change in the consumer price index for all urban consumers from the beginning of the year to the end.
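The difference between the discount-basis rate and the actual return can be verified with a few lines of Python. The function names are illustrative; the numbers are the $10,000 face value and $9,400 price from the example above.

```python
def discount_rate(face_value, price):
    """Return on a discount basis: discount relative to the face value."""
    return (face_value - price) / face_value

def actual_return(face_value, price):
    """Investor's return: discount relative to the purchase price."""
    return (face_value - price) / price

def real_rate(nominal_rate, inflation_rate):
    """Real interest rate as defined here: nominal rate minus inflation."""
    return nominal_rate - inflation_rate

print(discount_rate(10_000, 9_400))            # 0.06
print(round(actual_return(10_000, 9_400), 4))  # 0.0638
```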
Real Dividends    tab    comma
Have corporate stock prices and dividends kept pace with inflation? See whether there has been any long-term trend in real prices or real dividends and look for any breaks in such a trend. Also see if there is a statistically persuasive relationship between stock prices and dividends. This data set contains the following U.S. annual data, 1950-1997:
year
real stock prices
real dividends
These stock price, dividend, and consumer price data are annual averages of monthly data. The real stock price is the value of the S&P 500 index of stock prices, divided by the value of the consumer price index for all urban consumers. The real dividend is the aggregate annual dividends paid by the stocks in the S&P 500, divided by the value of the consumer price index. Because all of these series are indexes, the real stock price data have been rescaled so that the 1950 value is equal to 100; the real dividend data were scaled proportionately to maintain the correct ratio of stock prices to dividends.
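The deflating and rescaling described here can be sketched as follows. This is a guess at the mechanics, not the procedure actually used to build the file; the function names are hypothetical, and the first element of each list is assumed to be the 1950 value.

```python
def deflate(nominal, cpi):
    """Divide each nominal value by the matching consumer price index."""
    return [value / index for value, index in zip(nominal, cpi)]

def rescale_pair(real_prices, real_dividends, base_value=100.0):
    """Rescale prices so the first (base-year) value equals base_value,
    and apply the same factor to dividends so the price-to-dividend
    ratio is unchanged in every year."""
    factor = base_value / real_prices[0]
    scaled_prices = [p * factor for p in real_prices]
    scaled_dividends = [d * factor for d in real_dividends]
    return scaled_prices, scaled_dividends
```

Using a single scale factor for both series is what keeps the dividend yield (dividends divided by prices) the same before and after rescaling.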
The Fed's Stock Valuation Model    tab    comma
The Federal Reserve Board (the Fed) uses a simple stock valuation model to gauge whether stocks are cheap or expensive. Are these 'fair values' closely correlated with stock prices? Do large deviations between fair value and actual prices accurately predict changes in stock prices? This data set contains the following U.S. month-end data, January 1979 through January 1998:
actual stock prices
Fed fair-values
The actual stock price is the level of the S&P 500 index of stock prices. The fair value is the average of private stock analysts' estimates of the earnings over the next 12 months for the companies in the S&P 500 index, divided by the interest rate on 10-year Treasury bonds.
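The fair-value calculation and the gap between fair value and the actual price can be written out as a short sketch. The function names are hypothetical, and the interest rate is assumed to be entered as a decimal (0.06 for 6 percent).

```python
def fair_value(expected_earnings, bond_yield):
    """Fair value as described here: forecast earnings over the next
    12 months divided by the 10-year Treasury bond interest rate."""
    return expected_earnings / bond_yield

def percent_deviation(actual_price, fair):
    """Percentage by which the actual price is above (positive) or
    below (negative) the fair value."""
    return 100 * (actual_price - fair) / fair
```

One way to explore the prediction question is to compute percent_deviation for each month and compare it with the change in the actual price over the following months.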
Student Heights and Weights    tab    comma
Is the distribution of a small sample of college student heights and weights roughly bell-shaped? Are there important differences between the female and male distributions? (Can you explain the odd blip in the male distribution?) Are weights related to heights? Are heights related to the parents' heights? What about weights? This data set contains the following results of a January 1999 survey of 33 students in an introductory statistics class at Pomona College:
gender
student's height
student's weight
mother's height
mother's weight
father's height
father's weight
The gender variable equals 1 if the student is female, 0 if male. The heights are in inches, the weights in pounds. The parental data are for the student's biological parents.
Frosh 15    tab    comma
College folklore has it that students gain an average of 15 pounds during their first year in college, perhaps due to the inexpensive, starchy food served in college dining halls. Is the distribution of reported weight gains roughly bell-shaped? Estimate 95% confidence intervals for females, males, and both genders combined. Do they include 15? Test these null hypotheses: (a) the average weight gain is 0; (b) the average weight gain is 15; and (c) the average weight gain is the same for female and male students. This data set contains data from a Spring 1997 random sample of 100 second-year students at Pomona College who were asked how many pounds they had gained or lost during their first year at college:
gender (1 if female, 0 if male)
weight gain
The reported weight gains are in pounds. We don't know if the students surveyed made unbiased estimates.
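A minimal sketch of the confidence intervals and the female-versus-male comparison follows. The function names are illustrative, the t critical value is supplied by the caller, and the unequal-variance (Welch) form of the two-sample statistic is an assumption; a pooled-variance test is an equally reasonable choice.

```python
import math

def mean_and_se(values):
    """Sample mean and standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

def confidence_interval(values, t_crit):
    """Confidence interval for the mean; t_crit has n - 1 degrees of freedom."""
    mean, se = mean_and_se(values)
    return mean - t_crit * se, mean + t_crit * se

def welch_t(group_a, group_b):
    """Two-sample t statistic that does not assume equal variances."""
    mean_a, se_a = mean_and_se(group_a)
    mean_b, se_b = mean_and_se(group_b)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
```

For all 100 students combined, the 95 percent critical value is roughly 1.98; the female and male subsamples each need the value for their own degrees of freedom.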
Old Semi-Faithful    tab    comma
One of the most famous geysers in the world is Old Faithful in Yellowstone National Park. Several times each day, it has a stunning eruption that lasts 1 to 5 minutes and sends hot water and steam more than 100 feet in the air. The National Park Service Rangers post a sign predicting when the next eruption will occur. Is the interval of time between eruptions well described by the bell-shaped normal distribution? Can this interval be predicted with reasonable accuracy from the duration of the preceding eruption? This data set contains 112 observations made during the daylight hours for two weeks in September 1995:
interval
duration
The interval between eruptions is in minutes; the duration of the preceding eruption is in seconds.
Los Angeles Rainfall    tab    comma
Is the annual rainfall in Los Angeles well-described by a bell-shaped histogram? Are there any discernible trends over the past century? Is the rainfall one year correlated positively, negatively, or not at all with the rainfall the previous year? This data set contains annual rainfall at the Los Angeles Civic Center for 117 years, from 1878 through 1994:
year
rainfall
All rainfall data are in inches, for the calendar year.
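The year-to-year question amounts to correlating each year's rainfall with the previous year's. A short sketch, with hypothetical function names and assuming the rainfall values are in chronological order:

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag_one_correlation(series):
    """Correlation between each value and the value one period earlier."""
    previous_year = series[:-1]
    current_year = series[1:]
    return correlation(previous_year, current_year)
```

A value near 0 would suggest that a wet year tells us little about the next year; the 117 annual values give 116 consecutive pairs.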
Weather Predictions    tab    comma
How highly correlated are the daily high and low temperatures in Los Angeles during the winter? Which are more accurate, the predicted high or predicted low temperatures? Do the daily high and low prediction errors come from populations with a 0 mean? Are the daily high and low prediction errors independent of each other? Of the errors the day before? This data set contains these daily weather data at the Los Angeles Civic Center for 101 days, from November 7, 1996 through February 15, 1997:
low temperature
high temperature
predicted low temperature
predicted high temperature
All temperatures are in degrees Fahrenheit. The forecasts were made one day earlier and printed in The Los Angeles Times on the day in question.
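Forecast accuracy can be compared by computing the prediction errors and summarizing their size. The sketch below is one reasonable set of summaries, not the only one; the function names are illustrative, and the error is defined here as actual minus predicted.

```python
import math

def forecast_errors(actual, predicted):
    """Prediction errors, defined as actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]

def mean_error(errors):
    """Average error; a value far from 0 suggests biased forecasts."""
    return sum(errors) / len(errors)

def mean_absolute_error(errors):
    """Average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_squared_error(errors):
    """Like the mean absolute error, but penalizes large misses more."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Computing these separately for the high and low temperatures shows which forecast is more accurate; the two lists of errors can then be correlated with each other, or with their own values one day earlier, to check independence.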
Batting and Earned Run Averages    tab    comma
Are the season batting averages of major league baseball players well described by the bell-shaped normal distribution? Can a player's batting average be predicted with reasonable accuracy from his batting average the preceding season? Do batting averages regress toward the mean? What about pitchers' earned run averages? This data set contains the 1997 and 1998 batting averages of 379 batters and the earned run averages of 292 pitchers:
1997 batting averages
1998 batting averages
1997 earned run averages
1998 earned run averages
These batting averages are for all major league players who had at least 50 times at bat or 25 innings pitched in both 1997 and 1998. A high batting average is good for batters; a low earned run average is good for pitchers.
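Regression toward the mean can be checked directly by following the players with the highest 1997 values into 1998. The function below is a hypothetical sketch; the 25 percent cutoff is an arbitrary choice made for illustration.

```python
def top_group_comparison(first_year, second_year, top_fraction=0.25):
    """Compare the top performers in the first year with the same
    players' results in the second year, alongside the overall means."""
    n = len(first_year)
    k = max(1, int(n * top_fraction))
    # Indexes of the k players with the highest first-year values
    top = sorted(range(n), key=lambda i: first_year[i], reverse=True)[:k]

    def mean(values):
        return sum(values) / len(values)

    return {
        "top_first": mean([first_year[i] for i in top]),
        "top_second": mean([second_year[i] for i in top]),
        "overall_first": mean(first_year),
        "overall_second": mean(second_year),
    }
```

If averages regress toward the mean, top_second will lie between top_first and the overall mean. For earned run averages, where low is good, the same function still isolates an extreme group, namely the pitchers with the worst 1997 results.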
