Key statistical ideas celebrate birthdays

Stephen Stigler speaks at Saturday conference, Monday colloquium

University of Chicago statistics professor Stephen M. Stigler, a frequent visitor to Harvard, has a favorite movie — “Magic Town,” a black-and-white flick from 1947. It stars James Stewart as a pollster who discovers a magical place: a heartland town whose citizens have a range of opinions that are a near-perfect composite of the whole United States.

In eight or nine street interviews, Stigler said, Stewart’s character gets poll results that otherwise would require hundreds or thousands of interrogatory encounters.

Stigler is a historian of statistics, the science that uses complicated mathematics — a world of scatter plots and curve fitting — in order to extract useful information from data. It’s employed to analyze information, infer probability, and estimate uncertainty.

Last Saturday (Sept. 27), Stigler delivered a paper on the 100th anniversary of a groundbreaking paper by W.S. “Student” Gosset, “The Probable Error of a Mean.” His audience of 125 statistics professionals and students was gathered at the Radcliffe Gymnasium for “Quintessential Contributions,” a daylong series of talks. The event, sponsored by Harvard’s Department of Statistics, celebrated the “birthdays” of key statistical ideas and their inventors.

All of the birthdays — except that of Gosset’s idea — recognized Harvard statisticians who are still active in the profession.

Donald B. Rubin, the John L. Loeb Professor of Statistics, turned 65 this year, and his own breakthrough study, “Multiple Imputations in Sample Surveys,” turned 30. Worldwide, he’s one of the top-10 most-cited writers in mathematics.

Professor of statistics Carl N. Morris turned 70 this year, and his paper “Parametric Empirical Bayes Inference” turned 25. He’s an expert in analytical methods designed for public policy, health care, and sports.

Herman Chernoff, professor emeritus of applied mathematics and statistics, turned 85 this year, and celebrating its 35th birthday was his paper “The Use of Faces to Represent Points in k-Dimensional Space Graphically.” He’s best known for “Chernoff faces,” a statistical tool for representing high-dimensional data, including the multitude of subtle variables used to map the human face.

This year, six of 12 faculty members in the department of statistics have birthdays divisible by five, said chair Xiao-Li Meng, a student of the humorously unusual. (To introduce what he called “the birthday boys” Saturday, he showed the results of a Google image search on each of the names — including a beautiful blond the search engine had mysteriously linked to the name “Don Rubin.”)

Later on Saturday, conference-goers gathered at the Cambridge Queen’s Head in Harvard’s Memorial Hall to celebrate the modern fruits of what in the 1930s was Gosset’s day job: head brewer for Guinness beer in London.

Stigler later calculated that with all the events to be celebrated in the history of statistics, “there’s always a good reason to have a party.”

On Monday (Sept. 29), he stayed in town to address about 50 students and professors in a crowded third-floor classroom in the Science Center. Stigler’s talk, in professional terms, was inflammatory: “The Five Most Consequential Ideas in the History of Statistics.” The session was one of several colloquia sponsored this fall by the Statistics Department.

To qualify on this shortlist, the ideas must have lasted a while, he said, and must have had demonstrable consequences for statistics.

The first idea was to combine observations in order to arrive at a simple mean. This “species of averaging,” said Stigler, found expression in 1635, through the work of English curate and astronomer Henry Gellibrand.

“By combining observations, you actually increase the amount of information you have,” said Stigler of an idea that came late to science. “It may seem like ancient history, but it’s not.”

Even today, he said, there’s “determined resistance” to the idea of combining observations, because it pushes “individuality out of sight” in pursuit of a broader idea.

The “root-N rule” is the second consequential idea, said Stigler. That’s the notion, first articulated in 1730, that the accuracy of your conclusions grows with the square root of the number of observations, not with the number itself. Specifically, to double that accuracy, you have to increase the number of observations fourfold.
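The fourfold claim can be checked with a short calculation. The sketch below is an illustration only, not something presented in Stigler's talk: it assumes independent observations with a common spread `sigma`, for which the standard error of the mean is `sigma / sqrt(n)`, and it compares that formula with a small seeded simulation. The function names and the choice of normally distributed data are assumptions made for the example.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical standard error of the mean of n independent observations."""
    return sigma / math.sqrt(n)


def simulated_spread_of_means(
    n: int, trials: int = 2000, sigma: float = 1.0, seed: int = 0
) -> float:
    """Draw `trials` samples of size n and report how much their means vary.

    Assumption for illustration: observations are normal with mean 0.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    # Quadrupling n (25 -> 100 -> 400) should roughly halve the spread each time.
    for n in (25, 100, 400):
        print(n, round(standard_error(1.0, n), 3), round(simulated_spread_of_means(n), 3))
```

Each fourfold increase in the sample size cuts the theoretical standard error in half, and the simulated spread of the sample means should track it closely.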

Third on the list is the idea of “the hypothesis test,” the statistical notion that mathematical tests can determine the probability of an outcome. This idea (though not the sophisticated math now associated with it) was in place by 1248, said Stigler, when the London Mint began periodically to test its product for composition and weight.

The fourth and fifth consequential ideas in statistics both had the same source, said Stigler — an 1869 book by Victorian polymath Francis Galton. “Hereditary Genius” was a mathematical examination of how talent is inheritable.

Galton discovered through a study of biographical compilations that a “level of eminence” within populations is steady over time and over various disciplines (law, medicine). Of the one in 4,000 people who made it into such a compilation, one-tenth had a close relative on the same list.

This led to what Stigler called the fourth consequential idea: the innovative notion that statistics can be evaluated in terms of internal measurements of variability — the percentiles of bell curves (in statistics terms, “normal distribution”) that in 1869 Galton started to employ as scales for talent.

The fifth idea was based upon an empirical finding. In a series of studies between 1869 and 1889, Galton was the first to observe the phenomenon of regression toward the mean.

Essentially, the idea posits that in most realistic situations over time — Galton studied familial height variations, for example — the most extreme observed values tend to “regress” toward the center, or mean.
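A small simulation makes the effect concrete. The sketch below does not use Galton's data. It assumes a hypothetical population in which parent and child heights share the same mean (68 inches) and spread (2.5 inches) and are correlated at 0.5; all three numbers, and the function names, are invented for illustration. It then picks the tallest tenth of parents and compares their average height with that of their children.

```python
import math
import random
import statistics


def simulate_families(n=20000, mean=68.0, sd=2.5, r=0.5, seed=1):
    """Return (parent, child) height pairs with correlation r (hypothetical values)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        parent = rng.gauss(mean, sd)
        # The child inherits a fraction r of the parent's deviation from the mean;
        # the rest is independent noise, scaled so children have the same spread.
        noise = rng.gauss(0.0, sd * math.sqrt(1.0 - r * r))
        child = mean + r * (parent - mean) + noise
        pairs.append((parent, child))
    return pairs


def extreme_group_means(pairs, top_fraction=0.1):
    """Average parent and child heights among the tallest parents."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_fraction))]
    parent_mean = statistics.fmean(p for p, _ in top)
    child_mean = statistics.fmean(c for _, c in top)
    return parent_mean, child_mean


if __name__ == "__main__":
    parents, children = extreme_group_means(simulate_families())
    print(f"tallest parents average {parents:.1f}, their children {children:.1f}")
```

Under these assumptions the children of the tallest parents are still taller than average, but by only about half as much as their parents, which is the pull toward the center that the term describes.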

If he could extend his list of consequential ideas in statistics, Stigler said, he would include random sampling, statistical design, the graphical display of data, the chi-squared distribution, and modern computation and simulation.

A century from now, the big ideas in statistics will still help transform and expand knowledge, said Stigler. “Basic statistical concepts, whether you put them in your top five or not, [are] important to the way we think about things.”