Panelists Nate Silver (from left) of FiveThirtyEight, Nate Cohn of The Upshot, and David Rothschild of Microsoft Research spoke during a conference about the field of data analytics and its potential applications to politics.

Kris Snibbe/Harvard Staff Photographer


The puzzles for pollsters


Even as data analytics become more critical to election success, campaign trends throw them a curve

Since a little-known senator from Illinois named Barack Obama marched to the presidency in 2008 thanks in part to cutting-edge number crunching, the use of data to identify and target specific voters with specific information has become an essential campaign tool, as fundamental as traditional polling and focus groups.

But even as the emerging analytics field becomes more mainstream, the bizarre twists and turns of the 2016 primary season, particularly on the Republican side with front-runner Donald Trump, have made predicting the next president more difficult than ever.

Because the U.S. primary system involves only a fraction of the electorate, “anyone who can drive extreme messages that stimulate turnout can game the system,” said Mark Penn, a former pollster and strategist who has consulted for Bill and Hillary Clinton, during a Friday conference on politics and data analytics organized by Harvard’s Center for American Political Studies.

“If there are 130,000,000 people who we expect to vote and there are 50,000,000 in the primary process, the 70,000,000 who really decide the presidential election are not in the primary process, and that’s driving the media to observe a country that doesn’t exist,” he said. “Until we fix the system, Donald Trump will not be the exception, Donald Trump will be the rule.”

In explaining the rise of Trump, statistician Nate Silver, the founder and editor in chief of FiveThirtyEight.com, said the 2016 race has shown how small sample sizes often yield volatile predictions. Even so, Trump’s broad demographic and geographic appeal probably could not have been pinpointed earlier, he said, because the primary nomination process is so complex that no one has built a model to predict its outcomes.

“The groups that Trump appeals to are groups that we would have had trouble identifying certainly before the election,” Silver said, because they are “not the typical combination of the red-blue map we’re used to.”

A packed lecture hall listens as moderator Anthony Salvanto (from left) leads panelists Silver, Cohn, Rothschild, and Clare Malone. Kris Snibbe/Harvard Staff Photographer

While professional sports have embraced data metrics for years to better evaluate player and team performance and to predict wins and losses, conference organizers Ryan Enos, an associate professor in Harvard’s government department, and Kirk Goldsberry, a visiting scholar at Harvard’s Center for Geographic Analysis, say political analytics has no equivalent of the well-known MIT Sloan Sports Analytics Conference.

So they brought together many of the top minds in data analytics, high-level political professionals from both parties, and political journalists and commentators, among them Silver, opinion research consultant Kristen Soltis Anderson, retired U.S. Rep. Barney Frank, MSNBC reporter Steve Kornacki, and CBS News director of elections Anthony Salvanto, to evaluate the analytics field’s strengths and weaknesses, assess the nomination races on both sides, and, naturally, talk about how to use numbers to determine who wins and loses.

“We can read anything online now, but it’s about getting people in the same room where they can have a conversation where we think that the field can be pushed forward,” said Enos. They plan to host the conference again next year.

“One thing that amazes me is that you would think after all the lessons of Obama in ’08 and Obama in 2012, where Republicans supposedly learned the importance of having a turnout operation … these campaigns don’t really have turnout operations,” said Nate Cohn, who writes for The Upshot at The New York Times. “[Ted] Cruz has one that’s maybe one-quarter as good as a Democratic turnout operation, whereas everyone else has nothing, [just] a few kids working in an office somewhere.”

Perhaps surprisingly, betting markets are one area where predictions on political winners and losers remain solid, said David Rothschild, an economist for Microsoft Research who studies how users engage with online data and polling. Soon the field of polling will be entirely Internet-based and done on mobile devices, he said. “We’re not that far away from the point of the idea of a telephone poll is going to be ridiculous,” he said. That evolution will raise new methodological challenges for the industry, since the data will include more variables and so will demand more skillful analysis.

Despite the abundance of data from which analysts today can draw, such as voter files, polling results, voter lists, and consumer data, such information can be both expensive and hard to obtain because companies like Google and Twitter keep proprietary data such as searches close to the vest, Cohn said.

“I’m not sure that there’s a way for all this new data to end up integrated into a coherent framework for thinking about how elections might evolve from where they are now.”