It took a novel tack to discover an obesity gene


Biostatistician sailed into new waters

The racing sailboat
was small, and Christoph Lange
wanted to be sure he didn’t capsize and
plunge into the Charles River again, as he’d
done half a dozen times that spring. Using his blue sailing shoes for
leverage, he carefully arranged himself on the craft’s cramped
bench and reached for the tiller. The day was mild; the wind barely ruffled
the dank water lapping the edges of the ramp that led to the Harvard
Sailing Center, in Cambridge, where Lange, assistant professor of biostatistics
at the Harvard School of Public Health, had been a member since 2000.
“Perfect for thinking,” he noted to himself.

Lange’s mind drifted to
a knotty research problem he and scientists
worldwide had been tangling with for years, one that severely hampered
their ability to identify genes associated with complex diseases such
as asthma, obesity, and cancer. Conditions like these arise from the
interplay of DNA and a host of environmental factors, from air pollutants
to diet.

Linking genes and disease conditions wouldn’t
seem to have much in common with games of chance. But from a statistical
standpoint, the
two are–to use a Mendelian allusion–like peas in a pod.

How so? Consider
this scenario: At a poker tournament, there are 500,000 players,
and 500,000 decks of cards. Each of the players draws five cards
from his or her own deck. Quite a few of the players get a royal flush,
just by chance. Now, suppose that 20 of those decks are stacked. Again
each of the 500,000 players draws five cards from his or her own deck.
Again, quite a few get a royal flush. But now a mystery has arisen:
Which of the royal flushes are due to chance (the ordinary decks),
and which
ones are due to stacked decks?

Scientists looking for troublesome genes
play a similar “game”: Using statistical tests, they compute the relationship
of a person’s
DNA–all 3 billion subunits of it–to 500,000 known variations
in the human genome that signal troublesome genes to see if any of
the subunits match up with the variations. Typically, each of the statistical
tests will have a chance probability of 5 percent. That means that,
among the 500,000 variations,
there will be 25,000 chance matches. The mystery arises again: Which
of those matches are due just to chance, and which are “real”–that
is, due to a stacked deck (a troublesome gene)?
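
The arithmetic behind that figure can be checked in a few lines. The sketch below is illustrative only: the function name is made up for this example, and it assumes every test carries the same 5 percent chance of a false match.

```python
def expected_chance_matches(n_tests, chance_probability):
    """Expected number of tests that "match" purely by chance."""
    return n_tests * chance_probability


# Figures from the scenario above: 500,000 variations, 5 percent per test.
chance_hits = expected_chance_matches(500_000, 0.05)
print(f"Expected chance matches: {chance_hits:,.0f}")
```

Multiplying the number of tests by the per-test chance probability reproduces the 25,000 chance matches cited above.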

How to sift through those
25,000 “hits” to separate the real from the chance ones has long been
the Achilles’ heel of genetic
studies. It is known in scientific circles as the “multiple-comparison
problem.”

Christoph Lange was mulling over this
conundrum on that spring day in 2003 as the faint wind damped down
to a whimper, leaving him
mid-river. And then it hit–his “Eureka!” moment: a
way to cull the variations so only the most promising ones remained.
It would be like thinning out a haystack, so the needles would glint
in the sun.

Lange nearly leaped out of the boat. “I
was curious to see if the idea would work in practice, but couldn’t
get to shore for an hour
because there was no wind,” he says. “When I finally made
it, I raced to my apartment, opened my laptop, and tried it. It worked
fantastically. It was scary: It made sense that such an idea would
work, but it seemed too good to be true.”

In short order, Lange and HSPH colleagues
Professor of Biostatistics Nan Laird
and then-postdoctoral fellow Kristal
Van Steen developed a
statistical methodology that fundamentally changes the way scientists
approach the multiple-comparison problem. And in a paper published
in Science, they’ve identified a single gene
associated with obesity–and proved that their strategy is reliable.

The gene hunt

Uncovering genes that contribute to complex diseases could have life-altering
consequences. People found to have a “genetic predisposition” to
such diseases might be motivated to make smarter choices, reaching
for a filet of fish rather than a burger, say, if obesity were written
into their DNA. In the future, physicians may be able to screen individuals
for susceptibility to a condition just by analyzing their genetic
makeup, which could open the door to modifying lifestyles from
an early age.

Alas, the relationship between genes and complex diseases can be as
murky as the Charles River. It’s not that a troublesome gene causes a
complex disease; rather, in concert with other troublesome genes, it
contributes to the possibility that, with the right environmental triggers,
a person harboring it could develop a condition. But trying to pinpoint
those genes along the full length of the human genome would be like
trying to locate, say, Chicago, Reno, or Minneapolis on a cross-country drive
from Boston to San Francisco with neither a map nor road signs, just
miles of meandering highways. So scientists identified a series of
landmarks–sites of common tiny variations in the genome called SNPs
or single nucleotide polymorphisms–that mark the troublesome genes.
All told, scientists have identified 8 million SNPs. Today, they can
track a whopping 500,000 of them dotting the genome’s landscape–beacons
of light shining on possible trouble spots.

The most fruitful genetic
studies rely on family members as subjects rather than the population
at large. That’s because families share
much of their DNA, so the number of genetic variables is reduced from
the get-go. In these studies, researchers compare all 3 billion subunits
of close relatives (say, parents and children) to each other, to see
which of the 500,000 SNPs they share. Then, they cross-match those
shared SNPs to the appearance of a trait–obesity, for instance–in
the children. The degree of relatedness indicates how likely it is
that the trait was inherited.

Here’s where chance muddies the water.
A scientist comparing the 3 billion subunits of a family member’s DNA
to the 500,000 SNPs
will invariably get a match, simply by chance. Chance matches turn
gene searches into hit-and-miss propositions. While various studies
have turned
up putative genes for obesity, for example, their findings can’t
be replicated.

The insight that smacked Christoph Lange
in the head out on the Charles takes chance out of the picture.

What if, Lange wondered, researchers performed some statistical gymnastics
before testing suspected genes against a trait, in order to whittle
those 500,000 SNPs down to the 10 most likely to highlight the troublesome
genes? By so doing, they could reduce the multiple-comparison problem
to very manageable proportions.
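
One common way to see why fewer tests help is the Bonferroni correction, which divides the overall error rate by the number of tests performed. The article does not say which correction Lange's method relies on, so the sketch below is an assumption chosen for illustration, and the function name is hypothetical.

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Per-test significance threshold that keeps the chance of any
    false positive across n_tests at or below overall_alpha."""
    return overall_alpha / n_tests


# Compare testing every SNP with testing only the 10 most promising ones.
for n in (500_000, 10):
    threshold = bonferroni_threshold(0.05, n)
    print(f"{n:>7,} tests -> per-test threshold {threshold:.1e}")
```

Under this correction, each of 500,000 tests would have to clear a bar of about 1 in 10 million, while each of 10 tests would only have to clear 1 in 200. A real signal is far easier to detect against the second bar.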

Here’s how Lange proposed to do that:
First, researchers would pretend that the genetic makeup of the children
was missing, and use
genetic information only from the parents to surmise–using Mendel’s
classic laws–what it might be. They could then calculate
the likelihood of each of the 500,000 SNPs’ being passed on. “We
wanted to estimate the heritability of each SNP,” says Laird. They
would use the degree of heritability to calculate how much influence
a gene associated with each SNP would have on a trait. Finally, they
would rank the SNPs in order of influence.

Selecting the 10 SNPs with
the biggest influence, they could use just those to actually test against
the trait in the children. The software
program PBAT–which was developed at HSPH–was used to crunch
those numbers.
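
The screening idea can be sketched in code. What follows is a toy illustration, not PBAT and not the authors' published procedure: genotypes are coded as counts (0, 1, or 2) of a variant allele, the "influence" score is a plain correlation chosen for simplicity, and all function names and data are invented for the example.

```python
from statistics import mean


def expected_child_genotype(father, mother):
    """Expected variant-allele count in a child under Mendel's laws.

    Each parent passes on one of their two alleles with probability 1/2,
    so the expectation is half the parents' combined allele count.
    """
    return (father + mother) / 2


def screening_score(expected, trait):
    """Absolute Pearson correlation between expected genotype and trait.

    This is a simplified stand-in for the influence estimate described
    in the article, not the statistic the authors used.
    """
    mx, my = mean(expected), mean(trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, trait))
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable association
    return abs(sxy) / (sxx * syy) ** 0.5


def top_snps(parent_genotypes, trait, k=10):
    """Rank SNPs by screening score and keep the k highest.

    parent_genotypes maps a SNP id to a list of (father, mother) allele
    counts, one pair per family; trait holds one value per family.
    """
    scores = {}
    for snp, pairs in parent_genotypes.items():
        expected = [expected_child_genotype(f, m) for f, m in pairs]
        scores[snp] = screening_score(expected, trait)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented data: three SNPs, four families, one trait value per family.
toy_parents = {
    "snp_a": [(0, 0), (1, 1), (2, 2), (2, 1)],
    "snp_b": [(1, 1), (1, 1), (1, 1), (1, 1)],
    "snp_c": [(2, 0), (0, 0), (1, 1), (0, 1)],
}
toy_trait = [20.0, 25.0, 31.0, 28.0]
print(top_snps(toy_parents, toy_trait, k=1))
```

In this toy setup the children's actual genotypes never enter the ranking, mirroring the article's description of pretending they are missing. Only the short list that survives would then be tested against the trait in the children.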

“Nan and Christoph
are doing fundamental work in genetic epidemiology,” says James Ware,
HSPH Dean for Academic Affairs and Frederick Mosteller Professor
of Biostatistics. “By perfecting a method for solving the multiple-comparison
problem, they have overcome a major statistical obstacle in the interpretation
of whole genome scans.”

Proof of concept

A paper in Science – powered by Lange and Laird’s statistical
muscle and led by Boston University Medical School’s Alan Herbert
and Michael Christman – identified a single gene variant associated
with adult and childhood obesity. Its name is insulin-induced gene 2,
or INSIG2, and it was present in 10 percent of the population that
was tested.

Searching for genes common to obese people,
the researchers followed two generations of families enrolled in the
Framingham Heart Study,
using data collected by the study’s originators. That information
included the subjects’ genetic makeup and traits–in particular, their
body mass index (BMI), a ratio of weight to height squared commonly used
to gauge obesity. Individuals with a BMI greater than or equal to 25 kg/m2 are
considered overweight, and those with a BMI greater than or equal
to 30 kg/m2 are
considered obese.
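
As the kg/m2 units indicate, BMI is weight in kilograms divided by the square of height in meters. The short sketch below applies the two cutoffs given above. The function names and the label for values under 25 are choices made for this example, not terms from the study.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms over height in meters squared."""
    return weight_kg / height_m ** 2


def classify_bmi(value):
    """Apply the cutoffs quoted in the article (25 and 30 kg/m2)."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "below overweight cutoff"  # label invented for this example


# Hypothetical adults, all 1.75 m tall.
for weight in (65, 80, 95):
    value = bmi(weight, 1.75)
    print(f"{weight} kg -> BMI {value:.1f}: {classify_bmi(value)}")
```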

Even before the Science paper, Laird and
Lange had applied their new method to two studies at the Channing Laboratory
at Boston’s Brigham and Women’s Hospital, both of which sought–and found–genes
associated with chronic obstructive pulmonary disease. “Without
Nan and Christoph’s statistical methodology, we may not have identified
the associations,” says the Channing’s Edwin K. Silverman,
senior author on both papers. But those studies considered a few
hundred SNPs apiece. The obesity study looked at huge numbers–116,204,
to be precise.

“That was the proof of concept,” says
Lange. The pièce de résistance
was being able to replicate the results of the obesity study in four subsequent
trials, each with a very different population–more than 10,000 individuals
in samples of Western European ancestry, African Americans, and children. “There
are no other common obesity gene-variant associations that are reproducible,”
says Helen Lyon, of the Hirschhorn Laboratory at Children’s Hospital Boston,
who led two of the replication studies. Adds James Ware, “Their methodology
sets a new standard for documenting an association between genetic makeup and
a health outcome.”

The multiple-comparison problem has always been present in
familial genetic studies, but its ability to muck up the works multiplied exponentially
as the capabilities
of DNA-SNP matching technology grew. As recently as 2003, testing 10,000 SNPs–which
cover a small proportion of the genome–against human DNA was considered
state of the art.

Today, researchers can investigate 500,000 SNPs at once, sweeping the
entire genome like a Geiger counter sweeping soil. At their fingertips
are computer-readable
chips from Affymetrix, Inc., with 500,000 tiny test wells on their surface,
about 20 for each SNP. The researchers simply apply sample DNA to the chemically
treated wells and track the signals that go off, to see whether a SNP in the DNA
coincides with a SNP in a particular well.

Laird and Lange’s biostatistical breakthrough
may have come just in time. Chips that can analyze a million SNPs at once
are not far off.

The two HSPH biostatisticians are deeply
ensconced in unraveling the double helix further. They are working
to find genes
implicated in bipolar disorder,
disease, and other complex diseases that develop in the place where genes
and the environment meet. 

“What’s the mission of public health?” asks Laird rhetorically. “One
mandate is to understand the causes of human disease and disorders. To
me, our findings address the research piece of that mission.”