At the Vanguard of a Revolution in Computational Genetics
If Itsik Pe’er had a time machine, he would probably beam himself into the 11th century and collect DNA samples from the small community of Ashkenazi Jews living in Eastern Europe. That would be the easiest way to identify the genetic mutations that predispose their millions of descendants to maladies like Tay-Sachs, Crohn’s and Parkinson’s disease.
But until time travel is possible, Pe’er, an associate professor of computer science at Columbia Engineering, will rely on computational genetics.
Using mathematics and computer analytics, Pe’er is identifying the genetic makeup of the founding Ashkenazi Jews by analyzing the full DNA sequences of hundreds of their descendants in the New York City area.
“All the Ashkenazis living today are essentially mixes of a small number of individuals who lived hundreds of years ago,” explains Pe’er. “Because the population has been relatively isolated, the gene pool is relatively small, which makes it possible to catalog an entire population.”
Doing so will allow Pe’er to compare these genomes to reference sets of non-Ashkenazi DNA—collected through the Human Genome Project and other initiatives—and zero in on Ashkenazi-specific genetic mutations associated with different diseases.
Pe’er, who arrived at Columbia in 2006, is taking advantage of a flood of new data unleashed by technological advances that allow scientists to read the entire 3-billion-nucleotide sequence of individual genomes at high speed and low cost.
By examining similarities in DNA segments shared by large numbers of related individuals, his lab developed statistical models that allow him to make generalizations about entire populations. The mix of genes that every child inherits from each parent travels in long sequences of code that remain together and are remarkably consistent from one generation to the next.
The shared segments get shorter with each generation, but they shrink at a consistent and predictable rate. As a result, Pe’er can use his models to estimate how distantly two individuals are related by measuring the length of their common DNA segments. First cousins, for example, are likely to share about 12.5 percent of their genomes through their common grandparents, while second cousins have a mere three percent in common.
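The arithmetic behind those percentages can be sketched in a few lines of Python. This is a textbook approximation, not Pe’er’s statistical model: it assumes full cousins with no other shared ancestry, and it uses the common rule of thumb that a shared segment averages about 100/m centimorgans when m generational steps (meioses) separate two relatives. The function names are made up for illustration.

```python
from fractions import Fraction


def expected_shared_fraction(cousin_degree: int) -> Fraction:
    """Expected fraction of the genome shared by full k-th cousins.

    Assumption: the only common ancestors are one ancestral couple,
    which gives (1/2) ** (2k + 1) for k-th cousins.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return Fraction(1, 2 ** (2 * cousin_degree + 1))


def meioses_between(cousin_degree: int) -> int:
    """Generational steps on the path linking two k-th cousins."""
    return 2 * (cousin_degree + 1)


def expected_segment_length_cm(cousin_degree: int) -> float:
    """Rule-of-thumb mean shared segment length, in centimorgans.

    Assumption: segment lengths average 100 / m cM for m meioses.
    """
    return 100.0 / meioses_between(cousin_degree)


if __name__ == "__main__":
    for k in (1, 2, 3):
        share = float(expected_shared_fraction(k)) * 100
        length = expected_segment_length_cm(k)
        print(f"cousin degree {k}: {share:.3f}% shared, mean segment {length:.1f} cM")
```

Under these assumptions, first cousins come out at 1/8 (12.5 percent) and second cousins at 1/32 (about 3 percent), matching the figures above. The shrinking mean segment length is what makes it possible to work backward from a measured segment to an estimate of how distantly two people are related.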
Pe’er published a groundbreaking study in the November issue of the "American Journal of Human Genetics" analyzing the migratory and population patterns of Ashkenazi Jews and the Masai people of Kenya. He showed that the Masai, who are semi-nomadic and marry across village boundaries, descend from a wide genetic ancestry fed by migrations and intermingling among a large number of tribesmen.
The Ashkenazi, by contrast, descend from a small number of founders—perhaps only hundreds of individuals in late medieval times—and have remained largely genetically isolated since, even as their descendants now number several million.
Pe’er earned his Ph.D. in computer science at Tel Aviv University in 2002 and then worked at Israel’s Weizmann Institute of Science and the Broad Institute of Harvard and MIT in Cambridge, Mass.
Genetics offered an opportunity to use computer science to affect the real world, “as opposed to more theoretical computer science,” he said. He graduated just as a genetics revolution had begun.
A dozen years ago it was possible to look at only a limited number, perhaps 20,000, of genetic markers called microsatellites. Two individuals usually needed to have at least one million base pairs of DNA in common for the shared segment to be identifiable using microsatellites. By around 2005, scientists could narrow down similarities between two individuals to segments of 10,000 base pairs.
Over the last year it has become possible to sequence an individual’s entire genome for about $3,000, which makes it affordable for researchers like Pe’er to use the technique on a large scale.
In the case of the Ashkenazi, Pe’er and colleagues have sequenced genomes from 140 subjects with four Ashkenazi grandparents and are aiming for a total of 500 to provide a “good representation of the gene pool of the entire set of founders of the population,” he said.
The project provides a peek at a new kind of genetic analysis that many believe will revolutionize how doctors treat patients. In 10 years, Pe’er predicts, sequencing an entire genome will be so inexpensive that everyone could know his or her genetic makeup.
That information will help doctors predict which medications will most likely work effectively for a particular individual by examining how others with similar genetic characteristics have responded in the past.
“Ten years from now, personalized genetics is going to be ubiquitous,” Pe’er said. “Should they choose to, everybody’s genome is going to be part of records that physicians use when prescribing drugs, so that treatments will be optimally targeted to your genome in terms of dosage, desired responses and minimal adverse side effects.”