March 12, 2007
2:00 PM, 7220 Wean Hall
Srinath Sridhar
Thesis Proposal
Title: Algorithms for Analyzing Intraspecific Sequence Variation
Abstract:
Analysis of human genetic variation has gained significant momentum due to the success of many large-scale sequencing projects. Within the last five years, about 4 million single nucleotide polymorphisms (SNPs) have been genotyped over hundreds of individuals belonging to several ethnic groups. Recently, researchers have made significant progress in detecting and cataloging copy number polymorphisms (CNPs) as well. These large-scale efforts have now made it possible to analyze genetic variation within a single species, mainly humans, at very fine-scales. In our work, we address a fundamental question: how can we explain and understand these genetic differences that are present within a single species?
In this work, we primarily focus on two methods to analyze SNP variation data within humans. We develop maximum parsimony phylogenetic tree reconstruction algorithms that are specifically catered to work on SNP data. Such a phylogeny should cluster closely related individuals (perhaps an ethnic group) together. Therefore, these techniques are widely used to answer questions on human migration. A second method to understand genetic variation is to directly cluster the SNP data based on the property that individuals within a cluster would share similar genotypes (and phenotypes).
Mathematically, we work with two variants of the phylogeny reconstruction problem, both of which are NP-complete. The first variant is equivalent to finding a Steiner minimum tree on a hypercube and the second is a generalization of this problem. We solve the two variants in polynomial time when the size of the phylogeny (Steiner minimum tree) is `small’. We solve the clustering problem by finding the max-cut of a graph, a well-known NP-complete problem. We can show that if the graph is generated with the clusters `planted’, then our algorithm returns the max-cut in polynomial time.
Thesis Committee:
Guy E. Blelloch, Co-Chair
Russell Schwartz, Co-Chair
R. Ravi
Eran Halperin, International Computer Science Institute, Berkeley