Mathematics of Information Technology and Complex Systems

Assembly and Analysis of 2-base Encoded Sequencing Data

Development of algorithms for Next Generation Sequencing (NGS) technologies is one of the most important applications of mathematical and computational methods in the field of biology. The ABI SOLiD NGS technology generates dibase-encoded sequence data, often known as color-space. Color-space presents an especially challenging subset of NGS data, as it does not represent the DNA sequence directly; instead, each color encodes the transition between two adjacent bases. Under the Mprime grant "Assembly and Analysis of 2-base Encoded Sequencing Data", we are working to attack some of the most difficult computational problems in the analysis of color-space datasets, including genome assembly and variation discovery. By taking advantage of sophisticated computational methods, we aim to make effective use of dibase encoding to simplify some of the problems associated with NGS data analysis.
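To illustrate why color-space does not represent the DNA sequence directly, the following is a minimal sketch of dibase encoding and decoding. It assumes the standard SOLiD color code, in which each color (0 to 3) is the XOR of 2-bit codes for two adjacent bases (A=0, C=1, G=2, T=3). The function names `encode_colorspace` and `decode_colorspace` are hypothetical, chosen for illustration only; this is not the project's software.

```python
# Assumed 2-bit base codes; the color of an adjacent base pair is the XOR of the two codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colorspace(seq):
    """Return the first base followed by one color digit per adjacent base pair."""
    if not seq:
        return ""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def decode_colorspace(read):
    """Recover the bases by walking the colors from the known first base."""
    if not read:
        return ""
    bases = [read[0]]
    for digit in read[1:]:
        # Each base depends on the previous decoded base and the current color.
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ int(digit)])
    return "".join(bases)


if __name__ == "__main__":
    read = encode_colorspace("ATGGCA")
    print(read)                       # A31031
    print(decode_colorspace(read))    # ATGGCA
    # A single miscalled color changes every base decoded after it.
    print(decode_colorspace("A21031"))  # AGTTAC
```

Because each decoded base depends on the one before it, a single color error corrupts the rest of a naively decoded read, while a true single-base variant changes two adjacent colors. This is one reason analysis is often done in color-space itself rather than after translation to bases.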