GW Computational Biology Institute: Making Sense of Big Data


April 14, 2014

From left, Max Alekseyev, Keith Crandall, Marcos Perez-Losada and Jeremy Goecks of the Computational Biology Institute

From left, Max Alekseyev, Keith Crandall, Marcos Perez-Losada and Jeremy Goecks of the Computational Biology Institute

Only a decade has passed since the Human Genome Project was completed, yet DNA sequencing and genetic data sharing already have the potential to revolutionize health care.

It is drastically easier and cheaper to sequence genomes than it was even a few years ago, and new technologies have opened up the possibility of personal genome sequencing as a tool for diagnosing and treating fatal diseases. But each human genome contains an estimated 30,000 genes spread across 3.2 billion base pairs, and sequencing it at reasonable coverage produces roughly 100 gigabytes of data, or about 20 percent of a typical laptop's storage space.
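As a rough illustration, a back-of-the-envelope calculation shows how a 3.2-billion-base genome reaches the 100-gigabyte range; the coverage depth, per-base storage cost, and laptop disk size below are assumptions for the sketch, not figures from the article.

    # Back-of-the-envelope estimate of raw sequencing data volume.
    # Assumed values (not from the article): 30x coverage and ~1 byte
    # stored per sequenced base.
    GENOME_LENGTH_BP = 3.2e9   # base pairs in the human genome
    COVERAGE = 30              # average times each position is sequenced
    BYTES_PER_BASE = 1         # rough storage cost per base call

    total_bytes = GENOME_LENGTH_BP * COVERAGE * BYTES_PER_BASE
    print(f"~{total_bytes / 1e9:.0f} GB per genome")  # ~96 GB, close to 100 GB

    LAPTOP_DISK_GB = 500       # typical 2014-era laptop drive
    print(f"~{total_bytes / 1e9 / LAPTOP_DISK_GB:.0%} of a laptop's disk")  # ~19%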

So how will researchers and physicians store and manage that enormous amount of data, and more importantly, will they be able to understand it?

A team of researchers at the George Washington University's Computational Biology Institute (CBI) is poised to answer this and other challenging questions about what is called "big data," a buzzword for data sets too large or complex for conventional software to handle.
