Decoding DNA’s Secrets With Advanced Computing

07-27-2022

Each teaspoon of ocean water contains millions of microscopic organisms and enough genetic data to fill a hard drive. This astounding abundance of information represents incredible potential – and a monumental challenge. It can provide profound insights into the microbes that make life on Earth possible, but only if we can make sense of it.

An emerging field, called bioinformatics, harnesses computational tools to understand organisms’ ecological roles and potential. Fusing genetics, mathematics, and programming, researchers use advanced computer algorithms to unlock otherwise hidden information from genes to find answers about life, its origins, and its possibilities.

“The techniques we use are not specific to oceanography; they are applied to all kinds of life,” said Senior Research Scientist Ramunas Stepanauskas. “We are experts in the genomics of single-celled organisms in the ocean, but there's no environment I can think of on Earth from which we haven't received samples, analyzed them, and produced data.”

Microbes represent the vast majority of marine life. They form the base of the food web and carry out global processes that sustain all life on the planet. Stepanauskas directs Bigelow Laboratory’s Single Cell Genomics Center, where researchers have been developing and applying bioinformatics methods since 2009.

Scientists in the Single Cell Genomics Center translate DNA from individual microbes into sequences of letters that represent the genes in an organism. These sequences can be millions of characters long and contain the cellular instructions for an organism’s traits. They also store a wealth of information about a microbe’s evolutionary past and its potential.

“At its core, bioinformatics refers to extracting information from DNA,” said Research Scientist Julia Brown. “We disentangle this data to answer high-level questions that help us understand organisms and the roles they play in critical ocean processes.”

Bioinformatics enables the search for patterns in a sea of information, and Bigelow Laboratory researchers routinely examine billions of genetic sequences simultaneously to find them. This requires networked servers with the processing power of hundreds of typical computers, such as Bigelow Laboratory’s high-performance computing cluster – affectionately referred to as Charlie.

“There's no way that humans alone could sift through the amount of data generated these days,” said Postdoctoral Scientist Robin Sleith. “What takes 10 hours to generate on a modern sequencer would take lifetimes to do manually.”

Sleith works with Senior Research Scientist Pete Countway, in partnership with researchers across the state, on the Maine eDNA project. He and his colleagues use bioinformatics to study cells from the environment and understand pressing issues, such as harmful algal blooms. He looks at the genetic information in water samples to understand ecological complexity, from the organisms themselves to their relationships with each other and the way they are changing alongside the environment.

“With the amount of data we have now and the complexity of the questions we are asking, the only option we have is to rely on these algorithms and computer processes to make that tractable,” he said.

Bioinformatics has grown out of that challenge, one that Stepanauskas, Brown, and their Single Cell Genomics Center colleagues have been working on for more than a decade. They provide a host of genomics services to scientists at Bigelow Laboratory and around the world, most of which involve producing massive amounts of genomic data from individual cells without the need to cultivate them in the lab.

“We have gained a lot of experience applying existing bioinformatics tools and developing our own for use with single-celled organisms,” Stepanauskas said. “We were the first center for microbial single cell genomics in the world, and we have unique resources and a unique perspective around bioinformatics.”

This spring, he and Brown leveraged that expertise to lead Bigelow Laboratory’s first bioinformatics course. The course focused on single cell genomics and hands-on training in computational techniques to study microbial ecology and evolution. Eighteen students, ranging from undergraduates to faculty members, traveled from four continents to Bigelow Laboratory to receive the specialized training.

“The tools we taught will help with any bioinformatic analysis, whether it has to do with single-celled organisms or not,” Brown said. “We want to get these tools into as many hands as possible and empower people to do these analyses themselves, because getting started can be very intimidating.”

Bioinformatic techniques are becoming an integral part of microbiology, almost a prerequisite for working in the field. The opportunities that bioinformatics provides are vast and still largely unexplored. As Bigelow Laboratory researchers advance their own work, they also hope to continue training scientists to better understand the approaches and make their own discoveries.

“All of microbiology is moving toward more complicated genomic data,” Brown said. “It's becoming increasingly necessary to have a background in bioinformatics in order to move science forward. We want to help build a strong network of researchers who are using these techniques to improve science overall.”