Plotting a Course in an Ocean of Big Data

03-18-2021

In its most basic form, science is the process of making observations, collecting data, and using it to answer questions about the world around us. To scientists, data is everything.

For decades ocean science has been hungry for data. Our oceans are so vast and complex that collecting information from them is a herculean task. However, with increasingly portable computers, an abundance of space satellites, and revolutionary ways to gather and store digital information, modern researchers now find themselves with more data on their plates than they can handle.

“Big data” is an increasingly common phrase that refers to the efforts to use this unprecedented influx of information. Those efforts come with big challenges, but even bigger opportunities.

The volume and velocity of data are the most intuitive challenges. As technologies improve, enormous amounts of data are being generated at a constantly increasing rate. However, the most difficult challenge of big data is the variety – data from many different sources about many different things in many different formats that must be compared, combined, and processed.

But big data is a natural fit for global ocean science, and overcoming the challenges means unlocking knowledge and tools that are impossible by any other means.

“What's interesting in ocean science is that you're taking genomics data, satellite data, buoy data, and even some social data, and you're trying to blend it all together,” said Senior Research Scientist Nick Record. “I think the real challenge in ocean sciences is combining all those different types of big data together to answer a question about right whales, toxic algal blooms, or some other important issue.”

Record is one of the many Bigelow Laboratory scientists who use big data in their research, and he is currently leading an effort to develop an ocean forecasting center at Bigelow Laboratory. The new center will use big data and artificial intelligence algorithms to provide tools and insights too complex to develop through other methods. Researchers at Bigelow Laboratory have long been building expertise in these areas, and this center will create a hub focused on answering foundational research questions and creating solutions to support society.

One of his colleagues, Research Scientist Catherine Mitchell, works with ocean color and carbon cycling. She gets her data largely from satellites, autonomous underwater vehicles, and measurements taken from ships. As the technology develops, the use of autonomous systems grows, and with it the amount of data.

“We're all changing the way we do our science in terms of the types of instruments people are using when they're out in the field, or the fact that we have these autonomous platforms,” she said. “It behooves us to learn methods about how to interact with those data to build a bigger picture of a given system, whether that's integrating a buoy in the Gulf of Maine with daily satellite images, or something further afield.”

Despite the recent advancements and attention, big data research existed before the term made headlines. Senior Research Scientist Ramunas Stepanauskas is a microbiologist who specializes in single-celled organisms and genomics, a field that helped pioneer big data research.

“Genomes are like blueprints of the organism and by looking at what genes they have, we can tell what kind of ecological roles they likely fill in the ocean, which then can be verified by experimentation,” he said. “But examining and decoding that information is incredibly challenging, as the genetic information in a single drop of seawater can easily fill a standard computer hard drive.”

In 2003, researchers completed the Human Genome Project, sequencing a human genome that now serves as the basis for genetic studies. That project cost $5 billion and took more than a decade to complete. With the technology and knowledge available today, scientists could complete the same task in a few days for thousands of dollars. The relative speed and low cost of the process make the approaches for handling the data more important than ever.

Stepanauskas’ team is creating a Global Ocean Reference Genomes database. Their work will map the genomes of tens of thousands of microbes to their genetic potential, in a similar way to what researchers previously did for the human genome. The scale of this work is immense, but Stepanauskas thinks big data is more about a mindset than a size.

“I think it's not the scale of data, it's the approach,” he said. “It is the way you look at the value of data and that you don't accumulate data to only answer predefined questions. Instead, you design your study in a way that the accumulated data, with the help of advanced computational tools such as artificial intelligence, can serve as a resource to answer a multitude of questions you might not even know yet.”

This approach is an important part of the future plan for big data at Bigelow Laboratory. As the institute works to secure the final funding that is needed to launch the ocean forecasting center, the team behind it is already amassing ideas for new sources of data and new ways of combining them to develop new real-time ocean forecasting tools. In the future, these tools may help shellfish farmers protect their harvest from toxic algae and ship captains steer clear of endangered North Atlantic right whales.

The nascent center’s team is also beginning the work needed to foster an interdisciplinary, collaborative community that can help turn big data into needed solutions. Already, they have organized an international data science workshop with participants from over 20 countries on six continents. Looking ahead, Record said that the team will prioritize open access to their methods and data, and inclusivity in their teaching and science practices.

“This is a space where we have a lot of expertise, but we want to get as many people involved with us as possible,” Record said. “There’s incredible promise in this research, but it will take a lot of complex work and diverse perspectives to realize its potential to provide much needed knowledge and equitable solutions.”