Julia Frugoli’s NSF-funded study of legumes aims to understand, at the cellular and molecular levels, how and why peas and beans can do something few other plants on the planet can: extract nitrogen from the air, rather than the soil, and use it to fuel their growth.
High-performance computing makes it possible for Frugoli to measure gene expression, the process by which the information stored in DNA directs the production of proteins, in this case in response to the bacteria.
All crops require nitrogen, a key component of chlorophyll, which is required for photosynthesis. Legumes pull nitrogen from the air (about 78 percent of air is nitrogen) and send it to nodules within their roots, where bacteria convert it into a form the plant can use to grow.
It is common practice to apply nitrogen fertilizer to farmland, but roughly half of this nitrogen doesn’t stay in the field. Instead of feeding the plants, it flows into the surrounding water table, polluting watersheds with nitrates that contaminate everything from local wells in South Carolina to a vast “dead zone” in the northern Gulf of Mexico, making water dangerous to humans and suffocating to aquatic life. Scientists all over the world are searching for ways to curb nitrogen fertilizer runoff before it pollutes the planet’s watersheds beyond recovery.
Frugoli and her team want to figure out how other plants could “nodulate” — create the nodules in their roots capable of converting nitrogen from the air rather than the ground.
The computational work required to sift through all the genetic information that may hold the solution is exactly what the Palmetto Cluster was built to do.
“A desktop computer can’t even process a piece of our work in a week,” Frugoli says. “The fact that it can take a few days for each experiment on one of the fastest computers there is gives you an idea of how critical the Palmetto Cluster is as a resource.”
The ability to measure gene expression is exponentially greater today than it was two decades ago when Frugoli got her start.
“We’ve gone from one gene at a time to hundreds to thousands and now hundreds of thousands in a single experiment,” she says. “That means more machines and more data. And more data requires new approaches.”
When she started her work at Clemson in 2000, she might have looked at what level a gene was expressed under one set of conditions. “Now we process millions of bits of data to look at all of the genes at once under multiple sets of conditions, and we get actual numbers,” Frugoli says.
That also requires someone to develop the algorithms that make sense of the data streaming off the sequencing machines and that compare millions of data points across experiments.
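To give a flavor of the kind of comparison such algorithms perform, here is a minimal sketch, not Frugoli’s actual pipeline, of flagging genes whose expression shifts between two conditions. The gene names, read counts, and 2-fold threshold are all invented for illustration.

```python
import math

# Hypothetical per-gene read counts from a sequencer under two
# conditions, e.g. roots before and after exposure to bacteria.
counts_control = {"geneA": 100, "geneB": 50, "geneC": 400}
counts_inoculated = {"geneA": 800, "geneB": 55, "geneC": 100}

def log2_fold_change(before, after, pseudocount=1):
    """Expression change from one condition to the other; the
    pseudocount keeps genes with zero reads from dividing by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))

changes = {
    gene: log2_fold_change(counts_control[gene], counts_inoculated[gene])
    for gene in counts_control
}

# Flag genes whose expression shifts more than 2-fold in either
# direction (|log2 fold change| > 1).
responsive = sorted(g for g, c in changes.items() if abs(c) > 1)
print(responsive)  # ['geneA', 'geneC']: geneA up, geneC down
```

A real experiment applies this idea to hundreds of thousands of measurements at once, with statistical corrections for sequencing depth and noise, which is why the work lands on a cluster rather than a desktop.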