New Programming Language Uncovers Hidden Environmental Pollutants


A new computer language developed at the University of California, Riverside is changing how scientists detect hidden environmental pollutants, and it does not require them to learn traditional programming.

Mass Query Language, or MassQL, functions as a search engine for mass spectrometry data, making it far easier and faster to scan massive datasets for chemical signatures. By putting this capability into the hands of biologists and chemists, many of whom don’t have programming experience, the tool is already accelerating discovery and unveiling pollutants previously overlooked.

"Mass spectrometry data is like a chemical fingerprint," explains Mingxun Wang, the UC Riverside computer scientist who created MassQL. "It tells us what molecules are present in a sample—be it air, water, or even blood—and in what quantities."

Making Advanced Data Analysis Accessible

Traditionally, working with mass spectrometry data required specialized coding skills that many biologists and chemists lack. Wang's goal was to break that barrier. "We wanted to give researchers the ability to mine their own data exactly how they want to, without spending months or years learning to code," he said.

MassQL simplifies the process by offering a user-friendly query language that enables scientists to search for specific molecular patterns in large datasets. As Wang puts it, "It acts like a filter. You tell it what to look for, and it sifts through everything to find matching patterns."
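
The filter idea can be illustrated with a minimal Python sketch. This is not MassQL itself or its syntax. The `Spectrum` class, the `filter_spectra` function, and every number below are hypothetical, invented only to show what "tell it what to look for" could mean: keep each spectrum that contains a peak near a chosen mass-to-charge (m/z) value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    """A toy stand-in for one mass spectrum (hypothetical structure)."""
    scan: int
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def has_peak(spectrum: Spectrum, target_mz: float, tolerance: float) -> bool:
    """Return True if any peak lies within `tolerance` of `target_mz`."""
    return any(abs(mz - target_mz) <= tolerance for mz, _ in spectrum.peaks)


def filter_spectra(spectra: List[Spectrum], target_mz: float,
                   tolerance: float = 0.02) -> List[Spectrum]:
    """Keep only the spectra that contain the pattern being searched for."""
    return [s for s in spectra if has_peak(s, target_mz, tolerance)]


# Made-up example data, not real measurements.
spectra = [
    Spectrum(1, [(77.04, 120.0), (98.98, 900.0), (152.06, 300.0)]),
    Spectrum(2, [(91.05, 500.0), (119.05, 200.0)]),
    Spectrum(3, [(98.99, 450.0), (215.10, 80.0)]),
]

matches = filter_spectra(spectra, target_mz=98.98)
matching_scans = [s.scan for s in matches]
print(matching_scans)  # [1, 3]
```

A query language replaces the hand-written functions with a single declarative statement, so the researcher specifies only the pattern and the tolerance.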

Finding Hidden Chemicals in Public Waterways

To demonstrate the language's power, Nina Zhao, a postdoctoral researcher at UC San Diego, used MassQL to scan every publicly available mass spectrometry dataset related to global water samples. Her target? Organophosphate esters, a class of chemicals commonly used in flame retardants.

"There are literally a billion measurements in this kind of data," said Wang. "You simply can't go through it manually. MassQL makes that possible."
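
To give a sense of what scanning many datasets involves, here is a small Python sketch under stated assumptions. The dataset names, the peak values, and the `count_hits` helper are all hypothetical and do not reflect how MassQL or any public repository is implemented. The sketch walks through each dataset one scan at a time and tallies how many scans contain a peak near a target m/z value.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Peak = Tuple[float, float]     # (m/z, intensity)
Scan = Tuple[int, List[Peak]]  # (scan number, peaks)


def matching_scans(scans: Iterable[Scan], target_mz: float,
                   tolerance: float) -> Iterator[int]:
    """Yield scan numbers lazily so a dataset never has to fit in memory."""
    for scan_id, peaks in scans:
        if any(abs(mz - target_mz) <= tolerance for mz, _ in peaks):
            yield scan_id


def count_hits(datasets: Dict[str, Iterable[Scan]], target_mz: float,
               tolerance: float = 0.02) -> Counter:
    """Count matching scans per dataset."""
    hits: Counter = Counter()
    for name, scans in datasets.items():
        hits[name] = sum(1 for _ in matching_scans(scans, target_mz, tolerance))
    return hits


# Made-up datasets, not real water samples.
datasets = {
    "river_a": [(1, [(98.98, 900.0)]), (2, [(91.05, 500.0)])],
    "lake_b": [(1, [(120.08, 40.0)])],
    "estuary_c": [(1, [(98.99, 300.0)]), (2, [(98.97, 150.0)])],
}

hits = count_hits(datasets, target_mz=98.98)
for name, n in hits.items():
    print(name, n)
```

Because the matching step is a generator, the same loop works whether a dataset holds three scans or millions, which is the property that matters at the scale Wang describes.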

Zhao's search revealed not only known organophosphate compounds but also previously undocumented variants, including breakdown products that form when these chemicals degrade over time. Some of these compounds, she warned, are known to disrupt hormones and cause cardiovascular problems in both humans and wildlife.

"These weren't just theoretical risks," Zhao said. "These compounds are out there, and now we know where."

A Tool for a Broad Range of Discoveries

Beyond pollutants, MassQL has already proven useful across dozens of research areas. In a recently published paper in Nature Methods, Wang and his colleagues describe more than 30 use cases for the language.

Examples include:
Identifying fatty acids as biological markers of alcohol poisoning
Tracking bacterial signaling molecules to understand microbial communication
Searching for new antibiotics to help address the global resistance crisis
Detecting "forever chemicals" on playground surfaces and other public areas

"In the past, I would get frequent requests to build one-off software tools for each of these applications," Wang said. "I figured there had to be a better way—one language that could handle it all."

That vision is now a reality.

Building a Common Language Across Disciplines

Creating MassQL wasn't just about writing code. One major challenge was ensuring that both chemists and computer scientists could understand and use the tool effectively.

To address this, Wang's team brought together 70 scientists from across disciplines to standardize the language. These experts helped define the terms and data structures the software would use, ensuring the tool would be broadly useful and intuitive.

"That collaboration was critical," Wang said. "MassQL had to bridge two worlds—chemistry and computing."

Enabling a New Era of Open Discovery

MassQL is freely available, and because it can tap into existing, publicly shared datasets, scientists can begin new investigations without collecting samples themselves. That lowers both the cost and the time needed to start a study.

"The language allows me to track everything that's ever been detected in all data on air, soil, water, and even in the human body," Zhao said. "If it exists, we can search for it."

Wang is hopeful that MassQL will continue to unlock scientific breakthroughs across environmental science, medicine, and beyond.

"I'm excited to hear about the discoveries that could come from this," he said. "We've only just scratched the surface."