CRD’s Deb Agarwal Named to Committee to Help Shape California State Water Data Structure
October 19, 2020
Deb Agarwal, head of the Data Science and Technology Department in the Computational Research Division, is one of 11 members named to the inaugural steering committee of the California Water Data Consortium (CWDC). The committee will be formally installed at an October 22 online meeting.
Agarwal has been developing tools to help improve the collection, organization, and analysis of water-related data since 2006, when she helped found the Berkeley Water Center as a collaboration between Berkeley Lab and UC Berkeley. Much of her work has been in support of data collection by AmeriFlux, a network of locally managed sites measuring ecosystem CO2, water, and energy fluxes in North, Central, and South America.
The CWDC describes itself as “a neutral space that facilitates collaboration and sustained engagement across public, private, and nonprofit sectors to improve the data lifecycle and increase access to high quality, comprehensive, and interoperable data available to inform water decision-making at every level of government.”
Agarwal heard that the CWDC was forming a steering committee to work with the consortium and its partners to help implement AB-1755, California’s Open and Transparent Water Data Act, which requires the Department of Water Resources and other state agencies “to create, operate, and maintain a statewide integrated water data platform; and to develop protocols for data sharing, documentation, quality control, public access, and promotion of open-source platforms and decision support tools related to water data.”
“When I was interviewed about joining the committee, I described the approach we have to work with users and learn what they want in their data repository rather than just following conventional wisdom,” Agarwal said. “We’ve found that when those who collect the data get to help decide how to report the data, they are more likely to commit to following through.”
Agarwal cites AmeriFlux as an example of the value of standardizing data tools. AmeriFlux researchers track the exchange of carbon dioxide between plants and soil on the ground and the planet’s atmosphere. They take measurements on an hourly basis at sites representing a range of ecosystems, from the Arctic tundra to North American prairies and Amazonian rainforests.
When the Berkeley Lab team took over managing AmeriFlux, there were around 130 sites across North, Central, and South America, but only about half were active. Now there are more than 500 sites, of which 364 have contributed data in the last five years.
The AmeriFlux data server now includes semi-automated data format and quality checking, along with a database and schema to organize and archive related data gathered at the sites. The team has also implemented data processing software to create a consistent data product.
Agarwal’s department has also worked with Charuleka Varadharajan, a scientist in the lab’s Earth and Environmental Sciences Area, and her team to develop a central data repository for the Earth and Environmental System Sciences program in the Biological and Environmental Research Office in DOE’s Office of Science. They laid the groundwork by visiting multiple large research groups in the program and asking what the scientists wanted from the repository, how they wanted to interact with the data, and what features they wanted.
“We took all of the work with users and created a user-centered repository where we build the features the users want,” she said. “As a result, we hope to become the repository of choice for many related projects.”
To help the state implement AB-1755, six separate state agencies will need to come together and decide how to consistently report data for a central repository.
“They will need to build the tools to make it advantageous for them to work together,” Agarwal said. “If they manage to do that, I honestly think we can move people forward and help them build a strong data system.”
Managing the state’s water resources has long been a battle as various interests seek to balance supply with demands from cities, the environment, agriculture, and industry. Much of the water originates far from the end users, and massive aqueducts have been built to try to quench the state’s thirst. In addition to state and federal players, many cities have their own water agencies, and there are more than 500 special water districts in California. On top of all that, there are often complex questions over who owns the rights to what water.
“Once you start looking around, you realize how important water is to this state,” Agarwal said. “By not trying to be Herculean but by taking a pragmatic approach I think we can move people forward on making the water data available and accessible.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.