Longest Record of Continuous Carbon Flux Data is Now Publicly Available
Berkeley Lab tools foster collaboration amongst a global coalition of ecosystem researchers
September 27, 2016
Linda Vu, 510.495.2402
Around the world, from tundra to tropical forests and a variety of ecosystems in between, environmental researchers have set up micrometeorological towers to monitor carbon, water and energy fluxes: measurements of how carbon dioxide (CO2), water vapor and energy (heat) circulate between the soil, plants and atmosphere. Most of these sites have been collecting data continuously, some for nearly 25 years, monitoring ecosystem-level changes through periods of extreme drought and rising global temperatures. Each site contributes to a regional network, such as the European Network (Euroflux) or the Americas Network (AmeriFlux), and the regional networks together make up a global network called FLUXNET.
Recognizing that a plethora of scientific insights could be gleaned from this information, over 450 sites worldwide are sharing their observation data with the FLUXNET database. The project’s most recent data release—FLUXNET2015—includes some of the longest continuous records of ecosystem data ever taken. The information has undergone extensive quality checks and controls (QA/QC) and is now publicly available online.
Computer scientists at the Department of Energy’s (DOE’s) Lawrence Berkeley National Laboratory (Berkeley Lab) contributed to the development of the FLUXNET database and website, as well as the software tools that automatically perform QA/QC and fill gaps in field observations. They also helped build tools that allow researchers to easily upload, download and share datasets, as well as track how and where each site’s data will be used. Much of this work was done in collaboration with colleagues at the Max Planck Institute of Biogeochemistry and the Universities of California-Berkeley (UC Berkeley), Virginia (UVA) and Tuscia, Italy.
The AmeriFlux Management Project (funded by the DOE and led by Berkeley Lab), the European Ecosystems Fluxes Database and the FLUXNET project worked with several regional networks to process and harmonize all of the information in the FLUXNET2015 release.
According to Dennis Baldocchi, UC Berkeley Professor and FLUXNET Principal Investigator, this data is allowing researchers to ask questions about long-term trends in climate and ecosystem health that would have previously been impossible to investigate. The data could also be used to help a variety of people, from meteorologists to farmers, make better-informed decisions.
“We know that the concentration of CO2 changes in an ecosystem over time, and now we can look at how these changes affect the photosynthesis or water usage of an entire forest or desert,” he says. “We can also look at the effects of extreme weather events, like hot and cold spells, on an ecosystem.”
FLUXNET: A Collaboration Decades in the Making
According to Baldocchi, an international flux collaboration began in 1995, when a handful of scientists who were just starting to collect year-round observations of the CO2 exchange between the ground and atmosphere met in La Thuile, Italy, to discuss the best methods for collecting and recording data. Shortly after that meeting, the DOE created the AmeriFlux program and began funding a handful of researchers to measure CO2 exchange in a number of ecosystems around North America. Meanwhile, European counterparts were launching similar studies of ecosystems across Europe, and NASA launched the Aqua satellite to understand Earth’s water cycle and the Terra satellite to explore connections between Earth’s atmosphere, land, snow and ice, ocean and energy balance.
Concurrently, NASA funded the FLUXNET collaboration to blend data from the regional networks so that it could be used to validate NASA’s satellite data. In 1999, the official FLUXNET group met again at the Marconi Conference Center in Marshall, California, to work on harmonizing their datasets. This included agreeing on common names, units and time-steps in their field data descriptions, which was crucial for multi-site analysis. The researchers also developed a process for filling gaps in the field data.
“When you are observing in nature it’s inevitable that you will have gaps in your data because a raindrop or bug messes with a sensor, or a sensor may malfunction. We use statistical models to fill those gaps of information,” says Baldocchi.
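As a rough illustration of what gap filling means in practice, here is a minimal Python sketch. It is not the FLUXNET pipeline's method, which the article does not describe in detail. It assumes a much simpler stand-in technique, mean diurnal variation, in which a missing reading is replaced by the average of the valid readings taken at the same time of day. The function name and the `steps_per_day` parameter are hypothetical.

```python
import math
from collections import defaultdict


def fill_gaps_mean_diurnal(values, steps_per_day=48):
    """Fill missing readings (None or NaN) with the mean of the valid
    readings recorded at the same time-of-day slot.

    Assumption: `values` is a regularly spaced series that starts at the
    first slot of a day (48 slots per day corresponds to half-hourly data).
    """
    def is_missing(v):
        return v is None or math.isnan(v)

    # Accumulate a running sum and count for each time-of-day slot.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in enumerate(values):
        if not is_missing(v):
            slot = i % steps_per_day
            sums[slot] += v
            counts[slot] += 1

    filled = []
    for i, v in enumerate(values):
        if is_missing(v):
            slot = i % steps_per_day
            # If a slot has no valid readings at all, the gap stays NaN.
            filled.append(sums[slot] / counts[slot] if counts[slot] else float("nan"))
        else:
            filled.append(v)
    return filled


# Three "days" of four readings each, with two gaps.
nan = float("nan")
series = [1.0, 2.0, 3.0, 4.0,
          nan, 2.0, 5.0, 4.0,
          3.0, nan, 7.0, 4.0]
filled = fill_gaps_mean_diurnal(series, steps_per_day=4)
```

In this example the gap in the first slot of day two is filled with the mean of 1.0 and 3.0, and the gap in the second slot of day three with the mean of 2.0 and 2.0. A production pipeline would also need to account for weather conditions and seasonal change, which this sketch ignores.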
Once the initial NASA funding expired in the late 1990s, international researchers who saw a benefit in keeping the FLUXNET collaboration going sustained it through a patchwork of funds from agencies in Canada, Europe and the U.S., as well as “in-kind” support from research groups, universities and foundations. In fact, this is how Berkeley Lab’s involvement in building software tools for FLUXNET began.
“It’s a big deal that we are able to get hundreds of scientists to come together, trust each other, share and exchange data and work toward a common goal,” says Baldocchi. “In the early days, this was a small group and everyone knew each other so we were able to meet in a place like Italy and form a personal relationship. As the community grows, software tools that track where and how each site’s data is being used are invaluable to form trust.”
Berkeley Lab Joins the FLUXNET Software Collaboration
Berkeley Lab computer scientists officially joined the FLUXNET collaboration in 2006, when CRD’s Deb Agarwal offered to use funds from her Microsoft Research Grant to help the collaboration build a website, queryable database and a workflow for requesting and sharing site data. With limited funds, the initial versions of the FLUXNET website and database were extremely utilitarian.
“In the beginning, many researchers were nervous to share their data publicly because they risked losing credit if someone used their data without acknowledging them, or published a paper on their data before they could,” says Agarwal. “So we focused on tools that would give the site managers an incentive to comfortably share their data. In the beginning, we focused on enabling data policy compliant use of the data.”
“By 2006 we were getting to the point of big data, and the computer scientists essentially provided the site managers with some guidance and ‘adult supervision’ by convincing us that the FLUXNET website should serve as our central point of data exchange,” says Baldocchi. “Before this, a lot of scientists were just emailing data files back and forth to each other. There was no version control or vetting of the files, so we would have no idea what version was the most recent, what had been analyzed, or how it was different from the original dataset. We got all of this with the new website and database.”
Once the Microsoft grant ran out, Agarwal and the Berkeley Lab team were able to maintain the FLUXNET site with limited funds from Baldocchi and DOE. At the same time, European collaborators who did have funding worked on updating the FLUXNET data standards and building tools for quantifying uncertainty in the field data, which will be especially useful for climate models. Eventually, as Berkeley Lab received funding to build tools for the AmeriFlux project, Agarwal and Gilberto Pastorello were able to work with the European collaborators on metadata standards, as well as building pipelines for QA/QC, gap filling and uncertainty quantification. Both groups leveraged this work to update tools for the global FLUXNET collaboration.
“The data in the recent release is dramatically better than previous releases,” says Agarwal. “Our team re-standardized the data completely and our European collaborators completely revamped the uncertainty quantification. We created a high bar for QA/QC and ran every single site-year of data through this pipeline. We manually inspected the things that weren’t automated and sent data that didn’t pass muster back to the sites to fix. With these new standards, we have established a baseline of quality that guarantees the sites are comparable.”
According to Gilberto Pastorello, a Berkeley Lab Computer Scientist and FLUXNET contributor, the uncertainty estimates for FLUXNET data are especially valuable for climate modelers who want to validate their simulations with FLUXNET data.
“We’ve adopted a model where we take the most recent and well accepted developments in the flux community and incorporate them into our pipeline, which can be quite challenging because we are having to rework some of the codes to ensure that they will run in a production environment. It’s a lot of work to get the code ready to run in a very uncertain set of conditions and not misbehave,” says Pastorello. “Ultimately we want to get this pipeline to a point where we can run it consistently here at Berkeley Lab and in Europe, as well as share it with the community so that anyone who wants to run it can use it.”
Pastorello is also currently working with European collaborators to implement a partitioning tool into the pipeline, which will break down CO2 flux data into respiration and photosynthesis.
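To show the idea behind partitioning, the following Python sketch splits a net CO2 flux series into respiration and photosynthesis under strong simplifying assumptions. It is not the tool Pastorello's team is implementing. It assumes the common sign convention that net ecosystem exchange (NEE) equals respiration minus photosynthetic uptake, that no photosynthesis occurs at night, and that respiration is constant over the period. Operational methods typically model respiration as a function of temperature instead. All names are hypothetical.

```python
def partition_nee(nee, is_night):
    """Split net ecosystem exchange (NEE) into respiration and photosynthesis.

    Assumptions (illustrative only):
      * NEE = respiration - photosynthesis (positive NEE is a net release of CO2).
      * Photosynthesis is zero at night, so night-time NEE equals respiration.
      * Respiration is constant, estimated as the mean night-time NEE.
    Returns (respiration, photosynthesis_per_step).
    """
    night_values = [v for v, night in zip(nee, is_night) if night]
    if not night_values:
        raise ValueError("at least one night-time reading is required")

    respiration = sum(night_values) / len(night_values)

    # Daytime photosynthesis is whatever uptake is needed to explain the
    # difference between the estimated respiration and the measured NEE.
    photosynthesis = [0.0 if night else respiration - v
                      for v, night in zip(nee, is_night)]
    return respiration, photosynthesis


# Two night-time readings followed by two daytime readings (arbitrary units).
nee = [2.0, 2.0, -8.0, -6.0]
is_night = [True, True, False, False]
respiration, photosynthesis = partition_nee(nee, is_night)
```

Here the estimated respiration is 2.0, so a daytime NEE of -8.0 implies a photosynthetic uptake of 10.0.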
“It was incredibly important to me to keep this project going. This data isn’t something that you can re-create; you can’t start collecting this information from scratch again,” says Baldocchi. “Today’s FLUXNET has exceeded my hopes and aspirations. Our datasets contain some of the oldest flux measurements in Europe and North America, and now we have contributors from Asia, Africa, Australia, New Zealand and the South Pacific.”
“People work very hard to collect field data, so their data is very valuable to them. To share it openly, the way they are doing, seems counterintuitive to their efforts, but after seeing how their data is being used to do regional and global studies, more and more of these researchers want to share their data and are asking us why their data isn’t in the latest FLUXNET release yet,” says Pastorello. “It is extremely rewarding to see our tools help these individuals realize the advantages of FLUXNET.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery, and researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.