Berkeley Lab Cybersecurity Specialist Highlights Data Sharing Benefits, Challenges at NAS Meeting

December 4, 2018

Contact: Kathy Kincade, kkincade@lbl.gov, +1 510 495 2124

The National Academy of Sciences building in Washington, D.C.

There are many reasons why data isn’t widely shared, whether by the organizations that collect it or among the scientists who analyze it in search of new insights. Those barriers, and ways to overcome them in a world of data-driven science, were the subject of a recent meeting of the Committee on Science, Engineering, Medicine, and Public Policy (COSEMPUP), a joint unit of the National Academy of Sciences, the National Academy of Engineering, and the National Academy of Medicine. The meeting took place on November 8, 2018, in Washington, D.C.

Science focuses on building knowledge about the universe through a combination of observation and experimentation. While some science can be done using mathematics and simulations, eventually science needs to be tested and confirmed in the real world. This requires data. And at a time when scientists are asking increasingly important and fundamental questions about the universe, it also requires unique and expensive instruments that generate very large amounts of data that need to be shared.

The COSEMPUP meeting was far from the first to address data sharing, but it brought together an unusual mix of scientific experts to examine the needs, barriers, and incentives for making progress.

Sean Peisert, Computational Research Division

The core topic of the meeting was data sharing in biomedical science, so presentations focused on current challenges in biomedical data sharing and on solutions from other domains that could be applied to them, noted Sean Peisert, a leading cybersecurity researcher at Lawrence Berkeley National Lab and an invited speaker at the meeting. He discussed how computer security and privacy-preserving techniques, used strategically and in combination, can overcome certain data-sharing barriers and serve as a means to facilitate, enhance, and create incentives for increased data sharing in the sciences, thereby accelerating data-driven scientific discovery.

In particular, Peisert described how using varying combinations of current and future hardware and software techniques could help meet or exceed standards for data subject to government regulations, such as HIPAA or FISMA; address concerns regarding unregulated scientific data still containing individually private information; and provide solutions for proprietary data that might contain trade secrets.

Five Solutions

At present, there are typically five solutions for data sharing that are used independently or in combinations, Peisert noted:

  1. We often don’t share data at all, which is a huge inhibitor to scientific research.
  2. We require people who use data to come to the data, rather than letting them work with it in their own computing environment. This often doesn’t scale when analysis takes a long time, because keeping many scientists at a particular remote facility for long periods is onerous.
  3. We put legal protections in place.  
  4. We put elaborate security protections in place, such as “air gaps”: solutions that disconnect computing systems from all computer networks so that data on those systems cannot accidentally leak or be maliciously stolen.
  5. We transform data, e.g., by redacting or “fuzzing” it so that it no longer poses as significant a risk if it falls into the wrong hands.
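To make the fifth approach concrete, here is a minimal Python sketch of two common transformations, assuming a hypothetical record with `name`, `ssn`, `age`, and `weight_kg` fields: direct identifiers are redacted outright, the exact age is coarsened into a ten-year range, and the weight is "fuzzed" with bounded random noise. The field names, bucket width, and noise scale are illustrative assumptions, not a method presented at the meeting.

```python
import random
from typing import Any

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def redact(record: dict[str, Any]) -> dict[str, Any]:
    """Remove direct identifiers from a record entirely."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def fuzz(value: float, scale: float, rng: random.Random) -> float:
    """Perturb a numeric value with uniform noise drawn from [-scale, scale]."""
    return value + rng.uniform(-scale, scale)


def transform(record: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Apply redaction, generalization, and fuzzing to one record."""
    out = redact(record)
    out["age"] = generalize_age(out["age"])
    out["weight_kg"] = round(fuzz(out["weight_kg"], scale=2.0, rng=rng), 1)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    patient = {"name": "A. Patient", "ssn": "000-00-0000", "age": 47, "weight_kg": 70.0}
    print(transform(patient, rng))  # identifiers gone, age shown as '40-49', weight perturbed
```

Every coarsened or noised field is less precise than the original value, which is exactly the loss of research utility that makes this approach a trade-off.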

“But all these solutions have downsides, for various reasons, ranging from data not being shared at all to data being very difficult to use, to data losing significant research utility,” he said.

Fortunately, recent progress in computing technology and techniques can reduce the barriers to developing trustworthy data-sharing solutions. These advances include:

  • Hardware-based trusted execution environments, which all of the major chipmakers have now deployed
  • Software solutions for computing over encrypted data, which are no longer a trillion times too slow, as they were just a few years ago
  • Differential privacy, a statistical technique for provably providing privacy guarantees while explicitly balancing research utility
  • “Smart contract” components in blockchain technologies.
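Differential privacy lends itself most readily to a short illustration. The following minimal Python sketch applies the classic Laplace mechanism to a counting query, assuming made-up data and hypothetical function names; it is a textbook example of the technique, not the specific tooling Peisert described. The parameter `epsilon` is the explicit knob that balances the privacy guarantee against research utility.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    values: Iterable[int],
    predicate: Callable[[int], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person's record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is sufficient.
    Smaller epsilon means stronger privacy but noisier, less useful answers.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    ages = [34, 51, 47, 62, 29, 58]  # made-up data; the true count of ages >= 50 is 3
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(private_count(ages, lambda a: a >= 50, eps, rng), 2))
```

Running the loop shows the trade-off directly: at `epsilon = 10.0` the released counts stay close to 3, while at `epsilon = 0.1` they can be far off, in exchange for a much stronger privacy guarantee.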

These techniques also increase trust in security and privacy and create positive incentives for data sharing, Peisert emphasized. They enable stronger tracking of data generation and use, which in turn allows data providers to be compensated for sharing useful data.
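Stronger tracking of data use can rest on a tamper-evident record of who accessed what. The minimal Python sketch below uses a plain hash chain, the linking idea that underlies blockchain ledgers, without any consensus protocol or smart contracts. The `UsageLog` class, its fields, and the idea of using per-dataset access counts as a basis for compensation are all illustrative assumptions.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class UsageLog:
    """Append-only, hash-chained log of data-access events (illustrative only)."""

    entries: list[dict] = field(default_factory=list)

    def record(self, dataset: str, user: str, action: str) -> str:
        """Append an event linked to the previous entry's hash; return the new hash."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"dataset": dataset, "user": user, "action": action, "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Return False if an entry was altered or a link in the chain is broken.

        Note: silently dropping the newest entries (truncation) is not detected.
        """
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("dataset", "user", "action", "prev")}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def uses_per_dataset(self) -> Counter:
        """Count recorded accesses per dataset, e.g. as one input to compensation."""
        return Counter(entry["dataset"] for entry in self.entries)


if __name__ == "__main__":
    log = UsageLog()
    log.record("cohort-a", "alice", "download")
    log.record("cohort-a", "bob", "query")
    log.record("cohort-b", "alice", "query")
    print(log.verify())            # True
    print(log.uses_per_dataset())  # Counter({'cohort-a': 2, 'cohort-b': 1})
    log.entries[0]["user"] = "mallory"
    print(log.verify())            # False: the edited entry no longer matches its hash
```

A production system would also need to protect the newest hash (for example by publishing or replicating it), which is part of what full blockchain designs provide.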

In addition to Peisert, presenters at the meeting included:

About Computing Sciences at Berkeley Lab

The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences Area provides the computing and networking resources and expertise critical to advancing Department of Energy Office of Science (DOE-SC) research missions: developing new energy sources, improving energy efficiency, developing new materials, and increasing our understanding of ourselves, our world and our universe.

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). NERSC and ESnet are both Department of Energy Office of Science National User Facilities. CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.

Berkeley Lab addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab has seen its scientific expertise recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.

The DOE Office of Science is the United States' single largest supporter of basic research in the physical sciences and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.