
Berkeley Lab Cybersecurity Specialist Highlights Data Sharing Benefits, Challenges at NAS Meeting

December 4, 2018

Contact: Kathy Kincade, kkincade@lbl.gov, +1 510 495 2124


The National Academy of Sciences building in Washington, D.C.

There are many reasons why data isn’t widely shared, whether by the organizations that collect it or among the scientists who analyze it in search of new insights. These barriers, and ways to overcome them in a world of data-driven science, were the subject of a recent meeting of the Committee on Science, Engineering, Medicine, and Public Policy (COSEMPUP), a joint unit of the National Academy of Sciences, the National Academy of Engineering, and the National Academy of Medicine. The meeting took place on November 8, 2018, in Washington, D.C.

Science focuses on building knowledge about the universe through a combination of observation and experimentation. While some science can be done using mathematics and simulations, eventually science needs to be tested and confirmed in the real world. This requires data. And at a time when scientists are asking increasingly important and fundamental questions about the universe, it also requires unique and expensive instruments that generate very large amounts of data that need to be shared.

The COSEMPUP meeting was far from the first to address this essential activity, but it brought an unusual union of scientific experts to address needs, barriers, and incentives to make progress.


Sean Peisert, Computational Research Division

The core topic of the meeting was data sharing in biomedical science, so presentations focused on current challenges in biomedical data sharing and on identifying solutions from other domains that could be applied, noted Sean Peisert, a leading cybersecurity researcher at Lawrence Berkeley National Laboratory and an invited speaker at the meeting. He discussed how a strategic combination of computer security and privacy-preserving techniques can overcome certain data-sharing barriers and can facilitate, enhance, and create incentives for increased data sharing in the sciences, thereby accelerating data-driven scientific discovery.

In particular, Peisert described how using varying combinations of current and future hardware and software techniques could help meet or exceed standards for data subject to government regulations, such as HIPAA or FISMA; address concerns regarding unregulated scientific data still containing individually private information; and provide solutions for proprietary data that might contain trade secrets.

Five Solutions

At present, there are typically five approaches to data sharing, used independently or in combination, Peisert noted:

  1. We often don’t share data at all, which is a huge inhibitor to scientific research.
  2. We require people using data to come to the data, rather than letting them work with it in their own computing environment. This often doesn’t scale: when analysis takes a long time, requiring many scientists to be present at a remote facility for long periods is onerous.
  3. We put legal protections in place.  
  4. We put elaborate security protections in place, such as “air gaps” — solutions that disconnect computing systems from any computer networks so that data on those systems cannot accidentally leak, or be maliciously stolen.
  5. We transform data, e.g., by redacting or “fuzzing” it so that it no longer presents as significant a risk if it falls into the wrong hands.

“But all these solutions have downsides, for various reasons, ranging from data not being shared at all to data being very difficult to use, to data losing significant research utility,” he said.

Fortunately, advances in computing technology and techniques can reduce the barriers to developing trustworthy data-sharing solutions. These include:

  • Hardware-based trusted execution environments, which all of the major chip-makers have now deployed
  • Software solutions for computing over encrypted data, which are no longer a trillion times too slow, as they were just a few years ago
  • Differential privacy, a statistical technique for provably providing privacy guarantees while explicitly balancing research utility
  • “Smart contract” components in blockchain technologies.
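
To make the differential privacy item above concrete, here is a minimal sketch of one standard construction, the Laplace mechanism applied to a counting query. It is an illustration only and is not drawn from Peisert's presentation; the function names, the sample ages, and the choice of epsilon are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential variables with the same
    # rate is Laplace-distributed with scale 1 / rate.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching predicate.

    Hypothetical helper: a smaller epsilon means stronger privacy and more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1: adding or removing one person's
    # record changes the count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up data for illustration: ages of study participants.
ages = [34, 51, 29, 62, 47, 58, 41, 70]
noisy = private_count(ages, lambda age: age >= 50, epsilon=0.5)
print(f"Noisy count of participants aged 50 or older: {noisy:.2f}")
```

The epsilon parameter is where the balance between privacy and research utility becomes explicit: lowering it adds more noise to each answer, which protects individuals more strongly but makes the released count less precise.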

These techniques increase trust in security and privacy and create positive incentives for data sharing. They also enable stronger tracking of data generation and use, which allows data providers to be compensated for sharing useful data, Peisert emphasized.

In addition to Peisert, presenters at the meeting included:

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.