Summer Students Tackle COVID-19
Berkeley Lab’s Computational Research Division Summer Students Support Pandemic Response
October 21, 2020
By Carol Pott
With near-daily changes in the understanding of SARS-CoV-2 and how COVID-19 is transmitted and spread, researchers are challenged with mountains of data in the race to end the outbreak. Adding to that, the inaccessibility of high-quality electronic health records (EHRs) creates a challenge for researchers needing more complete information to map the viral outbreak and fully understand its broader impact on health.
Because of privacy issues and HIPAA regulations, EHRs are rarely made available for widespread research and tend to stay siloed within the individual academic and medical facilities that collect them. And when that medical data is made more broadly available for research, the patient’s personal information is usually stripped from the record. Though this helps to maintain privacy, removing personal data significantly reduces the EHR’s usefulness for research and medical analysis. These issues were particularly acute in the first weeks of the current pandemic, when there were only a few active cases, and they remain significant today, with more than 41 million cases worldwide.
As a part of the Computational Research Division’s (CRD) summer student program at Lawrence Berkeley National Laboratory, four graduate students from the University of California, Davis (UC Davis) researched a method that could allow doctors and researchers to leverage valuable health information in the battle against COVID-19 while also preserving patient privacy in COVID-19-related EHRs. The students worked to support the COVID-19 response by using actual EHR data and applying differential privacy, a data-driven approach first published in 2006 that provides strong, statistical privacy guarantees while balancing privacy and the utility of data for use in machine learning and other analyses. Differential privacy is widely used by companies like Apple, Google, and Microsoft, as well as the U.S. Census Bureau and the United Nations, among others.
“Earlier this year, many researchers were sitting on the sidelines of the COVID-19 response wondering how we could make a difference,” said Sean Peisert, research lead and staff scientist in the Data Science and Technology Department in CRD. Peisert and Nicholas Anderson, professor of Informatics and director of Informatics Research at UC Davis Health, combined forces and determined that applying privacy techniques to actual EHR data was one way to contribute to the research on the pandemic.
“Providing scientists with access to high-quality EHR data with privacy protections in place would enable better science, including public health for COVID-19, and it could be revolutionary if medical data sharing wasn’t stuck in the 20th century,” said Peisert. “We wanted to explore ways to do this and contribute to research on the pandemic.”
Leveraging the UC Health COVID-19 Limited Dataset (COVID LDS), the students worked closely with Anderson and his research computing staff at UC Davis Health, as well as with CRD computer systems engineer Reinhard Gentz, who mentored the students and helped support the computing infrastructure used in the project.
The students’ work was supported through the UC Laboratory Fees Research Program (LFRP) and Contractor Supporting Research (CSR) funding at Berkeley Lab. Their work was part of a broader effort addressing privacy issues in energy-related domains. Examples include the use of personally identifiable geospatial data in optimizing vehicle traffic routing and privacy-preserving analysis of power grid data, including metering data that could reveal activities inside individual homes or sensitive information about the topology of the power grid.
“Being able to orient and support this team of interdisciplinary students in the adaptation and evaluation of these leading-edge analytic and data access methods on actual COVID-19 EHR data, during a pandemic, and entirely virtually was very exciting. We would love to see more regular ‘embedded’ experiences that provide this focused research in the future,” Anderson said.
Each of the UC Davis graduate students took a slightly different angle on research this summer:
Archit Garg is a third-year computer science master's student from New Delhi, India, who began studying computer science in high school and credits his family and culture with his inspiration to pursue his master's degree. His research revolved around integrating a variety of open-source differential privacy tools, such as those from Google, IBM, and Microsoft, with the Observational Medical Outcomes Partnership (OMOP) Common Data Model data back-end containing the UC Health COVID LDS. “Protection of data and privacy is one of the fields that I am a big advocate for and I have always wanted to work on a project within those areas,” Garg said. “The opportunity to solve a problem that has a direct impact on the ongoing pandemic has been an amazing experience.”
Also from India, Chitrabhanu Gupta is a second-year master's student in computer engineering. Having grown up at a time when technology was rapidly reshaping the world, he was inspired to make a contribution that would improve people’s lives. “Data privacy restrictions have crippled researchers. Differential privacy represents a possible solution that may allow more researchers to have access to sensitive data without compromising privacy,” said Gupta. His work focused on examining applications of differential privacy to statistical database queries of the UC Health COVID LDS. These types of queries could be used by epidemiologists and public health researchers to identify significant statistical trends regionally or across large subsets. “It was especially gratifying to be able to work with the COVID-19 dataset and make a tangible contribution to a problem that is currently plaguing the world,” said Gupta.
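As a rough illustration of what a differentially private statistical query can look like, the sketch below applies the Laplace mechanism, a standard textbook technique, to a simple counting query. It is not the students' code and does not use the UC Health COVID LDS: the record layout, the function names (laplace_noise, dp_count), and the privacy parameter epsilon are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with rate
    1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy records; real EHR data would look very different.
    toy_records = [
        {"age": age, "positive": age % 3 == 0} for age in range(20, 80)
    ]
    noisy = dp_count(toy_records, lambda r: r["positive"], epsilon=0.5)
    print(f"Noisy count of positive cases: {noisy:.1f}")
```

A smaller epsilon adds more noise and gives stronger privacy, while a larger epsilon gives more accurate answers. That is the privacy-versus-utility balance described above. Answering many queries against the same data consumes more of the privacy budget, which this sketch does not track.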
Jinyue Song is a second-year Ph.D. student in computer science. Hailing from Canada, Song was encouraged in his freshman year to explore as many fields as possible and, as a result, he tried discrete math, topology, machine learning, and blockchain before deciding on computing science. He left Canada to pursue a master's at the University of San Francisco and was subsequently offered a Ph.D. opportunity to study computing science at UC Davis. Song developed a survey that documents the pros and cons of a variety of privacy-preserving technologies, including differential privacy, secure multiparty computation, and homomorphic encryption. The survey also discusses and compares the features and application scenarios of combining these privacy protection techniques with four popular computing frameworks: cloud computing, edge computing, blockchain, and federated learning. “It was an honor and a joy to do research with Professor Peisert this summer and I believe that this work can contribute to solving practical issues with COVID-19 research,” said Song. “The research I was able to do and the relationships I built were amazing.”
Jayneel Vora is a second-year Ph.D. student in computer science. From India, Vora became fascinated by data privacy leaks after being bombarded with targeted advertising related to his online search history. “Tools such as federated learning and differential privacy will become essential to understanding how data privacy can be ensured,” said Vora, “especially with the onset of powerful machine learning algorithms and a deeper insertion of adversaries in the data pipeline.” Vora’s work examined differentially private clustering and classification approaches to differentiate the UC Health COVID LDS.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.