Berkeley Lab, BIDS Take on Big Data

Berkeley Lab-BIDS Fellows to share their expertise at ML4Sci Workshop and California Water Data Hackathon

August 30, 2018

Contact: Linda Vu, lvu@lbl.gov, +1 510.495.2402

The world is currently generating data at a break-neck pace — about 2.5 quintillion bytes per day — and this trend is only accelerating. To make sense of this torrent of information, Berkeley Institute for Data Science (BIDS) has built an ecosystem of researchers to advance data-analytic methods and inquiry, develop and expand software and analytics tools, and share best practices.

The BIDS ecosystem comprises an impressive network of Fellows, including some who are Lawrence Berkeley National Laboratory (Berkeley Lab) scientists. This month, several Berkeley Lab-BIDS Fellows are organizing two events to share their data-science expertise. Some are helping to organize a Machine Learning for Science (ML4Sci) Workshop that will be held in early September, where they will introduce scientists to state-of-the-art machine learning applications and train them to use these applications on massively parallel supercomputers. Later in September, another group is hosting the California Water Data Hackathon to help address the state’s lack of access to clean, safe drinking water.

“There’s a perspective that one promise of data science comes from the interdisciplinary nature of the research it enables. ‘Inter’ can mean among different fields of inquiry, but also can mean among alternative approaches to handling data-intensive workloads in research,” said BIDS Executive Director David Mongeau. “For instance, many data scientists at UC Berkeley might default to familiar infrastructure for their work, but by interacting with Berkeley Lab can explore alternative approaches made possible with high performance computing.”

Machine Learning For Science (ML4Sci)

Cori supercomputer at NERSC. (Photo by Marilyn Chung, Berkeley Lab) 

Some of the Berkeley Lab researchers bringing this expertise to BIDS are Deborah Agarwal, head of the Computational Research Division’s (CRD’s) Data Science and Technology Department; Daniela Ushizima, a staff scientist in the Center for Advanced Mathematics for Energy Research Applications (CAMERA) and CRD’s Data Analytics & Visualization group; and Kristofer Bouchard, computational bioscientist in the Biosciences Area. They are helping to organize the ML4Sci workshop, which will be held at Berkeley Lab Sept. 4-5 in conjunction with the National Energy Research Scientific Computing Center’s (NERSC’s) annual Data Day (Sept. 6-7). Other key organizers of the workshop are NERSC’s Data & Analytics Services group members Prabhat, Steve Farrell, Mustafa Mustafa, and Zarija Lukić of Berkeley Lab’s Computational Cosmology Center.

The workshop will feature several UC Berkeley faculty-BIDS Fellows as keynote speakers, including Bin Yu, John Canny, Philip Stark, and Joshua Bloom. The event will introduce researchers to cutting-edge machine learning applications for high-energy physics, nuclear physics, cosmology, chemistry, biosciences, materials engineering, climate, and high performance computing. Additionally, machine learning experts will provide hands-on training to deploy these applications on supercomputers at NERSC.

“There are so many benefits from the cross-pollination of expertise and resources between Berkeley Lab and BIDS,” said Ushizima. “During the ML4Sci workshop, Berkeley Lab staff will be showcasing Jupyter tools. Today, these tools are open source and serve a variety of data science needs; for example, there are currently more than 2 million Jupyter Notebooks hosted on GitHub. But the root of Jupyter was pioneered by Fernando Perez, one of the founding fathers of BIDS, currently a professor in the Department of Statistics at UC Berkeley, and a Berkeley Lab researcher.”

Earlier this year, the Association for Computing Machinery honored the Jupyter Project Team for developing a tool that has had a lasting influence on computing. At Berkeley Lab, Ushizima also leads the Department of Energy Early Career Project Image across Domains, Algorithms and Learning (IDEAL).

California Water Data Hackathon

Berkeley View from Berkeley Lab. (Photo by Roy Kaltschmidt, Berkeley Lab)

Beyond scientific applications, BIDS also focuses on social impact issues. Earlier this year, when a number of state agencies, private companies and the West Big Data Innovation Hub joined forces to create the 2018 California Safe Drinking Water Data Challenge, BIDS knew it wanted to be a part of this effort. Zexuan Xu, a BIDS Data Science Fellow and a postdoctoral researcher in hydrology in Berkeley Lab’s Earth and Environmental Sciences Area, is helping to organize BIDS’ participation in this event.

As part of the challenge, BIDS is teaming up with UC Berkeley’s Division of Data Sciences to host the California Water Data Hackathon on Sept. 14-15. According to Xu, the hackathon is open to all but is geared mostly toward undergraduate and graduate students from a variety of disciplines. The goal is to teach the students about California’s water issues, then have them use publicly available data to find innovative ways to increase community access to safe drinking water, better understand vulnerabilities, and help identify and deploy solutions.

“Up to 1 million Californians lack access to clean, safe drinking water at some point during the year. Droughts and other disruptions in water supply and contamination in water quality can limit or eliminate access to safe drinking water for days, months, or years,” said Xu. “All the topics that the hackathon participants will address are currently open questions. If they come up with interesting questions and/or solutions, we will deliver their interests to the state agencies, and encourage them to continue the research.”

In many ways the hackathon embodies the philosophy of BIDS, which takes a broad view of data science and welcomes candidates from a full range of research focuses—from digital humanities and psychology to statistics and computer science—who are interested in pushing the frontiers of data-intensive research in their own field and in cross-disciplinary collaborations.

“The greatest benefit of being a BIDS Fellow is getting to know people that work in different fields of science. I am a domain expert in earth and environmental science, but others are experts in math, software development, statistics, bioscience, etc.,” said Xu. “Because the community is so integrated, I can collaborate with mathematicians that I don’t normally have access to. We work on research projects together, then I have a chance to learn the cutting-edge research in other science areas and also share my knowledge and insights with others in my domain area.” 

That benefit and the bonds that Berkeley Lab and the University continue to strengthen come in part through Nobel Laureate Saul Perlmutter serving as BIDS Director. He shared the 2011 Nobel Prize in Physics for the discovery of the accelerating expansion of the universe.

Although registration for the ML4Sci workshop is closed, you can still register for the California Water Data Hackathon here: https://www.eventbrite.com/e/california-water-data-hackathon-tickets-48720835330 

A full list of BIDS Fellows: https://bids.berkeley.edu/people

About Computing Sciences at Berkeley Lab

The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.

ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are Department of Energy Office of Science User Facilities.

Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.