Berkeley Lab Researchers Develop Platform for Hosting Science Data Analytics Competitions
NNSA’s Urban Radiological Search Competition is the first to use the portal
April 16, 2018
By Linda Vu
The Department of Energy’s National Nuclear Security Administration (NNSA) is searching for innovative algorithms to detect non-natural radiation sources in urban environments, a problem with important implications for national security and environmental remediation. So NNSA is taking a cue from academia, as well as companies like Netflix and Google, and hosting a data analytics competition.
For an event like this, academic organizations and businesses typically upload their datasets to a platform like Kaggle, which is designed to host public predictive modeling and analytics competitions, and have competitors test out their techniques. But since NNSA’s competition has its own set of unique requirements and at the moment is only open to employees and contractors at National Laboratories across the DOE complex, NNSA needed a more specialized system.
So NNSA turned to researchers and engineers in the Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research (CRD), Nuclear Science (NSD) and Information Technology (IT) divisions to help build a Kaggle-inspired platform. Now that the system has been developed and is hosted on Berkeley Lab IT servers, the team says it also can be used to host data analytics competitions for other scientific problems or disciplines. Those interested in hosting a competition should contact NSD’s Brian Quiter.
Meanwhile the NNSA’s Urban Radiological Search Competition is now live and will be open until April 30, 2018. There currently are 45 users from 20 teams competing, and a total of 285 submissions.
“We are creating a data competition portal. Because of the unique requirements of our stakeholders, we built a system inspired by Kaggle, but tailored to the needs of DOE research,” says Shreyas Cholia, who leads CRD’s Usable Software Systems Group. “You can think of it as crowd-sourced machine learning and analytics, where users download test and training datasets, and apply their own techniques to predict a set of results. Results are then uploaded to the competition portal and scored against the canonical ground truth data.”
Cholia notes that the Berkeley Lab-developed platform includes a user-friendly web portal where competition hosts can upload datasets and set up a competition. In fact, the system can host several competitions at once. Competitors can download the datasets, perform their analysis and upload their results. The system then scores each submission and updates a public scoreboard so that competitors can see where they stand. Each user can submit multiple sets of results during the competition to try to improve their score.
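The submit-and-score loop described above can be illustrated with a minimal Python sketch. It is not the portal's actual code: the accuracy-style metric, the `Leaderboard` class, the function names and the run identifiers are all assumptions made for illustration, and the article does not say how the real competition scores entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def score_submission(predictions: dict[str, int], ground_truth: dict[str, int]) -> float:
    """Return the fraction of ground-truth entries the submission got right.

    Hypothetical metric: the real competition's scoring rule is not described
    in the article. Missing predictions simply count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    correct = sum(
        1 for run_id, label in ground_truth.items() if predictions.get(run_id) == label
    )
    return correct / len(ground_truth)


@dataclass
class Leaderboard:
    """Keeps the hidden ground truth and each team's best score so far."""

    ground_truth: dict[str, int]
    best: dict[str, float] = field(default_factory=dict)

    def submit(self, team: str, predictions: dict[str, int]) -> float:
        """Score one upload and record it only if it improves the team's best."""
        score = score_submission(predictions, self.ground_truth)
        if score > self.best.get(team, float("-inf")):
            self.best[team] = score
        return score

    def standings(self) -> list[tuple[str, float]]:
        """Teams ordered by best score (highest first), ties broken by name."""
        return sorted(self.best.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Illustrative data only: run IDs mapped to made-up source-type labels.
    truth = {"run1": 1, "run2": 0, "run3": 2, "run4": 0}
    board = Leaderboard(truth)
    board.submit("alpha", {"run1": 1, "run2": 0, "run3": 0, "run4": 1})
    board.submit("beta", {"run1": 1, "run2": 0, "run3": 2, "run4": 0})
    for team, score in board.standings():
        print(f"{team}: {score:.2f}")
```

Keeping only each team's best score mirrors the idea that competitors can resubmit to improve their standing; a real portal would also need authentication, submission limits and persistent storage.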
For the Urban Radiological Search Competition, Cholia worked closely with Berkeley Lab Applied Nuclear Physicist Brian Quiter to develop the competition platform.
“We want to use the competition infrastructure that we are hosting to revolutionize how radiological search is done,” says Quiter. “This is an opportunity to figure out the capabilities of this type of competition, and we’ll use the lessons learned from this first competition to spawn more advanced competitions.”
According to Tenzing Joshi, an applied nuclear physicist at Berkeley Lab who is participating in the Urban Radiological Search Competition, there is a need in the security community to deploy sensors in a variety of locations to look for radioactive sources in places where they shouldn’t be. However, this is relatively challenging to do because radiation signatures are weak, so the detector data is pretty sparse. The problem is further complicated by the fact that the world is naturally radioactive and there are medical and industrial radiological sources that can get picked up by the detector.
“Deploying large numbers of advanced sensors across the country to improve resolution and efficiency isn’t economically viable, so you have to make do with limited resolution and efficiency,” says Joshi. “So to eke out more performance, the revolution over the past 40 years hasn’t been in detectors, it’s been in computational power, which has helped make the nuances of data more available for analysis.”
According to Quiter, the Urban Radiological Search Competition was formulated through a collaboration between Berkeley Lab, Oak Ridge National Laboratory (ORNL), and Los Alamos National Laboratory (LANL). LANL led the design of the competition from the data science perspective, while ORNL formulated the modeled data used to test the competitors’ algorithms. ORNL’s formulations were benchmarked with real data, and the goal of the modeling was to capture the nuances of field data, something that wasn’t computationally feasible five years ago.
“This competition is a wonderful opportunity for research groups to demonstrate the performance of their codes in a way that doesn’t force them to give up control of it. And it allows the competition sponsors, in this case the NNSA, to see where everyone stands and what methods work,” says Joshi. “The Berkeley Lab competition platform has been really easy to use.”
Quiter’s team has been working with CRD to develop data management frameworks and tools for a number of nuclear science projects over the years.
“Many of my projects are NNSA-funded, and they became aware of our ability to build platforms for managing and disseminating data to researchers across the country. This is what prompted them to ask us to develop this competition portal,” says Quiter.
“I think we have a unique model for the work that we do in Berkeley Lab’s Usable Software Systems and Integrated Data Frameworks groups. Our computer scientists and engineers are embedded with the domain science team, so that we can understand their requirements and build software to meet their needs,” says Cholia. “Our focus on usability and user research drives what we are doing, and it’s exciting to know that it caught the attention of NNSA staff who saw a value in that, and asked us to help build this competition portal.”
The portal was built using the Python Django framework. Development of the portal included key contributions from Hamdy Elgammal (CRD), Yeongshnn Ong (CRD), Kai Song (IT), Krishna Muriki (IT) and Val Hendrix (CRD), who worked with NSD staff to build a robust and reusable platform.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.