
ACM’s Software System Award Honors Project Jupyter Team

Berkeley Lab’s Fernando Pérez pioneered software and has been integral to its expansion

May 2, 2018

By Linda Vu
Contact: cscomms@lbl.gov


Fernando Pérez and Brian Granger discuss the architecture of Project Jupyter, collaborative computing software whose scope is expanding to support data science applications in more than 40 programming languages.

The Project Jupyter team has been honored with an Association for Computing Machinery (ACM) Software System Award for developing a tool that has had a lasting influence on computing. Project Jupyter evolved from IPython, an effort pioneered by Fernando Pérez, an assistant professor of statistics at UC Berkeley and a staff scientist in the Usable Software Systems Group in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research Division.

The award and a prize of $35,000 will be presented to the team at the ACM Awards banquet in San Francisco on June 23, 2018.

Project Jupyter is an open, international collaboration that develops tools for interactive computing: a process of human-computer interplay for scientific exploration and data analysis. The collaboration develops applications such as the widely popular Jupyter Notebook, an open-source web app that allows users to create and share documents that contain live code, equations, visualizations and narrative text.

Today, more than 2 million Jupyter Notebooks are hosted on the popular GitHub service, covering everything from technical documentation to course materials, books and academic publications. Jupyter has been transformative in scientific collaboration and reproducibility, as exemplified by its use at the LIGO observatory, whose discovery of gravitational waves was recognized with the 2017 Nobel Prize in Physics. The LIGO Open Science Center publishes Jupyter Notebooks that allow anyone to replicate the original analyses. Jupyter Notebooks also serve as core infrastructure for research endeavors like the Department of Energy (DOE)-funded KBase platform for predictive biology, the GenePattern Notebook project from the Broad Institute and UC San Diego, and the European Union-funded OpenDreamKit project, which is building virtual research environments for mathematics.

JupyterHub supports the deployment of Jupyter tools in multiuser environments, from small research groups to universities, companies and other organizations. JupyterHub is used at numerous companies, at research facilities such as CERN, and at high-performance computing centers like DOE’s National Energy Research Scientific Computing Center (NERSC) and the San Diego Supercomputer Center (SDSC).

“The flexibility of the Jupyter architecture makes it easy to deploy in a variety of scenarios: while individual users can run the tools on a personal laptop or workstation, the same tools can be deployed on remote resources,” says Shane Canon, a project engineer at NERSC. “In fact, NERSC offers Jupyter as an interactive tool for remote access to its supercomputing resources.”

At UC Berkeley, two new courses, Foundations of Data Science and Principles and Techniques of Data Science, will be supported by Jupyter Notebooks deployed in the cloud and integrated with campus authentication. The courses are being offered as part of UC Berkeley’s new data science major. Pérez will be teaching the upper-division course, Principles and Techniques of Data Science.

In industry, the Jupyter Notebook is widely used as a daily computation and data-analysis tool, and major companies have created hosted services based on Jupyter. Google’s Cloud DataLab, Microsoft’s Notebooks on Azure and IBM’s Data Science Experience all offer Jupyter Notebooks on their respective cloud infrastructure. 

In education, at least 45 different courses use Jupyter Notebooks to teach a wide variety of subjects, including high-school level Computer Science, Aerodynamics, Numerical Methods, Statistics, Computational Physics, Cognitive Science and Data Science. These have been deployed at leading universities in the U.S. and abroad, including UC Berkeley, Cal Poly, MIT, Harvard, Columbia and Imperial College.

As a graduate student studying physics at the University of Colorado in the early 2000s, Pérez remembers using a hodgepodge of software systems to illustrate code, equations, visualizations and text in his scientific computing papers. This inspired him to create a unified environment for scientific computing. He found researchers around the globe who had all independently started building scientific computing tools in Python and combined these disparate efforts into one open-source platform called IPython, with the “I” standing for interactive. The program was free, and anyone could inspect its code, modify it and make the output available under liberal licensing terms.

Over the years, IPython evolved to meet the needs of various communities, and in 2014 the project rebranded itself as “Jupyter” to recognize the fact that it was no longer just for Python. In 2015, Pérez and Brian Granger of California Polytechnic State University, San Luis Obispo received $6 million from the Leona M. and Harry B. Helmsley Charitable Trust, the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation to expand and improve the capabilities of the Jupyter Notebook.

Since then, Pérez and Granger have secured additional funding from other sources like the DOE and industry partners like Google, Microsoft and Anaconda Inc. Companies such as Bloomberg, IBM, Microsoft, Netflix, Rackspace and Anaconda also support the project, either with services or with the time of engineers who actively contribute to Jupyter’s development. The next-generation user interface for the Jupyter Notebook, known as JupyterLab, is currently being developed in an open collaboration with team members and engineers from Bloomberg and Anaconda.

“One afternoon in late 2001, I was a physics graduate student at the University of Colorado working on my dissertation and decided to spend an afternoon writing the original, tiny version of IPython,” says Pérez. “I could not have imagined that this would grow into a worldwide platform almost two decades later. For me, it’s been a wild ride, made possible by going from a personal exploration to an open collaboration with an incredible team.”

“This is a project that has demonstrated 20 years of intellectual contributions with major impact in research, education and industry, and it continues to make its advances available to the world as an open platform,” says Kathy Yelick, Associate Laboratory Director of Berkeley Lab Computing Sciences. “The ACM Software System Award is an incredible honor, and this team is entirely deserving of this recognition.”

In addition to Pérez, other members of the Project Jupyter collaboration include Brian E. Granger and Carol Willing (Cal Poly San Luis Obispo), Matthias Bussonnier (UC Berkeley BIDS), Paul Ivanov and Jason Grout (Bloomberg), Thomas Kluyver (European XFEL), Damián Avila (Anaconda, Inc.), Steven Silvester (JP Morgan Chase), Jonathan Frederic (Google), Kyle Kelley (Netflix), Jessica Hamrick (DeepMind), Sylvain Corlay (QuantStack) and Peter Parente (Valassis Digital).

NERSC is a DOE Office of Science user facility.

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.