InTheLoop | 06.03.2013
The weekly newsletter for Berkeley Lab Computing Sciences
June 3, 2013
CRD’s Arie Shoshani, Peter Nugent Win Director’s Achievement Awards
The recipients of the Berkeley Lab Director’s Lifetime and Exceptional Achievement Awards have been announced. Arie Shoshani, head of the Scientific Data Management Group in the Computational Research Division, is one of two recipients of the Berkeley Lab Prize Lifetime Achievement Award. Exceptional Achievement honorees in the Science category include Peter Nugent, leader of the Computational Cosmology Group in CRD. Read more.
Researchers Model Impact of Aerosols over California
For the first time ever, researchers from the Pacific Northwest National Laboratory (PNNL), Colorado State University and the California Air Resources Board have characterized the relative, direct influence of different aerosol species on seasonal atmospheric warming and cooling over California using supercomputers at NERSC and at PNNL. The scientists found that aerosols have a net cooling effect on California’s atmosphere, but individual species contribute differently. While sulfates contributed the most to cooling, black carbon particles, or soot, were responsible for up to 95 percent of countervailing warming. Read more.
ESnet Staff to Present Paper, Participate in Two Demos at Europe’s Largest Networking Conference
From June 3–6, more than 500 networking researchers from across Europe and around the world will meet in Maastricht, The Netherlands for TNC2013, the TERENA Networking Conference. Among them will be several staff members from ESnet who will present a paper and co-lead two demonstrations, including one showcasing the world’s first intercontinental 100 gigabit-per-second link for research and education. TNC is Europe’s largest and most prestigious research networking conference.
As part of the Monday, June 3, opening program, ESnet will join five of the world’s other leading research and education (R&E) networks and two commercial partners to demonstrate for the first time a trans-Atlantic 100 Gbps transmission link for research and education between North America and Europe. Read more. In conjunction with this, ESnet Chief Technologist Inder Monga will lead a demo “Visualize 100G traffic,” showing the total traffic on the 100 Gbps link between New York and the conference in Maastricht. ESnet’s Jon Dugan is a prime contributor to the demo, while Patrick Dorn also contributed.
On Tuesday, June 4, during the “Big Data, Big Deal” session, ESnet’s Bill Johnston will present “Enabling high throughput in widely distributed data management and analysis systems: Lessons from the LHC.” The paper was co-authored by Michael Ernst of Brookhaven National Lab and ESnet’s Eli Dart and Brian Tierney.
On Wednesday, June 6, Monga will lead a demo on “How many modern servers can fill a 100 Gbps Transatlantic Circuit?” The demonstration will show that with the proper tuning and tools, just two hosts on each continent can generate almost 80 Gbps of traffic. Brian Tierney and Chin Guok of ESnet are supporting the demo, which is a collaboration with the University of Amsterdam.
NERSC Strategic Plan Is Now Online
The NERSC Strategic Plan for FY2014–2023 is now online here. Requested by the DOE Office of Advanced Scientific Computing Research as input for ASCR’s long-term planning, the strategic plan discusses NERSC’s mission, goals, science drivers, planned initiatives, and technology strategy, among other topics. Go here for more NERSC publications and reports.
Computing Sciences Summer Student Program Begins Tomorrow
The Computing Sciences Summer Student Program begins on June 4 with a lunch and a welcome from CRD Director David Brown. Thirty-five students will be working on a variety of projects with CRD, ESnet, and NERSC staff. Osni Marques chairs the program this year, with help from David Skinner, Elizabeth Bautista, Brian Tierney, Marcia Ocon-Leimer, Maria Maroudas, Jeff Todd, and Teresa Montero.
A poster session on August 1 will give students the opportunity to prepare and present a tabletop poster that summarizes their work, describes their Lab experiences, and presents the results of their research during their stay at the Lab. Prizes will be awarded for the best, most innovative, and most original posters.
A series of talks and tours throughout the summer will familiarize the students with research in Computing Sciences and at Berkeley Lab. See the schedule here.
NERSC Turns 40 in 2014; CS Staff Invited to Suggest a Theme
Established in 1974, NERSC is likely the longest-running supercomputing center for open science. Planning is just beginning for the 40th anniversary, and one of the first goals is to choose a theme. Have an idea or two? Please send them to Jon Bashor.
City College of San Francisco Students Visit NERSC
On May 28, some City College of San Francisco students got a look at the inner workings of the NERSC machine room, with Nick Cardo and David Paul leading the way. Elizabeth Bautista organized the tour, and Margie Wylie took photos. View the photos on Facebook.
This Week’s Computing Sciences Seminars
Learning from Subsampled Data: Active and Randomized Strategies
Monday, June 3, 2:00–3:00 pm, 50A-5132
Fabian Wauthier, University of California, Berkeley
In modern machine learning applications, we frequently encounter situations where enormous amounts of data are available. This trend puts stress on current computational resources and motivates learning from a small subsample of data. Depending on the circumstances, subsampling can be active (i.e. active learning) or randomized. In this talk I will show work on both fronts.
Active Learning: As models become more complex, active learning tends to become harder. For example, consider complex Bayesian models, which often rely on an MCMC-based method for inference. Here, active learning is commonly thought to be computationally infeasible, since a naive scoring implementation would require running many additional MCMC chains. I propose a new MCMC-based framework for tractable approximate active learning which reuses samples from an existing MCMC chain for approximate scoring. This avoids running extra MCMC chains and outperforms the naive approach.
Randomized Subsampling: A fundamental theoretical question related to randomized subsampling looks at the sample complexity of a statistical model. I will focus on two simple randomized algorithms for ranking from pairwise comparisons and will show sample complexities that in expectation achieve a corresponding lower bound. Additionally, the algorithms possess interesting recovery properties: One algorithm recovers the rank with uniform quality across a permutation, while the other recovers the rank more accurately near the top than the bottom.
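To make the idea of ranking from a random subsample of pairwise comparisons concrete, here is a minimal sketch. It is not either of the speaker's algorithms, which the abstract does not specify; it is a generic win-counting scheme, and the function name, the `compare` callback, and the `sample_prob` parameter are all illustrative assumptions.

```python
import random


def rank_from_sampled_comparisons(items, compare, sample_prob, seed=0):
    """Rank items using only a random subsample of pairwise comparisons.

    Hypothetical illustration: each unordered pair is compared with
    probability `sample_prob`, and items are ordered by number of wins.
    `compare(a, b)` must return True when a should rank above b.
    """
    rng = random.Random(seed)
    wins = {item: 0 for item in items}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Skip most pairs: this is the randomized subsampling step.
            if rng.random() < sample_prob:
                if compare(a, b):
                    wins[a] += 1
                else:
                    wins[b] += 1
    # Items with more observed wins are placed nearer the top.
    return sorted(items, key=lambda item: wins[item], reverse=True)
```

With `sample_prob=1.0` every pair is compared and the ordering is exact for a consistent comparison; smaller values trade accuracy for fewer comparisons, which is the trade-off that sample complexity quantifies.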
OSF Brown Bag: Nagios Downtime Scripts
Tuesday, June 4, 12:00–1:00 pm, OSF room 238
Patrick Buddeberg, NERSC/ESnet Operations
The Nagios web interface for downtime management works well for small-scale usage, but it does not scale well to the many-host (i.e., many-node) environment at NERSC. Additionally, it does not warn that a downtime will be ending shortly. To alleviate these issues, the Nagios downtime scripts were created: a command-line interface for working with many hosts and services at once, and a script run as a cron job that warns of downtimes about to end. This second feature is also capable of reducing mail volume.
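As a rough illustration of the two pieces described above, the sketch below builds one downtime command per host and finds downtimes that are about to end. It is not the NERSC scripts themselves. The line format follows Nagios's documented `SCHEDULE_HOST_DOWNTIME` external command; the function names, host names, and the dictionary layout for downtimes are assumptions made for this example, and writing the lines to the Nagios command file is left out.

```python
def schedule_host_downtime_commands(hosts, start, duration_s, author, comment, now):
    """Build one SCHEDULE_HOST_DOWNTIME external command line per host.

    All times are Unix timestamps in seconds. The downtime is "fixed"
    (flag 1) and has no triggering downtime (trigger_id 0).
    """
    end = start + duration_s
    return [
        f"[{now}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};1;0;"
        f"{duration_s};{author};{comment}"
        for host in hosts
    ]


def downtimes_ending_soon(downtimes, now, warn_window_s):
    """Return downtimes that end within `warn_window_s` seconds of `now`.

    `downtimes` is a list of dicts with hypothetical keys "host" and "end".
    """
    return [d for d in downtimes if 0 <= d["end"] - now <= warn_window_s]


def format_warning(ending):
    """Combine all ending downtimes into a single message body."""
    hosts = ", ".join(d["host"] for d in ending)
    return f"{len(ending)} downtime(s) ending soon: {hosts}"
```

A cron job could call `downtimes_ending_soon` periodically and send one combined message from `format_warning` rather than one mail per host, which is one plausible way such a script keeps mail volume down.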
CS Summer Student Program Lunch Kickoff
Tuesday, June 4, 12:00–1:30 pm, 50A-5132
David Brown, LBNL/CRD
CS Summer Student Brown Bag: High-Order Discontinuous Galerkin Methods for Conservation Laws (tentative)
Thursday, June 6, 12:00–1:00 pm, 70-191
Per-Olof Persson, UC Berkeley and LBNL/CRD
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are Department of Energy Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Berkeley Lab was founded in 1931, and its scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.