InTheLoop | 06.10.2013
The weekly newsletter for Berkeley Lab Computing Sciences
June 10, 2013
Trillion Particle Simulation on Hopper Honored with Best Paper
An unprecedented trillion-particle simulation, which utilized more than 120,000 processors and generated approximately 350 terabytes of data, pushed the performance capability of NERSC’s Cray XE6 “Hopper” supercomputer to its limits.
In addition to shedding new light on a longstanding astrophysics mystery, the successful run also allowed a team of computational researchers from Berkeley Lab and Cray Inc. to glean valuable insights that will help thousands of scientists worldwide make the most of current petascale systems and future exascale supercomputers. The paper describing their findings won best paper at the 2013 Cray User Group conference in Napa Valley, California. Read more.
An HPCwire article on this achievement, “Researchers Squeeze Record I/O from Hopper,” quotes Prabhat and Surendra Byna of CRD.
Wall Street Journal: Chinese Supercomputer Poised to Take No. 1 Ranking
A new supercomputer based in Changsha, China is likely to be ranked the world’s fastest later this month, reclaiming a crown briefly held by China’s Tianhe-1A in 2010. The new system has exceeded 30 petaflops (quadrillion calculations per second) in one benchmark test, almost doubling its nearest rival’s score. Berkeley Lab Deputy Director Horst Simon comments on this development in a Wall Street Journal article (subscription required). An excerpt from the article was published in the China Digital Times.
Live Science: Deadly Heat Waves Intensify As Summers Sizzle
An op-ed in Live Science discusses a new study by the Centers for Disease Control and Prevention (CDC) showing that deaths caused by heat are on the rise in the United States. The article includes comments by climate scientists Michael Wehner and Daithi Stone of the Computational Research Division. Read more.
HPCwire: The Network as a Scientific Instrument
A Q&A with ESnet Director Greg Bell has just been posted on HPCwire. In it, Bell argues that it’s time to start thinking about research networks as instruments for discovery, not just infrastructures for service delivery. Read more.
Calling All Scientists: Apply to the New Enlighten Your Research Global Program!
ESnet strives to expand our network capabilities to new communities in different science domains, as well as assist these communities in connecting to their collaborators worldwide. To help achieve these goals, ESnet has teamed up with four leading national research and education networks (NRENs)—Funet in Finland, Internet2 in the US, Janet in the UK and SURFnet in the Netherlands—to create a new program called Enlighten Your Research Global (EYR-Global). The program aims to facilitate new scientific collaborations where data needs require global network resources to accelerate research. Read more.
This Week’s Computing Sciences Seminars
CS Summer Student Program: Introduction to HPC and NERSC Tour
Monday, June 10, 9:00 am–12:00 pm, OSF
Richard Gerber and Jim Mellander, LBNL/NERSC
Named Data Networking (NDN)
Monday, June 10, 11:00 am–12:00 pm, 50B-4205
Susmit Shannigrahi, Colorado State University
Despite its huge success, the Internet was originally designed for sharing resources. Since then, the main focus has shifted from “where” (the location of the hosts) to “what” (the actual content). Communication, however, still takes place between end-hosts. This creates several problems: content availability depends on the availability of hosts, security depends on securing the channel, and content is location dependent.
Named Data Networking (NDN) is built around the philosophy that users care about the content, not where it is coming from. Several architectural changes have been proposed. First, there is no notion of hosts: a packet carries no IP address, only the content name. Second, communication is driven by the consumers of content. There are two types of packets: interest packets and data packets. Consumers express interest in a piece of content. The interest is routed based on the name, and any entity that has the data replies. The content is publicly authenticable: anyone can verify that the content was signed by a particular key.
In this talk, I discuss the basic ideas behind NDN. I also talk about the architectural details, benefits over the current Internet and the state of research.
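The interest/data exchange described in this abstract can be illustrated with a minimal Python sketch. It is not the NDN reference implementation: the class names (`Producer`, `Router`), the helper `components`, and the example name `/lbl/news/1` are all hypothetical, and signature verification is left out entirely. It only shows that an interest carries a name rather than a host address, that forwarding uses a longest-prefix match on that name, and that any node holding the data can answer.

```python
def components(name):
    """Split a hierarchical name such as '/lbl/news/1' into its components."""
    return tuple(part for part in name.split("/") if part)


class Producer:
    """Hypothetical content source: holds named data and answers interests."""

    def __init__(self):
        self.store = {}

    def publish(self, name, content):
        self.store[components(name)] = content

    def on_interest(self, name):
        # Reply with the data if we have it, otherwise with nothing.
        return self.store.get(components(name))


class Router:
    """Hypothetical forwarder: routes interests by name, not by host address."""

    def __init__(self):
        self.fib = {}    # name prefix -> next hop (anything with on_interest)
        self.cache = {}  # data this node has already seen, keyed by name

    def add_route(self, prefix, next_hop):
        self.fib[components(prefix)] = next_hop

    def on_interest(self, name):
        comps = components(name)
        # Any entity that has the data may reply, including this router.
        if comps in self.cache:
            return self.cache[comps]
        # Longest-prefix match on the name components.
        for length in range(len(comps), 0, -1):
            next_hop = self.fib.get(comps[:length])
            if next_hop is not None:
                data = next_hop.on_interest(name)
                if data is not None:
                    self.cache[comps] = data
                return data
        return None  # no route for this name


producer = Producer()
producer.publish("/lbl/news/1", b"hello")
router = Router()
router.add_route("/lbl", producer)
first_reply = router.on_interest("/lbl/news/1")

# Even if the producer no longer holds the data, the router can still answer.
producer.store.clear()
second_reply = router.on_interest("/lbl/news/1")
```

In this sketch the second reply comes from the router's own cache, which is one way to picture content availability no longer depending on a particular host.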
Information Processing, Codes, and Communication over Networks: Theory and Practice
Monday, June 10, 1:00–2:00 pm, 50B-4205
Naveen Goela, University of California, Berkeley
Information processing over unreliable networks is one of the current challenges in engineering. For point-to-point channels, Shannon established capacity results in 1948, and it took more than forty years to find coded systems approaching the capacity limit with feasible complexity. Significant research efforts have gone into extending Shannon’s capacity theorems to networks, with many partial successes. By contrast, the development of actual codes of reasonable complexity for networks (codes implemented by modern circuits) has received limited attention to date. In the first part of the talk, I will present the main ideas of Arikan’s polarization theory of random variables and its use in designing useful (and optimal) network codes for such tasks as broadcast, distributed compression, and secrecy.
In the second part of the talk, I will address how network theory applies directly to current problems in distributed storage and information processing of “big data” sets. In the large-scale distributed storage problem, network codes must efficiently replicate failed data nodes.
Furthermore, an analyst may only wish to compute a function or a query on the data set. It is possible to compress (or even encrypt) the data while still executing computations and queries reliably. As one of several potential applications, I will discuss a recent Persistent Surveillance aerial imaging system I helped to develop at Lincoln Laboratories, MIT, which consists of imaging, registering, tracking, and querying objects over a city-wide coverage area.
MATLAB Training Seminars
Tuesday, June 11, 10:00 am–12:00 pm and 1:00–3:00 pm, Building 50 Auditorium
Please join MathWorks at complimentary MATLAB training seminars for Berkeley Lab researchers, staff and students. The event features two technical sessions presented by a MathWorks engineer:
Session 1: Introduction to Data Analysis and Visualization with MATLAB
MATLAB is a programming environment for algorithm development, data analysis, visualization, and numerical computation. Using MATLAB, you can solve technical computing problems faster than with traditional programming languages, such as C, C++, and FORTRAN.
During this introductory technical seminar, we will provide an overview of MATLAB and introduce you to the powerful statistical analysis and visualization capabilities available in the MATLAB product family. We will demonstrate how to analyze and visualize data, introduce desktop tools for editing and debugging code, and show you how to publish your results. Highlights include:
- Accessing data from files, spreadsheets and other sources
- Performing statistical analysis, curve and surface fitting routines
- Developing algorithms and applications to automate your workflow
- Generating reports in HTML and other file formats to share your work
Session 2: Optimizing and Accelerating Your MATLAB Code
In this session you will learn simple ways to optimize and accelerate the execution speed of your MATLAB code. We will address common pitfalls in writing MATLAB code and show you how to generate standalone C and C++ code to accelerate computationally intensive portions of MATLAB code. We will also introduce high-level parallel programming constructs that allow you to create and run parallel MATLAB applications on multicore processors, GPUs and clusters to speed up large-scale simulations. Highlights include:
- Optimizing MATLAB code to boost execution speed
- Using MATLAB Coder to automatically generate portable C code
- Creating parallel applications using the Parallel Computing Toolbox
- Employing CPUs and GPUs to speed up large-scale simulations
Go here to register for the event (not required, but helpful for event planning purposes).
CS Summer Student Brownbag: Big Bang, Big Data, Big Iron
Thursday, June 13, 12:00–1:00 pm, 70-191
Julian Borrill, LBNL/CRD
Parallel Execution of Data-Intensive Scientific Workflows
Friday, June 14, 11:00–11:30 am, 50B-4205
Patrick Valduriez, INRIA and LIRMM, ZENITH Team, Montpellier, France
Scientific workflows are often data-intensive, thus requiring parallel processing in high-performance computing environments. To be amenable to automatic parallel processing, the specification of a workflow should be high-level and provide for optimization. Recently, we have proposed an algebraic approach for the optimization and parallelization of data-intensive scientific workflows. This approach is based on a workflow algebra with powerful operators such as Filter, Map and Reduce, a set of algebraic transformation rules as basis for optimization and a parallel execution model. In this talk, I will introduce this approach and discuss current and future work within the ZENITH team.
Joint work with Eduardo Ogasawara, Jonas Dias, Daniel de Oliveira, Marta Mattoso (UFRJ, Brazil) and Fabio Porto (LNCC, Brazil).
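As a rough illustration of the algebraic idea in this abstract, the Python sketch below treats Filter, Map and Reduce as operators over a list of tuples and checks one transformation rule: a Filter can be moved ahead of a Map when its predicate reads only fields the Map leaves unchanged. This is not the authors' algebra or execution model. The function names (`map_op`, `filter_op`, `reduce_op`), the sample data, and the use of a thread pool for the Map step are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_op(fn, relation, workers=4):
    """Apply fn to every tuple; tuples are independent, so run them in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, relation))  # pool.map preserves input order


def filter_op(pred, relation):
    """Keep only the tuples that satisfy pred."""
    return [t for t in relation if pred(t)]


def reduce_op(fn, relation, initial):
    """Fold all tuples into a single value."""
    return reduce(fn, relation, initial)


def add_area(t):
    # Hypothetical activity: derive a new field without changing existing ones.
    return {**t, "area": t["w"] * t["h"]}


def is_valid(t):
    # Reads only "w", which add_area does not modify.
    return t["w"] > 0


tuples = [{"w": 2, "h": 3}, {"w": 0, "h": 5}, {"w": 4, "h": 1}]

# Plan A: Map first, then Filter.
plan_a = filter_op(is_valid, map_op(add_area, tuples))

# Plan B: the rewritten plan filters first, so Map processes fewer tuples.
plan_b = map_op(add_area, filter_op(is_valid, tuples))

total_area = reduce_op(lambda acc, t: acc + t["area"], plan_b, 0)
```

Both plans produce the same result, but Plan B runs the Map activity on two tuples instead of three, which is the kind of saving an optimizer would look for before parallel execution.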
Profile Diversity in Search and Recommendation
Friday, June 14, 11:30 am–12:00 pm, 50B-4205
Esther Pacitti, University of Montpellier 2 and LIRMM, France
We focus on search and recommendation and, more specifically, we investigate profile diversity, a novel idea in searching scientific documents. Combining keyword relevance with popularity in a scoring function has been the subject of different forms of social relevance. Content diversity has been thoroughly studied in search and advertising, database queries, and recommendations. We investigate profile diversity to address the problem of returning highly popular but too-focused documents. We show how to adapt Fagin’s threshold-based algorithms to return the most relevant and most popular documents that satisfy content and profile diversities and run preliminary experiments.
Joint work with Maximilien Servajean (Univ. Montpellier 2 and LIRMM), Sihem Amer-Yahia (LIG, Univ. Grenoble), and Pascal Neveau (INRA, Montpellier).
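For readers unfamiliar with the threshold-based approach mentioned in this abstract, here is a minimal Python sketch of the basic threshold algorithm for top-k retrieval over two score lists (relevance and popularity) with a sum as the scoring function. It does not include the content or profile diversity constraints the talk is about, and it is not the speakers' adaptation. The function name `threshold_topk` and the sample scores are made up for illustration.

```python
import heapq


def threshold_topk(score_lists, k):
    """Basic threshold algorithm; score_lists is a list of {doc: score} dicts.

    The overall score of a document is the sum of its scores across all lists
    (a document missing from a list counts as 0 there).
    """
    # Sorted access: each list ordered by descending score.
    ranked = [sorted(s.items(), key=lambda kv: kv[1], reverse=True)
              for s in score_lists]
    seen = {}  # doc -> overall score
    max_depth = max(len(r) for r in ranked)

    for depth in range(max_depth):
        threshold = 0
        for r in ranked:
            if depth < len(r):
                doc, score = r[depth]
                threshold += score
                if doc not in seen:
                    # Random access: look up the doc's score in every list.
                    seen[doc] = sum(s.get(doc, 0) for s in score_lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # No unseen document can score above the threshold, so we may stop
        # once the k-th best seen score reaches it.
        if len(top) == k and top[-1][1] >= threshold:
            return top

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical integer scores for four documents.
relevance = {"a": 9, "b": 8, "c": 3, "d": 1}
popularity = {"a": 2, "b": 7, "c": 9, "d": 1}

top_two = threshold_topk([relevance, popularity], k=2)
```

With these scores the loop stops at the third position of each list, before document "d" is ever examined, which is the early termination that makes threshold-style algorithms attractive.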
Link of the Week: Do Outreach or Your Science Dies
In a Scientific American guest blog, Jai Ranganathan, a conservation biologist and co-founder of SciFund Challenge, writes:
Scientists, here’s the bottom line. If you don’t convince the public that your science matters, your funding will quickly vanish and so will your field. Put another way, the era of outreach being optional for scientists is now over.
Researchers have been able to cloister within an academic ivory tower — conducting their research without paying much attention to what’s going on in the wider world — only because there has been a relatively stable funding base for science. Governmental sources have been vital to that funding base, particularly for basic research where the government picks up most of the tab.
Unfortunately, the stability of that funding is now a thing of the past….
One example of this kind of public engagement is COMPASS, a nonprofit dedicated to communicating with the public about oceanic science, as described in this article in PLOS Biology.
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are Department of Energy Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.