InTheLoop | 04.22.2013
DOE Pulse: Berkeley Lab Tackles Next-Generation Climate Models
A recent feature story in DOE Pulse points out that to keep up with nature, climate models are running at ever-higher resolutions, requiring ever-greater processing speeds and altered computer architectures. Michael Wehner of CRD notes that simulations run with a low-resolution climate model can give results completely contrary to those from a high-resolution version of the same model. John Shalf of NERSC and Wes Bethel of CRD are also quoted in the story. Read more.
Also, Wehner will be one of five speakers at tonight’s Science at the Theater presentation on climate change, “How Hot Will It Get?” A full house is expected, but live streaming is available. Click on the link above or see “This Week’s CS Seminars” below for details.
Oak Ridge Is First National Lab with 100G Connection to ESnet
On Tuesday, April 9, Oak Ridge National Laboratory in Tennessee became the first DOE lab with a 100-gigabit-per-second link to ESnet’s 100G backbone. Since the high-speed backbone went into production in late 2012, each of the labs served by ESnet has been developing plans to upgrade its connection to 100G. Lawrence Berkeley and Lawrence Livermore national labs are currently testing 100G links, which should go into production soon. Read more.
Kathy Yelick to Give Invited Talk at Salishan Conference
Since 1981, the Conference on High-Speed Computing has been convening annually at Salishan Lodge in Gleneden Beach, Oregon. This invitation-only conference is a means of getting experts in computer architecture, languages, and algorithms together to improve communications, develop collaborations, solve problems of mutual interest, and provide effective leadership in the field of high-speed computing.
This year’s conference is being held this week, April 22–25. Tomorrow Associate Lab Director Kathy Yelick will give a presentation on “Open Problems, Solved Problems and Non-Problems in DOE’s Big Data.” Here is the abstract:
DOE’s high-end computing programs have traditionally focused on modeling and simulation, but some of the biggest computing challenges today come from the data sets produced by the experimental user facilities within DOE’s Office of Science. With genome sequencers, light sources, telescopes and other detectors and sensors used throughout DOE, the data rates, volumes, complexity and analysis requirements are far beyond what an individual scientist can handle. In addition, there are opportunities for new scientific discoveries and increased science quality from the combination or re-analysis of previously collected data sets. In this talk, I will describe ongoing work to support the data streams coming from experimental facilities, identify some key research and facility challenges, and explain how these problems differ from other Big Data workloads. I will also argue that, contrary to some of the rhetoric, Big Data is intimately tied to modeling and simulation and will drive the need for advanced computing performance just as modeling and simulation requirements do. Finally, DOE’s tradition of strong interdisciplinary science teams makes it well positioned to tackle many of the data challenges that arise in science.
ESnet’s Greg Bell to Deliver Keynote at Canadian Conference
ESnet Director Greg Bell will give the keynote address at the 2013 THINK Conference organized by ORION, the high-speed network linking 1.8 million researchers in Ontario, Canada. The conference will be held April 25, 2013, in Toronto.
This year’s THINK Conference will focus on “Extreme Data,” including the trends, the issues, and what they mean for Ontario’s research, education, and innovation communities. The THINK Conference will challenge organizational leaders affected by the circumstances of the day to take notice, exchange knowledge, share best practices, and develop ideas that lead to innovative solutions.
According to Bell, “It’s time to start thinking about research networks as instruments for discovery, not infrastructures for service delivery.” In his talk, Bell will describe what’s at stake in this distinction, and explain how the culture and strategy of ESnet are influenced by its unusual institutional context: embedded in a US national laboratory, classified as a “user facility,” and located uphill from a famously audacious university.
Vis Group Staff Participating in DOE Computer Graphics Forum
The DOE Computer Graphics Forum is being held this week, April 22–25, in Portland, Oregon. While there are other venues for presenting and discussing graphics/visualization research, the DOE labs, with their emphasis on extreme scale computation, face unique challenges in visualization and data understanding. Thus, over the years, the forum has proven invaluable for sharing information and experiences, building cross-institutional collaborations, and coordinating efforts across programs.
Three members of CRD’s Visualization Group are contributing to this year’s forum: Hank Childs is the site chair; David Camp will give a talk on “Optimizing VisIt for Multi-Core Systems”; and Harinarayan Krishnan will give a talk on “New Python Scripting for VisIt.”
Daily Californian Looks at Potential Scientific Impact of NERSC’s Newest Supercomputer
An April 16 Daily Californian story featured NERSC’s Edison supercomputer and its potential role in future scientific breakthroughs. The story quoted Berkeley Lab’s David Bailey, Jon Bashor, and Kathy Yelick. Richard Gerber corrected an important number in the comments at the end of the article. Read more.
ASCR Discovery Features Former Alvarez Fellow Aleksandar Donev
The DOE newsletter ASCR Discovery recently posted a story, “Of colorful candies and fluid dynamics,” about Aleksandar Donev, assistant mathematics professor at New York University’s Courant Institute, 2009 Alvarez Fellow in CRD’s Center for Computational Sciences and Engineering (CCSE), and DOE Early Career Research Award recipient. Donev uses computer simulations to study how thermal fluctuations affect fluid behavior at scales comparable to the size of molecules. CCSE head John Bell was a co-author of the paper that provided the illustration for the article. Read more.
Huffington Post: Are the Digits of Pi Random?
A widely circulating pi meme asserts, among other things:
Pi is an infinite, nonrepeating decimal — meaning that every possible number combination exists somewhere in pi. Converted into ASCII text, somewhere in that infinite string of digits is the name of every person you will ever love, the date, time, and manner of your death, and the answers to all great questions of the universe.
So is this really true? The answer given in a Huffington Post blog by David Bailey, head of the Complex Systems Group in the Computational Research Division, and Jonathan Borwein of the University of Newcastle, Australia, is “probably, maybe....” The key difficulty, they say, is proving that pi is normal. Read more.
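For readers who want to poke at the question themselves, here is a minimal Python sketch (not taken from the blog post) that computes the first 1,000 decimal digits of pi with Machin's formula and tallies how often each digit appears. The function names are hypothetical, chosen for illustration. Roughly equal counts are what normality would predict, but they are only suggestive: normality in base 10 requires every finite digit string of length k to appear with limiting frequency 10^-k, and no finite tally can prove that.

```python
from collections import Counter


def arctan_inv(x, unity):
    """Return arctan(1/x) scaled by `unity`, using integer arithmetic only."""
    total = term = unity // x
    x_squared = x * x
    n = 3
    sign = -1
    while term:
        term //= x_squared          # next odd power of 1/x, still scaled
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_digits(n_decimals):
    """Return pi as a digit string: '3' followed by n_decimals decimals."""
    guard = 10                      # extra digits to absorb truncation error
    unity = 10 ** (n_decimals + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    scaled_pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(scaled_pi // 10 ** guard)


def digit_frequencies(digits):
    """Count how many times each digit character occurs."""
    return Counter(digits)


if __name__ == "__main__":
    counts = digit_frequencies(pi_digits(1000))
    for digit in "0123456789":
        print(digit, counts[digit])
```

Running the script prints one line per digit with its count among the first 1,001 digits (the leading 3 included).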
This Week’s Computing Sciences Seminars
Carbon Cycle 2.0: A Tale of Two Planets: The Earth That Was, and Soon Will Be
Monday, April 22, 12:00–1:00 pm, Bldg. 50 Auditorium
Bill Collins, LBNL
Climate change has arrived. Last year was the hottest year ever recorded in the United States. Australia just experienced an “angry summer” with unprecedented numbers of heat waves and fires.
In this talk, we discuss the latest evidence from across the scientific community documenting how the climate system has been changing. The weight of this evidence clearly shows that the Earth is rapidly warming and thawing. The data also suggest that these changes are very likely, and almost entirely, due to the action of manmade greenhouse gases and other pollutants on the Earth’s energy balance. Using state-of-the-science models developed by scientists at LBNL and other national labs, we show simulations of how the climate could evolve if current trends persist. Finally, we suggest how these models could be enhanced to explore options to mitigate global warming and how Berkeley Lab is developing the frameworks for investigating sustainable energy futures.
Science at the Theater: How Hot Will It Get?
Monday, April 22, 6:00–9:00 pm, Berkeley Repertory Theater, 2025 Addison St.
Free admission, but RSVP to firstname.lastname@example.org ASAP for a reservation; a large crowd is expected.
HD live-streaming broadcast: http://www.lbl.gov/LBL-PID/fobl/index.html
Berkeley Lab climate scientists reveal how their new findings — from the tundra to the rainforest — could upend current thinking about the pace of climate change, and what this will mean for you. Speakers/topics:
Bill Collins and the balance of energy: What do computer models predict about the future of the earth’s climate?
Jeff Chambers and the rainforest: How much carbon do our forests absorb and what if this rate changes?
Margaret Torn and the Arctic permafrost: What happens to the Earth’s climate when the permafrost thaws?
Michael Wehner and extreme weather: What does high-performance computing tell us about heat waves, floods, droughts and hurricanes?
Maximilian Auffhammer and climate policy: What kind of carbon tax might actually work?
Toward Programmable High-Performance Multicores
Tuesday, April 23, 11:00 am–12:30 pm, 380 Soda Hall, UC Berkeley
Josep Torrellas, University of Illinois at Urbana-Champaign
One of the biggest challenges facing us today is how to design parallel architectures that attain high performance while efficiently supporting a programmable environment. In this talk, I describe novel organizations that will make the next generation of multicores more programmable and higher performance. Specifically, I show how to automatically reuse the upcoming transactional memory hardware for optimized code generation. Next, I describe a prototype of Record & Replay hardware that brings program monitoring for debugging and security to the next level of capability. I also describe a new design of hardware fences that is overhead-free and requires no software support. Finally, if time permits, I will outline architectural support to detect sequential consistency violations transparently.
Communication Lower Bounds for Programs That Access Arrays: Scientific Computing and Matrix Computations Seminar
Wednesday, April 24, 12:10–1:00 pm, 380 Soda Hall, UC Berkeley
Nick Knight, UC Berkeley
In a previous seminar talk (2/15/12), we described a generalization of the communication lower bound theory for 3-nested-loop-like algorithms. The new theory yields a lower bound on data movement for any computer program that accesses arrays via affine functions of the loop indices.
We review our recent results regarding practical computation of this lower bound. The decidability of one formulation of the problem implies a positive solution to Hilbert’s 10th Problem over the rationals (a longstanding open problem). But despite the possible undecidability, a useful approximation can still be obtained. More recently, we have devised another formulation of the problem that leads to an effective algorithm.
Having established this machinery for computing lower bounds, we now consider the related problem of attaining these bounds, i.e., constructing optimal algorithms. We sketch our current directions.
This is joint work with M. Christ, J. Demmel, T. Scanlon, and K. Yelick (UCB Math and CS depts.).
Superlinear Lower Bounds for Multipass Graph Processing
Thursday, April 25, 11:00 am–12:00 pm, Soda Hall, Wozniak Lounge, UC Berkeley
Venkatesan Guruswami, Carnegie Mellon University
The data stream model has emerged as an important algorithmic paradigm for computation on massive data sets. Algorithms and lower bounds in the streaming model have received significant attention in problem domains rife with data, such as statistics, clustering, and linear algebra. With large graphs arising naturally in many contexts, graph problems have also been considered in the streaming model. As streaming algorithms for even simple graph-theoretic tasks require space proportional to the number of vertices, the “semi-streaming” setting that allows quasi-linear space has been identified as a sweet-spot for graph streaming.
In this talk, we will rule out multipass semi-streaming algorithms for some basic graph problems. Specifically, we will present n^(1+Omega(1/p)) lower bounds for the space complexity of p-pass streaming algorithms for the following problems on n-vertex graphs:
- testing if the graph has a perfect matching,
- computing the distance between two specific vertices, and
- testing if there is a directed path between two vertices.
Before our result, it was known that these problems require Omega(n^2) space in one pass, but no n^(1+Omega(1)) lower bound was known for two or more passes.
Our result follows from a communication complexity lower bound for a communication game in which the players hold two graphs on the same set of vertices. The task of the players is to find out whether the sets of vertices reachable from a specific vertex in exactly p+1 steps intersect. We show that this game requires a significant amount of communication if the players are forced to speak in a specific difficult order. Our proof involves, among other things, establishing an information cost lower bound for a decision version of the classic pointer chasing problem and a direct sum type theorem for the disjunction of several instances of this problem.
Joint work with Krzysztof Onak.
Link of the Week: Beware the Big Errors of “Big Data”
Nassim N. Taleb, the author of Antifragile and The Black Swan, writes in Wired that we’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data.” With big data, researchers have brought cherry-picking to an industrial level. Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information.
In other words: Big data may mean more information, but it also means more false information. Taleb is not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an increasingly larger haystack. Read more.
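The claim that spurious relationships multiply as variables are added can be illustrated with a small simulation. The Python sketch below is not from Taleb's article; the function names, the 0.5 correlation threshold, and the sample sizes are all assumptions chosen for illustration. It generates columns of pure Gaussian noise, so any strong correlation it finds is spurious by construction, then counts how many pairs of columns exceed the threshold as more columns are included.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_noise(n_vars, n_obs, seed=0):
    """Generate n_vars independent columns of Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]


def count_spurious(variables, threshold=0.5):
    """Count pairs of columns whose |correlation| exceeds the threshold."""
    return sum(
        1
        for a, b in combinations(variables, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    noise = make_noise(n_vars=60, n_obs=15)
    for k in (10, 30, 60):
        pairs = k * (k - 1) // 2
        print(k, "variables,", pairs, "pairs,",
              count_spurious(noise[:k]), "above threshold")
```

The number of pairs grows quadratically with the number of variables while the observations per variable stay fixed at 15, so the pool of candidate coincidences expands much faster than the data available to rule them out.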