InTheLoop | 04.19.2010
April 19, 2010
NERSC and JGI Tackle Genomics High Performance Computing
A torrent of data has been flowing from the advanced sequencing platforms at the Department of Energy’s Joint Genome Institute (JGI), among the world’s leading generators of DNA sequence information for bioenergy and environmental applications. Last year, JGI generated over one trillion nucleotide letters of genetic code for its various user programs, an eight-fold increase in productivity from 2008. This year JGI expects to sequence five times more data than the previous year, producing more than a petabyte of data.
To ensure that there is a robust computational infrastructure for managing, storing and gleaning scientific insights from this ever-growing flood of data, JGI is joining forces with NERSC. Computing systems will be split between JGI’s campus in Walnut Creek, Calif. and NERSC’s Oakland Scientific Facility, which are 20 miles apart. The NERSC Division will also manage JGI’s six-person systems staff to integrate, operate and support these systems.
Lab Is Confirming Employee Work Eligibility via E-Verify Project
As mandated by the Department of Homeland Security, the Laboratory is currently engaged in a project to E-Verify all active employees. The International Researchers & Scholars Office (IRSO) is leading this effort and is sending an email message (from either Bliss Calma or Gina Marquez of the IRSO) requesting information. Employees (guests are not included in this project) are asked to respond as quickly as possible.
Some employees have asked whether the I-9/E-Verify email messages from IRSO are legitimate. Employees who have questions about the project can contact IRSO at x6326. Articles about the E-Verify project appeared in the Jan. 25 and March 29 editions of Today at Berkeley Lab.
MMDS 2010 Registration Is Now Open
Registration is now open for the Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010), to be held June 15–18 at Stanford University. The goals of MMDS 2010 are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and Internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas.
Safety Reminder: Uneven Surfaces and Stairs
Computing Sciences staff are reminded to take care when walking on the uneven surfaces found in many locations at the Lab. An employee recently tripped on a concrete barrier in a parking lot, injuring his hand. Following discussion of the incident at the March 25 CS all-hands meeting, CS staff raised concerns about tilted and rotting steps leading to the Pit parking lot. This safety issue was entered into the Corrective Action Tracking System. The Facilities Division has implemented a temporary fix for the one broken step, and additional repairs to the stair supports will be completed when the ground is dry enough.
HP Day at Cal on Thursday
Hewlett-Packard and Entisys Solutions will describe a variety of their technologies during HP Day at Cal at CITRIS headquarters, Sutardja Dai Hall, from 8:30 am to 1:15 pm on Thursday, April 22.
This Week’s Computing Sciences Seminars
Special EECS Seminar: Energy Efficient Computing
Monday, April 19, 1:00–2:00 pm, Soda Hall, Wozniak Lounge (430), UC Berkeley
Tajana Simunic Rosing, University of California, San Diego
In this talk I give an overview of the algorithms we have developed at UCSD to significantly lower energy consumption in computing systems. We derived optimal power management strategies for stationary workloads that have been implemented in both hardware and software. Run-time adaptation can be done via an online learning algorithm that selects among a set of policies. We generalize the algorithm to include thermal management, since we found that minimizing power consumption does not necessarily reduce overall energy costs. To reduce the performance costs typically associated with state-of-the-art thermal management techniques, we developed a new set of proactive management policies. Experimental results using real datacenter workloads on an actual multicore system show that our proactive technique reduces the adverse effects of temperature by over 60%. Most recently, we have shown that symbiotic scheduling of workloads in virtualized environments can yield an average 15% energy savings with a 20% performance benefit in high-utilization scenarios.
I will also present some of our recent work on energy savings in battery-powered and energy-harvesting systems. We are designing a new kind of “citizen infrastructure,” CitiSense, an end-to-end health and environmental information system with near real-time data streams and feedback loops from the system to the sensing, processing, and actuation infrastructure. We have developed adaptive algorithms that trade off computational accuracy against available energy in such systems, while taking their energy-harvesting capabilities into account.
LAPACK Seminar: Finding Structure with Randomness: Stochastic Algorithms for Constructing Low-Rank Matrix Decompositions
Wednesday, April 21, 11:10 am–12:00 pm, 380 Soda Hall, UC Berkeley
Joel Tropp, Caltech
Computer scientists have long known that randomness can be used to improve the performance of algorithms. A familiar application is the process of dimension reduction, in which a random map transports data from a high-dimensional space to a lower-dimensional space while approximately preserving some geometric properties. By operating with the compact representation of the data, it is theoretically possible to produce approximate solutions to certain large problems very efficiently.
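To make the dimension-reduction idea concrete, here is a minimal Python sketch (not taken from the talk) of a Gaussian random projection in the Johnson–Lindenstrauss style: a random k × d matrix with independent N(0, 1/k) entries maps d-dimensional points to k dimensions while approximately preserving pairwise distances. The function names `gaussian_projection` and `project`, and the sizes used, are illustrative assumptions.

```python
import math
import random

def gaussian_projection(d, k, seed=0):
    """Build a k x d random map with i.i.d. N(0, 1/k) entries.

    Scaling by 1/sqrt(k) makes the squared length of a projected vector
    equal to the original squared length in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional vector x to k dimensions: y = R x."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

if __name__ == "__main__":
    d, k, n = 2000, 200, 4
    rng = random.Random(1)
    points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    R = gaussian_projection(d, k)
    low = [project(R, p) for p in points]
    # Pairwise distances in 200 dimensions should stay close to those in 2000.
    for i in range(n):
        for j in range(i + 1, n):
            ratio = math.dist(low[i], low[j]) / math.dist(points[i], points[j])
            print(f"pair ({i}, {j}): distance ratio {ratio:.3f}")
```

With k = 200, each distance ratio typically lands within a few percent of 1; a larger k tightens the approximation at the cost of a less compact representation.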
Recently, it has been observed that dimension reduction has powerful applications in numerical linear algebra and numerical analysis. This talk provides a high-level introduction to randomized methods for computing standard matrix approximations, and it summarizes a new analysis that offers (nearly) optimal bounds on the performance of these methods. In practice, the techniques are so effective that they compete with—or even outperform—classical algorithms. Since matrix approximations play a ubiquitous role in areas ranging from information processing to scientific computing, it seems certain that randomized algorithms will eventually supplant the standard methods in some application domains.
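One well-known member of this family is the randomized range finder: multiply the matrix A by a Gaussian test matrix Ω to sample its range, orthonormalize the samples to get Q, and use A ≈ Q(QᵀA). The stdlib-only Python sketch below illustrates that pattern under simple assumptions (dense lists of rows, a small oversampling parameter, no final truncation to exactly rank k via an SVD of B). It is not presented as the specific method or analysis from the talk, and the helper names are hypothetical.

```python
import math
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Dense product of an m x n and an n x p matrix (lists of rows)."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def orthonormal_basis(Y, tol=1e-10):
    """Orthonormalize the columns of Y with Gram-Schmidt (two passes per
    column), dropping columns that are numerically dependent on earlier ones."""
    basis = []
    for col in transpose(Y):
        v = list(col)
        norm0 = math.sqrt(sum(x * x for x in v))
        for _ in range(2):  # re-orthogonalize once for numerical stability
            for q in basis:
                c = sum(qi * vi for qi, vi in zip(q, v))
                v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm0 > 0 and norm > tol * norm0:
            basis.append([x / norm for x in v])
    return transpose(basis)  # m x r matrix Q with orthonormal columns

def randomized_low_rank(A, k, oversample=5, seed=0):
    """Return (Q, B) with A ~= Q B, where Q has at most k + oversample columns."""
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k + oversample)] for _ in range(n)]
    Y = matmul(A, omega)        # random samples from the range of A
    Q = orthonormal_basis(Y)    # orthonormal basis for the sampled range
    B = matmul(transpose(Q), A) # small r x n matrix Q^T A
    return Q, B

if __name__ == "__main__":
    rng = random.Random(42)
    m, n = 30, 20
    # Build an exactly rank-2 matrix as a sum of two outer products.
    u1, v1, u2, v2 = ([rng.uniform(-1.0, 1.0) for _ in range(size)] for size in (m, n, m, n))
    A = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(n)] for i in range(m)]
    Q, B = randomized_low_rank(A, k=2)
    approx = matmul(Q, B)
    err = max(abs(a - b) for ra, rb in zip(A, approx) for a, b in zip(ra, rb))
    print(f"basis size: {len(Q[0])}, max abs error: {err:.2e}")
```

Oversampling (drawing a few more random samples than the target rank) is what makes the captured range reliable in practice; for a matrix that is only approximately low-rank, the error would be small rather than at rounding level.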
Looking for Patterns in Videos
Wednesday, April 21, 4:00–5:00 pm, HP Auditorium (306 Soda Hall), UC Berkeley
Minta Martin, University of Maryland
In this talk, I will first discuss some of the general principles for designing robust video-based pattern recognition systems. I will then present many examples of statistical techniques for video-based modeling and recognition of actions involving single and multiple humans using motion, shape and behavior as features. Methods based on dynamic texture models, landmarks and special manifolds will be discussed. Methods for addressing variations due to time warping and viewpoints will be illustrated for action recognition and unsupervised clustering of video sequences.
Link of the Week: Please Do Not Change Your Password
A new study confirms what many of us have long suspected: many of those irritating cybersecurity measures are a waste of time. The study, by Cormac Herley, a principal researcher at Microsoft Research, found that instructions intended to spare us from costly computer attacks often exact a much steeper price in user time and effort. “Most security advice simply offers a poor cost-benefit trade-off to users,” Herley wrote.
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Berkeley Lab was founded in 1931, and its scientific expertise has been recognized with 13 Nobel Prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.