InTheLoop | 06.01.2015
Meraculous Deciphers Book of Life with Supercomputers
Genomes are like biological owner’s manuals for all living things. Cells read DNA instantaneously, getting the instructions an organism needs to grow, function and reproduce. But for humans, deciphering this “book of life” is significantly more difficult.
By applying novel algorithms, computational techniques and the innovative programming language Unified Parallel C (UPC) to the cutting-edge de novo genome assembly tool Meraculous, a team of scientists from Berkeley Lab's Computational Research Division (CRD), Joint Genome Institute (JGI) and UC Berkeley simplified and sped up genome assembly, reducing a months-long process to minutes. This advance was achieved primarily by “parallelizing” the code to harness the processing power of supercomputers, such as the National Energy Research Scientific Computing Center’s (NERSC’s) Edison system. »Read more.
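To give a flavor of the parallelization idea: assemblers like Meraculous break reads into overlapping k-mers and store them in a hash table, and the parallel version distributes that table across processes by hashing each k-mer to an owning rank. The sketch below illustrates that partitioning scheme in Python; the function names and serial structure are illustrative only, not the team's actual UPC implementation.

```python
# Illustrative sketch of k-mer partitioning for a distributed hash table.
# In the real parallel assembler, each process would build and query only
# its own shard; here all shards are built serially to show the idea.
from collections import defaultdict
import zlib

def kmers(read, k):
    # Yield every overlapping k-mer of a read.
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def owner(kmer, nprocs):
    # Stable hash -> owning process rank (crc32 is deterministic,
    # unlike Python's built-in hash() for strings).
    return zlib.crc32(kmer.encode()) % nprocs

def partition_kmers(reads, k, nprocs):
    # Build per-process shards of the k-mer count table.
    shards = [defaultdict(int) for _ in range(nprocs)]
    for read in reads:
        for km in kmers(read, k):
            shards[owner(km, nprocs)][km] += 1
    return shards

shards = partition_kmers(["GATTACA", "TTACAGG"], k=3, nprocs=4)
# Every distinct k-mer lives in exactly one shard, so any process can
# locate a k-mer's counts with one hash computation and no search.
```

Because ownership is a pure function of the k-mer itself, no global index is needed: any process can compute where a k-mer lives, which is what makes the table scale across thousands of supercomputer cores.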
Register Now: Free Stanford Short Courses Held Prior to SIAM Geosciences 2015
On Sunday, June 28, the Institute for Computational & Mathematical Engineering at Stanford University will offer a series of short courses on a range of topics, including data science, optimization, software engineering and geosciences applications. The short courses—which precede the SIAM Geosciences 2015 meeting hosted by Stanford—are free and open to everyone, including researchers at national labs and summer interns. It is not necessary to register for the SIAM Geosciences conference to attend the Sunday short courses. »See the list of offerings and register now.
Funding Available to Attend Grace Hopper Conference
The Grace Hopper Celebration of Women in Computing conference will be held in Houston, TX on October 14-16, 2015.
This conference is designed to bring the research and career interests of women in computing to the forefront. Presenters are leaders in their respective fields, representing industrial, academic, and government communities. Leading researchers present their current work, while special sessions focus on the role of women in today’s technology fields, including computer science, information technology, research, and engineering.
The Berkeley Lab Computing Sciences Diversity Working Group is able to pay the expenses for a few lab staff to attend. All classes of employees and affiliates within the Computing Sciences area (ESnet, CRD, NERSC or CS) are eligible to apply (including guests and summer students). Funding is available to cover some or all travel expenses (including food and lodging); some registrations may also be covered, depending on available funds and the number of applications. Your division will continue to be responsible for paying your salary during the conference. If you would like to be considered, please contact Elizabeth Bautista with the following information:
- Your name,
- Your supervisor's name, and
- A one paragraph summary of why you would like to attend the conference.
The deadline for submitting applications is 5:00 p.m., Friday, June 12. You must be able to register within a few days of being approved for funding.
ESnet, SDN to Play Roles as LHC Fires up Again in June
"When the Large Hadron Collider (LHC) starts back up in June, the data collected and distributed worldwide for research will surpass the 200 petabytes exchanged among LHC sites the last time the collider was operational. Network challenges at this scale are different from what enterprises typically confront...," writes John Dix in a May 26 Network World article. In the piece, Dix interviews two people involved in architecting the worldwide collaboration of networks and computing centers charged with moving, storing, serving and crunching petabytes of data from the world's largest atom smasher. The interview highlights ESnet's role in carrying large data flows from the LHC and contributing reserved bandwidth via OSCARS. They also discuss the part that will be played by software-defined networking (SDN) using open-source OpenFlow software, an area ESnet has also been engaged in researching. »Read more.
Steering Clear of ‘Sneakernet’ at Big-Data Scale
In a May 27 article about the continuing prevalence of "sneakernet," GCN (Government Computing News) cited ESnet as an example of the really big data that's being carried on networks today, with more to come in future years. "ESnet helps to directly connect a long list of national laboratories, institutions and research facilities. It currently carries about 20 petabytes of data a month, and is expected to grow to 100 petabytes by 2016," wrote article author Amanda Ziadeh. »Read more.
This Week's CS Seminars
Adaptive Parallelism Mapping in Dynamic Environments using Machine Learning
Monday, June 1, 10:00am-11:00am, 50B-4205
Murali Emani, University of Edinburgh, UK
Modern-day hardware platforms are parallel and diverse, ranging from mobiles to data centers, and co-location of mainstream parallel applications is increasingly common. The execution environment—composed of workloads, hardware, software, data and more—is dynamic and unpredictable. Efficiently matching program parallelism to machine parallelism under this uncertainty is hard. Mapping policies should anticipate these variations and give applications effective resiliency so they can achieve peak performance. This talk proposes techniques based on predictive modelling to adaptively map programs by determining the best degree of parallelism. When evaluated on highly dynamic executions, these solutions surpass default, state-of-the-art adaptive, and analytic approaches.
Is Event Attribution Accurate? Evaluating simulated fraction of attributable risk with observations
Monday, June 1, 2:00pm - 3:00pm, 50F-1647
Fraser Lott, UK Met Office
NERSC Brown Bag — Memory Errors in Modern Systems: The Good, The Bad, and the Ugly
Monday, June 1, 12:15pm - 1:15pm, OSF Conf. Rm. 238
John Shalf, Lawrence Berkeley National Laboratory
Several recent publications have shown that hardware faults in the memory subsystem are commonplace. These faults are predicted to become more frequent in future systems that contain orders of magnitude more DRAM and SRAM than found in current memory subsystems. These memory subsystems will need to provide resilience techniques to tolerate these faults when deployed in high-performance computing systems and data centers containing tens of thousands of nodes. Therefore, it is critical to understand the efficacy of current hardware resilience techniques to determine whether they will be suitable for future systems.
We present a study of DRAM and SRAM faults and errors from the field. We use data from two leadership-class high-performance computer systems to analyze the reliability impact of hardware resilience schemes that are deployed in current systems. Our study has several key findings about the efficacy of many currently deployed reliability techniques, such as DRAM ECC, DDR address/command parity, and SRAM ECC and parity. We also perform a methodological study and find that counting errors instead of faults, a common practice among researchers and data center operators, can lead to incorrect conclusions about system reliability. Finally, we use our data to project the needs of future large-scale systems. We find that SRAM faults are unlikely to pose a significantly larger reliability threat in the future, while DRAM faults will be a major concern, and stronger DRAM resilience schemes will be needed to maintain failure rates similar to those found on today's systems.
Data Intensive Research – Enabling and Optimizing Flexible ‘Big Data’ Workflows
Thursday, June 4, 12:00pm - 1:00pm, OSF 943-238 & 50B-4205
Ian Corner, CSIRO, Australia
As data growth and proliferation continues to outpace research grade infrastructure, do we need a new approach to the problem? Are we ‘big data’ ready? Or do we only have ‘lots of data?’ For ‘big data’ to have a future it needs to be discoverable, related, well connected, and easily mapped to existing/future workflows.
Can we transition from having ‘lots of data’ into ‘big data’ while reducing costs, improving data management practices, accelerating workflows, and opening up new workflow possibilities? Data only speaks when it is analyzed. Analysis expresses relationships through workflow. ‘Big data’ requires optimized workflows that preserve those relationships. Research is often ad hoc, so workflows must remain flexible. What data management frameworks and optimized infrastructure do we require to accelerate workflows and maintain relationships while reducing overheads? The CSIRO strategy is based on:
- Increasing the bandwidth of data to compute by building disk arrays not for "data storage" but for "high-speed data cache,"
- Transforming the use of petascale tape libraries from "back up" (as an afterthought) to "data storage with integrated protection,"
- Escaping the monolithic data problem by containerizing data sets to achieve a flexible mechanism for connecting data to workflows, and
- Allowing throughput optimizations by using predefined data categories to communicate access patterns and protection regimes to the infrastructure.
This talk looks at what can be achieved today, based on existing technologies and real eResearch workflows. It provides a scalable plan to survive and prosper as data sets and workflows continue to grow.
This seminar will be videoconferenced in two locations: Oakland 943-238 and Building 50B-4205. You may also join by personal computer, smartphone or telephone by following the remote access instructions below.
REMOTE ACCESS INSTRUCTIONS
Join from PC, Mac, iOS or Android: https://zoom.us/j/3597564990
VOICE: +1 (415) 762-9988 or +1 (646) 568-7788 (US Toll)
Meeting ID: 359 756 4990
International numbers available: https://zoom.us/zoomconference