InTheLoop | 07.20.2015
MANTISSA Finds New Ways To Solve Big Data Analysis Challenges
Researchers at Berkeley Lab are working to address emerging data management and analysis issues through MANTISSA, a DOE-funded program that supports development of novel algorithms to enable new software tools in various science domains to run at scale on current and next-generation supercomputers. »Read more.
Zooming In: Revealing Science Concealed in Large Image Data Sets
By applying a combination of machine learning and pattern recognition, Daniela Ushizima and her Lawrence Berkeley National Laboratory colleagues are developing techniques to automatically analyze data, filtering them for useful information. »Read more.
The MiSIng Piece Revealed: Classifying Microbial Species in the Genomics Era
In a study published online ahead of print on July 6, 2015, in Nucleic Acids Research (NAR), a team of researchers from the U.S. Department of Energy Joint Genome Institute (DOE JGI), a DOE Office of Science User Facility, and their collaborators developed and evaluated a new method for classifying microbial species that could be supplemented, as needed, by the traditional approaches microbiologists have relied on for decades.
The team applied the MiSI method to a massive set of more than 13,000 high-quality bacterial and archaeal genomes selected from the Integrated Microbial Genomes (IMG) database. The DOE Office of Science's National Energy Research Scientific Computing Center at Berkeley Lab partners with JGI to make high-performance computing and storage available to its researchers. »Read more.
Science DMZ Set to Expedite Data In and Out of ORNL
Oak Ridge National Laboratory is deploying a “Science DMZ,” a network architecture first developed by ESnet and NERSC. The Science DMZ routes large datasets around firewalls while still protecting networking and computing resources. A number of labs have deployed Science DMZs, as have more than 100 universities under an NSF program. ORNL’s Science DMZ will accelerate the transfer of data to and from the Oak Ridge Leadership Computing Facility. »Read more.
This Week's CS Seminars
Tiling and Asynchronous Communication Optimizations for Stencil Computations
Tuesday, July 21, 9:00 - 10:00am, NERSC OSF 943, Conference Room 238, Tareq Malas, King Abdullah University of Science and Technology
The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we introduce a multicore-optimized wavefront diamond temporal blocking (MWD) scheme, which leverages shared outer-level caches and has reduced cache size requirements compared to other methods. MWD shows performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
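For readers unfamiliar with temporal blocking, the general idea can be sketched in a few lines. The example below is a minimal illustration, not the MWD scheme from the talk: it uses simple overlapped tiling on a 1D 3-point stencil, where each tile is loaded once with a halo and advanced several time steps before the result is written back. The function names, the tile size, and the choice of stencil are all assumptions made for illustration.

```python
import numpy as np

def jacobi_step(u):
    """One 3-point averaging update; the two end cells are held fixed."""
    new = u.copy()
    new[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return new

def naive_sweeps(u, steps):
    """Baseline: every time step streams the whole array through memory."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u

def overlapped_tiled_sweeps(u, steps, tile=64):
    """Temporal blocking by overlapped tiling (hypothetical helper).

    Each tile is copied together with a halo of `steps` cells per side,
    advanced `steps` time steps locally, and only its interior is kept.
    Halo cells become stale one cell per step, so a halo of width `steps`
    is enough to keep the interior exact.
    """
    n = len(u)
    out = np.empty_like(u)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo = max(start - steps, 0)   # clamp halo at the domain boundary
        hi = min(stop + steps, n)
        block = u[lo:hi].copy()
        for _ in range(steps):
            block = jacobi_step(block)
        out[start:stop] = block[start - lo:stop - lo]
    return out

rng = np.random.default_rng(0)
u0 = rng.random(1000)
same = np.allclose(naive_sweeps(u0, 8), overlapped_tiled_sweeps(u0, 8, tile=64))
print("tiled result matches naive sweeps:", same)
```

Overlapped tiling trades redundant halo computation for fewer passes over main memory. Diamond and wavefront schemes such as the one described in the abstract avoid that redundant work, at the cost of a more involved tile schedule.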
REMOTE ACCESS DETAILS
- PC, Mac, iOS or Android: https://zoom.us/j/501983740
- Phone: +1 646 568 7788 (US Toll) or +1 415 762 9988 (US Toll)
Meeting ID 501 983 740
International numbers available: https://zoom.us/zoomconference
- H.323/SIP room system: 184.108.40.206 (US West) or 220.127.116.11 (US East)
Meeting ID 501 983 740
CS Summer Student Seminar: Origami Workshop
Tuesday, July 21, 11am-12pm, 50B-4205
Terry Ligocki, Berkeley Lab
Unit origami is a form of origami in which individual pieces (units) are folded and assembled into more complex, beautiful geometric shapes. The Origami Workshop is a relaxed, hands-on workshop where everyone will learn to fold an origami unit and then, in groups, assemble the units. If time permits, the instructor will demonstrate a second type of unit so that everyone can attempt a second type of assembly. The instructor will bring examples of completed unit origami objects. All folding paper will be supplied. No previous folding experience is required. Come and have fun!
BIDS Seminar—SFrame and SGraph: Scalable External Memory Data Frame and Graph Structures for Machine Learning
Wednesday, July 22, 1:30–3:00 pm, 190 Doe Library, UC Berkeley
Jay Gu, Dato (formerly known as GraphLab)
A good machine learning platform requires not just robust implementations of statistical models and algorithms but also the right data structures for efficient and scalable feature engineering and data cleaning. In this talk, we discuss SFrame and SGraph, two scalable data structures designed with machine learning tasks in mind. These external memory structures make efficient use of disks and utilize a whole bag of tricks for speed. On a single machine, SFrame supports real-time interactive queries on terabytes of data. When used in a distributed setting, SGraph supports iterative graph analytics tasks at unparalleled speed. On a graph with 100 billion edges, SGraph computes PageRank at 30 seconds per iteration with only 16 EC2 machines. We walk through the architectural design and discuss tricks for scale and speed. SFrame and SGraph are the backbone of a new Python machine learning platform called GraphLab Create. Both are available for download as open source projects or as part of the GraphLab Create binary.
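To make the "per iteration" figure concrete, the sketch below shows what one PageRank iteration does on a tiny in-memory edge list. It is a generic power-iteration illustration written with NumPy, not the SGraph or GraphLab Create API; the function name, damping factor, and example graph are assumptions.

```python
import numpy as np

def pagerank(edges, num_nodes, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (src, dst) edges."""
    out_degree = np.zeros(num_nodes)
    for src, _ in edges:
        out_degree[src] += 1

    rank = np.full(num_nodes, 1.0 / num_nodes)
    for _ in range(iterations):
        # Every node receives the same "teleport" share.
        new_rank = np.full(num_nodes, (1.0 - damping) / num_nodes)
        # One pass over all edges: each node splits its rank among its out-links.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        # Rank held by nodes with no out-links is spread evenly.
        dangling = rank[out_degree == 0].sum()
        new_rank += damping * dangling / num_nodes
        rank = new_rank
    return rank

# Hypothetical 4-node graph: a 3-cycle plus one node pointing into it.
example_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(example_edges, num_nodes=4)
print(ranks)
```

Each iteration touches every edge once, which is why iteration time on a graph with billions of edges is dominated by how quickly edges can be streamed from disk or across machines.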
Bringing the User into Systems Research and Software Development for Science
Thursday, July 23, 11am-12pm, 50F-1647
Lavanya Ramakrishnan, Berkeley Lab
Scientific facilities are increasingly generating large data sets. Most of the communities that use them face a number of challenges in data management, workflow management, and data transfer. Next-generation scientific productivity relies on user-friendly tools and on efficient and effective execution of these data workflows. Traditional approaches to efficiency in supercomputers focus on the hardware and software of the machine and do not consider the user. User research focuses on understanding user behaviors, needs, and motivations; it has been used for web interfaces and product development but has rarely been applied to research and development of software environments for science. In this talk, I will highlight various aspects of user research through projects in the Usable Software Systems (USS) group. This user research complements the systems research and software development necessary to build next-generation software ecosystems.
Link of the Week: Celebrate the 46th Anniversary of Humankind's 'Giant Leap'
Exactly 46 years ago today, the crew of Apollo 11 landed on the moon, making this the perfect day to revisit the web site NASA created last year to celebrate the 45th anniversary of this historic event. »Read more.