InTheLoop | 04.14.2014
CRD Staff Host Albany HS Students
Computing Research Division staff are helping Albany High School students explore careers in science and mathematics. On April 10, CRD staffers joined other lab employees hosting students for career "shadow" days. Dan Martin, Andy Nonaka, Zarija Lukic, Dave Donofrio and Pardeep Pall volunteered to share their experiences with students interested in their fields. Rollin Thomas will host a student on April 17. Jon Bashor, computing sciences communications manager, organized the effort.
Daughters & Sons to Work Day is April 24
The Lab's popular Daughters & Sons to Work Day is happening Thursday, April 24. The deadline for registering children to participate is Friday, April 18. Volunteers are urged to sign up to help with tasks ranging from escorting students around the lab to serving ice cream at the end-of-day social.
Reminder: ESnet, NERSC Say 'Cheese' for NASA Earth Day Mosaic
Today, ESnet staff are invited to participate in a group photo that will become part of NASA's GlobalSelfie day. The NERSC photo will be taken on Wednesday.
ESnet: 1:00pm, Monday, April 14, Building 50 auditorium
NERSC: 1:00pm, Wednesday, April 16, OSF computer room
NASA plans to collect and use photos posted to social media on that day to recreate the iconic Blue Marble image taken in 1972 by the crew of the Apollo 17 spacecraft.
ESnet’s Jason Zurawski Giving Two Talks on Improving Network Endpoint Performance
Jason Zurawski, a science engagement engineer for ESnet, will be giving two presentations in the coming weeks at meetings aimed at helping network professionals improve network performance at end sites.
At the Globus World conference being held April 15-17 at Argonne National Laboratory, Zurawski will discuss the design of Science DMZs (which speed up the movement of science data) and the tuning of data transfer nodes for improved performance. His talk will be part of a session on Advanced Endpoint Configuration and Integration.
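Tuning a data transfer node usually begins with TCP buffer sizes. As a rough illustration (a generic rule of thumb, not material taken from Zurawski's talk), a single TCP stream can only keep a long path full if its buffer holds at least the bandwidth-delay product: the link rate multiplied by the round-trip time. The function name and the 10 Gbps / 50 ms path below are hypothetical.

```python
def tcp_buffer_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Return the bandwidth-delay product in bytes.

    This is the amount of data that must be "in flight" (sent but not yet
    acknowledged) for one TCP stream to saturate the path.
    """
    bits_in_flight = bandwidth_gbps * 1e9 * (rtt_ms / 1e3)
    return round(bits_in_flight / 8)  # bits -> bytes


if __name__ == "__main__":
    # Hypothetical path: 10 Gbps with a 50 ms round-trip time.
    bdp = tcp_buffer_bytes(10, 50)
    print(f"{bdp:,} bytes (~{bdp / 2**20:.1f} MiB)")
```

For that hypothetical path the result is 62,500,000 bytes, roughly 60 MiB, well above common operating-system defaults, which is why data transfer node tuning typically involves raising the kernel's maximum socket buffer sizes.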
The following week, Zurawski will give a remote presentation on perfSONAR to attendees of the Campus Network Monitoring and Security Workshop sponsored by CESNET, the Czech Education and Scientific Network, to be held April 24-25, 2014 at Czech Technical University in Prague. perfSONAR is a collaboratively developed tool for measuring and improving network performance. Zurawski began working on perfSONAR while a graduate student at the University of Delaware.
This Week's CS Seminars
An Algorithmic Framework for X-ray Nanocrystallographic Reconstruction in the Presence of the Indexing Ambiguity
Wednesday, April 16, 2014, 3:30pm - 4:30pm, 939 Evans Hall - UC Berkeley Campus
Jeffrey Donatelli, Mathematics Group, Lawrence Berkeley National Laboratory
While conventional X-ray crystallography has been used extensively to determine atomic structure, its applicability is limited to objects that can be formed into large crystal samples. An appealing alternative, made possible by recent advances in light source technology, is X-ray nanocrystallography, which can image structures that resist forming large crystals by substituting a large ensemble of easier-to-produce nanocrystals, delivered to an X-ray beam via a liquid jet. However, nanocrystallographic diffraction experiments suffer from severe shot-to-shot variability due to varying crystal sizes, orientations, and incident photon flux densities, and the diffraction images are highly corrupted by noise.
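A minimal sketch of one piece of this problem, assuming each shot records the same intensity pattern multiplied by an unknown flux factor and that a reference pattern (for example, a current merged estimate) is available: fitting a least-squares scale factor per shot puts all shots on a common scale. This is a simplified stand-in for illustration, not the multi-modal scaling procedure described in the talk; the function names and numbers are hypothetical.

```python
def scale_factor(observed, reference):
    """Least-squares k minimizing sum((observed - k * reference) ** 2)."""
    num = sum(o * r for o, r in zip(observed, reference))
    den = sum(r * r for r in reference)
    return num / den


def put_on_common_scale(shots, reference):
    """Divide each shot by its fitted scale factor so shots become comparable."""
    scaled = []
    for shot in shots:
        k = scale_factor(shot, reference)
        scaled.append([v / k for v in shot])
    return scaled


if __name__ == "__main__":
    reference = [10.0, 4.0, 7.0, 1.0]
    # Hypothetical shots: the same pattern at 0.5x, 2x and 1.3x incident flux.
    shots = [[f * r for r in reference] for f in (0.5, 2.0, 1.3)]
    for s in put_on_common_scale(shots, reference):
        print([round(v, 6) for v in s])
```

Real data add noise, partial reflections, and crystal-size effects, so a single global factor per shot is only a first approximation.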
Autoindexing techniques, commonly used in conventional crystallography, can determine partial orientation information from Bragg peak patterns, but only up to crystal lattice symmetry. When the diffraction data display less symmetry than the lattice, this limitation produces an ambiguity in the orientations, known as the indexing ambiguity, which makes the data appear twinned if left unresolved. Furthermore, missing phase information must be recovered to determine the imaged object's structure.
An algorithmic framework is presented that uses periodic analysis of both Bragg and non-Bragg data for precise autoindexing; Fourier analysis and image segmentation to reveal crystal size; multi-modal analysis coupled with scaling to correct for varying incident photon flux densities and identify structure factors; and clique analysis on a graph-theoretical model of concurrency to resolve the indexing ambiguity. Additionally, the feasibility of determining structure through iterative phasing techniques, which have fewer experimental requirements than traditional phasing methods, is examined. Results are presented for several sets of simulated nanocrystallographic diffraction images using typical parameters and noise levels reported in current experiments.
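The sketch below illustrates the indexing ambiguity and one simple way to resolve it. It is not the clique-analysis method from the talk: it swaps in a basic iterative scheme that, for each image, keeps whichever of two possible indexings correlates better with a running merged reference. It assumes every image measures the same reflections, that the alternative indexing is a hypothetical involutive permutation `perm` of reflection positions, and that the intensities are simulated. Because both indexings are equally valid for the data set as a whole, the assignments are only meaningful relative to the first image, which seeds the reference.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reindex(v, perm):
    """Apply the alternative indexing: position j takes the value at perm[j]."""
    return [v[p] for p in perm]


def resolve_ambiguity(images, perm, rounds=5):
    """Choose an indexing per image by correlation with a running merged reference.

    Returns a list of booleans; True means "use the reindexed version".
    The first image seeds the reference, so choices are relative to it.
    """
    ref = list(images[0])
    flips = [False] * len(images)
    for _ in range(rounds):
        flips = [pearson(reindex(im, perm), ref) > pearson(im, ref) for im in images]
        oriented = [reindex(im, perm) if f else im for im, f in zip(images, flips)]
        ref = [sum(col) / len(oriented) for col in zip(*oriented)]  # merge
    return flips


def simulate(seed=0, n_refl=40, n_images=20):
    """Simulate noisy images that each use one of two indexings at random."""
    rng = random.Random(seed)
    truth = [rng.expovariate(1.0) for _ in range(n_refl)]
    perm = list(range(n_refl))
    for i in range(0, n_refl - 1, 2):  # hypothetical twin operator: swap neighbours
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    true_flips = [rng.random() < 0.5 for _ in range(n_images)]
    images = []
    for flipped in true_flips:
        noisy = [t * rng.uniform(0.8, 1.2) for t in truth]  # shot-to-shot noise
        images.append(reindex(noisy, perm) if flipped else noisy)
    return resolve_ambiguity(images, perm), true_flips


if __name__ == "__main__":
    found, true_flips = simulate()
    relative = [t != true_flips[0] for t in true_flips]
    print("resolved correctly:", found == relative)
```

Without such a step, merging images under mixed indexings averages each reflection with its symmetry partner, which is exactly the apparent twinning described in the abstract.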
SDS: A Framework for Scientific Data Services
Friday, April 18, 2014, 1:00pm - 2:00pm, Bldg. 50F, Room 1647
Suren Byna, Scientific Data Management Group, Lawrence Berkeley National Laboratory
The existing I/O stack for science lacks high-level semantic information and is a bottleneck for many simulation and data analysis efforts. Parallel file systems treat data as sequences of bytes stored in files. Even though high-level I/O libraries such as HDF5 can provide an abstraction of objects with rich data formats, they cannot easily adapt to changing application characteristics and evolving hardware. For example, writing large data to file systems requires careful tuning of optimization parameters at different levels of the parallel I/O stack. Similarly, when applications write data to parallel file systems using organizations designed for fast writes, analysis tasks that read the data in a different pattern experience poor I/O performance. Toward solving these problems, we introduce Scientific Data Services (SDS), a new framework for providing data management as services.
The initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns and transparently directing read calls to the reorganized data. Ongoing research includes automatic analysis of data access patterns, dynamic reorganization, parallel bitmap indexing, parallel querying, and supporting SQL queries on scientific data in their original file formats. In this talk, we describe the design and current implementation of the SDS architecture, and present the initial results.
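To make the reorganization idea concrete, here is a minimal sketch using plain binary files rather than HDF5; it does not show SDS itself, and all file and function names are hypothetical. A table is written row by row, an analysis then wants a single column, and a reorganized column-major copy turns that request into one contiguous read. `read_column` falls back to strided reads on the original file when no reorganized copy exists, loosely mirroring the transparent redirection of read calls described above.

```python
import os
import struct
import tempfile

ITEM = 8  # bytes per little-endian float64 value


def write_row_major(path, rows):
    """Write a 2-D table row by row, as a simulation writing one step at a time might."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{len(row)}d", *row))


def reorganize(src, dst, nrows, ncols):
    """Copy src into a column-major layout so each column becomes contiguous."""
    with open(src, "rb") as f:
        flat = struct.unpack(f"<{nrows * ncols}d", f.read())
    with open(dst, "wb") as f:
        for c in range(ncols):
            f.write(struct.pack(f"<{nrows}d", *flat[c::ncols]))


def read_column(path, nrows, ncols, col, reorganized=None):
    """Read one column, using the reorganized copy if it exists.

    Returns (values, number_of_read_calls).
    """
    if reorganized and os.path.exists(reorganized):
        with open(reorganized, "rb") as f:
            f.seek(col * nrows * ITEM)
            return list(struct.unpack(f"<{nrows}d", f.read(nrows * ITEM))), 1
    values = []
    with open(path, "rb") as f:
        for r in range(nrows):  # one small strided read per row
            f.seek((r * ncols + col) * ITEM)
            values.append(struct.unpack("<d", f.read(ITEM))[0])
    return values, nrows


if __name__ == "__main__":
    nrows, ncols = 1000, 4
    rows = [[float(r * ncols + c) for c in range(ncols)] for r in range(nrows)]
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw.bin")
        reorg = os.path.join(d, "raw.colmajor.bin")
        write_row_major(raw, rows)
        before, n1 = read_column(raw, nrows, ncols, 2, reorganized=reorg)
        reorganize(raw, reorg, nrows, ncols)
        after, n2 = read_column(raw, nrows, ncols, 2, reorganized=reorg)
        print(before == after, n1, n2)
```

Run as a script, it prints `True 1000 1`: the same values, fetched with 1,000 small reads before reorganization and a single read afterward. On a parallel file system the gap between many small strided requests and one large contiguous request is typically much more costly than on a local disk.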
Link of the Week: 'Big Data' Isn't Always Big
When it comes to "big data," size doesn't really matter, Forrester principal analyst Mike Gualtieri recently told an audience at the Hadoop Summit in Amsterdam. According to tech news service ZDNet, he pointed out that private genome sequencing companies like 23andMe produce a mere 800 MB of data per customer. "But within that, there are four billion pieces of information and lots of patterns. So it's a big processing challenge, it's a big compute challenge. You don't have to have petabytes of data to have a big-data opportunity or issue," ZDNet quoted Gualtieri as saying.