
NetLogger Helps Supernova Factory Improve Data Analysis

May 1, 2005

The Nearby Supernova Factory (SNfactory) project, established at Berkeley Lab in 2002, aims to dramatically increase the discovery of nearby Type Ia supernovae by applying assembly-line efficiencies to the collection, analysis and retrieval of large amounts of astronomical data.

To date, the program has resulted in the discovery of about 150 Type Ia supernovae – about three times the total number reported before the project was started. Type Ia supernovae are important because astronomers use them as “standard candles” for gauging the expansion of the universe.

Contributing to the SNfactory's remarkable discovery rate is its custom-developed “data pipeline” software. Each night the pipeline takes in up to 50 gigabytes (billion bytes) of data from wide-field cameras built and operated by the Jet Propulsion Laboratory's Near Earth Asteroid Tracking (NEAT) program, which uses remote telescopes in Southern California and Hawaii.

Around 25,000 new images are captured each day, and the goal is to complete all processing before the next day’s images arrive. Image data is copied in real time from the Mt. Palomar Observatory in Southern California to a mass storage system at NERSC. Then the image data is copied to a large shared disk array on a 344-node cluster called PDSF. Each image is 8 MB (uncompressed), and the processing of each image requires between 5 and 25 reference images, for a total disk space requirement of about 0.5 TB each day.
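
As a rough check on those figures, the sketch below tallies the daily disk footprint from the numbers quoted above. The count of distinct reference images staged per night is our own hypothetical assumption (references are shared across many new images of the same field), not a number from the project.

```python
# Back-of-the-envelope estimate of the SNfactory pipeline's daily disk footprint,
# using the figures quoted in the article. The number of *distinct* reference
# images staged per night is a hypothetical assumption, since references are
# shared across new images of the same field.

NEW_IMAGES_PER_DAY = 25_000     # new NEAT images per day
IMAGE_SIZE_MB = 8               # uncompressed size of one image
DISTINCT_REFERENCES = 40_000    # assumed distinct reference images staged (hypothetical)

new_data_gb = NEW_IMAGES_PER_DAY * IMAGE_SIZE_MB / 1024
reference_gb = DISTINCT_REFERENCES * IMAGE_SIZE_MB / 1024
total_tb = (new_data_gb + reference_gb) / 1024

print(f"new images:       {new_data_gb:6.0f} GB/day")
print(f"reference images: {reference_gb:6.0f} GB/day (assumed)")
print(f"total on disk:   ~{total_tb:.1f} TB/day")
```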

Supernovae are found by comparing recently acquired telescope images with older reference images. If a source of light appears in the new image that did not exist in the old image, it could be a supernova. Subtracting the reference image from the new image identifies these new light sources. The process is quite delicate: aligning the images, matching the point-spread functions, and matching the photometry and bias all require precise calibration.
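
In outline, the subtraction step looks something like the sketch below, which assumes the new and reference images have already been aligned, PSF-matched, and photometrically scaled; those calibration steps are precisely the delicate part and are omitted here.

```python
import numpy as np

def find_candidates(new_image: np.ndarray, ref_image: np.ndarray,
                    n_sigma: float = 5.0) -> np.ndarray:
    """Return pixel coordinates of candidate new light sources.

    Assumes both images are already astrometrically aligned, PSF-matched, and
    photometrically scaled to each other -- the hard part of the real pipeline,
    omitted in this sketch.
    """
    diff = new_image - ref_image      # subtract the reference from the new image
    noise = np.std(diff)              # crude noise estimate for the difference image
    mask = diff > n_sigma * noise     # keep only significant positive residuals
    return np.argwhere(mask)          # (row, col) positions of candidate sources

# Example with synthetic data: a flat sky plus one injected "new" source.
rng = np.random.default_rng(0)
ref = rng.normal(100.0, 3.0, size=(256, 256))
new = ref + rng.normal(0.0, 3.0, size=ref.shape)
new[120:123, 80:83] += 200.0          # injected transient
print(find_candidates(new, ref)[:5])
```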

Because of the high demand placed on all the resources in the pipeline, making sure that data flows smoothly and can be analyzed quickly and correctly is critical to the project's overall success. While there are a number of tools for evaluating the performance of a single system, identifying workflow bottlenecks in a distributed system such as the SNfactory's requires a different kind of tool.

For the past 10 years, Brian Tierney and others in the Collaborative Computing Technologies Group have been developing the NetLogger toolkit as part of the Distributed Monitoring Framework project.

NetLogger is a set of libraries and tools that supports end-to-end monitoring of distributed applications. During the past few months, the team has been working closely with the SNfactory project to help debug and tune its application. “NetLogger has been extremely useful in the debugging and commissioning of our data processing pipeline,” said Stephen Bailey, one of the lead developers on the SNfactory project. “It has helped us identify bugs and processing bottlenecks in order to improve our efficiency and data quality. It additionally has allowed real time monitoring of the data processing to quickly identify problems that need immediate attention. This debugging, commissioning, and monitoring would have taken much longer without NetLogger.”
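
The core idea behind such monitoring is that each stage of a distributed application emits precisely timestamped name=value event records that can later be correlated end to end. The sketch below imitates that style of instrumentation in plain Python; the event names and fields are hypothetical illustrations, not NetLogger's actual API.

```python
import sys
import time
from datetime import datetime, timezone

def log_event(event: str, **fields) -> None:
    """Write one timestamped name=value record in a NetLogger-like style.

    The event names and extra fields used below are hypothetical; they only
    illustrate the kind of records end-to-end instrumentation produces.
    """
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    pairs = " ".join(f"{k}={v}" for k, v in fields.items())
    sys.stdout.write(f"ts={ts} event={event} {pairs}\n")

# Instrument the start and end of one (stand-in) processing step so the two
# records can later be matched up to measure where time is spent.
log_event("snfactory.subtract.start", image="neat_001234.fits", node="pdsf17")
time.sleep(0.2)   # stand-in for the actual image subtraction work
log_event("snfactory.subtract.end", image="neat_001234.fits", node="pdsf17", status=0)
```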

Tierney and Bailey, along with Dan Gunter of the Collaborative Computing Technologies Group, have written a paper entitled “Scalable Analysis of Distributed Workflow Traces,” which will be presented at the 2005 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'05) to be held June 27-30 in Las Vegas. The paper can be found at <http://dsd.lbl.gov/publications/NetLogger-SNFactory.pdf>.

“The first problem the SNfactory scientists asked us to solve was to figure out why some of their workflows were failing without any error messages as to the cause,” Tierney said. “Even when error messages were generated, the SNfactory application produced thousands of log files, and it was very difficult to locate the log messages related to failed workflows. NetLogger was very useful for easily characterizing where the failures were occurring so they would know where to focus debugging efforts.”
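
One way to picture that analysis: if every workflow logs a start event and an end event tagged with a workflow identifier, then workflows that started but never ended can be flagged automatically, even when no error message was ever written. The sketch below illustrates the idea on toy records in the same name=value style as above; the event and field names are hypothetical.

```python
import re

# Toy log records in a name=value style; the field names are hypothetical.
# A real run would read thousands of per-node log files.
records = [
    "ts=2005-04-28T09:01:12.000000Z event=workflow.start wf=job-0142",
    "ts=2005-04-28T09:07:45.000000Z event=workflow.end wf=job-0142 status=0",
    "ts=2005-04-28T09:02:03.000000Z event=workflow.start wf=job-0143",
    # job-0143 never logs an end event -- a silent failure.
]

def parse(line: str) -> dict:
    """Turn one name=value record into a dictionary."""
    return dict(re.findall(r"(\S+)=(\S+)", line))

started, finished = set(), set()
for rec in map(parse, records):
    if rec["event"] == "workflow.start":
        started.add(rec["wf"])
    elif rec["event"] == "workflow.end":
        finished.add(rec["wf"])

print("workflows that started but never finished:", started - finished)
```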

The figure below shows a typical workflow for the SNfactory application on a single cluster node, with CPU and network data shown at the bottom.

[Figure: NetLogger lifeline visualization of an SNfactory workflow on one cluster node, with CPU and network data at the bottom.]

This figure actually demonstrates a bug in the SNfactory processing that had gone undetected for several months until NetLogger analysis revealed it.

The SNfactory application processes a group of images together, starting by uncompressing the images and then performing image calibration and subtraction. The next step is to generate a skyflat image, a calibration image formed from a median combination of several other images. The skyflat is used to correct other images for the sky brightness on a given night, which can vary due to humidity, cloud cover, and so on. The skyflat calibration image is then applied to all images within the job. Under some conditions, the pipeline erroneously determined that the skyflat calibration was not necessary. All lifelines except the two nearly vertical ones near the beginning should have converged at the setskyflat event.
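
Conceptually, a skyflat built this way is just a pixel-by-pixel median over a stack of the night's images, normalized and then divided out of each science image. The sketch below shows that idea with NumPy; the normalization convention and the choice to divide rather than subtract are simplifying assumptions, not the pipeline's exact recipe.

```python
import numpy as np

def make_skyflat(images: list[np.ndarray]) -> np.ndarray:
    """Median-combine a stack of images into a skyflat, normalized to ~1.

    The normalization convention is a simplifying assumption for this sketch,
    not the SNfactory pipeline's exact recipe.
    """
    stack = np.stack(images)            # shape: (n_images, ny, nx)
    flat = np.median(stack, axis=0)     # pixel-by-pixel median rejects stars and transients
    return flat / np.median(flat)       # normalize so a typical pixel value is ~1

def apply_skyflat(image: np.ndarray, flat: np.ndarray) -> np.ndarray:
    """Correct one image for the nightly sky response by dividing out the skyflat."""
    return image / flat

# Example: build a skyflat from several images of the night and apply it to one of them.
rng = np.random.default_rng(1)
night_images = [rng.normal(100.0, 5.0, size=(64, 64)) for _ in range(7)]
skyflat = make_skyflat(night_images)
corrected = apply_skyflat(night_images[0], skyflat)
```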


About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.