Summer Students Further Their Careers While Participating in Cutting-Edge Research
July 1, 2004
Each summer, Berkeley Lab’s Computing Sciences organization hosts students from various universities in the United States and abroad. Additionally, through the DOE’s Computational Science Graduate Fellowship (CSGF), which works to identify and support some of the best computational science graduate students in the nation, fellows participate in a three-month practicum at a DOE research laboratory. Berkeley Lab is hosting two CSGF fellows this summer.
One CSGF fellow at the Lab, Michael Wolf of the University of Illinois, Urbana-Champaign, is working on improving the parallel efficiency of the electromagnetic field solver Tau3P, which he co-developed while working at the Stanford Linear Accelerator Center (SLAC) for five years. While at SLAC he worked with Ali Pinar and Esmond Ng of the Scientific Computing Group (SCG), the same two researchers he is collaborating with this summer. His results in parallel mesh partitioning have been encouraging, and he has also made some progress in improving the matrix/vector multiplication algorithm used in Tau3P.
“When developing algorithms, there is a danger that you might forget about their application,” said Michael, who’s getting his Ph.D. in Computer Science. “I want to create algorithms that are useful to the scientific community, and labs such as LBNL afford me this opportunity.” With an undergraduate degree in biology and computer science from Harvey Mudd College, Michael finds he gravitates toward computational biology. During his Lab practicum he’s also working on creating biological computational models with Physical Biosciences researcher Teresa Head-Gordon. “Computational biology has these huge, seemingly unsolvable problems,” he said. “I love the challenge of finding the balance between computational feasibility and model accuracy.”
Computing Sciences’ other CSGF fellow, Ben Lewis of the Massachusetts Institute of Technology (MIT), is also working in biology, specifically genomics. Ben is working with Mike Eisen in Life Sciences, comparing DNA sequences from multiple fruit fly genomes and studying the evolution of DNA regions that may be involved in the transcriptional regulation of genes essential for normal fly development, metabolism and more. Regions of DNA that have been preserved in evolution among divergent species of fruit flies are more likely to be involved in processes that are essential to the organisms' fitness.
“High performance computing makes it possible to do thorough, detailed analyses of the vast amounts of genomic sequence data produced by several recent fruit fly genome projects,” he said. “We split the genomes into little pieces and compare them in parallel; clusters allow us to do this quickly and efficiently.”
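The split-and-compare idea Ben describes can be illustrated with a minimal sketch. It is not the code used in Eisen's lab: the function names, the chunk size, and the simple per-position identity score are all assumptions made for illustration, the two sequences are assumed to be already aligned, and a local thread pool stands in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Cut a sequence into consecutive pieces of at most chunk_size letters."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def fraction_identical(pair):
    """Fraction of positions at which two aligned pieces carry the same letter."""
    a, b = pair
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest


def compare_in_parallel(seq_a, seq_b, chunk_size=4, workers=4):
    """Score matching pieces of two sequences, one independent task per piece."""
    pairs = list(zip(split_into_chunks(seq_a, chunk_size),
                     split_into_chunks(seq_b, chunk_size)))
    # Hypothetical stand-in for a cluster: each piece is scored independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fraction_identical, pairs))


# Two short made-up sequences, assumed to be pre-aligned.
scores = compare_in_parallel("ACGTACGTAC", "ACGTTCGTAA")
print(scores)
```

Because each piece is scored independently of the others, the work can be spread across as many processors as there are pieces, which is what makes the approach a good fit for clusters.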
Working in Eisen’s lab, Ben is developing algorithms that make use of genome sequence data from six Drosophila species to identify sequence motifs that may be involved in the transcriptional regulation of a family of genes called microRNA genes. The unique properties of microRNA genes, such as their small size and presumed importance in development, make these genes an ideal subject for computational comparative genomics analyses of gene expression.
“Biology will be totally different in a few years,” Ben said. “We’re going through a transformation in how research is done. Computation-based discoveries are beginning to lead the way in research areas that have long been dominated by experimentalists.”
In addition to hosting the CSGF fellows, Computing Sciences hosts its own Summer Student Program, which gives students an opportunity to gain relevant research experience while pursuing their degrees. This year 16 students are partnering with one or more staff members on well-defined research projects.
“We rely on a huge number of students,” Deb Agarwal, head of the Distributed Systems Department, told the summer students at one gathering. “Don’t underestimate your impact at the Lab; this is really important research.”
Many of their projects, which they develop at the Lab during the 12-week summer program, become the basis for their theses. Several of the students presented their research findings in an open seminar on Tuesday, August 3.
Here is a summary of their presentations:
Ryan McKenzie from the University of Kentucky presented a talk titled “Building High-Level Tool Interfaces with Python.” He addressed the advantages of using high-level software interfaces as teaching tools, specifically in the context of the DOE ACTS Collection. He also discussed the challenges in designing and implementing such an interface. Ryan has been working with SCG’s Tony Drummond and Osni Marques.
Viral Shah from UC Santa Barbara presented “Parallel Programming without MPI.” He is working on further developing Matlab*P, what he calls a “simpler, more elegant way to write parallel programming.” He hopes that his work on Matlab*P will become part of his graduate thesis. Viral has been working with SCG’s Parry Husbands, who wrote the original version of Matlab*P for his thesis.
Hormozd Gahvari from UC Berkeley presented “Benchmarking Sparse Matrix-Vector Multiplication.” A sparse matrix is a matrix that contains mostly zeros with just a few non-zeros. The TOP500 supercomputers are ranked using the Linpack benchmark, which exercises dense rather than sparse matrix computations. Hormozd is interested in studying a new approach to benchmarking: creating a version of APEX Map that simultaneously runs on multiple streams. He works with Erich Strohmaier and Hongzhang Shan of the Future Technologies Group.
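To make the operation being benchmarked concrete, here is a minimal sketch of sparse matrix-vector multiplication. It assumes the widely used compressed sparse row (CSR) layout, which the talk summary does not specify, and the function and variable names are illustrative only. Only the non-zero entries are stored and multiplied.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a sparse matrix stored in CSR form by a dense vector x.

    values  : the non-zero entries, listed row by row
    col_idx : the column index of each entry in values
    row_ptr : row i owns the entries values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Irregular, index-driven reads of x are typical of sparse kernels.
            y[row] += values[k] * x[col_idx[k]]
    return y


# The 3x3 matrix [[4, 0, 0], [0, 0, 2], [1, 0, 3]] in CSR form.
values = [4.0, 2.0, 1.0, 3.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]

print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [4.0, 6.0, 10.0]
```

The indirect lookups through `col_idx` give this kernel a very different memory access pattern from a dense computation, which is one reason a dense benchmark may say little about sparse performance.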
Hui Xiong from the University of Minnesota presented “Hyperclique Pattern Discovery and Its Application to Protein Functional Module Extraction.” Hui is researching how the application of a hyperclique pattern (a type of association pattern containing objects that are highly affiliated with each other) can identify functional modules in protein complexes. Proteins in the same functional module tend to be involved in common elementary biological functions. He has been working with SCG’s Chris Ding and Stephen Holbrook in Physical Biosciences.
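One common way to quantify "highly affiliated" is the h-confidence measure: the support of the whole set divided by the largest support of any single member. The sketch below assumes that definition, and also assumes that each protein complex is treated as a transaction and each protein as an item; the protein names, data, and threshold are invented for illustration and do not come from Hui's study.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def h_confidence(itemset, transactions):
    """Support of the whole set divided by the largest single-item support."""
    largest_single = max(support([item], transactions) for item in itemset)
    if largest_single == 0:
        return 0.0
    return support(itemset, transactions) / largest_single


def is_hyperclique(itemset, transactions, threshold):
    """A set counts as a hyperclique pattern if its h-confidence meets the threshold."""
    return h_confidence(itemset, transactions) >= threshold


# Hypothetical data: each set lists the proteins observed in one complex.
complexes = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P2", "P4"},
    {"P4", "P5"},
]

print(h_confidence(["P1", "P2"], complexes))         # 1.0: always seen together
print(is_hyperclique(["P1", "P4"], complexes, 0.5))  # False: rarely seen together
```

In this toy data, P1 and P2 appear in exactly the same complexes and so score the maximum of 1.0, while P1 and P4 share only one complex and fall below the threshold.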
Konrad Malkowski from Pennsylvania State University presented “Data Mapping Techniques for Sparse Matrix Factorization.” He is studying how to distribute data during sparse matrix factorization in order to improve system performance. He is also working on improving the accuracy of the model’s predictions. Specifically, he’s studying the applications of multiple-pass methodology in optimizing data mapping. The goal of his work is to make clusters more efficient when solving sparse matrix systems with direct methods. Konrad has been working with SCG’s Esmond Ng and Parry Husbands.
Lisa Cowan from Mills College presented “Performance of Overlay Construction Algorithms in Representative Applications.” An overlay network is a virtual network built on top of a physical network, such as the Internet. She is studying how to implement an overlay construction algorithm, integrate it into a typical application and evaluate its performance. Lisa works with Karlo Berket in the Collaboration Technologies Group.
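The relationship between an overlay and the network beneath it can be sketched in a few lines. The topology, node names, and hop-count cost below are hypothetical and are not taken from Lisa's project; the point is only that a single virtual link between two hosts may span several physical links.

```python
from collections import deque


def physical_hops(physical, src, dst):
    """Breadth-first search for the number of physical links between two nodes."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbor in physical[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # dst is unreachable from src


# Hypothetical physical network: hosts A, B, C attached to routers R1, R2, R3.
physical = {
    "A": ["R1"],
    "B": ["R2"],
    "C": ["R3"],
    "R1": ["A", "R2"],
    "R2": ["R1", "B", "R3"],
    "R3": ["R2", "C"],
}

# Hypothetical overlay: virtual links that connect the hosts directly.
overlay_links = [("A", "B"), ("B", "C"), ("A", "C")]

link_costs = {link: physical_hops(physical, *link) for link in overlay_links}
print(link_costs)  # {('A', 'B'): 3, ('B', 'C'): 3, ('A', 'C'): 4}
```

A construction algorithm decides which virtual links to create, and costs like these are one simple way to compare the overlays that different algorithms produce.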