Yelick to Receive ACM and IEEE Computer Society Ken Kennedy Award
ACM and IEEE Computer Society have named Katherine Yelick as the recipient of the 2015 ACM/IEEE Computer Society Ken Kennedy Award for innovative research contributions to parallel computing languages that have been used both in the research community and in production environments. She was also cited for her strategic leadership of the national research laboratories and for developing novel educational and mentoring tools. The award will be presented at SC15, the International Conference for High Performance Computing, Networking, Storage and Analysis, on November 17 in Austin, Texas. »Read more.
Register Now for Nov. 12 Wang Hall Opening Ceremonies
All Lab employees are invited to attend the dedication of Computing Sciences' new home, Wang Hall (also known as the Computational Research and Theory facility, or CRT). Starting at 1 p.m. on Thursday, Nov. 12, a dedication ceremony will be followed by tours of the building and a symposium on “Pioneering the Next Computing and Internet Frontier for Scientific Discovery.” The dedication and symposium will be held on the fourth floor of the building prior to installation of the furniture, so everyone in the Lab community is invited to attend.
»Registration is required for all events.
Is Your Digital Information More at Risk Today than a Decade Ago?
Contrary to popular perception, personal data breaches aren't happening any more frequently today than a decade ago, according to a recent study coauthored by Steven Hofmeyr of the Computational Research Division at Berkeley Lab. More surprisingly, the study found that data breaches, in general, aren't getting any bigger and happen twice as often by accident as by theft.
Hofmeyr and his University of New Mexico collaborators came to those conclusions by analyzing a decade of information about data breaches published by the Privacy Rights Clearinghouse. The resulting write-up of their study, “Hype and Heavy Tails: A Closer Look at Data Breaches,” won the Best Paper Award at the Workshop on the Economics of Information Security in June. »Read more.
After 10 Years, IMG Still Revolutionizing Genomics
In 2005, the Integrated Microbial Genomes (IMG) data management system was launched to support comparative analysis of genomes sequenced at the Department of Energy’s Joint Genome Institute (JGI). At that time, the system had only a few registered users and contained about 3,000 genomes.
A decade later, IMG is one of the largest publicly available data management and analysis systems for microbial genome and metagenome datasets, containing about 50,000 datasets. The system also has more than 13,500 registered users from 93 countries across six continents, has contributed to thousands of published papers and has served as a tool for teaching genome and metagenome comparative analysis at numerous universities and colleges around the globe.
On this milestone anniversary, the two researchers from Berkeley Lab who have led the development of IMG—Victor M. Markowitz, leader of the Lab’s Biosciences Computing Group, and Nikos C. Kyrpides, head of JGI’s Prokaryotic Super Program—reflect on the development, evolution and impact of this system. »Read more.
Upcoming CS Seminars
Accelerate Data Storage and Analysis for Scientific Discovery
Wednesday, Oct. 21, 11am – 12pm, Bldg 50F Room 1647
Bin Dong, Scientific Data Management Group, CRD, Berkeley Lab
Many large scientific activities produce massive amounts of data, and the data volumes are quickly increasing. For example, researchers at advanced light source and fusion facilities are now able to produce terabytes of data within hours to days. Climate model and particle physics simulations produce petabytes of data per run. Two primary requirements accompany these mountains of data: a scalable storage system whose capability matches the speed of data production, and new data analysis tools that allow domain scientists to quickly extract scientific insights. In this talk, I will introduce my previous and ongoing efforts to investigate novel theorems and algorithms that tune storage systems by way of parallel I/O and accelerate the scientific discovery process through in-situ data analysis.
Applied Math: An Analysis of Implicit Sampling in the Small-Noise Limit
Wednesday, Oct. 21, 3:30 – 4:30pm, 939 Evans Hall, UC Berkeley
Kevin Lin, University of Arizona
Weighted direct samplers, also known as importance samplers, are Monte Carlo algorithms for generating independent, weighted samples from a given target probability distribution. Such algorithms have a variety of applications in, e.g., data assimilation and state estimation problems involving stochastic and chaotic dynamics. One challenge in designing and implementing weighted samplers is to ensure that the variance of the weights (and that of the resulting estimator) is well behaved. In recent work, Chorin, Tu, Morzfeld, and coworkers have introduced a class of novel weighted samplers, called implicit samplers, which have been shown to possess a number of nice properties. In this talk, I will report on an analysis of the variance of implicit samplers in the small-noise limit, and describe a simple method (suggested by the analysis) to obtain higher-order implicit samplers. The algorithms are compared on a number of concrete examples. This is joint work with Jonathan Goodman and Matthias Morzfeld.
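For readers new to the topic, the following is a minimal sketch of a plain weighted direct (importance) sampler, not the implicit samplers discussed in the talk. The target (a standard normal), the proposal (a normal with standard deviation 2), and all function names are assumptions chosen only for illustration. The effective sample size is one common way to gauge how well behaved the weights are.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sample(n, seed=0):
    """Draw n independent samples from the proposal N(0, 2^2) and weight them
    so that they represent the target N(0, 1). Both choices are illustrative."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 2.0) for _ in range(n)]
    weights = [normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0) for x in samples]
    return samples, weights


def weighted_mean(f, samples, weights):
    """Self-normalized importance sampling estimate of E[f(X)] under the target."""
    total = sum(weights)
    return sum(w * f(x) for x, w in zip(samples, weights)) / total


def effective_sample_size(weights):
    """(sum of w)^2 / (sum of w^2); equals n when all weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


if __name__ == "__main__":
    xs, ws = importance_sample(20000, seed=1)
    # The second moment of the standard normal target is exactly 1.
    print("estimate of E[x^2]:", weighted_mean(lambda x: x * x, xs, ws))
    print("effective sample size:", effective_sample_size(ws))
```

Here the proposal is wider than the target, so the weights stay bounded; a proposal narrower than the target would produce occasional very large weights and a much smaller effective sample size.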
Neyman Seminar: Detection of Local Genomics Signals
Wednesday, Oct. 21, 4 – 5pm, 1011 Evans Hall, UC Berkeley
David Siegmund, Stanford University
Several problems of genomic analysis involve detection of local genomic signals. For data generated by paired end reads, we consider a model built from Poisson processes for detection of insertions and deletions. Statistics are suggested that use (i) variations in insert length, (ii) hanging reads, or (iii) read depth. Significance thresholds accounting for multiple comparisons are determined. The marginal power is computed and used to determine and compare the usefulness of the different statistics under various conditions. This is joint research with Nancy Zhang, Benjamin Yakir, and Charlie L. Xia.
BIDS Data Science Lecture: Reflections on Data Science in Some Real-World Applications
Wednesday, Oct. 21, 4 – 5pm, 1011 Evans Hall, UC Berkeley
Aman Ahuja, Data Science Consultant, The Data Guild
In this talk, I will share some stories from my years of consulting and advising around data products, highlighting cases where real-world constraints led to a surprising result or creative solution. While reflecting on my biased sample of cases, I may also share, if I’m feeling bold, some of the mistakes I’ve made along the way. To illustrate some technical ideas more concretely while protecting clients, I will use some simplified or toy examples. Participation and questions encouraged.
Finite Element Solution of Interface and Free Surface Three-Dimensional Fluid Flow Problems Using Flow-Condition-Based Interpolation
Thursday, Oct. 22, 10 – 11:00am, Bldg. 50B, Room 4205
Soyoung You, Department of Aerospace Engineering, The University of Texas at Austin
In scientific research and industry, accurate and conservative methods for free surface analysis are essential. In particular, mass conservation and accurate dynamic response must be attained because of safety concerns. However, nonlinear effects induced by continuously moving domains make this goal particularly difficult to achieve. This presentation offers physical and computational guidelines for using an Arbitrary Lagrangian Eulerian (ALE) method, fundamentally derived from the Reynolds transport theorem, to compute unsteady Newtonian flows including fluid interfaces and free surfaces. The calculation accounts for the frequently overlooked nonlinear effects, which are costly to treat and therefore require special treatment in order to allow the use of large time steps and achieve a computationally efficient method. The Navier-Stokes equations are then solved using a ‘flow-condition-based interpolation’ (FCBI) scheme along with a finite element scheme. The FCBI method uses exponential interpolations derived from the analytical solution of the 1-dimensional advection-diffusion equation in order to account for up-winding effects. The resulting method conserves mass very accurately, and is stable and accurate even when using coarse meshes. Finally, a 2-dimensional FCBI method is revisited, with special focus on its application to flow problems in highly nonlinear moving domains featuring interfaces and free surfaces. An effective and newly developed 3-D FCBI tetrahedral element is also presented in the context of such applications. The 3-D FCBI solution scheme can solve a wide range of flow problems since it can handle highly nonlinear and unsteady flow conditions, even when large mesh distortions occur. Various example solutions are presented to show the effectiveness of the developed solution schemes.
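As a rough illustration of the exponential interpolation idea mentioned in the abstract, the sketch below evaluates the analytical solution of the steady 1-dimensional advection-diffusion equation between two nodes. It is not the speaker's FCBI element; the function names and the use of an element Peclet number (velocity times element length divided by diffusivity) as the single parameter are assumptions made for this example.

```python
import math


def exp_interp_weight(xi, peclet):
    """Weight of the downstream node at local coordinate xi in [0, 1].

    Taken from the exact solution of u * dphi/dx = nu * d2phi/dx2 on a unit
    element: w = (exp(Pe * xi) - 1) / (exp(Pe) - 1). The positive-Pe branch
    is rearranged to avoid overflow for large Peclet numbers.
    """
    if abs(peclet) < 1e-12:
        return xi  # pure diffusion limit: ordinary linear interpolation
    if peclet > 0.0:
        return (math.exp(peclet * (xi - 1.0)) - math.exp(-peclet)) / (-math.expm1(-peclet))
    return math.expm1(peclet * xi) / math.expm1(peclet)


def exp_interpolate(phi0, phi1, xi, peclet):
    """Interpolate between nodal values phi0 (at xi = 0) and phi1 (at xi = 1)."""
    return phi0 + (phi1 - phi0) * exp_interp_weight(xi, peclet)


if __name__ == "__main__":
    # With strong flow from node 0 toward node 1, the midpoint value stays
    # close to the upstream value phi0, which is the up-winding effect.
    for pe in (0.0, 1.0, 10.0, -10.0):
        print(pe, exp_interpolate(0.0, 1.0, 0.5, pe))
```

When the Peclet number is zero the weight reduces to linear interpolation, and as it grows the interpolated value leans increasingly toward the upstream node.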
Link of the Week: Berkeley Play Examines Life of Computing Pioneer Ada Lovelace
Ada Lovelace, the math genius credited with creating the first algorithm for computing, is the subject of a play that just opened at the Berkeley City Club. Lovelace, who was the daughter of English poet Lord Byron, is best known for an article about Charles Babbage’s "Analytical Engine," a theoretical but never-built computer. In the paper, printed in 1843, she presented the first documented computing algorithm (for calculating Bernoulli numbers). She also identified the potential for computers to do far more than calculate. The play, "Ada and the Memory Engine," runs through Nov. 22. A number of other activities around the world are planned to mark the 200th anniversary of Lovelace's birth on Dec. 10, 2015. »Read more.