Humanities and High Performance Computers Connect at NERSC
December 22, 2008
Contact: Linda Vu, 510.495.2402, LVu@lbl.gov
High performance computing and the humanities are finally connecting, with a little matchmaking help from the Department of Energy (DOE) and the National Endowment for the Humanities (NEH). The two organizations have teamed up to create the Humanities High Performance Computing Program, a one-of-a-kind initiative that gives humanities researchers access to some of the world’s most powerful supercomputers.
As part of this special collaboration, the DOE’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory will dedicate a total of one million compute hours on its supercomputers, along with technical training, to humanities experts. The program’s participants were selected through a highly competitive peer review process led by the NEH’s Office of Digital Humanities.
“A connection between the humanities and high performance computing communities had never been formally established until this collaboration between DOE and NEH. The partnership allows us to realize the full potential of supercomputers to help us gain a better understanding of our world and history,” says Katherine Yelick, NERSC Division Director.
The selected projects are currently getting up to speed with NERSC systems and staff.
“Supercomputers have been a vital tool for science, contributing to numerous breakthroughs and discoveries. The Endowment is pleased to partner with DOE to now make these resources and opportunities available to humanities scholars as well, and we look forward to seeing how the same technology can further their work,” says NEH Chairman Bruce Cole.
Three projects have been selected to participate in the program’s inaugural run.
The Perseus Digital Library Project, led by Gregory Crane of Tufts University in Medford, Mass., will use NERSC systems to measure how the meanings of words in Latin and Greek have changed over their lifetimes, and compare classic Greek and Latin texts with literary works written in the past 2,000 years. Team members say the work will be similar to methods currently used to detect plagiarism. The technology will analyze the linguistic structure of classical texts and reveal modern pieces of literature, written or translated into English, which may have been influenced by the classics.
“High performance computing really allows us to ask questions on a scale that we haven’t been able to ask before. We’ll be able to track changes in Greek from the time of Homer to the Middle Ages. We’ll be able to compare the 17th century works of John Milton to those of Vergil, which were written around the turn of the millennium, and try to automatically find those places where Paradise Lost is alluding to the Aeneid, even though one is written in English and the other in Latin,” says David Bamman, a senior researcher in computational linguistics with the Perseus Project.
According to Bamman, the basic methods for creating such a literary analysis tool have existed for some time, but the capability for analyzing such a huge collection of texts couldn’t be fully developed due to a lack of compute power. He notes that the collaboration with DOE and NERSC eliminates that roadblock.
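As a rough illustration of the plagiarism-style comparison described above, the sketch below scores passages by the overlap of their word n-grams ("shingles"), one common text-reuse technique. It is not the Perseus team's actual method. The function names and the sample passages are hypothetical, and the example works only within a single language; finding English allusions to a Latin text would first require translation or cross-language alignment, which this sketch does not attempt.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(source, candidates, n=3):
    """Score each candidate passage against the source, highest overlap first."""
    src = shingles(source, n)
    scored = [(jaccard(src, shingles(text, n)), name)
              for name, text in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical passages, invented purely for illustration.
source = "arms and the man I sing who first from the shores of Troy"
candidates = {
    "echo": "I sing of arms and the man who first from the shores of Troy came to Italy",
    "unrelated": "the quick brown fox jumps over the lazy dog by the river",
}

ranking = rank_candidates(source, candidates)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

Comparing every passage in a large library against every other passage grows quadratically with the size of the collection, which is one reason work of this kind benefits from the compute power Bamman mentions.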
In addition to tracking changes in ancient literature, NERSC computers will also be reconstructing ancient artifacts and architecture with the High Performance Computing for Processing and Analysis of Digitized 3-D Models of Cultural Heritage project, led by David Koller, Assistant Director of the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH) in Charlottesville, Va.
Over the past decade, Koller has traveled to numerous museums and cultural heritage sites around the world, taking 3D scans of historical buildings and objects — recording details down to a quarter of a millimeter.
According to Koller, a 3D scan of the Renaissance statue David, carved by Michelangelo, contains billions of raw data points. Converting this raw data into a finished 3D model is extremely time-consuming and nearly impossible on a desktop computer. A lack of compute power has also limited Koller’s ability to efficiently recreate large historical sites, like Roman ruins in Italy or Colonial Williamsburg in Virginia. He hopes to use the NERSC resources to digitally restore these sites in three-dimensional images for analysis.
Over the years, Koller has also digitally scanned thousands of fragments that chipped off ancient works of art, some dating back to the ancient Greek and Roman empires. Koller hopes to use NERSC computers to put these broken works back together again like a digital 3D jigsaw puzzle.
“The collaboration with NERSC opens a wealth of resources that is unprecedented in the humanities,” says Koller. “For years, science reaped the benefits of using supercomputers to visualize complex concepts like combustion. Humanists, on the other hand, didn’t realize that supercomputers could potentially meet their needs too, until NEH and DOE proposed this collaboration last year.… I am really excited to see what comes out of this partnership.”
In contrast to the other Humanities High Performance Computing projects that will be done at NERSC, the Visualizing Patterns in Databases of Cultural Images and Video project, led by Lev Manovich, Director of the Software Studies Initiative at the University of California, San Diego, is not focused on working with a single data set. Instead, this project hopes to investigate the full potential of cultural analytics using different types of data, including millions of images, paintings, professional photography, graphic design and user-generated photos, as well as tens of thousands of videos, feature films, animation, anime music videos and user-generated videos.
“Digitization of media collections, the development of Web 2.0 and the rapid growth of social media have created unique opportunities to study social and cultural processes in new ways. For the first time in human history, we have access to unprecedented amounts of data about people’s cultural behavior and preferences as well as cultural assets in digital form,” says Manovich.
For approximately three years, Manovich has been developing a broad framework for this research that he calls Cultural Analytics. The framework uses interactive visualization, data mining, and statistical data analysis for research, teaching and presentation of cultural artifacts, processes and flows. Manovich’s lab is focusing on analysis and visualization of large sets of visual and spatial media: art, photography, video, cinema, computer games, space design, architecture, graphic and web design, and product design. Another focus is on using the wealth of cultural information available on the web to construct detailed interactive spatio-temporal maps of contemporary global cultural patterns.
“I am very excited about this award to use NERSC resources. This opportunity allows us to undertake quantitative analysis of massive amounts of visual data,” says Manovich. “We plan to process all images and video selected for our study using a number of algorithms to extract image features and structure; then we will use a variety of statistical techniques (including multivariate statistics methods such as factor analysis, cluster analysis, and multidimensional scaling) to analyze this new metadata; finally, we will use the results of our statistical analysis and the original data sets to produce a number of highly detailed visualizations to reveal the new patterns in our data.”
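The first two stages of the pipeline Manovich describes, extracting features from each image and then analyzing that metadata statistically, can be sketched in miniature as follows. This is a minimal sketch under stated assumptions, not the project's software. The two features (mean brightness and contrast), the tiny synthetic "images," and all function names are hypothetical, and plain k-means stands in for the cluster analysis step. Factor analysis, multidimensional scaling, and the visualization stage are omitted.

```python
from statistics import mean, pstdev


def extract_features(image):
    """Reduce a grayscale image (rows of 0-255 ints) to (mean brightness, contrast)."""
    pixels = [p for row in image for p in row]
    return (mean(pixels), pstdev(pixels))


def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start: the first k points are the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels


# Hypothetical 2x2 grayscale "images": two flat and dark, two high-contrast.
images = [
    [[10, 12], [11, 13]],    # flat
    [[250, 40], [60, 255]],  # high contrast
    [[20, 22], [21, 19]],    # flat
    [[240, 30], [70, 245]],  # high contrast
]

features = standardize([extract_features(img) for img in images])
labels = kmeans(features, k=2)
print(labels)
```

In a real study each image would yield many more features, and the collection would hold millions of images rather than four. That scale is where supercomputing resources become relevant.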
About Computing Sciences at Berkeley Lab
The Computing Sciences Area at Lawrence Berkeley National Laboratory (Berkeley Lab) provides the computing and networking resources and expertise critical to advancing Department of Energy Office of Science (DOE-SC) research missions: developing new energy sources, improving energy efficiency, developing new materials, and increasing our understanding of ourselves, our world, and our universe. ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities. NERSC and ESnet are both Department of Energy Office of Science National User Facilities. The Computational Research Division (CRD) conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation.
Berkeley Lab addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab has had its scientific expertise recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science. The DOE Office of Science is the United States' single largest supporter of basic research in the physical sciences and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.