Continuing Arecibo’s Legacy
The Arecibo Observatory, UCF, TACC, the University of Puerto Rico, EPOC, Globus, CICoE Pilot partner to move telescope data to Ranch system
April 21, 2021
The original version of this news release was written and published by the Texas Advanced Computing Center.
Millions of people have seen footage of the collapse in December 2020 of the famed Arecibo radio telescope. What they would not have seen from those videos was Arecibo’s data center, located outside the danger zone. It stores the “golden copy” of the telescope’s data – the original tapes, hard drives, and disk drives of sky scans since the 1960s.
Now a new partnership will make sure that about three petabytes (3,000 terabytes) of telescope data is securely backed up off-site and made accessible to astronomers around the world who will be able to use it to continue Arecibo Observatory’s legacy of discovery and innovation.
Within weeks of Arecibo’s collapse, the Texas Advanced Computing Center (TACC) entered into a partnership with the University of Central Florida (UCF), the Engagement and Performance Operations Center (EPOC), the Arecibo Observatory, the Cyberinfrastructure Center of Excellence Pilot (CICoE Pilot), and Globus at the University of Chicago. Together, they’re moving the Arecibo radio telescope data to TACC's Ranch, a long-term data mass storage system. Plans include expanding access to more than 50 years of astronomy data from the Arecibo Observatory, which up until 2016 had been the world’s largest radio telescope.
“Arecibo data has led to hundreds of discoveries over the last 50 years,” said Francisco Cordova, Director of the Arecibo Observatory. “Preserving it and, most importantly, making it available to researchers and students worldwide will undoubtedly help continue the legacy of the facility for decades to come. With advanced machine learning and artificial intelligence tools available now and in the future, the data provides opportunity for even more discoveries and understanding of recently discovered physical phenomena.”
Since 2018, UCF has led the consortium that manages the observatory, which is funded by the National Science Foundation (NSF). EPOC, a collaboration between Indiana University and the Energy Sciences Network (ESnet), which is funded by the U.S. Department of Energy’s Office of Science, had itself partnered with UCF in profiling its scientific data movement activities a year prior to the collapse.
The data storage is part of the ongoing efforts at Arecibo Observatory to clean up debris from the 305-meter telescope’s 900-ton instrument platform and reopen remaining infrastructure. NSF is supporting a June 2021 workshop that will focus on actionable ways to support Arecibo Observatory’s future and create opportunities for scientific, educational, and cultural activities.
“The collapse of the Arecibo Observatory platform certainly raised a sense of urgency within our team,” said Julio Alvarado, Big Data Program Manager at Arecibo. The Big Data team was already working on a strategic plan for their Data Management and Cloud programs. Those plans had to be prioritized and executed with new urgency and importance. The legacy of the observatory relied on the data stored for the over 1,700 projects dating back to the 1960s.
Alvarado’s team reached out to UCF’s Office of Research for help, which connected Arecibo to two NSF-funded cyberinfrastructure projects: EPOC, led by Principal Investigators Jennifer Schopf and Dave Jent from Indiana University and Jason Zurawski from ESnet; and the CICoE Pilot, led by Ewa Deelman of the University of Southern California.
“We got involved when the University of Central Florida noted there were challenges in trying to identify a new data center off of the island and were struggling with the demands of efficiently moving that data,” said Zurawski, Science Engagement Engineer of ESnet and Co-PI of the EPOC project.
Migrating the entire Arecibo data set – well over a petabyte in size – would take many months or even years if done inefficiently, but could take only weeks with proper hardware, software, and configurations, added Hans Addleman, the Principal Network Systems Engineer for EPOC. The EPOC team provided the infrastructure skills and resources that helped Arecibo design their data transfer framework using the latest research tools and expertise. The CICoE Pilot team helped Arecibo evaluate their data storage solutions and design their future data management and stewardship experience to make Arecibo’s data accessible to the scientific community.
“Through EPOC, ESnet is helping to support the data management and storage needs of the Arecibo Observatory as it goes through this challenging transition,” said Inder Monga, Director of ESnet. “Their archive holds over one petabyte of data in hard drives and over two petabytes of data in tapes. With tools like perfSONAR, we are enabling them to maintain and preserve these precious resources for scientists worldwide.”
The Data-Transfer Process
As a result of Arecibo’s limited Internet connectivity, the University of Puerto Rico and Engine-4, a non-profit co-working space and laboratory, are contributing to the data-transfer process by allowing Arecibo to share their Internet infrastructure. Further, the irreplaceable nature of the data required a solution that would guarantee data integrity while maximizing transfer speed. This motivated the use of Globus, a platform for research data management developed and operated by the University of Chicago.
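One common way to check that a copy is intact is to compare cryptographic checksums of the source and destination files. The sketch below is a generic illustration of that idea in Python, not the Globus implementation; the function names `sha256_of` and `copies_match` are hypothetical and chosen only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so very large files never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: Path, destination: Path) -> bool:
    """Hypothetical helper: True if both files have identical SHA-256 digests."""
    return sha256_of(source) == sha256_of(destination)
```

A file whose digests differ after transfer would be flagged and copied again rather than trusted.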
The data transfer started in mid-January 2021. Arecibo’s data landscape consists of three main sources: data on hard drives, data in the tape library, and data stored offsite. The archive holds over one petabyte of data on hard drives and over two petabytes of data on tapes. This data includes information from thousands of observing sessions, equivalent to watching 120 years of HD video.
Currently, data is being transferred from Arecibo hard drives to TACC’s Ranch system, recently upgraded to expand its storage capabilities to an exabyte, or 1,000 petabytes. Ranch upgrades combine a DDN SFA14K DCR block storage system with a Quantum Scalar i6000 tape library. Over 52,000 users archive their data from all facets of science, from the subatomic to the cosmic. Ranch is an allocated resource of the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF). Further phases will copy the Arecibo tape library to hard drives and then to TACC, and a later phase will copy data from offsite locations to TACC, Alvarado noted.
To guarantee continuity for the scientific community, Arecibo’s data is being copied to storage devices, which are then delivered to the University of Puerto Rico at Mayaguez and the Engine-4 facilities for upload. This ensures that the research community can continue to access the existing data and conduct research with it. This data migration is executed in coordination with Arecibo’s IT department, led by Arun Venkataraman.
Given time constraints and limitations in the networking infrastructure connecting the observatory, speed, security, and reliability were key to effectively moving the data, which were collected from Arecibo’s 1,000 foot (305 meter) fixed spherical radio/radar telescope.
The Globus service addressed these needs, while also providing a means to monitor the transfers and automatically recover from any transient errors. This was necessary to minimize the chance of losing or corrupting the valuable data collected by the telescope in its 50+ years of service. The Globus service enabled the UCF and ESnet teams to securely and reliably move 12 terabytes of data per day.
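To put the reported rate in perspective, the following back-of-the-envelope sketch estimates how long a transfer would take at a steady 12 terabytes per day. It assumes decimal units (1 petabyte = 1,000 terabytes) and an uninterrupted rate, neither of which is stated in the release; the function names are hypothetical.

```python
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1000
SECONDS_PER_DAY = 86_400


def days_to_transfer(size_pb: float, rate_tb_per_day: float) -> float:
    """Days needed to move size_pb petabytes at a sustained daily rate."""
    if rate_tb_per_day <= 0:
        raise ValueError("rate must be positive")
    return size_pb * TB_PER_PB / rate_tb_per_day


def sustained_gbps(rate_tb_per_day: float) -> float:
    """Convert a TB/day rate to the equivalent average gigabits per second."""
    bits_per_day = rate_tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    for label, size_pb in [("hard drives (~1 PB)", 1.0), ("full archive (~3 PB)", 3.0)]:
        print(f"{label}: about {days_to_transfer(size_pb, 12):.0f} days")
    print(f"12 TB/day is about {sustained_gbps(12):.2f} Gbps sustained")
```

Under these assumptions, one petabyte takes roughly 83 days and three petabytes about 250 days, with 12 terabytes per day corresponding to an average of about 1.11 gigabits per second.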
The data travels over the AMPATH Internet exchange point that connects the University of Puerto Rico to Miami. It then uses Internet2 and the LEARN network in Texas to get to TACC in Austin.
From the Past to the Future
Past achievements made with Arecibo include the discovery of the first ever binary pulsar, a find that tested Einstein’s General Theory of Relativity and earned its discoverers a Nobel Prize in 1993; the first radar maps of the Venusian surface and polar ice on Mercury; and the first planet found outside our solar system.
“The data is priceless,” Alvarado emphasized. Arecibo’s data includes a variety of astronomic, atmospheric, and planetary observations dating to the 1960s that can’t be easily duplicated. “While some of the data led to major discoveries over the years, there are reams of data that have yet to be analyzed and could very likely yield more discoveries. Arecibo’s plan is to work with TACC to provide researchers access to the data and the tools necessary to easily retrieve data to continue the science mission at Arecibo,” he said.
The Arecibo IT and Big Data teams are in charge of the data during the migration phases of the project, during which public access is not available. As the migration and data management efforts progress, the data will be made available to the research community.
Arecibo, TACC, EPOC, and CICoE will continue to work on building tools, processes, and frameworks to support continuous access to, and analysis of, the data by the research community. The data will be stored at TACC temporarily, supporting Arecibo’s goal of providing open access to the data. Arecibo will continue to work with the groups on the design and development of a permanent storage solution.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery, and researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.