Univ. of Puerto Rico Team Applies Deep Learning to Structural Biology
Summer Students Testing New Neural Network Approach to Improve Protein Structure Prediction
July 19, 2018
It’s all about opportunity, according to Dan Rosa de Jesús and Julián Cuevas, two students from the University of Puerto Rico Mayagüez (UPRM) who are spending their summer in Computing Sciences at Lawrence Berkeley National Laboratory (Berkeley Lab) learning everything they can about deep learning and how it can improve protein structure prediction, a grand challenge in structural biology.
The two came to the lab in June as part of the Sustainable Research Pathways (SRP) internship program, which runs through early August and attracts about 25 college students and faculty each year. Rosa de Jesús, a computer engineering graduate student specializing in high performance computing (HPC) and smart power optimization, is heading to the University of Texas at El Paso in the fall to begin his PhD; Cuevas is a software engineering student going into his senior year with a growing passion for both HPC and deep learning. They are joined by faculty advisor Wilson Rivera, PhD, a professor of computer science and engineering at UPRM who is participating in the DOE-sponsored Visiting Faculty Program and helped arrange their internships through the SRP program. SRP is partly supported by DOE's Advanced Scientific Computing Research program.
This is the first time UPRM has sent students to the lab as part of the SRP program, according to Rivera – but based on his experiences over the last several months, he said, it won’t be the last. After Rivera attended an SRP workshop in December 2017 designed to connect faculty from colleges all over the U.S. with scientists at the lab who serve as mentors for the summer students, his poster presentation caught the eye of Silvia Crivelli, a computational biologist in the Computational Research Division (CRD) whose research focus is protein folding. Together with John Wu, who leads the Scientific Data Management Group at Berkeley Lab, they developed a research proposal that was funded by the Visiting Faculty Program at Berkeley Lab.
Improving Interpretability Models
The UPRM team is now working with Crivelli and other CRD researchers – including Rafael Zamora-Resendiz, an SRP participant and Visiting Faculty Program student intern in 2017 who specializes in machine learning – to develop concepts and techniques designed to improve the application of deep neural networks (DNNs) in structural biology. Zamora-Resendiz is now a research staff member at CRD.
“We are looking at how DNNs are applied to the problem of protein structure prediction,” Cuevas said. So far, the problem has been handled mainly with convolutional neural networks (CNNs). “We are trying to see if we can implement this problem of protein structure classification and prediction with capsule networks (CapsNet) – a new type of neural network – and see if it provides some improvement in accuracy compared to CNNs,” Cuevas explained.
This work will have a transformative effect on the application of deep learning to structural biology, especially with regard to interpretability models that will provide researchers with information that will allow them to better understand the “thinking” process of computers when predicting the structure of proteins, Rosa de Jesús noted. Protein structure prediction is challenging for biologists because “we don’t yet know what is important for the prediction of these structures,” Cuevas said. “We are hoping the neural networks will help us better understand that model.”
Their project also represents an effort between scientists at Berkeley Lab and UPRM to train science, technology, engineering and mathematics professionals and motivate other faculty and students at the university to explore emerging areas in computing science, such as deep learning.
“After this experience, we are in a position to have a group of students at UPRM who will continue the work on this project,” Rivera said. “We are going to take what we have been working on here at the lab this summer and continue it back at UPRM.”
Paying it Forward
Cuevas said he is excited about working with his fellow students at UPRM to teach them everything he has learned so far about deep learning – a cutting-edge subject that hasn’t yet received a lot of attention at the university.
“This is exactly what we expect to happen through this program,” said Crivelli, who is mentoring a total of 15 students and faculty this year. “We are reaching out to these three people (Cuevas, Rosa de Jesús and Rivera), but beyond what they are doing here we hope they will bring the excitement of what they learn here and communicate it to other students when they go back.”
She is proud of what Rosa de Jesús and Cuevas have achieved so far, noting that “they came here to work on protein structure prediction without knowing anything about it and that is very brave, being willing to work outside your comfort zone.”
The lab’s collaborative approach is key to the SRP program’s success, Crivelli emphasized. “We are trying to be very multi-disciplinary, trying for the students to learn beyond what they learn in the classroom because students in computer science need to learn about math and science too,” she said. “The problems we have to solve today are very, very hard and we cannot solve them alone, we need to collaborate and to collaborate effectively we need to become computational scientists. This is the number one lesson for my summer students: there are different people here with different expertise and we need to learn how to bridge the gaps between disciplines to talk to each other and help each other.”
Cuevas agreed. “This experience has been really good – the space, the people, everyone is very positive and welcoming and open,” he said. “I got really lucky meeting Wilson and Dan and working alongside them. I’ve learned so much, and I have to push myself to stay at their pace and work as efficiently as I can. I would recommend any student go out and try to get these kinds of opportunities, get the work done, put in the time.”
Rosa de Jesús – who, just for fun, is helping some of his non-computer science colleagues in the SRP program learn Python by sending them coding challenges every morning – seconded that sentiment, quietly pointing to additional challenges he and many others at UPRM faced in the past year. “Any student should go toward where the opportunities are,” he said. “In Puerto Rico we had some really bad times last year with the hurricanes. But what can I say – we didn’t give up and we kept going on and we finished our semesters, and now – we are here.”
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 7,000-plus scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are Department of Energy Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.