Sean Peisert, senior scientist at Berkeley Lab (Credit: Berkeley Lab)

As global cyberthreats grow in number and sophistication, protecting critical systems and sensitive information from digital attacks can overwhelm even the most sophisticated cybersecurity professionals. This is equally true in scientific computing environments, especially those with user-facing front ends exposed to the Internet and its ever-present vulnerabilities and attacks.

Recently, Lawrence Berkeley National Laboratory (Berkeley Lab) Scientific Data Division Senior Scientist Sean Peisert, along with colleagues at other institutions who are also part of Trusted CI, the National Science Foundation (NSF) Cybersecurity Center of Excellence, co-authored the Guide to Securing Scientific Software. The document is designed to help those developing software better understand and address critical gaps in security, while also informing policymakers responsible for committing resources to improving the state of scientific software security.

The guide is the second in an ongoing series of reports stemming from a multi-year program focused on topics of importance to securing scientific computing environments. The first year focused on increasing the reliability of data for open science, the second year was dedicated to software assurance, and the third year will be devoted to the security of “operational technology” (OT) or “cyber-physical systems” (CPS) in science: networked systems connected to computing systems on one side and to the controls or sensors of physical systems on the other. The “solutions roadmap” will be released in December 2022.

We asked Peisert to tell us more about the most pressing issues in cybersecurity for scientific software, the biggest threats and vulnerabilities, and how security can be improved.

Who is this guide written for? Who would benefit from reading it?

Let me state from the outset that this work is a product of Trusted CI, was funded by the NSF, and was a collective effort that included contributions from Andrew Adams at the Pittsburgh Supercomputing Center, Kay Avila at the National Center for Supercomputing Applications, Elisa Heymann and Barton Miller at the University of Wisconsin, Mark Krenz at Indiana University, and Jason R. Lee from the National Energy Research Scientific Computing Center, plus contributions from all of the software teams that took the time to speak with us.

The guide is written for anyone who is developing software for science, from experts in computer science to experts in a particular scientific domain. Our hope was that by being as inclusive as possible about our audience, we would maximize the number of people who find and use the guide; many of those people may not otherwise gravitate toward existing resources, which tend to be aimed at developers from traditional software engineering backgrounds.

How do you secure software in a scientific computing environment? Does cybersecurity for scientific software differ from conventional cybersecurity measures?

Scientific software spans the gamut from software that is extremely widely deployed and used – think Jupyter notebooks – to software that is highly bespoke and used to control a single telescope somewhere in the world. That software may have been written decades ago, long before our computer networks looked anything like they do today. Commercial industry generally understands the issues that arise around legacy software, but in science we often overlook those issues or decide they are not important to address given other priorities for available resources.

Moreover, developing large software systems is one of the most complicated things that humans do. To do it well requires immense amounts of expertise and a recognition of that complexity and requisite expertise by the people who want that software to be developed, such as funding agencies and employers. It also takes significant resources: trained experts, lots of them, and way more time than one thinks it will take.

Scientific software can often get the short end of the stick. Salaries in academic scientific computing tend to be significantly lower than those in industry, meaning that scientific domain experts, rather than trained computer scientists, are often developing the software. As a result, although we see “for dummies” books suggesting that anyone can program, it can be as much of a challenge for a biologist to program well as it is for a computer scientist to use CRISPR-Cas9 to edit genomes well.

So, the backgrounds of people who develop scientific software can often be different from those of software developers in commercial industry, and overall funding for software can be comparatively low as well. As a result, some of the common practices used in commercial industry are not always used in scientific software development: maintaining and following documented policies and procedures, vetting new code before it is merged into a codebase, and using static analysis, dynamic analysis, and runtime testing tools.
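Here is a minimal, hypothetical sketch of that pre-merge vetting and runtime-testing practice, written in Python since so much scientific software is; the function and its tests are invented for illustration and are not taken from the guide. A continuous-integration pipeline could require tests like these to pass before new code is merged.

```python
# A minimal, hypothetical example of the testing practice described
# above: parse_temperature() and its tests are invented for
# illustration, not taken from any real project or from the guide.
import pytest


def parse_temperature(value: str) -> float:
    """Parse an externally supplied sensor reading, rejecting junk
    instead of silently propagating it."""
    reading = float(value)  # raises ValueError on non-numeric input
    if not -273.15 <= reading <= 1.0e4:
        raise ValueError(f"implausible temperature: {reading}")
    return reading


def test_accepts_valid_input():
    assert parse_temperature("21.5") == 21.5


def test_rejects_non_numeric_input():
    with pytest.raises(ValueError):
        parse_temperature("21.5; rm -rf /")


def test_rejects_out_of_range_input():
    with pytest.raises(ValueError):
        parse_temperature("-400")
```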

What are the most critical threats to scientific computing environments?

We talked about this a lot in developing our guide. Threats basically break down into four themes: exploiting humans, exploiting software, exploiting protocols, and insecure design. Exploiting humans includes things like phishing attacks. Exploiting software includes taking advantage of bugs, such as failing to check inputs, which, if exploited, can let an attacker overwrite memory and redirect a program's flow of control. Exploiting protocols includes things like replay attacks and brute-force password cracking. To be sure, many of these threats mirror those in commercial industry, but in science we often don’t take the time to address the specific risks and consequences the way the commercial software industry typically does.
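To make the “exploiting software” theme concrete, here is a minimal, hypothetical Python sketch, not taken from the guide. The memory-overwrite attacks described above are specific to languages like C; in a memory-safe language such as Python, the analogous unchecked-input flaw is letting raw input reach a powerful interface such as the shell. The function and dataset names below are invented.

```python
# A hypothetical sketch of an unchecked-input bug and a safer version.
import re
import subprocess


def extract_dataset_unsafe(name: str) -> None:
    # BAD: the name is interpolated into a shell command, so input
    # like "data; rm -rf ~" runs arbitrary commands.
    subprocess.run(f"tar -xzf {name}.tar.gz", shell=True, check=True)


def extract_dataset_safer(name: str) -> None:
    # Validate the input against an allowlist pattern first...
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", name):
        raise ValueError(f"invalid dataset name: {name!r}")
    # ...and avoid the shell entirely by passing an argument list,
    # so the name is never parsed as shell syntax.
    subprocess.run(["tar", "-xzf", f"{name}.tar.gz"], check=True)
```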

What are some of the gaps and vulnerabilities software developers need to be aware of?

The Open Web Application Security Project (OWASP) “Top 10” is a great list that everyone should dig into. But generally, whenever you build a system, any mistake that results in a bug has at least a chance of being an exploitable vulnerability. So, any time you accept external input, such as from a user at a keyboard or over a network, that input needs to be validated, or it may end up exploiting a vulnerability in the software. Any time you receive a pull request, ensure it is properly vetted rather than blindly accepted. Any time you use third-party programs or software libraries, be suspicious about whether they could contain accidental or even maliciously implanted vulnerabilities, and consider the scope of access and privileges that software will have. Any time you handle sensitive information, like passwords or cryptographic keys, be really sure that you’ve used and integrated the cryptography correctly. Failing to do these things can create an exploitable vulnerability, leading to any number of possible threats, including SQL injection attacks against databases, buffer overflows, cross-site scripting attacks against web applications, replay attacks, and more.
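As a concrete illustration of the SQL injection risk and the input validation advice above, here is a minimal, hypothetical sketch using Python’s built-in sqlite3 module; the table, data, and function names are invented for the example.

```python
# A hypothetical sketch of the SQL injection pattern named above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'alice')")


def ids_for_owner_unsafe(owner: str):
    # BAD: input is spliced into the query text, so an owner string
    # like "x' OR '1'='1" matches every row instead of none.
    query = f"SELECT id FROM samples WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()


def ids_for_owner_safer(owner: str):
    # GOOD: a parameterized query treats the input strictly as data.
    return conn.execute(
        "SELECT id FROM samples WHERE owner = ?", (owner,)
    ).fetchall()


print(ids_for_owner_unsafe("x' OR '1'='1"))  # [(1,)] -- leaks all rows
print(ids_for_owner_safer("x' OR '1'='1"))   # [] -- no match
```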

There is often something in the news about a data breach or a cyberattack. How has cybersecurity changed to address the current threat environment?

With a few exceptions in the largest, most well-funded commercial technology companies, I am not sure that it has changed significantly. This is a pretty big problem because our reliance on software is not going to be decreasing over the coming years.

What best practices do you recommend for scientific software developers?

As best practice number one, I recommend reading our guide! Also, get help – in scientific software development, we are often siloed in our individual projects, but don’t forget that your lab or university may also have helpful resources, as might professional societies and other communities of practice in your scientific domain.

More specifically, for managers and funding agencies, my advice is to fund your software appropriately. Software is a profession and an engineering discipline. Funding software appropriately begins well before implementation, with design, and extends well after initial development, with operation, maintenance, updates, patching, and so on. Software assurance refers to confidence that software meets its requirements for functional and security correctness. The entire software lifecycle – from design to development to operation – requires consideration of assurance.

For scientific software developers, my advice is to ask your management for the resources you need. I also recommend taking training on secure software development to help understand the processes, techniques, and tools that can help improve software assurance. Trusted CI also provides its own training, and we give some pointers about training in our guide.

Is there any good news in the cybersecurity world?

Yes! For one thing, as one of my Berkeley Lab colleagues, Jason Lowe-Power, has indicated, this seems to be a “golden age” for computer architecture. I think we’re going to see some really interesting progress in secure computing architectures in the coming years. Also, we’re finally starting to see some movement away from “unsafe” programming languages for systems development, such as C and C++, and instead using languages like Rust. Finally, I’m extremely heartened to see more widespread use of rigorous formal methods in industry as well. Amazon Web Services, Facebook, Google, and Microsoft are particularly noteworthy with their use of formal verification for critical aspects of their processes and systems. I am hoping that this use of formal methods continues to expand within those organizations and beyond.

What impact does this have on the broader scientific and DOE community?

A significant amount of scientific discovery today relies on software. That software is often developed and then later shared or deployed as a service. The people who use or deploy that software, and those whose scientific research is conducted using it, need to be aware of potential security concerns and risks. As we noted a few years ago when developing the Open Science Cyber Risk Profile (OSCRP), “bad things can happen to good science.” Just look at all the organizations that have paid huge ransomware sums over the past few years, or consider the telescope that was attacked at the very time it was capturing signs of a neutron star merger.

Funding allocated to security — including software security — can be perceived as taking away funding for the actual “science” being performed. Most scientists and facilities feel an imperative and urgency to produce scientific results, and security can be seen as a distraction at best and an impediment at worst. However, planning ahead for security of software and systems in a scientific project is critical, and, when done correctly and incorporated from the beginning, can actually save a project time and money by preventing security compromises.

In developing software, scientists may feel like they’re developing something for themselves, but suddenly that software gains importance, becoming widely used and perhaps running for decades. I think we can do a better job in science of thinking ahead and envisioning how software might ultimately be used. Moreover, security done well can be an enabling technique. Even in the absence of catastrophic cyberattacks, it has been demonstrated that rigorous approaches to secure software development can mean fewer bugs to patch, which leads to greater trust in the science and in the facilities that run the software enabling it.

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab's Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.