As demand grows for the scientific community to address a wide range of urgent problems – including climate change, emerging infectious diseases, food insecurity, and economic and social disparities – cross-disciplinary solutions are needed, along with standards of practice that bring greater control and reproducibility to the scientific process.

Computers and automation technologies have long played a central role in research workflows, where they are used to acquire, process, and analyze data. Automated research workflows (ARWs) integrate computing, laboratory automation, and tools from artificial intelligence (AI) to speed the pace of scientific research, but a recent report finds that the research community needs to put more work into developing standards and practices that ensure data are findable, accessible, interoperable, and reusable (FAIR). The report, released by the National Academies of Sciences, Engineering, and Medicine (NASEM), is intended to contribute to a next step in the transformative application of computing to scientific discovery.

Shreyas Cholia, group lead for the Usable Software Systems Group in the Scientific Data Division, served on the NASEM committee that released the report, “Automated Research Workflows for Accelerated Discovery: Closing the Knowledge Discovery Loop,” in June 2022. The report examines efforts to develop advanced and automated workflows to accelerate research – including broader use of artificial intelligence – and identifies research needs and priorities in this area.

Identifying Barriers to Progress

Machine learning and ARWs can propel research and scientific discovery at an accelerated pace. The report's findings address both the use of AI and machine learning as components of a workflow and the need for standardized practices that would make the technology and its outputs broadly accessible and FAIR.

ARWs are cross-disciplinary scientific research processes that integrate computing, laboratory automation, and AI tools to perform research tasks, such as designing experiments, making observations, and creating simulations; collecting and analyzing data; and learning from the results to inform further research. Researchers implement ARWs to accelerate the generation of scientific knowledge, potentially by orders of magnitude, with greater control and reproducibility in the scientific process. Although impressive strides are being made to apply ARWs in a variety of fields, there are significant barriers to progress, including funding, training, and shifts in culture.

“Concerns about the role of humans in the discovery loop, privacy of data, and the impact on current incentive systems need to be addressed,” said Cholia. “The idea is not just to have the steps and processes in place to solve a specific problem but to solve bigger problems that require iteration along the outer loop of knowledge discovery. We want to take scientific workflows into that more iterative space where you have discovery going on.”

Accelerating Science with Machine Learning and ARWs

The tools and techniques researchers are developing under the large umbrella of ARWs promise to transform the centuries-old serial method of research investigation, the report asserts. Instead of one stream of sequential inquiry, thousands or even millions of simulations or experiments could be iterated rapidly in closed loops, with the analysis of data and even the design of experiments or controlled observations assisted by machine learning or optimization techniques. Realizing the potential of ARWs could significantly speed the pace of scientific discovery and expand the scientific community’s contribution to society, the report says.
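To make the idea of a closed loop concrete, here is a minimal Python sketch. It is an illustration only, not a method from the report: `run_experiment` is a hypothetical stand-in for an automated experiment or simulation, its made-up "yield" curve peaks at a temperature of 70.0, and a simple hill-climbing rule takes the place of the machine learning or optimization component.

```python
from typing import Callable, List, Tuple


def run_experiment(temperature: float) -> float:
    """Hypothetical stand-in for an automated experiment or simulation.

    Returns a made-up 'yield' that peaks at temperature = 70.0.
    """
    return 100.0 - (temperature - 70.0) ** 2


def closed_loop(
    experiment: Callable[[float], float],
    start: float,
    step: float,
    min_step: float = 0.01,
    max_rounds: int = 1000,
) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Repeat design -> run -> analyze -> learn until the step size is small."""
    best_x = start
    best_y = experiment(best_x)
    history = [(best_x, best_y)]
    rounds = 0
    while step >= min_step and rounds < max_rounds:
        rounds += 1
        # Design: propose one candidate on each side of the current best.
        candidates = [best_x - step, best_x + step]
        # Run: execute the proposed experiments and record the results.
        results = [(x, experiment(x)) for x in candidates]
        history.extend(results)
        # Analyze: find the best result of this round.
        x, y = max(results, key=lambda r: r[1])
        # Learn: move to the better setting, or search more finely.
        if y > best_y:
            best_x, best_y = x, y
        else:
            step /= 2.0
    return best_x, best_y, history


if __name__ == "__main__":
    setting, value, log = closed_loop(run_experiment, start=20.0, step=16.0)
    print(f"best setting: {setting}, result: {value}, experiments run: {len(log)}")
```

Each pass through the loop uses the results gathered so far to decide which experiment to run next, which is the feature that distinguishes a closed loop from a fixed, pre-planned sequence of experiments.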

For example, research groups in materials science have built systems using a combination of laboratory automation and machine learning, cutting the time for the synthesis and testing of materials from nine months to five days. In drug discovery, an active learning algorithm was able to identify 57 percent of the active compounds in a group of molecules, compared with 20 percent identified through a more traditional approach. Researchers in the social and behavioral sciences are using new data resources and advanced analytics to better understand and address a variety of pressing problems, including poverty alleviation and strengthening the delivery of public services in cities.
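The intuition behind the drug discovery result can be shown with a toy Python sketch. Everything in it is an assumption made for illustration: the library is 100 numbered compounds, similar compounds are assumed to have nearby numbers, `is_active` is a hypothetical assay in which compounds 40 through 49 are active, and a greedy nearest-neighbor rule is a much-simplified stand-in for a real active learning algorithm. It does not reproduce the study's method or its figures.

```python
from typing import Callable, List, Set


def is_active(compound: int) -> bool:
    """Hypothetical assay: compounds 40..49 are 'active' in this toy library."""
    return 40 <= compound < 50


def screen_fixed_grid(
    library: List[int], assay: Callable[[int], bool], budget: int
) -> Set[int]:
    """Baseline: assay evenly spaced compounds, ignoring earlier results."""
    stride = max(1, len(library) // budget)
    chosen = library[::stride][:budget]
    return {c for c in chosen if assay(c)}


def screen_adaptively(
    library: List[int],
    assay: Callable[[int], bool],
    budget: int,
    explore_stride: int = 10,
) -> Set[int]:
    """Choose each assay using the results of the assays already run."""
    tested: Set[int] = set()
    actives: Set[int] = set()
    explore_queue = library[::explore_stride]
    for _ in range(budget):
        untested = [c for c in library if c not in tested]
        if not untested:
            break
        if actives:
            # Exploit: pick the untested compound nearest to a known active
            # (ties go to the lower-numbered compound).
            pick = min(
                untested,
                key=lambda c: (min(abs(c - a) for a in actives), c),
            )
        else:
            # Explore: take the next coarse-grid compound not yet tested.
            remaining = [c for c in explore_queue if c not in tested]
            pick = remaining[0] if remaining else untested[0]
        tested.add(pick)
        if assay(pick):
            actives.add(pick)
    return actives


if __name__ == "__main__":
    library = list(range(100))
    fixed = screen_fixed_grid(library, is_active, budget=20)
    adaptive = screen_adaptively(library, is_active, budget=20)
    print(f"fixed grid found {len(fixed)} actives; adaptive found {len(adaptive)}")
```

With the same budget of 20 assays, the adaptive strategy concentrates its tests around the first active compound it finds, while the fixed grid spends most of its budget in regions that earlier results already suggest are unpromising.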

Given the evolving nature of this topic, the report states that its findings and recommendations are purposefully broad and “future-oriented.” Because fully realized ARWs are not yet common in experiments, the report examines how and where progress is currently happening in areas such as advanced computation, workflow management systems, laboratory automation, and AI. The committee is hopeful that the report will “stimulate further discussion, transformations, investments, and meaningful use.”

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab's Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.