Nearly a decade ago, an interdisciplinary team of scientists set out to tackle a particularly vexing challenge in the complex, sprawling world of neurophysiology: develop a data standard and software ecosystem so researchers studying the brain and its functions could more easily and efficiently share information, reanalyze data, and ultimately advance scientific discovery.
The resulting project, Neurodata Without Borders (NWB), sought to create a common language for the neuroscience community. And while the effort – initiated by the nonprofit Kavli Foundation in 2014 and since led by a team at Lawrence Berkeley National Laboratory (Berkeley Lab) – required technical know-how, its success also depended on a cultural shift within the neuroscience community.
“These kinds of efforts are as much a social engineering project as they are a technical engineering project,” said Oliver Rübel, a staff scientist at Berkeley Lab in the Computational Biosciences Group of the Scientific Data (SciData) Division and the NWB principal investigator.
That’s why much of the focus has been on training users and making the ecosystem accessible to a broad audience – in large part via tutorials and workshops, including a high-profile hackathon this month and several other collaborative events this summer and in the coming year.
“They’re super important,” said Benjamin Dichter, the founder of CatalystNeuro – a software consulting company that specializes in data engineering for neuroscience labs – and a longtime NWB team member. “It allows us to get the needed feedback to improve NWB and create a community of neuroscientists that are interested in open science.”
Adoption spreading
The focus on educating the neuroscience community on NWB over the years has already paid dividends. Today – only four years after the second iteration of NWB was released – more than 170 data sets in NWB format are on the DANDI (Distributed Archives for Neurophysiology Data Integration) Archive, a data repository to support collaborative data sharing and analysis.
“More and more of the neuroscience community has heard about NWB, and both the community and sponsors are placing increased emphasis on sharing data,” said Ryan Ly, a scientific data engineer at Berkeley Lab and technical lead for the core NWB software.
“Scientists are starting to reuse publicly shared NWB data in their own work to discover new scientific insights or develop new analysis methods,” he continued. “It is really rewarding to see so much of the community use the standards and software that we have built, and now benefit from the fact that so much public data is in the NWB standard.”
The burgeoning library of public data is a far cry from what the neuroscience landscape looked like before the advent of NWB.
“Data hardly got published,” said Rübel. “And if it got published, it was in a form that was very specific to each individual lab.”
That siloed approach was not only cumbersome and inefficient; it also created a barrier to entry for scientists who lacked the resources and financial backing to conduct costly, time-consuming data gathering and experiments.
The goal for the NWB team, then, was to democratize the data and create a standardized, yet extensible, format across the board, much as MP3 became the universal standard for sharing music files.
“In the neuroscience community, I think NWB has really helped facilitate the shift to open data and open science in the community because now you can publish the data and the person you want to share it with can actually understand and use your data,” Rübel said.
User events help refine NWB tools
To raise awareness of the data ecosystem, promote its use in the neuroscience community, and teach researchers to use the new tools, the NWB team has relied in large part on the hackathons, workshops, and other events it plans and hosts.
In July, for instance, the group hosted its annual NWB User Days to train users in converting their data to NWB and publishing it on the DANDI Archive. That same month, the annual NWB Developer Days brought together neuroscientists, tool builders, and research software engineers to further the development of the NWB software ecosystem, including the data standard, core software packages, and community tools.
The Developer Days event was one of the best experiences he’s had as a career engineer at the lab, said Matthew Avaylon, a machine learning engineer with SciData’s Machine Learning and Analytics Group and a core software developer taking part in community workshops for NWB tools.
“It showed me the work I was doing had impact beyond the confines of my cubicle and the team itself,” said Avaylon.
Then, in August, the team held the Open Neurodata Showcase 2023, a virtual event where data publishers presented posters about their work so that the community could learn about their data and engage in discussions about their projects.
And this month in Granada, Spain, the NWB team organized NeuroDataReHack 2023, a hackathon focused on generating new insights from existing neurophysiology data through secondary analysis. The free, four-day event, now in its second year, was held as a satellite of the IBRO World Congress 2023, the congress put on every four years by the International Brain Research Organization (IBRO) to promote the field of neuroscience, increase awareness, and facilitate collaboration.
Keeping the hackathon free is meant to make it more accessible to diverse participants who might not otherwise have the opportunity to take part. Participants come into the event with the goal of using public data to answer important neuroscience-driven questions, said Rübel. As part of the hackathon, participants were invited to apply for a Kavli Foundation Neurodata Discovery Award, which provides $50,000 in funding per proposal so that three data reanalysis projects that come out of NeuroDataReHack can continue.
“That is special to have that associated with an event – that you are not just coming here to learn something but that there’s actually opportunity to continue that project,” said Rübel, adding that the grants are intended for the people on the ground doing the work, from postdocs to Ph.D. students.
‘A TurboTax tool for converting data’
The success of NWB has begun to open up the data ecosystem to a whole new set of neuroscientists who are less comfortable with programming and lack some of the technical knowledge that early adopters had.
“Now, more and more scientists with less formal training in computer science are attending our events,” Ly said. “We have heard from the latest participants that to reduce the barrier to entry to using NWB across the community, we need to build tools that allow scientists to convert their data to NWB and visualize and analyze their data in NWB with as little coding required as possible.”
In order to bridge that gap, Ly is leading a project with CatalystNeuro, sponsored by the Kavli Foundation, to build a graphical user interface, called NWB GUIDE, that allows scientists to convert their data to NWB without writing any code.
“Think of it as a TurboTax tool for converting data,” Rübel said. “You don’t need to program anymore to convert your data, you can enter everything in your web browser and then it will convert your data into NWB.”
The group is aiming to release the tool in November in time for Neuroscience 2023 – one of the biggest neuroscience conferences in the world – in Washington, D.C., where the NWB team will have a booth together with DANDI.
And next February, the NWB team will have an interactive tutorial at Computational and Systems Neuroscience (COSYNE) 2024 in Portugal. In April, the team will organize the NWB Developer Days 2024, hosted by DataJoint, and in July it will hold NeuroDataReHack 2024 at the Howard Hughes Medical Institute Janelia Research Campus.
As the prevalence of NWB in the neuroscience world continues to grow, the future holds exciting possibilities for researchers.
In the long term, open data enables new types of projects that weren’t possible under the earlier paradigm, said Dichter, the main NWB organizer of NeuroDataReHack 2023.
“I think a second whole new avenue is: How can we get AI to utilize this data and derive scientific insights?” said Dichter, adding that his company is exploring using large language models for the identification of relevant DANDI sets and for the actual analysis. In the future, for example, a researcher could give AI a scientific question; the AI model would identify and analyze data sets, provide results, and then refine the question.
For now, the team is thrilled with the transformative change NWB has already brought to neuroscience.
“I feel deeply connected with the neuroscience community,” said Ly, who has a Ph.D. in neuroscience. “Having trained in it for over seven years, I know firsthand the struggles of managing these increasingly large, complex, multimodal datasets and sharing these datasets with collaborators in an understandable way. So I understand the value of having a data standard and data archive where scientists can access neurophysiology data in a common format, and developers can rely on a stable input and output data format for their tools.”
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab's Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.