Longtime Lab Staff Member and Pioneering Data Scientist Frank Olken Has Died
November 15, 2019
Frank Olken, a data scientist who spent 36 years at Berkeley Lab tackling the challenge of big data before it became Big Data, died Oct. 18 at age 67. He also worked two stints as a National Science Foundation (NSF) program director in Arlington, Va.
Olken joined Berkeley Lab in 1974 after earning his bachelor's degree in Electronics Engineering and Computer Science at UC Berkeley. He spent much of his lab career as a member of the Scientific Data Management Group.
"He was a walking encyclopedia," said Arie Shoshani, leader of the lab's Scientific Data Management Group and Olken's supervisor for much of his time at the lab. "He was extremely knowledgeable about everything -- history, government, economics. I think he took enough classes in economics to get a Ph.D. And he didn't just know the facts, but also the mechanics and processes behind them. Whenever someone had a question we couldn't answer, we just said 'Go ask Frank'."
Olken's research interests were in the areas of database management, semantic web, bioinformatics, computational biology and metadata registry standards development. In particular, he was interested in research on sampling from databases and earned his Ph.D. in computer science from UC Berkeley with his thesis on "Random Sampling from Databases."
Doron Rotem was a professor of computer science at the University of Waterloo when he came to the lab on sabbatical in 1983. Since he had experience working with students, he was asked to help Olken with the research that would lead to his Ph.D. At the time, researchers used databases to answer very specific questions, Rotem said.
"What Frank did was create a method that could get random samples from databases to answer queries directly from the database. Previously, this took two or three steps of moving the data out of the database for analysis then running it through statistical software," Rotem said. "His approach was 10 times more efficient. At the time, we were looking at scientific databases that were very large and beyond the capabilities of commercial applications."
Shoshani said that to do this, Olken needed in-depth knowledge of statistics so he could build statistical methods into the database itself. "He was one of the early people to do this and his methods were later used by commercial applications," Shoshani said.
Rotem later joined the lab and worked with Olken for more than 20 years. In all, Olken wrote or co-authored 39 technical papers. "He was very easy to work with, just a nice guy," Rotem said. "He was extremely knowledgeable and a great writer. He wrote very quickly and could easily translate my ideas into text."
Shoshani cited the 1981 paper on "A compression technique for large statistical databases" by Susan Eggers, Olken and himself, which was spawned by a project at the lab doing statistical analysis of Department of Labor data. The way the data was organized and stored made it very unwieldy and hard to analyze. The paper explored "the compression of large statistical databases and proposed techniques for organizing the compressed data, such that the time required to access the data is logarithmic. The techniques exploit special characteristics of statistical databases, namely variation in the space required for the natural encoding of integer attributes, a prevalence of a few repeating values or constants, and the clustering of both data of the same length and constants in long, separate series." The work led the team to coin the term "statistical databases."
Although his research helped speed up data searches and he was known as a fast writer, Olken's quest to finish his Ph.D. thesis was cited by several colleagues as part of his legacy.
"The story of Frank's Ph.D. was indeed epic and I played my part," recalled Frederic Gey, an information scientist at UC Berkeley who left the lab to work on campus in 1989 and pursue his own Ph.D. "Frank was the perennial graduate student who would always finish 'next year'. When he learned in 1992 that I expected to file my dissertation in spring 1993 (at age 52) he suddenly got very competitive and buckled down to writing. The upshot is that he filed his dissertation one day before me in May 1993."
Olken was an avid reader of books and journals and kept stacks of material in both his lab office and his apartment near the UC Berkeley campus. At one point, his home library was so large that his landlord worried it could damage the building, according to former colleague John McCarthy.
"He was a voracious reader," said McCarthy, who joined the lab and met Olken in 1980. "He was a polymath, interested in everything from amateur radio to biology to history and the humanities. I loved having lunch with Frank because there were always interesting things to talk about."
McCarthy and Olken collaborated on "many a proposal." They were members of several international standards committees, including one that developed the World Wide Web Consortium’s XML Schema Definition language. They were also co-authors with four lab colleagues of a 1991 paper on "The Chromosome Information System."
In 1990, Olken was one of 12 people named to the Joint Informatics Task Force, formed by DOE and the National Institutes of Health to support the nascent Human Genome Project by developing genome information and analysis tools and making them available to scientists and physicians.
In 2006, Olken took leave from the lab to work as a detailee in the NSF's Computer and Information Science and Engineering directorate in Arlington, Va., working as a program director for the Information Integration & Informatics program in the Intelligent Information Systems Division. After four years, he returned to Berkeley and retired from the lab in 2010. In 2013, he returned to the same NSF program and spent an additional three years as a program director.
Maria Zemankova, program director of the Information Integration & Informatics program, first met Olken at a conference in the 1990s. He was recommended as a candidate when her program was looking to fill its rotating program director position.
"It was a yearly position, but we kept Frank on for the maximum of four years," Zemankova said. "He was very, very dedicated and was very supportive of our PIs and their research. The PIs very much enjoyed working with Frank."
Zemankova said she has sent the news of his passing to several of the research communities Olken was active in and has seen an outpouring of sympathy over the news. NSF staff will hold a farewell gathering for him on Wednesday, Nov. 20.
Olken was born in 1952 in Washington, D.C., where his parents met during World War II. His mother was a naval officer in the Pentagon and his father worked as a civilian engineer at the Washington naval shipyard. His father's family was from Tsarist Russia and they emigrated to the United States in 1911. All of his family who stayed behind died in the Holocaust.
The Olken family moved to California in 1958, where his father, Hyman Olken, worked as a development engineer on automatic control systems and reviewed specifications on instrumentation systems at the then-nascent Lawrence Livermore Laboratory. The Olkens helped start the first synagogue in Alameda County east of the Berkeley-Oakland hills.
"Frank was something of a child prodigy," said his brother, Robert Olken. "He was part of a cohort that skipped fifth grade because they were advancing so rapidly. As a result, he never learned cursive handwriting. Frank was taking computer courses at Chabot Junior College and working as a programmer for local businesses before leaving high school."
Although Olken rarely talked about himself, Shoshani remembered two stories he told about his time in high school. When he took the SAT, he got a nearly perfect score, missing only one question -- in Latin. And once, when Olken was bored in a class, the teacher asked if he wanted to teach it. So Olken did, even preparing materials for the students.
Memorial services are pending and details will be announced when finalized.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.