Is Your Digital Information More at Risk Today than 10 Years Ago?
UNM, Lawrence Berkeley National Lab researchers say maybe not
October 12, 2015
It’s easy to form the mental image of a hacker hunched over a computer, probing for a way to get your personal information, whether to sell it, acquire credit cards in your name or use your health insurance.
It does happen, but University of New Mexico Department of Computer Science Professor Stephanie Forrest and Ph.D. student Benjamin Edwards, working with Steven Hofmeyr from the Lawrence Berkeley National Laboratory (Berkeley Lab), say it is not happening more frequently than it did a decade ago. Nor, in general, are data breaches growing in size.
“Cybersecurity has become a global problem, and to tackle it effectively will require careful analysis of complex datasets from diverse sources,” said Forrest. “This study illustrates how modern data science can shed light on one of today’s most challenging problems.”
In a new paper, titled “Hype and Heavy Tails: A Closer Look at Data Breaches,” which won the Best Paper Award at the Workshop on the Economics of Information Security in June, the researchers looked at both malicious and negligent breaches. Malicious breaches occur when attackers specifically target someone’s personal information. Negligent breaches occur when someone’s private information is accidentally exposed, for example when a database of personnel records is stored on a laptop that is lost or stolen.
They used information published by the Privacy Rights Clearinghouse, a private non-profit that tracks public reports of data breaches, and they note that their results are drawn from publicly acknowledged data breaches.
The researchers constructed a statistical model based on public data about breaches collected over the last decade and used the model to analyze trends and make predictions about future breaches. The data clearly showed that information is exposed twice as often through negligence as through malicious attacks. Using expanded data that includes high-profile data breaches from this summer, the model also predicts a 98.2 percent chance of a breach that exposes more than 5 million records during the next three years.
What is the bottom line, that is, what is the real cost in dollars of these data breaches? Answering that requires more than counting breaches and their sizes; it also requires estimating what each breach costs. The research team applied existing cost models to project that over the next three years, data breaches could cost individuals, companies and public entities up to $180 billion.
“With this work, our goal was to answer the questions: Are security breaches getting bigger? Are they happening more frequently? And when they do happen, are the impacts more catastrophic? When we fit the cybersecurity data to the statistical model, we found a ‘long tail’ distribution, which is liable to distort public perception,” says Hofmeyr. “It’s kind of like if you’ve just experienced a big earthquake, you may suddenly be scared of big earthquakes, even though the probability of big earthquakes hasn’t changed.”
“It’s the same for security,” adds Hofmeyr. “And the reason we can say that is because we have this principled statistical model, which gives us a more comprehensive and contextual view than simply looking at averages.”
There’s a takeaway message for public policy experts in this. Industry reports, which are widely circulated and difficult to confirm, often use inappropriate statistical techniques and should be taken with a large grain of salt. Policies that encourage uniform reporting of security problems would provide clarity in this very murky area.
Edwards summed it up: “So much of our current understanding about security problems relies on private data and opaque analysis methods. Studies like ours provide a rational counterpoint for policymakers, and they show the benefit of putting data about security problems into the public domain.”
This research was partly supported by the U.S. Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. It was also supported by the National Science Foundation and the Defense Advanced Research Projects Agency.
Written by Karen Wentworth, University of New Mexico.
About Computing Sciences at Berkeley Lab
High performance computing plays a critical role in scientific discovery, and researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.
Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 13 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.