Is Your Digital Information More at Risk Today than 10 Years Ago?
UNM, Lawrence Berkeley National Lab researchers say maybe not
October 12, 2015
It’s easy to form the mental image of a hacker hunched over a computer, probing for a way to get your personal information, whether to sell it, acquire credit cards in your name or use your health insurance.
It does happen, but University of New Mexico Department of Computer Science Professor Stephanie Forrest and Ph.D. student Benjamin Edwards, working with Steven Hofmeyr from the Lawrence Berkeley National Laboratory (Berkeley Lab), say it is not happening more frequently than it did a decade ago. Data breaches, in general, are not growing in size.
“Cybersecurity has become a global problem, and to tackle it effectively will require careful analysis of complex datasets from diverse sources,” said Forrest. “This study illustrates how modern data science can shed light on one of today’s most challenging problems.”
In a new paper, titled “Hype and Heavy Tails: A Closer Look at Data Breaches,” which won the Best Paper Award at the Workshop on the Economics of Information Security in June, the researchers looked at both malicious and negligent breaches. Malicious breaches occur when attackers specifically target someone’s personal information. Negligent breaches occur when someone’s private information is accidentally exposed, for example if a database of personnel records is stored on a laptop that is lost or stolen.
They used information published by the Privacy Rights Clearinghouse, a private non-profit that tracks public reports of data breaches, and they note that their results are drawn from publicly acknowledged data breaches.
The researchers constructed a statistical model based on public data about breaches collected over the last decade and used the model to analyze trends and make predictions about future breaches. The data clearly showed that information is exposed twice as often through negligence as it is through malicious attacks. Using expanded data that includes high-profile data breaches from this summer, the model also predicts that there is a 98.2 percent chance of a breach that exposes more than 5 million records during the next three years.
What is the bottom line, that is, what is the real cost in dollars of these data breaches? Answering that accurately requires analyzing the costs of breaches, not just their size and frequency. The research team applied some existing cost models to project that over the next three years, data breaches could cost individuals, companies and public entities up to $180 billion.
“With this work, our goal was to answer the questions: Are security breaches getting bigger? Are they happening more frequently? And when they do happen, are the impacts more catastrophic? When we fit the cyber security data to the statistical model, we found a ‘long tail’ distribution, which is liable to distort public perception,” says Hofmeyr. “It’s kind of like if you’ve just experienced a big earthquake, you may suddenly be scared of big earthquakes, even though the probability for big earthquakes hasn’t changed.”
“It’s the same for security,” adds Hofmeyr. “And, the reason that we can say that is because we have this principled statistical model, which gives us a more comprehensive and contextual view than simply looking at averages.”
There’s a takeaway message for public policy experts in this. Industry reports, which are widely circulated and difficult to confirm, often use inappropriate statistical techniques and should be taken with a large grain of salt. Policies that encourage uniform reporting of security problems would provide clarity in this very murky area.
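Hofmeyr's point about averages can be illustrated with a short simulation. The sketch below is not the researchers' model: it assumes, purely for illustration, that breach sizes follow a log-normal distribution with made-up parameters (mu, sigma), and the function names are hypothetical. It shows how a handful of very large breaches can pull the mean far above the median.

```python
import random
import statistics


def simulate_breach_sizes(n, mu=8.0, sigma=2.5, seed=0):
    """Draw n hypothetical breach sizes (records exposed).

    Assumption: sizes are log-normal with illustrative parameters;
    these values are NOT taken from the paper or the article.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def summarize(sizes):
    """Compare the mean and median, and measure how much of the total
    exposure comes from the largest 1 percent of breaches."""
    ordered = sorted(sizes)
    top_count = max(1, len(ordered) // 100)
    top_1pct = ordered[-top_count:]
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "top_1pct_share": sum(top_1pct) / sum(ordered),
    }


if __name__ == "__main__":
    stats = summarize(simulate_breach_sizes(10_000))
    print(f"mean records per breach:   {stats['mean']:,.0f}")
    print(f"median records per breach: {stats['median']:,.0f}")
    print(f"share of records in the largest 1% of breaches: "
          f"{stats['top_1pct_share']:.0%}")
```

With heavy-tailed data like this, the mean is dominated by a few extreme events, so one newly reported large breach can shift an average sharply even when the underlying distribution has not changed.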
Edwards summed it up. “So much of our current understanding about security problems relies on private data and opaque analysis methods. Studies like ours provide a rational counterpoint for policy makers and they show the benefit of putting data about security problems into the public domain.”
This research was partly supported by the U.S. Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. It is also supported by the National Science Foundation and the Defense Advanced Research Projects Agency.
Written by Karen Wentworth, University of New Mexico.
About Computing Sciences at Berkeley Lab
The Lawrence Berkeley National Laboratory (Berkeley Lab) Computing Sciences organization provides the computing and networking resources and expertise critical to advancing the Department of Energy's research missions: developing new energy sources, improving energy efficiency, developing new materials and increasing our understanding of ourselves, our world and our universe.
ESnet, the Energy Sciences Network, provides the high-bandwidth, reliable connections that link scientists at 40 DOE research sites to each other and to experimental facilities and supercomputing centers around the country. The National Energy Research Scientific Computing Center (NERSC) powers the discoveries of 6,000 scientists at national laboratories and universities, including those at Berkeley Lab's Computational Research Division (CRD). CRD conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. NERSC and ESnet are DOE Office of Science User Facilities.
Lawrence Berkeley National Laboratory addresses the world's most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab's scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the DOE’s Office of Science.
DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.