Berkeley Lab, Pittsburgh Supercomputing Center to Study Network Congestion

July 23, 1997

For all the Internet users who wonder why e-mail sometimes bogs down or why it takes so long to call up a favorite Web site, a new study by computer scientists in Berkeley and Pittsburgh may turn up some answers.

According to Vern Paxson of Ernest Orlando Lawrence Berkeley National Laboratory, the year-long study funded by the National Science Foundation could help with troubleshooting problems on the Internet and eventually give users a method to rate the offerings of Internet service providers.

The study by Paxson of the Computing Sciences Network Research Group at Berkeley Lab and researchers at the Pittsburgh Supercomputing Center will involve placing computers at various locations on the Internet to automatically measure traffic between those stations.

“It’s a big win to be able to measure at both ends of a network path,” says Paxson, who earned his Ph.D. in May from UC Berkeley with a study of Internet traffic between 37 different sites. “By documenting when data is sent and when it arrives, we will be able to measure the capacity of that path and how much of a load there is on that path.”
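As a rough illustration of what timestamps taken at both ends make possible, the Python sketch below derives three things from hypothetical send and arrival times for a few probe packets: one-way delays, a capacity estimate based on how far apart closely spaced packets arrive, and the extra queueing delay as a crude indicator of load. The function names, packet size and sample values are assumptions made for illustration, not the study's actual method, and the sketch assumes the clocks at the two ends are synchronized.

```python
def one_way_delays(sent, received):
    """Delay of each packet: arrival time minus send time, in seconds."""
    return [r - s for s, r in zip(sent, received)]


def capacity_estimate(received, packet_bytes):
    """Packet-pair style estimate (an assumed technique, not the study's).

    Packets sent in quick succession are spread out by the slowest link,
    so packet size divided by the smallest arrival gap approximates that
    link's capacity in bytes per second.
    """
    gaps = [b - a for a, b in zip(received, received[1:]) if b > a]
    return packet_bytes / min(gaps)


def queueing_delays(delays):
    """Delay above the smallest one observed, a crude sign of load."""
    base = min(delays)
    return [d - base for d in delays]


# Hypothetical probe: four 1000-byte packets sent 1 ms apart.
sent = [0.000, 0.001, 0.002, 0.003]
received = [0.050, 0.058, 0.066, 0.080]

delays = one_way_delays(sent, received)
capacity = capacity_estimate(received, 1000)
extra = queueing_delays(delays)

print("one-way delays (s):", delays)
print("estimated capacity (bytes/s):", capacity)
print("queueing delay (s):", extra)
```

With these made-up numbers the first three packets arrive about 8 ms apart, which suggests a bottleneck of roughly 125,000 bytes per second, while the last packet's extra delay hints at competing traffic.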

One of the biggest problems with the Internet today is congestion, says Paxson. That problem, though, reflects one of the ‘Net’s strengths -- that all traffic is thrown together and channeled through a vast network of links, meaning that most messages and connections make it. When one link is down or full, the system finds another route to keep the data hopping along. Finding out where the congestion occurs, and where the weak links are, is one of Paxson’s goals.

“The network is very good at hiding the individual hops a message takes from the end users,” says Paxson. “Hops go down all the time and packets have to be re-routed.”

Within the computer science community, the Web is known as a “success disaster,” something that was so useful it grew very large very quickly, Paxson says. The underlying problem is that the Web, a huge hypertext database, is poorly configured to run on the Internet. As the popularity of the Web grows, the problems loom ever larger.

Fixing that situation, which is essentially a problem of scaling a small system up to a much larger one, will require the detailed measurements promised by the study, Paxson says.

“We can build systems with hundreds of things and they work fine, but when we scale them up to hundreds of thousands, they break down--and it’s hard to predict where they will fail,” he adds. “As a result, users may have the perception that something is overloaded when it’s actually mis-engineered.”

A primary focus of the study by Paxson and his Pittsburgh colleagues is the rate at which data packets drop out. “We’d like to know the drop rates because they directly affect performance,” Paxson says. “That’s a basic thing to measure.”
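To make the idea of a drop rate concrete, here is a minimal Python sketch that compares the packets logged as sent at one end with those logged as received at the other. The `drop_rate` function and the packet numbers are hypothetical and only illustrate the arithmetic.

```python
def drop_rate(sent_ids, received_ids):
    """Fraction of sent packets that never arrived."""
    sent = set(sent_ids)
    if not sent:
        return 0.0
    lost = sent - set(received_ids)  # duplicates on arrival are ignored
    return len(lost) / len(sent)


# Hypothetical logs: packets 1..10 were sent; 4 and 9 never arrived,
# and packet 10 arrived twice.
sent_ids = range(1, 11)
received_ids = [1, 2, 3, 5, 6, 7, 8, 10, 10]

rate = drop_rate(sent_ids, received_ids)
print(rate)
```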

Information is sent across the Internet via IP, the Internet Protocol. When too many data packets come in, routers on the Net hold them in buffers until traffic thins out. If traffic doesn’t thin out, the buffers fill up and the router throws out packets.
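The buffering behavior described above can be sketched as a simple "drop-tail" queue in Python. The class name, buffer size and traffic pattern are assumptions chosen for illustration; real routers use a variety of queueing disciplines.

```python
from collections import deque


class DropTailRouter:
    """Toy router: holds packets in a fixed-size buffer, drops when full."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = deque()
        self.dropped = 0

    def receive(self, packet):
        """Queue an arriving packet, or discard it if the buffer is full."""
        if len(self.buffer) >= self.buffer_size:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def forward(self, count=1):
        """Send up to `count` of the oldest queued packets onward."""
        out = []
        for _ in range(min(count, len(self.buffer))):
            out.append(self.buffer.popleft())
        return out


# Hypothetical overload: two packets arrive per tick, but only one
# can be forwarded per tick, so the three-slot buffer fills up.
router = DropTailRouter(buffer_size=3)
forwarded = []
for tick in range(4):
    for i in range(2):
        router.receive((tick, i))
    forwarded += router.forward(1)

print("forwarded:", forwarded)
print("still buffered:", list(router.buffer))
print("dropped:", router.dropped)
```

In this run the buffer absorbs the first two ticks of excess traffic, then the router has to discard one packet on each of the next two ticks.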

This situation reflects the layered architecture of the Internet. The IP provides the basic building block, which says, “you hand me a packet and I’ll do my best to deliver it. It may arrive as expected, it may get duplicated along the way, it may be corrupted or it may arrive out of order -- there are no guarantees,” Paxson says. “Although that sounds like a weakness, it’s actually a strength. It’s what allows the connectivity so the Internet can grow.”

On top of the IP layer is the Transmission Control Protocol, or TCP. This is the layer that promises the message will get there, reliably coping with any problems encountered along the way.
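One piece of that coping can be sketched in a few lines of Python: a receiver that uses sequence numbers to put packets back in order, discard duplicates and notice gaps. This is a simplified illustration under assumed names and data, not TCP's actual implementation, which numbers bytes rather than whole packets and also handles acknowledgments and retransmission.

```python
def reassemble(packets):
    """Order packets by sequence number and report any gaps.

    `packets` is a list of (seq, data) pairs in arrival order, possibly
    duplicated or out of order. Returns (ordered_data, missing_seqs).
    """
    seen = {}
    for seq, data in packets:
        seen.setdefault(seq, data)  # keep the first copy, ignore duplicates
    if not seen:
        return [], []
    missing = [s for s in range(max(seen) + 1) if s not in seen]
    ordered = [seen[s] for s in sorted(seen)]
    return ordered, missing


# Hypothetical arrivals: packet 2 is duplicated, 1 and 2 are swapped,
# and packet 3 never shows up.
arrived = [(0, "He"), (2, "o,"), (1, "ll"), (2, "o,"), (4, "rld")]
ordered, missing = reassemble(arrived)
print(ordered)
print(missing)
```

A real receiver would report what it has received so the sender can retransmit the missing packet.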

“On the Web, when you do a data transfer and you get the desired item, the transfer is finished,” Paxson says. “When you’re using a browser, like Netscape or Explorer, and it stops, that means packets are being dropped. But the TCP isn’t giving up, it’s really backing off because there’s a problem. When you get a message back saying something can’t be found, your packets did get through, but there’s a problem with another part of the system.”
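The "backing off" Paxson describes can be illustrated with a short Python sketch of exponential backoff: each time a transmission goes unanswered, the sender waits twice as long before trying again. The function name, timeout values and loss pattern are assumptions for illustration; real TCP also estimates round-trip times and adjusts how much data it keeps in flight.

```python
def send_with_backoff(deliver, initial_timeout=1.0, max_tries=6):
    """Retry until deliver() succeeds, doubling the wait after each loss.

    Returns (succeeded, waits), where `waits` lists the timeouts, in
    seconds, that elapsed before each retransmission. Nothing actually
    sleeps here; the waits are only recorded.
    """
    timeout = initial_timeout
    waits = []
    for _ in range(max_tries):
        if deliver():
            return True, waits
        waits.append(timeout)  # no acknowledgment: wait, then resend
        timeout *= 2           # back off so a congested path can recover
    return False, waits


# Hypothetical path that drops the first three attempts.
outcomes = iter([False, False, False, True])
ok, waits = send_with_backoff(lambda: next(outcomes))
print(ok, waits)
```

From the user's side, those growing waits of one, two and then four seconds are what a stalled browser looks like, even though the transfer eventually completes.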

For Paxson, the pie-in-the-sky goal of his work is to give Internet users a method whereby they could click on a button, query a database and find out exactly what’s causing the problem.

Paxson and his colleagues aren’t the only ones carrying out such studies, he notes, but to date the network has been “woefully undermeasured” and there’s been no big-picture measurement of the Internet. While most users know about the typical problems, Paxson expects to find the unexpected.

“In all the measurement studies I’ve done there have been major surprises,” he says. “Usually things are working much worse than we imagined.”

The reliability of the Internet is a key issue for Paxson and his colleagues at Berkeley Lab, which registers about 500,000 connections every day. The Lab and other national labs in the Department of Energy complex increasingly rely on computer networks for scientific collaboration.

The Computing Sciences organization at Berkeley Lab conducts computer research and provides high-performance computing and networking services to DOE's Energy Research programs at national laboratories, universities and industry.

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.