Machine Learning Algorithms Help Predict Traffic Headaches

Berkeley Lab Researchers Team with UC Berkeley and Caltrans on Real-time Traffic Analysis

November 4, 2019

Contact: Kathy Kincade, kkincade@lbl.gov, +1 510 495 2124

Arterial streets surrounding the I-210 freeway in southern California, where the first traffic prediction pilot is taking place. Credit: Connected Corridors

Urban traffic roughly follows a periodic pattern associated with the typical “9 to 5” work schedule. However, when an accident happens, traffic patterns are disrupted. Designing traffic flow models that remain accurate during accidents is a major challenge for traffic engineers, who must adapt to unforeseen traffic scenarios in real time.

A team of Lawrence Berkeley National Laboratory (Berkeley Lab) computer scientists is working with the California Department of Transportation (Caltrans) to use high performance computing (HPC) and machine learning to help improve Caltrans’ real-time decision making when incidents occur. The research was done in conjunction with California Partners for Advanced Transportation Technology (PATH), part of UC Berkeley's Institute of Transportation Studies (ITS), and Connected Corridors, a collaborative program to research, develop, and test an Integrated Corridor Management approach to managing transportation corridors in California.

Caltrans and Connected Corridors are now implementing the system on a trial basis in Los Angeles County through the I-210 pilot. The pilot uses real-time data from city, county, and state partners in southern California. Its goal is to improve Caltrans’ real-time decision-making by executing coordinated, multijurisdictional traffic incident response plans that limit the negative impacts of these events. The first iteration of this system will be deployed in the cities of Arcadia, Duarte, Monrovia, and Pasadena in 2020, with plans for future deployments around the state.

“Many traffic-flow prediction methods exist, and each can be advantageous in the right situation,” said Sherry Li, a mathematician in Berkeley Lab’s Computational Research Division (CRD). “To alleviate the pain of relying on human operators who sometimes blindly trust one particular model, our goal was to integrate multiple models that produce more stable and accurate traffic predictions. We did this by designing an ensemble-learning algorithm that combines different sub-models. The algorithm works like a committee member voting process: different committee members have their own votes, possibly with different shares, but a final consensus should be reached by a combination of the individual shares.”

Ensemble learning is the art of combining a diverse set of learners (individual models) to improve, on the fly, the stability and predictive power of the model. This idea has been explored by machine learning researchers for a long time. What is special about traffic flow is the temporal characteristic; traffic flow measurements are correlated over time, as are the prediction results from different individual models.

In the Berkeley Lab/Caltrans collaboration, the ensemble model takes into account the mutual dependency of sub-models and assigns the “shares of vote” to balance their individual performance with their co-dependency. The ensemble model also values recent prediction performance more than older historical performance. In testing, the combined model outperformed every individual model in both prediction accuracy and stability.
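The weighting idea described above can be illustrated with a minimal sketch. It is not the team's algorithm; it assumes a classic minimum-variance forecast combination, in which the "shares of vote" come from an exponentially discounted covariance matrix of each sub-model's past errors. The function names and the `decay` and `ridge` parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def ensemble_weights(errors, decay=0.9, ridge=1e-6):
    """Compute combination weights from past sub-model errors.

    errors: array of shape (T, M), one row per time step (oldest first),
            one column per sub-model.
    decay:  assumed discount factor; values below 1 favor recent rows.
    ridge:  small diagonal term that keeps the linear solve well conditioned.
    """
    errors = np.asarray(errors, dtype=float)
    n_steps, n_models = errors.shape
    # The most recent row gets weight 1; older rows are discounted geometrically.
    discount = decay ** np.arange(n_steps - 1, -1, -1)
    # Discounted second-moment matrix of the errors. Off-diagonal entries
    # capture how strongly two sub-models err together (co-dependency).
    cov = (errors * discount[:, None]).T @ errors / discount.sum()
    cov += ridge * np.eye(n_models)
    # Minimum-variance weights: proportional to cov^{-1} 1, scaled to sum to 1.
    raw = np.linalg.solve(cov, np.ones(n_models))
    return raw / raw.sum()

def combine(predictions, weights):
    """Weighted vote: the ensemble prediction for one time step."""
    return float(np.dot(predictions, weights))

if __name__ == "__main__":
    # Hypothetical error history for two sub-models over four time steps.
    history = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    shares = ensemble_weights(history, decay=1.0)
    print("shares of vote:", shares)
    print("combined flow:", combine([10.0, 20.0], shares))
```

Under this scheme a sub-model with small recent errors earns a larger share, and two sub-models that make the same mistakes together split their influence instead of counting twice. The weights can become negative when errors are strongly correlated, which a production system might constrain.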

The project started with funding from Berkeley Lab’s Laboratory Directed Research and Development (LDRD) program. The goal was to build a computational framework that would enable HPC applications specific to transportation, such as optimization and control of traffic equilibrium. The Connected Corridors systems development team is led by Brian Peterson, a systems development manager at PATH. In addition, Hongyuan Zhan, a former Berkeley Lab Computing Sciences summer student from Penn State, was a major contributor to the Connected Corridors work for this research.

Real-time Data, Real-time Decision Making

Using data collected from Caltrans sensors on California highways, the project yielded novel algorithms that produced accurate predictions on a 15-minute rolling basis. The team then validated and integrated the new algorithms using real-time traffic data from the Connected Corridors system, a streaming-based, real-time transportation data hub in which Spark MLlib, a scalable machine learning library, provides machine learning models that can be used within the proposed ensemble learning framework. The specific implementation generated predicted traffic flows at points on the freeway where sensors were present. These predictions in turn could be used to predict traffic demands at freeway entrances and traffic flows at freeway exits.

Ensemble learning partly addresses the issue of multi-modality (different types of vehicles) in traffic; however, it does not address sudden changes caused by construction or incidents. Thus, the research team applied online (real-time) learning techniques to enable the algorithm to learn not just from the past, but to adapt to new traffic conditions along the way in real time. Traditionally, online learning techniques are used to adapt the model parameters to the latest data but not to adapt the whole model structure.
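As a rough illustration of the traditional form of online learning mentioned above, in which model parameters are updated as each new reading arrives, the following sketch uses a normalized least-mean-squares update on a simple linear predictor over recent flow readings. The class name, the number of lags, the step size, and the sample series are all hypothetical and are not taken from the project.

```python
import numpy as np

class OnlineLinearPredictor:
    """Predicts the next flow value from the last few readings and
    adjusts its weights after every new observation (normalized LMS)."""

    def __init__(self, n_lags=4, step=0.5, eps=1e-8):
        self.w = np.zeros(n_lags + 1)  # one weight per lag, plus a bias term
        self.step = step               # 0 < step < 2 keeps the update stable
        self.eps = eps                 # guards against division by zero

    def _features(self, recent):
        # Recent readings followed by a constant 1.0 for the bias.
        return np.append(np.asarray(recent, dtype=float), 1.0)

    def predict(self, recent):
        return float(self.w @ self._features(recent))

    def update(self, recent, observed):
        x = self._features(recent)
        error = observed - self.w @ x
        # Move the weights toward the new observation, scaled by input energy.
        self.w += self.step * error * x / (x @ x + self.eps)
        return float(error)

def run_online(series, n_lags=4, step=0.5):
    """Predict each point from the preceding n_lags points, then learn from it."""
    model = OnlineLinearPredictor(n_lags, step)
    predictions = []
    for t in range(n_lags, len(series)):
        recent = series[t - n_lags:t]
        predictions.append(model.predict(recent))
        model.update(recent, series[t])
    return predictions

if __name__ == "__main__":
    # Hypothetical flow series: steady traffic, then a sudden drop (an incident).
    flows = [100.0] * 50 + [40.0] * 50
    preds = run_online(flows)
    print("last prediction:", preds[-1])
```

Because the weights change with every observation, the predictor tracks the new traffic level after the drop without being retrained from scratch. This sketch adapts parameters only; it does not change the model structure.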

Traffic flow prediction by the TDEC algorithm, a model combination scheme that can track the actual traffic more closely than a pool of individual candidate models. Green line is the prediction range, blue line is the true flow, red line is the TDEC algorithm prediction. Credit: Hongyuan Zhan

The team took their framework a step further by adaptively learning the model hyperparameters, the settings that control model behavior at a higher level than typical parameters. This hyperparameter optimization algorithm requires substantially less computation than the usual grid search while achieving similar or better prediction performance, Li noted. Combined with the Connected Corridors Apache Kafka-based streaming platform and Spark clusters, the algorithm could provide more accurate and timely traffic prediction and aid real-time traffic control, such as rerouting traffic, altering traffic light configurations, and other corrective measures.
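The article does not say how the team adapts hyperparameters, so the following is only a sketch of one well-known approach, hypergradient descent, applied to a single hyperparameter (the learning rate of an online linear model). It shows why adapting a hyperparameter during one pass over the data can cost less than a grid search, which would repeat the whole pass once per candidate value. The function name, the bounds, and the synthetic data are hypothetical.

```python
import numpy as np

def sgd_hypergradient(X, y, lr0=0.01, hyper_lr=1e-4, lr_bounds=(1e-6, 0.05)):
    """Online least-squares fit whose learning rate adapts as data arrives.

    lr0, hyper_lr, and lr_bounds are assumed values for illustration.
    Returns the fitted weights and the learning rate used at every step.
    """
    w = np.zeros(X.shape[1])
    lr = lr0
    prev_grad = np.zeros_like(w)
    lr_history = []
    for x_t, y_t in zip(X, y):
        # Gradient of the squared error on the current sample.
        grad = 2.0 * (w @ x_t - y_t) * x_t
        # Hypergradient step: raise the learning rate when consecutive
        # gradients point the same way, lower it when they disagree.
        lr = float(np.clip(lr + hyper_lr * (grad @ prev_grad), *lr_bounds))
        w = w - lr * grad
        prev_grad = grad
        lr_history.append(lr)
    return w, np.array(lr_history)

if __name__ == "__main__":
    # Synthetic, noise-free data generated from known weights.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w
    w, lrs = sgd_hypergradient(X, y)
    print("fitted weights:", w)
    print("final learning rate:", lrs[-1])
```

A grid search over, say, ten candidate learning rates would run the full loop ten times; here the learning rate is tuned inside a single run.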

“The first deployment of the Connected Corridors program is intended to validate the concept and quantify improvements in travel times, traffic flow, and delays under real world conditions,” Peterson said. “Traffic modeling has indicated that significant improvements are possible with the traffic management strategies being developed. Future deployments are in the planning stage with opportunities for ongoing system improvements and new approaches.”

In addition to Li, Peterson, and Zhan, other contributors to this project include CRD’s John Wu and ITS’ Gabriel Gomes.

About Computing Sciences at Berkeley Lab

High performance computing plays a critical role in scientific discovery. Researchers increasingly rely on advances in computer science, mathematics, computational science, data science, and large-scale computing and networking to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab’s Computing Sciences Area researches, develops, and deploys new foundations, tools, and technologies to meet these needs and to advance research across a broad range of scientific disciplines.