AI and Geospatial Data: Traffic Flow Predictions
Accurate traffic predictions are critical for reducing congestion and improving transportation systems. Here's what you need to know:
- Classical machine learning models (like ARIMA and SVR) are simple and efficient but struggle with complex, dynamic traffic patterns and lack spatial awareness.
- Deep learning and graph-based GeoAI models (like GCNs and Transformers) excel in handling spatial-temporal data, offering better accuracy but requiring significant computational power.
- NanoGPT supports traffic forecasting indirectly by enriching data and simplifying workflows, though it's not suited for real-time predictions.
Key Takeaways:
- Classical models are fast but limited in handling unexpected events or large networks.
- Deep learning models provide higher accuracy by modeling traffic as interconnected graphs.
- NanoGPT complements these systems by managing and processing geospatial data efficiently.
Choosing the right approach depends on your goals: short-term predictions, large-scale urban planning, or data processing. Combining methods often yields the best results.
1. Classical Machine Learning Models
Classical machine learning models like ARIMA, Vector Autoregression (VAR), K-Nearest Neighbor (KNN), and Support Vector Regression (SVR) have long been the backbone of traffic forecasting. These models are relatively straightforward to implement and are computationally efficient, making them suitable for short-term predictions on smaller road networks.
Data Requirements
One of the key aspects of these models is their reliance on manual feature selection. Analysts must carefully select inputs such as historical speed, flow, and occupancy data before training the model. Models like ARIMA also assume linear, stationary data, which limits their effectiveness in dynamic traffic environments. As noted in one study:
"Traditional statistical approaches... are built on restrictive assumptions of linearity and stationarity. Consequently, they often fail to adapt to abrupt changes and nonlinear interactions present in real-world traffic systems." - Ata et al., PLOS One
This rigidity makes classical models less suitable for handling the complexities of modern traffic systems.
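To make the stationarity requirement concrete, here is a minimal sketch of differencing, the usual preprocessing step before fitting ARIMA-style models. The `difference` helper and the sample counts are illustrative assumptions, not taken from the cited study.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every valid t.

    lag=1 removes a steady trend; a larger lag removes a pattern
    that repeats every `lag` steps.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical vehicle counts per 5-minute interval with a steady upward trend.
flow = [100, 104, 108, 112, 116, 120]
trend_removed = difference(flow)  # constant series after first differencing

# Hypothetical series that repeats every two steps.
periodic = [10, 50, 10, 50, 10, 50]
season_removed = difference(periodic, lag=2)  # all zeros after seasonal differencing
```

Differencing handles smooth trends and regular cycles, but it cannot absorb an abrupt regime change such as a crash, which is the failure mode the quote describes.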
Model Accuracy
Under predictable conditions, such as routine daily commutes, classical models can deliver acceptable accuracy. But when unexpected events like accidents or road closures occur, their performance drops significantly. The gap also shows up in benchmark results: advanced models such as STF-GGRU have reported RMSE values as low as 27.18 on the PeMSD4 dataset, an error reported to be far lower than ARIMA's on the same dataset.
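For readers less familiar with the metrics quoted throughout this article, the sketch below computes RMSE and MAE from their standard definitions. The sample values are hypothetical and unrelated to PeMSD4.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average size of the errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more than MAE does."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observed vs. forecast vehicle counts for four intervals.
actual = [120, 130, 125, 140]
predicted = [118, 135, 125, 130]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

Both metrics are expressed in the units of the target variable, so a figure like 27.18 is only meaningful when compared against other models on the same dataset.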
Scalability
Scaling these models to large, urban networks is a major hurdle. A critical issue is their inability to account for the interconnected nature of road networks. Classical models treat each traffic sensor as an isolated data point, ignoring the spatial relationships between them. As highlighted in a study:
"A more fundamental limitation of these algorithms was their lack of structural awareness: they treated traffic data merely as points in a high-dimensional feature space... disregarding essential spatial correlations embedded in the non-Euclidean structure of road networks." - Nature
This lack of structural awareness leads to performance degradation as the complexity of the network increases, underscoring the need for more sophisticated approaches.
Operational Constraints
Classical models also struggle in real-time Intelligent Transportation Systems (ITS). They cannot dynamically adapt to changing road conditions or account for long-range dependencies between distant sensors. While they work well in small-scale settings, they fail to capture the nonlinear dynamics and broader patterns of large, interconnected networks. These limitations highlight the growing demand for deep learning and graph-based GeoAI solutions, which are better equipped to handle such challenges.
2. Deep Learning and Graph-Based GeoAI
Deep learning and graph-based GeoAI models take a different approach from traditional traffic prediction methods. Instead of treating traffic sensors as isolated data points, these models see them as interconnected nodes within a network, leading to more detailed and accurate predictions.
Data Requirements
These models represent road networks as graphs, where sensors act as nodes and the roads connecting them are edges. This setup captures the irregular structure of road networks. To work effectively, these models need a combination of raw traffic metrics, spatial embeddings (derived from the graph structure), and temporal embeddings (such as time-of-day or day-of-week patterns). Data is usually processed in 5-minute intervals, offering a balance between detail and computational efficiency.
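As a rough illustration of these inputs, the sketch below builds a weighted adjacency matrix from sensor distances and derives time-of-day and day-of-week indices from a 5-minute step counter. The thresholded Gaussian kernel is a construction commonly seen in the traffic-forecasting literature; whether any model named here uses exactly this form is an assumption, and the distances, `sigma`, and `threshold` values are made up.

```python
import math


def gaussian_adjacency(distances, sigma, threshold):
    """Edge weight w_ij = exp(-(d_ij / sigma)^2), dropped when below `threshold`."""
    n = len(distances)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops at this stage
            weight = math.exp(-((distances[i][j] / sigma) ** 2))
            if weight >= threshold:
                adj[i][j] = weight
    return adj


def time_features(step_index, steps_per_day=288):
    """Map a 5-minute step counter to (slot of day, day index modulo 7).

    288 = 24 * 60 / 5. The day index counts from the first day in the
    data, not from a calendar weekday.
    """
    slot = step_index % steps_per_day
    day = (step_index // steps_per_day) % 7
    return slot, day


# Hypothetical road distances in km between three sensors.
distances = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
adj = gaussian_adjacency(distances, sigma=2.0, threshold=0.1)
slot, day = time_features(300)
```

With these numbers, only the two sensors 1 km apart stay connected; the third is too far from both to pass the threshold.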
A major improvement over older methods is the use of dynamic graph construction. Instead of relying on static, distance-based maps, modern models can update sensor relationships in real time. For instance, if an accident or road closure alters traffic flow, the model adjusts its understanding of the network immediately. This adaptability sets these models apart from classical approaches, which rely on fixed inputs.
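One simple way to picture dynamic graph construction is to rebuild edges from how similarly sensors have behaved over a recent window. The sketch below uses Pearson correlation as a stand-in; it is not the mechanism of any specific model discussed here, and the sensor ids, readings, and `min_corr` threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def dynamic_edges(window, min_corr=0.8):
    """Link sensor pairs whose recent readings are strongly correlated."""
    ids = sorted(window)
    edges = set()
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if pearson(window[a], window[b]) >= min_corr:
                edges.add((a, b))
    return edges


# Hypothetical speeds (mph) over the last four 5-minute intervals.
normal = {"s1": [60, 58, 59, 61], "s2": [59, 57, 58, 60], "s3": [30, 45, 28, 50]}
# Same sensors after an incident slows traffic at s2 only.
incident = {"s1": [60, 58, 59, 61], "s2": [59, 40, 20, 15], "s3": [30, 45, 28, 50]}

edges_before = dynamic_edges(normal)
edges_after = dynamic_edges(incident)
```

Recomputing the edge set every interval lets the link between s1 and s2 disappear once their behavior diverges, even though the physical distance between them has not changed.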
Model Accuracy
The accuracy improvements with these models are impressive. For example, in April 2026, a team supported by Saudi Arabia's Research, Development, and Innovation Authority (RDIA) tested the STF-GGRU model on California's PeMSD4 dataset, which includes data from 307 sensors. By incorporating an Integrated Spatiotemporal Feature Alignment (ISTFA) module, the model achieved an RMSE of 27.18. It also updated sensor relationships every 5 minutes, allowing it to differentiate between sensors that are geographically close but behave differently in terms of traffic patterns - something classical models can't manage.
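The internals of STF-GGRU are not described here, but the graph-convolution step that GCN-based models build on can be sketched with the textbook propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} X W). The three-sensor graph, the speeds, and the weight matrix below are illustrative assumptions.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def graph_conv(adj_norm, features, weights):
    """One graph-convolution layer on plain nested lists: ReLU(A_norm X W)."""
    n, f_in, f_out = len(features), len(weights), len(weights[0])
    # Step 1: each node aggregates its own and its neighbors' features.
    aggregated = [[sum(adj_norm[i][k] * features[k][f] for k in range(n))
                   for f in range(f_in)] for i in range(n)]
    # Step 2: apply the weight matrix, then ReLU.
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(f_in)))
             for o in range(f_out)] for i in range(n)]


# Hypothetical corridor: sensor 0 - sensor 1 - sensor 2, with congestion at sensor 1.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
speeds = [[60.0], [20.0], [60.0]]  # one feature (speed) per sensor
identity_weight = [[1.0]]          # stands in for a learned weight matrix

output = graph_conv(normalize_adjacency(adjacency), speeds, identity_weight)
```

In this toy case the congested middle sensor lowers the value computed for both of its neighbors, which is how information about a slowdown spreads along the graph from layer to layer.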
"Graph Convolutional Networks (GCNs) overcome this limitation by effectively modeling traffic data as graphs, capturing complex spatial correlations across irregular and dynamic urban networks." - Ata KI et al., PLOS One
Similarly, in June 2026, the WaveGFormer framework used the Discrete Wavelet Transform (DWT) to separate stable traffic trends from sudden disruptions, like accidents. This approach led to a 4.74% reduction in MAE on the PeMS08 benchmark, providing more reliable forecasts than models relying solely on attention mechanisms.
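To show how a wavelet transform separates trend from disruption, here is a minimal one-level Haar transform. The choice of the Haar wavelet and a single decomposition level is an assumption made for brevity; the source does not say which wavelet or how many levels WaveGFormer uses, and the flow values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail). The approximation carries the smooth
    trend of each pair of samples, the detail carries their difference.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical flow readings with one sudden drop (for example, a lane blocked).
flow = [100, 100, 100, 100, 100, 40, 100, 100]
approx, detail = haar_dwt(flow)

# The disruption stands out as the largest detail coefficient.
disruption_pair = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

The detail coefficients are zero wherever the flow is steady and large only for the pair of samples containing the drop, so a model can treat the two components differently.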
Scalability
Scaling these models to large urban networks presents unique challenges. Transformer-based models, for example, face quadratic growth in computational demands as sequence lengths increase, which can be a bottleneck for large-scale applications. The HG-GFNO framework addresses this by using a Graph Fourier Neural Operator, which reduces complexity to linear levels. This model outperformed leading benchmarks by up to 10.9% in RMSE and 11.9% in MAE across various forecasting horizons, all while using fewer parameters. Its "Sequence-as-Token" approach makes it practical for real-time deployment in large cities.
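The difference between quadratic and linear scaling is easy to see with a little arithmetic. The sketch below counts pairwise attention scores against a linear operation count as a network grows; treating each sensor as one token is a simplifying assumption, and 307 is simply the PeMSD4 sensor count mentioned earlier.

```python
def attention_pairs(num_tokens):
    """Pairwise scores computed by full self-attention: N * N."""
    return num_tokens * num_tokens


def linear_ops(num_tokens, cost_per_token=1):
    """Work done by a method whose cost grows in proportion to N."""
    return num_tokens * cost_per_token


# Double the network size twice, starting from 307 sensors.
growth = [(n, attention_pairs(n), linear_ops(n)) for n in (307, 614, 1228)]

for tokens, quadratic, linear in growth:
    print(f"{tokens:>5} tokens: {quadratic:>9} attention pairs vs {linear:>5} linear ops")
```

Each doubling of the network quadruples the attention workload but only doubles the linear one, which is why the gap matters most for city-scale deployments.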
Operational Constraints
While deep learning models excel in accuracy, they are computationally intensive, although newer linear-complexity architectures are narrowing this gap. Another challenge is feature blending (often called over-smoothing), where stacking deep graph convolutions blurs distinctions between neighboring nodes and makes localized events harder to detect. To address this, some models now include Spatial-Aware Gated Pattern Enhancers (SAGPE), which preserve unique node-level features.
Another issue is model drift. Over time, models trained on historical data may lose alignment with current traffic patterns. Continuous real-time calibration pipelines are essential to keep these models accurate and effective. These challenges further highlight the need for adaptive methods over static, classical techniques.
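A calibration pipeline needs some signal for when a model has drifted. One simple option, sketched below, is to compare the MAE over a recent window against the MAE measured at training time. The `DriftMonitor` class, its window size, and its tolerance are hypothetical choices, not part of any system described here.

```python
from collections import deque


class DriftMonitor:
    """Suggests recalibration when recent MAE exceeds the baseline by a set ratio."""

    def __init__(self, baseline_mae, window=12, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # keeps only the latest `window` errors

    def update(self, actual, predicted):
        """Record one observation; return True if recalibration is suggested."""
        self.errors.append(abs(actual - predicted))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        recent_mae = sum(self.errors) / len(self.errors)
        return recent_mae > self.tolerance * self.baseline_mae


# Hypothetical stream of (actual, predicted) flows where forecasts get worse over time.
monitor = DriftMonitor(baseline_mae=5.0, window=3, tolerance=1.5)
stream = [(100, 98), (100, 96), (100, 95), (100, 85), (100, 80)]
flags = [monitor.update(actual, predicted) for actual, predicted in stream]
```

A windowed average avoids reacting to a single bad forecast, at the cost of a short delay before drift is flagged.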
3. NanoGPT as an AI Model Access Platform

NanoGPT works alongside specialized forecasting models by tackling tasks such as data enrichment, text analysis, and embedding generation. It operates as a pay-as-you-go platform, offering flexible, on-demand access to a variety of AI models like ChatGPT, Gemini, and DeepSeek. There are no subscription fees or upfront commitments, making it a practical option for teams needing occasional or scalable AI support. Instead of directly forecasting traffic flow, NanoGPT supports the process by managing and enriching the geospatial data that forecasting systems consume.
Data Requirements
NanoGPT’s unified Data API simplifies the process of accessing contextual geospatial data, making it easier to handle large volumes of information. This is particularly useful for teams working with multilingual datasets or complex geospatial pipelines. By leveraging embedding models from providers such as OpenAI, BAAI, and Jina, NanoGPT can transform raw data - like location descriptions, sensor logs, or incident reports - into vector formats that are ready for further analysis.
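Once reports have been turned into vectors, a typical next step is to compare them. The sketch below matches a new incident report to the most similar stored one using cosine similarity. It does not call any NanoGPT endpoint: the three-dimensional vectors and report ids are hypothetical stand-ins for what an embedding model would return, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query_vec, stored_vecs):
    """Return the id of the stored vector closest to the query."""
    return max(stored_vecs, key=lambda rid: cosine_similarity(query_vec, stored_vecs[rid]))


# Hypothetical embeddings of two archived incident reports.
reports = {
    "crash_report": [0.9, 0.1, 0.0],
    "roadwork_report": [0.1, 0.9, 0.2],
}
# Hypothetical embedding of a newly received report.
new_report = [0.8, 0.2, 0.1]

match = most_similar(new_report, reports)
```

The matched report id could then be attached to the affected sensors as an extra input feature for a forecasting model.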
Scalability
The platform’s pay-as-you-go model ensures scalability while keeping costs in check. Features like real-time spend tracking and affordable text embeddings make it cost-effective to process large amounts of traffic data. This affordability allows teams to enrich extensive geospatial datasets, processing thousands of traffic reports without incurring excessive expenses.
Operational Constraints
NanoGPT isn’t designed for real-time, sensor-level traffic forecasting. Instead, it excels at tasks like data ingestion, summarization, and embedding generation. For real-time sensor-level predictions, graph-based models remain the better choice. NanoGPT serves as a complementary tool, providing flexible support to enhance broader traffic intelligence workflows without attempting to replace specialized spatiotemporal systems.
Pros and Cons of Each Approach
AI Models for Traffic Prediction: Classical ML vs Deep Learning vs NanoGPT
Every method discussed here offers distinct advantages and challenges when it comes to traffic prediction. By understanding where each approach excels and where it falls short, teams can make informed decisions about which methods best fit their needs. Below, we break down the key points for each.
Classical ML models like ARIMA and SVR are known for their simplicity and ease of use. They're straightforward to implement, require minimal computational power, and are easy to interpret. This makes them a great choice for smaller-scale projects or scenarios where explainability is crucial. However, these models have some notable limitations. They rely on assumptions of linearity and stationarity - conditions that rarely hold in the unpredictable world of traffic. Additionally, they fail to account for spatial relationships between nodes, which are critical for understanding congestion patterns.
Deep learning and graph-based GeoAI models tackle these issues by capturing complex, nonlinear relationships and spatial dependencies. This leads to significantly better accuracy compared to classical methods. However, these models come with their own set of challenges. They are computationally demanding, prone to overfitting, and, in the case of standard Transformers, suffer from quadratic computational complexity ($O(N^2)$), which can hinder scalability. Newer architectures, like Mamba-based models, aim to solve this with linear complexity ($O(N)$), but these approaches are still relatively new.
Deep learning models shine in sensor-level predictions, but platforms like NanoGPT add another layer of value by enhancing data processing and enrichment. The LLMs available through NanoGPT can generate embeddings, reason over unstructured data, and enrich the inputs to forecasting models. LLM-based approaches of this kind have been reported to reduce Mean Absolute Error (MAE) by up to 36.06% in few-shot scenarios. That said, they have limits of their own. The main one is the "modality gap": traffic data is continuous and spatially dependent, while LLMs operate on discrete text tokens. Despite this, integrating NanoGPT with specialized forecasting models can create a more comprehensive traffic prediction system.
Here's a quick summary of the strengths and limitations of each approach:
| Approach | Key Strengths | Limitations |
|---|---|---|
| Classical ML | Easy to implement and interpret; low computational requirements | Assumes linearity; ignores spatial relationships; requires manual feature engineering |
| Deep Learning (GCN/RNN) | Captures nonlinear and spatial dependencies | Resource-heavy; risk of overfitting; often relies on static graphs |
| Graph-based GeoAI (Transformers/Mamba) | Models long-range dependencies; achieves high accuracy | High complexity; standard Transformers scale poorly ($O(N^2)$) |
| NanoGPT (LLM Platform) | Strong reasoning capabilities; effective in few-shot scenarios | Struggles with continuous spatial data; lacks native spatial topology modeling |
Ultimately, the ideal solution depends on factors like data availability, infrastructure, and whether the focus is on real-time sensor predictions or broader data enrichment. In practice, the most effective systems often combine multiple approaches rather than relying on a single method.
Conclusion
Choosing the right AI model for traffic prediction hinges on the specific problem at hand. While no single solution fits every scenario, research has highlighted some distinct trends.
For short-term freeway predictions, hybrid models like STF-GGRU, which combine GCN and GRU architectures, excel. These models are particularly effective in edge computing environments, adapting swiftly to unexpected events like accidents or lane closures. They have demonstrated strong performance, achieving RMSE scores of 27.18 on PeMSD4 and 11.1 on PeMSD8.
When it comes to urban signal optimization, dynamic hypergraph-based models, such as HSTGHN, stand out. These models are adept at capturing the complex, high-order interactions across multiple intersections, making them ideal for managing traffic in densely populated city grids.
For long-term planning, spectral-temporal models like HG-GFNO offer a compelling solution. These models scale linearly in computational cost and have shown significant improvements, with up to 11.9% better MAE and 10.9% better RMSE compared to previous state-of-the-art models in long-horizon forecasting.
"Reliable long-horizon traffic forecasting is essential for proactive traffic control, congestion mitigation, energy-efficient urban planning, and enhancing the resilience of transportation networks." - HG-GFNO Research Team
These examples showcase how AI can be tailored to address specific challenges in traffic forecasting.
Additionally, for U.S. transportation agencies working in emerging corridors with limited historical data, LLM-based frameworks offer a practical solution for few-shot forecasting. These models complement spatiotemporal approaches, broadening the toolkit for effective traffic management.
FAQs
What traffic data do I need to start forecasting?
To predict traffic flow effectively, start by gathering historical time-series data from road sensors. This includes information on traffic flow, occupancy, and vehicle speed. Incorporating network topology - which details the road layout and connections - helps capture spatial relationships between different areas. For more accurate predictions, advanced models can integrate external factors like weather conditions, roadwork, or accidents. These additional inputs help the models adapt to sudden changes in traffic patterns.
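Whatever model you choose, the historical series usually has to be cut into input and target windows before training. The sketch below shows one way to do that; the `make_windows` helper and the choice of 12 steps of history and 12 steps of horizon (one hour each at 5-minute resolution) are assumptions for illustration.

```python
def make_windows(series, history=12, horizon=12):
    """Split a series into (input, target) pairs for supervised training.

    Each input holds `history` consecutive readings and each target holds
    the `horizon` readings that immediately follow.
    """
    samples = []
    for start in range(len(series) - history - horizon + 1):
        inputs = series[start:start + history]
        targets = series[start + history:start + history + horizon]
        samples.append((inputs, targets))
    return samples


# Hypothetical series of 30 readings; integers make the window boundaries easy to see.
readings = list(range(30))
samples = make_windows(readings, history=12, horizon=12)
first_inputs, first_targets = samples[0]
```

A series shorter than `history + horizon` yields no samples, which is a quick check that a sensor has enough data to be useful.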
How do graph models handle crashes and road closures?
Graph models that rely on static graph structures struggle with crashes and road closures, because a fixed graph can't adjust to sudden changes in the road network. Newer models address this with dynamic graph construction and attention mechanisms that update connectivity in real time. By treating road closures as shifting dependencies between sensors, they handle unexpected events better and produce more accurate traffic predictions.
Where does NanoGPT fit in a traffic prediction pipeline?
NanoGPT sits alongside the forecasting model rather than replacing it. It provides access to LLMs and embedding models for data enrichment, summarization, and embedding generation, while specialized approaches such as graph neural networks handle sensor-level prediction. Using Large Language Models (LLMs), such as those available through NanoGPT, for forecasting itself is gaining traction in research, particularly for intricate decision-making tasks and for scenarios with little or no prior training data.