Feature Selection and Model Evaluation Using Machine Learning in Traffic Flow Prediction

Authors

  • Shengzhe Huang

DOI:

https://doi.org/10.54097/6d3ry666

Keywords:

Machine Learning, Traffic Flow Prediction, Pearson coefficient, XGBoost model, Ablation experiments.

Abstract

Urban traffic congestion is a growing challenge, and accurate prediction of traffic flow at intersections is essential for optimizing signal control and reducing it. This study uses machine learning to build a prediction model from historical traffic flow data. Pearson correlation coefficients between the numerical features of the dataset are computed and visualized as a heatmap. An Extreme Gradient Boosting (XGBoost) classifier is trained and used to evaluate the importance of these features. Ablation experiments over various numerical features are conducted across 14 models, spanning Boosting, Bagging, Linear, and Traditional Lightweight models. The results show that only three key features, "BusCount," "TruckCount," and "Total," must be retained in the majority of models to maintain stable classification accuracy. Removing these features causes a significant decline in prediction accuracy in every model except Adaptive Boosting (AdaBoost). These features therefore play a critical role in effective traffic flow prediction and model stability.
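
The two analysis steps named in the abstract, pairwise Pearson correlation and feature ablation, can be illustrated with a minimal sketch. It is not the study's pipeline. The data are synthetic, the "CarCount" column and the labeling rule are hypothetical, and a simple nearest-centroid classifier stands in for XGBoost and the other 14 models so that the example needs only the Python standard library.

```python
import math
import random

# "BusCount", "TruckCount" and "Total" are named in the abstract;
# "CarCount" is a hypothetical extra column added for illustration.
FEATURES = ["CarCount", "BusCount", "TruckCount", "Total"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(rows, features):
    """Pairwise Pearson coefficients, keyed by (feature_a, feature_b)."""
    cols = {f: [x[f] for x, _ in rows] for f in features}
    return {(a, b): pearson(cols[a], cols[b]) for a in features for b in features}


def make_rows(n, seed=0):
    """Synthetic stand-in data; NOT the dataset used in the paper."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        car, bus, truck = rng.randint(0, 150), rng.randint(0, 40), rng.randint(0, 40)
        total = car + bus + truck
        label = int(total > 115)  # hypothetical "heavy traffic" rule
        rows.append(({"CarCount": car, "BusCount": bus,
                      "TruckCount": truck, "Total": total}, label))
    return rows


def nearest_centroid_accuracy(train, test, features):
    """Fit a nearest-centroid classifier on `features`; return test accuracy."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum(
            (x[f] - m) ** 2 for f, m in zip(features, centroids[c])))

    return sum(predict(x) == y for x, y in test) / len(test)


def ablation(train, test, features):
    """Accuracy with all features, then with each single feature removed."""
    results = {"(all)": nearest_centroid_accuracy(train, test, features)}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        results["-" + dropped] = nearest_centroid_accuracy(train, test, kept)
    return results


if __name__ == "__main__":
    rows = make_rows(400)
    train, test = rows[:300], rows[300:]
    for (a, b), r in correlation_matrix(rows, FEATURES).items():
        if a < b:  # print each unordered pair once
            print(f"corr({a}, {b}) = {r:+.3f}")
    for name, acc in ablation(train, test, FEATURES).items():
        print(f"{name:>12}: accuracy = {acc:.3f}")
```

Comparing each ablated accuracy with the "(all)" baseline shows how much a model depends on the removed feature. The numbers this sketch prints describe only the synthetic data and say nothing about the results reported in the paper.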

Published

22-07-2025

How to Cite

Huang, S. (2025). Feature Selection and Model Evaluation Using Machine Learning in Traffic Flow Prediction. Highlights in Science, Engineering and Technology, 148, 73-80. https://doi.org/10.54097/6d3ry666