Stacking Integration Algorithm Based on Regression Models and Informer for Olympic Medal Standings Forecasting
DOI: https://doi.org/10.54097/pq214k97
Keywords: Random Forest, XGBoost, Informer, Mixed-effects model.
Abstract
As the world's most influential sporting event, the Olympic Games serve as a barometer of a country's sporting strength, resource allocation, development strategy, and competitive level. With the 2024 Paris Olympics concluded and the 2028 Los Angeles Olympics approaching, scientific prediction of medal counts has attracted considerable interest. However, the emergence of first-time medal-winning countries, the host-country effect, and the addition of new events all complicate prediction. This study proposes a comprehensive medal prediction framework that integrates multiple data science methods. It introduces a novel country classification method based on K-means++ clustering that divides participating countries into sports powerhouses and emerging countries, improving the model's adaptability and prediction accuracy. It also adopts a hybrid modelling approach that combines several regression models (e.g., Random Forest, XGBoost) with a time series model (Informer) through a stacking ensemble to improve robustness and generalisation. To quantify prediction uncertainty, the framework computes 95% confidence intervals and includes a prediction bias analysis to validate model stability. The resulting framework offers a robust tool for Olympic medal forecasting and useful insights for policy makers and sports analysts.
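The country classification step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features are clustered, so the single feature used here (average medals per Games), the synthetic values, and the names `kmeans_pp_1d`, `avg_medals`, `powerhouses`, and `emerging` are all assumptions made for illustration.

```python
import random

def kmeans_pp_1d(values, k=2, iters=50, seed=0):
    """Cluster 1-D values: k-means++ seeding, then Lloyd iterations."""
    rng = random.Random(seed)
    # k-means++ seeding: first centre is uniform at random; each later centre
    # is drawn with probability proportional to its squared distance from
    # the nearest centre chosen so far.
    centres = [rng.choice(values)]
    while len(centres) < k:
        d2 = [min((v - c) ** 2 for c in centres) for v in values]
        centres.append(rng.choices(values, weights=d2, k=1)[0])

    def assign(cs):
        # Index of the nearest centre for every value.
        return [min(range(k), key=lambda j: (v - cs[j]) ** 2) for v in values]

    for _ in range(iters):
        labels = assign(centres)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:  # converged
            break
        centres = updated
    return assign(centres), centres

# Synthetic, purely illustrative averages of medals per Games (not real data).
avg_medals = {"A": 95.0, "B": 88.0, "C": 60.0, "D": 4.0, "E": 2.0, "F": 1.0, "G": 0.5}

names = list(avg_medals)
labels, centres = kmeans_pp_1d([avg_medals[n] for n in names], k=2)

# The cluster with the larger centre is treated as the "powerhouse" group.
strong = max(range(2), key=lambda j: centres[j])
powerhouses = [n for n, lab in zip(names, labels) if lab == strong]
emerging = [n for n, lab in zip(names, labels) if lab != strong]
print(powerhouses, emerging)
```

In a fuller pipeline, separate forecasting models could then be fitted to each group. A real analysis would likely cluster on several standardised features rather than one.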
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.