Football Player Value Prediction: Comparing Machine Learning Models with Cross-Validation
DOI: https://doi.org/10.54097/5bsf6020
Keywords: Current value; cross-validation; gradient boosting; random forest; ridge regression.
Abstract
This study evaluates machine learning models for predicting the current market value of football players, using a dataset from Kaggle. Three models (random forest, gradient boosting, and ridge regression) are compared on three metrics: R-squared (R2), root mean square error (RMSE), and mean absolute percentage error (MAPE). Cross-validation is applied to make the model evaluations more robust. Of the three, gradient boosting is the most suitable, since it yields the lowest RMSE and the highest R2, indicating high accuracy. A young, robust, and healthy player who has had a high market value in the past or present is more likely to have a high current value. The research aims to give team managers a relatively accurate model for predicting players' market values. It can also help players understand which factors most affect current value, encouraging them to improve in those areas.
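The evaluation procedure described in the abstract (k-fold cross-validation scored with RMSE, R2, and MAPE) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, fits a one-feature ridge regression as a stand-in for the three models, and runs on synthetic data. The function names, the fold count, the penalty `alpha`, and the toy feature are all assumptions made for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; targets must be nonzero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_ridge_1d(xs, ys, alpha):
    """One-feature ridge regression with an unpenalized intercept."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, y_bar - slope * x_bar


def k_fold_scores(xs, ys, k=5, alpha=1.0, seed=0):
    """Average RMSE, R2, and MAPE over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_ridge_1d(
            [xs[i] for i in train], [ys[i] for i in train], alpha
        )
        y_true = [ys[i] for i in held_out]
        y_pred = [slope * xs[i] + intercept for i in held_out]
        scores.append(
            {"rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred), "mape": mape(y_true, y_pred)}
        )
    return {name: sum(s[name] for s in scores) / k for name in ("rmse", "r2", "mape")}


# Synthetic toy data (an assumption, not the Kaggle dataset):
# x stands in for a past market value, y for the current value.
rng = random.Random(42)
toy_x = [float(i) for i in range(1, 21)]
toy_y = [10.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in toy_x]

print(k_fold_scores(toy_x, toy_y, k=5, alpha=1.0))
```

To compare several models in this way, each one would be fitted and scored on the same folds, and the model with the lowest average RMSE and highest average R2 would be preferred, which is the criterion the abstract applies to gradient boosting.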
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







