Comparative Analysis of CNN-Based Object Detection Models: Faster R-CNN, SSD, and YOLO

Authors

  • Zhenyi Su

DOI:

https://doi.org/10.54097/9r6evm71

Keywords:

Object Detection, Convolutional Neural Networks, Region-based Convolutional Neural Network, Single Shot Multibox Detector, You Only Look Once.

Abstract

Object detection is now widely used in practice. With the rapid development of deep learning, models such as Convolutional Neural Networks (CNNs) have emerged and been adopted in many practical object detection applications, since CNNs outperform traditional models in both speed and accuracy. This paper first introduces three well-known CNN-based object detection models: Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot Multibox Detector (SSD), and You Only Look Once (YOLO). It then presents data on their speed, accuracy, and resource consumption. Based on these data, the advantages and disadvantages of the three models in different scenarios are systematically evaluated. By examining their strengths and limitations, the paper aims to give researchers a clear understanding of how the three models differ and which one is best suited to which situation, and to inform possible subsequent improvements.

Published

11-05-2025

How to Cite

Su, Z. (2025). Comparative Analysis of CNN-Based Object Detection Models: Faster R-CNN, SSD, and YOLO. Highlights in Science, Engineering and Technology, 138, 147-152. https://doi.org/10.54097/9r6evm71