Optimization of Audio Coding Parameters and Adaptive Denoising Using a Convolutionally Enhanced Transformer Framework

Zhangqi Song; Huan Xu; Yujie Chen; Chenlu Zhao

doi:10.54097/8qaccz90

Keywords:

Audio Processing, Storage Optimization, Adaptive Coding, Noise Removal, Time-Frequency Analysis.

Abstract

This study proposes an intelligent audio processing framework that addresses two challenges in digital audio: storage optimization and adaptive denoising. For storage, the framework models the trade-off among sampling rate, bit depth, and compression algorithm, and recommends encoding parameters for speech and for music that balance file size against audio quality. For denoising, an adaptive algorithm based on time-frequency analysis identifies the noise type and applies a matching strategy: Wiener filtering for background noise, median filtering combined with spectral subtraction for burst noise, and band-stop filtering combined with spectral smoothing for narrowband interference. Experiments on public datasets, evaluated with ΔSNR, PESQ, and STOI, show that the method improves both noise suppression and audio fidelity, with SNR gains of up to 5.11 dB. Subjective listening tests confirm improved clarity, and robustness tests show stable performance under moderate noise. Overall, the framework outperforms traditional fixed-parameter methods in both efficiency and quality.
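As a rough illustration of two quantities the abstract refers to, the Python sketch below estimates file size from sampling rate, bit depth, and an assumed compression ratio, and computes ΔSNR as the SNR after denoising minus the SNR before denoising. The function names, the constant compression-ratio model, and the sample values are assumptions made for illustration and are not the authors' implementation; the ΔSNR definition is the common one and may differ in detail from the paper's.

```python
import math


def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s,
                   compression_ratio=1.0):
    """Estimate file size in bytes.

    Uncompressed PCM size is sample_rate * bit_depth * channels * duration
    bits. Dividing by a constant compression ratio is a simplifying
    assumption; real codecs do not compress at a fixed ratio.
    """
    raw_bits = sample_rate_hz * bit_depth * channels * duration_s
    return raw_bits / 8 / compression_ratio


def snr_db(clean, test):
    """SNR of `test` measured against the reference `clean`, in dB.

    Assumes equal-length sequences and a nonzero error between them.
    """
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - t) ** 2 for c, t in zip(clean, test))
    return 10 * math.log10(signal_power / noise_power)


def delta_snr_db(clean, noisy, denoised):
    """SNR improvement: SNR after denoising minus SNR before."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


# Hypothetical settings: 60 s of stereo music vs. 60 s of mono speech.
music_bytes = pcm_size_bytes(44100, 16, 2, 60)
speech_bytes = pcm_size_bytes(16000, 16, 1, 60)

# Toy signals: denoising halves the error amplitude on every sample.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 1.5, -0.5]
denoised = [1.25, -0.75, 1.25, -0.75]
gain_db = delta_snr_db(clean, noisy, denoised)
```

In this toy case, halving the error amplitude quarters the noise power, so `gain_db` is about 6.02 dB. The size function shows why speech can be stored far more cheaply than music: lower sampling rate and a single channel reduce the raw size before any compression is applied.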
