General Audio Signal Processing with Deep Learning

Kele Xu, Jisheng Bai, Boqing Zhu, Qisheng Xu, Yi Su (Editors)

Book | Hardcover

2026
Springer Verlag, Singapore
978-981-95-0835-8 (ISBN)

Dive into the cutting-edge integration of deep learning with audio signal processing in this authoritative guide. Designed for audio engineers, data scientists, and tech enthusiasts, this book demystifies the complex world of deep neural networks, including CNNs and RNNs, and their applications in speech recognition, music transcription, and sound event detection.

Explore the practical side of deep learning with hands-on tutorials using TensorFlow and PyTorch, building your intuition for model architectures and hyperparameter tuning. Gain insights into real-world deployment challenges, from data preprocessing to model evaluation, interpretability, and scalability. Industry case studies and best practices illuminate the path to building efficient and effective deep learning-based audio systems.

This book empowers you with the knowledge to leverage the full potential of deep learning in audio processing, offering a comprehensive resource for tackling sophisticated audio tasks. Whether you're a researcher, engineer, or enthusiast, this guide is your key to mastering the synergy of audio signal processing and deep learning, ensuring you approach audio-related challenges with confidence and proficiency.

Kele Xu (Senior Member, IEEE) is currently an Associate Professor with the School of Computer Science, National University of Defense Technology, Changsha, China. His research interests include audio signal processing, machine learning, and intelligent software systems. He serves as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology and a Guest Editor for Science Partner Journal Cyborg and Bionic Systems. He has co-authored more than 100 publications in peer-reviewed journals and conference proceedings, including ICLR, NeurIPS, CVPR, ICML, TASLP, TAI, TMI, JASA, AAAI, IJCAI, ASE, ACM MM, and ICASSP.

Jisheng Bai is currently a Lecturer with the Center for Image and Information Processing, School of Communications and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China. His research interests focus on deep learning and audio processing, with particular emphasis on bridging fundamental algorithms with practical deployment in speech enhancement, acoustic sensing, and intelligent audio systems.

Boqing Zhu is a Research Fellow at the National University of Defense Technology, China, specializing in computational acoustics, underwater acoustic signal processing, and continual learning methodologies. His research lies at the intersection of traditional signal processing and modern machine learning, with a particular focus on the foundations of deep learning, continual learning paradigms, and their applications to underwater acoustic sensing, audio synthesis, and audio-visual learning.

Qisheng Xu is a Ph.D. candidate at the College of Computer Science and Technology, National University of Defense Technology, China. His research interests include audio signal processing, continual learning algorithms, and intelligent computing.

Yi Su is currently pursuing the Ph.D. degree in Computer Science at the National University of Defense Technology, Changsha, China. Her research interests include pattern recognition, data engineering, and multimodal signal processing, with a particular focus on audio-language modeling and cross-modal understanding.

Mou Wang is currently a Postdoctoral Researcher with the Institute of Acoustics, Chinese Academy of Sciences, Beijing, China. His research interests include machine learning and speech signal processing, with a particular focus on speech enhancement and audio denoising techniques. He has received several distinctions, including the Excellent Paper Award at the International Conference on Ubi-Media Computing and Workshops in 2019, the Best Paper Award at the 19th National Conference on Man-Machine Speech Communication in 2024, and the Outstanding Reviewer recognition from IEEE Transactions on Multimedia in 2022.

Chapter 1: Introduction
Chapter 2: Traditional Representation of Audio Signals
Chapter 3: Machine Learning Methods for Audio Signals
Chapter 4: Semi-Supervised Learning for Audio Signal
Chapter 5: Self-Supervised Learning for Audio Signal
Chapter 6: Active Learning for Audio Signal
Chapter 7: Incremental Learning for Audio Signal Processing
Chapter 8: Few-Shot Learning for Audio Signal Processing
Chapter 9: Data Augmentation for Audio Signal
Chapter 10: Audio Classification
Chapter 11: Sound Source Localization
Chapter 12: Anomalous Sound Detection
Chapter 13: Audio Source Separation
Chapter 14: Audio Generation
Chapter 15: Audio-language Learning
Chapter 16: Audio-visual Signal Analysis
Chapter 17: Audio Super-resolution
Chapter 18: EEG Auditory Decoding
Chapter 19: Audio Denoising
Chapter 20: Underwater Acoustics
Chapter 21: Urban Sound
Chapter 22: Industry Sound
Chapter 23: Medical Sound
Chapter 24: Bioacoustics
Chapter 25: Future Perspective

Publication date	29 November 2025
Additional info	Approx. 650 p.
Place of publication	Singapore
Language	English
Dimensions	155 x 235 mm
Subject area	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
Subject area	Engineering ► Electrical Engineering / Power Engineering
Keywords	Audio AI • Audio Deep Learning • Audio Deep Learning Frameworks • Audio Signal Processing • Audio Signal Transformations • Deep Learning in Audio Analysis • Music Transcription Techniques • Neural Networks for Audio Processing • Speech Recognition Systems • Time-Frequency Analysis in Audio
ISBN-10	981-95-0835-5 / 9819508355
ISBN-13	978-981-95-0835-8 / 9789819508358
Condition	New