Man-Machine Speech Communication

20th National Conference, NCMMSC 2025, Zhenjiang, China, October 16–19, 2025, Proceedings

Jia Jia, Zhiyong Wu, Lijian Gao, Gongping Huang, Ya Li (Editors)

Book | Softcover

539 pages

2026
Springer Verlag, Singapore
978-981-95-5381-5 (ISBN)

This book constitutes the refereed proceedings of the 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025, held in Zhenjiang, China, during October 16–19, 2025.

The 40 papers included in these proceedings were carefully reviewed and selected from 157 submissions. The conference will feature special events such as a Young Scholars Forum, a Student Forum, an Industry Forum, and a Product and Technology Exhibition. Beyond the main program, the conference will also include public outreach activities, grant-writing workshops, and several special sessions.

.- Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.

.- Multilevel and Granular L2 Pronunciation Assessment Using Stress-Based Suprasegmental Features and Proficiency Adaptation.

.- CDMGTU-Net: A Causal Dual-Branch Multi-Channel Speech Enhancement Network with Multi-Scale Gated Feature Fusion.

.- A Two-Stage Band-Split Mamba-2 Network For Music Source Separation.

.- Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text.

.- MambaVoc: State Space Models for High-Fidelity Audio Synthesis.

.- StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.

.- Automatic Speech Evaluation Method Leveraging Deep Feature Fusion.

.- Curriculum Reinforcement Learning for Robust Low-Resource Chinese Dialect Speech Recognition.

.- An Acoustic Study on Intonation Production of English Learners from Guanzhong Region in Shaanxi Province.

.- Improving Anomalous Sound Detection with Top-M Pseudo-Labeling.

.- Dementia Detection via Speech Temporal Sequences with Shifted Windows.

.- CL-EDiff: Cross-lingual emotional TTS system based on diffusion model.

.- When AI Speaks, Do We Follow? Phonetic Entrainment in Human-AI Dialogues.

.- Aishell1Mix: Towards Robust Mandarin Speech Separation with Scalable Audio Language Models.

.- Study of the Low-Rank Minimum Variance Distortionless Response Beamformer for Speech Enhancement.

.- Exploring Gender Bias in Alzheimer’s Disease Detection: Insights from Mandarin and Greek Speech Perception.

.- UniDaugMamba: A Unimodal Data-augmented Mamba for Speech-Based Depression Detection.

.- Serial-Parallel Dual-Path Architecture for Speaking Style Recognition.

.- Knowledge Augmented Finetuning Matters in Both RAG and Agent Based Dialog Systems.

.- NC-KWS: Few-Shot Class-Incremental Keyword Spotting Based on Neural Collapse.

.- ZSEmo-MTVITS: A Zero-Shot Cross-Lingual Emotional Speech Synthesis Model for Mandarin and Tibetan Based on VITS.

.- CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025.

.- Accent Familiarity and Phonological Weighting in Spoken-Word Recognition.

.- Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model.

.- A Multi-Subspace Attention Approach for Robust Speech Spoofing Detection in Silence-Trimming Conditions.

.- Temporally Consistent Teeth Restoration for Talking Heads.

.- EEG as a Biometric Identifier: The Impact of Electrode Arrangement, Brain Areas, and Frequency Bands.

.- The Phonetic Modification and Facial Movements Made During Mandarin Vowel and Tone Production in Noise.

.- Exploring Audio-Visual Fusion for Sound Event Localization and Detection with BEATs.

.- On Multi-Input Multi-Frame MVDR Filter for Speech Enhancement with Heterophasic Presentation.

.- Adaptive Multi-source Fusion for Uyghur ASR Error Correction.

.- The determinants of Chinese lexical stress.

.- Introducing Discriminative Speaker Embeddings for Voice Timbre Attribute Detection.

.- TSELM: Target Speaker Extraction using Discrete Tokens and Language Models.

.- A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features.

.- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features.

.- A Hierarchical Fusion Modeling from Perception to Prediction with Personalized Features for Multimodal Depression Detection.

.- Revisiting Target Signal Definitions in Distortionless Superdirective Beamforming for Reverberant Speech Enhancement.

.- HiStyle: Hierarchical Style Embedding Prediction for Text-Prompt-Guided Controllable Speech Synthesis.

Publication date (per publisher)	30 January 2026
Series	Communications in Computer and Information Science
Additional information	138 illustrations, color; 17 illustrations, black and white
Place of publication	Singapore
Language	English
Dimensions	155 x 235 mm
Subject area	Computer Science ► Theory / Academic Study ► Artificial Intelligence / Robotics
Subject area	Engineering ► Electrical Engineering / Energy Technology
Keywords	Audio signal analysis • Phonetics, phonology and prosody • Sound event detection • Speaker recognition • Speech coding • Speech emotion recognition • Speech enhancement • Speech large language model • Speech perception • Speech processing • Speech recognition • Speech science • Speech security • Speech synthesis and conversion • Spoken dialog system
ISBN-10	981-95-5381-4 / 9819553814
ISBN-13	978-981-95-5381-5 / 9789819553815