
Speech in Mobile and Pervasive Environments

Amit A. Nanavati (author)

Software / Digital Media
312 pages
2012
Wiley-Blackwell (publisher)
978-1-119-96171-0 (ISBN)
CHF 139.95 incl. VAT
This book brings together the latest research on speech processing for resource-constrained, wireless, and mobile devices in one comprehensive volume. It covers speech recognition in noisy environments, specialized hardware for speech recognition and synthesis, the use of context to enhance recognition, the emerging standards required for interoperability, speech applications on mobile devices, distributed processing between the client and the server, and the relevance of speech in mobile and pervasive environments for developing regions, an area of explosive growth over the last two to three years.

Mr Nitendra Rajput, IBM Research, New Delhi, India. Nitendra Rajput has been a Research Staff Member at IBM India Research Lab (IRL) in New Delhi since 1998. Before that, he completed his Master's degree in Communications at the Indian Institute of Technology, Bombay. At IRL, he has worked in the field of conversational systems for the last nine years, including audio-visual speech recognition and speech recognition systems for Indian languages. His interests are in statistical signal processing, dialog management, and speech and image processing.

Mr Amit A. Nanavati, IBM Research, New Delhi, India. Amit A. Nanavati is a Research Staff Member in the Telecom Research Innovation Centre at IBM India Research Lab. For the last four years, he has been actively working in the area of mobile and pervasive computing. He was involved with the MDAT (Multi-device Authoring Technology) project, now a product, for adapting applications to pervasive devices. His research interests include information retrieval and constructing models for evaluation. Prior to joining IBM, he worked at Netscape for four years.

About the Series Editors
List of Contributors
Foreword
Preface
Acknowledgments
1 Introduction
  1.1 Application design
  1.2 Interaction modality
  1.3 Speech processing
  1.4 Evaluations
2 Mobile Speech Hardware: The Case for Custom Silicon
  2.1 Introduction
  2.2 Mobile hardware: Capabilities and limitations
    2.2.1 Looking inside a mobile device: Smartphone example
    2.2.2 Processing limitations
    2.2.3 Memory limitations
    2.2.4 Power limitations
    2.2.5 Silicon technology and mobile hardware
  2.3 Profiling existing software systems
    2.3.1 Speech recognition overview
    2.3.2 Profiling techniques summary
    2.3.3 Processing time breakdown
    2.3.4 Memory usage
    2.3.5 Power and energy breakdown
    2.3.6 Summary
  2.4 Recognizers for mobile hardware: Conventional approaches
    2.4.1 Reduced-resource embedded recognizers
    2.4.2 Network recognizers
    2.4.3 Distributed recognizers
    2.4.4 An alternative approach: Custom hardware
  2.5 Custom hardware for mobile speech recognition
    2.5.1 Motivation
    2.5.2 Hardware implementation: Feature extraction
    2.5.3 Hardware implementation: Feature scoring
    2.5.4 Hardware implementation: Search
    2.5.5 Hardware implementation: Performance and power evaluation
    2.5.6 Hardware implementation: Summary
  2.6 Conclusion
  Bibliography
3 Embedded Automatic Speech Recognition and Text-to-Speech Synthesis
  3.1 Automatic speech recognition
  3.2 Mathematical formulation
  3.3 Acoustic parameterization
    3.3.1 Landmark-based approach
  3.4 Acoustic modeling
    3.4.1 Unit selection
    3.4.2 Hidden Markov models
  3.5 Language modeling
  3.6 Modifications for embedded speech recognition
    3.6.1 Feature computation
    3.6.2 Likelihood computation
  3.7 Applications
    3.7.1 Car navigation systems
    3.7.2 Smart homes
    3.7.3 Interactive toys
    3.7.4 Smartphones
  3.8 Text-to-speech synthesis
  3.9 Text to speech in a nutshell
  3.10 Front end
  3.11 Back end
    3.11.1 Rule-based synthesis
    3.11.2 Data-driven synthesis
    3.11.3 Statistical parametric speech synthesis
  3.12 Embedded text-to-speech
  3.13 Evaluation
  3.14 Summary
  Bibliography
4 Distributed Speech Recognition
  4.1 Elements of distributed speech processing
  4.2 Front-end processing
    4.2.1 Device requirements
    4.2.2 Transmission issues in DSR
    4.2.3 Back-end processing
  4.3 ETSI standards
    4.3.1 Basic front-end standard ES 201 108
    4.3.2 Noise-robust front-end standard ES 202 050
    4.3.3 Tonal-language recognition standard ES 202 211
  4.4 Transfer protocol
    4.4.1 Signaling
    4.4.2 RTP payload format
  4.5 Energy-aware distributed speech recognition
  4.6 ESR, NSR, DSR
  Bibliography
5 Context in Conversation
  5.1 Context modeling and aggregation
    5.1.1 An example of composer specification
  5.2 Context-based speech applications: Conspeakuous
    5.2.1 Conspeakuous architecture
    5.2.2 B-Conspeakuous
    5.2.3 Learning as a source of context
    5.2.4 Implementation
    5.2.5 A tourist portal application
  5.3 Context-based speech applications: Responsive information architect
  5.4 Conclusion
  Bibliography
6 Software: Infrastructure, Standards, Technologies
  6.1 Introduction
  6.2 Mobile operating systems
  6.3 Voice over internet protocol
    6.3.1 Implications for mobile speech
    6.3.2 Sample speech applications
    6.3.3 Access channels
  6.4 Standards
  6.5 Standards: VXML
  6.6 Standards: VoiceFleXML
    6.6.1 Brief overview of speech-based systems
    6.6.2 System architecture
    6.6.3 System architecture: VoiceFleXML interpreter
    6.6.4 VoiceFleXML: Voice browser
    6.6.5 A prototype implementation
  6.7 SAMVAAD
    6.7.1 Background and problem setting
    6.7.2 Reorganization algorithms
    6.7.3 Minimizing the number of dialogs
    6.7.4 Hybrid call-flows
    6.7.5 Minimally altered call-flows
    6.7.6 Device-independent call-flow characterization
    6.7.7 SAMVAAD: Architecture, implementation and experiments
    6.7.8 Splitting dialog call-flows
  6.8 Conclusion
  6.9 Summary and future work
  Bibliography
7 Architecture of Mobile Speech-Based and Multimodal Dialog Systems
  7.1 Introduction
  7.2 Multimodal architectures
  7.3 Multimodal frameworks
  7.4 Multimodal mobile applications
    7.4.1 Mobile companion
    7.4.2 MUMS
    7.4.3 TravelMan
    7.4.4 Stopman
  7.5 Architectural models
    7.5.1 Client-server systems
    7.5.2 Dialog description systems
    7.5.3 Generic model for distributed mobile multimodal speech systems
  7.6 Distribution in the Stopman system
  7.7 Conclusions
  Bibliography
8 Evaluation of Mobile and Pervasive Speech Applications
  8.1 Introduction
    8.1.1 Spoken interaction
    8.1.2 Mobile-use context
    8.1.3 Speech and mobility
  8.2 Evaluation of mobile speech-based systems
    8.2.1 User interface evaluation methodology
    8.2.2 Technical evaluation of speech-based systems
    8.2.3 Usability evaluations
    8.2.4 Subjective metrics and objective metrics
    8.2.5 Laboratory and field studies
    8.2.6 Simulating mobility in the laboratory
    8.2.7 Studying social context
    8.2.8 Long- and short-term studies
    8.2.9 Validity
  8.3 Case studies
    8.3.1 STOPMAN evaluation
    8.3.2 TravelMan evaluation
    8.3.3 Discussion
  8.4 Theoretical measures for dialog call-flows
    8.4.1 Introduction
    8.4.2 Dialog call-flow characterization
    8.4.3 {m,q,a}-characterization
    8.4.4 {m,q,a}-complexity
    8.4.5 Call-flow analysis using {m,q,a}-complexity
  8.5 Conclusions
  Bibliography
9 Developing Regions
  9.1 Introduction
  9.2 Applications and studies
    9.2.1 VoiKiosk
    9.2.2 HealthLine
    9.2.3 The spoken web
    9.2.4 TapBack
  9.3 Systems
  9.4 Challenges
  Bibliography
Index

Place of publication: Hoboken
Language: English
Dimensions: 156 x 234 mm
Weight: 666 g
Subject area: Technology > Electrical Engineering / Energy Technology
Subject area: Technology > Communications Engineering
ISBN-10: 1-119-96171-8 / 1119961718
ISBN-13: 978-1-119-96171-0 / 9781119961710
Condition: New