Digital Speech Transmission and Enhancement (eBook)
John Wiley & Sons (Publisher)
9781119060987 (ISBN)
Enables readers to understand the latest developments in speech enhancement/transmission due to advances in computational power and device miniaturization
The Second Edition of Digital Speech Transmission and Enhancement has been updated throughout to provide all the necessary details on the latest advances in the theory and practice in speech signal processing and its applications, including many new research results, standards, algorithms, and developments which have recently appeared and are on their way into state-of-the-art applications.
Besides mobile communications, which constituted the main application domain of the first edition, speech enhancement for hearing instruments and man-machine interfaces has gained significantly more prominence in the past decade, and as such receives greater focus in this updated and expanded second edition.
Readers can expect to find information and novel methods on:
- Low-latency spectral analysis-synthesis, single-channel and dual-channel algorithms for noise reduction and dereverberation
- Multi-microphone processing methods, which are now widely used in applications such as mobile phones, hearing aids, and man-computer interfaces
- Algorithms for near-end listening enhancement, which provide a significantly increased speech intelligibility for users at the noisy receiving side of their mobile phone
- Fundamentals of speech signal processing, estimation and machine learning, speech coding, error concealment by soft decoding, and artificial bandwidth extension of speech signals
Digital Speech Transmission and Enhancement is a single-source, comprehensive guide to the fundamental issues, algorithms, standards, and trends in speech signal processing and speech communication technology, and as such is an invaluable resource for engineers, researchers, academics, and graduate students in the areas of communications, electrical engineering, and information technology.
Peter Vary is the former Head of the Institute of Communication Systems at RWTH Aachen University, Germany. Professor Vary is a Fellow of IEEE, EURASIP, and ITG, and has been a Distinguished Lecturer of the IEEE Signal Processing Society.
Rainer Martin is Head of the Institute of Communication Acoustics at Ruhr-Universität Bochum, Germany. Professor Martin is a Fellow of the IEEE.
Both authors have been actively involved in speech processing research and teaching over several decades.
2 Models of Speech Production and Hearing
Digital speech communication systems are largely based on knowledge of speech production, hearing, and perception. In this chapter, we will discuss some fundamental aspects insofar as they are important for optimizing speech‐processing algorithms such as speech coding, speech enhancement, or feature extraction for automatic speech recognition.
In particular, we will study the mechanism of speech production and the typical characteristics of speech signals. The digital speech production model will be derived from acoustical and physical considerations. The resulting all‐pole model of the vocal tract is the key element of most of the current speech‐coding algorithms and standards.
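The all‐pole vocal tract model mentioned above can be illustrated with a minimal sketch: an excitation signal is passed through a recursive filter $H(z) = g/A(z)$. The function name `all_pole_synthesis`, the sign convention $A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$, and the single-resonance coefficients in the example are illustrative assumptions, not the derivation given in the book.

```python
import numpy as np


def all_pole_synthesis(excitation, a, gain=1.0):
    """Filter `excitation` through H(z) = gain / A(z).

    Assumed convention: A(z) = 1 - sum_{i=1}^{p} a[i-1] * z^{-i}, so that
    y[n] = gain * x[n] + sum_i a[i-1] * y[n - i].
    """
    x = np.asarray(excitation, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = gain * x[n]
        # Feedback from past outputs only: this recursion makes the filter all-pole.
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * y[n - i]
        y[n] = acc
    return y


if __name__ == "__main__":
    fs = 8000.0                          # sampling rate in Hz (assumed)
    formant_hz, radius = 500.0, 0.95     # hypothetical single resonance
    theta = 2 * np.pi * formant_hz / fs
    # A conjugate pole pair at radius * exp(+/- j*theta) produces one resonance.
    a = [2 * radius * np.cos(theta), -radius**2]
    impulse = np.zeros(1024)
    impulse[0] = 1.0
    h = all_pole_synthesis(impulse, a)
    spectrum = np.abs(np.fft.rfft(h))
    peak_hz = np.argmax(spectrum) * fs / len(h)
    print(f"resonance peak near {peak_hz:.0f} Hz")
```

A real vocal tract model would use a higher order $p$ (several resonances); the single pole pair only shows how feedback coefficients place a spectral peak.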
Furthermore, we will provide insights into the human auditory system and we will focus on perceptual fundamentals which can be exploited to improve the quality and the effectiveness of speech‐processing algorithms to be discussed in later chapters. With respect to perception, the main aspects to be considered in digital speech transmission are the masking effect and the spectral resolution of the auditory system.
As a detailed discussion of the acoustic theory of speech production, phonetics, psychoacoustics, and perception is beyond the scope of this book, the reader is referred to the literature (e.g., [Fant 1970], [Flanagan 1972], [Rabiner, Schafer 1978], [Picket 1980], and [Zwicker, Fastl 2007]).
2.1 Sound Waves
Sound is a mechanical vibration that propagates through matter in the form of waves. Sound waves may be described in terms of a sound pressure field $p(\mathbf{r},t)$ and a sound velocity vector field $\mathbf{v}(\mathbf{r},t)$, which are both functions of a spatial co‐ordinate vector $\mathbf{r}$ and time $t$. While the sound pressure characterizes the density variations (we do not consider the DC component, also known as atmospheric pressure), the sound velocity describes the velocity with which the particles of the medium carrying the wave are displaced. This particle velocity is different from the speed of the traveling sound wave.
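In the context of our applications, i.e., sound waves in air, the sound pressure and the resulting density variations $\varrho(\mathbf{r},t)$ are related by

$$p(\mathbf{r},t) = c^2\,\varrho(\mathbf{r},t), \tag{2.1}$$

and the relation between $p(\mathbf{r},t)$ and $\mathbf{v}(\mathbf{r},t)$ may also be linearized. Then, in the general case of three spatial dimensions, these two quantities are related via differential operators in an infinitesimally small volume of air particles as

$$\varrho_0 \frac{\partial \mathbf{v}(\mathbf{r},t)}{\partial t} = -\operatorname{grad} p(\mathbf{r},t), \qquad \frac{\partial p(\mathbf{r},t)}{\partial t} = -\varrho_0 c^2 \operatorname{div} \mathbf{v}(\mathbf{r},t), \tag{2.2}$$

where $c$ and $\varrho_0$ are the speed of sound and the density at rest, respectively. These equations, also known as Euler's equation and the continuity equation [Xiang, Blauert 2021], may be combined into the wave equation

$$\Delta p(\mathbf{r},t) - \frac{1}{c^2}\,\frac{\partial^2 p(\mathbf{r},t)}{\partial t^2} = 0, \tag{2.3}$$

where the Laplace operator in Cartesian coordinates $\mathbf{r} = (x, y, z)^{\mathrm{T}}$ is

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \tag{2.4}$$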
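Plane waves, which feature surfaces of constant sound pressure propagating in a given spatial direction, are one solution of the wave equation (2.3). A harmonic plane wave of angular frequency $\omega$ which propagates in the positive $x$ direction or in the negative $x$ direction may be written in complex notation as

$$p_{+}(x,t) = \hat{p}_{+}\, e^{\,j(\omega t - kx)} \quad \text{or} \quad p_{-}(x,t) = \hat{p}_{-}\, e^{\,j(\omega t + kx)}, \tag{2.5}$$

where $k = \omega/c = 2\pi/\lambda$ is the wave number, $\lambda$ is the wavelength, and $\hat{p}_{+}$, $\hat{p}_{-}$ are the (possibly complex‐valued) amplitudes. Using (2.2), the $x$ component of the sound velocity is then given by

$$v_{x,\pm}(x,t) = \pm\frac{p_{\pm}(x,t)}{\varrho_0 c}. \tag{2.6}$$

Thus, for a plane wave, the sound velocity is proportional to the sound pressure, with the factor of proportionality $\pm 1/(\varrho_0 c)$.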
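As a numerical check of the plane‐wave relations, the following sketch evaluates a forward or backward harmonic wave in complex notation, computes the sound velocity as $v_x = \pm p/(\varrho_0 c)$, and verifies Euler's equation $\varrho_0\,\partial v_x/\partial t = -\partial p/\partial x$ with central finite differences. The air parameters $\varrho_0 = 1.2\,\mathrm{kg/m^3}$ and $c = 343\,\mathrm{m/s}$ are assumed values for air at room temperature, and all function names are hypothetical.

```python
import numpy as np

RHO0 = 1.2   # density of air at rest in kg/m^3 (assumed, room temperature)
C = 343.0    # speed of sound in m/s (assumed, room temperature)


def plane_wave(x, t, amp, f, forward=True):
    """Complex sound pressure amp * exp(j(w t -/+ k x)) of a harmonic plane wave."""
    w = 2 * np.pi * f
    k = w / C                                    # wave number k = w / c
    phase = w * t - k * x if forward else w * t + k * x
    return amp * np.exp(1j * phase)


def particle_velocity(p, forward=True):
    """x component of the sound velocity, v = +/- p / (rho0 * c)."""
    return (p if forward else -p) / (RHO0 * C)


def euler_residual(x, t, amp, f, forward=True, dx=1e-6):
    """Relative mismatch of rho0 * dv/dt = -dp/dx, using central differences."""
    dt = dx / C
    dp_dx = (plane_wave(x + dx, t, amp, f, forward)
             - plane_wave(x - dx, t, amp, f, forward)) / (2 * dx)
    v_plus = particle_velocity(plane_wave(x, t + dt, amp, f, forward), forward)
    v_minus = particle_velocity(plane_wave(x, t - dt, amp, f, forward), forward)
    dv_dt = (v_plus - v_minus) / (2 * dt)
    return abs(RHO0 * dv_dt + dp_dx) / abs(dp_dx)


if __name__ == "__main__":
    p = plane_wave(0.1, 2e-3, amp=1.0, f=1000.0)  # 1 Pa amplitude at 1 kHz
    print("Euler residual:", euler_residual(0.1, 2e-3, 1.0, 1000.0))
    print(f"velocity amplitude: {abs(particle_velocity(p)) * 1e3:.2f} mm/s")
    print(f"density amplitude via p = c^2 * rho: {1.0 / C**2:.2e} kg/m^3")
```

For a pressure amplitude of 1 Pa, the velocity amplitude $1/(\varrho_0 c)$ is only about 2.4 mm/s, orders of magnitude below the speed of sound $c$, which illustrates the distinction between particle velocity and propagation speed made at the beginning of this section.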
In our applications, waves which have a constant sound pressure on concentric spheres are also of interest. Indeed, the wave equation (2.3) delivers a solution for a spherical wave which propagates in the radial direction, with $r = \|\mathbf{r}\|$ denoting the distance from the center, as

$$p(r,t) = \frac{1}{r}\, f\!\left(t - \frac{r}{c}\right), \tag{2.7}$$

where $f(\cdot)$ is the propagating waveform. The amplitude of the sound wave diminishes with increasing distance $r$ from the source. We may then use the abstraction of a point source to explain the generation of such spherical waves.
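Because the pressure of an outgoing spherical wave $p(r,t) = f(t - r/c)/r$ scales as $1/r$, each doubling of the distance lowers the sound pressure level by $20\log_{10}2 \approx 6.02$ dB, while the waveform itself is only delayed by $r/c$. A minimal sketch of this spreading law follows; the helper names, the test waveform, and $c = 343$ m/s are illustrative assumptions.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def spherical_wave(f, r, t, c=C):
    """Outgoing spherical wave p(r, t) = f(t - r/c) / r for a waveform f."""
    return f(t - r / c) / r


def level_change_db(r1, r2):
    """Change in sound pressure level when moving from distance r1 to r2 (p ~ 1/r)."""
    return 20.0 * math.log10(r1 / r2)


def tone(tau):
    """Hypothetical 100 Hz waveform used only for illustration."""
    return math.sin(2 * math.pi * 100.0 * tau)


if __name__ == "__main__":
    t0 = 0.01
    p1 = spherical_wave(tone, 1.0, t0)
    # At twice the distance, the same wavefront arrives 1/c seconds later with half the amplitude.
    p2 = spherical_wave(tone, 2.0, t0 + 1.0 / C)
    print(p1, p2, level_change_db(1.0, 2.0))
```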
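An ideal point source may be represented by its source strength $Q(t)$ [Xiang, Blauert 2021]. Furthermore, with (2.2) we have

$$\varrho_0 \frac{\partial v_r(r,t)}{\partial t} = -\frac{\partial p(r,t)}{\partial r} = \frac{1}{r^2}\, f\!\left(t - \frac{r}{c}\right) + \frac{1}{r c}\, f'\!\left(t - \frac{r}{c}\right). \tag{2.8}$$

Then, the radial component $v_r(r,t)$ of the velocity vector may be integrated over a sphere of radius $r$ to yield $Q(t) = 4\pi r^2 v_r(r,t)$. For $r \to 0$, the second term on the right‐hand side of (2.8) is small compared with the first, since it grows only as $1/r$ rather than $1/r^2$. Therefore, for an infinitesimally small sphere, we find with (2.8)

$$\varrho_0 \frac{\mathrm{d}Q(t)}{\mathrm{d}t} = \lim_{r \to 0} 4\pi r^2 \varrho_0 \frac{\partial v_r(r,t)}{\partial t} = 4\pi f(t) \tag{2.9}$$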
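and, with (2.7), for any $r > 0$

$$p(r,t) = \frac{\varrho_0}{4\pi r}\,\dot{Q}\!\left(t - \frac{r}{c}\right), \tag{2.10}$$

where $\dot{Q}$ denotes the time derivative of $Q$. This characterizes, again, a spherical wave. The sound pressure is inversely proportional to the radial distance $r$ from the point source. For a harmonic excitation

$$Q(t) = \hat{Q}\, e^{\,j\omega t}, \tag{2.11}$$

we find the sound pressure

$$p(r,t) = \frac{j\omega\varrho_0 \hat{Q}}{4\pi r}\, e^{\,j(\omega t - kr)}, \tag{2.12}$$

and hence, with (2.8) and an integration with respect to time, the sound velocity

$$v_r(r,t) = \frac{jk\hat{Q}}{4\pi r}\left(1 + \frac{1}{jkr}\right) e^{\,j(\omega t - kr)}. \tag{2.13}$$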
Clearly, (2.12) and (2.13) satisfy (2.8). Because of the second term in the parentheses in (2.13), sound pressure and sound velocity are not in phase. Depending on the distance of the observation point from the point source, the behavior of the wave is distinctly different. When the second term cannot be neglected, i.e., for $kr \lesssim 1$, the observation point is in the nearfield of the source. For $kr \gg 1$, the observation point is in the farfield, where sound pressure and sound velocity are approximately in phase, as for a plane wave. The transition from the nearfield to the farfield depends on the wave number $k$ and, as such, on the wavelength or the frequency of the harmonic excitation.
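The nearfield/farfield distinction can be made concrete with a small sketch that evaluates the harmonic point‐source field $p(r,t) = \frac{j\omega\varrho_0\hat{Q}}{4\pi r}e^{j(\omega t-kr)}$ and $v_r(r,t) = \frac{jk\hat{Q}}{4\pi r}\left(1 + \frac{1}{jkr}\right)e^{j(\omega t-kr)}$, checks the radial Euler equation $\varrho_0\,\partial v_r/\partial t = -\partial p/\partial r$ with finite differences, and reports the angle $\arctan(1/(kr))$ by which the sound velocity lags the sound pressure. The air parameters, the source strength value, and the function names are assumptions for illustration.

```python
import numpy as np

RHO0 = 1.2   # density of air at rest in kg/m^3 (assumed)
C = 343.0    # speed of sound in m/s (assumed)


def point_source_field(r, t, q_hat, f):
    """Complex sound pressure and radial sound velocity of a harmonic point source."""
    w = 2 * np.pi * f
    k = w / C
    carrier = np.exp(1j * (w * t - k * r))
    p = 1j * w * RHO0 * q_hat / (4 * np.pi * r) * carrier
    v = 1j * k * q_hat / (4 * np.pi * r) * (1 + 1 / (1j * k * r)) * carrier
    return p, v


def velocity_phase_lag_deg(r, f, q_hat=1e-4):
    """Angle in degrees by which v_r lags p; equals arctan(1 / (k r))."""
    p, v = point_source_field(r, 0.0, q_hat, f)
    return np.degrees(np.angle(p / v))


def euler_radial_residual(r, t, q_hat, f, dr=1e-6):
    """Relative mismatch of rho0 * dv_r/dt = -dp/dr, via central differences."""
    dt = dr / C
    dp_dr = (point_source_field(r + dr, t, q_hat, f)[0]
             - point_source_field(r - dr, t, q_hat, f)[0]) / (2 * dr)
    dv_dt = (point_source_field(r, t + dt, q_hat, f)[1]
             - point_source_field(r, t - dt, q_hat, f)[1]) / (2 * dt)
    return abs(RHO0 * dv_dt + dp_dr) / abs(dp_dr)


if __name__ == "__main__":
    f = 500.0                            # excitation frequency in Hz (assumed)
    k = 2 * np.pi * f / C
    for r in (0.01, 0.1, 1.0, 10.0):     # observation distances in m
        print(f"r = {r:5.2f} m, kr = {k * r:7.3f}, "
              f"velocity lags pressure by {velocity_phase_lag_deg(r, f):5.1f} deg")
```

The lag approaches 90 degrees deep in the nearfield ($kr \ll 1$) and vanishes in the farfield ($kr \gg 1$); at $kr = 1$ it is exactly 45 degrees.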
2.2 Organs of Speech Production
The production of speech sounds involves the manipulation of an airstream. The acoustic representation of speech is a sound pressure wave originating from the physiological speech production system. A simplified schematic of the human speech organs is given in Figure 2.1. The main components are the lungs, the larynx with the vocal cords, and the vocal tract; their functions are described in the following.
By contraction, the lungs produce an airflow which is modulated by the larynx, processed by the vocal tract, and radiated via the lips and the nostrils. The larynx provides several biological and sound production functions. In the context of speech production, its purpose is to control the stream of air that enters the vocal tract via the vocal cords.
Speech sounds are produced by means of various mechanisms. Voiced sounds are produced when the airflow is interrupted periodically by the movements (vibration) of the vocal cords (see Figure 2.2). This self‐sustained oscillation, i.e., the repeated opening and closing of the vocal cords, can be explained by the so‐called Bernoulli effect known from fluid dynamics: as the airflow velocity increases, the local pressure decreases. At the beginning of each cycle, the area between the vocal cords, which is called the glottis, is almost closed by means of appropriate tension of the vocal cords. Then an increased air pressure builds up below the glottis, forcing the vocal cords to open. As the vocal cords diverge, the velocity of the air flowing through the glottis increases steadily, which causes a drop in the local pressure. Owing to this pressure drop and their own elastic tension, the vocal cords snap back to their initial position, and the next cycle can start if the airflow from the lungs and the tension of the vocal cords are sustained. Due to the abrupt periodic interruptions of the glottal airflow, as schematically illustrated in Figure 2.2, the resulting excitation (pressure wave) of the vocal tract has a fundamental frequency of $f_0 = 1/T_0$ and a large number of harmonics. These are spectrally shaped according to the frequency response of the acoustic vocal tract. The duration $T_0$ of a single cycle is called the pitch period.
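The statement that a periodic excitation with pitch period $T_0$ has spectral lines at the harmonics of $f_0 = 1/T_0$ can be checked with an idealized excitation: a train of unit pulses, one per pitch period. This is a simplification commonly used in classic LPC vocoders, not a model of the actual glottal pulse shape in Figure 2.2; the sampling rate, $f_0$, and the helper names are assumptions.

```python
import numpy as np


def impulse_train(f0, fs, duration):
    """Idealized voiced excitation: one unit pulse per pitch period T0 = 1/f0."""
    n_samples = int(round(duration * fs))
    period = int(round(fs / f0))        # pitch period in samples (rounded to an integer)
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x


def spectral_lines(x, fs, rel_threshold=0.5):
    """Frequencies (Hz) whose DFT magnitude exceeds rel_threshold * maximum."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[spectrum > rel_threshold * spectrum.max()]


if __name__ == "__main__":
    fs, f0 = 8000.0, 100.0              # assumed sampling rate and fundamental frequency
    # 0.5 s holds exactly 50 pitch periods, so the lines fall exactly on DFT bins.
    lines = spectral_lines(impulse_train(f0, fs, 0.5), fs)
    print(lines[:6])                    # DC followed by multiples of f0
```

An ideal pulse train has a flat line spectrum; in real speech, the smooth glottal pulse shape and the vocal tract impose the spectral envelope on these harmonics.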
Figure 2.1 Organs of speech production.
Unvoiced sounds are generated by a constriction at the open glottis or along the vocal tract causing a nonperiodic turbulent airflow.
Plosive sounds (also known as stops) are caused by building up the air pressure behind a complete constriction somewhere in the vocal tract, followed by a sudden opening. The released airflow may create a voiced or an unvoiced sound or even a mixture of both, depending on the actual constellation of the articulators.
The vocal tract can be subdivided into three sections: the pharynx, the oral cavity, and the nasal cavity. As the entrance to the nasal cavity can be closed by the velum, a distinction is often made in the literature between the nasal tract (from velum to nostrils) and the other two sections (from trachea to lips, including the pharynx cavity). In this chapter, we will define the vocal tract as a variable acoustic resonator including the nasal cavity with the velum either open or closed, depending on the specific sound to be produced. From the engineering point of view, the resonance frequencies are varied by changing the size and the shape of the vocal tract using different constellations and movements of the articulators, i.e., tongue, teeth, lips, velum, lower jaw, etc. Thus, humans can produce a variety of different sounds based on different vocal tract constellations and different acoustic...
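How the size of the vocal tract sets its resonance frequencies can be illustrated with the simplest textbook approximation: a lossless tube of uniform cross‐section and length $L$, closed at the glottis and open at the lips, resonates at odd multiples of a quarter wavelength, $F_n = (2n-1)\,c/(4L)$. This uniform‐tube model is an assumption introduced here for illustration (it roughly corresponds to a neutral vowel), and the tract lengths below are typical values, not data from the text.

```python
C = 343.0  # speed of sound in m/s (assumed)


def uniform_tube_resonances(length_m, n_resonances=3, c=C):
    """Resonances F_n = (2n - 1) c / (4 L) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_resonances + 1)]


if __name__ == "__main__":
    for length_cm in (17.0, 14.5):      # assumed vocal tract lengths in cm
        formants = uniform_tube_resonances(length_cm / 100.0)
        print(length_cm, "cm:", [round(f_n) for f_n in formants], "Hz")
```

For $L = 17$ cm this gives approximately 504, 1513, and 2522 Hz; a shorter tube shifts all resonances upward in inverse proportion to its length, which is one way changes in vocal tract size alter the resonance frequencies.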
| Publication date (per publisher) | 23 November 2023 |
|---|---|
| Series | IEEE Press; Wiley - IEEE |
| Language | English |
| Subject area | Engineering ► Electrical Engineering / Power Engineering |
| Keywords | Audio & Speech Processing & Broadcasting • Audio and speech processing and transmission • bandwidth extension • Wireless communication • Electrical & Electronics Engineering • Electrical engineering and electronics • hearing aid algorithms • hearing instruments • hearing technology • mobile phones • Near-End Listening Enhancement • packet-loss concealment • Signal Processing • speech algorithms • speech coding standards • Speech Enhancement • speech signal processing • Speech Technology • Speech processing |
| ISBN-13 | 9781119060987 |
Copy protection: Adobe DRM
Adobe DRM is a copy protection intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly suitable for fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB well suited for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a