
Nonlinear Dimensionality Reduction (eBook)

eBook Download: PDF
2007 | 1st edition
XVII, 309 pages
Springer New York (publisher)
978-0-387-39351-3 (ISBN)


Nonlinear Dimensionality Reduction - John A. Lee, Michel Verleysen
€139.09 incl. VAT
(CHF 135.85)
eBook sales are handled by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

This book describes established and advanced methods for reducing the dimensionality of numerical databases. Each description starts from intuitive ideas, develops the necessary mathematical details, and ends by outlining the algorithmic implementation. The text provides a lucid summary of facts and concepts relating to well-known methods as well as recent developments in nonlinear dimensionality reduction. All methods are described from a unifying point of view, which helps to highlight their respective strengths and shortcomings. The presentation will appeal to statisticians, computer scientists, data analysts, and other practitioners with a basic background in statistics or computational learning.


Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. Since the late nineties, however, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. Advances that account for this rapid growth include the use of graphs to represent the manifold topology and the use of new metrics like the geodesic distance. In addition, new optimization schemes based on kernel techniques and spectral decomposition have led to spectral embedding, which encompasses many of the recently developed methods. This book describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details, and ends by outlining the algorithmic implementation. The methods are compared with each other with the help of different illustrative examples. The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in nonlinear dimensionality reduction. With this goal in mind, all methods are described from a unifying point of view, in order to highlight their respective strengths and shortcomings. The book is primarily intended for statisticians, computer scientists, and data analysts. It is also accessible to other practitioners with a basic background in statistics and/or computational learning, such as psychologists (in psychometrics) and economists.
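As an illustration of this contrast (an editorial sketch, not code from the book), the Swiss roll, one of the benchmark manifolds discussed in the book, can be embedded both with a linear method (PCA) and with a graph-based nonlinear method (Isomap). The snippet below assumes the scikit-learn library; the function and parameter names come from scikit-learn rather than from the authors, and the neighborhood size is an arbitrary illustrative choice.

# Illustrative sketch (assumes scikit-learn); not taken from the book.
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Sample the Swiss roll: a two-dimensional manifold rolled up in 3-D space.
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Linear model: PCA projects onto a plane and cannot unroll the manifold,
# so points that are far apart along the roll may land close together.
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear model: Isomap builds a k-nearest-neighbor graph, approximates
# geodesic distances by shortest paths in that graph, and embeds them with
# classical metric MDS (a spectral decomposition).
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_iso.shape)  # both (1500, 2)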

Notations 13
Acronyms 15
High-Dimensional Data 16
Practical motivations 16
Fields of application 17
The goals to be reached 18
Theoretical motivations 18
How can we visualize high-dimensional spaces? 19
Curse of dimensionality and empty space phenomenon 21
Some directions to be explored 24
Relevance of the variables 25
Dependencies between the variables 25
About topology, spaces, and manifolds 26
Two benchmark manifolds 29
Overview of the next chapters 31
Characteristics of an Analysis Method 32
Purpose 32
Expected functionalities 33
Estimation of the number of latent variables 33
Embedding for dimensionality reduction 34
Embedding for latent variable separation 35
Internal characteristics 37
Underlying model 37
Algorithm 38
Criterion 38
Example: Principal component analysis 39
Data model of PCA 39
Criteria leading to PCA 41
Functionalities of PCA 44
Algorithms 46
Examples and limitations of PCA 48
Toward a categorization of DR methods 52
Hard vs. soft dimensionality reduction 53
Traditional vs. generative model 54
Linear vs. nonlinear model 55
Continuous vs. discrete model 55
Implicit vs. explicit mapping 56
Integrated vs. external estimation of the dimensionality 56
Layered vs. standalone embeddings 57
Single vs. multiple coordinate systems 57
Optional vs. mandatory vector quantization 58
Batch vs. online algorithm 58
Exact vs. approximate optimization 59
The type of criterion to be optimized 59
Estimation of the Intrinsic Dimension 61
Definition of the intrinsic dimension 61
Fractal dimensions 62
The q-dimension 63
Capacity dimension 65
Information dimension 66
Correlation dimension 67
Some inequalities 68
Practical estimation 69
Other dimension estimators 73
Local methods 73
Trial and error 74
Comparisons 76
Data Sets 77
PCA estimator 77
Correlation dimension 77
Local PCA estimator 79
Trial and error 80
Concluding remarks 81
Distance Preservation 82
State-of-the-art 82
Spatial distances 83
Metric space, distances, norms and scalar product 83
Multidimensional scaling 86
Sammon's nonlinear mapping 95
Curvilinear component analysis 101
Graph distances 110
Geodesic distance and graph distance 110
Isomap 115
Geodesic NLM 124
Curvilinear distance analysis 127
Other distances 132
Kernel PCA 133
Semidefinite embedding 138
Topology Preservation 145
State of the art 145
Predefined lattice 147
Self-Organizing Maps 147
Generative Topographic Mapping 155
Data-driven lattice 164
Locally linear embedding 164
Laplacian eigenmaps 171
Isotop 177
Method comparisons 185
Toy examples 185
The Swiss roll 185
Manifolds having essential loops or spheres 205
Cortex unfolding 211
Image processing 215
Artificial faces 218
Real faces 226
Conclusions 236
Summary of the book 236
The problem 236
A basic solution 237
Dimensionality reduction 237
Latent variable separation 239
Intrinsic dimensionality estimation 240
Data flow 241
Variable Selection 241
Calibration 241
Linear dimensionality reduction 242
Nonlinear dimensionality reduction 242
Latent variable separation 243
Further processing 243
Model complexity 243
Taxonomy 244
Distance preservation 247
Topology preservation 248
Spectral methods 249
Nonspectral methods 252
Tentative methodology 253
Perspectives 256
Matrix Calculus 258
Singular value decomposition 258
Eigenvalue decomposition 259
Square root of a square matrix 259
Gaussian Variables 261
One-dimensional Gaussian distribution 261
Multidimensional Gaussian distribution 263
Uncorrelated Gaussian variables 264
Isotropic multivariate Gaussian distribution 264
Linearly mixed Gaussian variables 266
Optimization 268
Newton's method 268
Finding extrema 269
Multivariate version 269
Gradient ascent/descent 270
Stochastic gradient descent 270
Vector quantization 272
Classical techniques 274
Competitive learning 275
Taxonomy 275
Initialization and "dead units" 276
Graph Building 278
Without vector quantization 279
K-rule 279
ε-rule 280
τ-rule 280
With vector quantization 281
Data rule 281
Histogram rule 283
Implementation Issues 286
Dimension estimation 286
Capacity dimension 286
Correlation dimension 286
Computation of the closest point(s) 288
Graph distances 289
References 292
Index 305

Publication date (per publisher) 31.10.2007
Series Information Science and Statistics
Additional info XVII, 309 p.
Place of publication New York
Language English
Subject areas Computer Science › Databases › Data Warehouse / Data Mining
Mathematics / Computer Science › Computer Science › Graphics / Design
Computer Science › Software Development › User Interfaces (HCI)
Computer Science › Theory / Study › Artificial Intelligence / Robotics
Mathematics / Computer Science › Mathematics › General / Reference
Mathematics / Computer Science › Mathematics › Applied Mathematics
Mathematics / Computer Science › Mathematics › Logic / Set Theory
Engineering
Keywords algorithms • Computer • Database • Data Visualization • dimensionality reduction • learning • manifold learning • Multidimensional Scaling • nonlinear projection • Optimization • Principal Component Analysis • spectral embedding • Statistics • Topology
ISBN-10 0-387-39351-X / 038739351X
ISBN-13 978-0-387-39351-3 / 9780387393513
PDF (watermarked)
Size: 22.4 MB

DRM: digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but it is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
