An Introduction to Correspondence Analysis (eBook)
John Wiley & Sons (publisher)
978-1-119-04197-9 (ISBN)
Master the fundamentals of correspondence analysis with this illuminating resource
An Introduction to Correspondence Analysis assists researchers in improving their familiarity with the concepts, terminology, and application of several variants of correspondence analysis. The accomplished academics and authors deliver a comprehensive and insightful treatment of the fundamentals of correspondence analysis, including the statistical and visual aspects of the subject.
Written in three parts, the book begins by offering readers a description of two variants of correspondence analysis that can be applied to two-way contingency tables for nominal categories of variables. Part Two shifts the discussion to categories of ordinal variables and demonstrates how the ordered structure of these variables can be incorporated into a correspondence analysis. Part Three describes the analysis of multiple nominal categorical variables, including both multiple correspondence analysis and multi-way correspondence analysis.
Readers will benefit from explanations of a wide variety of specific topics, for example:
- Simple correspondence analysis, including reducing multidimensional space, measuring symmetric associations with the Pearson Ratio, constructing low-dimensional displays, and detecting statistically significant points
- Non-symmetrical correspondence analysis, including quantifying asymmetric associations
- Simple ordinal correspondence analysis, including how to decompose the Pearson Residual for ordinal variables
- Multiple correspondence analysis, including crisp coding and the indicator matrix, the Burt Matrix, and stacking
- Multi-way correspondence analysis, including symmetric multi-way analysis
Perfect for researchers who seek to improve their understanding of key concepts in the graphical analysis of categorical data, An Introduction to Correspondence Analysis will also assist readers already familiar with correspondence analysis who wish to review the theoretical and foundational underpinnings of crucial concepts.
Eric J. Beh is Professor of Statistics at the School of Mathematical & Physical Sciences at the University of Newcastle, Australia. He has been actively researching in many areas of categorical data analysis including ecological inference, measures of association and categorical models. For the past 25 years his research has focused primarily on the technical, computational and practical development of correspondence analysis. He has over 100 publications and, with Rosaria Lombardo, has authored Correspondence Analysis: Theory, Methods and New Strategies published by Wiley. Together, they have given short courses and workshops around the world on this topic.
Rosaria Lombardo is Associate Professor of Statistics at the Department of Economics of the University of Campania 'L. Vanvitelli', Italy. Her research interests include non-linear multivariate data analysis, quantification theory and, in particular, correspondence analysis and data visualization. Since receiving her PhD in Computational Statistics and Applications at the University of Naples 'Federico II', she has authored over 100 publications including those in Statistical Science, Psychometrika, Computational Statistics & Data Analysis, and the Journal of Statistical Planning and Inference.
1 Introduction
1.1 Data Visualisation
Every statistical technique has a long and interesting history, and the study of how to numerically and graphically analyse the association between categorical variables is no exception. The contributions of some of the most influential statisticians, including Karl Pearson, R.A. Fisher and G.U. Yule, have left an indelible imprint on how categorical data analysis is performed. Excellent descriptions of the historical development of categorical data analysis, in particular the analysis of contingency tables, can be found in, for example, Goodman and Kruskal (1954) and Agresti (2002, Chapter 16). The influence of these early pioneers has led to an almost countless number of statistical techniques that measure, model, visualise and further scrutinise how categorical variables are related to each other. Much of the focus has been on the numerical assessment of the strength of the association between the variables, whether the analysis is concerned with two, three or more variables. Yule and Kendall (1950), Bishop et al. (1975) and Liebetrau (1983) provide excellent discussions of a large number of measures of association for contingency tables. The most influential and widely adopted statistical technique for analysing the association between categorical variables is Pearson's chi-squared statistic (Pearson 1904). The importance and wide applicability of this statistic has been discussed vigorously throughout the literature; see, for example, Lancaster (1969) and Greenwood and Nikulin (1996). The statistic, simply put, is defined as
$$
\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}
$$

where the sum is taken over all cells of the table, "Observed" refers to the observed count in each cell, and "Expected" is its expected value under some model (even if that model simply reflects independence between the variables). While this statistic can detect whether there is a statistically significant association between the variables, it says nothing more about the structure of that association. Various techniques may be considered for examining exactly how the association is structured. These include simple measures such as the product-moment correlation (Pearson 1895), which determines not only the strength of the association but also its direction. Model-based approaches such as log-linear models and logistic models are commonly taught as a means of numerically assessing the nature of the association.
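As a concrete illustration of the definition above, the statistic can be computed directly from a contingency table. The counts below are invented purely for illustration:

```python
import numpy as np

# An invented 2x3 contingency table of observed counts (illustrative only).
observed = np.array([[20, 30, 25],
                     [30, 20, 25]])

n = observed.sum()
# Expected counts under independence: (row total * column total) / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)  # 4.0 for this table; compared against a chi-squared
             # distribution with (rows - 1)*(columns - 1) = 2 degrees
             # of freedom to judge significance
```

For this table every expected count is 25, so the statistic is small and the association would not be judged significant at conventional levels.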
Despite the importance of modelling in statistics and its allied fields, two issues need to be considered. Firstly, elementary statistics courses worldwide teach students the importance of visualising the structure of the data as a means of "seeing" what it looks like before resorting to inferential techniques; this might be through constructing a bar chart, histogram or boxplot of the data. In practice, however, many techniques for analysing categorical data (though certainly not all) ignore this visual component altogether and go straight to modelling the structure. Secondly, modelling techniques rely on assumptions about the data, or about the analyst's perception of its behaviour. Such thoughts are elegantly, and simply, captured in George Box's (1979) famous quote
All models are wrong but some are useful
Earlier, Box (1976) had said
Since all models are wrong, the scientist cannot obtain a “correct” one by excessive elaboration.
Of course, such general phrases have caused a stir amongst the statistical community since a model can never fully capture the “truth” of a phenomenon. We certainly see many advantages in the wide range, and flexibility, of models that are now available but we urge caution when adopting some of them.
An alternative philosophy for assessing the association between the variables of a contingency table is to explore how they are associated by visualising the association. There is now a plethora of strategies available for visualising numerical and categorical data. Some of the more popular approaches include the mosaic plot (Friendly 2000, 2002; Theus 2012), the four-fold display (Fienberg 1975) and the cobweb diagram (Upton 2000). The interested reader may also refer to Gabriel (2002) and Wegman and Solka (2002) for the visualisation of multivariate data. The key features of any graphical summary are that it is simple, easy to interpret, and provides a quick and accurate visual representation of the data. Cook and Weisberg (1999, p. 29) say of any graphical summary
In statistical graphics, information is contained in observable shapes and patterns. The task of the creator of a graph is to construct an informative view of the data that is appropriately grounded in a statistical context. The task of the viewer is to find the patterns, and then to interpret their meaning in the same context. Just as an interpretation of a painting or drawing requires understanding of the artist’s context, interpreting a graph requires an understanding of the statistical context that surrounds the graph. As in art, conclusions about a graph without understanding the context are likely to be wrong, or off the point at best.
A very good example of the interplay between data visualisation and statistical context is Anscombe's quartet (Anscombe 1973). While making his point in terms of simple linear regression, Anscombe provided a compelling argument for the need to visualise data by exhibiting four very different scatterplots that share equal correlations and equal parameter estimates from a simple linear regression model. His argument shows that a statistical technique must be considered in the context of the data being analysed, and that visualising the data helps the analyst to better understand both the statistical and the practical context.
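Anscombe's point is easy to reproduce. Using the published data values, the first two of his four datasets have (to three decimal places) the same correlation, yet look entirely different when plotted:

```python
import numpy as np

# The shared x values and the first two y variables from Anscombe (1973).
x  = np.array([10.0, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5])
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

r1 = np.corrcoef(x, y1)[0, 1]
r2 = np.corrcoef(x, y2)[0, 1]
# Both correlations are approximately 0.816, and both least-squares lines are
# close to y = 3 + 0.5x, yet a scatterplot reveals that dataset I is roughly
# linear with noise while dataset II follows a smooth curve.
```

The numerical summaries are effectively identical; only the pictures reveal that a straight-line model is appropriate for one dataset and clearly wrong for the other.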
1.2 Correspondence Analysis in a “Nutshell”
So where does correspondence analysis fit into this discussion? It is important to recognise that the first task in assessing the association structure between categorical variables is often to model, or to measure, this association, with the structure reflected in the sign and magnitude of a numerical measure. However, as we shall explore in this book, correspondence analysis (in a nutshell) provides a way to visualise the association between two or more categorical variables that form a contingency table. In doing so we gain an understanding of how particular categories from the same variable, or from different variables, "correspond" to each other. From such visual summaries, one can better understand how the variables (and categories) under inspection are associated; the analyst can then refine their research question and postulate other structures that may exist in the data. This is all undertaken without the need to make any assumption about the structure of the data, nor does one need to impose untestable, unnecessary, or unnecessarily complicated assumptions on the data (or on the technique). The analyst, whether of a technical or practical persuasion, need not rely on a suite of numbers to interpret the association between the variables (unless they want to, of course). Correspondence analysis is therefore a technique that allows the data to inform the analyst of what it is trying to say, rather than having a model dictate that structure. The philosophy of letting the "data speak for itself" in correspondence analysis harks back to Jean-Paul Benzécri and his team at the University of Paris, France. Benzécri is thus considered to be the father of correspondence analysis although, in truth, many of its technical (though not visual) features stem from earlier times.
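To preview the mechanics that later chapters develop in detail, here is a minimal numerical sketch of simple correspondence analysis: the matrix of standardised residuals of a two-way table is decomposed by a singular value decomposition, and the singular vectors, scaled by the masses and singular values, give the coordinates used in the visual display. The function name and the example table below are our own illustrative choices, not taken from this book:

```python
import numpy as np

def simple_ca(N):
    """Bare-bones simple correspondence analysis of a two-way table N.

    Returns row principal coordinates, column principal coordinates and the
    principal inertias (squared singular values); the inertias sum to
    Pearson's chi-squared statistic divided by the grand total n.
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                        # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    # Standardised residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return F, G, sv ** 2

# An invented 3x3 contingency table, purely for illustration.
F, G, inertias = simple_ca([[20, 10, 5], [5, 15, 10], [10, 5, 20]])
```

Plotting the rows of `F` and `G` on the same axes gives the familiar correspondence plot; the total inertia (the sum of `inertias`) recovers the chi-squared statistic scaled by n, which is why the display can be read as a picture of the association the statistic only summarises.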
Since the early work of Benzécri and his team, the development of correspondence analysis and its many variants has been prominent in many parts of the European statistical, and allied, communities, especially in France, Italy, the Netherlands and Spain. Beyond these countries, it has developed through the contributions of researchers in Great Britain, Japan and, to a lesser extent, the USA. Unfortunately, in the Australasian region, correspondence analysis has not received the same level of attention as in other parts of the world.
Before we continue with our discussion of correspondence analysis, it is worth highlighting that there are many excellent texts on its historical, computational, practical and theoretical development. The first major work to expose correspondence analysis to the English-speaking statistical world was that of Hill (1974), whose pointedly titled paper "Correspondence analysis: A neglected multivariate method" was published in the Journal of the Royal Statistical Society, Series C (Applied Statistics). The growth of correspondence analysis was quite slow thereafter, but a major advance came 10 years later with the publication of Michael Greenacre's book Theory and Applications of Correspondence Analysis (Academic Press, 1984). It remains the most cited book on the topic; it brought correspondence analysis out of the (mainly) French statistical literature and exposed it to the vast English-speaking research community, and it is thus considered a landmark publication in correspondence analysis. Another excellent book is that of Lebart et al....
| Publication date (per publisher) | 9 April 2021 |
|---|---|
| Series | Wiley Series in Probability and Statistics |
| Language | English |
| Subject area | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| Keywords | Applied Probability & Statistics - Models • Burt matrix • categorical data analysis • multiple correspondence analysis • multivariate analysis • multi-way correspondence analysis • non-symmetrical correspondence analysis • ordinal correspondence analysis • Pearson residual • simple correspondence analysis • statistics • symmetrical correspondence analysis |
| ISBN-10 | 1-119-04197-X / 111904197X |
| ISBN-13 | 978-1-119-04197-9 / 9781119041979 |
Copy protection: Adobe DRM
File format: EPUB (Electronic Publication)