Categorical Data Analysis by Example (eBook)
John Wiley & Sons (publisher)
978-1-119-30793-8 (ISBN)
Introduces the key concepts in the analysis of categorical data with illustrative examples and accompanying R code.
This book is aimed at all those who wish to discover how to analyze categorical data without getting immersed in complicated mathematics and without needing to wade through a large amount of prose. It is aimed at researchers with their own data ready to be analyzed and at students who would like an approachable alternative view of the subject.
Each new topic in categorical data analysis is illustrated with an example that readers can apply to their own sets of data. In many cases, R code is given and excerpts from the resulting output are presented. In the context of log-linear models for cross-tabulations, two specialties of the house have been included: the use of cobweb diagrams to get visual information concerning significant interactions, and a procedure for detecting outlier category combinations. The R code used for these is available and may be freely adapted. In addition, this book:
• Uses an example to illustrate each new topic in categorical data
• Provides a clear explanation of an important subject
• Is understandable to most readers with minimal statistical and mathematical backgrounds
• Contains examples that are accompanied by R code and resulting output
• Includes starred sections that provide more background details for interested readers
Categorical Data Analysis by Example is a reference for students in statistics and researchers in other disciplines, especially the social sciences, who use categorical data. This book is also a reference for practitioners in market research, medicine, and other fields.
GRAHAM J. G. UPTON was formerly Professor of Applied Statistics, Department of Mathematical Sciences, University of Essex. Dr. Upton is author of The Analysis of Cross-tabulated Data (1978) and joint author of Spatial Data Analysis by Example (2 volumes, 1995), both published by Wiley. He is the lead author of The Oxford Dictionary of Statistics (OUP, 2014). His books have been translated into Japanese, Russian, and Welsh.
"Concise introduction to dealing with categorical data (with supporting R code) which will help the general data scientist." (Raspberry Pi, March 2017)
CHAPTER 1
Introduction
This chapter introduces basic statistical ideas and terminology in what the author hopes is a suitably concise fashion. Many readers will be able to turn to Chapter 2 without further ado!
1.1 What are categorical data?
Categorical data are the observed values of variables such as the color of a book or a person’s religion, gender, political preference, or social class. In short, they are the values of any variable other than a continuous variable (such as length, weight, time, or distance).
If the categories have no obvious order (e.g., Red, Yellow, White, Blue) then the variable is described as a nominal variable. If the categories have an obvious order (e.g., Small, Medium, Large) then the variable is described as an ordinal variable. In the latter case the categories may relate to an underlying continuous variable where the precise value is unrecorded, or where it simplifies matters to replace the measurement by the relevant category. For example, while an individual’s age may be known, it may suffice to record it as belonging to one of the categories “Under 18,” “Between 18 and 65,” “Over 65.”
If a variable has just two categories, then it is a binary variable and whether or not the categories are ordered has no effect on the ensuing analysis.
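As a minimal sketch of replacing a measurement by its category, the following Python snippet maps ages to the three ordered age categories mentioned above. The function name `age_category` and the treatment of the boundary ages (18 and 65 are placed in the middle category) are illustrative assumptions, not taken from the book, whose own examples use R.

```python
# Ordered categories, from youngest to oldest (an ordinal variable).
AGE_CATEGORIES = ["Under 18", "Between 18 and 65", "Over 65"]


def age_category(age):
    """Return the ordinal category for an age in years.

    Assumption: ages of exactly 18 and 65 belong to the middle category.
    """
    if age < 18:
        return AGE_CATEGORIES[0]
    if age <= 65:
        return AGE_CATEGORIES[1]
    return AGE_CATEGORIES[2]


ages = [12, 18, 40, 65, 70]
categories = [age_category(a) for a in ages]
print(categories)
```

Because the list `AGE_CATEGORIES` is kept in order, the position of a category in the list preserves the ordering that distinguishes an ordinal variable from a nominal one.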
1.2 A typical data set
The basic data with which we are concerned are counts, also called frequencies. Such data occur naturally when we summarize the answers to questions in a survey such as that in Table 1.1.
Table 1.1 Hypothetical sports preference survey
| Sports preference questionnaire |
| (A) Are you: Male / Female? |
| (B) Are you: Aged 45 or under / Aged over 45? |
| (C) Do you: Prefer golf to tennis / Prefer tennis to golf? |
The people answering this (fictitious) survey are classified by each of the three characteristics: gender, age, and sport preference. Suppose that the 400 replies were as given in Table 1.2, which shows that most males prefer golf to tennis (142 out of 194, or 73%) whereas most females prefer tennis to golf (161 out of 206, or 78%). However, there is a lot of other information available. For example:
Table 1.2 Results of sports preference survey
| Category of response | Frequency |
| Male, aged 45 or under, prefers golf to tennis | 64 |
| Male, aged 45 or under, prefers tennis to golf | 28 |
| Male, aged over 45, prefers golf to tennis | 78 |
| Male, aged over 45, prefers tennis to golf | 24 |
| Female, aged 45 or under, prefers golf to tennis | 22 |
| Female, aged 45 or under, prefers tennis to golf | 86 |
| Female, aged over 45, prefers golf to tennis | 23 |
| Female, aged over 45, prefers tennis to golf | 75 |
- There are more replies from females than males.
- There are more tennis lovers than golf lovers.
- Amongst males, the proportion preferring golf to tennis is greater amongst those aged over 45 (78/102 is 76%) than those aged 45 or under (64/92 is 70%).
This book is concerned with models that can reveal all of these subtleties simultaneously.
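The totals and percentages quoted in this section can be checked directly from the eight counts in Table 1.2. The snippet below is a small Python sketch; the dictionary layout and the helper name `total` are illustrative choices rather than anything prescribed by the book, which presents its own code in R.

```python
# Counts from Table 1.2, keyed by (gender, age group, preferred sport).
counts = {
    ("Male", "45 or under", "Golf"): 64,
    ("Male", "45 or under", "Tennis"): 28,
    ("Male", "Over 45", "Golf"): 78,
    ("Male", "Over 45", "Tennis"): 24,
    ("Female", "45 or under", "Golf"): 22,
    ("Female", "45 or under", "Tennis"): 86,
    ("Female", "Over 45", "Golf"): 23,
    ("Female", "Over 45", "Tennis"): 75,
}


def total(gender=None, age=None, sport=None):
    """Sum the counts over all cells matching the given categories.

    A value of None means "any category" for that variable.
    """
    return sum(
        n
        for (g, a, s), n in counts.items()
        if gender in (None, g) and age in (None, a) and sport in (None, s)
    )


print("Replies:", total())
print("Females vs males:", total(gender="Female"), total(gender="Male"))
print("Tennis vs golf:", total(sport="Tennis"), total(sport="Golf"))

# Proportion of males preferring golf, within each age group.
for age in ("45 or under", "Over 45"):
    golfers = total(gender="Male", age=age, sport="Golf")
    males = total(gender="Male", age=age)
    print(f"Males, {age}: {golfers}/{males} = {golfers / males:.0%}")
```

Each observation in the list above corresponds to one call of `total` with a different combination of fixed categories.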
1.3 Visualization and cross-tabulation
While Table 1.2 certainly summarizes the results, it does so in a clumsily long-winded fashion. We need a more succinct alternative, which is provided in Table 1.3.
Table 1.3 Presentation of survey results by gender
| Male | | | | Female | | | |
| Sport | 45 and under | Over 45 | Total | Sport | 45 and under | Over 45 | Total |
| Tennis | 28 | 24 | 52 | Tennis | 86 | 75 | 161 |
| Golf | 64 | 78 | 142 | Golf | 22 | 23 | 45 |
| Total | 92 | 102 | 194 | Total | 108 | 98 | 206 |
A table of this type is referred to as a contingency table—in this case it is (in effect) a three-dimensional contingency table. The locations in the body of the table are referred to as the cells of the table. Note that the table can be presented in several different ways. One alternative is Table 1.4.
Table 1.4 Presentation of survey results by sport preference
| Prefers tennis | | | | Prefers golf | | | |
| Gender | 45 and under | Over 45 | Total | Gender | 45 and under | Over 45 | Total |
| Female | 86 | 75 | 161 | Female | 22 | 23 | 45 |
| Male | 28 | 24 | 52 | Male | 64 | 78 | 142 |
| Total | 114 | 99 | 213 | Total | 86 | 101 | 187 |
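The rearrangement of the eight counts into a cross-tabulation with marginal totals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_way_table` and the data layout are hypothetical, and the sketch reproduces the two panels of Table 1.3 (swapping the roles of gender and sport would give Table 1.4).

```python
# Cell counts from Table 1.2: (gender, age group, sport) -> frequency.
counts = {
    ("Male", "45 and under", "Golf"): 64,
    ("Male", "45 and under", "Tennis"): 28,
    ("Male", "Over 45", "Golf"): 78,
    ("Male", "Over 45", "Tennis"): 24,
    ("Female", "45 and under", "Golf"): 22,
    ("Female", "45 and under", "Tennis"): 86,
    ("Female", "Over 45", "Golf"): 23,
    ("Female", "Over 45", "Tennis"): 75,
}

AGES = ("45 and under", "Over 45")


def two_way_table(gender):
    """Return the sport-by-age rows (with totals) for one gender."""
    rows = []
    for sport in ("Tennis", "Golf"):
        cells = [counts[(gender, age, sport)] for age in AGES]
        rows.append([sport] + cells + [sum(cells)])
    # Column totals: sum down each of the three numeric columns.
    col_totals = [sum(row[i] for row in rows) for i in (1, 2, 3)]
    rows.append(["Total"] + col_totals)
    return rows


for gender in ("Male", "Female"):
    print(gender)
    for row in two_way_table(gender):
        print(" | ".join(str(cell) for cell in row))
```

Fixing one variable (here gender) and tabulating the other two is what turns the three-dimensional data set into a pair of two-dimensional panels.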
Figure 1.1 Illustration of results of sports preference survey.
In this example, the problem is that the page of a book is two-dimensional, whereas, with its three classifying variables, the data set is essentially three-dimensional, as Figure 1.1 indicates. Each face of the diagram contains information about the 2 × 2 category combinations for two variables for some particular category of the third variable.
With a small table and just three variables, a diagram is feasible, as Figure 1.1 illustrates. In general, however, there will be too many variables and too many categories for this to be a useful approach.
1.4 Samples, populations, and random variation
Suppose we repeat the survey of sport preferences, interviewing a second group of 400 individuals and obtaining the results summarized in Table 1.5.
Table 1.5 The results of a second survey
| Prefers tennis | | | | Prefers golf | | | |
| Gender | 45 and under | Over 45 | Total | Gender | 45 and under | Over 45 | Total |
| Female | 81 | 76 | 157 | Female | 16 | 24 | 40 |
| Male | 26 | 34 | 60 | Male | 62 | 81 | 143 |
| Total | 107 | 110 | 217 | Total | 78 | 105 | 183 |
As one would expect, the results are very similar to those from the first survey, but they are not identical. All the principal characteristics (for example, the preference of females for tennis and males for golf) are again present, but there are slight...
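To see how close the two sets of results are, the headline proportions can be computed side by side. The following Python sketch uses the gender-by-sport totals from Table 1.4 and Table 1.5; the names `surveys` and `proportion` are illustrative assumptions, and the comparison is purely descriptive rather than a formal test.

```python
# Gender-by-sport totals from Table 1.4 (first survey) and Table 1.5 (second).
surveys = {
    "first": {
        ("Female", "Tennis"): 161, ("Female", "Golf"): 45,
        ("Male", "Tennis"): 52, ("Male", "Golf"): 142,
    },
    "second": {
        ("Female", "Tennis"): 157, ("Female", "Golf"): 40,
        ("Male", "Tennis"): 60, ("Male", "Golf"): 143,
    },
}


def proportion(table, gender, sport):
    """Proportion of respondents of the given gender preferring the sport."""
    gender_total = sum(n for (g, _), n in table.items() if g == gender)
    return table[(gender, sport)] / gender_total


for name, table in surveys.items():
    f = proportion(table, "Female", "Tennis")
    m = proportion(table, "Male", "Golf")
    print(f"{name} survey: females preferring tennis {f:.1%}, "
          f"males preferring golf {m:.1%}")
```

The proportions differ by a few percentage points between the surveys while pointing in the same direction, which is the kind of random variation this section describes.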
| Publication date (per publisher) | 24 October 2016 |
|---|---|
| Language | English |
| Subject areas | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| | Technology |
| Keywords | Analysis • Book • categorical • categorical data analysis • Code • Concepts • Context • Data • Data analysis • Example • Examples • excerpts • Given • Illustrated • illustrative • Key • Loglinear • many • Models • New • Output • presented • Readers • Ready • Researchers • Statistical Software / R • Statistics • Statistics for Social Sciences • students • topic |
| ISBN-10 | 1-119-30793-7 / 1119307937 |
| ISBN-13 | 978-1-119-30793-8 / 9781119307938 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction books. The running text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices.