An Introduction to Categorical Data Analysis (eBook)

Alan Agresti (Author)

eBook Download: PDF
2018 | 3rd edition
John Wiley & Sons (Publisher)
978-1-119-40527-6 (ISBN)

An Introduction to Categorical Data Analysis - Alan Agresti
132,99 incl. VAT
(CHF 129,90)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

A valuable new edition of a standard reference

The use of statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences. An Introduction to Categorical Data Analysis, Third Edition summarizes these methods and shows readers how to apply them using software. Readers will find a unified generalized linear models approach that connects logistic regression and loglinear models for discrete data with normal regression for continuous data.

Adding to the value of the new edition are:

• Illustrations of the use of R software to perform all the analyses in the book

• A new chapter on alternative methods for categorical data, including smoothing and regularization methods (such as the lasso), classification methods such as linear discriminant analysis and classification trees, and cluster analysis

• New sections in many chapters introducing the Bayesian approach for the methods of that chapter

• More than 70 analyses of data sets to illustrate application of the methods, and about 200 exercises, many containing other data sets

• An appendix showing how to use SAS, Stata, and SPSS, and an appendix with short solutions to most odd-numbered exercises

Written in an applied, nontechnical style, this book illustrates the methods using a wide variety of real data, including medical clinical trials, environmental questions, drug use by teenagers, horseshoe crab mating, basketball shooting, correlates of happiness, and much more.

An Introduction to Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and biostatisticians as well as methodologists in the social and behavioral sciences, medicine and public health, marketing, education, and the biological and agricultural sciences.



ALAN AGRESTI is Distinguished Professor Emeritus at the University of Florida. He has presented short courses on categorical data methods in 35 countries. He is the author of seven books, including the bestselling Categorical Data Analysis (Wiley), Foundations of Linear and Generalized Linear Models (Wiley), Statistics: The Art and Science of Learning from Data (Pearson), and Statistical Methods for the Social Sciences (Pearson).

An Introduction to Categorical Data Analysis
Contents
Preface
About the Companion Website
1 Introduction
1.1 CATEGORICAL RESPONSE DATA
1.1.1 Response Variable and Explanatory Variables
1.1.2 Binary–Nominal–Ordinal Scale Distinction
1.1.3 Organization of this Book
1.2 PROBABILITY DISTRIBUTIONS FOR CATEGORICAL DATA
1.2.1 Binomial Distribution
1.2.2 Multinomial Distribution
1.3 STATISTICAL INFERENCE FOR A PROPORTION
1.3.1 Likelihood Function and Maximum Likelihood Estimation
1.3.2 Significance Test About a Binomial Parameter
1.3.3 Example: Surveyed Opinions About Legalized Abortion
1.3.4 Confidence Intervals for a Binomial Parameter
1.3.5 Better Confidence Intervals for a Binomial Proportion
1.4 STATISTICAL INFERENCE FOR DISCRETE DATA
1.4.1 Wald, Likelihood-Ratio, and Score Tests
1.4.2 Example: Wald, Score, and Likelihood-Ratio Binomial Tests
1.4.3 Small-Sample Binomial Inference and the Mid P-Value
1.5 BAYESIAN INFERENCE FOR PROPORTIONS
1.5.1 The Bayesian Approach to Statistical Inference
1.5.2 Bayesian Binomial Inference: Beta Prior Distributions
1.5.3 Example: Opinions about Legalized Abortion, Revisited
1.5.4 Other Prior Distributions
1.6 USING R SOFTWARE FOR STATISTICAL INFERENCE ABOUT PROPORTIONS
1.6.1 Reading Data Files and Installing Packages
1.6.2 Using R for Statistical Inference about Proportions
1.6.3 Summary: Choosing an Inference Method
Exercises
2 Analyzing Contingency Tables
2.1 PROBABILITY STRUCTURE FOR CONTINGENCY TABLES
2.1.1 Joint, Marginal, and Conditional Probabilities
2.1.2 Example: Sensitivity and Specificity
2.1.3 Statistical Independence of Two Categorical Variables
2.1.4 Binomial and Multinomial Sampling
2.2 COMPARING PROPORTIONS IN 2×2 CONTINGENCY TABLES
2.2.1 Difference of Proportions
2.2.2 Example: Aspirin and Incidence of Heart Attacks
2.2.3 Ratio of Proportions (Relative Risk)
2.2.4 Using R for Comparing Proportions in 2×2 Tables
2.3 THE ODDS RATIO
2.3.1 Properties of the Odds Ratio
2.3.2 Example: Odds Ratio for Aspirin Use and Heart Attacks
2.3.3 Inference for Odds Ratios and Log Odds Ratios
2.3.4 Relationship Between Odds Ratio and Relative Risk
2.3.5 Example: The Odds Ratio Applies in Case-Control Studies
2.3.6 Types of Studies: Observational Versus Experimental
2.4 CHI-SQUARED TESTS OF INDEPENDENCE
2.4.1 Pearson Statistic and the Chi-Squared Distribution
2.4.2 Likelihood-Ratio Statistic
2.4.3 Testing Independence in Two-Way Contingency Tables
2.4.4 Example: Gender Gap in Political Party Affiliation
2.4.5 Residuals for Cells in a Contingency Table
2.4.6 Partitioning Chi-Squared Statistics
2.4.7 Limitations of Chi-Squared Tests
2.5 TESTING INDEPENDENCE FOR ORDINAL VARIABLES
2.5.1 Linear Trend Alternative to Independence
2.5.2 Example: Alcohol Use and Infant Malformation
2.5.3 Ordinal Tests Usually Have Greater Power
2.5.4 Choice of Scores
2.5.5 Trend Tests for r×2 and 2×c and Nominal–Ordinal Tables
2.6 EXACT FREQUENTIST AND BAYESIAN INFERENCE
2.6.1 Fisher’s Exact Test for 2×2 Tables
2.6.2 Example: Fisher’s Tea Tasting Colleague
2.6.3 Conservatism for Actual P(Type I Error); Mid P-Values
2.6.4 Small-Sample Confidence Intervals for Odds Ratio
2.6.5 Bayesian Estimation for Association Measures
2.6.6 Example: Bayesian Inference in a Small Clinical Trial
2.7 ASSOCIATION IN THREE-WAY TABLES
2.7.1 Partial Tables
2.7.2 Example: Death Penalty Verdicts and Race
2.7.3 Simpson’s Paradox
2.7.4 Conditional and Marginal Odds Ratios
2.7.5 Homogeneous Association
Exercises
3 Generalized Linear Models
3.1 COMPONENTS OF A GENERALIZED LINEAR MODEL
3.1.1 Random Component
3.1.2 Linear Predictor
3.1.3 Link Function
3.1.4 Ordinary Linear Model: GLM with Normal Random Component
3.2 GENERALIZED LINEAR MODELS FOR BINARY DATA
3.2.1 Linear Probability Model
3.2.2 Logistic Regression Model
3.2.3 Example: Snoring and Heart Disease
3.2.4 Using R to Fit Generalized Linear Models for Binary Data
3.2.5 Data Files: Ungrouped or Grouped Binary Data
3.3 GENERALIZED LINEAR MODELS FOR COUNTS AND RATES
3.3.1 Poisson Distribution for Counts
3.3.2 Poisson Loglinear Model
3.3.3 Example: Female Horseshoe Crabs and their Satellites
3.3.4 Overdispersion: Greater Variability than Expected
3.4 STATISTICAL INFERENCE AND MODEL CHECKING
3.4.1 Wald, Likelihood-Ratio, and Score Inference Use the Likelihood Function
3.4.2 Example: Political Ideology and Belief in Evolution
3.4.3 The Deviance of a GLM
3.4.4 Model Comparison Using the Deviance
3.4.5 Residuals Comparing Observations to the Model Fit
3.5 FITTING GENERALIZED LINEAR MODELS
3.5.1 The Fisher Scoring Algorithm Fits GLMs
3.5.2 Bayesian Methods for Generalized Linear Models
3.5.3 GLMs: A Unified Approach to Statistical Analysis
Exercises
4 Logistic Regression
4.1 THE LOGISTIC REGRESSION MODEL
4.1.1 The Logistic Regression Model
4.1.2 Odds Ratio and Linear Approximation Interpretations
4.1.3 Example: Whether a Female Horseshoe Crab Has Satellites
4.1.4 Logistic Regression with Retrospective Studies
4.1.5 Normally Distributed X Implies Logistic Regression for Y
4.2 STATISTICAL INFERENCE FOR LOGISTIC REGRESSION
4.2.1 Confidence Intervals for Effects
4.2.2 Significance Testing
4.2.3 Fitted Values and Confidence Intervals for Probabilities
4.2.4 Why Use a Model to Estimate Probabilities?
4.3 LOGISTIC REGRESSION WITH CATEGORICAL PREDICTORS
4.3.1 Indicator Variables Represent Categories of Predictors
4.3.2 Example: Survey about Marijuana Use
4.3.3 ANOVA-Type Model Representation of Factors
4.3.4 Tests of Conditional Independence and of Homogeneity for Three-Way Contingency Tables
4.4 MULTIPLE LOGISTIC REGRESSION
4.4.1 Example: Horseshoe Crabs with Color and Width Predictors
4.4.2 Model Comparison to Check Whether a Term is Needed
4.4.3 Example: Treating Color as Quantitative or Binary
4.4.4 Allowing Interaction between Explanatory Variables
4.4.5 Effects Depend on Other Explanatory Variables in Model
4.5 SUMMARIZING EFFECTS IN LOGISTIC REGRESSION
4.5.1 Probability-Based Interpretations
4.5.2 Marginal Effects and Their Average
4.5.3 Standardized Interpretations
4.6 SUMMARIZING PREDICTIVE POWER: CLASSIFICATION TABLES, ROC CURVES, AND MULTIPLE CORRELATION
4.6.1 Summarizing Predictive Power: Classification Tables
4.6.2 Summarizing Predictive Power: ROC Curves
4.6.3 Summarizing Predictive Power: Multiple Correlation
Exercises
5 Building and Applying Logistic Regression Models
5.1 STRATEGIES IN MODEL SELECTION
5.1.1 How Many Explanatory Variables Can the Model Handle?
5.1.2 Example: Horseshoe Crab Satellites Revisited
5.1.3 Stepwise Variable Selection Algorithms
5.1.4 Purposeful Selection of Explanatory Variables
5.1.5 Example: Variable Selection for Horseshoe Crabs
5.1.6 AIC and the Bias/Variance Tradeoff
5.2 MODEL CHECKING
5.2.1 Goodness of Fit: Model Comparison Using the Deviance
5.2.2 Example: Goodness of Fit for Marijuana Use Survey
5.2.3 Goodness of Fit: Grouped versus Ungrouped Data and Continuous Predictors
5.2.4 Residuals for Logistic Models with Categorical Predictors
5.2.5 Example: Graduate Admissions at University of Florida
5.2.6 Standardized versus Pearson and Deviance Residuals
5.2.7 Influence Diagnostics for Logistic Regression
5.2.8 Example: Heart Disease and Blood Pressure
5.3 INFINITE ESTIMATES IN LOGISTIC REGRESSION
5.3.1 Complete and Quasi-Complete Separation: Perfect Discrimination
5.3.2 Example: Infinite Estimate for Toy Example
5.3.3 Sparse Data and Infinite Effects with Categorical Predictors
5.3.4 Example: Risk Factors for Endometrial Cancer Grade
5.4 BAYESIAN INFERENCE, PENALIZED LIKELIHOOD, AND CONDITIONAL LIKELIHOOD FOR LOGISTIC REGRESSION
5.4.1 Bayesian Modeling: Specification of Prior Distributions
5.4.2 Example: Risk Factors for Endometrial Cancer Revisited
5.4.3 Penalized Likelihood Reduces Bias in Logistic Regression
5.4.4 Example: Risk Factors for Endometrial Cancer Revisited
5.4.5 Conditional Likelihood and Conditional Logistic Regression
5.4.6 Conditional Logistic Regression and Exact Tests for Contingency Tables
5.5 ALTERNATIVE LINK FUNCTIONS: LINEAR PROBABILITY AND PROBIT MODELS
5.5.1 Linear Probability Model
5.5.2 Example: Political Ideology and Belief in Evolution
5.5.3 Probit Model and Normal Latent Variable Model
5.5.4 Example: Snoring and Heart Disease Revisited
5.5.5 Latent Variable Models Imply Binary Regression Models
5.5.6 CDFs and Shapes of Curves for Binary Regression Models
5.6 SAMPLE SIZE AND POWER FOR LOGISTIC REGRESSION
5.6.1 Sample Size for Comparing Two Proportions
5.6.2 Sample Size in Logistic Regression Modeling
5.6.3 Example: Modeling the Probability of Heart Disease
Exercises
6 Multicategory Logit Models
6.1 BASELINE-CATEGORY LOGIT MODELS FOR NOMINAL RESPONSES
6.1.1 Baseline-Category Logits
6.1.2 Example: What Do Alligators Eat?
6.1.3 Estimating Response Probabilities
6.1.4 Checking Multinomial Model Goodness of Fit
6.1.5 Example: Belief in Afterlife
6.1.6 Discrete Choice Models
6.1.7 Example: Shopping Destination Choice
6.2 CUMULATIVE LOGIT MODELS FOR ORDINAL RESPONSES
6.2.1 Cumulative Logit Models with Proportional Odds
6.2.2 Example: Political Ideology and Political Party Affiliation
6.2.3 Inference about Cumulative Logit Model Parameters
6.2.4 Increased Power for Ordinal Analyses
6.2.5 Example: Happiness and Family Income
6.2.6 Latent Variable Linear Models Imply Cumulative Link Models
6.2.7 Invariance to Choice of Response Categories
6.3 CUMULATIVE LINK MODELS: MODEL CHECKING AND EXTENSIONS
6.3.1 Checking Ordinal Model Goodness of Fit
6.3.2 Cumulative Logit Model without Proportional Odds
6.3.3 Simpler Interpretations Use Probabilities
6.3.4 Example: Modeling Mental Impairment
6.3.5 A Latent Variable Probability Comparison of Groups
6.3.6 Cumulative Probit Model
6.3.7 R² Based on the Latent Variable Model
6.3.8 Bayesian Inference for Multinomial Models
6.3.9 Example: Modeling Mental Impairment Revisited
6.4 PAIRED-CATEGORY LOGIT MODELING OF ORDINAL RESPONSES
6.4.1 Adjacent-Categories Logits
6.4.2 Example: Political Ideology Revisited
6.4.3 Sequential Logits
6.4.4 Example: Tonsil Size and Streptococcus
Exercises
7 Loglinear Models for Contingency Tables and Counts
7.1 LOGLINEAR MODELS FOR COUNTS IN CONTINGENCY TABLES
7.1.1 Loglinear Model of Independence for Two-Way Contingency Tables
7.1.2 Interpretation of Parameters in the Independence Model
7.1.3 Example: Happiness and Belief in Heaven
7.1.4 Saturated Model for Two-Way Contingency Tables
7.1.5 Loglinear Models for Three-Way Contingency Tables
7.1.6 Two-Factor Parameters Describe Conditional Associations
7.1.7 Example: Student Alcohol, Cigarette, and Marijuana Use
7.2 STATISTICAL INFERENCE FOR LOGLINEAR MODELS
7.2.1 Chi-Squared Goodness-of-Fit Tests
7.2.2 Cell Standardized Residuals for Loglinear Models
7.2.3 Significance Tests about Conditional Associations
7.2.4 Confidence Intervals for Conditional Odds Ratios
7.2.5 Bayesian Fitting of Loglinear Models
7.2.6 Loglinear Models for Higher-Dimensional Contingency Tables
7.2.7 Example: Automobile Accidents and Seat Belts
7.2.8 Interpreting Three-Factor Interaction Terms
7.2.9 Statistical Versus Practical Significance: Dissimilarity Index
7.3 THE LOGLINEAR – LOGISTIC MODEL CONNECTION
7.3.1 Using Logistic Models to Interpret Loglinear Models
7.3.2 Example: Auto Accident Data Revisited
7.3.3 Condition for Equivalent Loglinear and Logistic Models
7.3.4 Loglinear/Logistic Model Selection Issues
7.4 INDEPENDENCE GRAPHS AND COLLAPSIBILITY
7.4.1 Independence Graphs
7.4.2 Collapsibility Conditions for Contingency Tables
7.4.3 Example: Loglinear Model Building for Student Substance Use
7.4.4 Collapsibility and Logistic Models
7.5 MODELING ORDINAL ASSOCIATIONS IN CONTINGENCY TABLES
7.5.1 Linear-by-Linear Association Model
7.5.2 Example: Linear-by-Linear Association for Sex Opinions
7.5.3 Ordinal Significance Tests of Independence
7.6 LOGLINEAR MODELING OF COUNT RESPONSE VARIABLES
7.6.1 Count Regression Modeling of Rate Data
7.6.2 Example: Death Rates for Lung Cancer Patients
7.6.3 Negative Binomial Regression Models
7.6.4 Example: Female Horseshoe Crab Satellites Revisited
Exercises
8 Models for Matched Pairs
8.1 COMPARING DEPENDENT PROPORTIONS FOR BINARY MATCHED PAIRS
8.1.1 McNemar Test Comparing Marginal Proportions
8.1.2 Estimating the Difference between Dependent Proportions
8.2 MARGINAL MODELS AND SUBJECT-SPECIFIC MODELS FOR MATCHED PAIRS
8.2.1 Marginal Models for Marginal Proportions
8.2.2 Example: Environmental Opinions Revisited
8.2.3 Subject-Specific and Population-Averaged Tables
8.2.4 Conditional Logistic Regression for Matched-Pairs
8.2.5 Logistic Regression for Matched Case-Control Studies
8.3 COMPARING PROPORTIONS FOR NOMINAL MATCHED-PAIRS RESPONSES
8.3.1 Marginal Homogeneity for Baseline-Category Logit Models
8.3.2 Example: Coffee Brand Market Share
8.3.3 Using the Cochran–Mantel–Haenszel Test to Test Marginal Homogeneity
8.3.4 Symmetry and Quasi-Symmetry Models for Square Contingency Tables
8.3.5 Example: Coffee Brand Market Share Revisited
8.4 COMPARING PROPORTIONS FOR ORDINAL MATCHED-PAIRS RESPONSES
8.4.1 Marginal Homogeneity and Cumulative Logit Marginal Model
8.4.2 Example: Recycle or Drive Less to Help the Environment?
8.4.3 An Ordinal Quasi-Symmetry Model
8.4.4 Example: Recycle or Drive Less Revisited?
8.5 ANALYZING RATER AGREEMENT
8.5.1 Example: Agreement on Carcinoma Diagnosis
8.5.2 Cell Residuals for Independence Model
8.5.3 Quasi-Independence Model
8.5.4 Quasi Independence and Odds Ratios Summarizing Agreement
8.5.5 Kappa Summary Measure of Agreement
8.6 BRADLEY–TERRY MODEL FOR PAIRED PREFERENCES
8.6.1 The Bradley–Terry Model and Quasi-Symmetry
8.6.2 Example: Ranking Men Tennis Players
Exercises
9 Marginal Modeling of Correlated, Clustered Responses
9.1 MARGINAL MODELS VERSUS SUBJECT-SPECIFIC MODELS
9.1.1 Marginal Models for a Clustered Binary Response
9.1.2 Example: Repeated Responses on Similar Survey Questions
9.1.3 Subject-Specific Models for a Repeated Response
9.2 MARGINAL MODELING: THE GENERALIZED ESTIMATING EQUATIONS (GEE) APPROACH
9.2.1 Quasi-Likelihood Methods
9.2.2 Generalized Estimating Equation Methodology: Basic Ideas
9.2.3 Example: Opinion about Legalized Abortion Revisited
9.2.4 Limitations of GEE Compared to ML
9.3 MARGINAL MODELING FOR CLUSTERED MULTINOMIAL RESPONSES
9.3.1 Example: Insomnia Study
9.3.2 Alternative GEE Specification of Working Association
9.4 TRANSITIONAL MODELING, GIVEN THE PAST
9.4.1 Transitional Models with Explanatory Variables
9.4.2 Example: Respiratory Illness and Maternal Smoking
9.4.3 Group Comparisons Treating Initial Response as a Covariate
9.5 DEALING WITH MISSING DATA
9.5.1 Missing at Random: Impact on ML and GEE Methods
9.5.2 Multiple Imputation: Monte Carlo Prediction of Missing Data
Exercises
10 Random Effects: Generalized Linear Mixed Models
10.1 RANDOM EFFECTS MODELING OF CLUSTERED CATEGORICAL DATA
10.1.1 The Generalized Linear Mixed Model (GLMM)
10.1.2 A Logistic GLMM for Binary Matched Pairs
10.1.3 Example: Environmental Opinions Revisited
10.1.4 Differing Effects in GLMMs and Marginal Models
10.1.5 Model Fitting for GLMMs
10.1.6 Inference for Model Parameters and Prediction
10.2 EXAMPLES: RANDOM EFFECTS MODELS FOR BINARY DATA
10.2.1 Small-Area Estimation of Binomial Probabilities
10.2.2 Example: Estimating Basketball Free Throw Success
10.2.3 Example: Opinions about Legalized Abortion Revisited
10.2.4 Item Response Models: The Rasch Model
10.2.5 Choice of Marginal Model or Random Effects Model
10.3 EXTENSIONS TO MULTINOMIAL RESPONSES AND MULTIPLE RANDOM EFFECT TERMS
10.3.1 Example: Insomnia Study Revisited
10.3.2 Meta-Analysis: Bivariate Random Effects for Association Heterogeneity
10.4 MULTILEVEL (HIERARCHICAL) MODELS
10.4.1 Example: Two-Level Model for Student Performance
10.4.2 Example: Smoking Prevention and Cessation Study
10.5 LATENT CLASS MODELS
10.5.1 Independence Given a Latent Categorical Variable
10.5.2 Example: Latent Class Model for Rater Agreement
Exercises
11 Classification and Smoothing
11.1 CLASSIFICATION: LINEAR DISCRIMINANT ANALYSIS
11.1.1 Classification with Fisher’s Linear Discriminant Function
11.1.2 Example: Horseshoe Crab Satellites Revisited
11.1.3 Discriminant Analysis Versus Logistic Regression
11.2 CLASSIFICATION: TREE-BASED PREDICTION
11.2.1 Classification Trees
11.2.2 Example: A Classification Tree for Horseshoe Crab Mating
11.2.3 How Does the Classification Tree Grow?
11.2.4 Pruning a Tree and Checking Prediction Accuracy
11.2.5 Classification Trees Versus Logistic Regression and Discriminant Analysis
11.3 CLUSTER ANALYSIS FOR CATEGORICAL RESPONSES
11.3.1 Measuring Dissimilarity Between Observations
11.3.2 Hierarchical Clustering Algorithm and Dendrograms
11.3.3 Example: Clustering States on Presidential Elections
11.4 SMOOTHING: GENERALIZED ADDITIVE MODELS
11.4.1 Generalized Additive Models
11.4.2 Example: GAMs for Horseshoe Crab Data
11.4.3 How Much Smoothing? The Bias/Variance Tradeoff
11.4.4 Example: Smoothing to Portray Probability of Kyphosis
11.5 REGULARIZATION FOR HIGH-DIMENSIONAL CATEGORICAL DATA (LARGE p)
11.5.1 Penalized-Likelihood Methods and Lq-Norm Smoothing
11.5.2 Implementing the Lasso
11.5.3 Example: Predicting Opinion on Abortion with Student Survey
11.5.4 Why Shrink ML Estimates Toward 0?
11.5.5 Issues in Variable Selection (Dimension Reduction)
11.5.6 Controlling the False Discovery Rate
11.5.7 Large p also Makes Bayesian Inference Challenging
Exercises
12 A Historical Tour of Categorical Data Analysis
The Pearson–Yule Association Controversy
R.A. Fisher’s Contributions
Logistic Regression
Multiway Contingency Tables and Loglinear Models
Final Comments
Appendix: Software for Categorical Data Analysis
A.1 R FOR CATEGORICAL DATA ANALYSIS
A.2 SAS FOR CATEGORICAL DATA ANALYSIS
Chapters 1–2: Introduction and Contingency Tables
Chapters 3–5: Generalized Linear Models and Logistic Regression
Chapters 6–7: Multicategory Logit Models and Loglinear Models
Chapter 8: Matched Pairs
Chapters 9–10: Marginal Models and Random Effects Models (GLMMs)
Chapter 11: Non-Model-Based Classification and Clustering
A.3 STATA FOR CATEGORICAL DATA ANALYSIS
Chapters 1–2: Introduction and Contingency Tables
Chapters 3–5: Generalized Linear Models and Logistic Regression
Chapters 6–7: Multicategory Logit Models and Loglinear Models
Chapters 8–11: Correlated Observations, Advanced Methods
A.4 SPSS FOR CATEGORICAL DATA ANALYSIS
Chapters 1–2: Introduction and Contingency Tables
Chapters 3–5: Generalized Linear Models and Logistic Regression
Chapters 6–7: Multicategory Logit Models and Loglinear Models
Chapters 8–11: Correlated Observations, Advanced Methods
Brief Solutions to Odd-Numbered Exercises
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Bibliography
Examples Index
Subject Index
EULA

Publication date (per publisher) 11 October 2018
Series Wiley Series in Probability and Statistics
Language English
Subject areas Computer Science > Databases > Data Warehouse / Data Mining
Mathematics / Computer Science > Mathematics > Analysis
Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Natural Sciences
Keywords Bayesian inference • Binary Regression Models • Bradley-Terry Model • categorical data • categorical data analysis • Categorical Response Data • Chi-Squared Tests of Independence • Cluster Analysis for Categorical Responses • Conditional Likelihood • contingency tables • Count Response Variables • Data Analysis • discrete data • Exact Frequentist • generalized additive models • generalized linear mixed models • Generalized Linear Models • Latent class models • linear discriminant analysis • Linear Probability • Link Models • Logistic Regression • Logit Models • Loglinear Models • marginal models • matched pairs • Model Checking • Model Selection • Multilevel Models • Multinomial Responses • Nominal Matched-Pairs Responses • Odds Ratio • ordinal variables • Penalized likelihood • probability distributions • Probability Structure • Probit Models • Random Effects Modeling • Rater Agreement • ROC Curves • R software • Statistical • Statistical Inference • Statistical Software / R • Statistics • Tree-Based Prediction
ISBN-10 1-119-40527-0 / 1119405270
ISBN-13 978-1-119-40527-6 / 9781119405276
PDF (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the software Adobe Digital Editions (free of charge). We advise against using the OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently there.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
