
Categorical Data Analysis (eBook)

Alan Agresti (Author)

eBook Download: PDF
2014 | 3rd edition
John Wiley & Sons (Publisher)
978-1-118-71085-2 (ISBN)

Categorical Data Analysis - Alan Agresti
136.99 incl. VAT
(CHF 133.80)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

Praise for the Second Edition

'A must-have book for anyone expecting to do research and/or applications in categorical data analysis.'
-Statistics in Medicine

'It is a total delight reading this book.'
-Pharmaceutical Research

'If you do any analysis of categorical data, this is an essential desktop reference.'
-Technometrics

The use of statistical methods for analyzing categorical data has increased dramatically, particularly in the biomedical, social sciences, and financial industries. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.

Categorical Data Analysis, Third Edition summarizes the latest methods for univariate and correlated multivariate categorical responses. Readers will find a unified generalized linear models approach that connects logistic regression and Poisson and negative binomial loglinear models for discrete data with normal regression for continuous data. This edition also features:

  • An emphasis on logistic and probit regression methods for binary, ordinal, and nominal responses for independent observations and for clustered data with marginal models and random effects models
  • Two new chapters on alternative methods for binary response data, including smoothing and regularization methods, classification methods such as linear discriminant analysis and classification trees, and cluster analysis
  • New sections introducing the Bayesian approach for methods in that chapter
  • More than 100 analyses of data sets and over 600 exercises
  • Notes at the end of each chapter that provide references to recent research and topics not covered in the text, linked to a bibliography of more than 1,200 sources
  • A supplementary website showing how to use R and SAS for all examples in the text, with information also about SPSS and Stata and with exercise solutions

Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and methodologists, such as biostatisticians and researchers in the social and behavioral sciences, medicine and public health, marketing, education, finance, biological and agricultural sciences, and industrial quality control.



ALAN AGRESTI is Distinguished Professor Emeritus in the Department of Statistics at the University of Florida. He has presented short courses on categorical data methods in thirty countries. He is the author of five other books, including An Introduction to Categorical Data Analysis, Second Edition and Analysis of Ordinal Categorical Data, Second Edition, both published by Wiley.



Cover
Title Page
Copyright Page
Contents
Preface
1 Introduction: Distributions and Inference for Categorical Data
1.1 Categorical Response Data
1.1.1 Response–Explanatory Variable Distinction
1.1.2 Binary–Nominal–Ordinal Scale Distinction
1.1.3 Discrete–Continuous Variable Distinction
1.1.4 Quantitative–Qualitative Variable Distinction
1.1.5 Organization of Book and Online Computing Appendix
1.2 Distributions for Categorical Data
1.2.1 Binomial Distribution
1.2.2 Multinomial Distribution
1.2.3 Poisson Distribution
1.2.4 Overdispersion
1.2.5 Connection Between Poisson and Multinomial Distributions
1.2.6 The Chi-Squared Distribution
1.3 Statistical Inference for Categorical Data
1.3.1 Likelihood Functions and Maximum Likelihood Estimation
1.3.2 Likelihood Function and ML Estimate for Binomial Parameter
1.3.3 Wald–Likelihood Ratio–Score Test Triad
1.3.4 Constructing Confidence Intervals by Inverting Tests
1.4 Statistical Inference for Binomial Parameters
1.4.1 Tests About a Binomial Parameter
1.4.2 Confidence Intervals for a Binomial Parameter
1.4.3 Example: Estimating the Proportion of Vegetarians
1.4.4 Exact Small-Sample Inference and the Mid P-Value
1.5 Statistical Inference for Multinomial Parameters
1.5.1 Estimation of Multinomial Parameters
1.5.2 Pearson Chi-Squared Test of a Specified Multinomial
1.5.3 Likelihood-Ratio Chi-Squared Test of a Specified Multinomial
1.5.4 Example: Testing Mendel's Theories
1.5.5 Testing with Estimated Expected Frequencies
1.5.6 Example: Pneumonia Infections in Calves
1.5.7 Chi-Squared Theoretical Justification
1.6 Bayesian Inference for Binomial and Multinomial Parameters
1.6.1 The Bayesian Approach to Statistical Inference
1.6.2 Binomial Estimation: Beta and Logit-Normal Prior Distributions
1.6.3 Multinomial Estimation: Dirichlet Prior Distributions
1.6.4 Example: Estimating Vegetarianism Revisited
1.6.5 Binomial and Multinomial Estimation: Improper Priors
Notes
Exercises
2 Describing Contingency Tables
2.1 Probability Structure for Contingency Tables
2.1.1 Contingency Tables
2.1.2 Joint/Marginal/Conditional Distributions for Contingency Tables
2.1.3 Example: Sensitivity and Specificity for Medical Diagnoses
2.1.4 Independence of Categorical Variables
2.1.5 Poisson, Binomial, and Multinomial Sampling
2.1.6 Example: Seat Belts and Auto Accident Injuries
2.1.7 Example: Case–Control Study of Cancer and Smoking
2.1.8 Types of Studies: Observational Versus Experimental
2.2 Comparing Two Proportions
2.2.1 Difference of Proportions
2.2.2 Relative Risk
2.2.3 Odds Ratio
2.2.4 Properties of the Odds Ratio
2.2.5 Example: Association Between Heart Attacks and Aspirin Use
2.2.6 Case–Control Studies and the Odds Ratio
2.2.7 Relationship Between Odds Ratio and Relative Risk
2.3 Conditional Association in Stratified 2 × 2 Tables
2.3.1 Partial Tables
2.3.2 Example: Racial Characteristics and the Death Penalty
2.3.3 Conditional and Marginal Odds Ratios
2.3.4 Marginal Independence Versus Conditional Independence
2.3.5 Homogeneous Association
2.3.6 Collapsibility: Identical Conditional and Marginal Associations
2.4 Measuring Association in I × J Tables
2.4.1 Odds Ratios in I × J Tables
2.4.2 Association Factors
2.4.3 Summary Measures of Association
2.4.4 Ordinal Trends: Concordant and Discordant Pairs
2.4.5 Ordinal Measure of Association: Gamma
2.4.6 Probabilistic Comparisons of Two Ordinal Distributions
2.4.7 Example: Comparing Pain Ratings After Surgery
2.4.8 Correlation for Underlying Normality
Notes
Exercises
3 Inference for Two-Way Contingency Tables
3.1 Confidence Intervals for Association Parameters
3.1.1 Interval Estimation of the Odds Ratio
3.1.2 Example: Seat-Belt Use and Traffic Deaths
3.1.3 Interval Estimation of Difference of Proportions and Relative Risk
3.1.4 Example: Aspirin and Heart Attacks Revisited
3.1.5 Deriving Standard Errors with the Delta Method
3.1.6 Delta Method Applied to the Sample Logit
3.1.7 Delta Method for the Log Odds Ratio
3.1.8 Simultaneous Confidence Intervals for Multiple Comparisons
3.2 Testing Independence in Two-way Contingency Tables
3.2.1 Pearson and Likelihood-Ratio Chi-Squared Tests
3.2.2 Example: Education and Belief in God
3.2.3 Adequacy of Chi-Squared Approximations
3.2.4 Chi-Squared and Comparing Proportions in 2 × 2 Tables
3.2.5 Score Confidence Intervals Comparing Proportions
3.2.6 Profile Likelihood Confidence Intervals
3.3 Following-up Chi-Squared Tests
3.3.1 Pearson Residuals and Standardized Residuals
3.3.2 Example: Education and Belief in God Revisited
3.3.3 Partitioning Chi-Squared
3.3.4 Example: Origin of Schizophrenia
3.3.5 Rules for Partitioning
3.3.6 Summarizing the Association
3.3.7 Limitations of Chi-Squared Tests
3.3.8 Why Consider Independence If It's Unlikely to Be True?
3.4 Two-Way Tables with Ordered Classifications
3.4.1 Linear Trend Alternative to Independence
3.4.2 Example: Is Happiness Associated with Political Ideology?
3.4.3 Monotone Trend Alternatives to Independence
3.4.4 Extra Power with Ordinal Tests
3.4.5 Sensitivity to Choice of Scores
3.4.6 Example: Infant Birth Defects by Maternal Alcohol Consumption
3.4.7 Trend Tests for I × 2 and 2 × J Tables
3.4.8 Nominal-Ordinal Tables
3.5 Small-Sample Inference for Contingency Tables
3.5.1 Fisher's Exact Test for 2 × 2 Tables
3.5.2 Example: Fisher's Tea Drinker
3.5.3 Two-Sided P-Values for Fisher's Exact Test
3.5.4 Confidence Intervals Based on Conditional Likelihood
3.5.5 Discreteness and Conservatism Issues
3.5.6 Small-Sample Unconditional Tests of Independence
3.5.7 Conditional Versus Unconditional Tests
3.6 Bayesian Inference for Two-way Contingency Tables
3.6.1 Prior Distributions for Comparing Proportions in 2 × 2 Tables
3.6.2 Posterior Probabilities Comparing Proportions
3.6.3 Posterior Intervals for Association Parameters
3.6.4 Example: Urn Sampling Gives Highly Unbalanced Treatment Allocation
3.6.5 Highest Posterior Density Intervals
3.6.6 Testing Independence
3.6.7 Empirical Bayes and Hierarchical Bayesian Approaches
3.7 Extensions for Multiway Tables and Nontabulated Responses
3.7.1 Categorical Data Need Not Be Contingency Tables
Notes
Exercises
4 Introduction to Generalized Linear Models
4.1 The Generalized Linear Model
4.1.1 Components of Generalized Linear Models
4.1.2 Binomial Logit Models for Binary Data
4.1.3 Poisson Loglinear Models for Count Data
4.1.4 Generalized Linear Models for Continuous Responses
4.1.5 Deviance of a GLM
4.1.6 Advantages of GLMs Versus Transforming the Data
4.2 Generalized Linear Models for Binary Data
4.2.1 Linear Probability Model
4.2.2 Example: Snoring and Heart Disease
4.2.3 Logistic Regression Model
4.2.4 Binomial GLM for 2 × 2 Contingency Tables
4.2.5 Probit and Inverse cdf Link Functions
4.2.6 Latent Tolerance Motivation for Binary Response Models
4.3 Generalized Linear Models for Counts and Rates
4.3.1 Poisson Loglinear Models
4.3.2 Example: Horseshoe Crab Mating
4.3.3 Overdispersion for Poisson GLMs
4.3.4 Negative Binomial GLMs
4.3.5 Poisson Regression for Rates Using Offsets
4.3.6 Example: Modeling Death Rates for Heart Valve Operations
4.3.7 Poisson GLM of Independence in Two-Way Contingency Tables
4.4 Moments and Likelihood for Generalized Linear Models
4.4.1 The Exponential Dispersion Family
4.4.2 Mean and Variance Functions for the Random Component
4.4.3 Mean and Variance Functions for Poisson and Binomial GLMs
4.4.4 Systematic Component and Link Function of a GLM
4.4.5 Likelihood Equations for a GLM
4.4.6 The Key Role of the Mean–Variance Relationship
4.4.7 Likelihood Equations for Binomial GLMs
4.4.8 Asymptotic Covariance Matrix of Model Parameter Estimators
4.4.9 Likelihood Equations and cov(β̂) for Poisson Loglinear Model
4.5 Inference and Model Checking for Generalized Linear Models
4.5.1 Deviance and Goodness of Fit
4.5.2 Deviance for Poisson GLMs
4.5.3 Deviance for Binomial GLMs: Grouped Versus Ungrouped Data
4.5.4 Likelihood-Ratio Model Comparison Using the Deviances
4.5.5 Score Tests for Goodness of Fit and for Model Comparison
4.5.6 Residuals for GLMs
4.5.7 Covariance Matrices for Fitted Values and Residuals
4.5.8 The Bayesian Approach for GLMs
4.6 Fitting Generalized Linear Models
4.6.1 Newton–Raphson Method
4.6.2 Fisher Scoring Method
4.6.3 Newton–Raphson and Fisher Scoring for Binary Data
4.6.4 ML as Iterative Reweighted Least Squares
4.6.5 Simplifications for Canonical Link Functions
4.7 Quasi-Likelihood and Generalized Linear Models
4.7.1 Mean–Variance Relationship Determines Quasi-likelihood Estimates
4.7.2 Overdispersion for Poisson GLMs and Quasi-likelihood
4.7.3 Overdispersion for Binomial GLMs and Quasi-likelihood
4.7.4 Example: Teratology Overdispersion
Notes
Exercises
5 Logistic Regression
5.1 Interpreting Parameters in Logistic Regression
5.1.1 Interpreting β: Odds, Probabilities, and Linear Approximations
5.1.2 Looking at the Data
5.1.3 Example: Horseshoe Crab Mating Revisited
5.1.4 Logistic Regression with Retrospective Studies
5.1.5 Logistic Regression Is Implied by Normal Explanatory Variables
5.2 Inference for Logistic Regression
5.2.1 Inference About Model Parameters and Probabilities
5.2.2 Example: Inference for Horseshoe Crab Mating Data
5.2.3 Checking Goodness of Fit: Grouped and Ungrouped Data
5.2.4 Example: Model Goodness of Fit for Horseshoe Crab Data
5.2.5 Checking Goodness of Fit with Ungrouped Data by Grouping
5.2.6 Wald Inference Can Be Suboptimal
5.3 Logistic Models with Categorical Predictors
5.3.1 ANOVA-Type Representation of Factors
5.3.2 Indicator Variables Represent a Factor
5.3.3 Example: Alcohol and Infant Malformation Revisited
5.3.4 Linear Logit Model for I × 2 Contingency Tables
5.3.5 Cochran–Armitage Trend Test
5.3.6 Example: Alcohol and Infant Malformation Revisited
5.3.7 Using Directed Models Can Improve Inferential Power
5.3.8 Noncentral Chi-Squared Distribution and Power for Narrower Alternatives
5.3.9 Example: Skin Damage and Leprosy
5.3.10 Model Smoothing Improves Precision of Estimation
5.4 Multiple Logistic Regression
5.4.1 Logistic Models for Multiway Contingency Tables
5.4.2 Example: AIDS and AZT Use
5.4.3 Goodness of Fit as a Likelihood-Ratio Test
5.4.4 Model Comparison by Comparing Deviances
5.4.5 Example: Horseshoe Crab Satellites Revisited
5.4.6 Quantitative Treatment of Ordinal Predictor
5.4.7 Probability-Based and Standardized Interpretations
5.4.8 Estimating an Average Causal Effect
5.5 Fitting Logistic Regression Models
5.5.1 Likelihood Equations for Logistic Regression
5.5.2 Asymptotic Covariance Matrix of Parameter Estimators
5.5.3 Distribution of Probability Estimators
5.5.4 Newton–Raphson Method Applied to Logistic Regression
Notes
Exercises
6 Building, Checking, and Applying Logistic Regression Models
6.1 Strategies in Model Selection
6.1.1 How Many Explanatory Variables Can Be in the Model?
6.1.2 Example: Horseshoe Crab Mating Data Revisited
6.1.3 Stepwise Procedures: Forward Selection and Backward Elimination
6.1.4 Example: Backward Elimination for Horseshoe Crab Data
6.1.5 Model Selection and the "Correct" Model
6.1.6 AIC: Minimizing Distance of the Fit from the Truth
6.1.7 Example: Using Causal Hypotheses to Guide Model Building
6.1.8 Alternative Strategies, Including Model Averaging
6.2 Logistic Regression Diagnostics
6.2.1 Residuals: Pearson, Deviance, and Standardized
6.2.2 Example: Heart Disease and Blood Pressure
6.2.3 Example: Admissions to Graduate School at Florida
6.2.4 Influence Diagnostics for Logistic Regression
6.3 Summarizing the Predictive Power of a Model
6.3.1 Summarizing Predictive Power: R and R-Squared Measures
6.3.2 Summarizing Predictive Power: Likelihood and Deviance Measures
6.3.3 Summarizing Predictive Power: Classification Tables
6.3.4 Summarizing Predictive Power: ROC Curves
6.3.5 Example: Evaluating Predictive Power for Horseshoe Crab Data
6.4 Mantel–Haenszel and Related Methods for Multiple 2 × 2 Tables
6.4.1 Using Logistic Models to Test Conditional Independence
6.4.2 Cochran–Mantel–Haenszel Test of Conditional Independence
6.4.3 Example: Multicenter Clinical Trial Revisited
6.4.4 CMH Test Is Advantageous for Sparse Data
6.4.5 Estimation of Common Odds Ratio
6.4.6 Meta-analyses for Summarizing Multiple 2 × 2 Tables
6.4.7 Meta-analyses for Multiple 2 × 2 Tables: Difference of Proportions
6.4.8 Collapsibility and Logistic Models for Contingency Tables
6.4.9 Testing Homogeneity of Odds Ratios
6.4.10 Summarizing Heterogeneity in Odds Ratios
6.4.11 Propensity Scores in Observational Studies
6.5 Detecting and Dealing with Infinite Estimates
6.5.1 Complete or Quasi-complete Separation
6.5.2 Example: Multicenter Clinical Trial with Few Successes
6.5.3 Remedies When at Least One ML Estimate Is Infinite
6.6 Sample Size and Power Considerations
6.6.1 Sample Size and Power for Comparing Two Proportions
6.6.2 Sample Size Determination in Logistic Regression
6.6.3 Sample Size in Multiple Logistic Regression
6.6.4 Power for Chi-Squared Tests in Contingency Tables
6.6.5 Power for Testing Conditional Independence
6.6.6 Effects of Sample Size on Model Selection and Inference
Notes
Exercises
7 Alternative Modeling of Binary Response Data
7.1 Probit and Complementary Log-Log Models
7.1.1 Probit Models: Three Latent Variable Motivations
7.1.2 Probit Models: Interpreting Effects
7.1.3 Probit Model Fitting
7.1.4 Example: Modeling Flour Beetle Mortality
7.1.5 Complementary Log-Log Link Models
7.1.6 Example: Beetle Mortality Revisited
7.2 Bayesian Inference for Binary Regression
7.2.1 Prior Specifications for Binary Regression Models
7.2.2 Example: Risk Factors for Endometrial Cancer Grade
7.2.3 Bayesian Logistic Regression for Retrospective Studies
7.2.4 Probability-Based Prior Specifications for Binary Regression Models
7.2.5 Example: Modeling the Probability a Trauma Patient Survives
7.2.6 Bayesian Fitting for Probit Models
7.2.7 Bayesian Model Checking for Binary Regression
7.3 Conditional Logistic Regression
7.3.1 Conditional Likelihood
7.3.2 Small-Sample Inference for a Logistic Regression Parameter
7.3.3 Small-Sample Conditional Inference for 2 × 2 Contingency Tables
7.3.4 Small-Sample Conditional Inference for Linear Logit Model
7.3.5 Small-Sample Tests of Conditional Independence in 2 × 2 × K Tables
7.3.6 Example: Promotion Discrimination
7.3.7 Discreteness Complications of Using Exact Conditional Inference
7.4 Smoothing: Kernels, Penalized Likelihood, Generalized Additive Models
7.4.1 How Much Smoothing? The Variance/Bias Trade-off
7.4.2 Kernel Smoothing
7.4.3 Example: Smoothing to Portray Probability of Kyphosis
7.4.4 Nearest Neighbors Smoothing
7.4.5 Smoothing Using Penalized Likelihood Estimation
7.4.6 Why Shrink Estimates Toward 0?
7.4.7 Firth's Penalized Likelihood for Logistic Regression
7.4.8 Example: Complete Separation but Finite Logistic Estimates
7.4.9 Generalized Additive Models
7.4.10 Example: GAMs for Horseshoe Crab Mating Data
7.4.11 Advantages/Disadvantages of Various Smoothing Methods
7.5 Issues in Analyzing High–Dimensional Categorical Data
7.5.1 Issues in Selecting Explanatory Variables
7.5.2 Adjusting for Multiplicity: The Bonferroni Method
7.5.3 Adjusting for Multiplicity: The False Discovery Rate
7.5.4 Other Variable Selection Methods with High–Dimensional Data
7.5.5 Examples: High–Dimensional Applications in Genomics
7.5.6 Example: Motif Discovery for Protein Sequences
7.5.7 Example: The Netflix Prize
7.5.8 Example: Credit Scoring
Notes
Exercises
8 Models for Multinomial Responses
8.1 Nominal Responses: Baseline–Category Logit Models
8.1.1 Baseline–Category Logits
8.1.2 Example: Alligator Food Choice
8.1.3 Estimating Response Probabilities
8.1.4 Fitting Baseline–Category Logistic Models
8.1.5 Multicategory Logit Model as a Multivariate GLM
8.1.6 Multinomial Probit Models
8.1.7 Example: Effect of Menu Pricing
8.2 Ordinal Responses: Cumulative Logit Models
8.2.1 Cumulative Logits
8.2.2 Proportional Odds Form of Cumulative Logit Model
8.2.3 Latent Variable Motivation for Proportional Odds Structure
8.2.4 Example: Happiness and Traumatic Events
8.2.5 Checking the Proportional Odds Assumption
8.3 Ordinal Responses: Alternative Models
8.3.1 Cumulative Link Models
8.3.2 Cumulative Probit and Log-Log Models
8.3.3 Example: Happiness Revisited with Cumulative Probits
8.3.4 Adjacent–Categories Logit Models
8.3.5 Example: Happiness Revisited
8.3.6 Continuation–Ratio Logit Models
8.3.7 Example: Developmental Toxicity Study with Pregnant Mice
8.3.8 Stochastic Ordering Location Effects Versus Dispersion Effects
8.3.9 Summarizing Predictive Power of Explanatory Variables
8.4 Testing Conditional Independence in I × J × K Tables
8.4.1 Testing Conditional Independence Using Multinomial Models
8.4.2 Example: Homosexual Marriage and Religious Fundamentalism
8.4.3 Generalized Cochran–Mantel–Haenszel Tests for I × J × K Tables
8.4.4 Example: Homosexual Marriage Revisited
8.4.5 Related Score Tests for Multinomial Logit Models
8.5 Discrete-Choice Models
8.5.1 Conditional Logits for Characteristics of the Choices
8.5.2 Multinomial Logit Model Expressed as Discrete-Choice Model
8.5.3 Example: Shopping Destination Choice
8.5.4 Multinomial Probit Discrete-Choice Models
8.5.5 Extensions: Nested Logit and Mixed Logit Models
8.5.6 Extensions: Discrete Choice with Ordered Categories
8.6 Bayesian Modeling of Multinomial Responses
8.6.1 Bayesian Fitting of Cumulative Link Models
8.6.2 Example: Cannabis Use and Mother's Age
8.6.3 Bayesian Fitting of Multinomial Logit and Probit Models
8.6.4 Example: Alligator Food Choice Revisited
Notes
Exercises
9 Loglinear Models for Contingency Tables
9.1 Loglinear Models for Two-way Tables
9.1.1 Independence Model for a Two-Way Table
9.1.2 Interpretation of Loglinear Model Parameters
9.1.3 Saturated Model for a Two-Way Table
9.1.4 Alternative Parameter Constraints
9.1.5 Hierarchical Versus Nonhierarchical Models
9.1.6 Multinomial Models for Cell Probabilities
9.2 Loglinear Models for Independence and Interaction in Three-way Tables
9.2.1 Types of Independence
9.2.2 Homogeneous Association and Three-Factor Interaction
9.2.3 Interpretation of Loglinear Model Parameters
9.2.4 Example: Alcohol, Cigarette, and Marijuana Use
9.3 Inference for Loglinear Models
9.3.1 Chi-Squared Goodness-of-Fit Tests
9.3.2 Inference about Conditional Associations
9.4 Loglinear Models for Higher Dimensions
9.4.1 Models for Four-Way Contingency Tables
9.4.2 Example: Automobile Accidents and Seat-Belt Use
9.4.3 Large Samples and Statistical Versus Practical Significance
9.4.4 Dissimilarity Index
9.5 Loglinear–Logistic Model Connection
9.5.1 Using Logistic Models to Interpret Loglinear Models
9.5.2 Example: Auto Accidents and Seat-Belts Revisited
9.5.3 Equivalent Loglinear and Logistic Models
9.5.4 Example: Detecting Gene–Environment Interactions in Case–Control Studies
9.6 Loglinear Model Fitting: Likelihood Equations and Asymptotic Distributions
9.6.1 Minimal Sufficient Statistics
9.6.2 Likelihood Equations for Loglinear Models
9.6.3 Unique ML Estimates Match Data in Sufficient Marginal Tables
9.6.4 Direct Versus Iterative Calculation of Fitted Values
9.6.5 Decomposable Models
9.6.6 Chi-Squared Goodness-of-Fit Tests
9.6.7 Covariance Matrix of ML Parameter Estimators
9.6.8 Connection Between Multinomial and Poisson Loglinear Models
9.6.9 Distribution of Probability Estimators
9.6.10 Proof of Uniqueness of ML Estimates
9.6.11 Pseudo ML for Complex Sampling Designs
9.7 Loglinear Model Fitting: Iterative Methods and Their Application
9.7.1 Newton–Raphson Method
9.7.2 Iterative Proportional Fitting
9.7.3 Comparison of IPF and Newton–Raphson Iterative Methods
9.7.4 Raking a Table: Contingency Table Standardization
Notes
Exercises
10 Building and Extending Loglinear Models
10.1 Conditional Independence Graphs and Collapsibility
10.1.1 Conditional Independence Graphs
10.1.2 Graphical Loglinear Models
10.1.3 Collapsibility in Three-Way Contingency Tables
10.1.4 Collapsibility for Multiway Tables
10.2 Model Selection and Comparison
10.2.1 Considerations in Model Selection
10.2.2 Example: Model Building for Student Survey
10.2.3 Loglinear Model Comparison Statistics
10.2.4 Partitioning Chi-Squared with Model Comparisons
10.2.5 Identical Marginal and Conditional Tests of Independence
10.3 Residuals for Detecting Cell-Specific Lack of Fit
10.3.1 Residuals for Loglinear Models
10.3.2 Example: Student Survey Revisited
10.3.3 Identical Loglinear and Logistic Standardized Residuals
10.4 Modeling Ordinal Associations
10.4.1 Linear-by-Linear Association Model for Two-Way Tables
10.4.2 Corresponding Logistic Model for Adjacent Responses
10.4.3 Likelihood Equations and Model Fitting
10.4.4 Example: Sex and Birth Control Opinions Revisited
10.4.5 Directed Ordinal Test of Independence
10.4.6 Row Effects and Column Effects Association Models
10.4.7 Example: Estimating Category Scores for Premarital Sex
10.4.8 Ordinal Variables in Models for Multiway Tables
10.5 Generalized Loglinear and Association Models, Correlation Models, and Correspondence Analysis
10.5.1 Generalized Loglinear Model
10.5.2 Multiplicative Row and Column Effects Model
10.5.3 Example: Mental Health and Parents' SES
10.5.4 Correlation Models
10.5.5 Correspondence Analysis
10.5.6 Model Selection and Score Choice for Ordinal Variables
10.6 Empty Cells and Sparseness in Modeling Contingency Tables
10.6.1 Empty Cells: Sampling Versus Structural Zeros
10.6.2 Existence of Estimates in Loglinear Models
10.6.3 Effects of Sparseness on X², G², and Model-Based Tests
10.6.4 Alternative Sparse Data Asymptotics
10.6.5 Adding Constants to Cells of a Contingency Table
10.7 Bayesian Loglinear Modeling
10.7.1 Estimating Loglinear Model Parameters in Two-Way Tables
10.7.2 Example: Polarized Opinions by Political Party
10.7.3 Bayesian Loglinear Modeling of Multidimensional Tables
10.7.4 Graphical Conditional Independence Models
Notes
Exercises
11 Models for Matched Pairs
11.1 Comparing Dependent Proportions
11.1.1 Confidence Intervals Comparing Dependent Proportions
11.1.2 McNemar Test Comparing Dependent Proportions
11.1.3 Example: Changes in Presidential Election Voting
11.1.4 Increased Precision with Dependent Samples
11.1.5 Small-Sample Test Comparing Dependent Proportions
11.1.6 Connection Between McNemar and Cochran–Mantel–Haenszel Tests
11.1.7 Subject-Specific and Population-Averaged (Marginal) Tables
11.2 Conditional Logistic Regression for Binary Matched Pairs
11.2.1 Subject-Specific Versus Marginal Models for Matched Pairs
11.2.2 Logistic Models with Subject-Specific Probabilities
11.2.3 Conditional ML Inference for Binary Matched Pairs
11.2.4 Random Effects in Binary Matched-Pairs Model
11.2.5 Conditional Logistic Regression for Matched Case–Control Studies
11.2.6 Conditional Logistic Regression for Matched Pairs with Multiple Predictors
11.2.7 Marginal Models and Subject-Specific Models: Extensions
11.3 Marginal Models for Square Contingency Tables
11.3.1 Marginal Models for Nominal Classifications
11.3.2 Example: Regional Migration
11.3.3 Marginal Models for Ordinal Classifications
11.3.4 Example: Opinions on Premarital and Extramarital Sex
11.4 Symmetry, Quasi-Symmetry, and Quasi-Independence
11.4.1 Symmetry as Logistic and Loglinear Models
11.4.2 Quasi-symmetry
11.4.3 Marginal Homogeneity and Quasi-symmetry
11.4.4 Quasi-independence
11.4.5 Example: Migration Revisited
11.4.6 Ordinal Quasi-symmetry
11.4.7 Example: Premarital and Extramarital Sex Revisited
11.5 Measuring Agreement Between Observers
11.5.1 Agreement: Departures from Independence
11.5.2 Using Quasi-independence to Analyze Agreement
11.5.3 Quasi-symmetry and Agreement Modeling
11.5.4 Kappa: A Summary Measure of Agreement
11.5.5 Weighted Kappa: Quantifying Disagreement
11.5.6 Extensions to Multiple Observers
11.6 Bradley-Terry Model for Paired Preferences
11.6.1 Bradley-Terry Model
11.6.2 Example: Major League Baseball Rankings
11.6.3 Example: Home Team Advantage in Baseball
11.6.4 Bradley-Terry Model and Quasi-symmetry
11.6.5 Extensions to Ties and Ordinal Pairwise Evaluations
11.7 Marginal Models and Quasi-Symmetry Models for Matched Sets
11.7.1 Marginal Homogeneity, Complete Symmetry, and Quasi-symmetry
11.7.2 Types of Marginal Symmetry
11.7.3 Comparing Binary Marginal Distributions in Multiway Tables
11.7.4 Example: Attitudes Toward Legalized Abortion
11.7.5 Marginal Homogeneity for a Multicategory Response
11.7.6 Wald and Generalized CMH Score Tests of Marginal Homogeneity
Notes
Exercises
12 Clustered Categorical Data: Marginal and Transitional Models
12.1 Marginal Modeling: Maximum Likelihood Approach
12.1.1 Example: Longitudinal Study of Mental Depression
12.1.2 Modeling a Repeated Multinomial Response
12.1.3 Example: Insomnia Clinical Trial
12.1.4 ML Fitting of Marginal Logistic Models: Constraints on Cell Probabilities
12.1.5 ML Fitting of Marginal Logistic Models: Other Methods
12.2 Marginal Modeling: Generalized Estimating Equations (GEEs) Approach
12.2.1 Generalized Estimating Equations Methodology: Basic Ideas
12.2.2 Example: Longitudinal Mental Depression Revisited
12.2.3 Example: Multinomial GEE Approach for Insomnia Trial
12.3 Quasi-Likelihood and Its GEE Multivariate Extension: Details
12.3.1 The Univariate Quasi-likelihood Method
12.3.2 Properties of Quasi-likelihood Estimators
12.3.3 Sandwich Covariance Adjustment for Variance Misspecification
12.3.4 GEE Multivariate Methodology: Technical Details
12.3.5 Working Associations Characterized by Odds Ratios
12.3.6 GEE Approach: Multinomial Responses
12.3.7 Dealing with Missing Data
12.4 Transitional Models: Markov Chain and Time Series Models
12.4.1 Markov Chains
12.4.2 Example: Changes in Evapotranspiration Rates
12.4.3 Transitional Models with Explanatory Variables
12.4.4 Example: Child's Respiratory Illness and Maternal Smoking
12.4.5 Example: Initial Response in Matched Pair as a Covariate
12.4.6 Transitional Models and Loglinear Conditional Models
Notes
Exercises
13 Clustered Categorical Data: Random Effects Models
13.1 Random Effects Modeling of Clustered Categorical Data
13.1.1 Generalized Linear Mixed Model
13.1.2 Logistic GLMM with Random Intercept for Binary Matched Pairs
13.1.3 Example: Changes in Presidential Voting Revisited
13.1.4 Extension: Rasch Model and Item Response Models
13.1.5 Random Effects Versus Conditional ML Approaches
13.2 Binary Responses: Logistic-Normal Model
13.2.1 Shared Random Effect Implies Nonnegative Marginal Correlations
13.2.2 Interpreting Heterogeneity in Logistic-Normal Models
13.2.3 Connections Between Random Effects Models and Marginal Models
13.2.4 Comments About GLMMs Versus Marginal Models
13.3 Examples of Random Effects Models for Binary Data
13.3.1 Example: Small-Area Estimation of Binomial Proportions
13.3.2 Modeling Repeated Binary Responses: Attitudes About Abortion
13.3.3 Example: Longitudinal Mental Depression Study Revisited
13.3.4 Example: Capture–Recapture Prediction of Population Size
13.3.5 Example: Heterogeneity Among Multicenter Clinical Trials
13.3.6 Meta-analysis Using a Random Effects Approach
13.3.7 Alternative Formulations of Random Effects Models
13.3.8 Example: Matched Pairs with a Bivariate Binary Response
13.3.9 Time Series Models Using Autocorrelated Random Effects
13.3.10 Example: Oxford and Cambridge Annual Boat Race
13.4 Random Effects Models for Multinomial Data
13.4.1 Cumulative Logit Model with Random Intercept
13.4.2 Example: Insomnia Study Revisited
13.4.3 Example: Combining Measures on Ordinal Items
13.4.4 Example: Cluster Sampling
13.4.5 Baseline-Category Logit Models with Random Effects
13.4.6 Example: Effectiveness of Housing Program
13.5 Multilevel Modeling
13.5.1 Hierarchical Random Terms: Partitioning Variability
13.5.2 Example: Children's Care for an Unmarried Mother
13.6 GLMM Fitting, Inference, and Prediction
13.6.1 Marginal Likelihood and Maximum Likelihood Fitting
13.6.2 Gauss–Hermite Quadrature Methods for ML Fitting
13.6.3 Monte Carlo and EM Methods for ML Fitting
13.6.4 Laplace and Penalized Quasi-likelihood Approximations to ML
13.6.5 Inference for GLMM Parameters
13.6.6 Prediction Using Random Effects
13.7 Bayesian Multivariate Categorical Modeling
13.7.1 Marginal Homogeneity Analyses for Matched Pairs
13.7.2 Bayesian Approaches to Meta-analysis and Multicenter Trials
13.7.3 Example: Bayesian Analyses for a Multicenter Trial
13.7.4 Bayesian GLMMs and Marginal Models
Notes
Exercises
14 Other Mixture Models for Discrete Data
14.1 Latent Class Models
14.1.1 Independence Given a Latent Categorical Variable
14.1.2 Fitting Latent Class Models
14.1.3 Example: Latent Class Model for Rater Agreement
14.1.4 Example: Latent Class Models for Capture-Recapture
14.1.5 Example: Latent Class Transitional Models
14.2 Nonparametric Random Effects Models
14.2.1 Logistic Models with Unspecified Random Effects Distribution
14.2.2 Example: Attitudes About Legalized Abortion
14.2.3 Example: Nonparametric Mixing of Logistic Regressions
14.2.4 Is Misspecification of Random Effects a Serious Problem?
14.2.5 Rasch Mixture Model
14.2.6 Example: Modeling Rater Agreement Revisited
14.2.7 Nonparametric Mixtures and Quasi-symmetry
14.2.8 Example: Attitudes About Legalized Abortion Revisited
14.3 Beta-Binomial Models
14.3.1 Beta-Binomial Distribution
14.3.2 Models Using the Beta-Binomial Distribution
14.3.3 Quasi-likelihood with Beta-Binomial Type Variance
14.3.4 Example: Teratology Overdispersion Revisited
14.3.5 Conjugate Mixture Models
14.4 Negative Binomial Regression
14.4.1 Gamma Mixture of Poissons Is Negative Binomial
14.4.2 Negative Binomial Regression Modeling
14.4.3 Example: Frequency of Knowing Homicide Victims
14.5 Poisson Regression with Random Effects
14.5.1 A Poisson GLMM
14.5.2 Marginal Model Implied by Poisson GLMM
14.5.3 Example: Homicide Victim Frequency Revisited
14.5.4 Negative Binomial Models Versus Poisson GLMMs
Notes
Exercises
15 Non-Model-Based Classification and Clustering
15.1 Classification: Linear Discriminant Analysis
15.1.1 Classification with Normally Distributed Predictors
15.1.2 Example: Horseshoe Crab Satellites Revisited
15.1.3 Multicategory Classification and Other Versions of Discriminant Analysis
15.1.4 Classification Methods for High Dimensions
15.1.5 Discriminant Analysis Versus Logistic Regression
15.2 Classification: Tree-Structured Prediction
15.2.1 Classification Trees
15.2.2 Example: Classification Tree for a Health Care Application
15.2.3 How Does the Classification Tree Grow?
15.2.4 Pruning a Tree and Checking Prediction Accuracy
15.2.5 Classification Trees Versus Logistic Regression
15.2.6 Support Vector Machines for Classification
15.3 Cluster Analysis for Categorical Data
15.3.1 Supervised Versus Unsupervised Learning
15.3.2 Measuring Dissimilarity Between Observations
15.3.3 Clustering Algorithms: Partitions and Hierarchies
15.3.4 Example: Clustering States on Election Results
Notes
Exercises
16 Large- and Small-Sample Theory for Multinomial Models
16.1 Delta Method
16.1.1 O, o Rates of Convergence
16.1.2 Delta Method for a Function of a Random Variable
16.1.3 Delta Method for a Function of a Random Vector
16.1.4 Asymptotic Normality of Functions of Multinomial Counts
16.1.5 Delta Method for a Vector Function of a Random Vector
16.1.6 Joint Asymptotic Normality of Log Odds Ratios
16.2 Asymptotic Distributions of Estimators of Model Parameters and Cell Probabilities
16.2.1 Asymptotic Distribution of Model Parameter Estimator
16.2.2 Asymptotic Distribution of Cell Probability Estimators
16.2.3 Model Smoothing Is Beneficial
16.3 Asymptotic Distributions of Residuals and Goodness-of-fit Statistics
16.3.1 Joint Asymptotic Normality of p and π̂
16.3.2 Asymptotic Distribution of Pearson and Standardized Residuals
16.3.3 Asymptotic Distribution of Pearson X² Statistic
16.3.4 Asymptotic Distribution of Likelihood-Ratio Statistic
16.3.5 Asymptotic Noncentral Distributions
16.4 Asymptotic Distributions for Logit/Loglinear Models
16.4.1 Asymptotic Covariance Matrices
16.4.2 Connection with Poisson Loglinear Models
16.5 Small-Sample Significance Tests for Contingency Tables
16.5.1 Exact Conditional Distribution for I × J Tables Under Independence
16.5.2 Exact Tests of Independence for I × J Tables
16.5.3 Example: Sexual Orientation and Party ID
16.6 Small-Sample Confidence Intervals for Categorical Data
16.6.1 Small-Sample CIs for a Binomial Parameter
16.6.2 CIs Based on Tests Using the Mid P-Value
16.6.3 Example: Proportion of Vegetarians Revisited
16.6.4 Small-Sample CIs for Odds Ratios
16.6.5 Example: Fisher's Tea Taster Revisited
16.6.6 Small-Sample CIs for Logistic Regression Parameters
16.6.7 Example: Diarrhea and an Antibiotic
16.6.8 Unconditional Small-Sample CIs for Difference of Proportions
16.7 Alternative Estimation Theory for Parametric Models
16.7.1 Weighted Least Squares for Categorical Data
16.7.2 Inference Using the WLS Approach to Model Fitting
16.7.3 Scope of WLS Versus ML Estimation
16.7.4 Minimum Chi-Squared Estimators
16.7.5 Minimum Discrimination Information
Notes
Exercises
17 Historical Tour of Categorical Data Analysis
17.1 Pearson-Yule Association Controversy
17.2 R. A. Fisher's Contributions
17.3 Logistic Regression
17.4 Multiway Contingency Tables and Loglinear Models
17.5 Bayesian Methods for Categorical Data
17.6 A Look Forward, and Backward
Appendix A Statistical Software for Categorical Data Analysis
Appendix B Chi-Squared Distribution Values
References
Author Index
Example Index
Subject Index

Publication date (per publisher) 24 April 2014
Series Wiley Series in Probability and Statistics
Language English
Subject areas Mathematics / Computer Science › Mathematics › Statistics
Mathematics / Computer Science › Mathematics › Probability / Combinatorics
Engineering
Keywords Analysis • anyone expecting • Applications • biomedical • Book • categorical • categorical data • categorical data analysis • Data • Data Mining • Data Mining Statistics • Datenanalyse • Delight • dramatically • Edition • essential desktop • Kategorielle Datenanalyse • Multivariate Analyse • multivariate analysis • musthave • Reference • Research • Second • Statistical Methods • Statistics • Statistik • Total • use
ISBN-10 1-118-71085-1 / 1118710851
ISBN-13 978-1-118-71085-2 / 9781118710852
PDF (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the Adobe Digital Editions software (free). We advise against using the OverDrive Media Console, as experience shows it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: You can read this eBook on either Apple or Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
