
Statistical Analysis with Missing Data (eBook)

eBook Download: EPUB
2019 | 3rd edition
John Wiley & Sons (publisher)
978-1-118-59569-5 (ISBN)


Statistical Analysis with Missing Data - Roderick J. A. Little, Donald B. Rubin
An up-to-date, comprehensive treatment of a classic text on missing data in statistics
The topic of missing data has gained considerable attention in recent decades. This new edition by two acknowledged experts on the subject offers an up-to-date account of practical methodology for handling missing data problems. Blending theory and application, authors Roderick Little and Donald Rubin review historical approaches to the subject and describe simple methods for multivariate analysis with missing values. They then provide a coherent theory for analysis of problems based on likelihoods derived from statistical models for the data and the missing data mechanism, and then they apply the theory to a wide range of important missing data problems.
Statistical Analysis with Missing Data, Third Edition starts by introducing readers to the subject and approaches toward solving it. It looks at the patterns and mechanisms that create the missing data, as well as a taxonomy of missing data. It then goes on to examine missing data in experiments, before discussing complete-case and available-case analysis, including weighting methods. The new edition expands its coverage to include recent work on topics such as nonresponse in sample surveys, causal inference, diagnostic methods, and sensitivity analysis, among a host of other topics.
  • An updated 'classic' written by renowned authorities on the subject
  • Features over 150 exercises (including many new ones)
  • Covers recent work on important methods like multiple imputation, robust alternatives to weighting, and Bayesian methods
  • Revises previous topics based on past student feedback and class experience
  • Contains an updated and expanded bibliography

The authors were awarded The Karl Pearson Prize in 2017 by the International Statistical Institute, for a research contribution that has had profound influence on statistical theory, methodology or applications. Their work 'has been no less than defining and transforming.' (ISI)
Statistical Analysis with Missing Data, Third Edition is an ideal textbook for upper undergraduate and/or beginning graduate level students of the subject. It is also an excellent source of information for applied statisticians and practitioners in government and industry.

Roderick J. A. Little, PhD, is Richard D. Remington Distinguished University Professor of Biostatistics, Professor of Statistics, and Research Professor, Institute for Social Research, at the University of Michigan.

Donald B. Rubin, PhD, is Professor, Yau Mathematical Sciences Center, Tsinghua University; Murray Shusterman Senior Research Fellow, Department of Statistical Science, Fox School of Business at Temple University; and Professor Emeritus, Harvard University.


1 Introduction


1.1 The Problem of Missing Data


Standard statistical methods have been developed to analyze rectangular data sets. Traditionally, the rows of the data matrix represent units, also called cases, observations, or subjects depending on context, and the columns represent characteristics or variables measured for each unit. The entries in the data matrix are nearly always real numbers, either representing the values of essentially continuous variables, such as age and income, or representing categories of response, which may be ordered (e.g., level of education) or unordered (e.g., race, sex). This book concerns the analysis of such a data matrix when some of the entries in the matrix are not observed. For example, respondents in a household survey may refuse to report income; in an industrial experiment, some results are missing because of mechanical failures unrelated to the experimental process; in an opinion survey, some individuals may be unable to express a preference for one candidate over another.

In the first two examples, it is natural to treat the values that are not observed as missing, in the sense that there are actual underlying values that would have been observed if survey techniques had been better or the industrial equipment had been better maintained. In the third example, however, it is less clear that a well-defined candidate preference has been masked by the nonresponse; thus, it is less natural to treat the unobserved values as missing. Instead, in this example, the lack of a response is essentially an additional point in the sample space of the variable being measured, which identifies a “no preference” or “don't know” stratum of the population for that variable.

Older review articles on the statistical analysis of data with missing values include Afifi and Elashoff (1966), Hartley and Hocking (1971), Orchard and Woodbury (1972), Dempster et al. (1977), Little and Rubin (1983a), Little and Schenker (1994), and Little (1997). More recent literature includes books on the topic, such as Schafer (1997), van Buuren (2012), Carpenter and Kenward (2014), and Raghunathan (2015).

Part I considers basic approaches, including analysis of the complete cases and associated weighting methods, and methods that impute (that is, fill in) the missing values. Part II considers more principled approaches based on statistical models and the associated likelihood function, and Part III provides applications of these methods. Our generally preferred philosophy of inference can be termed “calibrated Bayes,” where the inference is Bayesian, using models that yield inferences with good frequentist properties (Rubin 1984, 2019; Little 2006). For example, 95% Bayesian credibility intervals should have approximately 95% confidence coverage in repeated sampling from the population. The method of multiple imputation has such a Bayesian justification but can be used in conjunction with standard frequentist approaches to the complete-data inference.
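The calibration idea can be illustrated with a small simulation. The sketch below is not taken from the book: it assumes a normal model with known standard deviation and a flat prior on the mean, so that the 95% posterior interval is the sample mean plus or minus about 1.96 standard errors. The function name and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def coverage_of_flat_prior_interval(n=20, mu=1.0, sigma=2.0, reps=4000, seed=0):
    """Fraction of repeated samples whose 95% posterior interval covers mu.

    Assumed model (illustration only): y_i ~ N(mu, sigma^2) with sigma known
    and a flat prior on mu, so the posterior for mu is N(ybar, sigma^2 / n).
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        ybar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Does the central 95% posterior interval contain the true mean?
        if ybar - half_width <= mu <= ybar + half_width:
            hits += 1
    return hits / reps


coverage = coverage_of_flat_prior_interval()
print(f"empirical coverage: {coverage:.3f}")
```

Under these assumptions the printed coverage should be close to 0.95. A model whose posterior intervals fell well short of their nominal coverage in such a check would not be calibrated in the sense described above.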

Most statistical software packages allow the identification of nonrespondents by creating one or more special codes for those entries of the data matrix that are not observed. More than one code might be used to identify particular types of nonresponse, such as “don't know,” or “refuse to answer,” or “out of legitimate range.” Some statistical software excludes units that have missing value codes for any of the variables involved in an analysis. This strategy, which is often termed a “complete-case analysis,” is generally inappropriate because the investigator is usually interested in making inferences about the entire target population, rather than about the portion of the target population that would provide responses on all relevant variables in the analysis. Our aim is to describe a collection of techniques that are more generally appropriate than complete-case analysis when missing entries in the data set mask the underlying values.
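As a minimal sketch of what such software does, the following snippet marks nonresponse with special codes and then drops every unit that carries a code on any variable. The codes (-7, -8, -9) and the toy records are invented for illustration and do not correspond to any particular package or data set.

```python
from collections import Counter

# Hypothetical missing-value codes; real packages use their own conventions.
MISSING_CODES = {-7: "refused", -8: "don't know", -9: "out of range"}

# Toy survey records, invented for illustration: (age, income).
records = [
    (34, 52000),
    (51, -7),
    (29, 31000),
    (-8, 47000),
    (62, -9),
]


def is_missing(value):
    """True if the value is one of the special nonresponse codes."""
    return value in MISSING_CODES


# Complete-case analysis keeps only units observed on every variable.
complete_cases = [row for row in records if not any(is_missing(v) for v in row)]
dropped = len(records) - len(complete_cases)

# Tally the reasons for nonresponse across all entries.
reasons = Counter(MISSING_CODES[v] for row in records for v in row if is_missing(v))

print(complete_cases)
print(dropped)
print(dict(reasons))
```

In this toy example three of the five units are discarded, even though each of them is observed on one of the two variables.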

Definition 1.1 Missing data are unobserved values that would be meaningful for analysis if observed; in other words, a missing value hides a meaningful value.

When Definition 1.1 applies, it makes sense to consider analyses that effectively predict, or “impute” (that is, fill in), the unobserved values. If, on the other hand, Definition 1.1 does not apply, then imputing the unobserved values makes little sense, and an analysis that creates strata of the population defined by the pattern of observed data is more appropriate. Example 1.1 describes a situation with longitudinal data on obesity where Definition 1.1 clearly makes sense. Example 1.2 describes the case of a randomized experiment where it makes sense for one outcome variable (survival) but not for another (quality of life); and Example 1.3 describes a situation in opinion polling where Definition 1.1 may or may not make sense, depending on the specific setting.

Example 1.1 Nonresponse for a Binary Outcome Measured at Three Time Points. Woolson and Clarke (1984) analyze data from the Muscatine Coronary Risk Factor Study, a longitudinal study of coronary risk factors in schoolchildren. Table 1.1 summarizes the pattern of missing data in the data matrix. Five variables (sex, age, and obesity for three rounds of the survey) are recorded for 4856 units; sex and age are completely recorded, but the three obesity variables are sometimes missing, thereby generating six patterns of missingness. Because age is recorded in five categories and the obesity variables are binary, the data can be displayed as counts in a contingency table. Table 1.2 displays the data in this form, with missingness of obesity treated as a third category of the variable, where O = obese, N = not obese, and M = missing. Thus, the pattern MON denotes missing at the first round, obese at the second round, and not obese at the third round, and the other five patterns are defined analogously.

Table 1.1 Example 1.1: data matrix for children in a survey summarized by the pattern of missing data: 1 = missing, 0 = observed

                             Variables
Pattern   Age   Sex   Weight 1   Weight 2   Weight 3   No. of children with pattern
A         0     0     0          0          0          1770
B         0     0     0          0          1           631
C         0     0     0          1          0           184
D         0     0     1          0          0           645
E         0     0     0          1          1           756
F         0     0     1          0          1           370
G         0     0     1          1          0           500
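
The counts in Table 1.1 can be checked with a few lines of code. The sketch below copies the pattern counts from the table (1 = missing, 0 = observed) and derives the total number of children, the share with complete data, and the number missing the obesity variable at each round. The variable names are illustrative only.

```python
# Counts copied from Table 1.1. Each entry is
# ((Weight 1, Weight 2, Weight 3), number of children),
# with 1 = missing and 0 = observed. Age and sex are always observed.
pattern_counts = {
    "A": ((0, 0, 0), 1770),
    "B": ((0, 0, 1), 631),
    "C": ((0, 1, 0), 184),
    "D": ((1, 0, 0), 645),
    "E": ((0, 1, 1), 756),
    "F": ((1, 0, 1), 370),
    "G": ((1, 1, 0), 500),
}

total = sum(count for _, count in pattern_counts.values())
complete_fraction = pattern_counts["A"][1] / total

# Number of children with the obesity variable missing at each round.
missing_by_round = [
    sum(count for pattern, count in pattern_counts.values() if pattern[r] == 1)
    for r in range(3)
]

print(total)                        # 4856, matching the number of units
print(round(complete_fraction, 2))  # 0.36
print(missing_by_round)             # [1515, 1440, 1757]
```

Only pattern A, about 36% of the children, would be retained in a complete-case analysis of all three rounds.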

Woolson and Clarke analyze these data by fitting multinomial distributions over the 3³ − 1 = 26 response categories for each column in Table 1.2. That is, missingness is regarded as defining strata of the population. We suspect that for these data, it makes good sense to regard the nonrespondents as having a true underlying value for the obesity variable. Hence, we would argue for treating the nonresponse categories as missing value indicators and estimating the joint distribution of the three dichotomous outcome variables from the partially missing data. Appropriate methods for handling such categorical data with missing values effectively impute the values of obesity that are not observed, as described in Chapter 12. The methods involve quite straightforward modifications of existing algorithms for categorical data analysis, which are now widely available in statistical software packages. For an analysis of these data that averages over patterns of missing data, see Ekholm and Skinner (1998).
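
The count of 26 response categories can be checked by enumeration. The sketch below lists every three-round pattern over the codes O, N, and M and removes one cell. It assumes the excluded cell is MMM (missing at all three rounds), which is consistent with Table 1.1 containing no such pattern but is not stated explicitly in the text.

```python
from itertools import product

# O = obese, N = not obese, M = missing, one code per survey round.
codes = "ONM"
all_patterns = ["".join(p) for p in product(codes, repeat=3)]

# Assumption: the single excluded cell is MMM, since Table 1.1 has no
# pattern in which obesity is missing at all three rounds.
response_categories = [p for p in all_patterns if p != "MMM"]

fully_observed = [p for p in response_categories if "M" not in p]
partially_missing = [p for p in response_categories if "M" in p]

print(len(all_patterns))         # 27
print(len(response_categories))  # 26
print(len(fully_observed))       # 8
print(len(partially_missing))    # 18
```

Treating the 18 partially missing cells as missing-value indicators, rather than as categories in their own right, leaves the 8 fully observed cells as the joint distribution to be estimated.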

Table 1.2 Example 1.1: number of children classified by population and relative weight category in three rounds of a survey

Males Females
Response Age group Age...

Publication date (per publisher) 21.3.2019
Series Wiley Series in Probability and Statistics
Language English
Subject area Mathematics / Computer Science › Mathematics › Statistics
Mathematics / Computer Science › Mathematics › Probability / Combinatorics
Keywords Applied Probability & Statistics • Bayesian analysis • bayesian data analysis • Bayesian methods • Biostatistics • Data Analysis • disclosure limitation • guide to missing data • handling missing data • Mathematics • Measurement error • methods for handling missing data • missing data • Missing data analysis • missing data and applied statistics • missing-data applications • missing data handbook • missing data theory • Probability • robust inference • Statistical Analysis • Statistical Data Analysis • Statistics • statistics and missing data
ISBN-10 1-118-59569-6 / 1118595696
ISBN-13 978-1-118-59569-5 / 9781118595695
EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, so EPUB also works well on mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which in our experience frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
