Multiple Imputation and its Application (eBook)
John Wiley & Sons (Publisher)
978-1-118-44261-6 (ISBN)
A practical guide to analysing partially observed data.
Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods.
This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms, and the application of MI to increasingly complex data structures.
Multiple Imputation and its Application:
- Discusses the issues raised by the analysis of partially observed data, and the assumptions on which analyses rest.
- Presents a practical guide to the issues to consider when analysing incomplete data from both observational studies and randomized trials.
- Provides a detailed discussion of the practical use of MI with real-world examples drawn from medical and social statistics.
- Explores handling non-linear relationships and interactions with multiple imputation, survival analysis, multilevel multiple imputation, sensitivity analysis via multiple imputation, using non-response weights with multiple imputation and doubly robust multiple imputation.
Multiple Imputation and its Application is aimed at quantitative researchers and students in the medical and social sciences with the aim of clarifying the issues raised by the analysis of incomplete data, outlining the rationale for MI and describing how to consider and address the issues that arise in its application.
James Carpenter, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, UK.
Michael G. Kenward, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, UK.
Amongst other areas, Professor Kenward has worked in pre-clinical and clinical medicine and epidemiology for over twenty years, holding a number of international positions. He has also been a statistical consultant for over twenty years, predominantly in medical research. He has taught over 80 short courses in biostatistics throughout the world, and is the author of the book Analysis of Repeated Measurements.
Both authors act as consultants in missing data problems in biostatistics for several major pharmaceutical companies. They have been funded since 2002 by the UK Economic and Social Research Council to develop multiple imputation software for multilevel data, and to provide training for research scientists in the handling of missing data from observational studies.
Chapter 1
Introduction
Collecting, analysing and drawing inferences from data are central to research in the medical and social sciences. Unfortunately, for any number of reasons, it is rarely possible to collect all the intended data. The ubiquity of missing data, and the problems this poses for both analysis and inference, has spawned a substantial statistical literature dating from the 1950s. At that time, when statistical computing was in its infancy, many analyses were only feasible because of the carefully planned balance in the dataset (for example, the same number of observations on each unit). Missing data meant the available data for analysis were unbalanced, thus complicating the planned analysis and in some instances rendering it infeasible. Early work on the problem was therefore largely computational (e.g. Healy and Westmacott, 1956; Afifi and Elashoff, 1966; Orchard and Woodbury, 1972; Dempster et al., 1977).
The wider question of the consequences of nontrivial proportions of missing data for inference was neglected until a seminal paper by Rubin (1976). This set out a typology for assumptions about the reasons for missing data, and sketched their implications for analysis and inference. It marked the beginning of a broad stream of research about the analysis of partially observed data. The literature is now huge, and continues to grow, both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods.
For a broad overview of the literature, a good place to start is one of the excellent recent textbooks. Little and Rubin (2002) write for applied statisticians. They give a good overview of likelihood methods and an introduction to multiple imputation. Allison (2002) presents a less technical overview. Schafer (1997) is more algorithmic, focusing on the EM algorithm and on imputation using the multivariate normal and general location models. Molenberghs and Kenward (2007) focus on clinical studies, while Daniels and Hogan (2008) focus on longitudinal studies with a Bayesian emphasis.
The above books concentrate on parametric approaches. However, there is also a growing literature based around using inverse probability weighting, in the spirit of Horvitz and Thompson (1952), and associated doubly robust methods. In particular, we refer to the work of Robins and colleagues (e.g. Robins et al., 1995; Scharfstein et al., 1999). Vansteelandt et al. (2009) give an accessible introduction to these developments. A comparison with multiple imputation in a simple setting is given by Carpenter et al. (2006). The pros and cons are debated in Kang and Schafer (2007) and the theory is brought together by Tsiatis (2006).
This book is concerned with a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI was initially proposed by Rubin (1987) in the context of surveys, and increasing awareness among researchers about the possible effects of missing data (e.g. Klebanoff and Cole, 2008) has since led to an upsurge of interest (e.g. Sterne et al., 2009; Kenward and Carpenter, 2007; Schafer, 1999a; Rubin, 1996).
Multiple imputation (MI) is attractive because it is both practical and widely applicable. Recently developed statistical software (see, for example, issue 45 of the Journal of Statistical Software) has placed it within the reach of most researchers in the medical and social sciences, whether or not they have undertaken advanced training in statistics. However, the increasing use of MI in a range of settings beyond that originally envisaged has led to a bewildering proliferation of algorithms and software. Further, the implications of the underlying assumptions for the data at hand are often unclear.
We are writing for researchers in the medical and social sciences with the aim of clarifying the issues raised by missing data, outlining the rationale for MI, explaining the motivation and relationship between the various imputation algorithms, and describing and illustrating its application to increasingly complex data structures.
Central to the analysis of partially observed data is an understanding of why the data are missing and the implications of this for the analysis. This is the focus of the remainder of this chapter. Introducing some of the examples that run through the book, we show how Rubin's typology (Rubin, 1976) provides the foundational framework for understanding the implications of missing data.
1.1 Reasons for missing data
In this section we consider possible reasons for missing data, illustrate these with examples, and draw some preliminary implications for inference. We use the word ‘possible’ advisedly, since with partially observed data we can rarely be sure of the mechanism giving rise to missing data. Instead, a range of possible mechanisms are consistent with the observed data. In practice, we therefore wish to analyse the data under different mechanisms, to establish the robustness of our inference in the face of uncertainty about the missingness mechanism.
All datasets consist of a series of units each of which provides information on a series of items. For example, in a cross-sectional questionnaire survey, the units would be individuals and the items their answers to the questions. In a household survey, the units would be households, and the items information about the household and members of the household. In longitudinal studies, units would typically be individuals while items would be longitudinal data from those individuals. In this book, units therefore correspond to the highest level in multilevel (i.e., hierarchical) data, and unless stated otherwise data from different units are statistically independent.
Within this framework, it is useful to distinguish between units where all the information is missing, termed unit nonresponse, and units that contribute partial information, termed item nonresponse. The statistical issues are the same in both cases, and both can in principle be handled by MI. However, the main focus of this book is the latter.
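To make the unit/item framework and the two kinds of nonresponse concrete, here is a minimal Python sketch. It assumes each unit is stored as a dictionary mapping item names to observed values, with `None` marking a missing item. The names `classify_unit` and `summarise`, and the example survey data, are hypothetical and for illustration only; they are not taken from the book or from any MI software.

```python
from typing import Dict, Optional

# Hypothetical representation: one unit (e.g. a survey respondent) maps
# each item name to its observed value, or None if that item is missing.
Unit = Dict[str, Optional[float]]


def classify_unit(unit: Unit) -> str:
    """Classify a unit as complete, item nonresponse or unit nonresponse."""
    values = list(unit.values())
    n_missing = sum(v is None for v in values)
    if n_missing == 0:
        return "complete"
    if n_missing == len(values):
        # Every item is missing: the unit contributes no information.
        return "unit nonresponse"
    # Some, but not all, items are missing: partial information.
    return "item nonresponse"


def summarise(units: Dict[str, Unit]) -> Dict[str, str]:
    """Classify every unit; units are treated as independent of each other."""
    return {uid: classify_unit(unit) for uid, unit in units.items()}


# Hypothetical cross-sectional survey with three individuals (units).
survey: Dict[str, Unit] = {
    "id1": {"age": 34.0, "income": 21000.0, "bmi": 24.1},
    "id2": {"age": 51.0, "income": None, "bmi": 27.3},
    "id3": {"age": None, "income": None, "bmi": None},
}

if __name__ == "__main__":
    for uid, status in summarise(survey).items():
        print(uid, status)
```

Here `id2` is a case of item nonresponse, the situation this book mainly addresses, while `id3` is a case of unit nonresponse. The sketch only classifies missingness patterns. It says nothing about why the data are missing, which is the question the rest of the chapter takes up.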
Figure 1.1 Detail from a senior mandarin's house front in New Territories, Hong Kong. Photograph by H. Goldstein.
1.2 Examples
We now introduce two key examples, which we return to throughout the book.
| Publication date (per publisher) | 19 December 2012 |
|---|---|
| Series | Statistics in Practice |
| Language | English |
| Subject area | Mathematics / Computer Science ► Mathematics ► Applied Mathematics |
| | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| | Medicine / Pharmacy ► General / Encyclopedias |
| | Studies ► Cross-disciplinary areas ► Epidemiology / Medical Biometry |
| Keywords | AS, Stata, WinBUGS, MLwiN, R, clinical trial settings, MI, MI methods, biostatistics, randomized trials, biometrics • Bayesian analysis • Bayesian methods • Biostatistics • Clinical Trials • Statistics |
| ISBN-10 | 1-118-44261-X / 111844261X |
| ISBN-13 | 978-1-118-44261-6 / 9781118442616 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID when it is downloaded. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to displaying fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB well suited to mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.