
Causal Inference in Statistics (eBook)

A Primer
eBook Download: EPUB
2016
John Wiley & Sons (Publisher)
978-1-119-18686-1 (ISBN)


Causal Inference in Statistics - Judea Pearl, Madelyn Glymour, Nicholas P. Jewell
€34.99 incl. VAT
(CHF 34.15)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the euro price, including VAT.
  • Download available immediately

Many of the concepts and terminology surrounding modern causal inference can be quite intimidating to the novice. Judea Pearl presents a book ideal for beginners in statistics, providing a comprehensive introduction to the field of causality.  Examples from classical statistics are presented throughout to demonstrate the need for causality in resolving decision-making dilemmas posed by data. Causal methods are also compared to traditional statistical methods, whilst questions are provided at the end of each section to aid student learning.



Judea Pearl, Computer Science and Statistics, University of California, Los Angeles, USA.

Madelyn Glymour, Philosophy, Carnegie Mellon University, Pittsburgh, USA.

Nicholas P. Jewell, Biostatistics and Statistics, University of California, Berkeley, USA.


About the Authors
Preface
List of Figures
About the Companion Website

1 Preliminaries: Statistical and Causal Models
1.1 Why Study Causation
1.2 Simpson's Paradox
1.3 Probability and Statistics
1.3.1 Variables
1.3.2 Events
1.3.3 Conditional Probability
1.3.4 Independence
1.3.5 Probability Distributions
1.3.6 The Law of Total Probability
1.3.7 Using Bayes' Rule
1.3.8 Expected Values
1.3.9 Variance and Covariance
1.3.10 Regression
1.3.11 Multiple Regression
1.4 Graphs
1.5 Structural Causal Models
1.5.1 Modeling Causal Assumptions
1.5.2 Product Decomposition

2 Graphical Models and Their Applications
2.1 Connecting Models to Data
2.2 Chains and Forks
2.3 Colliders
2.4 d-separation
2.5 Model Testing and Causal Search

3 The Effects of Interventions
3.1 Interventions
3.2 The Adjustment Formula
3.2.1 To Adjust or not to Adjust?
3.2.2 Multiple Interventions and the Truncated Product Rule
3.3 The Backdoor Criterion
3.4 The Front-Door Criterion
3.5 Conditional Interventions and Covariate-Specific Effects
3.6 Inverse Probability Weighting
3.7 Mediation
3.8 Causal Inference in Linear Systems
3.8.1 Structural versus Regression Coefficients
3.8.2 The Causal Interpretation of Structural Coefficients
3.8.3 Identifying Structural Coefficients and Causal Effect
3.8.4 Mediation in Linear Systems

4 Counterfactuals and Their Applications
4.1 Counterfactuals
4.2 Defining and Computing Counterfactuals
4.2.1 The Structural Interpretation of Counterfactuals
4.2.2 The Fundamental Law of Counterfactuals
4.2.3 From Population Data to Individual Behavior - An Illustration
4.2.4 The Three Steps in Computing Counterfactuals
4.3 Nondeterministic Counterfactuals
4.3.1 Probabilities of Counterfactuals
4.3.2 The Graphical Representation of Counterfactuals
4.3.3 Counterfactuals in Experimental Settings
4.3.4 Counterfactuals in Linear Models
4.4 Practical Uses of Counterfactuals
4.4.1 Recruitment to a Program
4.4.2 Additive Interventions
4.4.3 Personal Decision Making
4.4.4 Sex Discrimination in Hiring
4.4.5 Mediation and Path-disabling Interventions
4.5 Mathematical Tool Kits for Attribution and Mediation
4.5.1 A Tool Kit for Attribution and Probabilities of Causation
4.5.2 A Tool Kit for Mediation

References
Index

"Despite the fact that quite a few high-quality books on the topic of causal inference have recently been published, this book clearly fills an important gap: that of providing a simple and clear primer... Use of counterfactuals [in the final chapter] is elegantly linked to the structural causal models outlined in the previous chapters... [while] intriguing examples are used to introduce and illustrate the main concepts and methods... Several thought provoking study questions, in the form of exercises, are given throughout the presentation, and they can be very helpful for a better understanding of the material and looking further into the subtleties of the concepts introduced. In summary, there is no doubt that a discussion of the basic ideas in causal inference should be included in all introductory courses of statistics. This book could serve as a very useful companion to the lectures." (Mathematical Reviews/MathSciNet April 2017)

Preface


When attempting to make sense of data, statisticians are invariably motivated by causal questions. For example, “How effective is a given treatment in preventing a disease?”; “Can one estimate obesity-related medical costs?”; “Could government actions have prevented the financial crisis of 2008?”; “Can hiring records prove an employer guilty of sex discrimination?”

The peculiar nature of these questions is that they cannot be answered, or even articulated, in the traditional language of statistics. In fact, only recently has science acquired a mathematical language we can use to express such questions, with accompanying tools to allow us to answer them from data.

The development of these tools has spawned a revolution in the way causality is treated in statistics and in many of its satellite disciplines, especially in the social and biomedical sciences. For example, in the technical program of the 2003 Joint Statistical Meeting in San Francisco, there were only 13 papers presented with the word “cause” or “causal” in their titles; the number of such papers exceeded 100 by the Boston meeting in 2014. These numbers represent a transformative shift of focus in statistics research, accompanied by unprecedented excitement about the new problems and challenges that are opening themselves to statistical analysis. Harvard's political science professor Gary King puts this revolution in historical perspective: “More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history.”

Yet this excitement is barely visible among statistics educators and is essentially absent from statistics textbooks, especially at the introductory level. The reasons for this disparity are deeply rooted in the tradition of statistical education and in how most statisticians view the role of statistical inference.

In his influential manifesto, Ronald Fisher pronounced that “the object of statistical methods is the reduction of data” (Fisher 1922). In keeping with that aim, the traditional task of making sense of data, often referred to generically as “inference,” became that of finding a parsimonious mathematical description of the joint distribution of a set of variables of interest, or of specific parameters of such a distribution. This general strategy for inference is extremely familiar not just to statistical researchers and data scientists, but to anyone who has taken a basic course in statistics. In fact, many excellent introductory books describe smart and effective ways to extract the maximum amount of information possible from the available data. These books take the novice reader from experimental design to parameter estimation and hypothesis testing in great detail. Yet the aim of these techniques is invariably the description of data, not of the process responsible for the data. Most statistics books do not even have the word “causal” or “causation” in the index.

Yet the fundamental question at the core of a great deal of statistical inference is causal: do changes in one variable cause changes in another, and if so, how much change do they cause? In avoiding these questions, introductory treatments of statistical inference often fail even to discuss whether the parameters being estimated are the relevant quantities to assess when interest lies in cause and effect.

The best that most introductory textbooks do is this: first, state the often-quoted aphorism that “association does not imply causation,” and give a short explanation of confounding and of how “lurking variables” can lead to a misinterpretation of an apparent relationship between two variables of interest. Further, the boldest of those texts pose the principal question: “How can a causal link between x and y be established?” and answer it with the long-standing “gold standard” approach of resorting to randomized experiments, an approach that to this day remains the cornerstone of the drug approval process in the United States and elsewhere.

However, given that most causal questions cannot be addressed through randomized experimentation, students and instructors are left to wonder whether anything can be said with reasonable confidence in the absence of randomization.

In short, by avoiding discussion of causal models and causal parameters, introductory textbooks provide readers with no basis for understanding how statistical techniques address scientific questions of causality.

It is the intent of this primer to fill this gnawing gap and to assist teachers and students of elementary statistics in tackling the causal questions that surround almost any nonexperimental study in the natural and social sciences. We focus here on simple and natural methods for defining the causal parameters we wish to understand and for showing what assumptions are necessary to estimate these parameters in observational studies. We also show that these assumptions can be expressed mathematically and transparently, and that simple mathematical machinery is available for translating them into estimable causal quantities, such as the effects of treatments and policy interventions, and for identifying their testable implications.

Our goal stops there for the moment; we do not address in any detail the optimal parameter estimation procedures that use the data to produce effective statistical estimates and their associated levels of uncertainty. However, those ideas—some of which are relatively advanced—are covered extensively in the growing literature on causal inference. We thus hope that this short text can be used in conjunction with standard introductory statistics textbooks like the ones we have described to show how statistical models and inference can easily go hand in hand with a thorough understanding of causation.

It is our strong belief that if one wants to move beyond mere description, statistical inference cannot be effectively carried out without thinking carefully about causal questions, and without leveraging the simple yet powerful tools that modern analysis has developed to answer such questions. It is also our experience that thinking causally leads to a much more exciting and satisfying approach to both the simplest and most complex statistical data analyses. This is not a new observation. Virgil said it much more succinctly than we can, in 29 BC:

“Felix, qui potuit rerum cognoscere causas” (Virgil 29 BC)
(Lucky is he who has been able to understand the causes of things)

The book is organized into four chapters.

Chapter 1 provides the basic statistical, probabilistic, and graphical concepts that readers will need to understand the rest of the book. It also introduces the fundamental concepts of causality, including the causal model, and explains through examples how the model can convey information that pure data are unable to provide.

Chapter 2 explains how causal models are reflected in data, through patterns of statistical dependencies. It explains how to determine whether a data set complies with a given causal model, and briefly discusses how one might search for models that explain a given data set.

Chapter 3 is concerned with how to make predictions using causal models, with a particular emphasis on predicting the outcome of a policy intervention. Here we introduce techniques for reducing confounding bias using adjustment for covariates, as well as inverse probability weighting. This chapter also covers mediation analysis and contains an in-depth look at how the causal methods discussed thus far work in a linear system. Key to these methods is the fundamental distinction between regression coefficients and structural parameters, and how students should use both to predict causal effects in linear models.
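To make the contrast between adjustment for covariates and inverse probability weighting concrete, here is a minimal Python sketch. It assumes a hypothetical three-variable model in which a single binary covariate Z affects both a binary treatment X and a binary outcome Y (Z → X, Z → Y, X → Y), with probabilities invented purely for illustration. Under that assumption Z blocks the only backdoor path, so both the adjustment formula, P(Y=1 | do(X=x)) = Σ_z P(Y=1 | x, z) P(z), and the weighting form, Σ_z P(x, Y=1, z) / P(x | z), should recover the same interventional probability, while the unadjusted P(Y=1 | x) generally does not. The function names are hypothetical helpers, not part of any library.

```python
from itertools import product

# Hypothetical model, numbers invented for illustration: Z -> X, Z -> Y, X -> Y.
P_Z1 = 0.4                                  # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.3, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.2, (0, 1): 0.5,  # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.4, (1, 1): 0.7}


def bernoulli(p1, value):
    """Probability that a binary variable with P(= 1) = p1 takes `value`."""
    return p1 if value == 1 else 1.0 - p1


def joint():
    """Observational joint P(x, y, z) implied by the hypothetical model."""
    return {
        (x, y, z): bernoulli(P_Z1, z)
        * bernoulli(P_X1_GIVEN_Z[z], x)
        * bernoulli(P_Y1_GIVEN_XZ[(x, z)], y)
        for x, y, z in product((0, 1), repeat=3)
    }


def marginal(p, **fixed):
    """Sum the joint over all cells matching the fixed values of x, y, z."""
    names = ("x", "y", "z")
    return sum(
        v for cell, v in p.items()
        if all(cell[names.index(k)] == val for k, val in fixed.items())
    )


def naive(p, x):
    """Unadjusted P(Y = 1 | X = x)."""
    return marginal(p, x=x, y=1) / marginal(p, x=x)


def adjustment(p, x):
    """Adjustment formula: sum over z of P(Y = 1 | x, z) * P(z)."""
    return sum(
        marginal(p, x=x, y=1, z=z) / marginal(p, x=x, z=z) * marginal(p, z=z)
        for z in (0, 1)
    )


def ipw(p, x):
    """Inverse probability weighting: sum over z of P(x, Y = 1, z) / P(x | z)."""
    return sum(
        marginal(p, x=x, y=1, z=z) / (marginal(p, x=x, z=z) / marginal(p, z=z))
        for z in (0, 1)
    )


if __name__ == "__main__":
    p = joint()
    for x in (0, 1):
        print(f"x={x}: naive={naive(p, x):.3f}, "
              f"adjusted={adjustment(p, x):.3f}, ipw={ipw(p, x):.3f}")
```

With these invented numbers, the naive comparison gives 0.592 - 0.248 = 0.344, whereas both adjusted quantities give 0.52 - 0.32 = 0.20; the gap is the confounding bias contributed by Z. The sketch works with exact probabilities rather than samples, so it illustrates the identification step only, not estimation from finite data.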

Chapter 4 introduces the concept of counterfactuals (what would have happened, had we chosen differently at a point in the past) and discusses how we can compute them, how we can estimate their probabilities, and what practical questions we can answer using them. This chapter is somewhat advanced compared to its predecessors, primarily due to the novelty of the notation and the hypothetical nature of the questions asked. However, the fact that we read and compute counterfactuals using the same scientific models that we used in previous chapters should make their analysis an easy journey for students and instructors. Those wishing to understand counterfactuals on a friendly mathematical level should find this chapter a good starting point, and a solid basis for bridging the model-based approach taken in this book with the potential outcome framework that some experimentalists are pursuing in statistics.
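In the structural framework, a deterministic counterfactual is typically computed in three steps: abduction (use the evidence to recover the unit's exogenous terms), action (replace the equation of the intervened variable with a constant), and prediction (recompute the remaining variables with the recovered exogenous terms held fixed). The Python sketch below illustrates those steps under stated assumptions: the linear model, its coefficients A, B, C, the variable names X, M, Y, and the observed values are all hypothetical and are not the book's own example.

```python
from dataclasses import dataclass

# Hypothetical linear SCM, coefficients invented for illustration:
#   X := U_X
#   M := A*X + U_M
#   Y := B*X + C*M + U_Y
A, B, C = 0.6, 0.3, 0.5


@dataclass
class Noise:
    """Exogenous terms (U_X, U_M, U_Y) for one individual."""
    u_x: float
    u_m: float
    u_y: float


def abduction(x, m, y):
    """Step 1 (abduction): solve the equations for the U terms given the evidence."""
    return Noise(u_x=x, u_m=m - A * x, u_y=y - B * x - C * m)


def predict(u, do_x=None, do_m=None):
    """Steps 2 and 3 (action, prediction): override the equation of any
    intervened variable, then recompute downstream values with the same U."""
    x = u.u_x if do_x is None else do_x
    m = (A * x + u.u_m) if do_m is None else do_m
    y = B * x + C * m + u.u_y
    return x, m, y


if __name__ == "__main__":
    u = abduction(x=1.0, m=0.8, y=1.2)   # observed evidence for one unit
    print(predict(u))                     # no intervention: reproduces the evidence (up to rounding)
    print(predict(u, do_m=1.5))           # values had M been set to 1.5
    print(predict(u, do_x=0.0))           # values had X been set to 0
```

With these numbers, abduction gives U_M = 0.2 and U_Y = 0.5; setting M to 1.5 yields Y = 1.55, and setting X to 0 yields M = 0.2 and Y = 0.6. Holding the recovered U terms fixed across the factual and counterfactual worlds is what separates this computation from an ordinary interventional prediction, which averages over the population's U terms instead of conditioning on one unit's evidence.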

Acknowledgments


This book is an outgrowth of a graduate course on causal inference that the first author has taught at UCLA for the past 20 years. It owes many of its tools and examples to former members of the Cognitive Systems Laboratory who participated in the development of this material, both as researchers and as teaching assistants. These include Alex Balke, David Chickering, David Galles, Dan Geiger, Moises Goldszmidt, Jin Kim, George Rebane, Ilya Shpitser, Jin Tian, and Thomas Verma.

We are indebted to many colleagues from whom we have learned much about causal problems, their solutions, and how to present them to general audiences. These include Clark and Maria Glymour, for providing patient ears and sound advice on matters of both causation and writing, Felix...

Published (per publisher): 25 January 2016
Language: English
Subject area: Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Engineering
Keywords: cause effect relationships • Deduction • Inferential Statistics • Interpreting Data • Interventions • Law • causal inference • Medical Statistics & Epidemiology • Medicine • probability and statistics • Public Policy • Statistics • Statistics for Social Sciences • statistical methods
ISBN-10 1-119-18686-2 / 1119186862
ISBN-13 978-1-119-18686-1 / 9781119186861
EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. At download time, the eBook is authorized to your personal Adobe ID. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and nonfiction. The body text reflows dynamically to fit the display and font size, which also makes EPUB a good choice for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently there.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether you use Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
