Doing Bayesian Data Analysis - John Kruschke

Doing Bayesian Data Analysis (eBook)

A Tutorial with R, JAGS, and Stan
eBook Download: PDF | EPUB
2014 | 2nd edition
776 pages
Elsevier textbooks (publisher)
978-0-12-405916-0 (ISBN)
€55.99 incl. VAT
(CHF 54.70)
eBooks are sold by Lehmanns Media GmbH (Berlin) at the price in euros, incl. VAT.
  • Download available immediately

Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition provides an accessible approach to conducting Bayesian data analysis, with the material explained clearly through concrete examples. Included are step-by-step instructions on how to carry out Bayesian data analyses in the popular and free software R and WinBUGS, as well as new programs in JAGS and Stan. The new programs are designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data sets.

The book is divided into three parts and begins with the basics: models, probability, Bayes' rule, and the R programming language. The discussion then moves to the fundamentals applied to inferring a binomial probability, before concluding with chapters on the generalized linear model. Topics include metric-predicted variable on one or two groups; metric-predicted variable with one metric predictor; metric-predicted variable with multiple metric predictors; metric-predicted variable with one nominal predictor; and metric-predicted variable with multiple nominal predictors. The exercises found in the text have explicit purposes and guidelines for accomplishment.

This book is intended for first-year graduate students or advanced undergraduates in statistics, data analysis, psychology, cognitive science, social sciences, clinical sciences, and consumer sciences in business.

  • Accessible, including the basics of essential concepts of probability and random sampling
  • Examples with R programming language and JAGS software
  • Comprehensive coverage of all scenarios addressed by non-Bayesian textbooks: t-tests, analysis of variance (ANOVA) and comparisons in ANOVA, multiple regression, and chi-square (contingency table analysis)
  • Coverage of experiment planning
  • R and JAGS computer programming code on website
  • Exercises have explicit purposes and guidelines for accomplishment
  • Provides step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBUGS


John K. Kruschke is Professor of Psychological and Brain Sciences, and Adjunct Professor of Statistics, at Indiana University in Bloomington, Indiana, USA. He is an eight-time winner of Teaching Excellence Recognition Awards from Indiana University. He won the Troland Research Award from the National Academy of Sciences (USA) and the Remak Distinguished Scholar Award from Indiana University. He has served on the editorial boards of various scientific journals, including Psychological Review, the Journal of Experimental Psychology: General, and the Journal of Mathematical Psychology.

After attending the Summer Science Program as a high school student and considering a career in astronomy, Kruschke earned a bachelor's degree in mathematics (with high distinction in general scholarship) from the University of California at Berkeley. As an undergraduate, he taught self-designed tutoring sessions for many math courses at the Student Learning Center. During graduate school he attended the 1988 Connectionist Models Summer School and earned a doctorate in psychology, also from U.C. Berkeley. He joined the faculty of Indiana University in 1989. Professor Kruschke's publications can be found at his Google Scholar page. His current research interests focus on moral psychology.

Professor Kruschke taught traditional statistical methods for many years until reaching a point, circa 2003, when he could no longer teach corrections for multiple comparisons with a clear conscience. The perils of p values provoked him to find a better way, and after only several thousand hours of relentless effort, the 1st and 2nd editions of Doing Bayesian Data Analysis emerged.

Front Cover 1
Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan 4
Copyright 5
Dedication 6
Contents 8
Chapter 1: What's in This Book (Read This First!) 14
1.1 Real People Can Read This Book 14
1.1.1 Prerequisites 15
1.2 What's in This Book 16
1.2.1 You're busy. What's the least you can read? 16
1.2.2 You're really busy! Isn't there even less you can read? 17
1.2.3 You want to enjoy the view a little longer. But not too much longer 17
1.2.4 If you just gotta reject a null hypothesis… 18
1.2.5 Where's the equivalent of traditional test X in this book? 18
1.3 What's New in the Second Edition? 19
1.4 Gimme Feedback (Be Polite) 21
1.5 Thank You! 21
Part I: The Basics: Models, Probability, Bayes' Rule, and R 26
Chapter 2: Introduction: Credibility, Models, and Parameters 28
2.1 Bayesian Inference Is Reallocation of Credibility Across Possibilities 29
2.1.1 Data are noisy and inferences are probabilistic 32
2.2 Possibilities Are Parameter Values in Descriptive Models 35
2.3 The Steps of Bayesian Data Analysis 38
2.3.1 Data analysis without parametric models? 43
2.4 Exercises 44
Chapter 3: The R Programming Language 46
3.1 Get the Software 48
3.1.1 A look at RStudio 48
3.2 A Simple Example of R in Action 49
3.2.1 Get the programs used with this book 51
3.3 Basic Commands and Operators in R 51
3.3.1 Getting help in R 52
3.3.2 Arithmetic and logical operators 52
3.3.3 Assignment, relational operators, and tests of equality 53
3.4 Variable Types 55
3.4.1 Vector 55
3.4.1.1 The combine function 55
3.4.1.2 Component-by-component vector operations 55
3.4.1.3 The colon operator and sequence function 56
3.4.1.4 The replicate function 57
3.4.1.5 Getting at elements of a vector 58
3.4.2 Factor 59
3.4.3 Matrix and array 61
3.4.4 List and data frame 64
3.5 Loading and Saving Data 66
3.5.1 The read.csv and read.table functions 66
3.5.2 Saving data from R 68
3.6 Some Utility Functions 69
3.7 Programming in R 74
3.7.1 Variable names in R 74
3.7.2 Running a program 75
3.7.3 Programming a function 77
3.7.4 Conditions and loops 78
3.7.5 Measuring processing time 79
3.7.6 Debugging 80
3.8 Graphical Plots: Opening and Saving 82
3.9 Conclusion 82
3.10 Exercises 83
Chapter 4: What Is This Stuff Called Probability? 84
4.1 The Set of All Possible Events 85
4.1.1 Coin flips: Why you should care 86
4.2 Probability: Outside or Inside the Head 86
4.2.1 Outside the head: Long-run relative frequency 87
4.2.1.1 Simulating a long-run relative frequency 87
4.2.1.2 Deriving a long-run relative frequency 89
4.2.2 Inside the head: Subjective belief 89
4.2.2.1 Calibrating a subjective belief by preferences 89
4.2.2.2 Describing a subjective belief mathematically 90
4.2.3 Probabilities assign numbers to possibilities 90
4.3 Probability Distributions 91
4.3.1 Discrete distributions: Probability mass 91
4.3.2 Continuous distributions: Rendezvous with density 93
4.3.2.1 Properties of probability density functions 95
4.3.2.2 The normal probability density function 96
4.3.3 Mean and variance of a distribution 97
4.3.3.1 Mean as minimized variance 99
4.3.4 Highest density interval (HDI) 100
4.4 Two-Way Distributions 102
4.4.1 Conditional probability 104
4.4.2 Independence of attributes 105
4.5 Appendix: R Code for Figure 4.1 106
4.6 Exercises 108
Chapter 5: Bayes' Rule 112
5.1 Bayes' Rule 113
5.1.1 Derived from definitions of conditional probability 113
5.1.2 Bayes' rule intuited from a two-way discrete table 114
5.2 Applied to Parameters and Data 118
5.2.1 Data-order invariance 120
5.3 Complete Examples: Estimating Bias in a Coin 121
5.3.1 Influence of sample size on the posterior 125
5.3.2 Influence of the prior on the posterior 126
5.4 Why Bayesian Inference Can Be Difficult 128
5.5 Appendix: R Code for Figures 5.1, 5.2, etc. 129
5.6 Exercises 131
Part II: All the Fundamentals Applied to Inferring a Binomial Probability 134
Chapter 6: Inferring a Binomial Probability via Exact Mathematical Analysis 136
6.1 The Likelihood Function: Bernoulli Distribution 137
6.2 A Description of Credibilities: The Beta Distribution 139
6.2.1 Specifying a beta prior 140
6.3 The Posterior Beta 145
6.3.1 Posterior is compromise of prior and likelihood 146
6.4 Examples 147
6.4.1 Prior knowledge expressed as a beta distribution 147
6.4.2 Prior knowledge that cannot be expressed as a beta distribution 149
6.5 Summary 151
6.6 Appendix: R Code for Figure 6.4 151
6.7 Exercises 152
Chapter 7: Markov Chain Monte Carlo 156
7.1 Approximating a Distribution with a Large Sample 158
7.2 A Simple Case of the Metropolis Algorithm 159
7.2.1 A politician stumbles upon the Metropolis algorithm 159
7.2.2 A random walk 160
7.2.3 General properties of a random walk 162
7.2.4 Why we care 165
7.2.5 Why it works 165
7.3 The Metropolis Algorithm More Generally 169
7.3.1 Metropolis algorithm applied to Bernoulli likelihood and beta prior 170
7.3.2 Summary of Metropolis algorithm 174
7.4 Toward Gibbs Sampling: Estimating Two Coin Biases 175
7.4.1 Prior, likelihood and posterior for two biases 176
7.4.2 The posterior via exact formal analysis 178
7.4.3 The posterior via the Metropolis algorithm 181
7.4.4 Gibbs sampling 183
7.4.5 Is there a difference between biases? 189
7.4.6 Terminology: MCMC 190
7.5 MCMC Representativeness, Accuracy, and Efficiency 191
7.5.1 MCMC representativeness 191
7.5.2 MCMC accuracy 195
7.5.3 MCMC efficiency 200
7.6 Summary 201
7.7 Exercises 202
Chapter 8: JAGS 206
8.1 JAGS and its Relation to R 206
8.2 A Complete Example 208
8.2.1 Load data 210
8.2.2 Specify model 211
8.2.3 Initialize chains 213
8.2.4 Generate chains 215
8.2.5 Examine chains 216
8.2.5.1 The plotPost function 218
8.3 Simplified Scripts for Frequently Used Analyses 219
8.4 Example: Difference of Biases 221
8.5 Sampling from the Prior Distribution in JAGS 224
8.6 Probability Distributions Available in JAGS 226
8.6.1 Defining new likelihood functions 227
8.7 Faster Sampling with Parallel Processing in RunJAGS 228
8.8 Tips for Expanding JAGS Models 231
8.9 Exercises 231
Chapter 9: Hierarchical Models 234
9.1 A Single Coin from a Single Mint 236
9.1.1 Posterior via grid approximation 239
9.2 Multiple Coins from a Single Mint 243
9.2.1 Posterior via grid approximation 244
9.2.2 A realistic model with MCMC 248
9.2.3 Doing it with JAGS 252
9.2.4 Example: Therapeutic touch 253
9.3 Shrinkage in Hierarchical Models 258
9.4 Speeding up JAGS 262
9.5 Extending the Hierarchy: Subjects Within Categories 264
9.5.1 Example: Baseball batting abilities by position 266
9.6 Exercises 273
Chapter 10: Model Comparison and Hierarchical Modeling 278
10.1 General Formula and the Bayes Factor 279
10.2 Example: Two Factories of Coins 281
10.2.1 Solution by formal analysis 283
10.2.2 Solution by grid approximation 284
10.3 Solution by MCMC 287
10.3.1 Nonhierarchical MCMC computation of each model's marginal likelihood 287
10.3.1.1 Implementation with JAGS 290
10.3.2 Hierarchical MCMC computation of relative model probability 291
10.3.2.1 Using pseudo-priors to reduce autocorrelation 292
10.3.3 Models with different "noise" distributions in JAGS 301
10.4 Prediction: Model Averaging 302
10.5 Model Complexity Naturally Accounted for 302
10.5.1 Caveats regarding nested model comparison 304
10.6 Extreme Sensitivity to Prior Distribution 305
10.6.1 Priors of different models should be equally informed 307
10.7 Exercises 308
Chapter 11: Null Hypothesis Significance Testing 310
11.1 Paved with Good Intentions 313
11.1.1 Definition of p value 313
11.1.2 With intention to fix N 315
11.1.3 With intention to fix z 318
11.1.4 With intention to fix duration 321
11.1.5 With intention to make multiple tests 323
11.1.6 Soul searching 326
11.1.7 Bayesian analysis 327
11.2 Prior Knowledge 328
11.2.1 NHST analysis 328
11.2.2 Bayesian analysis 328
11.2.2.1 Priors are overt and relevant 330
11.3 Confidence Interval and Highest Density Interval 330
11.3.1 CI depends on intention 331
11.3.1.1 CI is not a distribution 336
11.3.2 Bayesian HDI 337
11.4 Multiple Comparisons 338
11.4.1 NHST correction for experimentwise error 338
11.4.2 Just one Bayesian posterior no matter how you look at it 341
11.4.3 How Bayesian analysis mitigates false alarms 341
11.5 What a Sampling Distribution Is Good For 342
11.5.1 Planning an experiment 342
11.5.2 Exploring model predictions (posterior predictive check) 343
11.6 Exercises 344
Chapter 12: Bayesian Approaches to Testing a Point ("Null") Hypothesis 348
12.1 The Estimation Approach 349
12.1.1 Region of practical equivalence 349
12.1.2 Some examples 353
12.1.2.1 Differences of correlated parameters 353
12.1.2.2 Why HDI and not equal-tailed interval? 355
12.2 The Model-Comparison Approach 356
12.2.1 Is a coin fair or not? 357
12.2.1.1 Bayes' factor can accept null with poor precision 360
12.2.2 Are different groups equal or not? 361
12.2.2.1 Model specification in JAGS 364
12.3 Relations of Parameter Estimation and Model Comparison 365
12.4 Estimation or Model Comparison? 367
12.5 Exercises 368
Chapter 13: Goals, Power, and Sample Size 372
13.1 The Will to Power 373
13.1.1 Goals and obstacles 373
13.1.2 Power 374
13.1.3 Sample size 377
13.1.4 Other expressions of goals 378
13.2 Computing Power and Sample Size 379
13.2.1 When the goal is to exclude a null value 379
13.2.2 Formal solution and implementation in R 381
13.2.3 When the goal is precision 383
13.2.4 Monte Carlo approximation of power 385
13.2.5 Power from idealized or actual data 389
13.3 Sequential Testing and the Goal of Precision 396
13.3.1 Examples of sequential tests 398
13.3.2 Average behavior of sequential tests 401
13.4 Discussion 406
13.4.1 Power and multiple comparisons 406
13.4.2 Power: prospective, retrospective, and replication 406
13.4.3 Power analysis requires verisimilitude of simulated data 407
13.4.4 The importance of planning 408
13.5 Exercises 409
Chapter 14: Stan 412
14.1 HMC Sampling 413
14.2 Installing Stan 420
14.3 A Complete Example 420
14.3.1 Reusing the compiled model 423
14.3.2 General structure of Stan model specification 423
14.3.3 Think log probability to think like Stan 424
14.3.4 Sampling the prior in Stan 425
14.3.5 Simplified scripts for frequently used analyses 426
14.4 Specify Models Top-Down in Stan 427
14.5 Limitations and Extras 428
14.6 Exercises 428
Part III: The Generalized Linear Model 430
Chapter 15: Overview of the Generalized Linear Model 432
15.1 Types of Variables 433
15.1.1 Predictor and predicted variables 433
15.1.2 Scale types: metric, ordinal, nominal, and count 434
15.2 Linear Combination of Predictors 436
15.2.1 Linear function of a single metric predictor 436
15.2.2 Additive combination of metric predictors 438
15.2.3 Nonadditive interaction of metric predictors 440
15.2.4 Nominal predictors 442
15.2.4.1 Linear model for a single nominal predictor 442
15.2.4.2 Additive combination of nominal predictors 443
15.2.4.3 Nonadditive interaction of nominal predictors 445
15.3 Linking from Combined Predictors to Noisy Predicted Data 448
15.3.1 From predictors to predicted central tendency 448
15.3.1.1 The logistic function 449
15.3.1.2 The cumulative normal function 452
15.3.2 From predicted central tendency to noisy data 453
15.4 Formal Expression of the GLM 457
15.4.1 Cases of the GLM 457
15.5 Exercises 459
Chapter 16: Metric-Predicted Variable on One or Two Groups 462
16.1 Estimating the Mean and Standard Deviation of a Normal Distribution 463
16.1.1 Solution by mathematical analysis 464
16.1.2 Approximation by MCMC in JAGS 468
16.2 Outliers and Robust Estimation: The t Distribution 471
16.2.1 Using the t distribution in JAGS 475
16.2.2 Using the t distribution in Stan 477
16.3 Two Groups 481
16.3.1 Analysis by NHST 483
16.4 Other Noise Distributions and Transforming Data 485
16.5 Exercises 486
Chapter 17: Metric-Predicted Variable with One Metric Predictor 490
17.1 Simple Linear Regression 491
17.2 Robust Linear Regression 492
17.2.1 Robust linear regression in JAGS 496
17.2.1.1 Standardizing the data for MCMC sampling 497
17.2.2 Robust linear regression in Stan 500
17.2.2.1 Constants for vague priors 500
17.2.3 Stan or JAGS? 501
17.2.4 Interpreting the posterior distribution 502
17.3 Hierarchical Regression on Individuals Within Groups 503
17.3.1 The model and implementation in JAGS 504
17.3.2 The posterior distribution: Shrinkage and prediction 508
17.4 Quadratic Trend and Weighted Data 508
17.4.1 Results and interpretation 512
17.4.2 Further extensions 513
17.5 Procedure and Perils for Expanding a Model 514
17.5.1 Posterior predictive check 514
17.5.2 Steps to extend a JAGS or Stan model 515
17.5.3 Perils of adding parameters 516
17.6 Exercises 517
Chapter 18: Metric-Predicted Variable with Multiple Metric Predictors 522
18.1 Multiple Linear Regression 523
18.1.1 The perils of correlated predictors 523
18.1.2 The model and implementation 527
18.1.3 The posterior distribution 530
18.1.4 Redundant predictors 532
18.1.5 Informative priors, sparse data, and correlated predictors 536
18.2 Multiplicative Interaction of Metric Predictors 538
18.2.1 An example 540
18.3 Shrinkage of Regression Coefficients 543
18.4 Variable Selection 549
18.4.1 Inclusion probability is strongly affected by vagueness of prior 552
18.4.2 Variable selection with hierarchical shrinkage 555
18.4.3 What to report and what to conclude 557
18.4.4 Caution: Computational methods 560
18.4.5 Caution: Interaction variables 561
18.5 Exercises 562
Chapter 19: Metric-Predicted Variable with One Nominal Predictor 566
19.1 Describing Multiple Groups of Metric Data 567
19.2 Traditional Analysis of Variance 569
19.3 Hierarchical Bayesian Approach 570
19.3.1 Implementation in JAGS 573
19.3.2 Example: Sex and death 574
19.3.3 Contrasts 578
19.3.4 Multiple comparisons and shrinkage 580
19.3.5 The two-group case 581
19.4 Including a Metric Predictor 581
19.4.1 Example: Sex, death, and size 584
19.4.2 Analogous to traditional ANCOVA 584
19.4.3 Relation to hierarchical linear regression 586
19.5 Heterogeneous Variances and Robustness Against Outliers 586
19.5.1 Example: Contrast of means with different variances 588
19.6 Exercises 592
Chapter 20: Metric-Predicted Variable with Multiple Nominal Predictors 596
20.1 Describing Groups of Metric Data with Multiple Nominal Predictors 597
20.1.1 Interaction 598
20.1.2 Traditional ANOVA 600
20.2 Hierarchical Bayesian Approach 601
20.2.1 Implementation in JAGS 602
20.2.2 Example: It's only money 603
20.2.3 Main effect contrasts 608
20.2.4 Interaction contrasts and simple effects 610
20.2.4.1 Interaction effects: High uncertainty and shrinkage 611
20.3 Rescaling Can Change Interactions, Homogeneity, and Normality 612
20.4 Heterogeneous Variances and Robustness Against Outliers 615
20.5 Within-Subject Designs 619
20.5.1 Why use a within-subject design? And why not? 621
20.5.2 Split-plot design 623
20.5.2.1 Example: Knee high by the fourth of July 624
20.5.2.2 The descriptive model 625
20.5.2.3 Implementation in JAGS 627
20.5.2.4 Results 627
20.6 Model Comparison Approach 629
20.7 Exercises 631
Chapter 21: Dichotomous Predicted Variable 634
21.1 Multiple Metric Predictors 635
21.1.1 The model and implementation in JAGS 635
21.1.2 Example: Height, weight, and gender 639
21.2 Interpreting the Regression Coefficients 642
21.2.1 Log odds 642
21.2.2 When there are few 1's or 0's in the data 644
21.2.3 Correlated predictors 645
21.2.4 Interaction of metric predictors 646
21.3 Robust Logistic Regression 648
21.4 Nominal Predictors 649
21.4.1 Single group 651
21.4.2 Multiple groups 654
21.4.2.1 Example: Baseball again 654
21.4.2.2 The model 655
21.4.2.3 Results 657
21.5 Exercises 659
Chapter 22: Nominal Predicted Variable 662
22.1 Softmax Regression 663
22.1.1 Softmax reduces to logistic for two outcomes 666
22.1.2 Independence from irrelevant attributes 667
22.2 Conditional logistic regression 668
22.3 Implementation in JAGS 672
22.3.1 Softmax model 672
22.3.2 Conditional logistic model 674
22.3.3 Results: Interpreting the regression coefficients 675
22.3.3.1 Softmax model 675
22.3.3.2 Conditional logistic model 677
22.4 Generalizations and Variations of the Models 680
22.5 Exercises 681
Chapter 23: Ordinal Predicted Variable 684
23.1 Modeling Ordinal Data with an Underlying Metric Variable 685
23.2 The Case of a Single Group 688
23.2.1 Implementation in JAGS 689
23.2.2 Examples: Bayesian estimation recovers true parameter values 690
23.2.2.1 Not the same results as pretending the data are metric 693
23.2.2.2 Ordinal outcomes versus Likert scales 694
23.3 The Case of Two Groups 695
23.3.1 Implementation in JAGS 696
23.3.2 Examples: Not funny 696
23.4 The Case of Metric Predictors 698
23.4.1 Implementation in JAGS 701
23.4.2 Example: Happiness and money 702
23.4.3 Example: Movies—They don't make 'em like they used to 706
23.4.4 Why are some thresholds outside the data? 708
23.5 Posterior Prediction 711
23.6 Generalizations and Extensions 712
23.7 Exercises 713
Chapter 24: Count Predicted Variable 716
24.1 Poisson Exponential Model 717
24.1.1 Data structure 717
24.1.2 Exponential link function 718
24.1.3 Poisson noise distribution 720
24.1.4 The complete model and implementation in JAGS 721
24.2 Example: Hair Eye Go Again 724
24.3 Example: Interaction Contrasts, Shrinkage, and Omnibus Test 726
24.4 Log-Linear Models for Contingency Tables 728
24.5 Exercises 728
Chapter 25: Tools in the Trunk 734
25.1 Reporting a Bayesian Analysis 734
25.1.1 Essential points 735
25.1.2 Optional points 737
25.1.3 Helpful points 737
25.2 Functions for Computing Highest Density Intervals 738
25.2.1 R code for computing HDI of a grid approximation 738
25.2.2 HDI of unimodal distribution is shortest interval 739
25.2.3 R code for computing HDI of a MCMC sample 740
25.2.4 R code for computing HDI of a function 741
25.3 Reparameterization 742
25.3.1 Examples 743
25.3.2 Reparameterization of two parameters 743
25.4 Censored Data in JAGS 745
25.5 What Next? 749
Bibliography 750
Index 760

Chapter 2

Introduction: Credibility, Models, and Parameters



I just want someone who I can believe in,

Someone at home who will not leave me grievin'.

Show me a sign that you'll always be true,

and I'll be your model of faith and virtue.

The goal of this chapter is to introduce the conceptual framework of Bayesian data analysis. Bayesian data analysis has two foundational ideas. The first idea is that Bayesian inference is reallocation of credibility across possibilities. The second foundational idea is that the possibilities, over which we allocate credibility, are parameter values in meaningful mathematical models. These two fundamental ideas form the conceptual foundation for every analysis in this book. Simple examples of these ideas are presented in this chapter. The rest of the book merely fills in the mathematical and computational details for specific applications of these two ideas. This chapter also explains the basic procedural steps shared by every Bayesian analysis.

2.1 Bayesian inference is reallocation of credibility across possibilities


Suppose we step outside one morning and notice that the sidewalk is wet, and wonder why. We consider all possible causes of the wetness, including possibilities such as recent rain, recent garden irrigation, a newly erupted underground spring, a broken sewage pipe, a passerby who spilled a drink, and so on. If all we know until this point is that some part of the sidewalk is wet, then all those possibilities will have some prior credibility based on previous knowledge. For example, recent rain may have greater prior probability than a spilled drink from a passerby. Continuing on our outside journey, we look around and collect new observations. If we observe that the sidewalk is wet for as far as we can see, as are the trees and parked cars, then we re-allocate credibility to the hypothetical cause of recent rain. The other possible causes, such as a passerby spilling a drink, would not account for the new observations. On the other hand, if instead we observed that the wetness was localized to a small area, and there was an empty drink cup a few feet away, then we would re-allocate credibility to the spilled-drink hypothesis, even though it had relatively low prior probability. This sort of reallocation of credibility across possibilities is the essence of Bayesian inference.

Another example of Bayesian inference has been immortalized in the words of the fictional detective Sherlock Holmes, who often said to his sidekick, Doctor Watson: “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?” (Doyle, 1890, chap. 6) Although this reasoning was not described by Holmes or Watson or Doyle as Bayesian inference, it is. Holmes conceived of a set of possible causes for a crime. Some of the possibilities may have seemed very improbable, a priori. Holmes systematically gathered evidence that ruled out a number of the possible causes. If all possible causes but one were eliminated, then (Bayesian) reasoning forced him to conclude that the remaining possible cause was fully credible, even if it seemed improbable at the start.

Figure 2.1 illustrates Holmes' reasoning. For the purposes of illustration, we suppose that there are just four possible causes of the outcome to be explained. We label the causes A, B, C, and D. The heights of the bars in the graphs indicate the credibility of the candidate causes. (“Credibility” is synonymous with “probability”; here I use the everyday term “credibility” but later in the book, when mathematical formalisms are introduced, I will also use the term “probability.”) Credibility can range from zero to one. If the credibility of a candidate cause is zero, then the cause is definitely not responsible. If the credibility of a candidate cause is one, then the cause definitely is responsible. Because we assume that the candidate causes are mutually exclusive and exhaust all possible causes, the total credibility across causes sums to one.

Figure 2.1 The upper-left graph shows the credibilities of the four possible causes for an outcome. The causes, labeled A, B, C, and D, are mutually exclusive and exhaust all possibilities. The causes happen to be equally credible at the outset; hence all have prior credibility of 0.25. The lower-left graph shows the credibilities when one cause is learned to be impossible. The resulting posterior distribution is used as the prior distribution in the middle column, where another cause is learned to be impossible. The posterior distribution from the middle column is used as the prior distribution for the right column. The remaining possible cause is fully implicated by Bayesian reallocation of credibility.

The upper-left panel of Figure 2.1 shows that the prior credibilities of the four candidate causes are equal, all at 0.25. Unlike the case of the wet sidewalk, in which prior knowledge suggested that rain may be a more likely cause than a newly erupted underground spring, the present illustration assumes equal prior credibilities of the candidate causes. Suppose we make new observations that rule out candidate cause A. For example, if A is a suspect in a crime, we may learn that A was far from the crime scene at the time. Therefore, we must re-allocate credibility to the remaining candidate causes, B through D, as shown in the lower-left panel of Figure 2.1. The re-allocated distribution of credibility is called the posterior distribution because it is what we believe after taking into account the new observations. The posterior distribution gives zero credibility to cause A, and allocates credibilities of 0.33 (i.e., 1/3) to candidate causes B, C, and D.

The posterior distribution then becomes the prior beliefs for subsequent observations. Thus, the prior distribution in the upper-middle of Figure 2.1 is the posterior distribution from the lower left. Suppose now that additional new evidence rules out candidate cause B. We now must re-allocate credibility to the remaining candidate causes, C and D, as shown in the lower-middle panel of Figure 2.1. This posterior distribution becomes the prior distribution for subsequent data collection, as shown in the upper-right panel of Figure 2.1. Finally, if new data rule out candidate cause C, then all credibility must fall on the remaining cause, D, as shown in the lower-right panel of Figure 2.1, just as Holmes declared. This reallocation of credibility is not only intuitive, it is also what the exact mathematics of Bayesian inference prescribe, as will be explained later in the book.
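To make the arithmetic of this reallocation concrete, here is a minimal R sketch (written for this summary, not taken from the book's programs) that rules out causes A, B, and C in turn and renormalizes the remaining credibilities, reproducing the progression from 0.25 each, to 1/3 each, to 1/2 each, to all credibility on D.

# Equal prior credibility on four mutually exclusive, exhaustive causes.
credibility <- c(A = 0.25, B = 0.25, C = 0.25, D = 0.25)

# Ruling out a cause sets its credibility to zero and renormalizes the rest
# so that the total credibility still sums to one.
ruleOut <- function(cred, cause) {
  cred[cause] <- 0
  cred / sum(cred)
}

credibility <- ruleOut(credibility, "A")  # B, C, D each at 1/3
credibility <- ruleOut(credibility, "B")  # C, D each at 1/2
credibility <- ruleOut(credibility, "C")  # all credibility on D
print(credibility)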

The complementary form of reasoning is also Bayesian, and can be called judicial exoneration. Suppose there are several possible culprits for a crime, and that these suspects are mutually unaffiliated and exhaust all possibilities. If evidence accrues that one suspect is definitely culpable, then the other suspects are exonerated.

This form of exoneration is illustrated in Figure 2.2. The upper panel assumes that there are four possible causes for an outcome, labeled A, B, C, and D. We assume that the causes are mutually exclusive and exhaust all possibilities. In the context of suspects for a crime, the credibility of the hypothesis that suspect A committed the crime is the culpability of the suspect. So it might be easier in this context to think of culpability instead of credibility. The prior culpabilities of the four suspects are, for this illustration, set to be equal, so the four bars in the upper panel of Figure 2.2 are all of height 0.25. Suppose that new evidence firmly implicates suspect D as the culprit. Because the other suspects are known to be unaffiliated, they are exonerated, as shown in the lower panel of Figure 2.2. As in the situation of Holmesian deduction, this exoneration is not only intuitive, it is also what the exact mathematics of Bayesian inference prescribe, as will be explained later in the book.

Figure 2.2 The upper graph shows the credibilities of the four possible causes for an outcome. The causes, labeled A, B, C and D, are mutually exclusive and exhaust all possibilities. The causes happen to be equally credible at the outset, hence all have prior credibility of 0.25. The lower graph shows the credibilities when one cause is learned to be responsible. The nonresponsible causes are “exonerated” (i.e., have zero credibility as causes) by Bayesian reallocation of credibility.
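The same renormalization expresses exoneration as Bayesian updating. In the sketch below (again illustrative rather than code from the book, with made-up likelihood values), the evidence is assumed to be essentially impossible unless suspect D is the culprit; multiplying the prior credibilities by those likelihoods and renormalizing pushes nearly all credibility onto D and drives the other suspects toward zero.

# Equal prior culpability for the four suspects.
prior <- c(A = 0.25, B = 0.25, C = 0.25, D = 0.25)

# Hypothetical likelihoods of the observed evidence under each suspect:
# nearly impossible unless D committed the crime.
likelihood <- c(A = 0.001, B = 0.001, C = 0.001, D = 0.997)

# Bayes' rule: posterior is proportional to prior times likelihood.
posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 3)  # almost all credibility on D; A, B, and C are exonerated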

2.1.1 Data are noisy and inferences are probabilistic


The cases of Figures 2.1 and 2.2 assumed that observed data...

Published (per publisher) 11.11.2014
Language English
Subject areas Mathematics / Computer Science › Mathematics › Applied Mathematics; Technology
ISBN-10 0-12-405916-3 / 0124059163
ISBN-13 978-0-12-405916-0 / 9780124059160
PDF (Adobe DRM)
Size: 27.2 MB

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time; you can then read it only on devices that are also registered to that Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited use on small displays (smartphones, eReaders).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, as it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers, but it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfil eBook orders from other countries.

EPUB (Adobe DRM)
Size: 22.3 MB

Copy protection: Adobe DRM (as described above for the PDF edition)

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and general non-fiction. The text reflows dynamically to fit the display and font size, so EPUB also works well on mobile reading devices.

The system requirements and the restriction to orders from Germany and Switzerland are the same as for the PDF edition.
