An Introduction to Bootstrap Methods with Applications to R (eBook)
John Wiley & Sons (Verlag)
978-1-118-62541-5 (ISBN)
Bootstrap methods provide a powerful approach to statistical data analysis because they apply more generally than standard parametric methods. An Introduction to Bootstrap Methods with Applications to R explores the practicality of this approach and successfully utilizes R to illustrate applications of the bootstrap and other resampling methods. This book provides a modern introduction to bootstrap methods for readers who do not have an extensive background in advanced mathematics. Emphasis throughout is on the use of bootstrap methods as an exploratory tool, including their value in variable selection and other modeling environments.
The authors begin with a description of bootstrap methods and their relationship to other resampling methods, along with an overview of the wide variety of applications of the approach. Subsequent chapters offer coverage of improved confidence set estimation, estimation of error rates in discriminant analysis, and applications to a wide variety of hypothesis testing and estimation problems in areas such as pharmaceuticals, genomics, and economics. To inform readers of the limitations of the method, the book also presents counterexamples to the consistency of bootstrap methods.
An introduction to R programming provides the needed preparation to work with the numerous exercises and applications presented throughout the book. A related website houses the book's R subroutines, and an extensive listing of references provides resources for further study.
Discussing the topic at a remarkably practical and accessible level, An Introduction to Bootstrap Methods with Applications to R is an excellent book for introductory courses on bootstrap and resampling methods at the upper-undergraduate and graduate levels. It also serves as an insightful reference for practitioners working with data in engineering, medicine, and the social sciences who would like to acquire a basic understanding of bootstrap methods.
MICHAEL R. CHERNICK, PhD, is Manager of Biostatistical Services at Lankenau Institute for Medical Research, where he conducts statistical design and analysis for pharmaceutical research. He has more than thirty years of experience in the application of statistical methods to such areas as medicine, energy, engineering, insurance, and pharmaceuticals. Dr. Chernick is the author of Bootstrap Methods: A Guide for Practitioners and Researchers, Second Edition and The Essentials of Biostatistics for Physicians, Nurses, and Clinicians, and the coauthor of Introductory Biostatistics for the Health Sciences: Modern Applications Including Bootstrap, all published by Wiley.
ROBERT A. LaBUDDE, PhD, is President of Least Cost Formulations, Ltd., a mathematical software development company that specializes in optimization and process control software for manufacturing companies. He has extensive experience in industry and academia and currently serves as Adjunct Associate Professor in the Department of Mathematics and Statistics at Old Dominion University.
"I recommend this text to anyone wishing to apply
computationally intensive methods and if you only purchase one book
on bootstrap methods then this could be the book for
you!." (International Statistical Review,
2012)
1
INTRODUCTION
1.1 HISTORICAL BACKGROUND
The “bootstrap” is one of a number of techniques, commonly called resampling methods, that now fall under the broad umbrella of nonparametric statistics. Some of these techniques are far older than the bootstrap. Permutation methods go back to Fisher (1935) and Pitman (1937, 1938), and the jackknife started with Quenouille (1949). Bootstrapping was made practical through the use of the Monte Carlo approximation, which itself goes back to the beginning of computing in the early 1940s.
However, 1979 is a critical year for the bootstrap because that is when Brad Efron’s paper in the Annals of Statistics was published (Efron, 1979). Efron had defined a resampling procedure that he named the bootstrap. He constructed it as a simple approximation to the jackknife (an earlier resampling method introduced by Quenouille and developed further by John Tukey), and his original motivation was to derive properties of the bootstrap to better understand the jackknife. In many situations, however, the bootstrap is as good as or better than the jackknife as a resampling procedure. The jackknife is primarily useful for small samples; it becomes computationally inefficient for larger samples, although increasing computer speed has made it feasible for larger and larger data sets. A clear description of the jackknife and its connection to the bootstrap can be found in the SIAM monograph Efron (1982). A description of the jackknife is also given in Section 1.2.1.
Although permutation tests were known in the 1930s, an impediment to their use was the large number (i.e., n!) of distinct permutations available for samples of size n. Since ordinary bootstrapping involves sampling with replacement n times for a sample of size n, there are nⁿ possible distinct ordered bootstrap samples (though some are equivalent under the exchangeability assumption because they are permutations of each other). So, complete enumeration of all the bootstrap samples becomes infeasible except for very small sample sizes. Random sampling from the set of possible bootstrap samples becomes a viable way to approximate the distribution of bootstrap samples. The same problem exists for permutations, and the same remedy is possible. The only difference is that n! does not grow as fast as nⁿ, so complete enumeration of permutations is possible for larger n than for the bootstrap.
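As a concrete illustration, here is a minimal sketch in base R of the Monte Carlo approximation just described. The data values, the seed, and the choice of B = 1000 replicates are all arbitrary illustrative choices, not the book's own example.

```r
## Monte Carlo approximation to the bootstrap distribution of the mean.
## Data values, seed, and B are illustrative choices.
set.seed(123)                      # for reproducibility
x <- c(2.1, 3.5, 4.0, 5.2, 6.8)    # hypothetical sample, n = 5
n <- length(x)
n^n                                # 3125 distinct ordered bootstrap samples for n = 5

B <- 1000                          # number of Monte Carlo bootstrap replicates
boot_means <- replicate(B, mean(sample(x, n, replace = TRUE)))
sd(boot_means)                     # bootstrap estimate of the standard error of the mean
```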
The idea of taking several Monte Carlo samples of size n with replacement from the original observations was certainly an important idea expressed by Efron, but it was clearly known and practiced prior to Efron (1979). Although it may not be the first time it was used, Julian Simon laid claim to priority for the bootstrap based on his use of the Monte Carlo approximation in Simon (1969). But Simon was only recommending the Monte Carlo approach as a more intuitive way to teach probability and statistics, one that does not require the abstraction of a parametric probability model for the generation of the original sample. After Efron made the bootstrap popular, Simon and Bruce joined the campaign (see Simon and Bruce, 1991, 1995).
Efron, however, starting with Efron (1979), first connected bootstrapping to the jackknife, delta method, cross-validation, and permutation tests. He was the first to show it to be a real competitor to the jackknife and delta method for estimating the standard error of an estimator. Also, quite early on, Efron recognized the broad applicability of bootstrapping to confidence intervals, hypothesis testing, and more complex problems. These ideas were emphasized in Efron and Gong (1983), Diaconis and Efron (1983), Efron and Tibshirani (1986), and the SIAM monograph (Efron, 1982). These influential articles, along with the SIAM monograph, led to a great deal of research during the 1980s and 1990s, and the number of bootstrap papers grew at an exponential rate. Key probabilistic results appeared in Singh (1981), Bickel and Freedman (1981, 1984), Beran (1982), Martin (1990), Hall (1986, 1988), Hall and Martin (1988), and Navidi (1989).
In a very remarkable paper, Efron (1983) used simulation comparisons to show that bootstrap bias correction could provide better estimates of the classification error rate than the very popular cross-validation approach (often called leave-one-out and originally proposed by Lachenbruch and Mickey, 1968). These results applied when the sample size was small, classification was restricted to only two or three classes, and the predictor features had multivariate Gaussian distributions. Efron compared several variants of the bootstrap with cross-validation and the resubstitution method. This led to several follow-up articles that widened the range of applications and established the superiority of a version of the bootstrap called the 632 estimator. See Chatterjee and Chatterjee (1983), Chernick et al. (1985, 1986, 1988a, b), Jain et al. (1987), and Efron and Tibshirani (1997).
Chernick was a graduate student at Stanford in the late 1970s when the bootstrap activity began on the Stanford and Berkeley campuses. Oddly, however, the bootstrap did not catch on with many graduate students; even Brad Efron’s own students chose other topics for their dissertations. Gail Gong was the first of Efron's students to write a dissertation on the bootstrap. She did very useful applied work on using the bootstrap in model building (particularly for logistic regression subset selection); see Gong (1986). After Gail Gong, a number of graduate students wrote dissertations on the bootstrap under Efron, including Terry Therneau, Rob Tibshirani, and Tim Hesterberg. Michael Martin visited Stanford while working on his dissertation on bootstrap confidence intervals under Peter Hall. At Berkeley, William Navidi wrote his thesis on bootstrapping in regression and econometric models under David Freedman.
While exciting theoretical results were developed for the bootstrap in the 1980s and 1990s, there were also negative results in which the bootstrap estimate was shown not to be “consistent” in the probabilistic sense (consistency meaning that the estimate approaches the true parameter value as the sample size becomes infinite). Examples include the mean when the population distribution does not have a finite variance and the maximum or minimum of a sample. These cases are illustrated in Athreya (1987a, b), Knight (1989), Angus (1993), and Hall et al. (1993). The first published example of an inconsistent bootstrap estimate appeared in Bickel and Freedman (1981). Shao et al. (2000) showed that a particular approach to bootstrap estimation of individual bioequivalence is also inconsistent, and they provided a modification that is consistent. Generally, the bootstrap is consistent when the central limit theorem applies (a sufficient condition is Lyapunov’s condition, which requires the existence of the 2 + δ moment of the population distribution). Consistency results in the literature are based on the existence of Edgeworth expansions, so additional smoothness conditions for the expansion to exist have also been assumed (although it is not known whether they are necessary).
One extension of the bootstrap, called the m-out-of-n bootstrap, was suggested by Bickel and Ren (1996) in light of previous research on it, and it has been shown to overcome the inconsistency of the bootstrap in several instances. In the m-out-of-n bootstrap, sampling is with replacement from the original sample but with a value of m that is smaller than n. See Bickel et al. (1997), Gine and Zinn (1989), Arcones and Gine (1989), Fukuchi (1994), and Politis et al. (1999). A minimal sketch of the idea in R follows.
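In this sketch the data, the choice m = ⌊√n⌋, and the use of the sample maximum are all illustrative assumptions; in practice m must grow with n while m/n tends to zero for consistency.

```r
## m-out-of-n bootstrap: resample m < n values with replacement.
## Illustrated on the sample maximum, for which the ordinary (n-out-of-n)
## bootstrap is known to be inconsistent. All choices here are illustrative.
set.seed(456)
x <- runif(100)                    # hypothetical sample, n = 100
n <- length(x)
m <- floor(sqrt(n))                # m = 10; m must grow with n while m/n -> 0
B <- 1000
max_m <- replicate(B, max(sample(x, m, replace = TRUE)))
quantile(max_m, c(0.025, 0.975))   # sketch of the resampled maximum's spread
```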
Some bootstrap approaches in time series have been shown to be inconsistent. Lahiri (2003) covered the use of the bootstrap in time series and other dependent cases. He showed that for the m-dependent and moving block bootstrap cases there are remedies that are consistent (see Section 5.5 for some coverage of the moving block bootstrap).
1.2 DEFINITION AND RELATIONSHIP TO THE DELTA METHOD AND OTHER RESAMPLING METHODS
We first provide an informal definition of the bootstrap to build intuition and understanding before giving a more formal mathematical definition. The objective of bootstrapping is to estimate a parameter based on the data, such as a mean, median, or standard deviation. We are also interested in the properties of the estimate's distribution and may want to construct confidence intervals. But we do not want to make overly restrictive assumptions about the form of the distribution that the observed data came from.
For the simple case of independent observations coming from the same population distribution, the basic element for bootstrapping is the empirical distribution. The empirical distribution is just the discrete distribution that gives equal weight to each data point (i.e., it assigns probability 1/n to each of the original n observations); it shall be denoted Fₙ.
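In R, drawing from the empirical distribution is exactly sampling the observed values with replacement; a minimal sketch follows (the data values are hypothetical).

```r
## Drawing a bootstrap sample from the empirical distribution F_n
## is sampling the original observations with replacement, each with
## probability 1/n. The data values here are hypothetical.
x <- c(1.2, 0.7, 3.4, 2.2)
n <- length(x)
sample(x, n, replace = TRUE)                      # one bootstrap sample
sample(x, n, replace = TRUE, prob = rep(1/n, n))  # equivalent, with weights explicit
plot(ecdf(x), main = "Empirical distribution Fn") # step plot of F_n
```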
Most of the common parameters that we consider are functionals of the unknown population distribution. A functional is simply a mapping that takes a function F into a real number. In our case, we are only interested in functionals of cumulative probability distribution functions. So, for example, the mean and variance of a distribution can be represented as functionals in the following way. Let μ be the mean for a distribution function F; then μ = ∫ x dF(x). Let σ² be the variance; then σ² = ∫ (x − μ)² dF(x). These integrals over the entire possible set of x values in the domain of...
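A minimal sketch of these functionals applied to the empirical distribution Fₙ, that is, the plug-in estimates (the data values are hypothetical; note that the plug-in variance uses divisor n rather than the unbiased n − 1):

```r
## Plug-in estimation: apply the mean and variance functionals to F_n
## instead of the unknown F. Since F_n puts mass 1/n on each observation,
## the integrals become sums. Data values are hypothetical.
x <- c(1.2, 0.7, 3.4, 2.2)
n <- length(x)
mu_hat     <- sum(x) / n               # integral of x dF_n(x)
sigma2_hat <- sum((x - mu_hat)^2) / n  # integral of (x - mu)^2 dF_n(x)
c(mu_hat, sigma2_hat)                  # note divisor n, not n - 1
```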
| Publication date (per publisher) | 21 August 2014 |
|---|---|
| Language | English |
| Subject area | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| | Technology ► Electrical Engineering / Energy Technology |
| Keywords | Applications • Applied Probability & Statistics • Approach • Bootstrap • Comprehensive • Data • Electrical & Electronics Engineering • General • Introduction • Methods • Numerical Methods & Algorithms • Parametric • Powerful • practicality • programming environment • Standard • Statistical • Statistics • Statistics for Finance, Business & Economics • Stochastics • utilizes |
| ISBN-10 | 1-118-62541-2 / 1118625412 |
| ISBN-13 | 978-1-118-62541-5 / 9781118625415 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. At download time the eBook is authorized to your personal Adobe ID; you can then read it only on devices that are also registered to that Adobe ID.
Details on Adobe DRM
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and nonfiction. The text reflows dynamically to fit the display and font size, which also makes EPUB a good choice for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers; it is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook.
Device list and additional notes