Mathematical Statistics with Resampling and R (eBook)
576 pages
Wiley (publisher)
978-1-119-87404-1 (ISBN)
This thoroughly updated third edition combines the latest software applications with the benefits of modern resampling techniques.
Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. The third edition of Mathematical Statistics with Resampling and R combines modern resampling techniques and mathematical statistics. This book is classroom-tested to ensure an accessible presentation, and uses the powerful and flexible computer language R for data analysis.
This book introduces permutation tests and bootstrap methods to motivate classical inference methods, and also presents them as useful tools in their own right when classical methods are inaccurate or unavailable. The book strikes a balance between simulation, computing, theory, data, and applications.
Throughout the book, new and updated case studies representing a diverse range of subjects, such as flight delays, birth weights of babies, U.S. demographics, views on sociological issues, and problems at Google and Instacart, illustrate the relevance of mathematical statistics to real-world applications.
Changes and additions to the third edition include:
- New and updated case studies that incorporate contemporary subjects like COVID-19
- Several new sections, including introductory material on causal models and regression methods for causal modeling in practice
- Modern terminology distinguishing statistical discernibility and practical importance
- New exercises and examples, data sets, and R code, using dplyr and ggplot2
- A complete instructor's solutions manual
- A new github site that contains code, data sets, additional topics, and instructor resources
Mathematical Statistics with Resampling and R is an ideal textbook for undergraduate and graduate students in mathematical statistics courses, as well as practitioners and researchers looking to expand their toolkit of resampling and classical techniques.
Laura M. Chihara, PhD, is Professor of Mathematics at Carleton College with extensive experience teaching mathematical statistics and applied regression analysis. Dr. Chihara has experience with S+ and R from her work at Insightful Corporation (formerly MathSoft) and in statistical consulting.
Tim C. Hesterberg, PhD, is a Staff Data Scientist at Instacart. He was previously a data scientist at Google and research scientist at Insightful Corporation, led the development of S+Resample, and wrote the R resample package.
Chapter 1 - Data and Case Studies
Chapter 2 - Exploratory Data Analysis
Chapter 3 - Introduction to Hypothesis Testing: Permutation Tests
Chapter 4 - Sampling Distributions
Chapter 5 - Introduction to Confidence Intervals: The Bootstrap
Chapter 6 - Estimation
Chapter 7 - More Confidence Intervals
Chapter 8 - More Hypothesis Testing
Chapter 9 - Regression
Chapter 10 - Categorical Data
Chapter 11 - Bayesian Methods
Chapter 12 - One-Way ANOVA
Chapter 13 - Additional Topics
1
Data and Case Studies
Statistics is the art and science of collecting and analyzing data and understanding the nature of variability. Mathematics, especially probability, governs the underlying theory, but statistics is driven by applications to real problems.
In this chapter, we introduce several data sets that we will encounter throughout the text in the examples and exercises. These data sets are available in the R package resampledata3 or at the textbook website https://github.com/lchihara/MathStatsResamplingR.
1.1 Case Study: Flight Delays
If you have ever traveled by air, you probably have experienced the frustration of flight delays. The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival.
LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. In 2008, over 23 million passengers and over 375 000 planes flew in or out of LGA. United Airlines and American Airlines are two major airlines that schedule services at LGA. The data set FlightDelays contains information on all 4029 departures of these two airlines from LGA during May and June 2009 (Tables 1.1 and 1.2).
Table 1.1 Partial view of FlightDelays data.
| Flight | Carrier | FlightNo | Destination | DepartTime | Day |
|---|---|---|---|---|---|
| 1 | UA | 403 | DEN | 4–8 a.m. | Friday |
| 2 | UA | 405 | DEN | 8–noon | Friday |
| 3 | UA | 409 | DEN | 4–8 p.m. | Friday |
| 4 | UA | 511 | ORD | 8–noon | Friday |
Table 1.2 Variables in data set FlightDelays.
| Variable | Description |
|---|---|
| Carrier | UA = United Airlines, AA = American Airlines |
| FlightNo | Flight number |
| Destination | Airport code |
| DepartTime | Scheduled departure time in 4 h intervals |
| Day | Day of week |
| Month | May or June |
| Delay | Minutes flight delayed (negative indicates early departure) |
| Delayed30 | Departure delayed more than 30 min? |
| FlightLength | Length of time of flight (minutes) |
Each row of the data set is an observation. Each column represents a variable – some characteristic that is obtained for each observation. For instance, on the first observation listed, the flight was a United Airlines plane, flight number 403, destined for Denver, and departing on Friday between 4 and 8 a.m. This data set consists of 4029 observations and 9 variables.
Questions we might ask include the following: Are flight delay times different between the two airlines? Are flight delay times different depending on the day of the week? Are flights scheduled in the morning less likely to be delayed by more than 15 min?
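As a first look at the question about the two airlines, the following is a minimal sketch in R. It assumes the resampledata3 package named above is installed and that the data frame has the `Carrier` and `Delay` columns listed in Table 1.2; the helper name `mean_delay_by_carrier` is our own, chosen for illustration. It is an exploratory summary only, not a formal comparison.

```r
# Hypothetical helper: mean departure delay (in minutes) for each carrier.
# Assumes the columns `Carrier` and `Delay` described in Table 1.2.
mean_delay_by_carrier <- function(flights) {
  stopifnot(all(c("Carrier", "Delay") %in% names(flights)))
  tapply(flights$Delay, flights$Carrier, mean)
}

# Use the real data only if the package is available on this machine.
if (requireNamespace("resampledata3", quietly = TRUE)) {
  data("FlightDelays", package = "resampledata3")
  print(dim(FlightDelays))                    # rows (flights) and columns (variables)
  print(mean_delay_by_carrier(FlightDelays))  # one mean delay per carrier
} else {
  message("Package 'resampledata3' is not installed; skipping the example.")
}
```

A difference between the two means in this output would not by itself settle the question, since it does not account for variability in the delays.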
1.2 Case Study: Birth Weights of Babies
The birth weight of a baby is of interest to health officials since many studies have shown possible links between this weight and conditions in later life, such as obesity or diabetes. Researchers look for possible relationships between the birth weight of a baby and the age of the mother or whether or not she smoked cigarettes or drank alcohol during her pregnancy. The Centers for Disease Control and Prevention (CDC) maintains a database on all babies born in a given year, incorporating data provided by the US Department of Health and Human Services, the National Center for Health Statistics, and the Division of Vital Statistics. We will investigate different samples taken from the CDC's database of births.
One data set that we will investigate consists of a random sample of 1009 babies born in North Carolina during 2004 (Table 1.3). The babies in the sample had a gestation period of at least 37 weeks and were single births (i.e. not a twin or triplet).
Table 1.3 Variables in data set NCBirths2004.
| Variable | Description |
|---|---|
| MothersAge | Mother's age |
| Smoker | Mother smoker or non‐smoker |
| Gender | Gender of baby |
| Weight | Weight at birth (grams) |
| Gestation | Gestation time (weeks) |
We will also investigate a data set, Girls2004, consisting of a random sample of 40 baby girls born in Alaska and 40 baby girls born in Wyoming. These babies also had a gestation period of at least 37 weeks and were single births.
The data set TXBirths2004 contains a random sample of 1587 babies born in Texas in 2004. In this case, the sample was not restricted to single births, nor to a gestation period of at least 37 weeks. The numeric variable Number indicates whether the baby was a single birth, or one of a twin, triplet, and so on. The variable Multiple is a factor variable indicating whether or not the baby was a multiple birth.
1.3 Case Study: Verizon Repair Times
Verizon is the primary local telephone company (incumbent local exchange carrier (ILEC)) for a large area of the Eastern United States. As such, it is responsible for providing repair service for the customers of other telephone companies known as competing local exchange carriers (CLECs) in this region. Verizon is subject to fines if the repair times (the time it takes to fix a problem) for CLEC customers are substantially worse than those for Verizon customers.
The data set Verizon contains a sample of repair times for 1664 ILEC and 23 CLEC customers (Table 1.4). The mean repair times are 8.4 h for ILEC customers and 16.5 h for CLEC customers. Could a difference this large be easily explained by chance?
Table 1.4 Variables in data set Verizon.
| Variable | Description |
|---|---|
| Time | Repair times (in hours) |
| Group | ILEC or CLEC |
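One way to approach the question of whether such a difference could be explained by chance is a two-sample permutation test, the method introduced in Chapter 3. The sketch below is an illustration under stated assumptions, not the book's own code: the function name `perm_test_mean_diff` is hypothetical, and the data are simulated stand-ins (exponential draws with the group sizes and means quoted above), not the actual Verizon repair times.

```r
# Hypothetical helper: one-sided permutation test for the difference in
# mean repair time (CLEC minus ILEC).
perm_test_mean_diff <- function(time, group, n_perm = 9999) {
  stopifnot(length(time) == length(group))
  is_clec  <- group == "CLEC"
  observed <- mean(time[is_clec]) - mean(time[!is_clec])
  n        <- length(time)
  n_clec   <- sum(is_clec)
  # Randomly reassign which observations are labeled CLEC and recompute
  # the difference in means each time.
  perm_diffs <- replicate(n_perm, {
    idx <- sample(n, n_clec)
    mean(time[idx]) - mean(time[-idx])
  })
  # Add 1 to numerator and denominator to count the observed statistic.
  p_value <- (sum(perm_diffs >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

set.seed(1)
# Simulated stand-in data (NOT the real Verizon data): 1664 ILEC and
# 23 CLEC repair times with population means of 8.4 h and 16.5 h.
sim_time  <- c(rexp(1664, rate = 1 / 8.4), rexp(23, rate = 1 / 16.5))
sim_group <- rep(c("ILEC", "CLEC"), times = c(1664, 23))

result <- perm_test_mean_diff(sim_time, sim_group, n_perm = 999)
print(result)
```

With the real data, the same function could be called on the `Time` and `Group` columns from Table 1.4, assuming the Verizon data frame has been loaded.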
1.4 Case Study: Iowa Recidivism
When a person is released from prison, will he or she relapse into criminal behavior and be sent back? The state of Iowa tracks offenders over a 3‐year period, and records the number of days until recidivism for those who are readmitted to prison. The Department of Corrections uses this recidivism data to determine whether or not their strategies for preventing offenders from relapsing into criminal behavior are effective.
The data set Recidivism contains all offenders convicted of either a misdemeanor or felony who were released from an Iowa prison during the 2010 fiscal year (ending in June) (Table 1.5). There were 17 022 people released in that period, of whom 5386 were sent back to prison in the following 3 years (through the end of the 2013 fiscal year).
Table 1.5 Variables in data set Recidivism.
| Variable | Description |
|---|---|
| Gender | F, M |
| Age | Age at release: Under 25, 25–34, 35–44, 45–54, 55 and Older |
| Age25 | Under 25, Over 25 (binary) |
| Offense | Original conviction: Felony or Misdemeanor |
| Recid | Recidivate? No, Yes |
| Type | New (crime), No Recidivism, Tech (technical violation, such as a parole violation) |
| Days | Number of days to recidivism; NA if no recidivism |
The recidivism rate for those under the age of 25 years was 36.5% compared to 30.6% for those 25 years or older. Does this indicate a real difference in the behavior of those in these age groups, or could this be explained by chance variability?
1.5 Sampling
In analyzing data, we need to determine whether the data represent a population or a sample. A population represents all the individual cases, whether they are babies, fish, cars, or coin flips. The data from the flight delays case study in Section 1.1 are all the flight departures of United Airlines and...
| Publication date (per publisher) | 9 August 2022 |
|---|---|
| Language | English |
| Subject area | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| Keywords | Mathematical Statistics • Probability & Mathematical Statistics • R (program) • Statistical Software / R • Statistics • Statistics - Text & Reference • Probability Theory |
| ISBN-10 | 1-119-87404-1 / 1119874041 |
| ISBN-13 | 978-1-119-87404-1 / 9781119874041 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The running text adapts dynamically to the display and font size, so EPUB also works well on mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.