Mathematical Statistics with Resampling and R - Laura M. Chihara, Tim C. Hesterberg

eBook Download: EPUB

2022 | 3rd edition
576 pages
Wiley (publisher)
978-1-119-87404-1 (ISBN)

This thoroughly updated third edition combines the latest software applications with the benefits of modern resampling techniques

Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. The third edition of Mathematical Statistics with Resampling and R combines modern resampling techniques and mathematical statistics. This book is classroom-tested to ensure an accessible presentation, and uses the powerful and flexible computer language R for data analysis.

This book introduces permutation tests and bootstrap methods both to motivate classical inference methods and as useful tools in their own right when classical methods are inaccurate or unavailable. The book strikes a balance between simulation, computing, theory, data, and applications.

Throughout the book, new and updated case studies representing a diverse range of subjects, such as flight delays, birth weights of babies, U.S. demographics, views on sociological issues, and problems at Google and Instacart, illustrate the relevance of mathematical statistics to real-world applications.

Changes and additions to the third edition include:

New and updated case studies that incorporate contemporary subjects like COVID-19
Several new sections, including introductory material on causal models and regression methods for causal modeling in practice
Modern terminology distinguishing statistical discernibility and practical importance
New exercises and examples, data sets, and R code, using dplyr and ggplot2
A complete instructor's solutions manual
A new github site that contains code, data sets, additional topics, and instructor resources

Mathematical Statistics with Resampling and R is an ideal textbook for undergraduate and graduate students in mathematical statistics courses, as well as practitioners and researchers looking to expand their toolkit of resampling and classical techniques.

Laura M. Chihara, PhD, is Professor of Mathematics at Carleton College with extensive experience teaching mathematical statistics and applied regression analysis. Dr. Chihara has experience with S+ and R from her work at Insightful Corporation (formerly MathSoft) and in statistical consulting.

Tim C. Hesterberg, PhD, is a Staff Data Scientist at Instacart. He was previously a data scientist at Google and research scientist at Insightful Corporation, led the development of S+Resample, and wrote the R resample package.

Chapter 1 - Data and Case Studies

Chapter 2 - Exploratory Data Analysis

Chapter 3 - Introduction to Hypothesis Testing: Permutation Tests

Chapter 4 - Sampling Distributions

Chapter 5 - Introduction to Confidence Intervals: The Bootstrap

Chapter 6 - Estimation

Chapter 7 - More Confidence Intervals

Chapter 8 - More Hypothesis Testing

Chapter 9 - Regression

Chapter 10 - Categorical Data

Chapter 11 - Bayesian Methods

Chapter 12 - One-Way ANOVA

Chapter 13 - Additional Topics

1 Data and Case Studies

Statistics is the art and science of collecting and analyzing data and understanding the nature of variability. Mathematics, especially probability, governs the underlying theory, but statistics is driven by applications to real problems.

In this chapter, we introduce several data sets that we will encounter throughout the text in the examples and exercises. These data sets are available in the R package resampledata3 or at the textbook website https://github.com/lchihara/MathStatsResamplingR.

1.1 Case Study: Flight Delays

If you have ever traveled by air, you probably have experienced the frustration of flight delays. The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival.

LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. In 2008, over 23 million passengers and over 375 000 planes flew in or out of LGA. United Airlines and American Airlines are two major airlines that schedule services at LGA. The data set FlightDelays contains information on all 4029 departures of these two airlines from LGA during May and June 2009 (Tables 1.1 and 1.2).

Table 1.1 Partial view of FlightDelays data.

FlightNo	Destination	DepartTime	Day
403	DEN	4–8 a.m.	Friday
405	DEN	8–noon	Friday
409	DEN	4–8 p.m.	Friday
511	ORD	8–noon	Friday

Table 1.2 Variables in data set FlightDelays.

Variable	Description
Carrier	UA = United Airlines, AA = American Airlines
FlightNo	Flight number
Destination	Airport code
DepartTime	Scheduled departure time in 4 h intervals
Day	Day of week
Month	May or June
Delay	Minutes flight delayed (negative indicates early departure)
Delayed30	Departure delayed more than 30 min?
FlightLength	Length of time of flight (minutes)

Each row of the data set is an observation. Each column represents a variable – some characteristic that is obtained for each observation. For instance, on the first observation listed, the flight was a United Airlines plane, flight number 403, destined for Denver, and departing on Friday between 4 and 8 a.m. This data set consists of 4029 observations and 9 variables.
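The row-and-column structure described above can be sketched in a few lines. This is a minimal Python illustration, not code from the book (which uses R). It holds only the four rows and the four columns whose values are visible in Table 1.1, and the names rows, n_observations, variables, and by_destination are chosen here for illustration.

```python
from collections import Counter

# Each dict is one observation (a row); each key is one variable (a column).
# Only the values visible in Table 1.1 are used here.
rows = [
    {"FlightNo": 403, "Destination": "DEN", "DepartTime": "4–8 a.m.", "Day": "Friday"},
    {"FlightNo": 405, "Destination": "DEN", "DepartTime": "8–noon", "Day": "Friday"},
    {"FlightNo": 409, "Destination": "DEN", "DepartTime": "4–8 p.m.", "Day": "Friday"},
    {"FlightNo": 511, "Destination": "ORD", "DepartTime": "8–noon", "Day": "Friday"},
]

n_observations = len(rows)        # number of rows
variables = list(rows[0].keys())  # column names

# Tabulate one variable across all observations.
by_destination = Counter(row["Destination"] for row in rows)

print(n_observations, len(variables))  # prints: 4 4
print(by_destination["DEN"], by_destination["ORD"])  # prints: 3 1
```

Applied to the full FlightDelays data set, the same two counts would be 4029 observations and 9 variables.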

Questions we might ask include the following: Are flight delay times different between the two airlines? Are flight delay times different depending on the day of the week? Are flights scheduled in the morning less likely to be delayed by more than 15 min?

1.2 Case Study: Birth Weights of Babies

The birth weight of a baby is of interest to health officials since many studies have shown possible links between this weight and conditions in later life, such as obesity or diabetes. Researchers look for possible relationships between the birth weight of a baby and the age of the mother or whether or not she smoked cigarettes or drank alcohol during her pregnancy. The Centers for Disease Control and Prevention (CDC) maintains a database on all babies born in a given year, incorporating data provided by the US Department of Health and Human Services, the National Center for Health Statistics, and the Division of Vital Statistics. We will investigate different samples taken from the CDC's database of births.

One data set that we will investigate consists of a random sample of 1009 babies born in North Carolina during 2004 (Table 1.3). The babies in the sample had a gestation period of at least 37 weeks and were single births (i.e. not a twin or triplet).

Table 1.3 Variables in data set NCBirths2004.

Variable	Description
MothersAge	Mother's age
Smoker	Mother smoker or non‐smoker
Gender	Gender of baby
Weight	Weight at birth (grams)
Gestation	Gestation time (weeks)

We will also investigate a data set, Girls2004, consisting of a random sample of 40 baby girls born in Alaska and 40 baby girls born in Wyoming. These babies also had a gestation period of at least 37 weeks and were single births.

The data set TXBirths2004 contains a random sample of 1587 babies born in Texas in 2004. In this case, the sample was not restricted to single births, nor to a gestation period of at least 37 weeks. The numeric variable Number indicates whether the baby was a single birth or one of a set of twins, triplets, and so on. The variable Multiple is a factor variable indicating whether or not the baby was a multiple birth.

1.3 Case Study: Verizon Repair Times

Verizon is the primary local telephone company (the incumbent local exchange carrier, ILEC) for a large area of the Eastern United States. As such, it is responsible for providing repair service for the customers of other telephone companies, known as competing local exchange carriers (CLECs), in this region. Verizon is subject to fines if the repair times (the time it takes to fix a problem) for CLEC customers are substantially worse than those for Verizon customers.

The data set Verizon contains a sample of repair times for 1664 ILEC and 23 CLEC customers (Table 1.4). The mean repair times are 8.4 h for ILEC customers and 16.5 h for CLEC customers. Could a difference this large be easily explained by chance?

Table 1.4 Variables in data set Verizon.

Variable	Description
Time	Repair times (in hours)
Group	ILEC or CLEC
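One way to approach the question of whether chance alone could explain such a difference is a permutation test, the topic of Chapter 3. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation. The repair times are simulated from exponential distributions whose means match the reported 8.4 h and 16.5 h, with the reported group sizes, so the printed result says nothing about the real Verizon data. The function name permutation_test, its parameters, and the (count + 1)/(N + 1) convention for the p-value are choices made for this example.

```python
import random


def permutation_test(group_a, group_b, n_resamples=1999, seed=0):
    """One-sided permutation test for mean(group_b) - mean(group_a).

    Returns the observed difference and the estimated p-value.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    total = sum(pooled)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_b) / n_b - sum(group_a) / n_a

    count = 0
    for _ in range(n_resamples):
        # Randomly relabel n_b of the pooled values as "group b";
        # the remaining n_a values form "group a".
        sum_b = sum(rng.sample(pooled, n_b))
        diff = sum_b / n_b - (total - sum_b) / n_a
        if diff >= observed:
            count += 1

    # Counting the observed split as one resample keeps the p-value above 0.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value


# Simulated stand-in data (ASSUMPTION: exponential shape; NOT the Verizon data).
sim = random.Random(1)
ilec = [sim.expovariate(1 / 8.4) for _ in range(1664)]
clec = [sim.expovariate(1 / 16.5) for _ in range(23)]

observed, p_value = permutation_test(ilec, clec)
print(f"observed difference: {observed:.2f} h, one-sided p-value: {p_value:.4f}")
```

A small p-value would indicate that random relabeling of the pooled repair times rarely produces a difference as large as the observed one.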

1.4 Case Study: Iowa Recidivism

When a person is released from prison, will he or she relapse into criminal behavior and be sent back? The state of Iowa tracks offenders over a 3‐year period, and records the number of days until recidivism for those who are readmitted to prison. The Department of Corrections uses this recidivism data to determine whether or not their strategies for preventing offenders from relapsing into criminal behavior are effective.

The data set Recidivism contains all offenders convicted of either a misdemeanor or felony who were released from an Iowa prison during the 2010 fiscal year (ending in June) (Table 1.5). There were 17 022 people released in that period, of whom 5386 were sent back to prison in the following 3 years (through the end of the 2013 fiscal year).

Table 1.5 Variables in data set Iowa Recidivism.

Variable	Description
Gender	F, M
Age	Age at release: Under 25, 25–34, 35–44, 45–54, 55 and Older
Age25	Under 25, Over 25 (binary)
Offense	Original conviction: Felony or Misdemeanor
Recid	Recidivate? No, Yes
Type	New (crime), No Recidivism, Tech (technical violation, such as a parole violation)
Days	Number of days to recidivism; NA if no recidivism

The recidivism rate for those under the age of 25 years was 36.5% compared to 30.6% for those 25 years or older. Does this indicate a real difference in the behavior of those in these age groups, or could this be explained by chance variability?
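The rates quoted above are simple proportions. As a minimal Python sketch (variable names are illustrative), the overall 3-year rate follows from the counts given earlier in this section, and the gap between the two age groups is the difference of the reported percentages. The sizes of the two age groups are not given in this section, so the sketch does not attempt to judge whether the gap could be due to chance.

```python
released = 17022  # offenders released in the 2010 fiscal year
returned = 5386   # of those, sent back to prison within 3 years

overall_rate = returned / released

rate_under_25 = 36.5  # percent, as reported in the text
rate_25_plus = 30.6   # percent, as reported in the text
gap = rate_under_25 - rate_25_plus

print(f"overall recidivism rate: {overall_rate:.1%}")         # prints 31.6%
print(f"gap between age groups: {gap:.1f} percentage points")  # prints 5.9
```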

1.5 Sampling

In analyzing data, we need to determine whether the data represent a population or a sample. A population represents all the individual cases, whether they are babies, fish, cars, or coin flips. The data from the flight delays case study in Section 1.1 are all the flight departures of United Airlines and...

Publication date (per publisher)	9 August 2022
Language	English
Subject area	Mathematics / Computer Science ► Mathematics ► Statistics
Subject area	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
Keywords	Mathematical Statistics • Probability & Mathematical Statistics • R (software) • Statistical Software / R • Statistics • Statistics - Text & Reference • Statistics / Textbooks and Reference Works • Probability Theory • Probability Theory and Mathematical Statistics
ISBN-10	1-119-87404-1 / 1119874041
ISBN-13	978-1-119-87404-1 / 9781119874041
