Regression Analysis By Example Using R (eBook)

eBook download: EPUB
2023 | 6th edition
John Wiley & Sons (publisher)
978-1-119-83089-4 (ISBN)

Regression Analysis By Example Using R - Ali S. Hadi, Samprit Chatterjee
A STRAIGHTFORWARD AND CONCISE DISCUSSION OF THE ESSENTIALS OF REGRESSION ANALYSIS

In the newly revised sixth edition of Regression Analysis By Example Using R, distinguished statistician Dr Ali S. Hadi delivers an expanded and thoroughly updated discussion of exploratory data analysis using regression analysis in R. The book provides in-depth treatments of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression.

The author clearly demonstrates effective methods of regression analysis with examples that contain the types of data irregularities commonly encountered in the real world. This newest edition also offers a brand-new, easy-to-read chapter on the freely available statistical software package R.

Readers will also find:

  • Reorganized, expanded, and upgraded exercises at the end of each chapter with an emphasis on data analysis
  • Updated data sets and examples throughout the book
  • Complimentary access to a companion website that provides data sets in xlsx, csv, and txt format

Perfect for upper-level undergraduate or beginning graduate students in statistics, mathematics, biostatistics, and computer science programs, Regression Analysis By Example Using R will also benefit readers who need a reference for quick updates on regression methods and applications.

Ali S. Hadi, PhD, Fellow ASA (1997), Member ISI (1998), Fellow AAS (2019) is Distinguished University Professor and former Chair of the Department of Mathematics and Actuarial Science at the American University in Cairo (AUC). He is also the Founder of the Actuarial Science Program at AUC (2004), the Founder of the Data Science Program at AUC (2019), and the former Vice Provost and Director of Graduate Studies and Research at AUC. Dr. Hadi is also a Stephen H. Weiss Presidential Fellow and Professor Emeritus at Cornell University, USA. He is the author and co-author of four other books and numerous articles. For more info, see his Website at: www1.aucegypt.edu/faculty/hadi.

CHAPTER 1
INTRODUCTION


1.1 WHAT IS REGRESSION ANALYSIS?


Regression analysis is a conceptually simple method for investigating functional relationships among variables. A real estate appraiser may wish to relate the sale price of a home to selected physical characteristics of the building and the taxes (local, school, county) paid on the building. We may wish to examine whether cigarette consumption is related to various socioeconomic and demographic variables such as age, education, income, and the price of cigarettes. The relationship is expressed in the form of an equation or a model connecting the response or dependent variable and one or more explanatory or predictor variables. In the cigarette consumption example, the response variable is cigarette consumption (measured by the number of packs of cigarettes sold in a given state on a per capita basis during a given year) and the explanatory or predictor variables are the various socioeconomic and demographic variables. In the real estate appraisal example, the response variable is the price of a home and the explanatory or predictor variables are the characteristics of the building and the taxes paid on the building.

We denote the response variable by $Y$ and the set of predictor variables by $X_1, X_2, \ldots, X_p$, where $p$ denotes the number of predictor variables. The true relationship between $Y$ and $X_1, X_2, \ldots, X_p$ can be approximated by the regression model

$$Y = f(X_1, X_2, \ldots, X_p) + \varepsilon, \tag{1.1}$$

where $\varepsilon$ is assumed to be a random error representing the discrepancy in the approximation. It accounts for the failure of the model to fit the data exactly. The function $f(X_1, X_2, \ldots, X_p)$ describes the relationship between $Y$ and $X_1, X_2, \ldots, X_p$. An example is the linear regression model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon, \tag{1.2}$$

where $\beta_0, \beta_1, \ldots, \beta_p$, called the regression parameters or coefficients, are unknown constants to be determined (estimated) from the data. We follow the commonly used notational convention of denoting unknown parameters by Greek letters.
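
To make the linear regression model above concrete, the following is a minimal sketch for the special case of a single predictor. It is written in Python rather than R so that it is self-contained. The text at this point does not say how the parameters are estimated, so the use of ordinary least squares here is an assumption. The function names, the "true" parameter values, and the simulated data are all hypothetical.

```python
import random


def simulate(n, beta0, beta1, sigma, seed=0):
    """Draw n points from Y = beta0 + beta1 * X + eps, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_simple_ols(xs, ys):
    """Return least-squares estimates (b0, b1) for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


# Hypothetical "true" parameters; in practice these are unknown.
xs, ys = simulate(n=200, beta0=2.0, beta1=0.5, sigma=1.0)
b0, b1 = fit_simple_ols(xs, ys)
print(f"estimated intercept: {b0:.3f}, estimated slope: {b1:.3f}")
```

Because of the random error term, the estimates should land near, but not exactly on, the values used to generate the data.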

The predictor or explanatory variables are also called by other names such as independent variables, covariates, regressors, factors, and carriers. The name independent variable, though commonly used, is the least preferred, because in practice the predictor variables are rarely independent of each other.

1.2 PUBLICLY AVAILABLE DATA SETS


Regression analysis has numerous areas of application. A partial list would include economics, finance, business, law, meteorology, medicine, biology, chemistry, engineering, physics, education, sports, history, sociology, and psychology. A few examples of such applications are given in Section 1.3. Regression analysis is learned most effectively by analyzing data that are of direct interest to the reader. We invite readers to think about questions (in their own areas of work, research, or interest) that can be addressed using regression analysis. Readers should collect the relevant data and then apply the regression analysis techniques presented in this book to their own data. To help the reader locate real-life data, this section provides some sources and links to a wealth of data sets that are available for public use.

A number of data sets are available in books and on the Internet. The book by Hand et al. (1994) contains data sets from many fields. These data sets are small in size and are suitable for use as exercises. The book by Chatterjee et al. (1995) provides numerous data sets from diverse fields. The data are included in a diskette that comes with the book and can also be found at the Website.

Data sets are also available on the Internet at many other sites. Some of the Websites given below allow direct copying and pasting into the statistical package of choice, while others require downloading the data file and then importing it into a statistical package. Some of these sites also contain further links to yet other data sets or statistics-related Websites.

The Data and Story Library (DASL, pronounced “dazzle”) is one of the most interesting sites. It contains a number of data sets accompanied by the “story” or background associated with each data set. DASL is an online library of data files and stories that illustrate the use of basic statistical methods. The data sets cover a wide variety of topics. DASL comes with a powerful search engine to locate the story or data file of interest.

Another Website, which also contains data sets arranged by the method used in the analysis, is the Electronic Dataset Service. The site also contains many links to other data sources on the Internet.

Finally, this book has a Website, which contains, among other things, all the data sets that are included in this book and more. These and other data sets can be found at the Book's Website.

1.3 SELECTED APPLICATIONS OF REGRESSION ANALYSIS


Regression analysis is one of the most widely used statistical tools because it provides simple methods for establishing a functional relationship among variables. It has extensive applications in many subject areas. The cigarette consumption and real estate appraisal examples mentioned above are but two of them. In this section, we give a few additional examples demonstrating the wide applicability of regression analysis in real-life situations. Some of the data sets described here will be used later in the book to illustrate regression techniques or in the exercises at the end of various chapters.

1.3.1 Agricultural Sciences


The Dairy Herd Improvement Cooperative (DHI) in upstate New York collects and analyzes data on milk production. One question of interest here is how to develop a suitable model to predict current milk production from a set of measured variables. The response variable (current milk production in pounds) and the predictor variables are given in Table 1.1. Samples are taken once a month during milking. The period that a cow gives milk is called lactation. Number of lactations is the number of times a cow has calved or given milk. The recommended management practice is to have the cow produce milk for about 305 days and then allow a 60-day rest period before beginning the next lactation. The data set, consisting of 199 observations, was compiled from the DHI milk production records. The Milk Production data can be found at the Book's Website.

Table 1.1 Variables in Milk Production Data

Variable    Definition
Current     Current month milk production in pounds
Previous    Previous month milk production in pounds
Fat         Percent of fat in milk
Protein     Percent of protein in milk
Days        Number of days since present lactation
Lactation   Number of lactations
I79         Indicator variable (0 if Days ≤ 79 and 1 if Days > 79)
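
As an illustration of how an indicator variable such as I79 can be derived from Days, here is a minimal Python sketch. It assumes the cutoff of 79 days suggested by the variable name, with 0 for days at or below the cutoff and 1 above it. The function name and the sample values are hypothetical and are not taken from the Milk Production data.

```python
def indicator_after_cutoff(days, cutoff=79):
    """Return 0 if days <= cutoff, and 1 otherwise (assumed coding for I79)."""
    return 0 if days <= cutoff else 1


# Made-up values of Days, for illustration only.
days_in_lactation = [12, 79, 80, 305]
i79 = [indicator_after_cutoff(d) for d in days_in_lactation]
print(i79)  # [0, 0, 1, 1]
```

Coding the split as 0/1 lets the indicator enter a regression model like any other predictor.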

1.3.2 Industrial and Labor Relations


In 1947, the United States Congress passed the Taft–Hartley Amendments to the Wagner Act. The original Wagner Act had permitted the unions to use a Closed Shop Contract unless prohibited by state law. The Taft–Hartley Amendments made the use of Closed Shop Contracts illegal and gave individual states the right to prohibit union shops as well. These right-to-work laws have caused a wave of concern throughout the labor movement. A question of interest here is: What are the effects of these laws on the cost of living for a four-person family living on an intermediate budget in the United States? To answer this question, a data set consisting of 38 geographic locations has been assembled from various sources. The variables used are defined in Table 1.2. The Right-To-Work Laws data can be found at the Book's Website.

Table 1.2 Variables in Right-To-Work Laws Data

Variable    Definition
COL         Cost of living for a four-person family
PD          Population density (persons per square mile)
URate       State unionization rate in 1978
Pop         Population in 1975
Taxes       Property taxes in 1972
Income      Per capita income in 1974
RTWL        Indicator variable (1 if there are right-to-work laws in the state and 0 otherwise)
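
One way to connect these variables to the linear regression model of Section 1.1 is to take COL as the response and the remaining variables as predictors. The equation below is only a sketch of that idea. The choice to include all six predictors, and to include them linearly, is an assumption made here for illustration and is not a model stated in the text.

```latex
\mathrm{COL} = \beta_0 + \beta_1\,\mathrm{PD} + \beta_2\,\mathrm{URate}
             + \beta_3\,\mathrm{Pop} + \beta_4\,\mathrm{Taxes}
             + \beta_5\,\mathrm{Income} + \beta_6\,\mathrm{RTWL} + \varepsilon
```

Under this assumed form, $\beta_6$ would measure the difference in expected cost of living between locations with and without right-to-work laws, holding the other predictors fixed.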

1.3.3 Government


Information about domestic immigration (the movement of people from one state or area of a country to another) is important to state and local governments. It is of interest to build a model that predicts domestic immigration, or that helps explain why people leave one place to go to another. There are many factors that influence domestic immigration, such as weather conditions, crime, taxes, and unemployment rates. A data set for the 48 contiguous states has been created. Alaska and Hawaii are excluded from the analysis because the environments of these states are significantly different from the other 48, and their locations present certain barriers...

Publication date (per publisher) 11 October 2023
Series Wiley Series in Probability and Statistics
Language English
Subject area Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Keywords Applied Probability & Statistics - Models • Data Analysis • data analysis with regression • Exploratory data analysis • linear regression • Logistic Regression • Multicollinearity • multiple regression • R • Regression Analysis • regression diagnostics • regression methods • regression transformation • robust regression • R (software) • Statistics
ISBN-10 1-119-83089-3 / 1119830893
ISBN-13 978-1-119-83089-4 / 9781119830894