Applied Regression Analysis (eBook)

eBook download: EPUB
2014 | 3rd edition
736 pages
Wiley-Interscience (publisher)
978-1-118-62568-2 (ISBN)

Applied Regression Analysis -  Norman R. Draper,  Harry Smith
An outstanding introduction to the fundamentals of regression analysis, updated and expanded. The methods of regression analysis are the most widely used statistical tools for discovering the relationships among variables. This classic text, with its emphasis on clear, thorough presentation of concepts and applications, offers a complete, easily accessible introduction to the fundamentals of regression analysis. Assuming only a basic knowledge of elementary statistics, Applied Regression Analysis, Third Edition focuses on the fitting and checking of both linear and nonlinear regression models, using small and large data sets, with pocket calculators or computers. This Third Edition features separate chapters on multicollinearity, generalized linear models, mixture ingredients, geometry of regression, robust regression, and resampling procedures. Extensive support materials include sets of carefully designed exercises with full or partial solutions and a series of true/false questions with answers. All data sets used in both the text and the exercises can be found on the companion disk at the back of the book. For analysts, researchers, and students in university, industrial, and government courses on regression, this text is an excellent introduction to the subject and an efficient means of learning how to use a valuable analytical tool. It will also prove an invaluable reference resource for applied scientists and statisticians.

NORMAN R. DRAPER teaches in the Department of Statistics at the University of Wisconsin. HARRY SMITH is a former faculty member of the Mt. Sinai School of Medicine.

Basic Prerequisite Knowledge.

Fitting a Straight Line by Least Squares.

Checking the Straight Line Fit.

Fitting Straight Lines: Special Topics.

Regression in Matrix Terms: Straight Line Case.

The General Regression Situation.

Extra Sums of Squares and Tests for Several Parameters Being Zero.

Serial Correlation in the Residuals and the Durbin-Watson Test.

More of Checking Fitted Models.

Multiple Regression: Special Topics.

Bias in Regression Estimates, and Expected Values of Mean Squares and Sums of Squares.

On Worthwhile Regressions, Big F's, and R².

Models Containing Functions of the Predictors, Including Polynomial Models.

Transformation of the Response Variable.

"Dummy" Variables.

Selecting the "Best" Regression Equation.

Ill-Conditioning in Regression Data.

Ridge Regression.

Generalized Linear Models (GLIM).

Mixture Ingredients as Predictor Variables.

The Geometry of Least Squares.

More Geometry of Least Squares.

Orthogonal Polynomials and Summary Data.

Multiple Regression Applied to Analysis of Variance Problems.

An Introduction to Nonlinear Estimation.

Robust Regression.

Resampling Procedures (Bootstrapping).

Bibliography.

True/False Questions.

Answers to Exercises.

Tables.

Indexes.

"I would wholeheartedly recommend this book to any statistician. The third edition has many advantages over the second." (Statistical Methods in Medical Research, Vol. 9, 5)

"this is an excellently written book" (Statistics & Decisions, Vol. 19, No.3, 2001)

CHAPTER 0


Basic Prerequisite Knowledge


Readers need some of the knowledge contained in a basic course in statistics to tackle regression. We summarize some of the main requirements very briefly in this chapter. Also useful is a pocket calculator capable of getting sums of squares and sums of products easily. Excellent calculators of this type cost about $25–50 in the United States. Buy the most versatile you can afford.

0.1. DISTRIBUTIONS: NORMAL, t, AND F


Normal Distribution

The normal distribution occurs frequently in the natural world, either for data “as they come” or for transformed data. The heights of a large group of people selected randomly will look normal in general, for example. The distribution is symmetric about its mean μ and has a standard deviation σ, which is such that practically all of the distribution (99.73%) lies inside the range μ – 3σ ≤ x ≤ μ + 3σ. The frequency function is

$$ f(x) = \frac{1}{\sigma (2\pi)^{1/2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}, \qquad -\infty < x < \infty. \tag{0.1.1} $$


We usually write that x ~ N(μ, σ²), read as “x is normally distributed with mean μ and variance σ².” Most manipulations are done in terms of the standard normal or unit normal distribution, N(0, 1), for which μ = 0 and σ = 1. To move from a general normal variable x to a standard normal variable z, we set

$$ z = \frac{x - \mu}{\sigma}. \tag{0.1.2} $$


A standard normal distribution is shown in Figure 0.1 along with some properties useful in certain regression contexts. All the information shown is obtainable from the normal table in the Tables section. Check that you understand how this is done. Remember to use the fact that the total area under each curve is 1.
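The properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `normal_density` and `standardize` and the values μ = 170, σ = 10, x = 185 are illustrative assumptions, not taken from the text.

```python
import math
from statistics import NormalDist

def normal_density(x, mu=0.0, sigma=1.0):
    """Frequency function of N(mu, sigma^2), written out as in Eq. (0.1.1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standardize(x, mu, sigma):
    """Move from a general normal variable x to a standard normal z, Eq. (0.1.2)."""
    return (x - mu) / sigma

unit = NormalDist()  # the standard (unit) normal N(0, 1)

# Area inside mu - 3*sigma <= x <= mu + 3*sigma, computed on the z scale.
within_3_sigma = unit.cdf(3.0) - unit.cdf(-3.0)

# Hypothetical values, chosen only for illustration.
mu, sigma, x = 170.0, 10.0, 185.0
z = standardize(x, mu, sigma)
p_general = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = unit.cdf(z)                   # the same probability on the z scale

print(f"area within 3 sigma: {within_3_sigma:.4f}")
print(f"z = {z}, P(X <= x) = {p_general:.4f} = P(Z <= z) = {p_standard:.4f}")
```

Working on the z scale is what makes a single normal table sufficient: any probability statement about x can be restated as one about z.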

Gamma Function

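The gamma function Γ(q), which occurs in Eqs. (0.1.3) and (0.1.4), is defined as an integral in general:

$$ \Gamma(q) = \int_0^\infty e^{-x} x^{q-1} \, dx. $$

Figure 0.1. The standard (or unit) normal distribution N(0, 1) and some of its properties.

However, it is easier to think of it as a generalized factorial with the basic property that, for any q,

$$ \Gamma(q) = (q - 1)\,\Gamma(q - 1) = (q - 1)(q - 2)\,\Gamma(q - 2), $$

and so on. Moreover,

$$ \Gamma(\tfrac{1}{2}) = \pi^{1/2} \quad \text{and} \quad \Gamma(1) = 1. $$
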

So, for the applications of Eqs. (0.1.3) and (0.1.4), where v, m, and n are integers, the gamma functions are either simple factorials or simple products ending in $\pi^{1/2}$.

Example 1

 

 

Example 2

 

 
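The “generalized factorial” view of the gamma function can be sketched in a few lines. This is an illustrative sketch only: the helper name `gamma_by_recursion` and the arguments 5 and 5/2 are assumptions chosen for the demonstration, and the result is compared against the standard library's `math.gamma`.

```python
import math

def gamma_by_recursion(q):
    """Gamma(q) for q = 1/2, 1, 3/2, 2, ... using Gamma(q) = (q - 1) * Gamma(q - 1).

    The recursion stops at Gamma(1) = 1 or Gamma(1/2) = pi**0.5.
    """
    if q <= 0 or (2 * q) != int(2 * q):
        raise ValueError("q must be a positive multiple of 1/2")
    if q == 1:
        return 1.0
    if q == 0.5:
        return math.sqrt(math.pi)
    return (q - 1) * gamma_by_recursion(q - 1)

# Integer argument: a simple factorial, Gamma(5) = 4 * 3 * 2 * 1.
g5 = gamma_by_recursion(5)

# Half-integer argument: a simple product ending in pi**0.5,
# Gamma(5/2) = (3/2) * (1/2) * pi**0.5.
g52 = gamma_by_recursion(2.5)

print(g5, math.gamma(5))
print(g52, math.gamma(2.5))
```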

t-Distribution

There are many t-distributions, because the form of the curve, defined by

$$ f(t) = \frac{\Gamma\{(v + 1)/2\}}{(\pi v)^{1/2}\,\Gamma(v/2)} \left( 1 + \frac{t^2}{v} \right)^{-(v + 1)/2}, \qquad -\infty < t < \infty, \tag{0.1.3} $$

depends on v, the number of degrees of freedom. In general, the t(v) distribution looks somewhat like a standard (unit) normal but is “heavier in the tails,” and so lower in the middle, because the total area under the curve is 1. As v increases, the distribution becomes “more normal.” In fact, t(∞) is the N(0, 1) distribution, and, when v exceeds about 30, there is so little difference between t(v) and N(0, 1) that it has become conventional (but not mandatory) to use the N(0, 1) instead. Figure 0.2 illustrates the situation. A two-tailed table of percentage points is given in the Tables section.

Figure 0.2. The t-distributions for v = 1, 9, ∞; t(∞) = N(0, 1).
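The way the t(v) curve approaches the N(0, 1) curve as v grows can be seen by evaluating both densities at a few points. The sketch below assumes the usual form of the t density and uses `math.lgamma` so that large v does not overflow; the function names and the chosen evaluation points are illustrative assumptions.

```python
import math

def t_density(t, v):
    """Density of the t-distribution with v degrees of freedom (standard form)."""
    log_const = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                 - 0.5 * math.log(math.pi * v))
    return math.exp(log_const - (v + 1) / 2.0 * math.log(1.0 + t * t / v))

def unit_normal_density(z):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Compare heights in the middle (t = 0) and in the tail (t = 3).
for v in (1, 9, 30, 1000):
    print(f"v = {v:4d}: f(0) = {t_density(0.0, v):.4f}, f(3) = {t_density(3.0, v):.5f}")
print(f"N(0,1) : f(0) = {unit_normal_density(0.0):.4f}, f(3) = {unit_normal_density(3.0):.5f}")
```

For small v the middle of the curve is lower and the tail is higher than for the unit normal; both differences shrink as v increases.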

F-Distribution

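The F-distribution depends on two separate degrees of freedom m and n, say. Its curve is defined by

$$ f(F) = \frac{\Gamma\{(m + n)/2\}}{\Gamma(m/2)\,\Gamma(n/2)} \left( \frac{m}{n} \right)^{m/2} \frac{F^{m/2 - 1}}{(1 + mF/n)^{(m + n)/2}}, \qquad F \ge 0. \tag{0.1.4} $$
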

The distribution rises from zero, sometimes quite steeply for certain m and n, and reaches a peak, falling off very skewed to the right. See Figure 0.3. Percentage points for the upper tail levels of 10%, 5%, and 1% are in the Tables section.

 

Figure 0.3. Some selected f(m, n) distributions.

 

The F-distribution is usually introduced in the context of testing to see whether two variances are equal, that is, the null hypothesis that $H_0: \sigma_1^2/\sigma_2^2 = 1$, versus the alternative hypothesis that $H_1: \sigma_1^2/\sigma_2^2 \ne 1$. The test uses the statistic $F = s_1^2/s_2^2$, $s_1^2$ and $s_2^2$ being statistically independent estimates of $\sigma_1^2$ and $\sigma_2^2$, with $v_1$ and $v_2$ degrees of freedom (df), respectively, and depends on the fact that, if the two samples that give rise to $s_1^2$ and $s_2^2$ are independent and normal, then $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ follows the $F(v_1, v_2)$ distribution. Thus if $\sigma_1^2 = \sigma_2^2$, $F = s_1^2/s_2^2$ follows $F(v_1, v_2)$. When given in basic statistics courses, this is usually described as a two-tailed test, which it usually is. In regression applications, it is typically a one-tailed, upper-tailed test. This is because regression tests always involve putting the “$s^2$ that could be too big, but cannot be too small” at the top and the “$s^2$ that we think estimates the true $\sigma^2$ well” at the bottom of the F-statistic. In other words, we are in the situation where we test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 > \sigma_2^2$.
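A one-tailed variance-ratio test of this kind can be sketched as follows. The two samples are invented for illustration, the density is the usual F(m, n) form, and the upper-tail area is obtained by a simple Simpson's-rule integration rather than from the tables; all function names are assumptions.

```python
import math
from statistics import variance

def f_density(x, m, n):
    """Density of F(m, n) in its usual form; evaluation at x = 0 assumes m >= 2."""
    if x < 0:
        return 0.0
    log_const = (math.lgamma((m + n) / 2.0) - math.lgamma(m / 2.0)
                 - math.lgamma(n / 2.0))
    return (math.exp(log_const) * (m / n) ** (m / 2.0)
            * x ** (m / 2.0 - 1.0) * (1.0 + m * x / n) ** (-(m + n) / 2.0))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def f_upper_tail(f_obs, m, n):
    """Area under the F(m, n) curve to the right of f_obs."""
    return 1.0 - simpson(lambda x: f_density(x, m, n), 0.0, f_obs)

# Hypothetical samples: the first is the one whose variance "could be too big".
sample_1 = [10.2, 13.9, 8.1, 15.4, 11.0]
sample_2 = [11.8, 12.4, 11.1, 12.9, 12.0, 11.5]

s1_sq, s2_sq = variance(sample_1), variance(sample_2)  # sample variances
v1, v2 = len(sample_1) - 1, len(sample_2) - 1          # degrees of freedom
f_ratio = s1_sq / s2_sq
tail = f_upper_tail(f_ratio, v1, v2)

print(f"F = {f_ratio:.2f} on ({v1}, {v2}) df, upper-tail area = {tail:.4f}")
```

Only the upper tail is used, which matches the one-tailed test described above: a large ratio is evidence that the numerator variance exceeds the denominator variance.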

0.2. CONFIDENCE INTERVALS (OR BANDS) AND t-TESTS


Let θ be a parameter (or “thing”) that we want to estimate. Let $\hat{\theta}$ be an estimate of θ (“estimate of thing”). Typically, $\hat{\theta}$ will follow a normal distribution, either exactly because of the normality of the observations in $\hat{\theta}$, or approximately due to the effect of the Central Limit Theorem. Let $\sigma_{\hat{\theta}}$ be the standard deviation of $\hat{\theta}$ and let $\mathrm{se}(\hat{\theta})$ be the standard error, that is, the estimated standard deviation, of $\hat{\theta}$ (“standard error of thing”), based on v degrees of freedom. Typically we get $\mathrm{se}(\hat{\theta})$ by substituting an estimate (based on v degrees of freedom) of an unknown standard deviation into the formula for $\sigma_{\hat{\theta}}$.

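1. A 100(1 – α)% confidence interval (CI) for the parameter θ is given by

$$ \hat{\theta} \pm t_v(1 - \alpha/2)\,\mathrm{se}(\hat{\theta}), \tag{0.2.1} $$

where $t_v(1 - \alpha/2)$ is the percentage point of a t-variable with v degrees of freedom (df) that leaves a probability α/2 in the upper tail, and so 1 – α/2 in the lower tail. A two-tailed table where these percentage points are listed under the heading of 2(α/2) = α is given in the Tables section. Equation (0.2.1) in words is

$$ \{\text{estimate of thing}\} \pm \{t \text{ percentage point leaving } \alpha/2 \text{ in the upper tail, } v \text{ df}\} \times \{\text{standard error of estimate of thing}\}. \tag{0.2.2} $$
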

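2. To test θ = θ0, where θ0 is some specified value of θ that is presumed to be valid (often θ0 = 0 in tests of regression coefficients), we evaluate the statistic

$$ t = \frac{\hat{\theta} - \theta_0}{\mathrm{se}(\hat{\theta})}, \tag{0.2.3} $$

or, in words,

$$ t = \frac{\{\text{estimate of thing}\} - \{\text{postulated value of thing}\}}{\{\text{standard error of estimate of thing}\}}. \tag{0.2.4} $$
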

 

Figure 0.4. Two cases for a t-test. (a) The observed t is positive (black dot) and the upper tail area is δ. A two-tailed test considers that this value could just as well have been negative (open “phantom” dot) and quotes “a two-tailed t-probability of 2δ.” (b) The observed t is negative; similar argument, with tails reversed.

 

This “observed value of t” (our “dot”) is then placed on a diagram of the t(v) distribution. [Recall that v is the number of degrees of freedom on which $\mathrm{se}(\hat{\theta})$ is based, that is, the number of df in the estimate of σ² that was used.] The tail probability beyond the dot is evaluated and doubled for a two-tail test. See Figure 0.4 for the probability 2δ. It is conventional to ask if the 2δ value is “significant” or not by concluding that, if 2δ < 0.05, t is significant and the idea (or hypothesis) that θ = θ0 is unlikely and so “rejected,” whereas if 2δ > 0.05, t is nonsignificant and we “do not reject” the hypothesis θ = θ0. The alternative hypothesis here is θ ≠ θ0, a two-sided alternative. Note that the value 0.05 is not handed down in holy writings, although we sometimes talk as though it is. Using an “alpha level” of α = 0.05 simply means we are prepared to risk a 1 in 20 chance of making the wrong decision. If we wish to go to α = 0.10 (1 in 10) or α = 0.01 (1 in 100), that is up to us. Whatever we decide, we should remain consistent about this level throughout our testing.
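The confidence interval and the two-tailed t-test described above can be sketched together. The estimate, its standard error, and the degrees of freedom below are hypothetical; the percentage point 2.262 is the usual tabled 97.5% point of t with 9 df; and the tail probability is found by Simpson's-rule integration of the standard t density, which is an illustrative shortcut in place of reading the tables.

```python
import math

def t_density(t, v):
    """Density of the t-distribution with v degrees of freedom (standard form)."""
    log_const = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                 - 0.5 * math.log(math.pi * v))
    return math.exp(log_const - (v + 1) / 2.0 * math.log(1.0 + t * t / v))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def two_tailed_probability(t_obs, v):
    """2 * delta, where delta is the tail area beyond |t_obs| under t(v)."""
    central_half = simpson(lambda u: t_density(u, v), 0.0, abs(t_obs))
    delta = 0.5 - central_half
    return 2.0 * delta

# Hypothetical inputs, for illustration only.
theta_hat = 2.5     # estimate of thing
se_theta = 0.8      # standard error of estimate of thing
v = 9               # degrees of freedom behind the standard error
t_point = 2.262     # tabled t_9(0.975), i.e. alpha = 0.05

# 95% confidence interval: estimate +/- t percentage point * standard error.
lower = theta_hat - t_point * se_theta
upper = theta_hat + t_point * se_theta

# Test of theta = theta_0 against the two-sided alternative.
theta_0 = 0.0
t_obs = (theta_hat - theta_0) / se_theta
two_delta = two_tailed_probability(t_obs, v)

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"observed t = {t_obs:.3f}, two-tailed probability = {two_delta:.4f}")
print("reject theta = theta_0" if two_delta < 0.05 else "do not reject theta = theta_0")
```

The two procedures agree by construction: θ0 lies outside the 95% interval exactly when the two-tailed probability falls below 0.05.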

However, it is pointless to agonize too much about α. A journal editor who will publish a paper describing an experiment if 2δ = 0.049, but will not publish it if 2δ = 0.051 is placing a purely arbitrary standard on...

Publication date (per publisher): 25 August 2014
Series: Wiley Series in Probability and Statistics
Language: English
Subject areas: Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Technology
Keywords: accessible • Analysis • Applications • BASIC • clear • Concepts • Edition • Elementary • fundamentals • Introduction • Knowledge • Methods • Models • outstanding • presentation • Regression • Regression Analysis • Regression (Math.) • Statistics • thorough
ISBN-10: 1-118-62568-4 / 1118625684
ISBN-13: 978-1-118-62568-2 / 9781118625682