
Statistics with JMP: Hypothesis Tests, ANOVA and Regression (eBook)

eBook Download: EPUB
2016
John Wiley & Sons (publisher)
978-1-119-09716-7 (ISBN)

€66.99 incl. VAT
(CHF 65.45)
eBook sales are handled by Lehmanns Media GmbH (Berlin) at the price in euros, incl. VAT.
  • Download available immediately

Statistics with JMP: Hypothesis Tests, ANOVA and Regression

 Peter Goos, University of Leuven and University of Antwerp, Belgium

 David Meintrup, University of Applied Sciences Ingolstadt, Germany

 A first course on basic statistical methodology using JMP

This book provides a first course on parameter estimation (point estimates and confidence interval estimates), hypothesis testing, ANOVA, and simple linear regression. The authors' approach combines mathematical depth with numerous examples and demonstrations using the JMP software.

Key features:

  • Provides a comprehensive and rigorous presentation of introductory statistics that has been extensively classroom tested.
  • Pays attention to the usual parametric hypothesis tests as well as to non-parametric tests (including the calculation of exact p-values).
  • Discusses the power of various statistical tests, along with examples in JMP to enable insight into this difficult topic.
  • Promotes the use of graphs and confidence intervals in addition to p-values.
  • Course materials and tutorials for teaching are available on the book's companion website.

Master's and advanced students in applied statistics, industrial engineering, business engineering, civil engineering, and bioscience engineering will find this book beneficial. It also provides a useful resource for teachers of statistics, particularly in the area of engineering.

Peter Goos, Department of Mathematics, Statistics and Actuarial Sciences, Faculty of Applied Economics, University of Antwerp, Belgium. David Meintrup, University of Applied Sciences Ingolstadt, Germany.

Dedication iii

Preface xiii

Acknowledgements xvii

Part One Estimators and tests 1

1 Estimating population parameters 3

2 Interval estimators 37

3 Hypothesis tests 71

Part Two One population 103

4 Hypothesis tests for a population mean, proportion or variance 105

5 Two hypothesis tests for the median of a population 149

6 Hypothesis tests for the distribution of a population 175

Part Three Two populations

7 Independent versus paired samples 213

8 Hypothesis tests for means, proportions and variances of two independent samples 219

9 A nonparametric hypothesis test for the medians of two independent samples 263

10 Hypothesis tests for the population mean of two paired samples 285

11 Two nonparametric hypothesis tests for paired samples 305

Part Four More than two populations 325

12 Hypothesis tests for more than two population means: one-way analysis of variance 327

13 Nonparametric alternatives to an analysis of variance 375

14 Hypothesis tests for more than two population variances 401

Part Five More useful tests and procedures 417

15 Design of experiments and data collection 419

16 Testing equivalence 427

17 Estimation and testing of correlation and association 445

18 An introduction to regression modeling 481

19 Simple linear regression 493

A Binomial distribution 589

B Standard normal distribution 593

C χ²-distribution 595

D Student's t-distribution 597

E Wilcoxon signed-rank test 599

F Critical values for the Shapiro-Wilk test 605

G Fisher's F-distribution 607

H Wilcoxon rank-sum test 615

I Studentized range or Q-distribution 625

J Two-sided Dunnett test 629

K One-sided Dunnett test 633

L Kruskal-Wallis test 637

M Rank correlation test 641

Index 643

"Masters and advanced students in applied statistics, industrial engineering, business engineering, civil engineering and bio-science engineering will find this book beneficial. It also provides a useful resource for teachers of statistics particularly in the area of engineering." (Zentralblatt MATH 2016)

1
Estimating Population Parameters


I don’t know how long I stand there. I don’t believe I’ve ever stood there mourning faithfully in a downpour, but statistically speaking it must have been spitting now and then, there must have been a bit of a drizzle once or twice.

(from The Misfortunates, Dimitri Verhulst, pp. 125–126)

A major goal in statistics is to make statements about populations or processes. Often, the interest is in specific parameters of the distributions or densities of the populations or processes under study. For instance, researchers in political science want to make statements about the proportion of a population that votes for a certain political party. Industrial engineers want to make statements about the proportion of defective smartphones produced by a production process. Bioscience engineers are interested in comparing the mean amounts of growth resulting from applying two or more different fertilizers. Economists are interested in income inequality and may want to compare the variance in income across different groups.

To be able to make such statements, the proportions, means, and variances under study need to be quantified. In statistical jargon, we say that these parameters need to be estimated. It is also important to quantify how reliable each of the estimates is, in order to judge the confidence we can have in any statement we make. This chapter discusses the properties of the most important sample statistics that are used to make statements about population and process means, proportions, and variances.

1.1 Introduction: Estimators Versus Estimates


In practice, population parameters such as μ, σ², π, and λ (see our book Statistics with JMP: Graphs, Descriptive Statistics and Probability) are rarely known. For example, if we study the arrival times of the customers of a bank, we know that the number of arrivals per unit of time often follows a Poisson distribution. However, we do not know the exact value of the distribution's parameter λ. One way or another, we therefore need to estimate this parameter. This estimate will be based on a number of measurements or observations, x1, x2, …, xn, that we perform in the bank; in other words, on the sample data we collect.

The estimate of the unknown λ will be a function of the sample values x1, x2, …, xn; for example, the sample mean x̄ = (x1 + x2 + ⋯ + xn)/n. Every researcher who faces the same problem of studying the arrival pattern of customers will obtain different sample values, and thus a different sample mean and a different estimate. The reason for this is that the number of arrivals in the bank in a given time interval is a random variable. We can express this explicitly by using uppercase letters X1, X2, …, Xn for the sample observations. The fact that each researcher obtains a different estimate for λ can also be made explicit by using a capital letter to denote the sample mean: X̄ = (X1 + X2 + ⋯ + Xn)/n. The sample mean is then interpreted as a random variable, and it is called an estimator instead of an estimate. In short, an estimate is always a real number, while an estimator is a random variable whose value is not yet known.
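
To make the distinction concrete, here is a minimal Python/NumPy sketch (an illustration, not the book's JMP workflow; the value λ = 4, the sample size, and the seed are chosen arbitrarily). Two "researchers" apply the same estimator, the sample mean, to their own samples and obtain different estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

lam = 4   # true, "unknown" Poisson parameter (arbitrary value for illustration)
n = 10    # number of observations per researcher

# Each researcher collects their own sample of arrival counts.
sample_a = rng.poisson(lam, size=n)
sample_b = rng.poisson(lam, size=n)

# The estimator is the same function of the observations (the sample mean),
# but the resulting estimates are different real numbers.
print("Researcher A:", sample_a, "-> estimate:", sample_a.mean())
print("Researcher B:", sample_b, "-> estimate:", sample_b.mean())
```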

The sample mean is, of course, only one of many possible functions of the sample observations X1, X2, …, Xn, and thus only one of many possible estimators. Obviously, a researcher is not interested in an arbitrary function of the sample observations, but in getting a good idea of the unknown parameter. In other words, the researcher wishes to obtain an estimate that, on average, is equal to the unknown parameter and that, ideally, is guaranteed to be close to the unknown parameter. Statisticians translate these requirements into "the estimator should be unbiased" and "the estimator should have a small variance". These requirements will be clarified in the next section.

1.2 Estimating a Mean Value


The requirements for a good estimator can best be illustrated by means of two simulation studies. The first study simulates data from a normally distributed population, while the second one simulates data from an exponentially distributed population.

1.2.1 The Mean of a Normally Distributed Population


We first assume that a normally distributed population with mean μ = 3000 and standard deviation σ = 100 is studied by 1000 (fictitious) students. The students do not know the value of μ and wish to estimate it. To this end, each of these students performs five measurements. A first option for estimating the unknown value μ is to calculate the sample mean. In this way, we obtain 1000 sample means, shown in the histogram in Figure 1.1, at the top left. The mean of these 1000 sample means is 2998.33, while the standard deviation is 43.38.

Figure 1.1 Histograms and descriptive statistics for 1000 sample means and medians calculated based on samples of five observations from a normally distributed population with mean 3000 and standard deviation 100.

Another possibility for estimating the unknown μ is to calculate the median. For a normally distributed population, both the median and the expected value are equal to the parameter μ, so this is a sensible approach. Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. The resulting histogram is shown in Figure 1.1, at the top right. The attentive reader will notice immediately that the second histogram is a bit wider than the first. Among other things, this is reflected by the fact that the standard deviation of the 1000 medians is 53.43. The mean of the 1000 medians is equal to 2999.08. In Figure 1.1, it can also be seen that the minimum (2841.78) and the first quartile (2962.22) of the sample medians are smaller than the minimum (2867.56) and the first quartile (2969.25) of the sample means. Also, the maximum (3161.64) and the third quartile (3033.51) of the sample medians are greater than the maximum (3140.35) and the third quartile (3027.80) of the sample means. This suggests that the sample medians are, in general, further away from the population mean μ = 3000 than the sample means.

It is striking that both the mean of the 1000 sample means (2998.33) and that of the 1000 medians (2999.08) are very close to 3000. If the number of samples is raised significantly (theoretically, an infinite number of samples could be taken), the mean of the sample means and that of the sample medians will converge to the unknown μ = 3000. Therefore, both the sample mean and the sample median are called unbiased estimators of the mean of a normally distributed population.

The fact that the range, the interquartile range, the standard deviation, and the variance of the 1000 sample means are smaller than those of the 1000 sample medians means that the sample mean is a more reliable estimator of the unknown population mean than the sample median. The larger variance of the medians indicates that the medians are generally further away from μ = 3000 than the sample means. In short, a researcher should have more confidence in the sample mean because it is usually closer to the unknown μ. In such a case, we say that one estimator (here, the sample mean) is more efficient or precise than the other (here, the median).
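
The simulation study of this section can be reproduced with a few lines of Python/NumPy (a sketch, not the JMP steps used in the book; the exact numbers will differ from those in Figure 1.1 because they depend on the random draws and the seed):

```python
import numpy as np

rng = np.random.default_rng(seed=2016)

n_students, n_obs = 1000, 5
mu, sigma = 3000, 100

# Each row holds one student's sample of five observations.
samples = rng.normal(mu, sigma, size=(n_students, n_obs))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Both estimators are centred near mu = 3000 (both are unbiased here),
# but the medians spread out more than the means: the sample mean is
# the more efficient estimator.
print("mean of sample means  :", means.mean(),   "std:", means.std(ddof=1))
print("mean of sample medians:", medians.mean(), "std:", medians.std(ddof=1))
```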

1.2.2 The Mean of an Exponentially Distributed Population


We now investigate an exponentially distributed population with parameter λ = 1/100. The “unknown” population mean is therefore μ = 1/λ = 100 (see Statistics with JMP: Graphs, Descriptive Statistics and Probability). Each of the 1000 fictitious students performs five measurements. A first option to estimate the unknown value μ is again to calculate the sample mean. A histogram of the 1000 sample means is shown in Figure 1.2, at the top left. The mean of these 1000 sample means is 99.2417, while the standard deviation is 44.10.

Figure 1.2 Histograms and descriptive statistics for 1000 sample means and sample medians calculated based on samples of five observations from an exponentially distributed population with parameter λ = 1/100.

Based on the samples that the students have gathered, the 1000 medians can also be calculated and displayed in a histogram. This histogram is shown in Figure 1.2, at the top right. The mean of the 1000 medians is only 77.0114.

These calculations indicate that the population mean μ = 1/λ = 100 can be approximated fairly well by using the sample means, whose mean is 99.2417. This is not the case for the medians, whose mean value is far away from μ. This remains the case if the number of samples is increased. In this example, for an exponentially distributed population, the median is therefore not an unbiased estimator of the population mean: it is biased.

In addition, Figure 1.2 also shows that the standard deviation of the sample medians (46.13) is greater than that of the sample means (44.10).
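
A corresponding sketch for the exponential case (again illustrative Python/NumPy code with an arbitrary seed, so the numbers will not match Figure 1.2 exactly) makes the bias of the sample median visible: its average stays well below μ = 100, no matter how many samples are drawn.

```python
import numpy as np

rng = np.random.default_rng(seed=2016)

n_students, n_obs = 1000, 5
lam = 1 / 100                      # population mean mu = 1 / lam = 100

samples = rng.exponential(scale=1 / lam, size=(n_students, n_obs))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print("mean of sample means  :", means.mean())    # close to 100 (unbiased)
print("mean of sample medians:", medians.mean())  # clearly below 100 (biased)
```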

1.3 Criteria for Estimators


Key properties of estimators are their expected values and their variances. These statistics are related to the concepts of bias and efficiency, respectively.
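
In symbols, these two criteria are usually formalized as follows (standard definitions, stated here for reference; the excerpt itself introduces them informally):

```latex
% Bias of an estimator \hat{\theta} of a parameter \theta:
\operatorname{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta,
\qquad
\hat{\theta}\ \text{is unbiased} \iff E(\hat{\theta}) = \theta.

% Of two unbiased estimators, the one with the smaller variance is more efficient:
\operatorname{var}(\hat{\theta}_1) < \operatorname{var}(\hat{\theta}_2)
\;\Longrightarrow\;
\hat{\theta}_1\ \text{is more efficient than}\ \hat{\theta}_2.
```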

1.3.1 Unbiased Estimators


An ideal estimator that always produces the exact value of an...

Publication date (per publisher): 16 February 2016
Language: English
Subject areas: Mathematics / Computer Science > Mathematics > Computer programs / Computer algebra
Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Technology
Keywords: ANOVA • Data Analysis • Engineering • Engineering statistics • Hypothesis tests • JMP • Statistics methodology • non-parametric tests • Parameter Estimation • parametric hypothesis tests • point estimates and confidence interval estimates • p-values • Regression • Simple Linear Regression • Statistical Software / SAS • Statistics • Statistics in engineering
ISBN-10 1-119-09716-9 / 1119097169
ISBN-13 978-1-119-09716-7 / 9781119097167
EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. During download, the eBook is authorized to your personal Adobe ID. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The text reflows dynamically to fit the display and font size, which also makes EPUB a good choice for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers; it is not, however, compatible with the Amazon Kindle.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.
