Statistics (eBook)
John Wiley & Sons (publisher)
978-1-118-94110-2 (ISBN)
'...I know of no better book of its kind...' (Journal of the Royal Statistical Society, Vol 169 (1), January 2006)
A revised and updated edition of this bestselling introductory textbook on statistical analysis using the leading free software package R.
This new edition of a bestselling title offers a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a wide range of disciplines. Step-by-step instructions help the non-statistician to fully understand the methodology. The book covers the full range of statistical techniques likely to be needed to analyse the data from research projects, including elementary material like t-tests and chi-squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling.
Includes numerous worked examples and exercises within each chapter.
Michael J. Crawley, FRS, Department of Biological Sciences, Imperial College of Science, Technology and Medicine. Author of three bestselling Wiley statistics titles and five life science books.
"e;...I know of no better book of its kind..."e; (Journal of the Royal Statistical Society, Vol 169 (1), January 2006) A revised and updated edition of this bestselling introductory textbook to statistical analysis using the leading free software package R This new edition of a bestselling title offers a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a wide range of disciplines. Step-by-step instructions help the non-statistician to fully understand the methodology. The book covers the full range of statistical techniques likely to be needed to analyse the data from research projects, including elementary material like t--tests and chi--squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling. Includes numerous worked examples and exercises within each chapter.
Michael J. Crawley, FRS, Department of Biological Sciences, Imperial College of Science, Technology and Medicine. Author of three bestselling Wiley statistics titles and five life science books.
1 Fundamentals
The hardest part of any statistical work is getting started. And one of the hardest things about getting started is choosing the right kind of statistical analysis. The choice depends on the nature of your data and on the particular question you are trying to answer. The truth is that there is no substitute for experience: the way to know what to do is to have done it properly lots of times before.
The key is to understand what kind of response variable you have got, and to know the nature of your explanatory variables. The response variable is the thing you are working on: it is the variable whose variation you are attempting to understand. This is the variable that goes on the y axis of the graph (the ordinate). The explanatory variable goes on the x axis of the graph (the abscissa); you are interested in the extent to which variation in the response variable is associated with variation in the explanatory variable. A continuous measurement is a variable like height or weight that can take any real numbered value. A categorical variable is a factor with two or more levels: sex is a factor with two levels (male and female), and rainbow might be a factor with seven levels (red, orange, yellow, green, blue, indigo, violet).
It is essential, therefore, that you know:
- which of your variables is the response variable?
- which are the explanatory variables?
- are the explanatory variables continuous or categorical, or a mixture of both?
- what kind of response variable have you got – is it a continuous measurement, a count, a proportion, a time-at-death, or a category?
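As a concrete (and entirely hypothetical) illustration of these questions, the sketch below sets up one continuous variable and one categorical variable in R and checks their types; the variable names are invented for the example.

```r
# Hypothetical data: a continuous response and a categorical explanatory variable.
height <- c(172, 181, 165, 178, 169, 174)                 # continuous measurement (cm)
sex    <- factor(c("male", "male", "female",
                   "male", "female", "female"))           # factor with two levels
is.numeric(height)        # TRUE: a continuous measurement
is.factor(sex)            # TRUE: a categorical variable
levels(sex)               # "female" "male"
plot(sex, height)         # response on the y axis (ordinate), explanatory on the x axis (abscissa)
```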
These simple keys will then lead you to the appropriate statistical method:
- The explanatory variables (pick one of the rows):
(a) All explanatory variables continuous: Regression
(b) All explanatory variables categorical: Analysis of variance (ANOVA)
(c) Some explanatory variables continuous, some categorical: Analysis of covariance (ANCOVA)
- The response variable (pick one of the rows):
(a) Continuous: Regression, ANOVA or ANCOVA
(b) Proportion: Logistic regression
(c) Count: Log-linear models
(d) Binary: Binary logistic analysis
(e) Time at death: Survival analysis
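In R, the rows of these two keys correspond broadly to the standard model-fitting functions. The sketch below uses simulated, purely hypothetical data just to show which call goes with which kind of variable; it is not a recipe from the text.

```r
# Hypothetical simulated data, only to illustrate which function matches which key.
set.seed(1)
d <- data.frame(
  y     = rnorm(40),                                 # continuous response
  x     = runif(40),                                 # continuous explanatory variable
  g     = factor(rep(c("low", "high"), each = 20)),  # categorical explanatory variable
  count = rpois(40, lambda = 3),                     # count response
  alive = rbinom(40, size = 1, prob = 0.5)           # binary response
)
m1 <- lm(y ~ x, data = d)                            # regression
m2 <- aov(y ~ g, data = d)                           # analysis of variance (ANOVA)
m3 <- lm(y ~ x + g, data = d)                        # analysis of covariance (ANCOVA)
m4 <- glm(count ~ x, family = poisson, data = d)     # count data: log-linear model
m5 <- glm(alive ~ x, family = binomial, data = d)    # binary data: logistic regression
# Proportion data would use glm(cbind(successes, failures) ~ x, family = binomial),
# and time-at-death data the survival package, e.g. survreg(Surv(time, status) ~ x).
```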
There is a small core of key ideas that need to be understood from the outset. We cover these here before getting into any detail about different kinds of statistical model.
Everything Varies
If you measure the same thing twice you will get two different answers. If you measure the same thing on different occasions you will get different answers because the thing will have aged. If you measure different individuals, they will differ for both genetic and environmental reasons (nature and nurture). Heterogeneity is universal: spatial heterogeneity means that places always differ, and temporal heterogeneity means that times always differ.
Because everything varies, finding that things vary is simply not interesting. We need a way of discriminating between variation that is scientifically interesting, and variation that just reflects background heterogeneity. That is why you need statistics. It is what this whole book is about.
The key concept is the amount of variation that we would expect to occur by chance alone, when nothing scientifically interesting was going on. If we measure bigger differences than we would expect by chance, we say that the result is statistically significant. If we measure no more variation than we might reasonably expect to occur by chance alone, then we say that our result is not statistically significant. It is important to understand that this is not to say that the result is not important. Non-significant differences in human life span between two drug treatments may be massively important (especially if you are the patient involved). Non-significant is not the same as ‘not different’. The lack of significance may be due simply to the fact that our replication is too low.
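To make the point about low replication concrete, here is a minimal simulation (not taken from the book): the two treatment groups are drawn from populations whose means genuinely differ, yet with only four replicates per group a t-test will typically fail to detect the difference.

```r
# Hypothetical example: a real difference that low replication cannot detect.
set.seed(42)
group.A <- rnorm(4, mean = 70, sd = 10)   # true population mean 70
group.B <- rnorm(4, mean = 75, sd = 10)   # true population mean 75
t.test(group.A, group.B)$p.value          # with n = 4 per group this is usually well above 0.05
```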
On the other hand, when nothing really is going on, then we want to know this. It makes life much simpler if we can be reasonably sure that there is no relationship between y and x. Some students think that ‘the only good result is a significant result’. They feel that their study has somehow failed if it shows that ‘A has no significant effect on B’. This is an understandable failing of human nature, but it is not good science. The point is that we want to know the truth, one way or the other. We should try not to care too much about the way things turn out. This is not an amoral stance; it just happens to be the way that science works best. Of course, it is hopelessly idealistic to pretend that this is the way that scientists really behave. Scientists often want passionately for a particular experimental result to turn out to be statistically significant, so that they can get a Nature paper and get promoted. But that does not make it right.
Significance
What do we mean when we say that a result is significant? The normal dictionary definitions of significant are ‘having or conveying a meaning’ or ‘expressive; suggesting or implying deeper or unstated meaning’. But in statistics we mean something very specific indeed. We mean that ‘a result was unlikely to have occurred by chance’. In particular, we mean ‘unlikely to have occurred by chance if the null hypothesis was true’. So there are two elements to it: we need to be clear about what we mean by ‘unlikely’, and also what exactly we mean by the ‘null hypothesis’. Statisticians have an agreed convention about what constitutes ‘unlikely’. They say that an event is unlikely if it occurs less than 5% of the time. In general, the null hypothesis says that ‘nothing is happening’ and the alternative says that ‘something is happening’.
Good and Bad Hypotheses
Karl Popper was the first to point out that a good hypothesis was one that was capable of rejection. He argued that a good hypothesis is a falsifiable hypothesis. Consider the following two assertions:
- (A) there are vultures in the local park
- (B) there are no vultures in the local park
Both involve the same essential idea, but one is refutable and the other is not. Ask yourself how you would refute option A. You go out into the park and you look for vultures. But you do not see any. Of course, this does not mean that there are none. They could have seen you coming, and hidden behind you. No matter how long or how hard you look, you cannot refute the hypothesis. All you can say is ‘I went out and I didn't see any vultures’. One of the most important scientific notions is that absence of evidence is not evidence of absence.
Option B is fundamentally different. You reject hypothesis B the first time you see a vulture in the park. Until the time that you do see your first vulture in the park, you work on the assumption that the hypothesis is true. But if you see a vulture, the hypothesis is clearly false, so you reject it.
Null Hypotheses
The null hypothesis says ‘nothing is happening’. For instance, when we are comparing two sample means, the null hypothesis is that the means of the two populations are the same. Of course, the two sample means are not identical, because everything varies. Again, when working with a graph of y against x in a regression study, the null hypothesis is that the slope of the relationship is zero (i.e. y is not a function of x, or y is independent of x). The essential point is that the null hypothesis is falsifiable. We reject the null hypothesis when our data show that the null hypothesis is sufficiently unlikely.
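Both of these null hypotheses can be written down directly in R. The sketch below uses invented data; t.test tests whether the two population means are equal, and the slope row of summary(lm(...)) tests whether the slope is zero.

```r
# Hypothetical data illustrating the two null hypotheses described above.
set.seed(2)
a <- rnorm(10, mean = 5)
b <- rnorm(10, mean = 5)
t.test(a, b)                 # H0: the two population means are the same

x <- runif(20)
y <- rnorm(20)               # y constructed to be independent of x
summary(lm(y ~ x))           # the 'x' row tests H0: slope = 0 (y not a function of x)
```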
p Values
Here we encounter a much-misunderstood topic. The p value is not the probability that the null hypothesis is true, although you will often hear people saying this. In fact, p values are calculated on the assumption that the null hypothesis is true. It is correct to say that p values have to do with the plausibility of the null hypothesis, but in a rather subtle way.
As you will see later, we typically base our hypothesis testing on what are known as test statistics: you may have heard of some of these already (Student's t, Fisher's F and Pearson's chi-squared, for instance): p values are about the size of the test statistic. In particular, a p value is an estimate of the probability that a value of the test statistic, or a value more extreme than this, could have occurred by chance when the null hypothesis is true. Big values of the test statistic indicate that the null hypothesis is unlikely to be true. For sufficiently large values of the test statistic, we reject the null hypothesis and accept the alternative hypothesis.
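As an illustration of the ‘at least as extreme’ idea (with an invented test statistic, not an example from the book), the two-tailed p value for a Student's t statistic can be computed directly from the t distribution:

```r
# Suppose, hypothetically, a test produced t = 2.3 on 18 degrees of freedom.
t.value <- 2.3
df      <- 18
p <- 2 * pt(-abs(t.value), df)   # probability of a value at least this extreme under the null
p                                # roughly 0.03, which is less than the conventional 0.05
```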
Note also that saying ‘we do not reject the null hypothesis’ and ‘the null hypothesis is true’ are two quite different things. For...
| Published | 23.9.2014 |
|---|---|
| Language | English |