
Understanding Statistical Error (eBook)

A Primer for Biologists
eBook Download: EPUB
2015
John Wiley & Sons (publisher)
978-1-119-10689-0 (ISBN)


Understanding Statistical Error - Marek Gierlinski
41.99 incl. VAT
(CHF 40.95)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Available for immediate download

This accessible introductory textbook provides a straightforward, practical explanation of how statistical analysis and error measurements should be applied in biological research.

Understanding Statistical Error - A Primer for Biologists:

  • Introduces the essential topic of error analysis to biologists
  • Contains mathematics at a level that all biologists can grasp
  • Presents the formulas required to calculate each confidence interval for use in practice
  • Is based on a successful series of lectures from the author’s established course

Assuming no prior knowledge of statistics, this book covers the central topics needed for efficient data analysis, ranging from probability distributions, statistical estimators, confidence intervals, error propagation and uncertainties in linear regression, to advice on how to use error bars in graphs properly. Using simple mathematics, all these topics are carefully explained and illustrated with figures and worked examples. The emphasis throughout is on visual representation and on helping the reader to approach the analysis of experimental data with confidence.

This useful guide explains how to evaluate uncertainties of key parameters, such as the mean, median, proportion and correlation coefficient. Crucially, the reader will also learn why confidence intervals are important and how they compare against other measures of uncertainty.

Understanding Statistical Error - A Primer for Biologists can be used both by students and researchers to deepen their knowledge and find practical formulae to carry out error analysis calculations. It is a valuable guide for students, experimental biologists and professional researchers in biology, biostatistics, computational biology, cell and molecular biology, ecology, biological chemistry, drug discovery, biophysics, as well as wider subjects within life sciences and any field where error analysis is required.

Dr Marek Gierlinski is a bioinformatician at College of Life Science, University of Dundee, UK. He attained his PhD in astrophysics and studied X-ray emission from black holes and neutron stars for many years. In 2009 he started a new career in bioinformatics, bringing his knowledge and skills in statistics and data analysis to a biological institute. He works on a variety of topics, including proteomics, DNA and RNA sequencing, imaging and numerical modelling.

"This volume highlights and promotes these high standards and practices, and should serve as an important starting point for biologists, data scientists, or anyone interested in effectively assessing and presenting uncertainty in data" Marc J. Lajeunesse, Integrative Biology, University of South Florida, Tampa, Florida on behalf of The Quarterly Review of Biology, Sept 17

Chapter 2
Probability distributions


Misunderstanding of probability may be the greatest of all impediments to scientific literacy.

—Stephen Jay Gould

Consider an experiment in which we determine the number of viable bacteria in a sample. To do this, we can use a simple technique of dilution plating. The sample is diluted in five consecutive steps, and each time the concentration is reduced 10-fold. After the final step, we achieve the dilution of 10⁻⁵. The diluted sample is then spread on a Petri dish and cultured in conditions appropriate for the bacteria. Each colony on the plate corresponds to one bacterium in the diluted sample. From this, we can estimate the number of bacteria in the original, undiluted sample.

Now, think of exactly the same experiment, repeated six times under the same conditions. Let us assume that in these six replicates, we found the following numbers of bacterial colonies: 5, 3, 3, 7, 3 and 9. What can we say about these results?

We notice that replicated experiments give different results. This is an obvious thing for an experimental biologist, but can we express it in more strict, mathematical terms? Well, we can interpret these counts as realizations of a random variable. But not just any completely random variable. This variable would follow a certain law, a Poisson law in this case. We can estimate and theoretically predict its probability distribution. We can use this knowledge to predict future results from similar experiments. We can also estimate the uncertainty, or error, of each result.
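As a rough illustration of this idea, the six counts can be summarized and compared with a Poisson law in a few lines of Python. This is only a sketch under stated assumptions: the sample mean is used as the estimate of the Poisson mean, the plated volume is ignored when scaling back by the dilution, and the names `poisson_pmf`, `mean_count` and `estimated_original` are my own.

```python
import math

# Colony counts from the six replicates described in the text.
counts = [5, 3, 3, 7, 3, 9]

# Assumption: the sample mean is used as the estimate of the Poisson mean.
mean_count = sum(counts) / len(counts)

# Each colony corresponds to one bacterium in the 10^-5 dilution, so dividing
# by the dilution scales the estimate back to the undiluted sample.
dilution = 1e-5
estimated_original = mean_count / dilution


def poisson_pmf(k, mu):
    """Probability of exactly k counts for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


print(f"mean count: {mean_count}")
print(f"estimated bacteria in undiluted sample: {estimated_original:.3g}")
for k in range(11):
    print(k, round(poisson_pmf(k, mean_count), 3))
```

The printed table gives the predicted probability of each colony count in a future replicate, if the counts do follow a Poisson law with this mean.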

Firstly, I'm going to introduce the concept of a random variable and a probability distribution. These two are very closely related. Later in this chapter, I will show examples of a few important probability distributions, without which it would be difficult to understand error analysis.

2.1 Random variables


I will not go into gory technical details. A random variable is a mathematical concept, and it has a formal definition. For the purpose of this book, let us say that a random variable can take random values. It sounds a bit tautological, but this is probably the simplest possible definition. In practice, a random variable is a result of an experiment. Its randomness manifests itself in the differing values of repeated measurements of the same quantity. It is quite common that each time you make your measurement, you obtain a different number.

A random variable is a numerical outcome of an experiment. It will vary from trial to trial as the experiment is repeated.

Consider this example. Let us throw two dice and calculate the sum of the numbers shown. This can be any number between 2 and 12. More importantly, some results are more likely than others. For example, there is only one way of getting a 12 (a double 6), but there are five different combinations resulting in the sum of 6 (1+5, 2+4, 3+3, 4+2 and 5+1). It is easy to see that throwing a 6 is five times more likely than throwing a 12.
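The counting argument can be checked by listing all 36 outcomes. The short sketch below assumes two fair six-sided dice, so that every ordered pair is equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely ordered outcomes give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert the counts into exact probabilities.
prob = {total: Fraction(n, 36) for total, n in ways.items()}

for total in sorted(prob):
    print(total, ways[total], prob[total])
```

The row for 6 lists five combinations and the row for 12 lists one, which is the five-to-one ratio quoted above.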

An example of a non-random variable could be the number of mice used in an experiment. If you have five mice, you have five mice and the result stays unless you drink too much whisky and begin to see little white mice everywhere.

Hold on. In Chapter 1, I showed an example of a repeated measurement that gave a different value each time. So, what is going to happen if you repeat your murine experiment many times? Well, if you come back to the cage after a minute, you are quite likely to find five mice again (unless you forgot to lock the cage). The result is not going to change regardless of how many times you count them. This type of repeated measurement is called pseudoreplication.

More about replication and pseudoreplication in Section 5.11.

But this is not what we are asking about. Typically, you would be conducting an experiment (e.g. testing a drug) spanning many days, during which you would record mice dying and surviving. If you were to repeat the entire experiment many times, you might find that 10 days after dosing the mice with a particular drug there are three mice surviving in experiment 1, two mice alive in experiment 2, four in experiment 3 and so on. Although your particular measurement (counting mice) is ‘perfect’ and not biased by any error, the repeated experiments show the actual level of uncertainty. Hence, contrary to simple intuition, the number of mice at any given moment in time is a random variable. Most values in biological experiments are random variables.
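A small simulation can make this variability tangible. The sketch below is purely hypothetical: it assumes five mice per experiment and that each mouse survives to day 10 independently with a probability of 0.6. Neither number comes from the text, and `surviving_mice` is an illustrative helper.

```python
import random


def surviving_mice(n_mice, p_survive, rng):
    """Count survivors when each mouse survives independently with p_survive."""
    return sum(rng.random() < p_survive for _ in range(n_mice))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical values: 5 mice per experiment, 60% chance of surviving to day 10.
results = [surviving_mice(5, 0.6, rng) for _ in range(8)]
print(results)
```

Each entry is the outcome of one repeated experiment. The counting itself is exact, yet the numbers differ from run to run.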

There are two kinds of random variables: discrete and continuous. Discrete random variables can take only certain values, typically whole numbers. The number of mice is a discrete variable, as it can only be 0, 1, 2, 3 and so on. Alternatively, discrete values might be categorical, for example male/female. If necessary, categories can be converted into integer numbers. In contrast, continuous random variables can take any values, typically any real numbers. The length of a mouse's tail is an example of a continuous variable.

2.2 What is a probability distribution?


Every random variable obeys a specific statistical law, called a probability distribution. As the name suggests, this law tells us how the random variable is distributed. Or, to convey it more precisely,

A probability distribution defines the probability of finding the random variable within a certain range of values.

I will use the following notation in this section. A random variable (X) is denoted by a capital letter. This is only a name. Small letters (k, x) denote possible values that the random variable can take. These are actual numbers.

Probability distribution of a discrete variable


Let us consider a discrete random variable X, which can assume non-negative integer values 0, 1, 2, 3,… I will denote P(X = k) as the probability of the variable X being equal to the value k. Mathematically speaking, the probability of finding X between two numbers a and b is determined by the following equation:

$$P(a \le X \le b) = \sum_{k=a}^{b} P(X = k) \tag{2.1}$$

which is, simply, the sum of all individual probabilities. For example, in Figure 2.1a, three shaded bars show probabilities of P(X = 5) = 0.16, P(X = 6) = 0.10 and P(X = 7) = 0.06. The sum of these probabilities is 0.32. Hence, P(5 ≤ X ≤ 7) = 0.32. The total probability over all possible values of X is always unity: P(0 ≤ X ≤ ∞) = 1.
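The sum over a range is easy to try out numerically. The text does not say which distribution is plotted in Figure 2.1a, so the sketch below assumes a Poisson distribution with mean 4, chosen only because it gives individual probabilities close to the three quoted values. The function names are illustrative.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def prob_between(a, b, mu):
    """P(a <= X <= b): the sum of the individual probabilities P(X = k)."""
    return sum(poisson_pmf(k, mu) for k in range(a, b + 1))


mu = 4  # assumed mean, not stated in the text

print(round(prob_between(5, 7, mu), 2))   # probability of 5, 6 or 7
print(round(prob_between(0, 60, mu), 6))  # a wide range approaches the total
```

Under this assumption the first value comes out close to the 0.32 quoted above, and the second approaches unity as the range widens.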

Figure 2.1 Examples of probability distributions. (a) Distribution of a discrete random variable X, where each bar shows the probability of X being equal to k. (b) Continuous distribution, probability of finding X between two values equals the area under the f(x) curve between these two values. (c) The same distribution as in (b), with median, θ, and mean, μ, marked. (d) Cumulative distribution, F(x), corresponding to the distribution f(x) from panel (c). By definition, F(θ) = 0.5.

Probability distribution of a continuous variable


A continuous random variable X can take on any real value x. Here we use a probability density function, f(x), which defines the probability per unit x. As such, the value of this function for any specific x doesn't have a simple intuitive meaning. It only makes sense when integrated (or summed up) over a certain range:

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,\mathrm{d}x \tag{2.2}$$

Graphically, this integral corresponds to the area under the curve f(x) between a and b, as shown in Figure 2.1b. The probability of finding X between 3 and 6 is indicated by the light-shaded area and equals P(3 ≤ X < 6) = 0.36. The dark-shaded region shows the probability of X being greater than (or equal to) 6, P(X ≥ 6) = 0.20. Here the interval extends from 6 to infinity. The total probability over all possible values of X is always unity: P(−∞ ≤ X ≤ ∞) = 1.

If we narrow the range of integration to nothing (a = b), the resulting probability is zero, as the area under the curve collapses to nothing. Hence, P(X = 5) = 0 in a continuous distribution. Because X is a continuous variable, it can assume an infinite number of values in any arbitrary interval around 5, so the chance of hitting exactly 5 (I mean exactly) is infinitesimally small.
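The area-under-the-curve picture can be reproduced with a simple numerical integration. The density in Figure 2.1b is not specified, so this sketch assumes an exponential density with a hypothetical rate of 0.2 and approximates the integral with the midpoint rule. The resulting numbers therefore differ from those in the figure.

```python
import math

RATE = 0.2  # hypothetical rate of an exponential density, for illustration only


def f(x):
    """Exponential probability density, zero for negative x."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def prob_between(a, b, steps=100_000):
    """Approximate P(a <= X <= b) as the area under f, using the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


print(prob_between(3, 6))    # area between 3 and 6
print(prob_between(0, 100))  # close to the total probability of 1
print(prob_between(5, 5))    # zero-width interval gives zero probability
```

For this particular density the exact area between a and b is exp(−RATE·a) − exp(−RATE·b), which offers a quick check on the numerical result.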

Cumulative probability distribution


Another useful function is the cumulative probability distribution, defined as the probability that a random variable X is less than x: F(x) = P(X < x). It can be graphically represented as the area under the curve f to the left of x. Due to this definition, F(x) is a monotonic function, growing from 0 to 1, with a characteristic ‘sigmoid’ shape in the plot. An example of a probability density function, f(x), and its cumulative distribution, F(x), is shown in Figure 2.1c and 2.1d. F(x) can be understood as a left-tail probability, that is, P(X < x). The right-tail probability is then...
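Two properties of the cumulative distribution mentioned in this section, that it never decreases and that it equals 0.5 at the median, can be checked on a concrete case. The sketch below again assumes an exponential density with a hypothetical rate of 0.2, for which F(x) has a simple closed form. The distribution in Figure 2.1 is not specified and may well be different.

```python
import math

RATE = 0.2  # hypothetical rate of an exponential density, for illustration only


def F(x):
    """Cumulative distribution of the exponential density: area to the left of x."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


# For this density the median is the value at which F reaches 0.5.
median = math.log(2) / RATE

xs = [0.5 * i for i in range(101)]  # grid from 0 to 50
values = [F(x) for x in xs]

# F never decreases as x grows, and it stays between 0 and 1.
print(all(earlier <= later for earlier, later in zip(values, values[1:])))
print(values[0], values[-1])
print(F(median))
```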

Publication date (per publisher): 8 December 2015
Language: English
Subject areas: Mathematics / Computer Science, Mathematics, Applied Mathematics
Mathematics / Computer Science, Mathematics, Probability / Combinatorics
Studies, Cross-disciplinary Areas, Epidemiology / Medical Biometry
Natural Sciences, Biology
Technology
Keywords: Biological Chemistry • Biology • Biophysics • Biostatistics • Cell & Molecular Biology • Computational Biology • Data Analysis • drug discovery • Ecology • Error Analysis • Experimental • experimental biology • Graphical presentation • interval • Life Science • Life Sciences • Methods & Statistics in Ecology • Probability • propagation • Statistics • Uncertainty
ISBN-10 1-119-10689-3 / 1119106893
ISBN-13 978-1-119-10689-0 / 9781119106890