Handbook of Statistical Systems Biology (eBook)
John Wiley & Sons (publisher)
978-1-119-95204-6 (ISBN)
This book:
- Provides a comprehensive account of inference techniques in systems biology.
- Introduces classical and Bayesian statistical methods for complex systems.
- Explores networks and graphical modeling as well as a wide range of statistical models for dynamical systems.
- Discusses various applications for statistical systems biology, such as gene regulation and signal transduction.
- Features statistical data analysis on numerous technologies, including metabolic and transcriptomic technologies.
- Offers an in-depth presentation of reverse engineering approaches.
- Provides colour illustrations to explain key concepts.
This handbook will be a key resource for researchers practising systems biology, and those requiring a comprehensive overview of this important field.
Michael Stumpf, Theoretical Systems Biology at Imperial College London; David Balding, Statistical Genetics in the Institute of Genetics at University College London; Mark Girolami, Department of Computing Science and the Department of Statistics
Chapter 1 Two Challenges of Systems Biology.
Chapter 2 Introduction to Statistical Methods for Complex Systems.
Chapter 3 Bayesian Inference and Computation.
Chapter 4 Data Integration: Towards Understanding Biological Complexity.
Chapter 5 Control Engineering Approaches to Reverse Engineering Biomolecular Networks.
Chapter 6 Algebraic Statistics and Methods in Systems Biology.
B. Technology-based Chapters.
Chapter 7 Transcriptomic Technologies and Statistical Data Analysis.
Chapter 8 Statistical Data Analysis in Metabolomics.
Chapter 9 Imaging and Single-Cell Measurement Technologies.
Chapter 10 Protein Interaction Networks and Their Statistical Analysis.
C. Networks and Graphical Models.
Chapter 11 Introduction to Graphical Modelling.
Chapter 12 Recovering Genetic Network from Continuous Data with Dynamic Bayesian Networks.
Chapter 13 Advanced Applications of Bayesian Networks in Systems Biology.
Chapter 14 Random Graph Models and Their Application to Protein-Protein Interaction Networks.
Chapter 15 Modelling Biological Networks Via Tailored Random Graphs.
D. Dynamical Systems.
Chapter 16 Nonlinear Dynamics: a Brief Introduction.
Chapter 17 Qualitative Inference for Dynamical Systems.
Chapter 18 Stochastic Dynamical Systems.
Chapter 19 State-Space Models.
Chapter 20 Model Identification by Utilizing Likelihood-Based Methods.
E. Application Areas.
Chapter 21 Inference of Signalling Pathway Models.
Chapter 22 Modelling Transcription Factor Activity.
Chapter 23 Host-Pathogen Systems Biology.
Chapter 24 Statistical Metabolomics: Bayesian Challenges in the Analysis of Metabolomic Data.
Chapter 25 Systems Biology of microRNA.
"A very remarkable collection of essays. Strongly
recommended to workers in this area." (International
Statistical Review, 1 October 2013)
"I would highly recommend this book as a useful guide for
the students and practitioners of systems biology."
(Science Progress, 1 September 2012)
"This handbook will be a key resource for researchers
practising systems biology, and those requiring a comprehensive
overview of this important field." (Zentralblatt MATH,
2012)
Chapter 2
Introduction to Statistical Methods for Complex Systems
Tristan Mary-Huard and Stéphane Robin
Agro ParisTech and INRA, Paris, France
2.1 Introduction
The aim of the present chapter is to introduce and illustrate some concepts of statistical inference useful in systems biology. Here we limit ourselves to the classical, so-called ‘frequentist’ statistical inference where parameters are fixed quantities that need to be estimated. The Bayesian approach will be presented in Chapter 3.
Modelling and inference techniques are illustrated on three recurrent problems in systems biology:
Class comparison aims at assessing the effect of some treatment or experimental condition on some biological response. This requires proper statistical modelling to account for the experimental design, various covariates or stratifications or dependence between the measurements. As systems biology often deals with high-throughput technologies, it also raises multiple testing issues.
Class prediction refers to learning techniques that aim at building a rule to predict the status (e.g. well or ill) of an individual, based on a set of biological descriptors. An exhaustive list of classification algorithms is out of reach, but general techniques such as regularization or aggregation are of prime interest in systems biology where the number of variables often exceeds the number of observations by far. Evaluating the performances of a classifier also requires relevant tools.
Class discovery aims at uncovering some structure in a set of observations. These techniques include distance-based or model-based clustering methods and make it possible to identify distinct groups of individuals in the absence of a prior classification. However, the underlying structure may have more complex forms, each raising specific issues in terms of inference.
This chapter focuses on generic statistical concepts and methods that can be applied no matter which technology is used for the data acquisition. In practice, applications to any biological problem will necessitate both a relevant strategy for the data collection and a careful tuning of the methods to obtain meaningful results. These two steps of data collection (or experimental design conception) and adaptation of the generic methods require taking into account the nature of the data. Therefore, they are dependent on the data acquisition technology, and will be discussed in Part B of this Handbook.
In this chapter, the data are assumed to arise from a static process. The analysis of a dynamic biological system would require more sophisticated methods, such as partial differential equations or network modelling.
These topics are not discussed here as they will be reviewed in depth in Parts C and D.
Lastly, a basic knowledge of statistics is assumed, covering topics including point estimation (in particular maximum likelihood estimation), hypothesis testing, and a background in regression and linear models.
2.2 Class Comparison
We consider here the general problem of assessing the effect of some treatment, experimental condition or covariate on some response. We first address the problem of modelling the data resulting from the experiments, focusing on how to account for the dependency between the observations. We then turn to the problem of multiple testing, which is recurrent in high-throughput data analyses.
2.2.1 Models for Dependent Data
Many biological experiments aim at observing the effects of a given treatment (or combination of treatments) on a given response. ‘Treatment’ is used here in a very broad sense, including controlled experimental conditions, uncontrolled covariates, time, population structure, etc. In the following, $n$ will stand for the total number of experiments.
Linear (Gaussian) models (Searle 1971; Dobson 1990) provide a general framework to describe the influence of a set of controlled conditions and/or uncontrolled covariates, summarized in an $n \times p$ matrix $X$, on the observed response gathered in an $n$-dimensional vector $Y$ as
$$Y = X\theta + E \qquad (2.1)$$
where $\theta$ is the $p$-dimensional vector containing all parameters. In the most classical setting, the response is supposed to be Gaussian, and the dependency structure between the observations is then fully specified by the (co-)variance matrix $\Sigma = \mathbb{V}(Y)$, which contains the variance of each observation on the diagonal, and the covariances between pairs of observations elsewhere. In the simplest setting, the responses are supposed to be independent with the same variance $\sigma^2$, that is $\Sigma = \sigma^2 I$.
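As a minimal sketch of the simplest setting, the following simulates a linear model with independent errors of common variance and recovers the parameters by least squares. The sizes, the parameter values and the names `X`, `theta_true` and `sigma` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 3                                   # n experiments, p parameters (illustrative)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design matrix
theta_true = np.array([1.0, 2.0, -0.5])         # hypothetical parameter vector
sigma = 0.3                                     # common standard deviation

# Independent errors with the same variance: Sigma = sigma^2 * I
Y = X @ theta_true + rng.normal(scale=sigma, size=n)

# Least-squares estimate; it coincides with the ML estimate of theta
# when the variance matrix is sigma^2 * I
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Unbiased estimate of sigma^2 from the residual sum of squares
residuals = Y - X @ theta_hat
sigma2_hat = residuals @ residuals / (n - p)

print(theta_hat, sigma2_hat)
```

With dependent observations the same design matrix is kept, and only the variance matrix changes.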
2.2.1.1 Writing the Right (Mixed) Model
In more complex experiments, the assumption that observations are independent does not hold and the structure of $\Sigma$ needs to be adapted. Because it contains $n(n+1)/2$ parameters, the shape of $\Sigma$ has to be strongly constrained to allow good inference. We first present here some typical experimental settings, and the associated dependency structures.
Variance Components
Consider the study of the combined effects of the genotype (indexed by $i$) and of the cell type ($j$) on some gene expression. Several individuals ($k$) from each genotype are included and cells from each type are harvested in each of them. In such a setting the expected response is $\mathbb{E}(Y_{ijk}) = \mu_{ij}$, which is often decomposed into a genotype effect, a cell type effect and an interaction as $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$.
The most popular way to account for the dependency between measures obtained on the same individual is to add a random term $U_{ik}$ associated with each individual. The complete model can then be written as
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + U_{ik} + E_{ijk} \qquad (2.2)$$
where all $U_{ik}$ and $E_{ijk}$ are independent centred Gaussian variables with variances $\sigma_U^2$ and $\sigma_E^2$, respectively. The variance of one observation is then $\sigma_U^2 + \sigma_E^2$, where $\sigma_U^2$ is the ‘biological’ variance and $\sigma_E^2$ is the ‘technical’ one (Kerr and Churchill 2001). The random effect induces a uniform correlation between observations from the same individual since, for $j \neq j'$:
$$\mathrm{Cov}(Y_{ijk}, Y_{i'j'k'}) = \sigma_U^2 \quad \text{if } (i, k) = (i', k') \qquad (2.3)$$
and 0 if $(i, k) \neq (i', k')$. The matrix form of this model is a generalization of (2.1):
$$Y = X\theta + ZU + E \qquad (2.4)$$
where $Z$ describes the individual structure: each row corresponds to one measurement and each column to one individual, and $Z$ contains a 1 at the intersection if the measurement has been made on the individual, and a 0 otherwise. The denomination ‘mixed’ of ‘linear mixed models’ comes from the simultaneous presence of fixed and random effects. This model is the simplest form of so-called ‘variance components’ models. The variance matrix corresponding to (2.3) is $\Sigma = \sigma_U^2 ZZ^\top + \sigma_E^2 I$. Applications of such a model to gene expression data can be found in Wolfinger et al. (2001) or Tempelman (2008).
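The variance structure of the variance components model can be checked numerically. The sketch below builds the individual incidence matrix (called `Z` here) for a toy design and forms the variance matrix as the biological variance times `Z Z'` plus the technical variance times the identity. The design sizes and variance values are hypothetical.

```python
import numpy as np

n_individuals, n_cell_types = 3, 2      # illustrative design sizes
sigma2_U, sigma2_E = 2.0, 0.5           # hypothetical biological / technical variances

# One row per measurement, ordered individual by individual
individual_of_row = np.repeat(np.arange(n_individuals), n_cell_types)
n = individual_of_row.size

# Z[r, m] = 1 if measurement r was made on individual m, 0 otherwise
Z = np.zeros((n, n_individuals))
Z[np.arange(n), individual_of_row] = 1.0

# Variance matrix of the mixed model: block diagonal, one block per individual
Sigma = sigma2_U * (Z @ Z.T) + sigma2_E * np.eye(n)

# Uniform within-individual correlation: sigma2_U / (sigma2_U + sigma2_E)
std = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(std, std)

print(Sigma)
print(corr[0, 1])
```

Two measurements on the same individual share the covariance `sigma2_U`, while measurements on different individuals are uncorrelated.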
Repeated Measurements
Consider a similar design where, in place of cell types, we compare successive harvesting times (indexed by $t$) within each individual. The uniform correlation within each individual given in (2.3) may then seem inappropriate, for it does not account for the delay between times of observation. A common dependency form is then the so-called ‘autoregressive’ structure, which states that
$$\mathrm{Cov}(Y_{ikt}, Y_{i'k't'}) = \sigma^2 \rho^{|t - t'|} \quad \text{if } (i, k) = (i', k'),$$
and 0 otherwise. This amounts to assuming that the correlation decreases (at an exponential rate) with the time delay. Such a variance structure cannot be put in a simple matrix form similar to (2.4). Note that Equation (2.1) is still valid, but with a nondiagonal variance matrix $\Sigma$.
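An autoregressive variance matrix is easy to construct explicitly. The sketch below assumes a covariance of the form `sigma2 * rho ** |t - t'|` within an individual and zero between individuals. The helper name `ar1_covariance` and the numerical values are illustrative.

```python
import numpy as np

def ar1_covariance(times, sigma2, rho):
    """Covariance sigma2 * rho**|t - t'| between observations of one individual."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return sigma2 * rho ** lags

# One individual observed at three harvesting times
block = ar1_covariance([0, 1, 2], sigma2=1.0, rho=0.5)

# Two independent individuals: block-diagonal variance matrix
Sigma = np.kron(np.eye(2), block)

print(block)
```

Because the helper works with time lags rather than indices, unequally spaced harvesting times are handled in the same way.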
Spatial Dependency
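It is also desirable to account for spatial dependency when observations have some spatial localization. Suppose one wants to compare treatments (indexed by $i$), and that replicates ($k$) have respective localizations $\ell_{ik}$. A typical variance structure (Cressie 1993) is
$$\mathbb{V}(Y_{ik}) = \sigma^2 + \gamma^2, \qquad \mathrm{Cov}(Y_{ik}, Y_{i'k'}) = \sigma^2 e^{-\delta \|\ell_{ik} - \ell_{i'k'}\|} \quad \text{if } (i, k) \neq (i', k'),$$
where $\gamma^2$ accounts for the measurement error variability and $\delta$ controls the speed at which the dependency decreases with distance.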
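As an illustration only, the following sketch builds a spatial variance matrix under one common parametrization: an exponentially decaying covariance plus a measurement error term on the diagonal. The function name `spatial_covariance`, the parameter names and the coordinates are assumptions made for this example.

```python
import numpy as np

def spatial_covariance(locations, sigma2, gamma2, delta):
    """sigma2 * exp(-delta * distance), plus gamma2 on the diagonal."""
    locations = np.asarray(locations, dtype=float)
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    return sigma2 * np.exp(-delta * dist) + gamma2 * np.eye(len(locations))

# Three replicates at hypothetical field coordinates
locations = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
Sigma = spatial_covariance(locations, sigma2=1.0, gamma2=0.2, delta=1.0)

print(Sigma)
```

Larger values of `delta` make the dependency vanish faster with distance, while `gamma2` only inflates the variance of each single observation.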
The dependency structures described above can of course be combined. Also note that this list is far from exhaustive. The limitations often come from the software at hand or the specific computing developments that can be made. A large catalogue of such structures can be found in software such as SAS (2002-03) or R (www.r-project.org).
2.2.1.2 Inference
Some problems related to the inference of mixed linear models are still unresolved. We only provide here an introduction to the most popular approaches and emphasize some practical issues that can be faced when using them.
Estimation
Mixed model inference requires estimating both $\theta$ and $\Sigma$. We start with the estimation of $\Sigma$, which reduces to the estimation of a few variance parameters (for instance $\sigma_U^2$ and $\sigma_E^2$ in the variance components model) in the examples given above.
Moment estimates can be obtained (Searle 1971; Demidenko 2004), typically for variance component models. Such estimates are often based on sums of squares, that is, squared distances between $Y$ and its projection on various linear spaces, such as $\mathrm{span}(X)$, $\mathrm{span}(Z)$ or $\mathrm{span}(X, Z)$. The expectation of these sums of squares can often be related to the different variance parameters, and the estimation then reduces to solving a set of linear equations.
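The moment approach can be sketched on the simplest balanced case: a single random individual effect plus noise, with no other fixed effect than a common mean. The sums of squares within and between individuals have expectations that are linear in the two variances, so equating them to their observed values gives the estimates. The simulated design and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_individuals, n_replicates = 2000, 4       # illustrative balanced design
sigma2_U, sigma2_E = 2.0, 0.5               # hypothetical true variances

# Simulate Y[m, k] = mu + U[m] + E[m, k]
U = rng.normal(scale=np.sqrt(sigma2_U), size=(n_individuals, 1))
E = rng.normal(scale=np.sqrt(sigma2_E), size=(n_individuals, n_replicates))
Y = 10.0 + U + E

individual_means = Y.mean(axis=1, keepdims=True)   # projection of Y on span(Z)
grand_mean = Y.mean()                              # projection of Y on the constant vector

# Sums of squares: squared distances between successive projections
ss_within = ((Y - individual_means) ** 2).sum()
ss_between = n_replicates * ((individual_means - grand_mean) ** 2).sum()

ms_within = ss_within / (n_individuals * (n_replicates - 1))
ms_between = ss_between / (n_individuals - 1)

# E[ms_within] = sigma2_E and E[ms_between] = sigma2_E + n_replicates * sigma2_U,
# so the two linear equations are solved by:
sigma2_E_hat = ms_within
sigma2_U_hat = (ms_between - ms_within) / n_replicates

print(sigma2_U_hat, sigma2_E_hat)
```

In unbalanced designs the expectations of the sums of squares are less simple, but the principle of solving a linear system remains the same.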
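The maximum likelihood (ML) estimator is defined as
$$(\hat{\theta}, \hat{\Sigma}) = \arg\max_{\theta, \Sigma} \log \phi(Y; X\theta, \Sigma),$$
where $\phi(\cdot\,; m, S)$ denotes the Gaussian density with mean $m$ and variance matrix $S$. It can be used for all models. Unfortunately, ML variance estimates are known to be biased in many (almost all) situations, because both $\theta$ and $\Sigma$ have to be estimated at the same time. The most popular way to circumvent this problem consists of changing to a model where $\theta$ is known (Verbeke and Molenberghs 2000). Defining some matrix $T$ such that $TX = 0$, we may define the Gaussian vector $\tilde{Y} = TY$, which satisfies
$$\tilde{Y} \sim \mathcal{N}(0, T \Sigma T^\top).$$
The most natural choice for $T$ is the projector on the linear space orthogonal to $\mathrm{span}(X)$. The so-called...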
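The effect of the projection can be seen in the simplest case of independent observations with a common variance. In the sketch below, `T` is the orthogonal projector on the complement of the column space of the design matrix, so that `T X = 0` and the projected data no longer depend on the fixed effects. Dividing the residual sum of squares by `n` (the ML estimate) underestimates the variance, whereas dividing by `trace(T) = n - p` does not. All sizes and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 12, 4                                    # small n makes the ML bias visible
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Projector on the space orthogonal to span(X): T = I - X (X'X)^{-1} X'
T = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

sigma2 = 1.0
theta = np.array([1.0, -2.0, 0.5, 3.0])         # hypothetical fixed effects

n_sim = 20000
# One simulated data set per row
Y = X @ theta + rng.normal(scale=np.sqrt(sigma2), size=(n_sim, n))
Y_tilde = Y @ T.T                               # T Y for each data set: mean zero
rss = (Y_tilde ** 2).sum(axis=1)

ml_mean = (rss / n).mean()                      # ML divides by n: biased downwards
projected_mean = (rss / (n - p)).mean()         # dividing by trace(T) removes the bias

print(ml_mean, projected_mean)
```

The average ML estimate is close to `(n - p) / n` times the true variance, which is why the bias matters most when the number of fixed effects is not small compared with the number of observations.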
| Publication date (per publisher) | 9 September 2011 |
|---|---|
| Language | English |
| Subject areas | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| | Studies ► 2nd study phase (clinical) ► Human Genetics |
| | Studies ► Cross-sectional areas ► Epidemiology / Medical Biometry |
| | Technology |
| Keywords | Bayesian methodology, Bayesian methodology systems biology, Dynamical systems biology, Statistical models, 3D visualisations biology, proteomic technologies statistics, pharmacodynamics statistics • Life Sciences • Genetics • Molecular Biology • Statistical Genetics / Microarray Analysis • Statistics |
| ISBN-10 | 1-119-95204-2 / 1119952042 |
| ISBN-13 | 978-1-119-95204-6 / 9781119952046 |
Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. The eBook is authorised to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook.
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.