Spatial and Spatio-temporal Bayesian Models with R-INLA (eBook)

eBook Download: EPUB
2015
John Wiley & Sons (publisher)
9781118950197 (ISBN)

Spatial and Spatio-temporal Bayesian Models with R-INLA - Marta Blangiardo, Michela Cameletti
€65.99 incl. VAT
(CHF 64.45)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros, including VAT.
  • Download available immediately
Spatial and Spatio-Temporal Bayesian Models with R-INLA provides a much-needed, practically oriented and innovative presentation of the combination of Bayesian methodology and spatial statistics. The authors combine an introduction to Bayesian theory and methodology with a focus on the spatial and spatio-temporal models used within the Bayesian framework and a series of practical examples which allow the reader to link the statistical theory presented to real data problems. The numerous examples from the fields of epidemiology, biostatistics and social science are all coded in the R package R-INLA, which has proven to be a valid alternative to the commonly used Markov chain Monte Carlo simulations.

Marta Blangiardo, MRC-PHE Centre for Environment and Health, Department of Epidemiology and Biostatistics, Imperial College London, UK

Michela Cameletti, Department of Management, Economics and Quantitative Methods, University of Bergamo, Italy


Dedication

Preface

1 Introduction
1.1 Why spatial and spatio-temporal statistics?
1.2 Why do we use Bayesian methods for modeling spatial and spatio-temporal structures?
1.3 Why INLA?
1.4 Datasets

2 Introduction to R
2.1 The R language
2.2 R objects
2.3 Data and session management
2.4 Packages
2.5 Programming in R
2.6 Basic statistical analysis with R

3 Introduction to Bayesian Methods
3.1 Bayesian Philosophy
3.2 Basic Probability Elements
3.3 Bayes' Theorem
3.4 Prior and Posterior Distributions
3.5 Working with the Posterior Distribution
3.6 Choosing the Prior Distribution

4 Bayesian computing
4.1 Monte Carlo integration
4.2 Monte Carlo method for Bayesian inference
4.3 Probability distributions and random number generation in R
4.4 Examples of Monte Carlo simulation
4.5 Markov chain Monte Carlo methods
4.6 The Integrated Nested Laplace Approximations algorithm
4.7 Laplace approximation
4.8 The R-INLA package
4.9 How INLA works: step-by-step example

5 Bayesian regression and hierarchical models
5.1 Linear Regression
5.2 Nonlinear regression: random walk
5.3 Generalized Linear Models
5.4 Hierarchical Models
5.5 Prediction
5.6 Model Checking and Selection

6 Spatial Modeling
6.1 Areal data – GMRF
6.2 Ecological Regression
6.3 Zero-inflated models
6.4 Geostatistical data
6.5 The Stochastic Partial Differential Equation approach
6.6 SPDE within R-INLA
6.7 SPDE toy example with simulated data
6.8 More advanced operations through the function
6.9 Prior specification for the stationary case
6.10 SPDE for Gaussian response: Swiss rainfall data
6.11 SPDE with nonnormal outcome: Malaria in the Gambia
6.12 Prior specification for the nonstationary case

7 Spatio-Temporal Models
7.1 Spatio-temporal Disease mapping
7.2 Spatio-temporal Modeling particulate matter concentration

8 Advanced modeling
8.1 Bivariate model for spatially misaligned data
8.2 Semicontinuous model to daily rainfall
8.3 Spatio-temporal dynamic models
8.4 Space-time model lowering the time resolution

Chapter 1
Introduction


1.1 Why spatial and spatio-temporal statistics?


In the last few decades, the availability of spatial and spatio-temporal data has increased substantially, mainly due to advances in computational tools which allow us to collect real-time data from GPS, satellites, etc. This means that nowadays, in a wide range of fields from epidemiology to ecology, climatology and social science, researchers have to deal with geo-referenced data, i.e., data that include information about space (and possibly also time).

As an example, consider a typical epidemiological study, where the interest is in evaluating the incidence of a particular disease, such as lung cancer, across a given country. The data will usually be available as disease counts for small areas (e.g., administrative units) over several years. What types of models allow researchers to take into account all the information available from the data? It is important to consider the potential geographical pattern of the disease: areas close to each other are more likely to share geographical characteristics related to the disease, and thus to have similar incidence. Also, how is the incidence changing over time? Again, it is reasonable to expect that, if there is a temporal pattern, it is stronger for consecutive years than for years further apart.

As a different example, suppose that we are working in climatology and observe the daily amount of precipitation at the sites of a sparse network: we want to predict the amount of rain at unobserved locations, taking into account both spatial correlation and temporal dependence.

Spatial and spatio-temporal models are now widely used: typing “statistical models for spatial data” into Google Scholar™ returns more than 3 million hits, and “statistical models for spatio-temporal data” about 159,000. There are countless scientific papers in peer-reviewed journals which use more or less complex and innovative statistical models to deal with the spatial and/or temporal structure of the data at hand, covering a wide range of applications. The following list only aims to give a flavor of the main areas where these types of models are used: Haslett and Raftery (1989), Handcock and Wallis (1994) and Johansson and Glass (2008) work in the field of meteorology; Shoesmith (2013) presents a model for crime rates and burglaries, while Pavia et al. (2008) use spatial models to predict election results; in epidemiology, Knorr-Held and Richardson (2003) work on infectious diseases, while Waller et al. (1997) and Elliott et al. (2001) present models for chronic diseases. Finally, Szpiro et al. (2010) focus on air pollution estimation and prediction.

1.2 Why do we use Bayesian methods for modeling spatial and spatio-temporal structures?


Several types of models are used with spatial and spatio-temporal data, depending on the aim of the study. If we are interested in summarizing spatial and spatio-temporal variation between areas using risks or probabilities, then we could use statistical methods like disease mapping to compare maps and identify clusters. Moran's index is extensively used to check for spatial autocorrelation (Moran, 1950), while the scan statistic, implemented in SaTScan (Kulldorff, 1997), has been used for cluster detection and geographical surveillance within a non-Bayesian approach. The same types of models can also be used in studies with an aetiological aim, i.e., to assess the potential effect of risk factors on outcomes.
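Moran's index can be illustrated with a minimal Python sketch (our own illustration, not code from the book, which works in R). It computes the global statistic I = (n / W) Σ_i Σ_j w_ij (x_i - x̄)(x_j - x̄) / Σ_i (x_i - x̄)², where x_i is the value observed in area i, w_ij is a spatial weight (assumed here to be binary: 1 if areas i and j are neighbors, 0 otherwise) and W is the sum of all weights. The function name `morans_i` and the four-area example are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and an n x n spatial weight matrix."""
    n = len(values)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [x - mean for x in values]  # deviations from the overall mean
    total_weight = sum(sum(row) for row in weights)  # W, the sum of all weights
    # Cross-products of deviations, weighted by spatial proximity
    numerator = sum(weights[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return (n / total_weight) * numerator / denominator


# Hypothetical example: four areas in a row, each bordering the next one
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(round(morans_i([1, 2, 3, 4], W), 3))  # 0.333: similar values are neighbors
print(round(morans_i([1, 4, 1, 4], W), 3))  # -1.0: neighboring values alternate
```

Under the hypothesis of no spatial autocorrelation, the expected value of I is -1/(n - 1), here -1/3, so values well above it suggest positive autocorrelation (neighbors resemble each other) and values below it negative autocorrelation. A formal test would also need the variance of I or a permutation procedure, which this sketch omits.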

A different type of study quantifies how the risk of experiencing an outcome changes as the distance from a given source increases. This is typically framed in an environmental context, where the source could be a point (e.g., a waste site or radio transmitter) or a line (e.g., a power line or road). In this case, the methods typically used range from the nonparametric tests proposed by Stone (1988) to the parametric approach introduced by Diggle et al. (1998).

In a different context, when the interest lies in mapping continuous spatial (or spatio-temporal) variables that are measured only at a finite set of specific points in a given region, and in predicting their values at unobserved locations, geostatistical methods such as kriging are employed (Cressie, 1991; Stein, 1991). These can play a significant role in environmental risk assessment, for instance to identify areas where the risk of exceeding potentially harmful thresholds is higher.
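The idea behind kriging can be sketched in a few lines. The Python code below is a minimal illustration of simple kriging (not the Bayesian SPDE approach the book develops with R-INLA), under the assumptions of a known constant mean μ, no measurement error (nugget), and an exponential covariance function C(h) = σ² exp(-h/φ) with known σ² and φ. Given observations z at sites s_1, ..., s_n, the predictor at a new site s_0 is μ + c_0ᵀ C⁻¹ (z - μ1), with kriging variance σ² - c_0ᵀ C⁻¹ c_0, where C is the covariance matrix of the observations and c_0 the vector of covariances between s_0 and the observed sites. All function names and the toy data are hypothetical; the code needs Python 3.8+ for `math.dist`.

```python
import math


def exp_cov(h, sigma2, phi):
    """Exponential covariance C(h) = sigma2 * exp(-h / phi)."""
    return sigma2 * math.exp(-h / phi)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def simple_kriging(coords, z, s0, mean, sigma2, phi):
    """Simple kriging prediction and variance at location s0."""
    n = len(coords)
    cov = [[exp_cov(math.dist(coords[i], coords[j]), sigma2, phi)
            for j in range(n)] for i in range(n)]
    c0 = [exp_cov(math.dist(p, s0), sigma2, phi) for p in coords]
    w = solve(cov, c0)  # kriging weights C^{-1} c0 (C is symmetric)
    pred = mean + sum(wi * (zi - mean) for wi, zi in zip(w, z))
    var = sigma2 - sum(wi * ci for wi, ci in zip(w, c0))
    return pred, var


# Hypothetical toy data: three observed sites in the unit square
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
z = [1.0, 2.0, 3.0]
print(simple_kriging(coords, z, (0.5, 0.5), mean=2.0, sigma2=1.0, phi=1.0))
```

Because there is no nugget, the predictor interpolates exactly: at an observed site it returns the observed value with zero variance, while far from all sites it reverts to the mean μ with variance σ². In practice σ² and φ are unknown and have to be estimated; simply plugging in estimates ignores their uncertainty.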

Bayesian methods for dealing with spatial and spatio-temporal data started to appear around the year 2000, with the development of Markov chain Monte Carlo (MCMC) simulation methods (Casella and George, 1992; Gilks et al., 1996). Before that, the Bayesian approach was used almost exclusively for theoretical models and found few applications in real case studies, owing to the lack of numerical, analytical or simulation tools to compute posterior distributions. The advent of MCMC enabled researchers to develop complex models on large datasets without the need to impose simplified structures. Probably the main contribution to spatial and spatio-temporal statistics is that of Besag et al. (1991), who developed the Besag–York–Mollié (BYM) method (see Chapter 6), which is commonly used for disease mapping, while Banerjee et al. (2004), Diggle and Ribeiro (2007) and Cressie and Wikle (2011) have concentrated on Bayesian geostatistical models. The main advantages of the Bayesian approach lie in its ability to account for uncertainty in the estimates and predictions, and in its flexibility and capability of dealing with issues like missing data. In this book, we follow this paradigm: we introduce the Bayesian philosophy and inference in Chapter 3 and review Bayesian computation tools in Chapter 4. The reader may also find the following of interest: Knorr-Held (2000) and Best et al. (2005) for disease mapping, and Diggle et al. (1998) for a modeling approach for continuous spatial data and for prediction.

1.3 Why INLA?


MCMC methods are extensively used for Bayesian inference, but their main limitation is their computational burden. This has become an important issue given the advances in data collection, which have led to the availability of big datasets characterized by high spatial and temporal resolution, as well as data from different sources. Because of the complexity of models that account for spatial and spatio-temporal structures, performing Bayesian inference via MCMC on large datasets can take several days of computing time.

To overcome this issue, Rue et al. (2009) proposed the integrated nested Laplace approximations (INLA), a deterministic algorithm which has proven capable of providing accurate and fast results. It started as a stand-alone program but was then embedded into R (as a package called R-INLA), and since then it has become very popular amongst statisticians and applied researchers in a wide range of fields, with spatial and spatio-temporal models arguably being among its main applications. The website www.r-inla.org is a great resource of papers and tutorials, and it hosts a forum where users can post queries and requests for help. In this book we provide detailed documentation of the INLA functions and options for modeling spatial and spatio-temporal data, and use a series of examples drawn from epidemiology, social and environmental science.

1.4 Datasets


In this section, we briefly describe the datasets that we will use throughout the book. They are available for download from R packages or from the INLA website (https://sites.google.com/a/r-inla.org/stbook/), where we also provide the R code used to run all the examples.

1.4.1 National Morbidity, Mortality, and Air Pollution Study


The National Morbidity, Mortality and Air Pollution Study (NMMAPS) is a large time series study to estimate the effect of air pollution on the health of individuals living in 108 US cities during the period 1987–2000. Several papers have been published on the data, methods and results of this study (see, for instance, Samet et al. (2000)). Detailed information about the database can be found on the Internet-based Health and Air Pollution Surveillance System (iHAPSS) website (http://www.ihapss.jhsph.edu/). Data on the daily concentration of particulates with an aerodynamic diameter of less than 10 μm (PM10) and of nitrogen dioxide (NO2), both measured in μg/m³, as well as daily temperature for Salt Lake City, are contained in the file NMMAPSraw.csv.

We use this dataset to study the relationship between PM10 and temperature, as an illustration of a linear regression model (Chapter 5). A plot showing the trend of PM10 and temperature over the 14 years of available data is presented in Figure 1.1.

Figure 1.1 Daily temperature (points) and PM10 concentration (line) in Salt Lake City (1987–2000).

1.4.2 Average income in Swedish municipalities


Statistics Sweden (http://www.scb.se/) has created a population registry of Sweden, with detailed socioeconomic information at the individual and household level for all Swedish municipalities. This dataset was used by the EURAREA Consortium (EURAREA Consortium, 2004), a European research project funded by EUROSTAT, to investigate methods for small...

Publication date (according to the publisher): 7 April 2015
Language: English
Subject areas: Mathematics / Computer Science > Mathematics > Statistics
Mathematics / Computer Science > Mathematics > Probability / Combinatorics
Studies > Cross-disciplinary areas > Prevention / Health Promotion
Technology
Keywords: Bayesian analysis • Bayesian methods • Epidemiology & Biostatistics • Health & Social Care • R (program) • Spatial and Spatio-Temporal Bayesian Models, R-INLA, Bayesian methodology, spatial statistics, spatio-temporal, epidemiology, biostatistics, social science, Markov Chain Monte Carlo simulations • Statistics • Statistics for Social Sciences
ISBN-13: 9781118950197
EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. When you download the eBook, it is authorized to your personal Adobe ID. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB well suited to mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the software Adobe Digital Editions (free of charge). We advise against using the OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently with it.
eReader: This eBook can be read with (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
