Multivariate Nonparametric Regression and Visualization (eBook)
392 pages
Wiley-Interscience (publisher)
978-1-118-59350-9 (ISBN)
A modern approach to statistical learning and its applications through visualization methods
With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression.
The book then introduces and examines various tested and proven visualization techniques for learning samples and functions. Multivariate Nonparametric Regression and Visualization identifies risk management, portfolio selection, and option pricing as the main areas in which statistical methods may be implemented in quantitative finance. The book provides coverage of key statistical areas including linear methods, kernel methods, additive models and trees, boosting, support vector machines, and nearest neighbor methods. Exploring the additional applications of nonparametric and semiparametric methods, Multivariate Nonparametric Regression and Visualization features:
- An extensive appendix with R-package training material to encourage duplication and modification of the presented computations and research
- Multiple examples to demonstrate the applications in the field of finance
- Sections with formal definitions of the various applied methods for readers to utilize throughout the book
Multivariate Nonparametric Regression and Visualization is an ideal textbook for upper-undergraduate and graduate-level courses on nonparametric function estimation, advanced topics in statistics, and quantitative finance. The book is also an excellent reference for practitioners who apply statistical methods in quantitative finance.
JUSSI KLEMELÄ, PhD, is Senior Research Fellow in the Department of Mathematical Sciences at the University of Oulu. He has written numerous journal articles on his research interests, which include density estimation and the implementation of cutting-edge visualization tools. Dr. Klemelä is the author of Smoothing of Multivariate Data: Density Estimation and Visualization, also published by Wiley.
Preface
Introduction
I.1 Estimation of Functionals of Conditional Distributions
I.2 Quantitative Finance
I.3 Visualization
I.4 Literature
PART I METHODS OF REGRESSION AND CLASSIFICATION
1 Overview of Regression and Classification
1.1 Regression
1.2 Discrete Response Variable
1.3 Parametric Family Regression
1.4 Classification
1.5 Applications in Quantitative Finance
1.6 Data Examples
1.7 Data Transformations
1.8 Central Limit Theorems
1.9 Measuring the Performance of Estimators
1.10 Confidence Sets
1.11 Testing
2 Linear Methods and Extensions
2.1 Linear Regression
2.2 Varying Coefficient Linear Regression
2.3 Generalized Linear and Related Models
2.4 Series Estimators
2.5 Conditional Variance and ARCH models
2.6 Applications in Volatility and Quantile Estimation
2.7 Linear Classifiers
3 Kernel Methods and Extensions
3.1 Regressogram
3.2 Kernel Estimator
3.3 Nearest Neighborhood Estimator
3.4 Classification with Local Averaging
3.5 Median Smoothing
3.6 Conditional Density Estimators
3.7 Conditional Distribution Function Estimation
3.8 Conditional Quantile Estimation
3.9 Conditional Variance Estimation
3.10 Conditional Covariance Estimation
3.11 Applications in Risk Management
3.12 Applications in Portfolio Selection
4 Semiparametric and Structural Models
4.1 Single Index Model
4.2 Additive Model
4.3 Other Semiparametric Models
5 Empirical Risk Minimization
5.1 Empirical Risk
5.2 Local Empirical Risk
5.3 Support Vector Machines
5.4 Stagewise Methods
5.5 Adaptive Regressograms
PART II VISUALIZATION
6 Visualization of Data
6.1 Scatter Plots
6.2 Histogram and Kernel Density Estimator
6.3 Dimension Reduction
6.4 Observations as Objects
7 Visualization of Functions
7.1 Slices
7.2 Partial Dependence Functions
7.3 Reconstruction of Sets
7.4 Level Set Trees
7.5 Unimodal Densities
7.5.1 Probability Content of Level Sets
7.5.2 Set Visualization
Appendix A: R Tutorial
A.1 Data Visualization
A.2 Linear Regression
A.3 Kernel Regression
A.4 Local Linear Regression
A.5 Additive Models: Backfitting
A.6 Single Index Regression
A.7 Forward Stagewise Modeling
A.8 Quantile Regression
References
Author Index
Topic Index
"Altogether, the book provides a very nice overview of nonparametric and semiparametric regression methods with interesting applications to problems in quantitative finance." (Mathematical Reviews, 1 October 2015)
INTRODUCTION
We study regression analysis and classification, as well as estimation of conditional variances, quantiles, densities, and distribution functions. The focus of the book is on nonparametric methods. Nonparametric methods are flexible and able to adapt to various kinds of data, but they can suffer from the curse of dimensionality and from the lack of interpretability. Semiparametric methods are often able to cope with quite high-dimensional data and they are often easier to interpret, but they are less flexible and their use may lead to modeling errors. In addition to the terms “nonparametric estimator” and “semiparametric estimator”, we can use the term “structured estimator” to denote estimators that arise, for example, in additive models. These estimators obey a structural restriction, whereas the term “semiparametric estimator” is used for estimators that have a parametric and a nonparametric component.
Nonparametric, semiparametric, and structured methods are well established and widely applied. There are, nevertheless, areas where further work is useful. We have included three such areas in this book:
I.1 ESTIMATION OF FUNCTIONALS OF CONDITIONAL DISTRIBUTIONS
Kernel methods are one of the main topics of the book. Kernel methods are easy to implement and computationally feasible, and their definition is intuitive. For example, a kernel regression estimator is a local average of the values of the response variable. Local averaging is a general regression method. In addition to the kernel estimator, examples of local averaging include the nearest-neighbor estimator, the regressogram, and the orthogonal series estimator.
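The idea of a kernel regression estimator as a local average can be sketched in a few lines. The following is a minimal illustration, not code from the book: it assumes a one-dimensional explanatory variable, a Gaussian kernel, and a hand-picked smoothing parameter `h`, and the function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    # Standard normal density, used here as the kernel function (an assumption).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_regression(x0, xs, ys, h):
    """Kernel regression estimate at x0: a weighted local average of ys.

    Observations whose x is close to x0 (relative to the smoothing
    parameter h) receive large weights; distant ones receive small weights.
    """
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Toy data: y = x^2 observed without noise at five points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
estimate = kernel_regression(2.0, xs, ys, h=0.5)
```

Because the weights are nonnegative and normalized to sum to one, the estimate always lies between the smallest and the largest observed response.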
We cover linear regression and generalized linear models. These models can be seen as starting points for many semiparametric and structured regression models. For example, the single index model, the additive model, and the varying coefficient linear regression model can be seen as generalizations of the linear regression model or the generalized linear model.
Empirical risk minimization is a general approach to statistical estimation. The methods of empirical risk minimization can be used in regression function estimation, in classification, in quantile regression, and in the estimation of other functionals of the conditional distribution. Local empirical risk minimization can be seen as a generalization of kernel regression.
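As a small illustration of empirical risk minimization applied to quantile estimation, the sketch below minimizes the average check (pinball) loss over a constant. This is an assumption-laden toy version, not the book's implementation: it ignores explanatory variables, searches only over the sample points, and uses hypothetical function names.

```python
def pinball_loss(residual, p):
    # Check loss: p * r for r >= 0 and (p - 1) * r for r < 0.
    return p * residual if residual >= 0 else (p - 1.0) * residual

def empirical_risk(theta, ys, p):
    # Average loss incurred when theta is used as the level-p quantile.
    return sum(pinball_loss(y - theta, p) for y in ys) / len(ys)

def erm_quantile(ys, p):
    # The risk is piecewise linear in theta, so a minimizer is attained
    # at one of the sample points; searching over them is sufficient.
    return min(sorted(ys), key=lambda t: empirical_risk(t, ys, p))

ys = [2.0, 7.0, 1.0, 9.0, 4.0]
median = erm_quantile(ys, 0.5)
upper = erm_quantile(ys, 0.9)
```

Replacing the check loss with the squared error would give the sample mean as the minimizer, which is the regression-function counterpart of the same principle.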
A regular regressogram is a special case of local averaging, but the empirical choice of the partition leads to a rich class of estimators. The choice of the partition is made using empirical risk minimization. In the one- and two-dimensional cases a regressogram is usually less efficient than the kernel estimator, but in high-dimensional cases a regressogram can be useful. For example, a method to select the partition of a regressogram can be seen as a method of variable selection, if the chosen partition is such that it can be defined using only a subset of the variables. Estimators that are defined as the solution of an optimization problem, like the minimizers of an empirical risk, typically need to be calculated with numerical methods. Stagewise algorithms can also be taken as a definition of an estimator, even without giving an explicit minimization problem which they solve.
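A regular regressogram in one dimension can be sketched as the average of the responses within each cell of a fixed, equally spaced partition. The code below is a minimal sketch under those assumptions; the function name and the choice to return `None` for empty cells are illustrative, and the data-driven choice of the partition discussed above is not implemented.

```python
def regressogram(xs, ys, lo, hi, n_bins):
    """Piecewise-constant estimate on n_bins equal-width cells of [lo, hi].

    Returns one value per cell: the mean response of the observations
    falling in that cell, or None when the cell is empty.
    """
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        if lo <= x <= hi:
            # The right endpoint hi is assigned to the last cell.
            k = min(int((x - lo) / width), n_bins - 1)
            sums[k] += y
            counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

cell_means = regressogram([0.1, 0.4, 0.6, 0.9], [1.0, 3.0, 10.0, 20.0], 0.0, 1.0, 2)
```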
A regression function is defined as the conditional expectation of a response variable. The conditional expectation is useful in making predictions as well as in finding causal relationships. We also cover the estimation of the conditional variance and conditional quantiles. These are needed to give a more complete view of the conditional distribution. Also, the estimation of the conditional variance and conditional quantiles is needed in risk management, which is an important area of quantitative finance. The conditional variance can be estimated by estimating the conditional expectation of the squared random variable, whereas the conditional median is a special case of a conditional quantile. In the time series setting the standard approaches for estimating the conditional variance are the ARCH and GARCH modeling, but we discuss nonparametric alternatives. The GARCH estimator is close to a moving average, whereas the ARCH estimator is related to linear state space modeling.
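The conditional variance and the conditional quantile can be written out explicitly. The notation here is an assumption of this sketch and not taken from the excerpt: Y is the response, X the explanatory vector, and F_{Y|X=x} the conditional distribution function.

```latex
\operatorname{Var}(Y \mid X = x)
  = E\bigl(Y^2 \mid X = x\bigr) - \bigl[E(Y \mid X = x)\bigr]^2,
\qquad
Q_p(Y \mid X = x) = \inf\bigl\{\, y : F_{Y \mid X = x}(y) \ge p \,\bigr\},
\qquad
\operatorname{med}(Y \mid X = x) = Q_{1/2}(Y \mid X = x).
```

When the response is centered so that its conditional mean is zero, the second term vanishes and the conditional variance reduces to the conditional expectation of the squared response, so any regression function estimator applied to the squared values gives a variance estimator.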
In classification we are not interested in the estimation of functionals of a distribution, but the aim is to construct classification rules. However, most of the regression function estimation methods have a counterpart in classification.
I.2 QUANTITATIVE FINANCE
Risk management, portfolio selection, and option pricing can be identified as three important areas of quantitative finance. Parametric statistical methods have dominated the statistical research in quantitative finance. In risk management, probability distributions have been modeled with the Pareto distribution or with distributions derived from extreme value theory. In portfolio selection the multivariate normal model has been used together with the Markowitz theory of portfolio selection. In option pricing the Black-Scholes model of stock prices has been widely applied. The Black-Scholes model has also been extended to more general parametric models for the process of stock prices.
In risk management the p-quantile of a loss distribution has a direct interpretation as the threshold such that the probability of the loss exceeding the threshold is at most 1 − p. Thus estimation of conditional quantiles is directly relevant for risk management. Unconditional quantile estimators do not take into account all available information, and thus in risk management it is useful to estimate conditional quantiles. The estimation of the conditional variance can be applied in the estimation of a conditional quantile, because in location-scale families the variance determines the quantiles. The estimation of conditional variance can be extended to the estimation of the conditional covariance or the conditional correlation.
We apply nonparametric regression function estimation in portfolio selection. The portfolio is selected either with the maximization of a conditional expected utility or with the maximization of a Markowitz criterion. When the collection of allowed portfolio weights is a finite set, classification can also be used in portfolio selection. The squared returns are much easier to predict than the returns themselves, and thus in quantitative finance the focus has been on the prediction of volatility. However, it can be shown that despite the weak predictability of the returns, portfolio selection can profit from statistical prediction.
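The link between the conditional variance and a conditional quantile in a location-scale family can be sketched as follows. This is a hedged illustration only: it assumes the standard normal as the base distribution (other base distributions are possible), takes the conditional mean and standard deviation as given numbers rather than estimates, and uses a hypothetical function name.

```python
from statistics import NormalDist

def conditional_quantile(mu, sigma, p, base=NormalDist()):
    """p-quantile of a location-scale family: location + scale * base quantile.

    mu and sigma stand for estimates of the conditional mean and the
    conditional standard deviation; base is the assumed standardized law.
    """
    return mu + sigma * base.inv_cdf(p)

# Hypothetical values: conditional mean 0.1% and conditional volatility 2%.
quantile_99 = conditional_quantile(0.001, 0.02, 0.99)
```

Once the base distribution is fixed, only the location and the scale vary with the conditioning information, which is why a conditional variance estimate translates directly into a conditional quantile estimate.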
Option pricing can be formulated as a problem of stochastic control. We do not study the statistics of option pricing in detail, but give a basic framework for solving some option pricing problems nonparametrically.
I.3 VISUALIZATION
Statistical visualization is often considered as a visualization of the raw data. The visualization of the raw data can be a part of the exploratory data analysis, a first step to model building, and a tool to generate hypotheses about the data-generating mechanism. However, we put emphasis on a different approach to visualization. In this approach, visualization tools are associated with statistical estimators or inference procedures. For example, we estimate first a regression function and then try to visualize and describe the properties of this regression function estimate. The distinction between the visualization of the raw data and the visualization of the estimator is not clear when nonparametric function estimation is used. In fact, nonparametric function estimation can be seen as a part of exploratory data analysis.
The SiZer is an example of a tool that combines visualization and inference, see Chaudhuri & Marron (1999). This methodology combines formal testing for the existence of modes with the SiZer maps to find out whether a mode of a density estimate or of a regression function estimate is really there.
Semiparametric function estimates are often easier to visualize than nonparametric function estimates. For example, in a single index model the regression function estimate is a composition of a linear function and a univariate function. Thus in a single index model we need only to visualize the coefficients of the linear function and a one-dimensional function. The ease of visualization gives motivation to study semiparametric methods.
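The structure of a single index regression function, a univariate function applied to a linear combination of the explanatory variables, can be sketched as below. The names `theta` and `link` and the choice of `math.tanh` are illustrative assumptions; in practice both components would be estimated from data.

```python
import math

def single_index(x, theta, link):
    # The linear part reduces the vector x to a one-dimensional index;
    # the univariate function link is then applied to that index.
    index = sum(t * v for t, v in zip(theta, x))
    return link(index)

value = single_index([1.0, 2.0], [0.5, -0.25], math.tanh)
```

To visualize such a model it is enough to display the coefficient vector and a plot of the univariate function, regardless of the dimension of `x`.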
CART, as presented in Breiman, Friedman, Olshen & Stone (1984), is an example of an estimation method whose popularity is not only due to its statistical properties but also because it is defined in terms of a binary tree that gives directly a visualization of the estimator. Even when it is possible to find estimators with better statistical properties than CART, the possibility of visualization gives motivation to use CART.
Visualization of nonparametric function estimates, such as kernel estimates, is challenging. For the visualization of completely nonparametric estimates, we can use level set tree-based methods, as presented in Klemelä (2009). Level set tree-based methods have found interest also in topological data analysis and in scientific visualization, and these methods have their origin in the concept of a Reeb graph,...
| Publication date (per publisher) | 5 May 2014 |
|---|---|
| Series | Wiley Series in Computational Statistics |
| Language | English |
| Subject area | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
| | Mathematics / Computer Science ► Mathematics ► Statistics |
| | Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics |
| | Technology |
| Keywords | additive models and trees • Applied Probability & Statistics • Boosting • Kernel Methods • linear methods • Nearest Neighbor Methods • Regression Analysis • R (software) • Statistical Learning: A Visualization Approach Using R • Statistical Software / R • Statistics • Support Vector Machines |
| ISBN-10 | 1-118-59350-2 / 1118593502 |
| ISBN-13 | 978-1-118-59350-9 / 9781118593509 |
Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, so EPUB also works well on mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You need a