Data Analysis and Applications 3 (eBook)
John Wiley & Sons (Verlag)
978-1-119-72186-4 (ISBN)
Andreas Makrides is Associate Lecturer of Statistics at the University of Central Lancashire, Cyprus (UCLan) and conducted postdoctoral research at the Laboratoire de Mathématiques Raphaël Salem, Université de Rouen, France. Alex Karagrigoriou is Professor of Probability and Statistics at the University of the Aegean, Greece. He is also the faculty's Head of Graduate Studies and Director of the in-house Laboratory of Statistics and Data Analysis. Christos H. Skiadas is former Vice-Rector at the Technical University of Crete, Greece and founder of its Data Analysis and Forecasting Laboratory. He continues his research in ManLab, in the faculty's Department of Production Engineering and Management.
Part 1. Computational Data Analysis and Methods
1. Semi-supervised Learning Based on Distributionally Robust Optimization, Jose Blanchet and Yang Kang.
2. Updating of PageRank in Evolving Treegraphs, Benard Abola, Pitos Seleka Biganda, Christopher Engström, John Magero Mango, Godwin Kakuba and Sergei Silvestrov.
3. Exploring the Relationship Between Ordinary PageRank, Lazy PageRank and Random Walk with Backstep PageRank for Different Graph Structures, Pitos Seleka Biganda, Benard Abola, Christopher Engström, John Magero Mango, Godwin Kakuba and Sergei Silvestrov.
4. On the Behavior of Alternative Splitting Criteria for CUB Model-based Trees, Carmela Cappelli, Rosaria Simone and Francesca Di Iorio.
5. Investigation on Life Satisfaction Through (Stratified) Chain Regression Graph Models, Federica Nicolussi and Manuela Cazzaro.
Part 2. Classification Data Analysis and Methods
6. Selection of Proximity Measures for a Topological Correspondence Analysis, Rafik Abdesselam.
7. Support Vector Machines: A Review and Applications in Statistical Process Monitoring, Anastasios Apsemidis and Stelios Psarakis.
8. Binary Classification Techniques: An Application on Simulated and Real Bio-medical Data, Fragkiskos G. Bersimis, Iraklis Varlamis, Malvina Vamvakari and Demosthenes B. Panagiotakos.
9. Some Properties of the Multivariate Generalized Hyperbolic Models, Stergios B. Fotopoulos, Venkata K. Jandhyala and Alex Paparas.
10. On Determining the Value of Online Customer Satisfaction Ratings – A Case-based Appraisal, Jim Freeman.
11. Projection Clustering Unfolding: A New Algorithm for Clustering Individuals or Items in a Preference Matrix, Mariangela Sciandra, Antonio D'Ambrosio and Antonella Plaia.
1. Semi-supervised Learning Based on Distributionally Robust Optimization
We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our method improves generalization by using the unlabeled data to restrict the support of the worst-case distribution in our DRO formulation. To make the DRO formulation implementable, we propose a stochastic gradient descent algorithm for the training procedure. We demonstrate that our semi-supervised DRO method improves the generalization error over natural supervised procedures and state-of-the-art SSL estimators. Finally, we include a discussion on the large-sample behavior of the optimal uncertainty region in the DRO formulation, which exposes important aspects such as the role of dimension reduction in SSL.
1.1. Introduction
We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using an optimal transport metric – also known as the Earth Mover's Distance (see [RUB 00]).
Our approach improves generalization by using the unlabeled data to restrict the support of the models, which lie in the region of distributional uncertainty. Intuitively, our mechanism for fitting the underlying model is automatically tuned to generalize beyond the training set, but only over potential instances which are relevant. The expectation is that predictive variables often lie in lower-dimensional manifolds embedded in the underlying ambient space; thus, the shape of this manifold is informed by the unlabeled data set (see Figure 1.1 for an illustration of this intuition).
Figure 1.1. Idealization of the way in which the unlabeled predictive variables provide a proxy for an underlying lower dimensional manifold. Large red dots represent labeled instances and small blue dots represent unlabeled instances. For a color version of this figure, see www.iste.co.uk/makrides/data3.zip
To enable the implementation of the DRO formulation, we propose a stochastic gradient descent (SGD) algorithm, which allows us to implement the training procedure with ease. Our SGD construction includes a procedure of independent interest which, we believe, can be used in more general stochastic optimization problems.
We focus our discussion on semi-supervised classification but the modeling and computational approach that we propose can be applied more broadly as we shall illustrate in section 1.4.
We now briefly explain the formulation of our learning procedure. Suppose that the training set is given by Dn = {(Xi, Yi) : 1 ≤ i ≤ n}, where Yi ∈ {−1, 1} is the label of the i-th observation and we assume that the predictive variable, Xi, takes values in ℝd. We use n to denote the number of labeled data points.
In addition, we consider a set of unlabeled observations, {Xi : n + 1 ≤ i ≤ N}. We build the set EN−n = {(Xi, 1) : n + 1 ≤ i ≤ N} ∪ {(Xi, −1) : n + 1 ≤ i ≤ N}. That is, we replicate each unlabeled data point twice, recognizing that the missing label could be either of the two available alternatives. We assume that the data must be labeled either −1 or 1.
We then construct the set XN = Dn ∪ EN−n which, in simple words, is obtained by combining the labeled data with the unlabeled data under all possible label assignments. The cardinality of XN, denoted as |XN|, is equal to 2(N − n) + n (for simplicity, we assume that all of the data points and the unlabeled observations are distinct).
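As an illustration of this construction, the following minimal Python sketch (our own, not code from the chapter; the function name build_support is hypothetical) assembles XN by pairing every unlabeled point with both candidate labels and merging the result with Dn.

```python
# Minimal sketch of the support-set construction X_N = D_n U E_{N-n}.
import numpy as np

def build_support(X_labeled, y_labeled, X_unlabeled):
    """Return the support set X_N described in the text."""
    # D_n: the n labeled pairs (X_i, Y_i)
    D_n = list(zip(map(tuple, X_labeled), y_labeled))
    # E_{N-n}: every unlabeled point paired with both candidate labels
    E = [(tuple(x), y) for x in X_unlabeled for y in (+1, -1)]
    support = D_n + E
    # |X_N| = n + 2(N - n), assuming all points are distinct
    assert len(support) == len(X_labeled) + 2 * len(X_unlabeled)
    return support

# Tiny example: n = 2 labeled points, N - n = 2 unlabeled points
X_lab = np.array([[0.0, 1.0], [1.0, 0.0]])
y_lab = np.array([+1, -1])
X_unl = np.array([[0.5, 0.5], [0.2, 0.8]])
print(len(build_support(X_lab, y_lab, X_unl)))  # 2 + 2*2 = 6
```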
Let us define P (XN) to be the space of probability measures whose support is contained in XN. We use Pn to denote the empirical measure supported on the set Dn, so Pn ∈ P(XN). In addition, we write EP (·) to denote the expectation associated with a given probability measure P.
Let us assume that we are interested in fitting a classification model by minimizing an expected loss function l(X, Y, β), where β is a parameter which uniquely characterizes the underlying model. We shall assume that l(X, Y, ·) is a convex function for each fixed (X, Y). The empirical risk associated with the parameter β is

EPn [l(X, Y, β)] = (1/n) ∑i=1,…,n l(Xi, Yi, β). [1.1]
The loss function l(X, Y, β) is associated with the machine learning model that we consider. For example, we take the squared loss for ordinary least squares regression, the absolute loss for quantile regression and the log-exponential loss for logistic regression. In general, we require convexity of the loss function so that the optimal model is unique, but some popular learning algorithms, such as neural networks, do not have convex loss functions, and convexity is not required for our SSL-DRO formulation. For example, recent works (see [SIN 17, VOL 18]) extend the DRO formulation to deep learning models with non-convex loss functions as a tool to avoid overfitting.
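For concreteness, here is a short sketch of the three loss functions just mentioned, together with the empirical risk EPn[l(X, Y, β)] of [1.1]; the linear-in-β model form is an assumption made purely for illustration.

```python
# Sketch of the loss functions named in the text and the empirical risk.
import numpy as np

def squared_loss(X, y, beta):          # ordinary least squares regression
    return (y - X @ beta) ** 2

def absolute_loss(X, y, beta):         # quantile (median) regression
    return np.abs(y - X @ beta)

def log_exponential_loss(X, y, beta):  # logistic regression, y in {-1, +1}
    return np.log1p(np.exp(-y * (X @ beta)))

def empirical_risk(loss, X, y, beta):
    # E_{P_n}[l] = (1/n) * sum_i l(X_i, Y_i, beta)
    return loss(X, y, beta).mean()
```

All three losses are convex in β, so any of them fits the convexity assumption stated above.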
In this chapter, we propose to estimate β by solving the DRO problem

minβ max { EP [l(X, Y, β)] : P ∈ P(XN), Dc(P, Pn) ≤ δ∗ }, [1.2]

where Dc(·) is a suitably defined discrepancy between Pn and any probability measure P ∈ P(XN), which is within a certain tolerance measured by δ∗.
So, intuitively, [1.2] represents the value of a game in which the outer player (we) will choose β and the adversary player (nature) will rearrange the support and the mass of Pn within a budget measured by δ∗. We then wish to minimize the expected risk, regardless of the way in which the adversary might corrupt (within the prescribed budget) the existing evidence. In formulation [1.2], the adversary is crucial to ensure that we endow our mechanism for selecting β with the ability to cope with the risk impact of out-of-sample (i.e. out of the training set) scenarios. We denote the formulation in [1.2] as semi-supervised distributionally robust optimization (SSL-DRO).
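To make the adversary's role tangible, the following toy sketch evaluates the inner maximization of [1.2] for a fixed β by linear programming, treating Dc as a discrete optimal transport cost. This is our own illustration under simplifying assumptions (finite support, a given cost matrix), not the chapter's algorithm.

```python
# Toy evaluation of the inner player's problem: given beta, nature moves
# the mass of P_n over the support X_N subject to a transport budget delta.
# Variables pi[i, j] carry mass from atom i of P_n to support point j.
import numpy as np
from scipy.optimize import linprog

def worst_case_risk(losses_at_support, cost_matrix, delta):
    """losses_at_support[j]: l(x_j, y_j, beta) for each point j in X_N.
    cost_matrix[i, j]: transport cost from atom i of P_n to point j."""
    n, m = cost_matrix.shape
    # Maximize sum_{i,j} loss_j * pi_{ij}  <=>  minimize the negative.
    c = -np.tile(losses_at_support, n)
    # Each atom of P_n carries mass 1/n and must be fully transported.
    A_eq = np.kron(np.eye(n), np.ones(m))
    b_eq = np.full(n, 1.0 / n)
    # Total transport cost is bounded by the budget delta.
    A_ub = cost_matrix.reshape(1, -1)
    b_ub = np.array([delta])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return -res.fun  # worst-case expected loss E_P[l(X, Y, beta)]
```

With δ = 0 the adversary cannot move any mass and the value reduces to the empirical risk [1.1]; as δ grows, mass drifts toward high-loss support points, which is exactly the budgeted corruption described above.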
The criterion that we use to define Dc(·) is based on the theory of optimal transport, and it is closely related to the concept of Wasserstein distance (see section 1.3). The choice of Dc(·) is motivated by recent results, which show that popular estimators such as regularized logistic regression, support vector machines (SVMs), square-root Lasso (SR-Lasso), group Lasso, and adaptive regularized regression admit a DRO representation exactly equal to [1.2] in which the support XN is replaced by ℝd+1 (see [BLA 16b, BLA 17a, BLA 17b] and also equation [1.10] in this chapter).
In view of these representation results for supervised learning algorithms, the inclusion of XN in our DRO formulation [1.2] provides a natural SSL approach in the context of classification and regression. The goal of this chapter is to enable the use of the distributionally robust training framework [1.2] as an SSL technique. We will show that estimating β via [1.2] may result in a significant improvement in generalization relative to natural supervised learning counterparts (such as regularized logistic regression and SR-Lasso). The potential improvement is illustrated in section 1.4. Moreover, we show via numerical experiments in section 1.5 that our method is able to improve upon state-of-the-art SSL algorithms.
As a contribution of independent interest, we construct a stochastic gradient descent algorithm to approximate the optimal selection, β∗, minimizing [1.2].
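The chapter's SGD algorithm targets the min-max objective [1.2]; as a point of reference, the skeleton below shows the basic descent machinery on the plain (non-robust) log-exponential loss. The step-size schedule and other details are illustrative assumptions.

```python
# Generic SGD skeleton for logistic (log-exponential) loss, y in {-1, +1}.
import numpy as np

def sgd_logistic(X, y, steps=10_000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)                  # sample one training point
        margin = y[i] * (X[i] @ beta)
        # gradient of log(1 + exp(-y x'beta)) with respect to beta
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))
        beta -= (lr / np.sqrt(t)) * grad     # decaying step size
    return beta
```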
An important parameter when applying [1.2] is the size of the uncertainty region, which is parameterized by δ∗. We apply cross-validation to calibrate δ∗, but we also discuss the non-parametric behavior of an optimal selection of δ∗ (according to a suitably defined optimality criterion explained in section 1.6) as n, N → ∞.
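A hedged sketch of the cross-validation step just described: we scan a grid of candidate budgets and keep the δ∗ with the lowest held-out risk. Here fit_dro is a hypothetical stand-in for any solver of [1.2].

```python
# K-fold cross-validation over a grid of uncertainty budgets delta.
import numpy as np

def calibrate_delta(X, y, X_unlabeled, deltas, fit_dro, loss, k=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for delta in deltas:
        fold_losses = []
        for f in folds:
            train = np.setdiff1d(np.arange(len(X)), f)
            # fit_dro (hypothetical) solves [1.2] at the given budget
            beta = fit_dro(X[train], y[train], X_unlabeled, delta)
            fold_losses.append(loss(X[f], y[f], beta).mean())
        scores.append(np.mean(fold_losses))
    return deltas[int(np.argmin(scores))]  # delta* with lowest CV risk
```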
In section 1.2, we provide a broad overview of alternative procedures in the SSL literature, including recent approaches which are related to robust optimization. A key role in our formulation is played by δ∗, which can be seen as a regularization parameter. This identification is highlighted in the form of [1.2] and the DRO representation of regularized logistic regression, which we recall in [1.10]. The optimal choice of δ∗ ensures statistical consistency as n, N → ∞.
Similar robust optimization formulations to [1.2] for machine learning have been investigated in the literature recently. For example, connections between robust optimization and machine learning procedures such as Lasso and SVMs have been studied in the literature (see [XU 09]). In contrast to this literature, the use of distributionally robust uncertainty allows us to discuss the optimal size of the uncertainty region as the sample size increases (as we shall explain in section 1.6). The work of [SHA 15] is among the first to study DRO representations based on optimal transport, but they do not study the...
| Publication date (per publisher) | 9.4.2020 |
|---|---|
| Language | English |
| Subject areas | Mathematics / Computer Science ► Mathematics ► Statistics |
| Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics | |
| Economics ► Business Administration / Management ► Corporate Management | |
| Keywords | alternative splitting criteria • CUB model-based trees • data analysis • different graph structures • distributionally robust optimization • Francesca Di Iorio • John Magero Mango • Pitos Seleka Biganda • real bio-medical data • statistical process monitoring • statistics |
| ISBN-10 | 1-119-72186-5 / 1119721865 |
| ISBN-13 | 978-1-119-72186-4 / 9781119721864 |
Copy protection: Adobe DRM. The eBook is authorized to your personal Adobe ID at download and can only be read on devices registered to that ID.
File format: EPUB (Electronic Publication). EPUB is an open eBook standard; the text reflows dynamically to the display and font size, which also makes it well suited to mobile reading devices.