
Conformal Prediction for Reliable Machine Learning (eBook)

Theory, Adaptations and Applications
eBook download: EPUB
2014 | 1st edition
334 pages
Elsevier Science (publisher)
978-0-12-401715-3 (ISBN)
The conformal predictions framework is a recent development in machine learning that can associate a reliable measure of confidence with a prediction in any real-world pattern recognition application, including risk-sensitive applications such as medical diagnosis, face recognition, and financial risk prediction. Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications captures the basic theory of the framework, demonstrates how to apply it to real-world problems, and presents several adaptations, including active learning, change detection, and anomaly detection. As practitioners and researchers around the world apply and adapt the framework, this edited volume brings together these bodies of work, providing a springboard for further research as well as a handbook for application in real-world problems.
  • Understand the theoretical foundations of this important framework, which can provide a reliable measure of confidence with predictions in machine learning
  • Be able to apply this framework to real-world problems in different machine learning settings, including classification, regression, and clustering
  • Learn effective ways of adapting the framework to newer problem settings, such as active learning, model selection, or change detection

Chapter 1

The Basic Conformal Prediction Framework


Vladimir Vovk,    Computer Learning Research Centre, Department of Computer Science, Royal Holloway, University of London, United Kingdom

Abstract


The aim of this chapter is to give a gentle introduction to the method of conformal prediction. It defines conformal predictors and discusses their properties, leaving various extensions of conformal predictors for Chapter 2.

Keywords


Conformal Prediction; Validity; Efficiency; Classification; Regression; Exchangeability

Acknowledgments


I am grateful to Sasha Tsybakov for his advice. The empirical studies described in this chapter used the R language and the R package PredictiveRegression; I am grateful to Ilia Nouretdinov for cowriting (and writing the first version of) the package. This work was supported in part by the Cyprus Research Promotion Foundation (TPE/ORIZO/0609(BIE)/24) and EPSRC (EP/K033344/1).

The aim of this chapter is to give a gentle introduction to the method of conformal prediction. It will define conformal predictors and discuss their properties, leaving various extensions of conformal prediction for Chapter 2.

1.1 The Basic Setting and Assumptions


In the bulk of this chapter we consider the basic setting where we are given a training set of examples, and our goal is to predict a new example. We will assume that the examples are elements of an example space Z (formally, Z is assumed to be a measurable space, i.e., a set equipped with a σ-algebra). We always assume that Z contains more than one element, |Z| > 1, and that each singleton is measurable. The examples in the training set will usually be denoted z1,…,zl and the example to be predicted (the test example) zl+1. Mathematically the training set is a sequence (z1,…,zl), not a set.

The basic setting might look restrictive, but later in this chapter we will see that it covers the standard problems of classification (Section 1.6) and regression (Section 1.7); we will also see that the algorithms developed for our basic setting can be applied in the online (Section 1.8) and batch (Section 2.4) modes of prediction.

We will make two main kinds of assumptions about the way the examples zi, i = 1,…,l+1, are generated. Let us fix the size l ≥ 1 of the training set for now. Under the randomness assumption, the l+1 examples are generated independently from the same unknown probability distribution Q on Z. Under the exchangeability assumption, the sequence (z1,…,zl+1) is generated from a probability distribution on Z^(l+1) that is exchangeable: for any permutation π of the set {1,…,l+1}, the distribution of the permuted sequence (zπ(1),…,zπ(l+1)) is the same as the distribution of the original sequence (z1,…,zl+1). It is clear that the randomness assumption implies the exchangeability assumption, and in Section 1.5 we will see that the exchangeability assumption is much weaker. (On the other hand, in the online mode of prediction the difference between the two assumptions almost disappears, as we will see in Section 1.8.)
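A standard way to see that exchangeability is strictly weaker than randomness is drawing from an urn without replacement: the draws are exchangeable but not independent. The following sketch (an illustration, not taken from the text) checks both properties exhaustively for a three-ball urn:

```python
import itertools
from fractions import Fraction

# Urn with two 0-balls and one 1-ball, drawn without replacement.
# The resulting sequence (z1, z2, z3) is exchangeable but not i.i.d.
urn = [0, 0, 1]

# All draw orders are equally likely.
orders = list(itertools.permutations(urn))

# Exchangeability: permuting the sequence does not change its distribution.
# Here that means the single 1 lands in each position with probability 1/3.
for pos in range(3):
    p = Fraction(sum(seq[pos] for seq in orders), len(orders))
    assert p == Fraction(1, 3)

# Not independent: once the 1 has been drawn first, it cannot appear again.
p_second_one_given_first_one = Fraction(
    sum(1 for seq in orders if seq[0] == 1 and seq[1] == 1),
    sum(1 for seq in orders if seq[0] == 1),
)
assert p_second_one_given_first_one == 0
print("exchangeable but not independent")
```

Under the randomness assumption, by contrast, the conditional probability of drawing a 1 would be unaffected by earlier draws.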

The randomness assumption is a standard assumption in machine learning. Methods of conformal prediction, however, usually work for the weaker exchangeability assumption. In some important cases even the exchangeability assumption can be weakened; see, for example, Chapters 8 and 9 of [365] dealing with online compression modeling.

1.2 Set and Confidence Predictors


In this book we are concerned with reliable machine learning, and so we consider prediction algorithms that output a set of elements of Z as their prediction; such a set is called a prediction set (or a set prediction). The statement implicit in a prediction set is that it contains the test example zl+1, and the prediction set is regarded as erroneous if and only if it fails to contain zl+1. We will be looking for a compromise between the reliability and the informativeness of the prediction sets output by our algorithms; an example of a prediction set we try to avoid is the whole of Z: it is absolutely reliable but not informative.

A set predictor is a function Γ that maps any sequence (z1,…,zl) ∈ Z^l to a set Γ(z1,…,zl) ⊆ Z and satisfies the following measurability condition: the set

{(z1,…,zl+1) | zl+1 ∈ Γ(z1,…,zl)}    (1.1)

is measurable in Z^(l+1).

We will often consider nested families of set predictors depending on a parameter ∊ ∈ [0,1], which we call the significance level, reflecting the required reliability of prediction. Our parameterization of reliability will be such that smaller values of ∊ correspond to greater reliability. (This is just a convention: e.g., if we used the confidence level 1 − ∊ as the parameter, larger values of the parameter would correspond to greater reliability.)

Formally, a confidence predictor is a family (Γ^∊ | ∊ ∈ [0,1]) of set predictors that is nested in the following sense: whenever 0 ≤ ∊1 ≤ ∊2 ≤ 1,

Γ^∊1(z1,…,zl) ⊇ Γ^∊2(z1,…,zl).    (1.2)
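As an illustration, here is a toy confidence predictor for real-valued examples (a hypothetical construction, not one from the text): each Γ^∊ is an interval centred at the training mean, with a half-width that grows as ∊ shrinks, so the nesting property (1.2) holds by construction.

```python
import statistics

# Hypothetical confidence predictor for real-valued examples: an interval
# centred at the training mean. A smaller significance level eps (greater
# required reliability) yields a wider, hence more cautious, prediction set.
def gamma(eps, training):
    m = statistics.mean(training)
    s = statistics.pstdev(training)
    half_width = s / eps if eps > 0 else float("inf")
    return (m - half_width, m + half_width)  # prediction set as an interval

train = [1.2, 0.7, 1.9, 1.1, 0.8]
lo1, hi1 = gamma(0.05, train)  # eps1 = 0.05
lo2, hi2 = gamma(0.20, train)  # eps2 = 0.20
# Nesting (1.2): eps1 <= eps2 implies Gamma^eps1 contains Gamma^eps2.
assert lo1 <= lo2 and hi2 <= hi1
```

The particular half-width formula is arbitrary; any family of intervals that widens monotonically as ∊ decreases satisfies (1.2).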

1.2.1 Validity and Efficiency of Set and Confidence Predictors


The two main indicators of the quality of set and confidence predictors are what we call their validity (how reliable they are) and their efficiency (how informative they are). We say that a set predictor Γ is exactly valid at a significance level ∊ ∈ [0,1] if, under any power probability distribution P = Q^(l+1) on Z^(l+1), the probability of the event zl+1 ∉ Γ(z1,…,zl), that is, of Γ making an error, is ∊. However, the property of exact validity is impossible to achieve unless ∊ is either 0 or 1:

Proposition 1.1

At any level ∊ ∈ (0,1), no set predictor is exactly valid.

Proof

Let Q be a probability distribution on Z that is concentrated at one point. Then any set predictor makes a mistake with probability either 0 or 1.

In Section 1.8 we will see that exact validity can be achieved using randomization.

A requirement that can be achieved (even trivially) is that of conservative validity. A set predictor Γ is said to be conservatively valid (or simply valid) at a significance level ∊ ∈ [0,1] if, under any power probability distribution P = Q^(l+1) on Z^(l+1), the probability of zl+1 ∉ Γ(z1,…,zl) does not exceed ∊. The trivial way to achieve this, for any ∊ ∈ [0,1], is to set Γ(z1,…,zl) ≔ Z for all z1,…,zl. A confidence predictor (Γ^∊ | ∊ ∈ [0,1]) is (conservatively) valid if each of its constituent set predictors Γ^∊ is valid at the significance level ∊. Conformal predictors will provide nontrivial conservatively valid (and, in some sense, almost exactly valid) confidence predictors. In the following chapter we will discuss other notions of validity.
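To make conservative validity concrete, the following sketch implements a minimal conformal-style predictor for real-valued examples and checks empirically that its error rate at level ∊ = 0.1 stays close to, and not much above, ∊. The nonconformity score |z − bag mean| is an illustrative choice, not one prescribed by the text:

```python
import random

def p_value(train, z):
    """Conformal p-value of candidate z; nonconformity = |x - bag mean|."""
    bag = train + [z]
    m = sum(bag) / len(bag)
    scores = [abs(x - m) for x in bag]
    # fraction of examples at least as nonconforming as the candidate itself
    return sum(1 for s in scores if s >= scores[-1]) / len(bag)

def conformal_set(train, candidates, eps):
    """Prediction set: all candidates whose p-value exceeds eps."""
    return [z for z in candidates if p_value(train, z) > eps]

random.seed(0)
eps, trials, errors = 0.1, 2000, 0
for _ in range(trials):
    data = [random.gauss(0, 1) for _ in range(20)]
    train, test = data[:-1], data[-1]
    # an error occurs exactly when the test example is excluded from the
    # prediction set, i.e., when its p-value is at most eps
    if p_value(train, test) <= eps:
        errors += 1
print(round(errors / trials, 3))  # empirically close to eps = 0.1
```

The nesting property (1.2) also holds automatically here: increasing ∊ only removes candidates from the prediction set.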

By the efficiency of set and confidence predictors we mean the smallness of the prediction sets they output. This is a vague notion, but in any case it can be meaningful only if we impose some restrictions on the predictors that we consider. Without restrictions, the trivial set predictor Γ(z1,…,zl) ≔ ∅ for all z1,…,zl, and the trivial confidence predictor Γ^∊(z1,…,zl) ≔ ∅ for all z1,…,zl and ∊, are the most efficient ones. We will be looking for the most efficient confidence predictors in the class of valid confidence predictors; different notions of validity (including the "conditional validity" considered in the next chapter) and different formalizations of the notion of efficiency will lead to different solutions to this problem.
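The two trivial predictors mentioned above mark the endpoints of the reliability–informativeness trade-off; a minimal sketch for a binary example space (purely illustrative):

```python
# Two trivial set predictors on the example space Z = {0, 1}.
Z = {0, 1}

def gamma_whole(train):
    """Outputs the whole of Z: conservatively valid at every level, uninformative."""
    return set(Z)

def gamma_empty(train):
    """Outputs the empty set: maximally small, but errs on every test example."""
    return set()

train, test = [0, 1, 1, 0], 1
assert test in gamma_whole(train)      # never makes an error
assert test not in gamma_empty(train)  # always makes an error
```

Restricting attention to valid predictors rules out gamma_empty, and efficiency then asks how much smaller than gamma_whole a valid predictor can be.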

1.3 Conformal Prediction


Let n ∈ N, where N ≔ {1,2,…} is the set of natural numbers. A...

Published (per publisher): 23.4.2014
Language: English
