Mathematics for Digital Science 3 (eBook)
444 pages
Wiley-Iste (publisher)
978-1-394-38853-0 (ISBN)
Over the past century, advancements in computer science have consistently resulted from extensive mathematical work. Even today, innovations in the digital domain continue to be grounded in a strong mathematical foundation. To succeed in this profession, both today's students and tomorrow's computer engineers need a solid mathematical background.
The goal of this book series is to offer a solid foundation of the knowledge essential to working in the digital sector. Across three volumes, it explores fundamental principles, digital information, data analysis, and optimization. Whether the reader is pursuing initial training or looking to deepen their expertise, the Mathematics for Digital Science series revisits familiar concepts, helping them refresh and expand their knowledge while also introducing equally essential, newer topics.
Gérard-Michel Cochard is Professor Emeritus at Université de Picardie Jules Verne, France, where he has held various senior positions. He has also served at the French Ministry of Education and the CNAM (Conservatoire National des Arts et Métiers). His research is conducted at the Eco-PRocédés, Optimisation et Aide à la Décision (EPROAD) laboratory, France.
Mhand Hifi is Professor of Computer Science at Université de Picardie Jules Verne, France, where he heads the EPROAD UR 4669 laboratory and manages the ROD team. As an expert in operations research and NP-hard problem-solving, he actively contributes to numerous international conferences and journals in the field.
1. Linear Modeling for Two-Dimensional Data
CONCEPTS COVERED IN THIS CHAPTER. –
This brief chapter serves as a reminder of the concepts presented in detail in Volume 1. It primarily provides an overview of basic statistical analysis tools, particularly linear regression and correlation for two-dimensional data.
References: [SAP 11].
1.1. Basic statistics
Consider a population of n elements. Each element i is characterized by the value $x_i$ of a variable x. The n values $x_i$ constitute a one-dimensional statistical series, whose characteristics are:
- The average $\bar{x}$ is defined by:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
In this definition, it is assumed that all elements have the same statistical weight. If the weights are not equal, the following expression is used:
$$\bar{x} = \frac{\sum_{i=1}^{n} p_i x_i}{\sum_{i=1}^{n} p_i}$$
where $p_i$ represents the statistical weight of individual i.
- The variance v(x) is defined as the average of the squares of the deviations from the average:
$$v(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Huygens’ theorem provides another method for calculating the variance:
$$v(x) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2$$
This relationship is often summarized as “the average of squares minus the square of the mean”.
- The standard deviation σ(x) is defined as the square root of the variance:
$$\sigma(x) = \sqrt{v(x)}$$
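The definitions above translate directly into code. The following Python sketch assumes the series is a plain list of numbers (plus, for the weighted average, a parallel list of weights $p_i$); the function names are illustrative and do not come from the book. The variance is computed both from the definition and via Huygens' theorem so that the two can be compared.

```python
import math


def mean(xs):
    """Average with equal statistical weights: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ps):
    """Weighted average: sum of p_i * x_i divided by sum of p_i."""
    return sum(p * x for x, p in zip(xs, ps)) / sum(ps)


def variance(xs):
    """Variance from the definition: average of squared deviations from the average."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_huygens(xs):
    """Huygens' theorem: average of the squares minus the square of the average."""
    return mean([x * x for x in xs]) - mean(xs) ** 2


def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical series, not from the book
    print(mean(data), variance(data), variance_huygens(data), std_dev(data))
    # prints 5.0 4.0 4.0 2.0
```

Both variance functions agree on exact data, but in floating point the Huygens form can lose precision when the values are large compared with their spread, so the definition-based version is the safer default. As a quick check against Example 1.1, `math.sqrt(1284)` is approximately 35.83, matching the reported σ(x).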
EXAMPLE 1.1.–
Consider the statistical series shown in Figure 1.1, which represents the number of rainy days over 10 consecutive years at a given location.
Figure 1.1. Statistical series
The average $\bar{x}$ can easily be calculated by assigning equal statistical weight to each measurement. The average of the squares $\overline{x^2}$, the variance v(x) = 1284 and the standard deviation σ(x) = 35.83 are also determined.
Figure 1.2 shows the graphical representation of the statistical series in the form of a histogram. This histogram illustrates the distribution of data regarding the number of rainy days over the 10 years.
The average is a measure of the position of the statistical series along the axis of the number of days, while the standard deviation is a dispersion parameter that indicates how widely the values of the series are spread.
Figure 1.2. Graphical representation of the statistical series
1.2. Linear adjustment
Now, consider a two-dimensional statistical series, where each element is characterized by the values of two variables, x and y. For each variable, various statistical measures can be calculated, such as the average, variance and standard deviation.
To graphically represent this two-dimensional series, a two-dimensional Cartesian coordinate system is used. The x-axis represents the variable x, and the y-axis represents the variable y. Each element i of the series is represented as a point $(x_i, y_i)$ in this coordinate system, where the coordinate $x_i$ corresponds to the value of the variable x, and the coordinate $y_i$ corresponds to the value of the variable y. Figure 1.3 shows examples of graphical representations of two two-dimensional series.
Figure 1.3. Example of two two-dimensional series
When observing a two-dimensional series and detecting a certain structure in the set of representative points, we may be inclined to model this structure using a curve. This involves finding a mathematical function that best describes the relationships between the variables x and y. In the examples shown in Figure 1.3, a straight line can be proposed for modeling the first example, and a parabola for the second example, as shown in Figure 1.4. These models are adjustments that simplify the representation of trends or relationships observed in the data.
Figure 1.4. Examples of adjustments
The linear adjustment is the simplest of all analytical adjustments. It involves obtaining the equation of the straight line that “best fits” the set of representative points of the series.
A classic method for obtaining the equation of the line in linear adjustment is the least squares method, which minimizes the sum of the squares of the deviations between the observed values and the values predicted by the line. For the variables x and y, the respective means, denoted by $\bar{x}$ and $\bar{y}$, are calculated assuming equal statistical weight for each value of i:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Next, the deviations from these averages are calculated for each point in the series (it is convenient to work with these “centered” coordinates):
$$X_i = x_i - \bar{x}, \qquad Y_i = y_i - \bar{y}$$
It is easy to verify that:
$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} x_i - n\bar{x} = 0, \qquad \sum_{i=1}^{n} Y_i = 0$$
The squares of these deviations are:
$$X_i^2 = (x_i - \bar{x})^2, \qquad Y_i^2 = (y_i - \bar{y})^2$$
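Centering is a one-line transformation. The sketch below is illustrative only: the helper name `center` and the data values are hypothetical, not taken from the book. It shows that the centered deviations sum to zero, up to floating-point rounding.

```python
def center(values):
    """Return the deviations of each value from the average (centered coordinates)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical x values
    ys = [3.0, 5.0, 4.0, 10.0]  # hypothetical y values
    X = center(xs)
    Y = center(ys)
    # The centered deviations sum to zero (up to floating-point rounding).
    print(sum(X), sum(Y))
    # Squared deviations, which feed the least-squares sums.
    print([dx * dx for dx in X], [dy * dy for dy in Y])
```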
The least squares method involves finding the coefficients a and b of the equation of the line y = ax + b. Alternatively, using the centered coordinates, the equation becomes Y′ = AX + B, where, for each representative point, $Y'_i = AX_i + B$ is the value predicted by the line. The relationship between (A, B) and (a, b) is:
$$a = A, \qquad b = B + \bar{y} - A\bar{x}$$
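The goal is to minimize the sum of the squared deviations between the observed values $Y_i$ and the predicted values $Y'_i$. Mathematically, this involves minimizing the following objective function:
$$M = \sum_{i=1}^{n} \left(Y_i - Y'_i\right)^2$$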
In other words, the aim is to minimize the following quantity:
$$M = \sum_{i=1}^{n} \left(Y_i - AX_i - B\right)^2$$
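The minimum of M corresponds to the vanishing of the first partial derivatives with respect to A and B, the only unknowns in M:
$$\frac{\partial M}{\partial A} = -2\sum_{i=1}^{n} X_i\left(Y_i - AX_i - B\right) = 0, \qquad \frac{\partial M}{\partial B} = -2\sum_{i=1}^{n} \left(Y_i - AX_i - B\right) = 0$$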
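which leads to:
$$\sum_{i=1}^{n} X_iY_i - A\sum_{i=1}^{n} X_i^2 - B\sum_{i=1}^{n} X_i = 0, \qquad \sum_{i=1}^{n} Y_i - A\sum_{i=1}^{n} X_i - nB = 0$$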
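Since $\sum X_i = \sum Y_i = 0$, these conditions lead to the following equations:
$$B = 0, \qquad A = \frac{\sum_{i=1}^{n} X_iY_i}{\sum_{i=1}^{n} X_i^2}, \qquad \text{hence} \quad a = A, \quad b = \bar{y} - a\bar{x}$$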
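The least squares result can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the function name `least_squares_line` and the sample data are hypothetical. It works in centered coordinates exactly as derived: B = 0, A = ΣX_iY_i / ΣX_i², then a = A and b = ȳ − a·x̄.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by least squares, working in centered coordinates.

    With X_i = x_i - x_bar and Y_i = y_i - y_bar, the optimum is B = 0 and
    A = sum(X_i * Y_i) / sum(X_i ** 2); then a = A and b = y_bar - a * x_bar.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    sxx = sum(dx * dx for dx in X)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum(dx * dy for dx, dy in zip(X, Y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical data, not Example 1.2
    ys = [3.0, 5.0, 4.0, 10.0]
    a, b = least_squares_line(xs, ys)
    print(f"y = {a:.4f} x + {b:.4f}")  # prints y = 1.0476 x + 1.8333
```

Because b = ȳ − a·x̄, the fitted line always passes through the mean point (x̄, ȳ).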
EXAMPLE 1.2.–
Let us consider the statistical series showing the number of rainy days (x) and umbrella sales in local currency (y) (see Figure 1.5).
Figure 1.5. Statistical series (x, y)
Figure 1.6. Detailed adjustment calculations
Figure 1.7. Adjustment line.
Figure 1.6 summarizes the calculations required to determine the best-fit adjustment line, which yield a = 1311.53 and b = 8831.78. Figure 1.7 displays the best-fit adjustment line.
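Once a and b are known, the adjustment line gives an estimate of y for a given x. The sketch below simply plugs in the coefficients reported for Example 1.2; the input of 100 rainy days is a hypothetical value chosen for illustration, not a data point from Figure 1.5.

```python
# Coefficients of the adjustment line reported in Example 1.2.
A_SLOPE = 1311.53
B_INTERCEPT = 8831.78


def predicted_sales(rainy_days: float) -> float:
    """Umbrella sales (local currency) predicted by y = a*x + b."""
    return A_SLOPE * rainy_days + B_INTERCEPT


if __name__ == "__main__":
    print(round(predicted_sales(100), 2))  # hypothetical input: 100 rainy days
```

Such estimates are only meaningful within the range of x values actually observed; the fit says nothing about behavior far outside that range.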
1.3. Linear correlation
Figure 1.8. Different correlation situations
In the case of adjustment, the goal is to express y as a function of x. This choice is arbitrary: x could equally be expressed as a function of y. Two adjustment lines are then obtained, both intersecting at the mean point $(\bar{x}, \bar{y})$.
By treating the variables x and y symmetrically, the concept of correlation between these variables can be introduced. Correlation measures the relationship between two variables and quantifies the possible influence of one on the other. Figure 1.8 presents various examples of scatter plots to illustrate different correlation situations.
In the case of linear correlation in particular, when the two best-fit adjustment lines y = f(x) and x = f′(y) coincide, the linear correlation between the variables x and y is maximal.
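Treating x and y symmetrically means fitting two least squares lines: y as a function of x, and x as a function of y. A minimal Python sketch, with a hypothetical function name and data, computes both and confirms that each passes through the mean point; for exactly collinear points the product of the two slopes equals 1, which is the case where the lines coincide.

```python
def regression_lines(xs, ys):
    """Return (a, b) for y = a*x + b and (a2, b2) for x = a2*y + b2.

    Both are least-squares fits; the second simply swaps the roles of x and y.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    sxy = sum(dx * dy for dx, dy in zip(X, Y))
    sxx = sum(dx * dx for dx in X)
    syy = sum(dy * dy for dy in Y)
    a = sxy / sxx    # slope of y as a function of x
    a2 = sxy / syy   # slope of x as a function of y (a' in the text)
    return (a, y_bar - a * x_bar), (a2, x_bar - a2 * y_bar)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
    ys = [3.0, 5.0, 4.0, 10.0]
    (a, b), (a2, b2) = regression_lines(xs, ys)
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    # Both lines pass through the mean point (x_bar, y_bar).
    print(a * x_bar + b, y_bar)
    print(a2 * y_bar + b2, x_bar)
    print(a * a2)  # strictly less than 1 here because the points are not collinear
```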
EXAMPLE 1.3.–
For the series in Example 1.2, the following two best-fit adjustment lines are obtained:
Figure 1.9 shows that the two straight lines are very close to each other, indicating a strong correlation between the variables.
Figure 1.9. Adjustment lines.
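The two best-fit adjustment lines, y = ax + b and x = a′y + b′, have direction coefficients (slopes) a and a′. Since both lines pass through the mean point $(\bar{x}, \bar{y})$, they coincide if and only if a × a′ = 1. Now, applying the least squares method to each line:
$$a = \frac{\sum_{i=1}^{n} X_iY_i}{\sum_{i=1}^{n} X_i^2}, \qquad a' = \frac{\sum_{i=1}^{n} X_iY_i}{\sum_{i=1}^{n} Y_i^2}, \qquad a \times a' = \frac{\left(\sum_{i=1}^{n} X_iY_i\right)^2}{\sum_{i=1}^{n} X_i^2 \sum_{i=1}^{n} Y_i^2}$$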
The maximum correlation corresponds to the following equality (the equality case of the Cauchy-Schwarz inequality):
$$\left(\sum_{i=1}^{n} X_iY_i\right)^2 = \sum_{i=1}^{n} X_i^2 \sum_{i=1}^{n} Y_i^2$$
The linear correlation coefficient r is defined analytically by:
$$r = \frac{\sum_{i=1}^{n} X_iY_i}{\sqrt{\sum_{i=1}^{n} X_i^2 \sum_{i=1}^{n} Y_i^2}}$$
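which is simply related to the slopes of the two adjustment lines (r has the same sign as a):
$$r^2 = a \times a'$$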
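The correlation coefficient only needs the three centered sums ΣX_iY_i, ΣX_i² and ΣY_i². The Python sketch below uses hypothetical helper names and data; it computes r and checks numerically that r² equals the product of the two slopes a = ΣX_iY_i / ΣX_i² and a′ = ΣX_iY_i / ΣY_i².

```python
import math


def centered_sums(xs, ys):
    """Return sum(X*Y), sum(X^2), sum(Y^2) for the centered coordinates."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    sxy = sum(dx * dy for dx, dy in zip(X, Y))
    sxx = sum(dx * dx for dx in X)
    syy = sum(dy * dy for dy in Y)
    return sxy, sxx, syy


def correlation(xs, ys):
    """Linear correlation coefficient r = sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    sxy, sxx, syy = centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
    ys = [3.0, 5.0, 4.0, 10.0]
    sxy, sxx, syy = centered_sums(xs, ys)
    a = sxy / sxx        # slope of y = a x + b
    a_prime = sxy / syy  # slope of x = a' y + b'
    r = correlation(xs, ys)
    print(r, r * r, a * a_prime)  # r squared equals the product of the slopes
```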
EXAMPLE 1.3 (CONTINUED).–
Let us return to Example 1.3. The equations of the adjustment lines are:
The linear correlation coefficient is r = 0.98, which is close to 1. This indicates an almost maximal linear correlation, i.e. a strong linear relationship between the variables x and y.
The linear correlation coefficient r is often written in another form, using the standard deviations σ(x) and σ(y):
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} X_iY_i}{\sigma(x)\,\sigma(y)}$$
Furthermore, the covariance cov(x, y) is defined by:
$$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
It follows that:
$$r = \frac{\mathrm{cov}(x, y)}{\sigma(x)\,\sigma(y)}$$
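The covariance form of r can be computed directly with equal weights, dividing by n as in section 1.1. The function names in this Python sketch are illustrative assumptions, and the data are hypothetical.

```python
import math


def covariance(xs, ys):
    """Covariance with equal weights: (1/n) * sum((x_i - x_bar) * (y_i - y_bar))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def std_dev(vs):
    """Standard deviation with equal weights (divides by n, as in section 1.1)."""
    n = len(vs)
    m = sum(vs) / n
    return math.sqrt(sum((v - m) ** 2 for v in vs) / n)


def correlation(xs, ys):
    """r = cov(x, y) / (sigma(x) * sigma(y))."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
    ys = [3.0, 5.0, 4.0, 10.0]
    print(covariance(xs, ys), correlation(xs, ys))
```

The 1/n factors cancel in the ratio, so using the n − 1 (sample) versions of the covariance and standard deviations would give the same r. On Python 3.10 or later, `statistics.correlation` can serve as an independent cross-check.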
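In the case of linear fitting, expanding M and using $\sum X_i = \sum Y_i = 0$ gives:
$$M = \sum_{i=1}^{n} Y_i^2 - 2A\sum_{i=1}^{n} X_iY_i + A^2\sum_{i=1}^{n} X_i^2 + nB^2$$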
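The minimum is found by replacing A and B with the values obtained ($A = \sum X_iY_i / \sum X_i^2$ and $B = 0$):
$$M_{\min} = \sum_{i=1}^{n} Y_i^2 - \frac{\left(\sum_{i=1}^{n} X_iY_i\right)^2}{\sum_{i=1}^{n} X_i^2} = \left(1 - r^2\right)\sum_{i=1}^{n} Y_i^2$$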
By definition, M, and therefore $M_{\min}$, is non-negative. This leads to the Cauchy-Schwarz inequality:
$$\left(\sum_{i=1}^{n} X_iY_i\right)^2 \le \sum_{i=1}^{n} X_i^2 \sum_{i=1}^{n} Y_i^2$$
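The minimum residual can be checked numerically. In this Python sketch (hypothetical function name and data), M is evaluated directly at the least squares optimum and compared with the closed form ΣY_i² − (ΣX_iY_i)² / ΣX_i²; its non-negativity is what yields the Cauchy-Schwarz inequality.

```python
def residual_check(xs, ys):
    """Compare M at the least-squares optimum with sum(Y^2) - sum(XY)^2 / sum(X^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    sxx = sum(dx * dx for dx in X)
    syy = sum(dy * dy for dy in Y)
    sxy = sum(dx * dy for dx, dy in zip(X, Y))
    A = sxy / sxx  # optimal slope; the optimal B is 0 in centered coordinates
    m_direct = sum((dy - A * dx) ** 2 for dx, dy in zip(X, Y))
    m_formula = syy - sxy ** 2 / sxx
    return m_direct, m_formula, sxy, sxx, syy


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 7.0]   # hypothetical data
    ys = [3.0, 5.0, 4.0, 10.0]
    m_direct, m_formula, sxy, sxx, syy = residual_check(xs, ys)
    print(m_direct, m_formula)    # equal up to rounding
    print(sxy ** 2 <= sxx * syy)  # Cauchy-Schwarz inequality
```

For exactly collinear data the closed form can come out as a tiny negative number because of floating-point rounding, so a tolerance is advisable when testing for equality with zero.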
Figure 1.10. Variations in the linear correlation coefficient
This inequality implies that the linear correlation coefficient lies in the range $-1 \le r \le 1$, bounds included. Figure 1.10 shows such a correlation scale, where different ranges of r values are...
| Publication date (per publisher) | 19.6.2025 |
|---|---|
| Series | ISTE Invoiced |
| Language | English |
| Subject | Mathematics / Computer Science ► Computer Science |
| Keywords | Computer Science • Data Analysis • digital information • Digital Science • Mathematics • Optimization |
| ISBN-10 | 1-394-38853-5 / 1394388535 |
| ISBN-13 | 978-1-394-38853-0 / 9781394388530 |
Copy protection: Adobe DRM
File format: EPUB (Electronic Publication)