Linear Genetic Programming - Markus F. Brameier, Wolfgang Banzhaf

Blick ins Buch

Linear Genetic Programming (eBook)

Markus F. Brameier, Wolfgang Banzhaf (Autoren)

eBook Download: PDF

2007
XV, 315 Seiten
Springer US (Verlag)
978-0-387-31030-5 (ISBN)

Lese- und Medienproben

Ebook-Leseprobe (PDF)

Linear Genetic Programming presents a variant of Genetic Programming that evolves imperative computer programs as linear sequences of instructions, in contrast to the more traditional functional expressions or syntax trees. Typical GP phenomena, such as non-effective code, neutral variations, and code growth are investigated from the perspective of linear GP. This book serves as a reference for researchers; it includes sufficient introductory material for students and newcomers to the field.

Wolfgang Banzhaf is a professor of Computer Science at the Department of Computer Science of Memorial University of Newfoundland, Canada, and head of the department since 2003. Prior to that, he served for 10 years as Associate Professor for Applied Computer Science in the Department of Computer Science at University of Dortmund, Germany. From 1989 to 1993 he was a researcher with Mitsubishi Electric Corp., first in MELCO's Central Research Lab in Japan, then in the United States at Mitsubishi Electric Research Labs Inc., Cambridge, MA. Between 1985 and 1989 he was a postdoc in the Department of Physics, University of Stuttgart, Germany. He holds a PhD in Physics from the University of Karlruhe in Germany. His research interests are in the field of artificial evolution and self-organization studies. He has recently become more involved with bioinformatics.

Linear Genetic Programming presents a variant of genetic programming (GP) that evolves imperative computer programs as linear sequences of instructions, in contrast to the more traditional functional expressions or syntax trees. Primary characteristics of linear program structure are exploited to achieve acceleration of both execution time and evolutionary progress. Online analysis and optimization of program code lead to more efficient techniques and contribute to a better understanding of the method and its parameters. In particular, the reduction of structural variation step size and non-effective variations play a key role in finding higher quality and less complex solutions. Typical GP phenomena, such as non-effective code, neutral variations, and code growth are investigated from the perspective of linear GP.This book serves as a reference for researchers; it also contains sufficient introductory material for students and those who are new to the field.

Markus Brameier received a PhD degree in Computer Science from the Department of Computer Science at University of Dortmund, Germany,in 2004. From 2003 to 2004 he was a postdoctoral fellow at the Stockholm Bioinformatics Center (SBC), a collaboration between Stockholm University, the Royal Institute of Technology, and Karolinska Institute, in Sweden. Currently he is Assistant Professor at the Bioinformatics Research Center (BiRC) of the University of Aarhus in Denmark. His primary research interests are in bioinformatics and genetic programming. Wolfgang Banzhaf is a professor of Computer Science at the Department of Computer Science of Memorial University of Newfoundland, Canada, and head of the department since 2003. Prior to that, he served for 10 years as Associate Professor for Applied Computer Science in the Department of Computer Science at University of Dortmund, Germany. From 1989 to 1993 he was a researcher with Mitsubishi Electric Corp., first in MELCO’s Central Research Lab in Japan, then in the United States at Mitsubishi Electric Research Labs Inc., Cambridge, MA. Between 1985 and 1989 he was a postdoc in the Department of Physics, University of Stuttgart, Germany. He holds a PhD in Physics from the University of Karlruhe in Germany. His research interests are in the field of artificial evolution and self-organization studies. He has recently become more involved with bioinformatics.

Contents 8
Preface 12
About the Authors 15
INTRODUCTION 16
1.1 Evolutionary Algorithms 16
1.2 Genetic Programming 18
1.3 Linear Genetic Programming 21
1.4 Motivation 23
PART I FUNDAMENTAL ANALYSIS 26
BASIC CONCEPTS OF LINEAR GENETIC PROGRAMMING 28
2.1 Representation of Programs 28
2.1.1 Coding of Instructions 30
2.1.2 Instruction Set 31
2.1.3 Branching Concepts 33
2.1.4 Advanced Branching Concepts 35
2.1.5 Iteration Concepts 37
2.1.6 Modularization Concepts 38
2.2 Execution of Programs 40
2.2.1 Runtime Comparison 41
2.2.2 Translation 43
2.3 Evolution of Programs 44
2.3.1 Initialization 46
2.3.2 Selection 47
2.3.3 Reproduction 48
2.3.4 Variation 48
CHARACTERISTICS OF THE LINEAR REPRESENTATION 50
3.1 E.ective Code and None.ective Code 50
3.2 Structural Introns and Semantic Introns 52
3.2.1 Detecting and Removing Structural Introns 53
3.2.2 Avoiding Semantic Introns 55
3.2.3 Detecting Semantic Introns 59
3.2.4 Symbolic Simpli.cation 61
3.3 Graph Interpretation 62
3.3.1 Variation E.ects 66
3.3.2 Interpretation of Branches 67
3.3.3 Evaluation Order 69
3.3.4 Tree Interpretation 70
3.4 Analysis of Program Structure 71
3.5 Graph Evolution 75
3.6 Summary and Conclusion 76
A COMPARISON WITH NEURAL NETWORKS 78
4.1 Medical Data Mining 78
4.2 Benchmark Data sets 79
4.3 Experimental Setup 4.3.1 Genetic Programming 80
4.3.2 Population Structure 82
4.3.3 Neural Networks 83
4.4 Experiments and Comparison 4.4.1 Generalization Performance 84
4.4.2 E.ective Training Time 86
4.4.3 Acceleration of Absolute Runtime 86
4.4.4 Acceleration of E.ective Training Time 87
4.4.5 Further Comparison 89
4.5 Summary and Conclusion 89
PART II METHOD DESIGN 90
LINEAR GENETIC OPERATORS I – SEGMENT VARIATIONS 92
5.1 Variation E.ects 93
5.1.1 Semantic Variation E.ects 93
5.1.2 Structural Variation E.ects 94
5.2 E.ective Variation and Evaluation 94
5.3 Variation Step Size 95
5.4 Causality 97
5.4.1 Self-Adaptation 100
5.5 Selection of Variation Points 101
5.6 Characteristics of Variation Operators 102
5.7 Segment Variation Operators 104
5.7.1 Linear Crossover 104
5.7.2 One-Point Crossover 107
5.7.3 One-Segment Recombination 107
5.7.4 E.ective Recombination 109
5.7.5 Segment Mutations 110
5.7.6 Explicit Introns 111
5.7.7 Building Block or Macro Mutation? 112
5.8 Experimental Setup 5.8.1 Benchmark Problems 114
5.8.2 Parameter Settings 115
5.9 Experiments 117
5.9.1 Comparison of Recombination Operators 117
5.9.2 Comparison with Segment Mutations 121
5.9.3 Crossover Rate 123
5.9.4 Analysis of Crossover Parameters 125
5.9.5 Explicit Introns 129
5.10 Summary and Conclusion 133
LINEAR GENETIC OPERATORS II – INSTRUCTION MUTATIONS 134
6.1 Minimum Mutation Step Size 134
6.2 Instruction Mutation Operators 136
6.2.1 Macro Mutations 137
6.2.2 Micro Mutations 138
6.2.3 E.ective Mutations 139
6.2.4 Minimum E.ective Mutations 141
6.2.5 Free Mutations 142
6.2.6 Explicit Induction of Neutral Mutations 142
6.3 Experimental Setup 6.3.1 Benchmark Problems 144
6.3.2 Parameter Settings 145
6.4 Experiments 146
6.4.1 Comparison of Instruction Mutations 146
6.4.2 Comparison with Segment Variations 153
6.4.3 Explicit Growth Bias 154
6.4.4 Number of Mutation Points 156
6.4.5 Self-Adaptation 159
6.4.6 Distribution of Mutation Points 160
6.5 Summary and Conclusion 163
ANALYSIS OF CONTROL PARAMETERS 164
7.1 Number of Registers 164
7.1.1 Initialization of Registers 168
7.1.2 Constant Registers 171
7.2 Number of Output Registers 171
7.3 Rate of Constants 172
7.4 Population Size 174
7.5 Maximum Program Length 177
7.6 Initialization of Linear Programs 179
7.7 Constant Program Length 184
7.8 Summary and Conclusion 185
A COMPARISON WITH TREE-BASED GENETIC PROGRAMMING 188
8.1 Tree-Based Genetic Programming 188
8.1.1 Tree Genetic Operators 189
8.1.2 Initialization of Tree Programs 191
8.2 Benchmark Problems 192
8.2.1 GP Benchmarks ( 192
8.2.2 Bioinformatics Problems ( 193
8.2.3 Generalization Data 195
8.3 Experimental Setup 196
8.3.1 A Multi-Representation GP System 196
8.3.2 Complexity of Programs 197
8.3.3 Parameter Settings 198
8.4 Experiments and Comparison 8.4.1 Prediction Quality and Complexity 200
8.4.2 Generalization Ability 203
8.5 Discussion 205
8.6 Summary and Conclusion 206
PART III ADVANCED TECHNIQUES AND PHENOMENA 209
CONTROL OF DIVERSITY AND VARIATION STEP SIZE 210
9.1 Introduction 210
9.2 Structural Program Distance 9.2.1 E . ective Edit Distance 212
9.2.2 Alternative Distance Metrics 214
9.3 Semantic Program Distance 215
9.4 Control of Diversity 216
9.5 Control of Variation Step Size 218
9.6 Experimental Setup 220
9.7 Experiments 9.7.1 Distance Distribution and Correlation 221
9.7.2 Development of E.ective Step Size 225
9.7.3 Structural Diversity Selection 229
9.7.4 Development of E.ective Diversity 230
9.7.5 Semantic Diversity Selection 232
9.7.6 Diversity and Fitness Progress 233
9.7.7 Control of E.ective Mutation Step Size 235
9.8 Alternative Selection Criteria 237
9.9 Summary and Conclusion 238
CODE GROWTH AND NEUTRAL VARIATIONS 240
10.1 Code Growth in GP 241
10.2 Proposed Causes of Code Growth 242
10.2.1 Protection Theory 242
10.2.2 Drift Theory 243
10.2.3 Bias Theory 243
10.3 In.uence of Variation Step Size 244
10.4 Neutral Variations 245
10.5 Conditional Reproduction and Variation 247
10.6 Experimental Setup 248
10.7 Experiments 10.7.1 Conditional Instruction Mutations 248
10.7.2 E.ective Reproduction 252
10.7.3 Conditional Segment Variations 253
10.7.4 Semantic Diversity 255
10.7.5 Neutral Drift 257
10.7.6 Crossover Step Size 259
10.7.7 Implicit Bias: Linear Crossover 260
10.7.8 Implicit Bias: E.ective Instruction Mutation 263
10.8 Control of Code Growth 264
10.8.1 Variation-Based Control 264
10.8.2 Why Mutations Cause Less Bloat 268
10.8.3 Selection-Based Control 270
10.8.4 E.ective Complexity Selection 271
10.9 Summary and Conclusion 274
EVOLUTION OF PROGRAM TEAMS 276
11.1 Introduction 276
11.2 Team Evolution 277
11.2.1 Team Representation 278
11.2.2 Team Operators 279
11.3 Combination of Multiple Predictors 280
11.3.1 Making Multiple Decisions Di.er 281
11.3.2 Combination Methods 282
11.3.3 Averaging (AV) 284
11.3.4 Weighting by Error (ERR) 284
11.3.5 Coevolution of Weights (EVOL) 285
11.3.6 Majority Voting (MV) 285
11.3.7 Weighted Voting (WV) 286
11.3.8 Winner-Takes-All (WTA) 286
11.3.9 Weight Optimization (OPT) 287
11.4 Experimental Setup 11.4.1 Benchmark Problems 288
11.4.2 Team and Member Fitness 290
11.4.3 Parameter Settings 290
11.5 Experiments 291
11.5.1 Prediction Accuracy 291
11.5.2 Code Size 295
11.5.3 Number of Team Members 297
11.5.4 Number of Varied Members 299
11.5.5 Interpositional Recombination 300
11.5.6 Member Fitness 300
11.6 Combination of Multiple Program Outputs 301
11.7 Summary and Conclusion 302
Epilogue 304
References 306
Index 318

2.1.5 Iteration Concepts (p.22)
Iteration of code by loops plays a rather unimportant role in genetic programming. Most GP applications that require loops involve control problems with the combination of primitive actions of an agent being the object of evolution. Data flow is usually not necessary in such programs.

Instead, each instruction performs actions with side effects on the problem environment and .tness is derived from a reinforcement signal. For the problem classes we focus on here, supervised classiffcation and approximation, iteration is of minor importance. That is not to say that a reuse of code by iterations could not result in more compact and elegant solutions.

In functional programming the concept of loops is unknown. The implicit iteration concept in functional programs denotes recursions which are, however, hard to control in tree-based genetic programming [142]. Otherwise, iterated evaluations of a subtree can have an effect only if functions produce side effects. In linear GP, assignments represent an implicit side effect on memory locations as part of the imperative representation. Nevertheless, the iteration of an instruction segment may only be effective if it includes at least one effective instruction and if at least one register acts as both destination register and source register in the same or a combination of (effective) instructions, e.g., r0 := r0 + 1.

In the following, possible iteration concepts for linear GP will be presented. These comprise conditional loops and loops with a limited number of iterations. One form of iteration in linear programs is a conditional backward jump corresponding to a while loop in C. The problem with this concept is that infinite loops can be easily formed by conditions that are always fulfilled.

In general, it is not possible to detect all in.nite loops in programs, due to the halting problem [36]. A solution to remedy this situation is to terminate a genetic program after a maximal number of instructions. The result of the program would then, however, depend on the execution time allowed.

The more recommended option is a loop concept that limits the number of iterations in each loop. This requires an additional control flow parameter which may either be constant or be varied within loop instructions. Such a construct is usually expressed by a for loop in C. Because only overlapping loops (not nested loops) need to be avoided, an appropriate choice to limit the size of loop blocks may be the coevolution of endfor instructions.

Analogous to the interpretation of branches in Section 2.1.4, a for instruction and a succeeding endfor de.ne a loop block provided that only closed loops lie in between. All other loop instructions are not interpreted.

2.1.6 Modularization Concepts

For certain problems modularization may be advantageous in GP. By using subroutines repeatedly within programs, solutions may become more compact and the same limited program space can be used more efficiently. A problem may also be decomposed into simpler subproblems that can be solved more efficiently in local submodules. In this case, a combination of subsolutions may result in a simpler and better overall solution.

Erscheint lt. Verlag	25.2.2007
Reihe/Serie	Genetic and Evolutionary Computation
Reihe/Serie	Genetic and Evolutionary Computation
Zusatzinfo	XVI, 316 p.
Verlagsort	New York
Sprache	englisch
Themenwelt	Informatik ► Theorie / Studium ► Künstliche Intelligenz / Robotik
Schlagworte	algorithms • code growth • Diversity control • evolutionary algorithm • Genetic algorithms • Genetic operators • genetic programming • learning • Linear genetic programming • machine learning • neutral variations • Optimization • programming • Step Size Control • Syntax
ISBN-10	0-387-31030-4 / 0387310304
ISBN-13	978-0-387-31030-5 / 9780387310305

Informationen gemäß Produktsicherheitsverordnung (GPSR)
Haben Sie eine Frage zum Produkt?

PDF (Wasserzeichen)

DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasserzeichen und ist damit für Sie personalisiert. Bei einer missbräuchlichen Weitergabe des eBooks an Dritte ist eine Rückverfolgung an die Quelle möglich.

Dateiformat: PDF (Portable Document Format)
Mit einem festen Seitenlayout eignet sich die PDF besonders für Fachbücher mit Spalten, Tabellen und Abbildungen. Eine PDF kann auf fast allen Geräten angezeigt werden, ist aber für kleine Displays (Smartphone, eReader) nur eingeschränkt geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.