High-Performance Computing on Complex Environments (eBook)
502 pages
Wiley (publisher)
978-1-118-71207-8 (ISBN)
• Covers cutting-edge research in HPC on complex environments, arising from an international collaboration of members of the ComplexHPC network (the Open European Network for High-Performance Computing in Complex Environments)
• Explains how to efficiently exploit heterogeneous and hierarchical architectures and distributed systems
• Twenty-three chapters and over 100 illustrations cover domains such as numerical analysis, communication and storage, applications, GPUs and accelerators, and energy efficiency
Emmanuel Jeannot is a Senior Research Scientist at INRIA. He received his PhD in computer science from the École Normale Supérieure de Lyon. His main research interests are process placement, scheduling for heterogeneous environments and grids, data redistribution, and algorithms and models for parallel machines.
Julius Žilinskas is a Principal Researcher and Head of Department at Vilnius University in Vilnius, Lithuania. His research interests include parallel computing, optimization, data analysis, and visualization.
Cover 1
Title Page 5
Contents 9
Contributors 25
Preface 29
European Science Foundation 31
Part I Introduction 33
Chapter 1 Summary of the Open European Network for High-Performance Computing in Complex Environments 35
1.1 Introduction and Vision 36
1.2 Scientific Organization 38
1.2.1 Scientific Focus 38
1.2.2 Working Groups 38
1.3 Activities of the Project 38
1.3.1 Spring Schools 38
1.3.2 International Workshops 39
1.3.3 Working Groups Meetings 39
1.3.4 Management Committee Meetings 39
1.3.5 Short-Term Scientific Missions 39
1.4 Main Outcomes of the Action 39
1.5 Contents of the Book 40
Acknowledgment 42
Part II Numerical Analysis for Heterogeneous and Multicore Systems 43
Chapter 2 On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques 45
2.1 Introduction 46
2.2 General Description of Iterative Methods and Preconditioning 48
2.2.1 Basic Iterative Methods 48
2.2.2 Projection Methods: CG and GMRES 50
2.3 Preconditioning Techniques 52
2.4 Defect-Correction Technique 53
2.5 Multigrid Method 54
2.6 Parallelization of Iterative Methods 54
2.7 Heterogeneous Systems 55
2.7.1 Heterogeneous Computing 56
2.7.2 Algorithm Characteristics and Resource Utilization 57
2.7.3 Exposing Parallelism 58
2.7.4 Heterogeneity in Matrix Computation 58
2.7.5 Setup of Heterogeneous Iterative Solvers 59
2.8 Maintenance and Portability 61
2.9 Conclusion 62
Acknowledgments 63
References 63
Chapter 3 Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers 65
3.1 Introduction 66
3.2 Test Case 67
3.2.1 Governing Equations 67
3.2.2 Solution Procedure 68
3.3 Parallel Implementation 71
3.3.1 Intel PCM Library 71
3.3.2 OpenMP 72
3.4 Results 73
3.4.1 Results of Numerical Integration 73
3.4.2 Parallel Efficiency 74
3.5 Discussion 77
3.6 Conclusion 79
Acknowledgment 79
References 79
Chapter 4 Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience 83
4.1 Introduction 83
4.2 Formulation of the Discrete Model 85
4.2.1 The θ-Implicit Discrete Scheme 87
4.2.2 The Predictor–Corrector Algorithm I 89
4.2.3 The Predictor–Corrector Algorithm II 90
4.3 Parallel Algorithms 91
4.3.1 Parallel θ-Implicit Algorithm 91
4.3.2 Parallel Predictor–Corrector Algorithm I 94
4.3.3 Parallel Predictor–Corrector Algorithm II 95
4.4 Computational Results 95
4.4.1 Experimental Comparison of Predictor–Corrector Algorithms 98
4.4.2 Numerical Experiment of Neuron Excitation 100
4.5 Conclusions 101
Acknowledgments 102
References 102
Part III Communication and Storage Considerations in High-Performance Computing 105
Chapter 5 An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing 107
5.1 Introduction 108
5.2 General Overview 108
5.2.1 A Key to Scalability: Data Locality 109
5.2.2 Data Locality Management in Parallel Programming Models 109
5.2.3 Virtual Topology: Definition and Characteristics 110
5.2.4 Understanding the Hardware 111
5.3 Formalization of the Problem 111
5.4 Algorithmic Strategies for Topology Mapping 113
5.4.1 Greedy Algorithm Variants 113
5.4.2 Graph Partitioning 114
5.4.3 Schemes Based on Graph Similarity 114
5.4.4 Schemes Based on Subgraph Isomorphism 114
5.5 Mapping Enforcement Techniques 114
5.5.1 Resource Binding 115
5.5.2 Rank Reordering 115
5.5.3 Other Techniques 116
5.6 Survey of Solutions 117
5.6.1 Algorithmic Solutions 117
5.6.2 Existing Implementations 117
5.7 Conclusion and Open Problems 121
Acknowledgment 122
References 122
Chapter 6 Optimization of Collective Communication for Heterogeneous HPC Platforms 127
6.1 Introduction 127
6.2 Overview of Optimized Collectives and Topology-Aware Collectives 129
6.3 Optimizations of Collectives on Homogeneous Clusters 130
6.4 Heterogeneous Networks 131
6.4.1 Comparison to Homogeneous Clusters 131
6.5 Topology- and Performance-Aware Collectives 132
6.6 Topology as Input 133
6.7 Performance as Input 134
6.7.1 Homogeneous Performance Models 135
6.7.2 Heterogeneous Performance Models 137
6.7.3 Estimation of Parameters of Heterogeneous Performance Models 138
6.7.4 Other Performance Models 138
6.8 Non-MPI Collective Algorithms for Heterogeneous Networks 138
6.8.1 Optimal Solutions with Multiple Spanning Trees 139
6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer 139
6.8.3 Network Models Inspired by BitTorrent 140
6.9 Conclusion 143
Acknowledgments 143
References 143
Chapter 7 Effective Data Access Patterns on Massively Parallel Processors 147
7.1 Introduction 147
7.2 Architectural Details 148
7.3 K-Model 149
7.3.1 The Architecture 149
7.3.2 Cost and Complexity Evaluation 150
7.3.3 Efficiency Evaluation 151
7.4 Parallel Prefix Sum 152
7.4.1 Experiments 157
7.5 Bitonic Sorting Networks 158
7.5.1 Experiments 163
7.6 Final Remarks 164
Acknowledgments 165
References 165
Chapter 8 Scalable Storage I/O Software for Blue Gene Architectures 167
8.1 Introduction 167
8.2 Blue Gene System Overview 168
8.2.1 Blue Gene Architecture 168
8.2.2 Operating System Architecture 168
8.3 Design and Implementation 170
8.3.1 The Client Module 171
8.3.2 The I/O Module 173
8.4 Conclusions and Future Work 174
Acknowledgments 174
References 174
Part IV Efficient Exploitation of Heterogeneous Architectures 177
Chapter 9 Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems 179
9.1 Introduction 180
9.1.1 Application Model 180
9.1.2 System Model 183
9.1.3 Performance Metrics 184
9.2 Concurrent Workflow Scheduling 185
9.2.1 Offline Scheduling of Concurrent Workflows 186
9.2.2 Online Scheduling of Concurrent Workflows 187
9.3 Experimental Results and Discussion 192
9.3.1 DAG Structure 192
9.3.2 Simulated Platforms 192
9.3.3 Results and Discussion 194
9.4 Conclusions 197
Acknowledgments 198
References 198
Chapter 10 Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures 201
10.1 Introduction 201
10.2 Related Works 203
10.3 Reed–Solomon Codes and Linear Algebra Algorithms 204
10.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture 205
10.4.1 Cell/B.E. Architecture 205
10.4.2 Basic Assumptions for Mapping 206
10.4.3 Vectorization Algorithm and Increasing its Efficiency 207
10.4.4 Performance Results 209
10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures 210
10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures 210
10.5.2 Organization of GPU Threads 212
10.6 Methods of Increasing the Algorithm Performance on GPUs 213
10.6.1 Basic Modifications 213
10.6.2 Stream Processing 214
10.6.3 Using Shared Memory 216
10.7 GPU Performance Evaluation 217
10.7.1 Experimental Results 217
10.7.2 Performance Analysis using the Roofline Model 219
10.8 Conclusions and Future Works 222
Acknowledgments 223
References 223
Chapter 11 Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study 225
11.1 Introduction 226
11.2 A Low-Cost Heterogeneous Computing Environment 228
11.2.1 Adopted Computing Environment 231
11.3 First Case Study: The N-Body Problem 232
11.3.1 The Sequential N-Body Algorithm 233
11.3.2 The Parallel N-Body Algorithm for Multicore Architectures 235
11.3.3 The Parallel N-Body Algorithm for CUDA Architectures 236
11.4 Second Case Study: The Convolution Algorithm 238
11.4.1 The Sequential Convolver Algorithm 238
11.4.2 The Parallel Convolver Algorithm for Multicore Architectures 239
11.4.3 The Parallel Convolver Algorithm for GPU Architectures 240
11.5 Conclusions 243
Acknowledgments 244
References 244
Chapter 12 Efficient Application of Hybrid Parallelism in Electromagnetism Problems 247
12.1 Introduction 247
12.2 Computation of Green's functions in Hybrid Systems 248
12.2.1 Computation in a Heterogeneous Cluster 249
12.2.2 Experiments 250
12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique 254
12.3.1 Experiments 254
12.4 Autotuning Parallel Codes 258
12.4.1 Empirical Autotuning 259
12.4.2 Modeling the Linear Algebra Routines 261
12.5 Conclusions and Future Research 262
Acknowledgments 263
References 264
Part V CPU + GPU Coprocessing 267
Chapter 13 Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models 269
13.1 Introduction 270
13.2 Related Work 273
13.3 Data Partitioning Based on Functional Performance Model 275
13.4 Example Application: Heterogeneous Parallel Matrix Multiplication 277
13.5 Performance Measurement on CPUs/GPUs System 279
13.6 Functional Performance Models of Multiple Cores and GPUs 280
13.7 FPM-Based Data Partitioning on CPUs/GPUs System 282
13.8 Efficient Building of Functional Performance Models 283
13.9 FPM-Based Data Partitioning on Hierarchical Platforms 285
13.10 Conclusion 289
Acknowledgments 291
References 291
Chapter 14 Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems 293
14.1 Introduction: Heterogeneous CPU + GPU Systems 294
14.1.1 Open Problems and Specific Contributions 295
14.2 Background and Related Work 297
14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems 297
14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments 300
14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems 301
14.3.1 Multilevel Simultaneous Load Balancing Algorithm 302
14.3.2 Algorithm for Multi-Installment Processing with Multidistributions 305
14.4 Experimental Results 307
14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study 307
14.4.2 AMPMD Evaluation: 2D FFT Case Study 309
14.5 Conclusions 311
Acknowledgments 312
References 312
Chapter 15 The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems 315
15.1 Introduction 315
15.2 Algorithmic Overview 317
15.2.1 Graph Theory Notation 317
15.2.2 Dijkstra's Algorithm 318
15.2.3 Parallel Version of Dijkstra's Algorithm 319
15.3 CUDA Overview 319
15.4 Heterogeneous Systems and Load Balancing 320
15.5 Parallel Solutions to the APSP 321
15.5.1 GPU Implementation 321
15.5.2 Heterogeneous Implementation 322
15.6 Experimental Setup 323
15.6.1 Methodology 323
15.6.2 Target Architectures 324
15.6.3 Input Set Characteristics 324
15.6.4 Load-Balancing Techniques Evaluated 324
15.7 Experimental Results 325
15.7.1 Complete APSP 325
15.7.2 512-Source-Node-to-All Shortest Path 327
15.7.3 Experimental Conclusions 328
15.8 Conclusions 329
Acknowledgments 329
References 329
Part VI Efficient Exploitation of Distributed Systems 333
Chapter 16 Resource Management for HPC on the Cloud 335
16.1 Introduction 335
16.2 On the Type of Applications for HPC and HPC2 337
16.3 HPC on the Cloud 338
16.3.1 General PaaS Solutions 338
16.3.2 On-Demand Platforms for HPC 342
16.4 Scheduling Algorithms for HPC2 343
16.5 Toward an Autonomous Scheduling Framework 344
16.5.1 Autonomous Framework for RMS 345
16.5.2 Self-Management 347
16.5.3 Use Cases 349
16.6 Conclusions 351
Acknowledgment 352
References 352
Chapter 17 Resource Discovery in Large-Scale Grid Systems 355
17.1 Introduction and Background 355
17.1.1 Introduction 355
17.1.2 Resource Discovery in Grids 356
17.1.3 Background 357
17.2 The Semantic Communities Approach 357
17.2.1 Grid Resource Discovery Using Semantic Communities 357
17.2.2 Grid Resource Discovery Based on Semantically Linked Virtual Organizations 359
17.3 The P2P Approach 361
17.3.1 On Fully Decentralized Resource Discovery in Grid Environments Using a P2P Architecture 361
17.3.2 P2P Protocols for Resource Discovery in the Grid 362
17.4 The Grid-Routing Transferring Approach 365
17.4.1 Resource Discovery Based on Matchmaking Routers 365
17.4.2 Acquiring Knowledge in a Large-Scale Grid System 367
17.5 Conclusions 369
Acknowledgment 370
References 370
Part VII Energy Awareness in High-Performance Computing 373
Chapter 18 Energy-Aware Approaches for HPC Systems 375
18.1 Introduction 376
18.2 Power Consumption of Servers 377
18.2.1 Server Modeling 378
18.2.2 Power Prediction Models 379
18.3 Classification and Energy Profiles of HPC Applications 386
18.3.1 Phase Detection 388
18.3.2 Phase Identification 390
18.4 Policies and Leverages 391
18.5 Conclusion 392
Acknowledgements 393
References 393
Chapter 19 Strategies for Increased Energy Awareness in Cloud Federations 397
19.1 Introduction 397
19.2 Related Work 399
19.3 Scenarios 401
19.3.1 Increased Energy Awareness Across Multiple Data Centers within a Single Administrative Domain 401
19.3.2 Energy Considerations in Commercial Cloud Federations 404
19.3.3 Reduced Energy Footprint of Academic Cloud Federations 406
19.4 Energy-Aware Cloud Federations 406
19.4.1 Availability of Energy-Consumption-Related Information 407
19.4.2 Service Call Scheduling at the Meta-Brokering Level of FCM 408
19.4.3 Service Call Scheduling and VM Management at the Cloud-Brokering Level of FCM 409
19.5 Conclusions 411
Acknowledgments 412
References 412
Chapter 20 Enabling Network Security in HPC Systems Using Heterogeneous CMPs 415
20.1 Introduction 416
20.2 Related Work 418
20.3 Overview of Our Approach 419
20.3.1 Heterogeneous CMP Architecture 419
20.3.2 Network Security Application Behavior 420
20.3.3 High-Level View 421
20.4 Heterogeneous CMP Design for Network Security Processors 422
20.4.1 Task Assignment 422
20.4.2 ILP Formulation 423
20.4.3 Discussion 425
20.5 Experimental Evaluation 426
20.5.1 Setup 426
20.5.2 Results 427
20.6 Concluding Remarks 429
Acknowledgments 429
References 429
Part VIII Applications of Heterogeneous High-Performance Computing 433
Chapter 21 Toward a High-Performance Distributed CBIR System for Hyperspectral Remote Sensing Data: A Case Study in Jungle Computing 435
21.1 Introduction 436
21.2 CBIR For Hyperspectral Imaging Data 439
21.2.1 Spectral Unmixing 439
21.2.2 Proposed CBIR System 441
21.3 Jungle Computing 442
21.3.1 Jungle Computing: Requirements 443
21.4 IBIS and Constellation 444
21.5 System Design and Implementation 447
21.5.1 Endmember Extraction 450
21.5.2 Query Execution 450
21.5.3 Equi-Kernels 451
21.5.4 Matchmaking 452
21.6 Evaluation 452
21.6.1 Performance Evaluation 453
21.7 Conclusions 458
Acknowledgments 458
References 458
Chapter 22 Taking Advantage of Heterogeneous Platforms in Image and Video Processing 461
22.1 Introduction 462
22.2 Related Work 463
22.2.1 Image Processing on GPU 463
22.2.2 Video Processing on GPU 464
22.2.3 Contribution 465
22.3 Parallel Image Processing on GPU 465
22.3.1 Development Scheme for Image Processing on GPU 465
22.3.2 GPU Optimization 466
22.3.3 GPU Implementation of Edge and Corner Detection 466
22.3.4 Performance Analysis and Evaluation 466
22.4 Image Processing on Heterogeneous Architectures 469
22.4.1 Development Scheme for Multiple Image Processing 469
22.4.2 Task Scheduling within Heterogeneous Architectures 470
22.4.3 Optimization Within Heterogeneous Architectures 470
22.5 Video Processing on GPU 470
22.5.1 Development Scheme for Video Processing on GPU 471
22.5.2 GPU Optimizations 472
22.5.3 GPU Implementations 472
22.5.4 GPU-Based Silhouette Extraction 472
22.5.5 GPU-Based Optical Flow Estimation 472
22.5.6 Result Analysis 475
22.6 Experimental Results 476
22.6.1 Heterogeneous Computing for Vertebra Segmentation 476
22.6.2 GPU Computing for Motion Detection Using a Moving Camera 477
22.7 Conclusion 479
Acknowledgment 480
References 480
Chapter 23 Real-Time Tomographic Reconstruction Through CPU + GPU Coprocessing 483
23.1 Introduction 484
23.2 Tomographic Reconstruction 485
23.3 Optimization of Tomographic Reconstruction for CPUs and for GPUs 487
23.4 Hybrid CPU + GPU Tomographic Reconstruction 489
23.5 Results 491
23.6 Discussion and Conclusion 493
Acknowledgments 495
References 495
Index 499
Series Page 501
| Publication date | 10.4.2014 |
|---|---|
| Series | Wiley Series on Parallel and Distributed Computing |
| Language | English |
| Subject area | Mathematics / Computer Science ► Computer Science ► Networks |
| | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
| Keywords | Computer Science • Electrical & Electronics Engineering • Grid & Cloud Computing • High Performance Computing • Numerical Methods & Algorithms • Parallel and Distributed Computing |
| ISBN-10 | 1-118-71207-2 / 1118712072 |
| ISBN-13 | 978-1-118-71207-8 / 9781118712078 |
Size: 6.5 MB
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to prevent misuse of the eBook. The eBook is authorized to your personal Adobe ID at the time of download and can then be read only on devices that are also registered to that Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is only of limited use on small screens (smartphone, eReader).
System requirements:
PC/Mac: This eBook can be read on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers; however, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: This eBook can be read on both Apple and Android devices.
Additional feature: online reading. In addition to downloading it, you can also read this eBook online in a web browser.
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and general non-fiction: the text reflows dynamically to match the display and font size, so EPUB also works well on mobile reading devices. The same Adobe DRM copy protection, system requirements, and sales restrictions as for the PDF edition apply.