Supercomputers for Linux SysAdmins
Apress (Publisher)
979-8-8688-1599-7 (ISBN)
Supercomputers for Linux SysAdmins delves into the world of modern HPC cluster architecture, hardware, software, and resource management using a Linux/UNIX-based approach. The number of HPC clusters is growing, with an estimated 30 billion by 2030, but there are not enough sysadmins to run and manage them. This book helps bridge that gap, enabling more sysadmins and managers to transition into the exciting world of HPC.
This book helps readers with a strong foundational knowledge of Linux to deal with supercomputers and HPC clusters. We start with the basic principles of supercomputer management, the fundamentals of Linux and UNIX, shell scripting, and systemd, as well as other open source tools and frameworks, and then take you through the security, monitoring, and hardware requirements for supercomputers and HPC clusters.
You Will Learn:
How to plan new supercomputers
The main principles and technologies used in supercomputers and HPC clusters
How to set up the software environments on new supercomputers
How to set up resource and job management on supercomputers and HPC clusters
How to manage accounts, resource sharing, and much more
Who This Book Is For:
The main audience of this book is UNIX/Linux sysadmins and managers who have to deal with HPC clusters on-premises or in the cloud, as well as anyone interested in supercomputers and HPC clusters and how to use them in their projects and teams.
Sergey Zhumatiy has been managing supercomputers since 1999, starting out by building and managing HPC clusters at Moscow State University, and holds a PhD in computer science. Several supercomputers under his supervision, such as Chebyshev, Lomonosov, and Lomonosov-2, achieved top rankings in the TOP500 list and dominated the Russian Top50 list. He now works as an HPC Architect and SysAdmin at NVIDIA.
Table of Contents:
1: Introduction
2: What is "super"?
3: How to build and start it?
4: Supercomputer Hardware
5: InfiniBand
6: How a supercomputer does the job
7: UNIX and Linux – the basics
8: UNIX and Linux – working techniques
9: Network File Systems
10: Remote Management
11: Users – Accounting, Management
12: Users – quotas, access rights
13: Job management systems
14: OpenPBS, Torque
15: Slurm
16: Containers
17: Clouds
18: Remote User Access
19: Cluster Status Monitoring Systems
20: Backup
21: Compilers and Environments, for Parallel Technologies
22: Parallel Computing Support Libraries
23: Booting and init
24: Node Setup, Software Installation
25: Out-of-the-box stacks and deployment systems
26: Cluster Management Systems – xCAT and others
27: Communicating with users
28: One-two-three instructions
29: Shell scripts – basics and common mistakes
30: Systemd – A Short Course
31: Conclusion
Glossary
| Publication date | 20 July 2025 |
|---|---|
| Additional info | 18 illustrations, black and white |
| Place of publication | Berkeley |
| Language | English |
| Dimensions | 178 x 254 mm |
| Subject area | Computer Science ► Operating Systems / Servers ► Unix / Linux |
|  | Mathematics / Computer Science ► Computer Science ► Theory / Study |
|  | Computer Science ► Other Topics ► Hardware |
| Keywords | Hardware • High Performance Computing • HPC Clusters • Kubernetes • Linux • Nagios • Open MPI • Open Source • Supercomputers • sysadmin • systemd • System Management • UNIX • Zabbix |
| ISBN-13 | 979-8-8688-1599-7 / 9798868815997 |
| Condition | New |