Supercomputers for Linux SysAdmins

Managing Modern HPC Clusters and Supercomputers from Software to Hardware

Sergey Zhumatiy (Author)

Book | Softcover

470 pages

2025
Apress (Publisher)
979-8-8688-1599-7 (ISBN)

Supercomputers and High Performance Computing (HPC) clusters are not as exotic as people imagine these days. They give companies computing power that no single server can provide. New drug and materials discoveries, universe modeling, AI training, crash simulations, and market research are all possible thanks to HPC clusters. Building or renting an HPC cluster is not that difficult either, as cloud providers can give you the resources to build one that is cheap and performant enough for your own use. If you are, or want to become, an HPC cluster sysadmin or manager, this book is for you.

Supercomputers for Linux SysAdmins delves into the world of modern HPC cluster architecture, hardware, software, and resource management using a Linux/UNIX-based approach. The number of HPC clusters is growing, with an estimated 30 billion by 2030, but there are not enough sysadmins to run and manage them. This book serves to bridge that gap and help more sysadmins and managers transition into the exciting world of HPC.

This book helps those with a strong foundational knowledge of Linux to work with supercomputers and HPC clusters. We start with the basic principles of supercomputer management, the fundamentals of Linux and UNIX, shell scripting, and systemd, as well as other open source tools and frameworks, taking you through the security, monitoring, and hardware requirements for supercomputers and HPC clusters.

You Will Learn:

How to plan new supercomputers
The main principles and technologies used in supercomputers and HPC clusters
How to set up the software environments on new supercomputers
How to set up resource and job management for supercomputers and HPC clusters
How to manage accounts, resource sharing, and much more

Who is it for:

The main audience for this book is regular UNIX/Linux sysadmins and managers who have to deal with HPC clusters on-premises or in the cloud, as well as anyone interested in supercomputers and HPC clusters and how to use them in their projects and teams.

Sergey Zhumatiy has been managing supercomputers since 1999, starting out building and managing HPC clusters at Moscow State University, and holds a PhD in computer science. Several supercomputers under his supervision, such as Chebyshev, Lomonosov, and Lomonosov-2, achieved top rankings in the TOP500 supercomputer list and dominated the Russian Top50 supercomputer list. He now works as an HPC Architect and SysAdmin at NVIDIA.

1: Introduction.- 2: What is "super"?.- 3: How to build and start it?.- 4: Supercomputer Hardware.- 5: InfiniBand.- 6: How a supercomputer does the job.- 7: UNIX and Linux – the basics.- 8: UNIX and Linux – working techniques.- 9: Network File Systems.- 10: Remote Management.- 11: Users – Accounting, Management.- 12: Users – quotas, access rights.- 13: Job management systems.- 14: OpenPBS, Torque.- 15: Slurm.- 16: Containers.- 17: Clouds.- 18: Remote User Access.- 19: Cluster Status Monitoring Systems.- 20: Backup.- 21: Compilers and Environments, for Parallel Technologies.- 22: Parallel Computing Support Libraries.- 23: Booting and init.- 24: Node Setup, Software Installation.- 25: Out-of-the-box stacks and deployment systems.- 26: Cluster Management Systems – xCAT and others.- 27: Communicating with users.- 28: One-two-three instructions.- 29: Shell scripts – basics and common mistakes.- 30: Systemd – A Short Course.- 31: Conclusion.- Glossary.

Publication date	20.07.2025
Additional info	18 Illustrations, black and white
Place of publication	Berkeley
Language	English
Dimensions	178 x 254 mm
Subject area	Computer Science ► Operating Systems / Servers ► Unix / Linux
	Mathematics / Computer Science ► Computer Science ► Theory / Studies
	Computer Science ► Further Topics ► Hardware
Keywords	Hardware • High Performance Computing • HPC Clusters • Kubernetes • Linux • Nagios • Open MPI • Open Source • Supercomputers • sysadmin • systemd • System Management • UNIX • Zabbix
ISBN-13	979-8-8688-1599-7 / 9798868815997
Condition	New