Kubernetes for Generative AI Solutions (eBook)
338 pages
Packt Publishing (publisher)
978-1-83620-992-8 (ISBN)
Generative AI (GenAI) is revolutionizing industries, from chatbots to recommendation engines to content creation, but deploying these systems at scale poses significant challenges in infrastructure, scalability, security, and cost management.
This book is your practical guide to designing, optimizing, and deploying GenAI workloads with Kubernetes (K8s), the leading container orchestration platform trusted by AI pioneers. Whether you're working with large language models, transformer systems, or other GenAI applications, this book helps you confidently take projects from concept to production. You'll get to grips with foundational concepts in machine learning and GenAI, understanding how to align projects with business goals and KPIs. From there, you'll set up Kubernetes clusters in the cloud, deploy your first workload, and build a solid infrastructure. But your learning doesn't stop at deployment. The chapters highlight essential strategies for scaling GenAI workloads in production, covering model optimization, workflow automation, scaling, GPU efficiency, observability, security, and resilience.
By the end of this book, you'll be fully equipped to confidently design and deploy scalable, secure, resilient, and cost-effective GenAI solutions on Kubernetes.
Master the complete Generative AI project lifecycle on Kubernetes (K8s), from design and optimization to deployment, using best practices, cost-effective strategies, and real-world examples.

Key Features
- Build and deploy your first Generative AI workload on Kubernetes with confidence
- Learn to optimize costly resources such as GPUs using fractional allocation, Spot Instances, and automation
- Gain hands-on insights into observability, infrastructure automation, and scaling Generative AI workloads
- Purchase of the print or Kindle book includes a free PDF eBook

What you will learn
- Explore the GenAI deployment stack, agents, RAG, and model fine-tuning
- Implement HPA, VPA, and Karpenter for efficient autoscaling
- Optimize GPU usage with fractional allocation, MIG, and MPS setups
- Reduce cloud costs and monitor spending with Kubecost tools
- Secure GenAI workloads with RBAC, encryption, and service meshes
- Monitor system health and performance using Prometheus and Grafana
- Ensure high availability and disaster recovery for GenAI systems
- Automate GenAI pipelines for continuous integration and delivery

Who this book is for
This book is for solutions architects, product managers, engineering leads, DevOps teams, GenAI developers, and AI engineers. It's also suitable for students and academics learning about GenAI, Kubernetes, and cloud-native technologies. A basic understanding of cloud computing and AI concepts is needed, but no prior knowledge of Kubernetes is required.
1
Generative AI Fundamentals
Generative AI (GenAI) has revolutionized our world and has grabbed everyone’s attention since the introduction of ChatGPT by OpenAI in November 2022 (https://openai.com/index/chatgpt/). However, the foundational concepts of this technology have been around for quite some time. In this chapter, we will introduce the key concepts of GenAI and how it has evolved over time. We will then discuss how to think about a GenAI project and align it with business objectives, covering the entire process for developing and deploying GenAI workloads, along with potential use cases across different industries.
In this chapter, we’re going to cover the following main topics:
- Artificial Intelligence versus GenAI
- The evolution of machine learning
- Transformer architecture
- The GenAI project life cycle
- The GenAI deployment stack
- GenAI project use cases
Artificial Intelligence versus GenAI
Before we dive deeper into GenAI concepts, let’s discuss the differences between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and GenAI, as these terms are often used interchangeably.
Figure 1.1 shows the relationships between these concepts.
Figure 1.1 – Relationships between AI, ML, DL, and GenAI
Let’s learn more about these relationships:
- AI: AI refers to a system or algorithm that is capable of performing tasks that would otherwise typically require human intelligence. These tasks include reasoning, learning, problem-solving, perception, and language understanding. AI is a broad category and can include rule-based systems, expert systems, neural networks, and GenAI algorithms. The evolution of AI algorithms has provided machines with human-like senses and capabilities, such as vision to analyze the world around them, listening and speaking to understand natural language and respond verbally, and using sensor data to understand the external environment and respond accordingly.
- ML: ML is a subset of AI that involves algorithms and models that enable machines to learn from data and make predictions without requiring explicit coding. In traditional programming, developers write explicit instructions for a computer to execute, whereas in ML, algorithms learn from the patterns and relationships in data and make predictions. ML can further be divided into the following sub-categories:
- Supervised learning: This uses labeled datasets to train the models. It can further be subdivided into classification and regression problems:
- Classification problems use labeled data, such as labeled pictures of dogs and cats, to train the model. Once the model is trained, it can classify a user-provided picture using the classes it has been trained on.
- Regression problems, on the other hand, use numerical data to understand the relationship between dependent and independent variables, such as house pricing based on different attributes. Once a model establishes a relationship, it can then forecast the pricing for different sets of attributes, even if the model has not been trained on those specific attribute values. Popular regression algorithms include linear regression and polynomial regression (logistic regression, despite its name, is generally used for classification).
- Unsupervised learning: This uses ML algorithms to analyze and cluster unlabeled datasets to discover hidden patterns in data. Unsupervised learning can further be divided into the following two sub-categories:
- Clustering algorithms group data based on similarities or differences. A popular clustering algorithm is the k-means clustering algorithm, which uses Euclidean distances between data points to measure the similarity between data points and assign them to k distinct, non-overlapping clusters. It iterates to refine the clusters to minimize the variance within each cluster. A typical use case is segmenting customers based on purchasing behavior, demographics, or preferences to target marketing strategies effectively.
- Dimensionality reduction is another form of unsupervised learning, which is used to reduce the number of features/dimensions in a given dataset. It aims to simplify models, reduce computational costs, and improve overall model performance. Principal Component Analysis (PCA) (https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c) is a popular algorithm used for dimensionality reduction. It achieves this by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another.
- Semi-supervised learning: This is a type of ML that combines supervised and unsupervised learning by leveraging both labeled and unlabeled data for training. This is particularly useful when obtaining labeled data is time-consuming and expensive because you can use small amounts of labeled data for training and then iteratively apply it to the large amounts of unlabeled data. This can be applied in both classification and regression use cases, such as spam/image/object detection, speech recognition, and forecasting.
- Reinforcement learning: In reinforcement learning, there is an agent and a reward system, and algorithms learn by trial and error to maximize the reward for the agent. An agent is an autonomous system, like a computer program or robot, that can make decisions and act in response to its environment without direct human instructions. Rewards are given by the environment when agent actions lead to a positive outcome. For example, if we want to train a robot to walk without falling over, positive rewards are given for actions that help the robot to remain upright, and negative rewards are given for actions that cause it to fall over. The robot begins by trying different actions randomly, such as leaning forward, moving its legs, or shifting its weight. As it performs these actions, it observes the resulting changes in its state. The robot uses feedback (rewards) to update its understanding of which actions are beneficial and thus learns to walk over time.
We have summarized the different categories of ML in Figure 1.2:
Figure 1.2 – Different categories of ML
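To make the regression idea concrete, here is a minimal sketch (an illustration, not from the book) that fits a straight line to toy house-pricing data with ordinary least squares in plain Python and then forecasts a price for a size the model was not trained on; the sizes and prices are invented for the example:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy "house pricing" data: price grows linearly with size
sizes = [50, 80, 100, 120]       # square meters
prices = [150, 240, 300, 360]    # thousands (exactly 3 * size here)
slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # forecast for a size not in the data
```

Because the toy data is perfectly linear, the fit recovers slope 3 and intercept 0 exactly; on real data the line would only approximate the relationship.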
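The k-means procedure described above (assign each point to its nearest centroid, then recompute each centroid as the cluster mean) can be sketched in plain Python; the data points, the choice of k = 2, and the simple initialization are assumptions made for illustration:

```python
def kmeans(points, k, iters=20):
    """Minimal k-means sketch: assign each 2D point to its nearest
    centroid by squared Euclidean distance, then recompute each
    centroid as the mean of its cluster, for a fixed number of
    iterations."""
    centroids = points[:k]  # simple deterministic init for the sketch
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda j: (x - centroids[j][0]) ** 2
                                        + (y - centroids[j][1]) ** 2)
            clusters[nearest].append((x, y))
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, clusters

# Two well-separated groups: one near (0, 0), one near (10, 10)
data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, clusters = kmeans(data, k=2)
```

On this data, the two centroids converge to the means of the two obvious groups; production code would typically use a library implementation with smarter initialization and a convergence test rather than a fixed iteration count.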
- DL: DL is a subset of ML that involves deep neural networks with many layers. Conceptually, it is inspired by the human brain, which has billions of deeply connected neurons and provides humans with very advanced cognition. Some popular examples of deep neural nets are Convolutional Neural Networks (CNNs), used for image processing, and Recurrent Neural Networks (RNNs), which are used for analyzing time series data or natural language processing.
- GenAI: GenAI is a further subset of DL and focuses on creating new data, such as text, images, music, and other forms of content. Lots of generative applications are based on Foundational Models (FMs), which are large-scale AI models trained on vast amounts of diverse data, serving as a base for a wide range of downstream tasks. They are pre-trained on broad datasets and can be fine-tuned for specific applications. Large Language Models (LLMs) are a subset of FMs specifically designed for understanding and generating human language. GenAI is the primary focus of this book; we will be diving into its details later in the book.
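The reward-driven trial and error described under reinforcement learning can be sketched with a tiny example (not from the book): an agent chooses between two actions whose payoff probabilities it does not know, keeps a running-average value estimate per action, and mostly exploits the best-looking action while occasionally exploring. The reward probabilities and parameters below are invented for illustration:

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=42):
    """Epsilon-greedy agent on a simple multi-armed bandit.
    With probability epsilon it explores a random action; otherwise
    it exploits the action with the highest estimated value, and it
    updates that estimate incrementally from the observed reward."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # estimated value per action
    counts = [0] * len(reward_probs)       # how often each was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))          # explore
        else:
            action = max(range(len(reward_probs)),
                         key=lambda a: estimates[a])           # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental running-average update of the value estimate
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Action 1 pays off 80% of the time, action 0 only 20%
estimates, counts = run_bandit([0.2, 0.8])
```

After a few thousand steps the agent has learned that action 1 is more valuable and pulls it far more often, which is the same feedback loop, at toy scale, that lets the walking robot in the example above favor actions that keep it upright.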
Now that we understand the key differences between AI, ML, DL, and GenAI, let’s explore the evolution of ML and how transformer architecture has revolutionized the ML landscape, particularly in the field of Natural Language Processing (NLP).
Evolution of machine learning
Since this book is about GenAI, what could be a better way to start it than asking ChatGPT to summarize the evolution of AI and ML over the last decade?
Prompt: "Why did the chicken cross the road?" Describe how that question's answer evolved using AI/ML over the last decade.
ChatGPT Response (ChatGPT-4o, June 16th, 2024):
The evolution of AI/ML responses to the question "Why did the chicken cross the road?" over the past decade reflects significant advancements in language processing...

| Publication date | June 6, 2025 |
|---|---|
| Foreword | Rajdeep Saha |
| Language | English |
| ISBN-10 | 1-83620-992-4 / 1836209924 |
| ISBN-13 | 978-1-83620-992-8 / 9781836209928 |