
Generative AI with Python and PyTorch (eBook)

Navigating the AI frontier with LLMs, Stable Diffusion, and next-gen AI applications
eBook Download: EPUB
2025
454 pages
Packt Publishing (publisher)
978-1-83588-445-4 (ISBN)

Joseph Babcock, Raghav Bali

Become an expert in Generative AI through immersive, hands-on projects that leverage today's most powerful models for Natural Language Processing (NLP) and computer vision. Generative AI with Python and PyTorch is your end-to-end guide to creating advanced AI applications, made easy by Raghav Bali, a seasoned data scientist with multiple patents in AI, and Joseph Babcock, a PhD and machine learning expert. Through business-tested approaches, this book simplifies complex GenAI concepts, making learning both accessible and immediately applicable.
From NLP to image generation, this second edition explores practical applications and the underlying theories that power these technologies. By integrating the latest advancements in LLMs, it prepares you to design and implement powerful AI systems that transform data into actionable intelligence.
You'll build your versatile LLM toolkit by gaining expertise in GPT-4, LangChain, RLHF, LoRA, RAG, and more. You'll also explore deep learning techniques for image generation and apply style transfer using GANs, before advancing to implement CLIP and diffusion models.
Whether you're generating dynamic content or developing complex AI-driven solutions, this book equips you with everything you need to harness the full transformative power of Python and AI.


Master GenAI techniques to create images and text using variational autoencoders (VAEs), generative adversarial networks (GANs), LSTMs, and large language models (LLMs).

Key Features
  • Implement real-world applications of LLMs and generative AI
  • Fine-tune models with PEFT and LoRA to speed up training
  • Expand your LLM toolbox with Retrieval Augmented Generation (RAG) techniques, LangChain, and LlamaIndex
  • Purchase of the print or Kindle book includes a free eBook in PDF format

What you will learn
  • Grasp the core concepts and capabilities of LLMs
  • Craft effective prompts using chain-of-thought, ReAct, and prompt query language to guide LLMs toward your desired outputs
  • Understand how attention and transformers have changed NLP
  • Optimize your diffusion models by combining them with VAEs
  • Build text generation pipelines based on LSTMs and LLMs
  • Leverage the power of open-source LLMs, such as Llama and Mistral, for diverse applications

Who this book is for
This book is for data scientists, machine learning engineers, and software developers seeking practical skills in building generative AI systems. A basic understanding of math and statistics and experience with Python coding is required.

Chapter 1: Introduction to Generative AI: Drawing Data from Models


At the Colorado State Fair in 2022, the winning entry was a fantastical sci-fi landscape created by video game designer Jason Allen, titled Théâtre D’opéra Spatial (Figure 1.1). The first-prize artwork was remarkable both for its dramatic subject matter and for the unusual origin of the image. Unlike the majority of other artworks entered into the competition, Théâtre D’opéra Spatial was not painted using oil or watercolors, nor was its “creator” even human; rather, it is an entirely digital image produced by a sophisticated machine learning algorithm called Midjourney. Rather than a brush and canvas, Allen used Midjourney, a model trained on a diverse set of images, together with natural language instructions to create the image.

Figure 1.1: Théâtre D’opéra Spatial1

Visual art is far from the only area in which machine learning has demonstrated astonishing results. Indeed, if you have paid attention to the news in the last few years, you have likely seen many stories about the groundbreaking results of modern AI systems applied to diverse problems, from the hard sciences to online avatars and interactive chat. Deep neural network models, such as the one powering Midjourney, have shown amazing abilities to generate realistic human language2, author computer code3, and solve school exams with human-level ability2. Such models can also classify X-ray images of human anatomy at the level of trained physicians4, beat human masters both at classic board games such as Go (an ancient East Asian board game) and at multiplayer computer games5, 6, and translate French into English with amazing sensitivity to grammatical nuances7.

Free Benefits with Your Book


Your purchase includes a free PDF copy of this book along with other exclusive benefits. Check the Free Benefits with Your Book section in the Preface to unlock them instantly and maximize your learning experience.

Discriminative versus generative models


However, these latter examples of AI differ in an important way from the model that generated Théâtre D’opéra Spatial. In all of these other applications, the model is presented with a set of inputs—data such as English text or X-ray images—that is paired with a target output, such as the next word in a translated sentence or the diagnostic classification of an X-ray. Indeed, this is probably the kind of AI model you are most familiar with from prior experience in predictive modeling; these are broadly known as discriminative models, whose purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which word in the English language appears next in a translation), or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).

However, this kind of model, in which data is “labeled” or “scored,” represents only half of the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the winning entry at the Colorado State Fair, doesn’t compute a score or label from input variables but rather generates new data. Unlike in discriminative models, the input variables are often vectors of numbers that aren’t related to real-world values at all and are often even randomly generated. This kind of model, known as a generative model, which can produce complex outputs such as text, music, or images from random noise, is the topic of this book.
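
To make the distinction concrete, the following minimal PyTorch sketch contrasts the two kinds of model. The layer sizes and data shapes are arbitrary assumptions for illustration only, not architectures taken from this book: a discriminative network maps an existing input to a label or score, while a generative network maps random numbers to entirely new data.

    import torch
    import torch.nn as nn

    # Discriminative model: maps an input (e.g., a flattened 28x28 image)
    # to a score for each of 10 possible classes.
    discriminative = nn.Sequential(
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),       # output: one logit per class
    )

    # Generative model (just the "decoder" part): maps a random noise
    # vector to a new, synthetic 28x28 image.
    generative = nn.Sequential(
        nn.Linear(64, 128),
        nn.ReLU(),
        nn.Linear(128, 28 * 28),  # output: pixel values of a generated image
        nn.Sigmoid(),
    )

    image = torch.rand(1, 28 * 28)        # an "observed" input
    class_scores = discriminative(image)  # a score for existing data

    noise = torch.randn(1, 64)            # random numbers, unrelated to real data
    new_image = generative(noise)         # entirely new data drawn from the model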

Even if you did not know it at the time, you have probably seen other instances of generative models mentioned in the news alongside the discriminative examples given previously. A prominent example is deepfakes—videos in which one person’s face has been systematically replaced with another’s by using a neural network to remap the pixels8 (Figure 1.2).

Figure 1.2: A deepfake image9

Maybe you have also seen stories about AI models that generate “fake news,” which scientists at the firm OpenAI were initially terrified to release to the public due to concerns it could be used to create propaganda and misinformation online (Figure 1.3)11.

Figure 1.3: A chatbot dialogue created using GPT-210

In these and other applications—such as Google’s voice assistant Duplex, which can make a restaurant reservation by dynamically carrying on a conversation with a human in real time12, or even software that can generate original musical compositions13—we are surrounded by the outputs of generative AI algorithms. These models are able to handle complex information in a variety of domains: creating photorealistic images or stylistic “filters” on pictures, synthetic sound, conversational text, and even rules for optimally playing video games. You might ask: Where did these models come from? How can I implement them myself?

Implementing generative models


While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capturing the complex variation in data such as images or language. In this book, we will focus on implementing these deep-learning-based generative models for many different applications using PyTorch. PyTorch is a Python library for developing and deploying deep learning models. It was open-sourced by Meta (formerly Facebook) in 2016 and has become one of the most popular libraries for the research and deployment of neural network models. We’ll execute PyTorch code on the cloud using Google’s Colab notebook environment, which gives you on-demand access to world-class computing infrastructure, including graphics processing units (GPUs) and tensor processing units (TPUs), without the need for onerous environment setup. We’ll also leverage the Pipelines library from Hugging Face, which provides an easy interface for running experiments with a catalog of some of the most sophisticated models available.
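
As a small taste of what this looks like in practice, the sketch below loads a pre-trained text-generation model through the Hugging Face pipeline interface from the transformers package. The model name "gpt2" and the generation settings are illustrative assumptions, not the specific models used later in the book.

    # Minimal sketch: running a pre-trained generative model via the
    # Hugging Face pipeline API (from the `transformers` package).
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    output = generator("Generative models are", max_new_tokens=20)
    print(output[0]["generated_text"])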

In the following chapters, you will learn not only the underlying theory behind these models but also the practical skills to implement them in popular programming frameworks. In Chapter 2, we’ll review how, since 2006, an explosion of research in “deep learning” using large neural network models has produced a wide variety of generative modeling applications. Innovations arising from this research included variational autoencoders (VAEs), which can efficiently generate complex data samples from random numbers that are “decoded” into realistic images, using techniques we will describe in Chapter 11. We will also describe a related image generation algorithm, the generative adversarial network (GAN), in more detail in Chapters 12-14 of this book through applications for image generation, style transfer, and deepfakes. Conceptually, the GAN model creates a competition between two neural networks.

One (termed the generator) produces realistic (or, in the case of the experiments by Obvious, artistic) images, starting from a set of random numbers that are “decoded” into images by applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration. The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images or whether it was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy. In Chapter 11, we’ll also describe the algorithm used in Théâtre D’opéra Spatial, the latent diffusion model, which builds on VAEs to provide scalable image synthesis based on natural language prompts from a human user.
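
The generator-versus-discriminator setup can be sketched in a few lines of PyTorch. The tiny fully connected architectures and layer sizes below are deliberate simplifications chosen for illustration, not any specific GAN covered later in the book.

    import torch
    import torch.nn as nn

    # The "art student": decodes random numbers into a synthetic 28x28 image.
    generator = nn.Sequential(
        nn.Linear(100, 256), nn.ReLU(),
        nn.Linear(256, 28 * 28), nn.Tanh(),
    )

    # The "teacher": scores how likely an image is to be real rather than generated.
    discriminator = nn.Sequential(
        nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    noise = torch.randn(16, 100)                 # a batch of random "inspiration"
    fake_images = generator(noise)               # noise decoded into images
    realism_scores = discriminator(fake_images)  # the teacher grades the fakes

    # During training, the generator is updated to push these scores up (fooling
    # the discriminator), while the discriminator is updated to push them down
    # for fakes and up for real images.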

Another key innovation in generative models is in the domain of natural language data: by representing the complex interrelationships between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it provide powerful building blocks for generating textual data in applications such as chatbots and large language models (LLMs), which we’ll cover in Chapters 4 and 5. In Chapter 6, we will dive deeper into the most famous open-source models in the current LLM landscape, including Llama. In Chapters 7 and 8.
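
At the core of the Transformer is scaled dot-product attention, which lets every word in a sentence relate to every other word in a single matrix operation. The sketch below illustrates that computation with made-up shapes; in a real model, the queries, keys, and values come from learned projections of token embeddings rather than the raw token vectors used here.

    import math
    import torch

    tokens = torch.randn(5, 16)   # a toy "sentence": 5 tokens, each a 16-dim vector

    # In a real transformer, Q, K, and V are learned linear projections of the
    # token vectors; here we reuse the tokens directly to keep the sketch short.
    Q, K, V = tokens, tokens, tokens

    scores = Q @ K.T / math.sqrt(Q.shape[-1])  # pairwise token-to-token affinities
    weights = torch.softmax(scores, dim=-1)    # each row sums to 1
    contextualized = weights @ V               # each token becomes a weighted mix of all tokens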

Before diving into further details on the various applications of generative models and how to implement them in PyTorch, we will take a step back and examine how exactly generative models are different from...

Publication date (per publisher): 28 March 2025
Language: English
ISBN-10: 1-83588-445-8 / 1835884458
ISBN-13: 978-1-83588-445-4 / 9781835884454