Deep Learning Pipeline (eBook)
XXV, 551 pages
Apress (publisher)
978-1-4842-5349-6 (ISBN)
Build your own pipeline based on modern TensorFlow approaches rather than outdated engineering concepts. This book shows you how to build a deep learning pipeline for real-life TensorFlow projects.
You'll learn what a pipeline is and how it works so you can build a full application easily and rapidly. Then troubleshoot and overcome basic TensorFlow obstacles to easily create functional apps and deploy well-trained models. Step-by-step and example-oriented instructions help you understand each step of the deep learning pipeline while you apply the most straightforward and effective tools to demonstrative problems and datasets.
You'll also develop a deep learning project by preparing data, choosing the model that fits that data, and debugging your model to get the best fit to the data, all using TensorFlow techniques. Enhance your skills by accessing some of the most powerful recent trends in data science. If you've ever considered building your own image or text-tagging solution or entering a Kaggle contest, Deep Learning Pipeline is for you!
What You'll Learn
- Develop a deep learning project using data
- Study and apply various models to your data
- Debug and troubleshoot the proper model suited for your data
Hisham Elamir is a data scientist with expertise in machine learning, deep learning, and statistics. He currently lives and works in Cairo, Egypt. In his work projects, he faces challenges ranging from natural language processing (NLP), behavioral analysis, and machine learning to distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meetups, conferences, and other events.
Mahmoud Hamdy is a machine learning engineer who lives and works in Egypt. His primary area of study is the overlap between knowledge, logic, language, and learning. He helps train machine learning and deep learning models to distill large amounts of unstructured, semi-structured, and structured data into new knowledge about the world, using methods ranging from deep learning to statistical relational learning. He applies strong theoretical and practical skills in several areas of machine learning to find novel and effective solutions to interesting and challenging problems in these interconnected areas.
Who This Book Is For
Developers, analysts, and data scientists looking to add to or enhance their existing skills by accessing some of the most powerful recent trends in data science. Prior experience in Python or other TensorFlow-related languages and mathematics would be helpful.
Table of Contents 4
About the Authors 15
About the Technical Reviewer 16
Introduction 17
Part I: Introduction 24
Chapter 1: A Gentle Introduction 25
Information Theory, Probability Theory, and Decision Theory 26
Information Theory 26
Probability Theory 28
Decision Theory 30
Introduction to Machine Learning 32
Predictive Analytics and Its Connection with Machine Learning 33
Machine Learning Approaches 34
Supervised Learning 34
Unsupervised Learning 36
Semisupervised Learning 38
Checkpoint 38
Reinforcement Learning 39
From Machine Learning to Deep Learning 41
Let’s See What Some Heroes of Machine Learning Say About the Field 41
Connections Between Machine Learning and Deep Learning 42
Difference Between ML and DL 43
In Machine Learning 43
In Deep Learning 44
What Have We Learned Here? 44
Why Should We Learn About Deep Learning (Advantages of Deep Learning)? 45
Disadvantages of Deep Learning (Cost of Greatness) 46
Introduction to Deep Learning 47
Machine Learning Mathematical Notations 50
Summary 58
Chapter 2: Setting Up Your Environment 59
Background 59
Python 2 vs. Python 3 60
Installing Python 60
Python Packages 62
IPython 63
Installing IPython 64
Jupyter 65
Installing Jupyter 67
What Is an ipynb File? 69
Packages Used in the Book 72
NumPy 72
SciPy 72
Pandas 73
Matplotlib 73
NLTK 74
Scikit-learn 74
Gensim 75
TensorFlow 75
Installing on Mac or Linux distributions 76
Installing on Windows 78
Keras 78
Summary 78
Chapter 3: A Tour Through the Deep Learning Pipeline 79
Deep Learning Approaches 80
What Is Deep Learning 80
Biological Deep Learning 80
What Are Neural Networks Architectures? 84
Deep Learning Pipeline 90
Define and Prepare Problem 91
Summarize and Understand Data 92
Process and Prepare Data 93
Evaluate Algorithms 94
Improve Results 95
Fast Preview of the TensorFlow Pipeline 96
Tensors—the Main Data Structure 97
First Session 98
Data Flow Graphs 100
Tensor Properties 103
Tensor Rank 104
Tensor Shape 105
Summary 105
Chapter 4: Build Your First Toy TensorFlow App 107
Basic Development of TensorFlow 107
Hello World with TensorFlow 108
Simple Iterations 109
Prepare the Input Data 110
Doing the Gradients 113
Linear Regression 115
Why Linear Regression? 115
What Is Linear Regression? 115
Dataset Description 116
Full Source Code 121
XOR Implementation Using TensorFlow 123
Full Source Code 129
Summary 131
Part II: Data 132
Chapter 5: Defining Data 133
Defining Data 134
Why Should You Read This Chapter? 134
Structured, Semistructured, and Unstructured Data 135
Tidy Data 137
Divide and Conquer 138
Tabular Data 139
Quantitative vs. Qualitative Data 139
Example—the Titanic 139
Divide and Conquer 141
Making a Checkpoint 142
The Four Levels of Data 142
Measure of Center 143
The Nominal Level 143
Mathematical Operations Allowed for Nominal 144
Measures of Center for Nominal 144
What Does It Mean to be a Nominal Level Type? 145
The Ordinal Level 145
Examples of Being Ordinal 146
What Data Is Like at the Ordinal Level 146
Mathematical Operations Allowed for Ordinal 147
Measures of Center for Ordinal 148
Quick Recap and Check 149
The Interval Level 150
Examples of Interval Level Data 150
What Data Is Like at the Interval Level 151
Mathematical Operations Allowed for Interval 151
Measures of Center for Interval 151
Measures of Variation for Interval 152
Standard Deviation 153
The Ratio Level 154
Examples 154
Measures of Center for Ratio 155
Problems with the Ratio Level 155
Summarizing All Levels Table 5-1 156
Text Data 157
What Is Text Processing and What Is the Level of Importance of Text Processing? 157
IMDB—Example 158
Images Data 159
Type of Images (2-D, 3-D, 4-D) 160
2-D Data 160
3-D Data 160
4-D Data 161
Example—MNIST 162
Example—CIFAR-10 163
Summary 164
Chapter 6: Data Wrangling and Preprocessing 166
The Data Fields Pipelines Revisited 167
Giving You a Reason 167
Where Is Data Cleaning in the Process? 168
Data Loading and Preprocessing 169
Fast and Easy Data Loading 169
Missing Data 177
Empties 178
Is It Ever Useful to Fill Missing Data Using a Zero Instead of an Empty or Null? 178
Managing Missing Features 179
Dealing with Big Datasets 180
Accessing Other Data Formats 182
Data Preprocessing 183
Data Augmentation 188
Image Crop 191
Crop and Resize 191
Crop to Bounding Box 193
Flipping 194
Rotate Image 196
Translation 197
Transform 198
Adding Salt and Pepper Noise 199
Convert RGB to Grayscale 200
Change Brightness 200
Adjust Contrast 201
Adjust Hue 202
Adjust Saturation 203
Categorical and Text data 204
Data Encoding 205
Performing One-Hot Encoding on Nominal Features 207
Can You Spot the Problem? 208
A Special Type of Data: Text 209
So Far, Everything Has Been Pretty Good, Hasn’t It? 214
Tokenization, Stemming, and Stop Words 220
What Are Tokenizing and Tokenization? 220
The Bag-of-Words (BoW) Model 223
What is the BoW? 223
Summary 225
Chapter 7: Data Resampling 226
Creating Training and Test Sets 227
Cross-Validation 228
Validation Set Technique 229
Leave-One-Out Cross-Validation (LOOCV) 232
K-Fold Cross-Validation 235
Bootstrap 236
Bootstrap in Statistics 237
Tips to Use Bootstrap (Resampling with Replacement) 239
Generators 242
What Are Keras Generators? 242
Data Generator 244
Callback 245
Summary 250
Chapter 8: Feature Selection and Feature Engineering 251
Dataset Used in This Chapter 252
Dimensionality Reduction—Questions to Answer 254
What Is Dimensionality Reduction? 255
When Should I Use Dimensionality Reduction? 257
Unsupervised Dimensionality Reduction via Principal Component Analysis (PCA) 258
Total and Explained Variance 261
Feature Selection and Filtering 261
Principal Component Analysis 265
Nonnegative Matrix Factorization 274
Sparse PCA 275
Kernel PCA 277
Atom Extraction and Dictionary Learning 279
Latent Dirichlet Allocation (LDA) 280
Latent Dirichlet Allocation (LDA in NLP) 281
Code Example Using gensim 285
LDA vs. PCA 287
ZCA Whitening 290
Summary 294
Part III: TensorFlow 295
Chapter 9: Deep Learning Fundamentals 296
Perceptron 297
Single Perceptron 307
Multilayer Perceptron 308
Recap 309
Different Neural Network Layers 310
Input Layer 311
Hidden Layer(s) 311
Output Layer 312
Shallow vs. Deep Neural Networks 312
Activation Functions 314
Types of Activation Functions 316
Recap 322
Gradient Descent 322
Recap 324
Batch vs. Stochastic vs. Mini-Batch Gradient Descent 325
Batch Gradient Descent 325
Stochastic Gradient Descent 326
Mini-batch Gradient Descent 327
Recap 328
Loss function and Backpropagation 329
Loss Function 333
Backpropagation 336
The Four Fundamental Equations Behind Backpropagation 339
Exploding Gradients 347
Re-Design the Network Model 349
Use Long Short-Term Memory Networks 349
Use Gradient Clipping 349
Use Weight Regularization 350
Vanishing Gradients 350
Vanishing Gradients Problem 351
TensorFlow Basics 353
Placeholder vs. Variable vs. Constant 354
Gradient-Descent Optimization Methods from a Deep-Learning Perspective 355
Learning Rate in the Mini-batch Approach to Stochastic Gradient Descent 360
Summary 360
Chapter 10: Improving Deep Neural Networks 361
Optimizers in TensorFlow 361
The Notation to Use 362
Momentum 363
Nesterov Accelerated Gradient 364
Adagrad 365
Adadelta 366
RMSprop 367
Adam 368
Nadam (Adam + NAG) 369
Choosing the Learning Rate 370
Dropout Layers and Regularization 373
Normalization Techniques 375
Batch Normalization 376
Weight Normalization 377
Layer Normalization 378
Instance Normalization 379
Group Normalization 380
Summary 381
Chapter 11: Convolutional Neural Network 383
What is a Convolutional Neural Network 384
Convolution Operation 385
One-Dimensional Convolution 385
Two-Dimensional Convolution 387
Padding and Stride 388
Common Image-Processing Filters 391
Mean and Median Filters 391
Gaussian Filter 398
Sobel Edge-Detection Filter 401
Identity Transform 406
Convolutional Neural Networks 406
Layers of Convolutional Neural Networks 407
Input Layer 409
Convolutional layer 409
Pooling Layer 412
Backpropagation Through the Convolutional and Pooling Layers 413
Weight Sharing Through Convolution and Its Advantages 415
Translation Equivariance and Invariance 416
Case Study—Digit Recognition on the CIFAR-10 Dataset 419
Summary 429
Chapter 12: Sequential Models 430
Recurrent Neural Networks 430
Language Modeling 435
Backpropagation Through Time 439
Vanishing and Exploding Gradient Problems in RNN 444
The Solution to Vanishing and Exploding Gradients Problems in RNNs 447
Long Short-Term Memory 448
Case Study—Digit Identification on the MNIST Dataset 453
Gated Recurrent Unit 453
Bidirectional RNN (Bi-RNN) 460
Summary 461
Part IV: Applying What You’ve Learned 462
Chapter 13: Selected Topics in Computer Vision 463
Different Architectures in Convolutional Neural Networks 464
LeNet 465
AlexNet 467
VGG 470
ResNet 472
Transfer Learning 474
What Is a Pretrained Model, and Why Use It? 475
How to Use a Pretrained Model? 476
Ways to Fine-Tune the Model 477
Pretrained VGG19 478
Summary 483
Chapter 14: Selected Topics in Natural Language Processing 484
Vector Space Model 485
Vector Representation of Words 488
Word2Vec 489
Continuous Bag of Words 489
Implementing Continuous Bag of Words 492
Skip-Gram Model for Word Embeddings 499
Implementing Skip-Gram 502
GloVe 505
Summary 507
Chapter 15: Applications 508
Case Study—Tabular Dataset 508
Understanding the Dataset 508
Scratching the Surface 510
Digging Deeper 514
Preprocessing Dataset 518
Building the Model 523
Case Study—IMDB Movie Review Data with Word2Vec 528
Case Study—Image Segmentation 538
Summary 548
Index 549
| Publication date (per publisher) | 20.12.2019 |
|---|---|
| Additional info | XXV, 551 p. 214 illus. |
| Language | English |
| Subject area | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
| Keywords | Deep learning • deep learning pipeline • machine learning • Python • R Programming • tensorflow |
| ISBN-10 | 1-4842-5349-3 / 1484253493 |
| ISBN-13 | 978-1-4842-5349-6 / 9781484253496 |
DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is of limited use on small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.