Apache Spark Projects
A complete walk-through of Apache Spark's core capabilities with 7 real-world Big Data projects
Pages
2020
Packt Publishing Limited (publisher)
978-1-78899-595-5 (ISBN)
- Title is unfortunately out of print; no new edition is planned.
Explore the potential of Apache Spark and its ecosystem through real-world applications.
About This Book
* A unique, practical guide with 7 end-to-end projects demonstrating the power of Apache Spark
* Shows readers how to perform real-time Big Data processing using the different components of Apache Spark
* Includes best practices and tips for getting the best performance out of a Big Data processing pipeline with Apache Spark
Who This Book Is For
This book is for Big Data professionals who want to master the features of Apache Spark and bring speed and ease of use to large-scale data processing tasks. A basic understanding of the Apache Spark ecosystem is sufficient to get the most out of this book.
What You Will Learn
* Explore the Spark ecosystem and learn to deploy it on large-scale clusters
* Perform basic Spark operations through a MovieLens data analysis (see the sketch after this list)
* Learn how to perform data analysis using Spark Streaming and Spark SQL
* Understand how to predict flight delays with MLlib
* Learn how to forecast sales with SparkR
* Write PySpark code to build a recommendation engine
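As a taste of the MovieLens-style analysis listed above, a minimal PySpark sketch might look like the following. The file path and column names (userId, movieId, rating) are assumptions for illustration, not the book's own dataset layout.

```python
# Minimal sketch: load MovieLens-style ratings and run a Spark SQL aggregation.
# The file path and column names are assumptions, not taken from the book.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("movielens-basics").getOrCreate()

ratings = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("ratings.csv"))          # hypothetical MovieLens ratings file

ratings.createOrReplaceTempView("ratings")

# Average rating and rating count per movie, highest-rated first,
# keeping only movies with a reasonable number of ratings.
top_movies = spark.sql("""
    SELECT movieId,
           AVG(rating) AS avg_rating,
           COUNT(*)    AS num_ratings
    FROM ratings
    GROUP BY movieId
    HAVING COUNT(*) >= 100
    ORDER BY avg_rating DESC
    LIMIT 10
""")
top_movies.show()

spark.stop()
```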
In Detail
Apache Spark is one of the most popular Big Data tools in use today, across industries ranging from e-commerce and entertainment to travel and retail. This book demonstrates how to leverage the capabilities of Apache Spark in practical projects built around real-world scenarios.
The book begins with a quick introduction to all the components of the Spark ecosystem and then teaches readers how to use them in real-world scenarios. It demonstrates how to use each component of the Apache Spark ecosystem, i.e. Spark SQL, Spark Streaming, Spark MLlib, and PySpark, to build an efficient, end-to-end Big Data processing pipeline. The projects covered include sales forecasting with SparkR and a recommendation engine built with PySpark. Readers will learn about the different libraries, such as MLlib, Spark SQL, GraphX, and Spark Streaming, and will come away able to manage their own Big Data pipelines with Apache Spark.
By the end of the book, you will have mastered all aspects of Apache Spark and be able to use them in your own Big Data projects without any hassle.
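For the PySpark recommendation-engine project mentioned above, a minimal sketch using Spark MLlib's ALS (alternating least squares) could look like this; the input file and column names are assumed for illustration and are not the book's own code.

```python
# Minimal sketch of a collaborative-filtering recommender with Spark MLlib's ALS.
# The ratings file layout (userId, movieId, rating) is an assumption.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("recommender-sketch").getOrCreate()

ratings = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("ratings.csv"))          # hypothetical MovieLens-style file

train, test = ratings.randomSplit([0.8, 0.2], seed=42)

# coldStartStrategy="drop" discards test users/items unseen in training,
# so the RMSE below is computed only on predictable rows.
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          coldStartStrategy="drop", rank=10, maxIter=10, regParam=0.1)
model = als.fit(train)

# Evaluate with RMSE on the held-out split.
predictions = model.transform(test)
rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(predictions)
print(f"RMSE: {rmse:.3f}")

# Top-5 movie recommendations for every user.
model.recommendForAllUsers(5).show(truncate=False)

spark.stop()
```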
Anirudh Ramanathan has been a software engineer in distributed systems at Google since May 2016 and is an Apache Spark committer. He works on Kubernetes with a focus on batch processing, distributed databases, machine learning, and Big Data pipelines. He started the Kubernetes scheduler project in Apache Spark and is also a founding engineer on the Kubeflow project. At Google he works at the intersection of containers and tools such as Spark, Airflow, TensorFlow, and Jupyter. Prior to this, he was a Master's student in the Computer Science department at Texas A&M University. His interests lie in distributed systems and programming languages.
| Publication date (per publisher) | 14 Feb 2020 |
|---|---|
| Place of publication | Birmingham |
| Language | English |
| Dimensions | 191 x 235 mm |
| Subject area | Mathematics / Computer Science ► Computer Science ► Databases |
| | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
| ISBN-10 | 1-78899-595-3 / 1788995953 |
| ISBN-13 | 978-1-78899-595-5 / 9781788995955 |
| Condition | New |