Pandas Cookbook - William Ayd, Matthew Harrison

Pandas Cookbook (eBook)

Practical recipes for scientific computing, time series, and exploratory data analysis using Python

William Ayd, Matthew Harrison (Authors)

eBook Download: EPUB

2024
404 pages
Packt Publishing (Publisher)
978-1-83620-586-9 (ISBN)

Unlock the full power of pandas 2.x with this hands-on cookbook, designed for Python developers, data analysts, and data scientists who need fast, efficient solutions for real-world data challenges. This book provides practical, ready-to-use recipes to streamline your workflow. With step-by-step guidance, you'll master data wrangling, visualization, performance optimization, and scalable data analysis using pandas' most powerful features.
From importing and merging large datasets to advanced time series analysis and SQL-like operations, this cookbook equips you with the tools to analyze, manipulate, and visualize data like a pro. Learn how to boost efficiency, optimize memory usage, and seamlessly integrate pandas with NumPy, PyArrow, and databases. This book will help you transform raw data into actionable insights with ease.

From fundamental techniques to advanced strategies for handling big data, visualization, and more, this book equips you with skills to excel in real-world data analysis projects.

Free with your book: DRM-free PDF version + access to Packt's next-gen Reader*

*Email sign-up and proof of purchase required

Key Features

This book targets features in pandas 2.x and beyond
Practical, easy-to-implement recipes for quick solutions to common problems in data using pandas
Master the fundamentals of pandas to quickly begin exploring any dataset

What you will learn

The pandas type system and how to best navigate it
Import/export DataFrames to/from common data formats
Data exploration in pandas through dozens of practice problems
Grouping, aggregation, transformation, reshaping, and filtering data
Merge data from different sources through pandas SQL-like operations
Leverage the robust pandas time series functionality in advanced analyses
Scale pandas operations to get the most out of your system
The large ecosystem that pandas can coordinate with and supplement

Who this book is for

This book is for Python developers, data scientists, engineers, and analysts. pandas is the ideal tool for manipulating structured data with Python and this book provides ample instruction and examples. Not only does it cover the basics required to be proficient, but it goes into the details of idiomatic pandas.

Preface

pandas is a library for creating and manipulating structured data with Python. What do I mean by structured? I mean tabular data in rows and columns like what you would find in a spreadsheet or database. Data scientists, analysts, programmers, engineers, and others are leveraging it to mold their data.

pandas is limited to “small data” (data that can fit in memory on a single machine). However, its syntax and operations have been adopted by, or have inspired, other projects: PySpark, Dask, and cuDF, among others. These projects have different goals, but some of them will scale out to big data. So, there is value in understanding how pandas works, as its features are becoming the de facto API for interacting with structured data.
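To make the "fits in memory" constraint concrete, the following minimal sketch measures how many bytes a small DataFrame occupies. The column names and sizes are invented for illustration and do not come from the book; the only pandas call relied on is DataFrame.memory_usage.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 1,000 rows of 64-bit integers and 64-bit floats
df = pd.DataFrame({
    "id": np.arange(1_000, dtype="int64"),
    "value": np.zeros(1_000, dtype="float64"),
})

# Bytes used per column (and by the index); deep=True also counts
# the contents of object columns, if there are any
per_column = df.memory_usage(index=True, deep=True)
total_bytes = int(per_column.sum())

print(per_column)
print(f"total: {total_bytes} bytes")
```

Comparing a figure like this against the RAM available on your machine gives a rough sense of whether a dataset is still "small data" in the sense used above.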

I, Will Ayd, have been a core maintainer of the pandas library since 2018. During that time, I have had the pleasure of contributing to and collaborating on a host of other open source projects in the same ecosystem, including but not limited to Arrow, NumPy and Cython.

I also consult for a living, utilizing the same ecosystem that I contribute to. Using the best open source tooling, I help clients develop data strategies, implement processes and patterns, and train associates to stay ahead of the ever-changing analytics curve. I strongly believe in the freedom that open source tooling provides, and have proven that value to many companies.

If your company is interested in optimizing your data strategy, feel free to reach out (will_ayd@innobi.io).

Who this book is for

This book contains a huge number of recipes, ranging from very simple to advanced. All recipes strive to be written in clear, concise, and modern idiomatic pandas code. The How it works sections contain extremely detailed descriptions of the intricacies of each step of the recipe. Often, in the There’s more… section, you will get what may seem like an entirely new recipe. This book is densely packed with an extraordinary amount of pandas code.

While not strictly required, users are advised to read the book chronologically. The recipes are structured in such a way that they first introduce concepts and features using very small, directed examples, but continuously build from there into more complex applications.

Due to the wide range of complexity, this book can be useful to novice and everyday users alike. It has been my experience that even those who use pandas regularly will not master it without being exposed to idiomatic pandas code. This is somewhat fostered by the breadth that pandas offers. There are almost always multiple ways of completing the same operation, so users can get the result they want but in a very inefficient manner. It is not uncommon to see a performance difference of an order of magnitude or more between two pandas solutions to the same problem.
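As an illustration of two routes to the same result, the sketch below computes a per-row product once with a Python-level loop and once with a vectorized column operation. The data and column names are made up for this example and are not taken from the book's recipes.

```python
import pandas as pd

# Hypothetical order data, invented for this example
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Approach 1: iterate row by row in Python.
# This produces the right answer but does the work one row at a time.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["quantity"])
loop_result = pd.Series(totals_loop, index=df.index)

# Approach 2: multiply whole columns at once (vectorized)
vectorized_result = df["price"] * df["quantity"]

print(loop_result.equals(vectorized_result))
```

Both approaches produce the same values. On a larger frame, timing each one (for example with %timeit in a Jupyter notebook) is a simple way to see how far apart the two can be.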

The only real prerequisite for this book is a fundamental knowledge of Python. It is assumed that the reader is familiar with all the common built-in data containers in Python, such as lists, sets, dictionaries, and tuples.

What this book covers

Chapter 1, pandas Foundations, introduces the main pandas objects, namely, Series, DataFrames, and Index.

Chapter 2, Selection and Assignment, shows you how to sift through the data that you have loaded into any of the pandas data structures.

Chapter 3, Data Types, explores the type system underlying pandas. This is an area that has evolved rapidly and will continue to do so, so knowing the types and what distinguishes them is invaluable information.

Chapter 4, The pandas I/O System, shows why pandas has long been a popular tool to read from and write to a variety of storage formats.

Chapter 5, Algorithms and How to Apply Them, introduces you to the foundation of performing calculations with the pandas data structures.

Chapter 6, Visualization, shows you how pandas can be used directly for plotting, alongside the seaborn library which integrates well with pandas.

Chapter 7, Reshaping DataFrames, discusses the many ways in which data can be transformed and summarized robustly via the pandas pd.DataFrame.

Chapter 8, Group By, showcases how to segment and summarize subsets of your data contained within a pd.DataFrame.

Chapter 9, Temporal Data Types and Algorithms, introduces users to the date/time types which underlie time-series-based analyses that pandas is famous for and highlights usage against real data.

Chapter 10, General Usage/Performance Tips, goes over common pitfalls users run into when using pandas, and showcases the idiomatic solutions.

Chapter 11, The pandas Ecosystem, discusses other open source libraries that integrate, extend, and/or complement pandas.
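As a rough orientation to the objects named in Chapter 1 and the group-by operation covered in Chapter 8, here is a minimal sketch. The column names and values are invented for illustration and are not taken from the book's recipes.

```python
import pandas as pd

# A DataFrame is a table of labeled columns
df = pd.DataFrame({"group": ["x", "y", "x"], "value": [1, 2, 3]})

# Selecting a single column yields a Series
values = df["value"]

# Every Series and DataFrame carries an Index that labels its rows
row_labels = df.index

# Group By: split the rows by "group", then sum "value" within each group
by_group = df.groupby("group")["value"].sum()

print(type(values).__name__, type(row_labels).__name__)
print(by_group)
```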

To get the most out of this book

There are a couple of things you can do to get the most out of this book. First, and most importantly, you should download all the code, which is stored as Jupyter notebooks. While reading through each recipe, run each step of code in the notebook. Make sure you explore on your own as you run through the code. Second, have the pandas official documentation open (http://pandas.pydata.org/pandas-docs/stable/) in one of your browser tabs. The pandas documentation is an excellent resource containing over 1,000 pages of material. There are examples for most of the pandas operations in the documentation, and they will often be directly linked from the See also section. While the documentation covers the basics of most operations, it does so with trivial examples and fake data that don’t reflect situations that you are likely to encounter when analyzing datasets from the real world.

What you need for this book

pandas is a third-party package for the Python programming language and, as of the printing of this book, is transitioning from the 2.x to the 3.x series. The examples in this book should work with a minimum pandas version of 2.0 along with Python versions 3.9 and above.
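A quick way to check an environment against these minimums is sketched below using only the Python standard library. The helper names major_minor and pandas_version are hypothetical, and the parsing assumes a version string that starts with numeric major.minor components, such as 2.2.3.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)   # minimum Python version stated above
MIN_PANDAS = (2, 0)   # minimum pandas version stated above


def major_minor(version):
    """Return the leading (major, minor) pair of a string such as '2.2.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pandas_version():
    """Return the installed pandas version string, or None if it is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info[:2] >= MIN_PYTHON
installed = pandas_version()
pandas_ok = installed is not None and major_minor(installed) >= MIN_PANDAS

print(f"Python OK: {python_ok}; pandas installed: {installed}; pandas OK: {pandas_ok}")
```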

The code in this book will make use of the pandas, NumPy, and PyArrow libraries. Jupyter Notebook files are also a popular way to visualize and inspect code. All of these libraries should be installable via pip or the package manager of your choice. For pip users, you can run:

python -m pip install pandas numpy pyarrow notebook

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support/errata and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the on-screen instructions.

The code bundle for the book is also hosted on GitHub at https://github.com/WillAyd/Pandas-Cookbook-Third-Edition. In case there is an update to the code, it will be updated in the existing GitHub repository.

Running a Jupyter notebook

The suggested method to work through the content of this book is to have a Jupyter notebook up and running so that you can run the code while reading through the recipes. Following along on your computer allows you to go off exploring on your own and gain a deeper understanding than by just reading the book alone.

After installing Jupyter notebook, open a Command Prompt (type cmd at the search bar on Windows, or open Terminal on Mac or Linux) and type:

jupyter notebook

It is not necessary to run this command from your home directory. You can run it from any location, and the contents in the browser will reflect that location. Although we have now started the Jupyter Notebook program, we haven’t actually launched a single individual notebook where we can start developing in Python. To do so, you can click on the New button on the right-hand side of the page, which will drop down a list of all the possible kernels available for you to use. If you are working from a fresh installation, then you will only have a single kernel available to you (Python 3). After selecting the Python 3 kernel, a new tab will open in the browser, where you can start writing Python code.

You can, of course, open previously created notebooks instead of beginning a new one. To do so, navigate through the filesystem provided in the Jupyter Notebook browser home page and select the notebook you want to open. All Jupyter Notebook files end in .ipynb.
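If you would rather locate existing notebooks from Python than by clicking through the file browser, the following sketch lists every .ipynb file under a directory. The function name find_notebooks is hypothetical; the sketch skips the .ipynb_checkpoints folders that Jupyter creates for autosaved copies.

```python
from pathlib import Path


def find_notebooks(root):
    """Return sorted paths of .ipynb files under root, skipping checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


# List the notebooks below the current working directory
for notebook in find_notebooks("."):
    print(notebook)
```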

Alternatively, you may use cloud providers for a notebook environment. Both Google and Microsoft provide free notebook environments that come preloaded with pandas.

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781836205876.

Conventions

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in...

Publication date (per publisher)	31.10.2024
Foreword	Wes McKinney
Language	English
Subject areas	Nonfiction / Guides ► Leisure / Hobby ► Collecting / Collector Catalogs
	Computer Science ► Databases ► Data Warehouse / Data Mining
	Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools
	Mathematics / Computer Science ► Computer Science ► Theory / Studies
ISBN-10	1-83620-586-4 / 1836205864
ISBN-13	978-1-83620-586-9 / 9781836205869

EPUB (DRM-free)

Digital Rights Management: none
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because the purchase grants rights for personal use only.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to displaying fiction and nonfiction. The body text adapts dynamically to the display and font size, so EPUB is also well suited to mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need the free Adobe Digital Editions software.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Softcover

CHF 66.30