
Pandas Cookbook (eBook)

Practical recipes for scientific computing, time series, and exploratory data analysis using Python
eBook Download: EPUB
2024
404 pages
Packt Publishing (publisher)
978-1-83620-586-9 (ISBN)


Pandas Cookbook, by William Ayd and Matthew Harrison
29.99 incl. VAT
(CHF 29.30)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Available for immediate download

Unlock the full power of pandas 2.x with this hands-on cookbook, designed for Python developers, data analysts, and data scientists who need fast, efficient solutions for real-world data challenges. This book provides practical, ready-to-use recipes to streamline your workflow. With step-by-step guidance, you'll master data wrangling, visualization, performance optimization, and scalable data analysis using pandas' most powerful features.
From importing and merging large datasets to advanced time series analysis and SQL-like operations, this cookbook equips you with the tools to analyze, manipulate, and visualize data like a pro. Learn how to boost efficiency, optimize memory usage, and seamlessly integrate pandas with NumPy, PyArrow, and databases. This book will help you transform raw data into actionable insights with ease.



Preface


pandas is a library for creating and manipulating structured data with Python. What do I mean by structured? I mean tabular data in rows and columns like what you would find in a spreadsheet or database. Data scientists, analysts, programmers, engineers, and others are leveraging it to mold their data.

pandas is limited to “small data” (data that can fit in memory on a single machine). However, its syntax and operations have been adopted by, or have inspired, other projects, among them PySpark, Dask, and cuDF. These projects have different goals, but some of them scale out to big data. So there is value in understanding how pandas works, as its features are becoming the de facto API for interacting with structured data.

I, Will Ayd, have been a core maintainer of the pandas library since 2018. During that time, I have had the pleasure of contributing to and collaborating on a host of other open source projects in the same ecosystem, including but not limited to Arrow, NumPy and Cython.

I also consult for a living, utilizing the same ecosystem that I contribute to. Using the best open source tooling, I help clients develop data strategies, implement processes and patterns, and train associates to stay ahead of the ever-changing analytics curve. I strongly believe in the freedom that open source tooling provides, and have proven that value to many companies.

If your company is interested in optimizing your data strategy, feel free to reach out (will_ayd@innobi.io).

Who this book is for


This book contains a huge number of recipes, ranging from very simple to advanced. All recipes strive to be written in clear, concise, and modern idiomatic pandas code. The How it works sections contain extremely detailed descriptions of the intricacies of each step of the recipe. Often, in the There’s more… section, you will get what may seem like an entirely new recipe. This book is densely packed with an extraordinary amount of pandas code.

While not strictly required, users are advised to read the book chronologically. The recipes are structured in such a way that they first introduce concepts and features using very small, directed examples, but continuously build from there into more complex applications.

Given this wide range of complexity, the book is useful to novice and everyday users alike. In my experience, even those who use pandas regularly will not master it without exposure to idiomatic pandas code, partly because of the breadth pandas offers. There are almost always multiple ways to complete the same operation, so users can get the result they want in a very inefficient manner. It is not uncommon to see a performance difference of an order of magnitude or more between two pandas solutions to the same problem.
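As a minimal sketch of that point, the two functions below compute the same result, once with a row-by-row loop and once with a single vectorized expression. The function names, column names, and data are hypothetical and chosen only for illustration; actual timings depend on your data, machine, and pandas version, so none are claimed here.

```python
import pandas as pd

# Illustrative data; the column names are assumptions, not taken from the book.
df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "quantity": [4, 10, 2]})


def total_with_loop(frame: pd.DataFrame) -> pd.Series:
    """Row-by-row approach: correct, but typically slow on large data."""
    values = [row["price"] * row["quantity"] for _, row in frame.iterrows()]
    return pd.Series(values, index=frame.index)


def total_vectorized(frame: pd.DataFrame) -> pd.Series:
    """Idiomatic approach: one operation over whole columns."""
    return frame["price"] * frame["quantity"]


loop_result = total_with_loop(df)
vectorized_result = total_vectorized(df)

# Both approaches should agree; only the way they get there differs.
print(loop_result.equals(vectorized_result))
```

To compare the two on your own data, you could time each function in a notebook with the `%timeit` magic.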

The only real prerequisite for this book is a fundamental knowledge of Python. It is assumed that the reader is familiar with all the common built-in data containers in Python, such as lists, sets, dictionaries, and tuples.

What this book covers


Chapter 1, pandas Foundations, introduces the main pandas objects, namely, Series, DataFrames, and Index.

Chapter 2, Selection and Assignment, shows you how to sift through the data that you have loaded into any of the pandas data structures.

Chapter 3, Data Types, explores the type system underlying pandas. This is an area that has evolved rapidly and will continue to do so, so knowing the types and what distinguishes them is invaluable information.

Chapter 4, The pandas I/O System, shows why pandas has long been a popular tool to read from and write to a variety of storage formats.

Chapter 5, Algorithms and How to Apply Them, introduces you to the foundation of performing calculations with the pandas data structures.

Chapter 6, Visualization, shows you how pandas can be used directly for plotting, alongside the seaborn library which integrates well with pandas.

Chapter 7, Reshaping DataFrames, discusses the many ways in which data can be transformed and summarized robustly via the pandas pd.DataFrame.

Chapter 8, Group By, showcases how to segment and summarize subsets of your data contained within a pd.DataFrame.

Chapter 9, Temporal Data Types and Algorithms, introduces the date/time types that underlie the time-series analyses pandas is famous for, and highlights their usage against real data.

Chapter 10, General Usage/Performance Tips, goes over common pitfalls users run into when using pandas, and showcases the idiomatic solutions.

Chapter 11, The pandas Ecosystem, discusses other open source libraries that integrate, extend, and/or complement pandas.
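For orientation, the three objects named under Chapter 1 can be sketched as follows. This is not an excerpt from the book; the labels, values, and variable names are made up for illustration.

```python
import pandas as pd

# A pd.Index labels the rows; the labels here are arbitrary examples.
index = pd.Index(["a", "b", "c"], name="label")

# A pd.Series is a one-dimensional array of values paired with an index.
series = pd.Series([10, 20, 30], index=index, name="score")

# A pd.DataFrame is a two-dimensional table whose columns share one index.
frame = pd.DataFrame({"score": series, "passed": [False, True, True]})

print(frame)
```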

To get the most out of this book


There are a couple of things you can do to get the most out of this book. First, and most importantly, download all the code, which is stored as Jupyter notebooks. While reading through each recipe, run each step of code in the notebook, and make sure you explore on your own as you go. Second, keep the official pandas documentation (http://pandas.pydata.org/pandas-docs/stable/) open in a browser tab. The pandas documentation is an excellent resource containing over 1,000 pages of material. It has examples for most pandas operations, and these are often linked directly from the See also section. While it covers the basics of most operations, it does so with trivial examples and fake data that don’t reflect the situations you are likely to encounter when analyzing real-world datasets.

What you need for this book


pandas is a third-party package for the Python programming language and, as of the printing of this book, is transitioning from the 2.x to the 3.x series. The examples in this book should work with a minimum pandas version of 2.0 along with Python versions 3.9 and above.

The code in this book will make use of the pandas, NumPy, and PyArrow libraries. Jupyter Notebook files are also a popular way to visualize and inspect code. All of these libraries should be installable via pip or the package manager of your choice. For pip users, you can run:

python -m pip install pandas numpy pyarrow notebook
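After installing, you may want to confirm that your environment meets the requirements stated above. The sketch below uses only the Python standard library; the helper names `installed_versions` and `python_is_supported` are hypothetical, and the script reports versions without comparing them against the pandas 2.0 minimum.

```python
import sys
from importlib import metadata

# Packages named in the install command above.
PACKAGES = ["pandas", "numpy", "pyarrow", "notebook"]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_supported(minimum=(3, 9)):
    """Check the running interpreter against the minimum stated above."""
    return sys.version_info[:2] >= minimum


print("Python supported:", python_is_supported())
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```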

Download the example code files


You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support/errata and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the Support tab.
  3. Click on Code Downloads.
  4. Enter the name of the book in the Search box and follow the on-screen instructions.

The code bundle for the book is also hosted on GitHub at https://github.com/WillAyd/Pandas-Cookbook-Third-Edition. Any updates to the code will be made in that repository.

Running a Jupyter notebook


The suggested way to work through this book is to have a Jupyter notebook up and running so that you can run the code while reading the recipes. Following along on your computer lets you explore on your own and gain a deeper understanding than reading the book alone.

After installing Jupyter notebook, open a Command Prompt (type cmd at the search bar on Windows, or open Terminal on Mac or Linux) and type:

jupyter notebook

It is not necessary to run this command from your home directory. You can run it from any location, and the contents shown in the browser will reflect that location. Although we have now started the Jupyter Notebook program, we haven’t yet launched a notebook in which to start developing in Python. To do so, click the New button on the right-hand side of the page, which drops down a list of the kernels available to you. If you are working from a fresh installation, you will have only a single kernel available (Python 3). After you select the Python 3 kernel, a new tab opens in the browser, where you can start writing Python code.

You can, of course, open previously created notebooks instead of beginning a new one. To do so, navigate through the filesystem provided in the Jupyter Notebook browser home page and select the notebook you want to open. All Jupyter Notebook files end in .ipynb.
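If you have downloaded the code bundle and are unsure where the notebooks live, a short script can list them. This is a minimal sketch using only the standard library; the helper name `find_notebooks` is hypothetical, and it assumes you run it from the directory you want to search.

```python
from pathlib import Path


def find_notebooks(root="."):
    """Return all .ipynb files under root, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        # Skip the autosave copies Jupyter keeps in .ipynb_checkpoints.
        if ".ipynb_checkpoints" not in path.parts
    )


for notebook in find_notebooks():
    print(notebook)
```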

Alternatively, you may use cloud providers for a notebook environment. Both Google and Microsoft provide free notebook environments that come preloaded with pandas.

Download the color images


We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781836205876.

Conventions


There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in...

EPUB (without DRM)

Digital Rights Management: none
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because your purchase grants rights for personal use only.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need the free Adobe Digital Editions software.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/tablet: Whether Apple or Android, you can read this eBook. You will need a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
