Data Analytics - Azhar Ul Haque Sario

Data Analytics (eBook)

From Foundation to Specialization (2025 Edition)
eBook Download: EPUB
2025
214 pages
Azhar Sario Hungary (publisher)
978-3-384-75556-8 (ISBN)

Dive into the World of Data Analytics with This 2025 Guide!


Hey there, if you're looking to master data analytics from the ground up, this book is your ultimate companion. Part I kicks off with the basics: the data analytics landscape, including modern lifecycles like CRISP-DM and agile workflows; the core data structures (structured, semi-structured, and unstructured); key roles like data analyst, data scientist, and ML engineer; and an intro to AI and generative AI in analytics. From there it moves to statistical foundations: descriptive stats, distributions, and inferential methods like hypothesis testing and confidence intervals, followed by correlation, linear regression, and applied linear algebra. Then comes Python programming: setup, Pandas for data manipulation, NumPy for numerical computing, and Scikit-learn for modeling.

Part II dives into the workflow: relational databases and advanced SQL with joins, window functions, and query optimization; data sourcing from enterprise systems, APIs, and web scraping, including AI-powered scraping and its ethical aspects; data prep, covering profiling, deduplication, missing-data handling, outliers, and transformations; and exploratory analysis (univariate, bivariate, and multivariate with PCA), ending with how to communicate findings.


What sets this book apart is its 2025 focus: it blends timeless foundations with current trends like AI automation, real-time streaming, and cloud lakehouses that older books overlook. Unlike generic guides, it packs real-time case studies, like Uber's AI agents or Tesla's data structures, plus job skill enhancements tied to market data showing ML engineer growth at +34%. It bridges gaps other texts miss, like ethical scraping or sentiment analysis for brands, with hands-on Python applications and career pathways. No fluff; it's practical and updated for today's job market, aiming to give you a competitive edge in high-demand roles.


The author has no affiliation with the board, and this book is independently produced under nominative fair use.

PART II: THE CORE ANALYTICS WORKFLOW


 

Database Mechanics and Advanced SQL for Analysts


 

4.1: Relational Databases and Data Definition (DDL/DML)

 

Welcome to the bedrock of data analysis. Before you can analyze data, you have to understand where it lives and what rules govern its existence. For most analysts, that "home" is a relational database.

 

Think of a relational database not as a single, giant spreadsheet, but as a highly organized digital filing cabinet. This cabinet contains many different folders, and each folder is designed to hold one specific type of information. In database terms, the cabinet is the database schema, and each folder is a table. A table organizes information into columns (the categories of information, like "First Name" or "Email Address") and rows (the specific records, like a single customer).

 

The "relational" part is the secret sauce. It means the tables are linked to each other. The Customers table doesn't also contain all their order information; that would be messy and repetitive. Instead, it's related to a separate Orders table. Storing each fact once and linking tables through keys is known as normalization. It reduces redundancy, which saves space and prevents the inconsistencies that creep in when the same value is copied into many rows and only some copies get updated.

 

To manage this structure, we use SQL. Its commands fall into several groups; the two that matter most here are DDL and DML.

 

The Blueprint: Data Definition Language (DDL)

 

First, you must build the cabinet itself. DDL (Data Definition Language) is the set of commands you use to define, build, and modify the database structure. These are the architectural commands. You don't use them every day, but when you do, it's for a significant structural change.

 

The main DDL commands are:

 

CREATE TABLE: This is the command to build a new, empty table. It's here you define every column, what type of data it holds (e.g., text, number, date), and its rules.

 

ALTER TABLE: This modifies an existing table. Need to add a new "Phone Number" column? Need to rename the "Email Address" column to just "Email"? ALTER TABLE is your tool.

 

DROP TABLE: This command completely deletes a table and all the data in it. It's powerful and, unless you have a backup or your database can roll back DDL inside a transaction, irreversible.
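The excerpt shows CREATE TABLE in full further on, but not ALTER TABLE or DROP TABLE. The following is a minimal sketch of those two commands, assuming Python's built-in sqlite3 module and a throwaway in-memory database (neither is the book's setup). The cut-down Customers table and the `column_names` helper are hypothetical, and ALTER TABLE syntax varies between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)

# A cut-down, hypothetical Customers table to experiment on
conn.execute("""
    CREATE TABLE Customers (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT,
        email_address TEXT
    )
""")

def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# ALTER TABLE: add a "Phone Number" column, then rename "Email Address" to "Email"
conn.execute("ALTER TABLE Customers ADD COLUMN phone_number TEXT")
# RENAME COLUMN needs SQLite 3.25 or newer
conn.execute("ALTER TABLE Customers RENAME COLUMN email_address TO email")
columns_after_alter = column_names("Customers")

# DROP TABLE: the table and every row in it are gone
conn.execute("DROP TABLE Customers")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns_after_alter)
print(remaining_tables)
```

Renaming keeps the column in its original position, while a newly added column goes to the end of the table.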

 

The most important rules you define during a CREATE TABLE statement are the keys.

 

PRIMARY KEY (PK): This is a column (or set of columns) that provides a 100% unique identifier for every single row in that table. For a Customers table, this is customer_id. It's like a social security number for that row. No two customers can have the same customer_id. The database enforces this rule.

 

FOREIGN KEY (FK): This is the magic that creates the "relation." A foreign key is a column in one table that points to the primary key in another table. This is how you link tables.
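To see what "the database enforces this rule" means for a primary key, here is a small sketch, again assuming Python's sqlite3 module with an in-memory database. The two-column Customers table is a simplified stand-in for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute(
    "CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT)"
)
conn.execute("INSERT INTO Customers VALUES (101, 'Jane')")

# A second row with the same customer_id violates the primary key
try:
    conn.execute("INSERT INTO Customers VALUES (101, 'John')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("Rejected by the database:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rejection comes from the database itself, not from application code, so the rule holds no matter which program tries to insert the row.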

 

Let's build that e-commerce database. We start with DDL.

SQL

 

/* This is our DDL for defining the 'blueprint' */

-- First, create the 'parent' table
CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    first_name  VARCHAR(100),
    last_name   VARCHAR(100),
    email       VARCHAR(255) UNIQUE
    /* UNIQUE is another 'constraint', like PRIMARY KEY */
);

-- Now, create the 'child' table that links to it
CREATE TABLE Orders (
    order_id     INT PRIMARY KEY,
    order_date   DATE,
    total_amount DECIMAL(10, 2),
    customer_id  INT, /* This column will be our foreign key */

    /* This line is the 'rule' that links the tables */
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

 

What does that FOREIGN KEY line really do? It establishes referential integrity. In simple English, it means the database now guarantees that you cannot create an order with a customer_id that doesn't exist in the Customers table. (As written, the column still accepts NULL, so an order with no customer at all would get through; add NOT NULL if every order must belong to a customer.) I have seen production systems, built in a hurry, that omitted this rule. Six months later, they had thousands of "orphan" orders in their database: orders with no associated customer. They couldn't be billed, couldn't be shipped, and the data was a nightmare to clean up. DDL and foreign keys are your data quality insurance policy.
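If you inherit a database that skipped the foreign key, one common way to find the orphans is an anti-join: keep every order, try to match a customer, and return only the orders where no match was found. The sketch below assumes Python's sqlite3 module and an in-memory database; the tables are simplified, the FOREIGN KEY is deliberately left out to mimic the rushed system, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);

    -- Deliberately no FOREIGN KEY here, to mimic the rushed system
    CREATE TABLE Orders (
        order_id     INTEGER PRIMARY KEY,
        total_amount NUMERIC,
        customer_id  INTEGER
    );

    INSERT INTO Customers VALUES (101, 'Jane');
    INSERT INTO Orders VALUES (5001, 79.99, 101);  -- valid customer
    INSERT INTO Orders VALUES (5002, 15.00, 999);  -- customer 999 does not exist
""")

# Anti-join: orders whose customer_id has no matching row in Customers
orphans = conn.execute("""
    SELECT o.order_id, o.customer_id
    FROM Orders AS o
    LEFT JOIN Customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY o.order_id
""").fetchall()

print(orphans)
```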

 

The Action: Data Manipulation Language (DML)

 

If DDL builds the house, DML (Data Manipulation Language) is how you move furniture in, rearrange it, and take it out. These are the data commands, the ones you (or the application) use every single day.

 

The main DML commands are:

 

INSERT: This adds new rows (data) into a table.

 

UPDATE: This modifies data in existing rows.

 

DELETE: This removes rows from a table.

 

Let's use our new tables. A new customer, Jane Doe, signs up. That's an INSERT.

SQL

 

/* This is our DML for 'manipulating' the data */

-- Jane signs up. This is an INSERT into the Customers table.
INSERT INTO Customers (customer_id, first_name, last_name, email)
VALUES (101, 'Jane', 'Doe', 'jane.doe@example.com');

 

A few moments later, Jane places an order. That's another INSERT, this time into the Orders table.

SQL

 

-- Jane buys something.
INSERT INTO Orders (order_id, order_date, total_amount, customer_id)
VALUES (5001, '2025-11-13', 79.99, 101);

 

Because of our FOREIGN KEY rule, the database checks: "Does customer_id = 101 exist in the Customers table?" It does. The INSERT is allowed. If we tried to INSERT an order for customer_id = 999 (who doesn't exist), the database would return an error.
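The check described above can be reproduced with a short script. This is a sketch that assumes Python's sqlite3 module and an in-memory database rather than whatever system the book targets. One SQLite-specific caveat: foreign key enforcement is off by default and has to be switched on per connection with `PRAGMA foreign_keys = ON`. The second order (5002) is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # in-memory database (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
    CREATE TABLE Customers (
        customer_id INT PRIMARY KEY,
        first_name  VARCHAR(100),
        last_name   VARCHAR(100),
        email       VARCHAR(255) UNIQUE
    );
    CREATE TABLE Orders (
        order_id     INT PRIMARY KEY,
        order_date   DATE,
        total_amount DECIMAL(10, 2),
        customer_id  INT,
        FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
    );
    INSERT INTO Customers VALUES (101, 'Jane', 'Doe', 'jane.doe@example.com');
""")

# customer_id 101 exists, so this INSERT is allowed
conn.execute("INSERT INTO Orders VALUES (5001, '2025-11-13', 79.99, 101)")

# customer_id 999 does not exist, so the database refuses the row
try:
    conn.execute("INSERT INTO Orders VALUES (5002, '2025-11-14', 20.00, 999)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Rejected by the database:", exc)

order_ids = [row[0] for row in conn.execute("SELECT order_id FROM Orders ORDER BY order_id")]
print(order_ids)
```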

 

The next day, Jane realizes she mistyped her name. She goes to her profile and changes it. That's an UPDATE.

SQL

 

-- Jane corrects her name.
UPDATE Customers
SET first_name = 'Janette'
WHERE customer_id = 101;
/* The WHERE clause is critical! Without it, you'd update *every* customer! */
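The warning about the missing WHERE clause is easy to verify safely inside a transaction that is rolled back afterwards. The sketch below assumes Python's sqlite3 module and an in-memory database; the cut-down Customers table and the two extra customers (Omar and Li) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute(
    "CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(101, "Jane"), (102, "Omar"), (103, "Li")],
)
conn.commit()

# Without WHERE: every row is changed. rowcount reports how many.
cur = conn.execute("UPDATE Customers SET first_name = 'Janette'")
rows_touched_without_where = cur.rowcount
conn.rollback()  # undo the mistake before it is committed

# With WHERE: only Jane's row is changed.
cur = conn.execute(
    "UPDATE Customers SET first_name = ? WHERE customer_id = ?",
    ("Janette", 101),
)
rows_touched_with_where = cur.rowcount
conn.commit()

names = [row[0] for row in conn.execute(
    "SELECT first_name FROM Customers ORDER BY customer_id"
)]
print(rows_touched_without_where, rows_touched_with_where, names)
```

Checking the affected-row count before committing is a cheap habit that catches this class of mistake.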

 

Finally, after a few years, Janette decides to close her account and requests her data be erased. That's a DELETE.

SQL

 

-- First, we must delete her orders (the 'child' records)
DELETE FROM Orders
WHERE customer_id = 101;

-- Now we can delete her main record (the 'parent' record)
DELETE FROM Customers
WHERE customer_id = 101;
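Why must the child rows go first? With the foreign key in place, the database refuses to delete a customer who still has orders. The following sketch shows both orders of operation, assuming Python's sqlite3 module, an in-memory database, and simplified versions of the two tables. SQLite needs `PRAGMA foreign_keys = ON` for the rule to be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # in-memory database (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
    );
    INSERT INTO Customers VALUES (101, 'Janette');
    INSERT INTO Orders VALUES (5001, 101);
""")

# Wrong order: deleting the parent while child rows still point at it
try:
    conn.execute("DELETE FROM Customers WHERE customer_id = 101")
    parent_first_blocked = False
except sqlite3.IntegrityError as exc:
    parent_first_blocked = True
    print("Rejected by the database:", exc)

# Right order: children first, then the parent
conn.execute("DELETE FROM Orders WHERE customer_id = 101")
conn.execute("DELETE FROM Customers WHERE customer_id = 101")
conn.commit()

remaining = (
    conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0],
)
print(parent_first_blocked, remaining)
```

Many databases also let you declare the foreign key with ON DELETE CASCADE so that child rows are removed automatically; whether that is appropriate depends on how much you trust a single DELETE to remove that much data.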

 

This DDL/DML distinction is the first concept to get straight: DDL defines the schema, DML changes the data. As an analyst, you'll rarely write DDL, but being able to read it tells you where your data comes from, what rules it obeys, and why it's structured the way it is.

 

4.2: SQL for Data Retrieval: Filtering, Joining, and Aggregating

 

This is the heart of the analyst's toolkit. While DDL builds the database, and DML fills it, data retrieval is about asking questions. The entire practice of data retrieval is centered on one powerful command: SELECT.

 

This module is about mastering the SELECT statement and its various clauses, which act like a set of filters and lenses to turn billions of raw rows into a single, actionable insight.

 

The Core: SELECT, FROM, and WHERE

 

This is the basic structure of almost every query you'll ever write.

 

SELECT: Tells the database which columns you want to see.

 

FROM: Tells the database which table to get them from.

 

WHERE: Filters the rows based on a condition. This is your primary tool for reducing the data down to a manageable, relevant subset.

 

Imagine our Orders table has 10 million rows. We don't want all of them. We just want to see the "big ticket" orders.

SQL

 

SELECT order_id, customer_id, total_amount
FROM Orders
WHERE total_amount > 1000;
/* Only orders above 1000 are returned; every other row is filtered out */

 

The WHERE clause is the workhorse. It can handle dates (WHERE order_date >= '2025-01-01'), text (WHERE region = 'Northeast'), and complex logic (WHERE (total_amount > 500 AND region = 'West') OR promotion_code = 'HOLIDAY25'). The region and promotion_code columns are illustrative; the Orders table defined in 4.1 does not include them.
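The three kinds of filter mentioned above can be tried on a handful of rows. This is a sketch under stated assumptions: Python's sqlite3 module, an in-memory database, and an Orders table extended with hypothetical region and promotion_code columns that are not part of the schema from 4.1. The four sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute("""
    CREATE TABLE Orders (
        order_id       INTEGER PRIMARY KEY,
        order_date     TEXT,     -- ISO dates compare correctly as text
        total_amount   NUMERIC,
        region         TEXT,     -- hypothetical column
        promotion_code TEXT      -- hypothetical column
    )
""")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-12-30", 1500.00, "West", None),
        (2, "2025-02-10", 600.00, "West", None),
        (3, "2025-03-05", 50.00, "Northeast", "HOLIDAY25"),
        (4, "2025-04-01", 700.00, "Northeast", None),
    ],
)

def order_ids(where_sql, params=()):
    # Helper for this sketch only: run one filter and return matching order_ids
    sql = f"SELECT order_id FROM Orders WHERE {where_sql} ORDER BY order_id"
    return [row[0] for row in conn.execute(sql, params)]

big_ticket = order_ids("total_amount > ?", (1000,))
since_2025 = order_ids("order_date >= ?", ("2025-01-01",))
# Parentheses make the AND group explicit before the OR is applied
combined = order_ids(
    "(total_amount > ? AND region = ?) OR promotion_code = ?",
    (500, "West", "HOLIDAY25"),
)

print(big_ticket, since_2025, combined)
```

Order 4 drops out of the combined filter because it is neither a large West order nor a HOLIDAY25 order. Its promotion_code is NULL, and a comparison with NULL never evaluates to true.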

 

The Real Power: JOINs

 

Data is almost never in one table. Your job is to combine tables to answer real questions. You can't find your "best customers" by looking only at the Orders table (which has total_amount) or only at the Customers table (which has first_name and last_name). You need both. JOINs are how you weave them together.

 

The join clause connects tables based on their shared key (the Primary Key/Foreign Key relationship we established).

 

INNER JOIN: This is the most common. It only returns rows that have a match in both tables. Think of it as the "intersection." If you INNER JOIN Customers and Orders, you will only see customers who have placed an order. Customers who haven't ordered yet will be excluded.

 

LEFT JOIN: This is arguably the most useful join for analysts. It says, "Get me everything from the 'left' table (the first one you list), and if you find a match in the 'right' table, bring that data along....
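The difference between the two joins is easiest to see side by side. The sketch below assumes Python's sqlite3 module, an in-memory database, and simplified tables; Omar is an invented customer who has not ordered anything. It relies only on standard SQL behavior: where a LEFT JOIN finds no match, the right-hand columns come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customers(customer_id)
    );
    INSERT INTO Customers VALUES (101, 'Jane');
    INSERT INTO Customers VALUES (102, 'Omar');  -- has not ordered yet
    INSERT INTO Orders VALUES (5001, 101);
    INSERT INTO Orders VALUES (5002, 101);
""")

# INNER JOIN: only customers with at least one matching order
inner_rows = conn.execute("""
    SELECT c.first_name, o.order_id
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL when there is no match
left_rows = conn.execute("""
    SELECT c.first_name, o.order_id
    FROM Customers AS c
    LEFT JOIN Orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
```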

Publication date (per publisher) 15.11.2025
Language English
Subject area Mathematics / Computer Science, Computer Science, Networks
Keywords AI in analytics • career skills • data analytics • Data Visualization • machine learning • Python programming • SQL Mastery
ISBN-10 3-384-75556-1 / 3384755561
ISBN-13 978-3-384-75556-8 / 9783384755568
Format: EPUB (without DRM)

Digital Rights Management: none
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because the purchase grants rights for personal use only.
