Data Mining and Predictive Analytics (eBook)

eBook download: EPUB
2015 | 2nd edition
John Wiley & Sons (publisher)
978-1-118-86870-6 (ISBN)

Data Mining and Predictive Analytics - Daniel T. Larose
EUR 127.99 incl. VAT
(CHF 124.95)
eBooks are sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

Learn methods of data analysis and their application to real-world data sets

This updated second edition serves as an introduction to data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified 'white box' approach to data mining methods and models. This approach walks readers through the operations and nuances of the various methods, using small data sets, so that readers gain insight into the inner workings of the method under review. Chapters provide readers with hands-on analysis problems, giving them an opportunity to apply their newly acquired data mining expertise to solving real problems using large, real-world data sets.

Data Mining and Predictive Analytics:

  • Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
  • Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
  • Provides a detailed case study that brings together the lessons learned in the book
  • Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content

Data Mining and Predictive Analytics will appeal to computer science and statistics students, as well as students in MBA programs and chief executives.



Daniel T. Larose is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. He has published several books, including Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007) and Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005). In addition to his scholarly work, Dr. Larose is a consultant in data mining and statistical analysis working with many high-profile clients, including Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc.
Chantal D. Larose is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU).  She has co-authored three books on data science and predictive analytics.  She helped develop data science programs at ECSU and at SUNY New Paltz.  She received her PhD in Statistics from the University of Connecticut, Storrs in 2015 (dissertation title: Model-based Clustering of Incomplete Data).


Preface


What is Data Mining? What is Predictive Analytics?


Data mining is the process of discovering useful patterns and trends in large data sets.

Predictive analytics is the process of extracting information from large data sets in order to make predictions and estimates about future outcomes.

Data Mining and Predictive Analytics, by Daniel Larose and Chantal Larose, will enable you to become an expert in these cutting-edge, profitable fields.

Why is this Book Needed?


According to the research firm MarketsandMarkets, the global big data market is expected to grow by 26% per year from 2013 to 2018, from $14.87 billion in 2013 to $46.34 billion in 2018. Corporations and institutions worldwide are learning to apply data mining and predictive analytics in order to increase profits. Companies that do not apply these methods will be left behind in the global competition of the twenty-first-century economy.

Humans are inundated with data in most fields. Unfortunately, most of this valuable data, which cost firms millions to collect and collate, are languishing in warehouses and repositories. The problem is that there are not enough trained human analysts available who are skilled at translating all of this data into knowledge, and thence up the taxonomy tree into wisdom. This is why this book is needed.

The McKinsey Global Institute reports:

There will be a shortage of talent necessary for organizations to take advantage of big data. A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data… We project that demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions. … In addition, we project a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of big data effectively.

This book is an attempt to help alleviate this critical shortage of data analysts.

Data mining is becoming more widespread every day, because it empowers companies to uncover profitable patterns and trends from their existing databases. Companies and institutions have spent millions of dollars to collect gigabytes and terabytes of data, but are not taking advantage of the valuable and actionable information hidden deep within their data repositories. However, as the practice of data mining becomes more widespread, companies that do not apply these techniques are in danger of falling behind, and losing market share, because their competitors are applying data mining, and thereby gaining the competitive edge.

Who Will Benefit from this Book?


In Data Mining and Predictive Analytics, the step-by-step hands-on solutions of real-world business problems using widely available data mining techniques applied to real-world data sets will appeal to managers, CIOs, CEOs, CFOs, data analysts, database analysts, and others who need to keep abreast of the latest methods for enhancing return on investment.

Using Data Mining and Predictive Analytics, you will learn what types of analysis will uncover the most profitable nuggets of knowledge from the data, while avoiding the potential pitfalls that may cost your company millions of dollars. You will learn data mining and predictive analytics by doing data mining and predictive analytics.

Danger! Data Mining is Easy to do Badly


The growth of new off-the-shelf software platforms for performing data mining has kindled a new kind of danger. The ease with which these applications can manipulate data, combined with the power of the formidable data mining algorithms embedded in the black-box software, makes their misuse proportionally more hazardous.

In short, data mining is easy to do badly. A little knowledge is especially dangerous when it comes to applying powerful models based on huge data sets. For example, analyses carried out on unpreprocessed data can lead to erroneous conclusions, or inappropriate analysis may be applied to data sets that call for a completely different approach, or models may be derived that are built on wholly unwarranted specious assumptions. If deployed, these errors in analysis can lead to very expensive failures. Data Mining and Predictive Analytics will help make you a savvy analyst, who will avoid these costly pitfalls.

“White-Box” Approach


Understanding the Underlying Algorithmic and Model Structures


The best way to avoid costly errors stemming from a blind black-box approach to data mining and predictive analytics is to instead apply a “white-box” methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software.

Data Mining and Predictive Analytics applies this white-box approach by

  • clearly explaining why a particular method or algorithm is needed;
  • getting the reader acquainted with how a method or algorithm works, using a toy example (tiny data set), so that the reader may follow the logic step by step, and thus gain a white-box insight into the inner workings of the method or algorithm;
  • providing an application of the method to a large, real-world data set;
  • using exercises to test the reader's level of understanding of the concepts and algorithms;
  • providing an opportunity for the reader to experience doing some real data mining on large data sets.

Algorithm Walk-Throughs


Data Mining and Predictive Analytics walks the reader through the operations and nuances of the various algorithms, using small data sets, so that the reader gets a true appreciation of what is really going on inside the algorithm. For example, in Chapter 21, we follow step by step as the balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm works through a tiny data set, showing precisely how BIRCH chooses the optimal clustering solution for this data, from start to finish. As far as we know, such a demonstration is unique to this book for the BIRCH algorithm. Also, in Chapter 27, we proceed step by step to find the optimal solution using the selection, crossover, and mutation operators, using a tiny data set, so that the reader may better understand the underlying processes.
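
To give a flavor of such a walk-through, here is a minimal Python sketch of selection, crossover, and mutation, the operators used in genetic algorithms. It is not the book's Chapter 27 example. The toy fitness function (counting 1-bits), the tournament selection scheme, the population size, the mutation rate, and every function name are assumptions chosen only for illustration.

```python
import random


def fitness(bits):
    """Toy fitness (an assumption): the number of 1-bits in the string."""
    return sum(bits)


def select(population, rng):
    """Tournament selection: draw two individuals, keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(parent1, parent2, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=8, pop_size=6, generations=30, rate=0.05, seed=0):
    """Run the loop and return the best individual plus the best
    fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in population)]
    for _ in range(generations):
        # Elitism: carry the current best forward unchanged, so the
        # best fitness can never decrease between generations.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            child = crossover(select(population, rng),
                              select(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = children
        history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), history


best, history = evolve()
```

Printing `history` shows how the best fitness changes from one generation to the next, which is the kind of step-by-step trace a tiny data set makes easy to follow by hand.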

Applications of the Algorithms and Models to Large Data Sets


Data Mining and Predictive Analytics provides examples of the application of data analytic methods on actual large data sets. For example, in Chapter 9, we analytically unlock the relationship between nutrition rating and cereal content using a real-world data set. In Chapter 4, we apply principal components analysis to real-world census data about California. All data sets are available from the book series web site: www.dataminingconsultant.com.

Chapter Exercises: Checking to Make Sure You Understand It


Data Mining and Predictive Analytics includes over 750 chapter exercises, which allow readers to assess their depth of understanding of the material, as well as have a little fun playing with numbers and data. These include Clarifying the Concept exercises, which help to clarify some of the more challenging concepts in data mining, and Working with the Data exercises, which challenge the reader to apply the particular data mining algorithm to a small data set, and, step by step, to arrive at a computationally sound solution. For example, in Chapter 14, readers are asked to find the maximum a posteriori classification for the data set and network provided in the chapter.
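
As a rough illustration of what a maximum a posteriori (MAP) classification involves, the sketch below picks the class whose posterior probability, proportional to prior times likelihood, is largest. It assumes the simplest case, in which features are conditionally independent given the class (a naive Bayes structure), not the network from Chapter 14. The probabilities, feature names, and function name are hypothetical.

```python
from math import prod


def map_classify(priors, likelihoods, evidence):
    """Return (MAP class, normalised posteriors), assuming features are
    conditionally independent given the class.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: {value: P(value | class)}}}
    evidence:    {feature: observed value}
    """
    scores = {
        c: priors[c] * prod(likelihoods[c][f][v] for f, v in evidence.items())
        for c in priors
    }
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical probabilities, for illustration only.
priors = {"good": 0.7, "bad": 0.3}
likelihoods = {
    "good": {"income": {"high": 0.6, "low": 0.4},
             "owns_home": {"yes": 0.8, "no": 0.2}},
    "bad": {"income": {"high": 0.2, "low": 0.8},
            "owns_home": {"yes": 0.3, "no": 0.7}},
}

label, posteriors = map_classify(
    priors, likelihoods, {"income": "low", "owns_home": "no"})
# good: 0.7 * 0.4 * 0.2 = 0.056; bad: 0.3 * 0.8 * 0.7 = 0.168
# label is "bad", with posterior 0.168 / 0.224, about 0.75
```

Note that the prior favors "good", yet the evidence shifts the MAP class to "bad"; working such a case by hand is the point of this kind of exercise.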

Hands-On Analysis: Learn Data Mining by Doing Data Mining


Most chapters provide the reader with Hands-On Analysis problems, representing an opportunity for the reader to apply his or her newly acquired data mining expertise to solving real problems using large data sets. Many people learn by doing. Data Mining and Predictive Analytics provides a framework where the reader can learn data mining by doing data mining. For example, in Chapter 13, readers are challenged to approach a real-world credit approval classification data set and construct their best possible logistic regression model, using as many of the methods learned in this chapter as possible, and providing strong interpretive support for the model, including explanations of derived variables and indicator variables.
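
For readers who have not yet met logistic regression, the following Python sketch fits a small model with one numeric predictor and one indicator (0/1) variable using plain batch gradient ascent. It is only an illustration under stated assumptions: the eight applicants are invented rather than taken from the credit approval data set, and the learning rate, epoch count, and function names are arbitrary choices, not the book's method.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient ascent on the
    log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            features = [1.0] + list(x)
            error = y - sigmoid(sum(w * f for w, f in zip(weights, features)))
            for j, f in enumerate(features):
                grad[j] += error * f
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_proba(weights, x):
    """Estimated probability of the positive class for one record."""
    features = [1.0] + list(x)
    return sigmoid(sum(w * f for w, f in zip(weights, features)))


# Hypothetical applicants: (income in $10,000s, owns_home indicator).
rows = [(2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (3, 1), (6, 0)]
labels = [0, 0, 1, 0, 1, 1, 0, 1]  # 1 = approved

weights = fit_logistic(rows, labels)
intercept, b_income, b_owns_home = weights
```

Interpretive support then means reading the coefficients: `exp(b_owns_home)` is the estimated odds ratio for home owners versus non-owners at the same income, under this toy model.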

Exciting New Topics


Data Mining and Predictive Analytics contains many exciting new topics, including the following:

  • Cost-benefit analysis using data-driven misclassification costs.
  • Cost-benefit analysis for trinary and k-nary classification models.
  • Graphical evaluation of classification models.
  • BIRCH clustering.
  • Segmentation models.
  • Ensemble methods: Bagging and boosting.
  • Model voting and propensity averaging.
  • Imputation of missing data.
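
Three of these topics are simple enough to sketch in a few lines of Python: scoring a classifier with misclassification costs, model voting, and propensity averaging. The cost values, model outputs, and function names below are hypothetical and serve only to show the arithmetic; they are not taken from the book.

```python
from collections import Counter


def total_cost(actual, predicted, cost):
    """Sum the cost of every (actual, predicted) pair.

    cost[(a, p)] is the cost of predicting p when the truth is a;
    a negative value represents a benefit.
    """
    return sum(cost[(a, p)] for a, p in zip(actual, predicted))


def majority_vote(predictions):
    """Model voting: each model casts one vote per record."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions)]


def average_propensity(propensities, threshold=0.5):
    """Propensity averaging: average the models' positive-class
    probabilities per record, then apply a threshold."""
    means = [sum(p) / len(p) for p in zip(*propensities)]
    return [1 if m >= threshold else 0 for m in means]


# Hypothetical costs: a correct positive earns 50, a missed positive
# costs 100, a false alarm costs 20, a correct negative costs nothing.
cost = {(1, 1): -50, (1, 0): 100, (0, 1): 20, (0, 0): 0}

actual = [1, 0, 1, 0, 1]
model_a = [1, 0, 0, 0, 1]
model_b = [1, 1, 1, 0, 0]
model_c = [0, 0, 1, 0, 1]

# With three binary models there are no tied votes.
voted = majority_vote([model_a, model_b, model_c])  # [1, 0, 1, 0, 1]

cost_a = total_cost(actual, model_a, cost)    # -50 + 0 + 100 + 0 - 50 = 0
cost_vote = total_cost(actual, voted, cost)   # 3 * (-50) = -150

averaged = average_propensity([[0.9, 0.2, 0.4],
                               [0.6, 0.7, 0.8]])  # [1, 0, 1]
```

In this toy case the voted ensemble has a lower total cost than model A alone, which is the kind of comparison a data-driven cost-benefit analysis is meant to support.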

The R Zone


R is a powerful, open-source language for exploring and analyzing data sets (www.r-project.org). Analysts using R can take advantage of many freely available packages, routines, and graphical user interfaces to tackle most data analysis problems. In most chapters of this book,...

Publication date (per publisher) 16 March 2015
Series Wiley Series on Methods and Applications in Data Mining
Language English
Subject areas Computer Science > Databases > Data Warehouse / Data Mining
Mathematics / Computer Science > Computer Science > Networks
Keywords Bayesian probability • big-data tools • Bioinformatics • clustering algorithms • Computer Science • Cost-Benefit Analysis • Data Analysis • Database & Data Warehousing Technologies • Data Mining • Data Mining Statistics • Decision Tree • Finance & Investments • Kohonen networks • linear regression • multivariate analysis • Predictive Modelling • R statistical programming • Statistics
ISBN-10 1-118-86870-6 / 1118868706
ISBN-13 978-1-118-86870-6 / 9781118868706

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction books. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which in our experience frequently has problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
