
Practical Text Mining with Perl (eBook)

Roger Bilisoly (Author)

eBook download: PDF
2008 | 1st edition
320 pages
John Wiley & Sons (publisher)
9780470382851 (ISBN)

112,99 incl. VAT
(CHF 109,95)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros, incl. VAT.
  • Download available immediately
Provides readers with the methods, algorithms, and means to
perform text mining tasks

This book is devoted to the fundamentals of text mining using
Perl, an open-source programming tool that is freely available via
the Internet (www.perl.org). It covers mining ideas from several
perspectives--statistics, data mining, linguistics, and information
retrieval--and provides readers with the means to successfully
complete text mining tasks on their own.

The book begins with an introduction to regular expressions, a
text pattern methodology, and quantitative text summaries, all of
which are fundamental tools for analyzing text. Then, it builds upon
this foundation to explore:

* Probability and texts, including the bag-of-words model

* Information retrieval techniques such as the TF-IDF similarity
measure

* Concordance lines and corpus linguistics

* Multivariate techniques such as correlation, principal
components analysis, and clustering

* Perl modules, German, and permutation tests

Each chapter is devoted to a single key topic, and the author
carefully and thoughtfully introduces mathematical concepts as they
arise, allowing readers to learn as they go without having to refer
to additional books. The inclusion of numerous exercises and
worked-out examples further complements the book's student-friendly
format.

Practical Text Mining with Perl is ideal as a textbook
for undergraduate and graduate courses in text mining and as a
reference for a variety of professionals who are interested in
extracting information from text documents.

Roger Bilisoly, PhD, is an Assistant Professor of Statistics at Central Connecticut State University, where he developed and teaches a new graduate-level course in text mining for the school's data mining program.

List of Figures.

List of Tables.

Preface.

Acknowledgments.

1. Introduction.

1.1 Overview of this Book.

1.2 Text Mining and Related Fields.

1.3 Advice for Reading this Book.

2. Text Patterns.

2.1 Introduction.

2.2 Regular Expressions.

2.3 Finding Words in a Text.

2.4 Decomposing Poe's "The Tell-Tale Heart" into Words.

2.5 A Simple Concordance.

2.6 First Attempt at Extracting Sentences.

2.7 Regex Odds and Ends.

2.8 References.

3. Quantitative Text Summaries.

3.1 Introduction.

3.2 Scalars, Interpolation, and Context in Perl.

3.3 Arrays and Context in Perl.

3.4 Word Lengths in Poe's "The Tell-Tale Heart".

3.5 Arrays and Functions.

3.6 Hashes.

3.7 Two Text Applications.

3.8 Complex Data Structures.

3.9 References.

3.10 First Transition.

4. Probability and Text Sampling.

4.1 Introduction.

4.2 Probability.

4.3 Conditional Probability.

4.4 Mean and Variance of Random Variables.

4.5 The Bag-of-Words Model for Poe's "The Black Cat".

4.6 The Effect of Sample Size.

4.7 References.

5. Applying Information Retrieval to Text Mining.

5.1 Introduction.

5.2 Counting Letters and Words.

5.3 Text Counts and Vectors.

5.4 The Term-Document Matrix Applied to Poe.

5.5 Matrix Multiplication.

5.6 Functions of Counts.

5.7 Document Similarity.

5.8 References.

6. Concordance Lines and Corpus Linguistics.

6.1 Introduction.

6.2 Sampling.

6.3 Corpus as Baseline.

6.4 Concordancing.

6.5 Collocations and Concordance Lines.

6.6 Applications with References.

6.7 Second Transition.

7. Multivariate Techniques with Text.

7.1 Introduction.

7.2 Basic Statistics.

7.3 Basic Linear Algebra.

7.4 Principal Component Matrices.

7.5 Text Applications.

7.6 Applications and References.

8. Text Clustering.

8.1 Introduction.

8.2 Clustering.

8.3 A Note on Classification.

8.4 References.

8.5 Last Transition.

9. A Sample of Additional Topics.

9.1 Introduction.

9.2 Perl Modules.

9.3 Other Languages: Analyzing Goethe in German.

9.4 Permutation Tests.

9.5 References.

Appendix A. Overview of Perl for Text Mining.

A.1 Basic Data Structures.

A.2 Operators.

A.3 Branching and Looping.

A.4 A Few Functions.

A.5 Introduction to Regular Expressions.

Appendix B. Summary of R Used in this Book.

B.1 Basics of R.

B.2 This Book's R Code.

References.

Index.

"Practical Text Mining with Perl is an excellent book for readers at a variety of different programming skill levels ... Bilisoly's book would serve as a good text for an introductory text mining course, and could be supplemented with lecture notes for Web mining or data mining courses." (Journal of Statistical Software, January 2009)

Publication date (per publisher): 26.9.2008
Series: Wiley Series on Methods and Applications
Language: English
Subject areas: Computer Science > Databases > Data Warehouse / Data Mining
Mathematics / Computer Science > Computer Science > Networks
Keywords: Bioinformatics & Computational Biology • Computer Science • Database & Data Warehousing Technologies • Data Mining • Data Mining Statistics • Life Sciences • Perl (computing) • Statistics
ISBN-13: 9780470382851

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, as experience shows that problems with Adobe DRM occur frequently with it.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
