Computational Methods for Corpus Annotation and Analysis - Xiaofei Lu

Computational Methods for Corpus Annotation and Analysis (eBook)

Xiaofei Lu (Author)

eBook Download: PDF
2014
XI, 186 pages
Springer Netherlands (Publisher)
978-94-017-8645-4 (ISBN)
106,99 incl. VAT
(CHF 104,50)
eBook sales are handled by Lehmanns Media GmbH (Berlin) at the euro price incl. VAT.
  • Available for immediate download
In the past few decades, the use of increasingly large text corpora has grown rapidly in language and linguistics research. This growth was enabled by remarkable strides in natural language processing (NLP) technology, which allows computers to automatically and efficiently process, annotate and analyze large amounts of spoken and written text in linguistically and/or pragmatically meaningful ways. It has become more desirable than ever before for language and linguistics researchers who use corpora to gain an adequate understanding of the relevant NLP technology so that they can take full advantage of its capabilities.
This volume provides language and linguistics researchers with an accessible introduction to the state-of-the-art NLP technology that facilitates automatic annotation and analysis of large text corpora at both shallow and deep linguistic levels. The book covers a wide range of computational tools for lexical, syntactic, semantic, pragmatic and discourse analysis, together with detailed instructions on how to obtain, install and use each tool on different operating systems and platforms. The book illustrates how NLP technology has been applied in recent corpus-based language studies and suggests effective ways to better integrate such technology into future corpus linguistics research.
This book provides language and linguistics researchers with a valuable reference for corpus annotation and analysis.

Preface

Chapter 1 Introduction
  1.1 Objectives and Rationale of the Book
  1.2 Why Do We Need to Go Beyond Raw Corpora
  1.3 What is Corpus Annotation
  1.4 Organization of the Book

Chapter 2 Text Processing with the Command Line Interface
  2.1 The Command Line Interface
  2.2 Basic Commands
    2.2.1 Notational Conventions
    2.2.2 Printing the Current Working Directory
    2.2.3 Listing Files and Subdirectories
    2.2.4 Making New Directories
    2.2.5 Changing Directory Locations
    2.2.6 Creating and Editing Text Files with UTF-8 Encoding
    2.2.7 Viewing, Renaming, Moving, Copying, and Removing Files
    2.2.8 Copying, Moving, and Removing Directories
    2.2.9 Using Shell Meta-Characters for File Matching
    2.2.10 Manual Pages, Command History, and Command Line Completion
  2.3 Tools for Text Processing
    2.3.1 Searching for a String with egrep
    2.3.2 Regular Expressions
    2.3.3 Character Translation with tr
    2.3.4 Editing Files from the Command Line with sed
    2.3.5 Data Filtering and Manipulation Using awk
    2.3.6 Task Decomposition and Pipes
  2.4 Summary

Chapter 3 Lexical Annotation
  3.1 Part-of-Speech Tagging
    3.1.1 What is Part-of-Speech Tagging
    3.1.2 Understanding Part-of-Speech Tagsets
    3.1.3 The Stanford Part-of-Speech Tagger
  3.2 Lemmatization
    3.2.1 What is Lemmatization and Why is it Useful
    3.2.2 The TreeTagger
  3.3 Additional Tools
    3.3.1 The Stanford Tokenizer
    3.3.2 The Stanford Word Segmenter for Arabic and Chinese
    3.3.3 The CLAWS Tagger for English
    3.3.4 The Morpha Lemmatizer for English
  3.4 Summary

Chapter 4 Lexical Analysis
  4.1 Frequency Lists
    4.1.1 Working with Output Files from the TreeTagger
    4.1.2 Working with Output Files from the Stanford POS Tagger and Morpha
    4.1.3 Analyzing Frequency Lists with Text Processing Tools
  4.2 N-grams
  4.3 Lexical Richness
    4.3.1 Lexical Density
    4.3.2 Lexical Variation
    4.3.3 Lexical Sophistication
    4.3.4 Tools for Lexical Richness Analysis
  4.4 Summary

Chapter 5 Syntactic Annotation
  5.1 Syntactic Parsing Overview
    5.1.1 What is Syntactic Parsing and Why is it Useful
    5.1.2 Phrase Structure Grammars
    5.1.3 Dependency Grammars
  5.2 Syntactic Parsers
    5.2.1 The Stanford Parser
    5.2.2 Collins’ Parser
  5.3 Summary

Chapter 6 Syntactic Analysis
  6.1 Querying Syntactically Parsed Corpora
    6.1.1 Tree Relationships
    6.1.2 Tregex
  6.2 Syntactic Complexity Analysis
    6.2.1 Measures of Syntactic Complexity
    6.2.2 Syntactic Complexity Analyzers
  6.3 Summary

Chapter 7 Semantic, Pragmatic and Discourse Analysis
  7.1 Semantic Field Analysis
    7.1.1 The UCREL Semantic Analysis System
    7.1.2 Profile in Semantics-Lexical in Computerized Profiling
  7.2 Analysis of Propositions
    7.2.1 Computerized Propositional Idea Density Rater
    7.2.2 Analysis of Propositions in Computerized Profiling
  7.3 Conversational Act Analysis in Computerized Profiling
  7.4 Coherence and Cohesion Analysis in Coh-Metrix
  7.5 Text Structure Analysis
  7.6 Summary

Chapter 8 Summary and Outlook
  8.1 Summary of the Book
  8.2 Future Directions in Computational Corpus Analysis
    8.2.1 Computational Analysis of Language Meaning and Use
    8.2.2 Computational Analysis of Learner Language
    8.2.3 Computational Analysis Based on Specific Language Theories

Appendix

Publication date (per publisher): 8 July 2014
Additional info: XI, 186 p. 22 illus.
Place of publication: Dordrecht
Language: English
Subject areas: School textbooks / Dictionaries > Dictionaries / Foreign languages
Humanities > Linguistics / Literary studies > Linguistics
Computer science > Theory / Studies > Artificial intelligence / Robotics
Keywords: Analysis of large text corpora • Collins' parser • Corpus annotation • Creating, editing text files with UTF-8 encoding • Lexical annotation • Natural language processing NLP • Phrase structure grammars • Task decomposition and pipes • Text processing with the command line interface • The UCREL semantic analysis system
ISBN-10 94-017-8645-3 / 9401786453
ISBN-13 978-94-017-8645-4 / 9789401786454
PDF (watermark)

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for specialist books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read with (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
