Supervised and Unsupervised Learning for Data Science (eBook)

Michael W. Berry, Azlinah Mohamed, Bee Wah Yap (Editors)

eBook Download: PDF

2019 | 1st edition
VIII, 191 pages
Springer-Verlag
978-3-030-22475-2 (ISBN)

This book covers the state of the art in learning algorithms, including semi-supervised methods, to provide a broad scope of clustering and classification solutions for big data applications. Case studies and best practices are included along with theoretical models of learning for a comprehensive reference to the field. The book is organized into eight chapters that cover the following topics: discretization, feature extraction and selection, classification, clustering, topic modeling, graph analysis, and applications. Practitioners and graduate students can use the volume as a reference for their current and future research, and faculty will find it useful for assignments that present current approaches to unsupervised and semi-supervised learning in graduate-level seminar courses. The book is based on selected, expanded papers from the Fourth International Conference on Soft Computing in Data Science (2018).

Includes new advances in clustering and classification using semi-supervised and unsupervised learning;
Addresses new challenges arising in feature extraction and selection using semi-supervised and unsupervised learning;
Features applications from healthcare, engineering, and text/social media mining that exploit techniques from semi-supervised and unsupervised learning.

Professor Michael W. Berry is a Full Professor in the Departments of Electrical Engineering and Computer Science (EECS) and Mathematics at the University of Tennessee, Knoxville. He served as Interim Department Head of Computer Science from January 2004 to June 2007, and as Associate Head in the Department of Electrical Engineering and Computer Science from July 2007 to July 2012. He worked in the Communications Product Division of IBM in Raleigh, NC for about one year before accepting a research staff position in the Center for Supercomputing Research and Development at the University of Illinois at Urbana-Champaign. In 1990, he received a PhD in Computer Science from the University of Illinois at Urbana-Champaign. Prof. Berry is the co-author of 'Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods' (SIAM, 1994) and 'Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition' (Bestseller, SIAM, 2005) and editor of 'Computational Information Retrieval' (SIAM, 2001), 'Survey of Text Mining: Clustering, Classification, and Retrieval' (Springer-Verlag, 2003, 2007), 'Lecture Notes in Data Mining' (Bestseller, World Scientific, 2006), 'Text Mining: Applications and Theory' (Wiley, 2010), and 'High-Performance Scientific Computing' (Springer, 2012). He has published well over 150 peer-refereed journal and conference publications and book chapters. He has organized numerous workshops on Text Mining and was Conference Co-Chair of the 2003 SIAM Third International Conference on Data Mining (May 1-3) in San Francisco, CA. He was Program Co-Chair of the 2004 SIAM Fourth International Conference on Data Mining (April 22-24) in Orlando, FL, and he was a keynote speaker at the 2015 International Conference on Soft Computing in Data Science (SCDS2015). He was also honorary chair of the 2016 International Conference on Soft Computing in Data Science (SCDS2016) in Kuala Lumpur, Malaysia.
His research interests include information retrieval, data and text mining, computational science, bioinformatics, and parallel computing. Prof. Berry's research has been supported by grants and contracts from organizations such as the National Science Foundation, the National Institutes of Health, the U.S. Department of Energy, the National Aeronautics and Space Administration, and the Intel Corporation.

Professor Dr Azlinah Mohamed is a Professor at the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia. She currently serves as the Dean of the faculty; she was previously the Special Officer to the Vice Chancellor and Head of the Academic Affairs and Development Unit of Universiti Teknologi MARA. She received her MSc (Artificial Intelligence) from the University of Bristol, UK and her PhD (Decision Support Systems) from Universiti Kebangsaan Malaysia. Her recent research activities and numerous professional publications in international conferences and local journals focus on her interests in Artificial Intelligence, Decision Support Systems and Soft Computing. She has published well over 180 peer-refereed journal and conference publications and book chapters. She was the Honorary Chair of the 2015, 2016 and 2017 International Conference on Soft Computing in Data Science, and she was a keynote speaker at the 2016 International Conference on Soft Computing in Data Science (SCDS2016). She has also been awarded many competitive grants from ScienceFund, MOSTI and others for both academic and industrial projects, for industry as well as for the government. Her research work includes the Information Professionals' Competency Assessment Model and the Multi-Parametric Pectin Lyase-Like Protein Function Classifier, which have won many awards. She is also an active member of the Malaysia Information Technology Society (MITS), Lembaga Akreditasi Negara, Malaysia and the Artificial Intelligence Society.

Professor Bee Wah Yap is a Professor at the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia. She is the Head of the Advanced Analytics Engineering Centre (AAEC), a Centre of Excellence in FSKM. She received her Bachelor of Science (Education)(Hons) degree, majoring in Mathematics, from University of Science Malaysia, her Master of Statistics from University of California Riverside and her PhD (Statistics) from University of Malaya. Her research interests are in data mining, computational statistics and multivariate data analysis. She actively organized the SCDS2015, SCDS2016 and SCDS2017 conferences, which focus on Soft Computing in Data Science. She also actively conducts statistical workshops (IBM SPSS STATISTICS, IBM SPSS AMOS, PLS-SEM, SAS EMINER). She has published papers in ISI journals such as Expert Systems with Applications, Journal of Statistical Computation and Simulation, and Communication in Statistics-Simulation and Computation, and also in Scopus-indexed journals. She is also an active reviewer for international journals such as International Journal of Bank Marketing, Communication in Statistics-Simulation and Computation, and Neurocomputing.

Preface 6
Contents 8
Part I Algorithms 10
1 A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science 11
1.1 Introduction 11
1.1.1 Motivation and Scope 13
1.1.2 Novelty and Review Approach 13
1.2 Search Results 14
1.2.1 EBSCO and ProQuest Central Database Results 14
1.2.2 Distribution of Included Articles 17
1.3 Discussion 17
1.3.1 Decision Tree 18
1.3.2 Naïve Bayes 19
1.3.3 Support Vector Machine 21
1.3.4 k-Means Algorithms 22
1.3.5 Semisupervised and Other Learners 23
1.4 Conclusion and Future Work 23
References 24
2 Overview of One-Pass and Discard-After-Learn Concepts for Classification and Clustering in Streaming Environment with Constraints 30
2.1 Introduction 30
2.2 Constraints and Conditions 31
2.3 Concept of One-Pass and Discard-After-Learn for Classification and Clustering 32
2.4 Structure of Malleable Hyper-ellipsoid Function 35
2.5 Updating Malleable Hyper-ellipsoid Function 36
2.5.1 Recursively Updating Center 36
2.5.2 Recursively Updating Covariance Matrix 36
2.5.3 Merging Two Covariance Matrices 37
2.6 Analysis of Time and Space Complexities of Updating Computation 38
2.7 Applying Discard-After-Learn to Arbitrary Class Drift 38
2.8 Applying Discard-After-Learn to Expired Data in Clustering 41
2.9 Discussion 42
2.10 Conclusion 42
References 43
3 Distributed Single-Source Shortest Path Algorithms with Two-Dimensional Graph Layout 45
3.1 Introduction 45
3.2 Overviews 46
3.2.1 Single-Source Shortest Path Algorithms 46
3.2.2 Two-Dimensional Graph Layout 49
3.3 Novel Parallel SSSP Implementations 51
3.3.1 General Parallel SSSP for Distributed Memory Systems 51
3.3.2 Parallel SSSP with 2D Graph Layout 51
3.3.3 Other Optimizations 54
3.3.4 Summary of Implementations 55
3.4 Performance Results and Analysis 56
3.4.1 Experimental Setup 56
3.4.2 Algorithm and Communication Cost Analysis 57
3.4.3 Benefits of 2D SSSP Algorithms 58
3.4.4 Communication Cost Analysis 59
3.5 Conclusion and Future Work 59
References 63
4 Using Non-negative Tensor Decomposition for Unsupervised Textual Influence Modeling 65
4.1 Introduction 65
4.2 Modeling Influence 66
4.2.1 Tensors and Decompositions 67
4.2.2 Representing Documents as Tensors 71
4.2.3 Modeling Influence 71
4.2.4 Summary of Influence Modeling Procedure 73
4.3 Related Work 73
4.4 Influence Model 74
4.4.1 Approach Overview and Document Preparation 75
4.4.2 Tensor Construction 75
4.4.3 Tensor Decomposition 77
4.4.4 Factor Classification 79
4.5 Implementation 82
4.5.1 Constraining Vocabularies 82
4.6 A Conference Paper Case Study 83
4.7 Conclusions and Future Work 86
References 87
Part II Applications 89
5 Survival Support Vector Machines: A Simulation Study and Its Health-Related Application 90
5.1 Introduction 90
5.2 SURLS-SVM for Survival Analysis 91
5.3 Data Description and Methodology 93
5.4 Empirical Results 94
5.4.1 Effect of Features Dimension and Sample Size 94
5.4.2 Effect of Censoring Percentage 97
5.4.3 Effect of Sample Size 98
5.4.4 Discussion of the Results of the Simulation 101
5.4.5 Application to Health Data 103
5.5 Conclusion 104
References 104
6 Semantic Unsupervised Learning for Word Sense Disambiguation 106
6.1 Introduction 106
6.1.1 Word Sense Disambiguation 106
6.1.2 History and Approaches 107
6.2 Latent Semantic Analysis 108
6.3 LSA-WSD Approach 109
6.3.1 Sense Discovery 110
6.3.2 Sense Identification 110
6.3.3 Semantic Mean Clustering 111
6.4 Sense Discovery Using Synclustering 113
6.4.1 Experimentation Parameters 113
6.4.2 Observations and Results 114
6.5 Sense Identification Using the Context Comparison Method 118
6.5.1 Experimentation Parameters 119
6.5.2 Observations and Results 120
6.6 Conclusion and Future Research 123
References 123
7 Enhanced Tweet Hybrid Recommender System Using Unsupervised Topic Modeling and Matrix Factorization-Based Neural Network 126
7.1 Introduction 126
7.2 Related Works 128
7.2.1 Recommender System 128
7.2.2 Twitter 130
User Interest Prediction in Microblog Using the Recommendation Method 130
Collaborative Personalized Tweet Recommendation 131
7.2.3 Latent Dirichlet Allocation 131
7.2.4 Recommender System with LDA 133
Content-Based Filtering with LDA 133
Collaborative Filtering with LDA 134
7.2.5 Generalized Matrix Factorization 136
Matrix Factorization 136
Neural Network 137
7.3 The Proposed Method 138
7.3.1 Data Preparation 138
7.3.2 Content-Based Filtering Part 139
7.3.3 Collaborative Filtering Part 140
7.3.4 Prediction Step 141
7.4 Experimental Results 142
7.4.1 Dataset 142
7.4.2 Evaluation Metrics 143
7.4.3 Experimental Results 143
7.5 Discussion 144
7.5.1 Comparison Between the Proposed Method and User Interest Prediction in Microblog Using the Recommendation Method (CBF with LDA) 145
7.5.2 Comparison Between the Proposed Method and the Improved Collaborative Filtering Algorithm Using the Topic Model (CF with LDA) 146
7.6 Conclusion 147
References 147
8 New Applications of a Supervised Computational Intelligence (CI) Approach: Case Study in Civil Engineering 149
8.1 Introduction 149
8.2 Prediction of Hyperbolic Nonlinear Soil Stress–Strain Parameters (log k and Rf) by a Supervised Artificial Neural Network (ANN) 151
8.2.1 Development of ANN Models 151
8.2.2 Model Inputs and Outputs 153
8.2.3 Preprocessing and Data Division 154
8.2.4 Scaling of Data 156
8.2.5 Model Architecture, Optimization, and Stopping Criteria 158
8.2.6 Parametric Study 166
8.2.7 Sensitivity Analysis of the ANN Model Inputs 168
8.3 ANN Model Equations 171
8.3.1 ANN Model Equation for log k 171
8.3.2 ANN Model Equation for Rf 173
8.4 Validity of the ANN Models Equation 175
8.5 Comparison Between Measured and Predicted Stress–Strain Relationship 175
8.6 Concluding Remarks 176
B.1 Appendix 2 183
References 185
Index 187

Publication date (per publisher)	4 September 2019
Series	Unsupervised and Semi-Supervised Learning
Additional info	VIII, 187 p. 55 illus., 45 illus. in color.
Language	English
Subject areas	Mathematics / Computer Science ► Computer Science ► Databases
	Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics
	Engineering ► Electrical Engineering / Energy Technology
Keywords	discretization • Feature Extraction and Selection • Learning algorithm design • Learning Detection • Semi-supervised and unsupervised learning
ISBN-10	3-030-22475-9 / 3030224759
ISBN-13	978-3-030-22475-2 / 9783030224752

PDF (watermark)
Size: 4.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thus personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices, but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Online reading
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

CHF 149,75