Data Mining and Machine Learning Applications (eBook)
John Wiley & Sons (Publisher)
978-1-119-79250-5 (ISBN)
The book elaborates in detail on the current needs of data mining and machine learning and promotes mutual understanding among research in different disciplines, thus facilitating research development and collaboration.
Data, the latest currency of today's world, is the new gold. In this new form of gold, the most beautiful jewels are data analytics and machine learning. Data mining and machine learning are considered interdisciplinary fields. Data mining is a subset of data analytics, and machine learning involves the use of algorithms that automatically improve through experience based on data.
Massive datasets can be classified and clustered to obtain accurate results, and classification and clustering are the most commonly used techniques. Accuracy and error rates are calculated for regression, classification, and clustering, using algorithms such as support vector machines and neural networks with forward and backward propagation. Applications include fraud detection, image processing, medical diagnosis, weather prediction, e-commerce, and so forth.
The book features:
- A review of the state of the art in data mining and machine learning.
- A review and description of the learning methods in human-computer interaction.
- Implementation strategies and future research directions to meet the design and application requirements of several modern, real-time applications over the long term.
- The scope and implementation of a majority of data mining and machine learning strategies.
- A discussion of real-time problems.
Audience
Industry and academic researchers, scientists, and engineers in information technology, data science and machine and deep learning, as well as artificial intelligence more broadly.
Rohit Raja, PhD is an associate professor in the IT Department, Guru Ghasidas Vishwavidyalaya, Bilaspur (CG), India. He has published more than 80 research papers in peer-reviewed journals as well as 9 patents.
Kapil Kumar Nagwanshi, PhD is an associate professor at Mukesh Patel School of Technology Management & Engineering, Shirpur Campus, SVKM's Narsee Monjee Institute of Management Studies Mumbai, India.
Sandeep Kumar, PhD is a professor in the Department of Electronics & Communication Engineering, Sreyas Institute of Engineering & Technology, Hyderabad, India. His area of research includes embedded systems, image processing, and biometrics. He has published more than 60 research papers in peer-reviewed journals as well as 6 patents.
K. Ramya Laxmi, PhD is an associate professor in the CSE Department at the Sreyas Institute of Engineering and Technology, Hyderabad. Her research interest covers the fields of data mining and image processing.
1 Introduction to Data Mining
Santosh R. Durugkar1, Rohit Raja2, Kapil Kumar Nagwanshi3* and Sandeep Kumar4
1Amity University Rajasthan, Jaipur, India
2IT Department, GGV Bilaspur Central University, Bilaspur, India
3ASET, Amity University Rajasthan, Jaipur, India
4Computer Science and Engineering Department, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
Abstract
Data mining, as the word “mining” suggests, is the extraction of desired, meaningful information from datasets. Its methods and algorithms help researchers and students develop numerous applications for end users. Its presence in healthcare, marketing, scientific applications, and other areas enables end users to extract the required information from a data collection. This chapter gives a brief introduction to data mining. In the initial section, we discuss knowledge discovery in databases (KDD) and its phases: data cleaning, data integration, data selection and transformation, and representation. A comparative discussion of classification and clustering helps the end user distinguish between these techniques. We also discuss applications and algorithms. An introduction to basic clustering algorithms (K-means clustering, hierarchical clustering, fuzzy clustering, and density-based clustering) will help the end user select a specific algorithm for a given application. In the last section of the chapter, we introduce data mining tools such as Python, RapidMiner, and KNIME that the user can employ to extract the required information.
Keywords: Data mining, KDD, clustering, classification, Python, KNIME
1.1 Introduction
1.1.1 Data Mining
‘Mining’ refers to extracting meaningful information from databases. It helps researchers, students, and other IT professionals extract exactly the significant details they need and develop the desired applications [1, 2]. It is also known as Knowledge Discovery from Databases (KDD). Applications of KDD include medicine and hospitals, marketing, educational systems, scientific applications, e-commerce, retail, biological analysis, counterterrorism, data warehousing, decision making in the energy sector, spatial data mining, and logistics [4–6].
1.2 Knowledge Discovery in Database (KDD)
KDD helps detect new, previously unknown patterns, i.e., it extracts hidden patterns from massive volumes of data [3, 6]. Figure 1.1 illustrates knowledge discovery in databases (KDD), which consists of the following phases:
- Data cleaning: This step removes irrelevant data, i.e., unwanted data and records. The collected data may also contain missing values; the affected records must either be removed or have the missing values imputed [7].
- Data integration: Data is collected from heterogeneous sources and integrated into a common store such as a data warehouse (DW). A very common technique, Extract-Transform-Load (ETL), is useful here. Integrating data from multiple sources requires proper synchronization between the systems [2].
- Data selection & transformation: Once the required data is selected, the next task is data transformation, i.e., converting the data into the form required by the mining procedure [8, 9].
- Pattern evaluation: Evaluation is based on a set of measures; once these measures are applied, the retrieved results are compared against the stored patterns [9–11].
- Knowledge representation: The processed data is presented in the required formats, such as tables and reports. Knowledge representation generates rules and can present them using a suitable visualization [10].
Figure 1.1 Knowledge discovery in databases (KDD).
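The phases listed above can be sketched as a small pipeline. This is a minimal illustration using only the Python standard library, not the chapter's own implementation: the records, field names, the spend threshold of 100, and all function names are hypothetical, and the sketch integrates the sources before cleaning so that imputation can use the combined data.

```python
from collections import Counter
from statistics import mean

# Hypothetical records from two sources; None marks a missing value.
store_a = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c2", "age": None, "spend": 80.0},
]
store_b = [
    {"customer": "c3", "age": 51, "spend": 300.0},
    {"customer": "c4", "age": 29, "spend": None},
]


def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [dict(row) for source in sources for row in source]


def clean(rows):
    """Data cleaning: drop rows without a spend value, impute missing ages."""
    kept = [r for r in rows if r["spend"] is not None]
    fill = mean(r["age"] for r in kept if r["age"] is not None)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in kept]


def select_and_transform(rows):
    """Selection and transformation: keep age, bin spend into a segment label."""
    return [
        {"age": r["age"], "segment": "high" if r["spend"] >= 100 else "low"}
        for r in rows
    ]


def evaluate(rows):
    """Pattern evaluation: measure how often each segment occurs."""
    return Counter(r["segment"] for r in rows)


def represent(pattern_counts):
    """Knowledge representation: render the counts as lines of a small table."""
    return [f"{seg:<6}{count}" for seg, count in sorted(pattern_counts.items())]


integrated = integrate(store_a, store_b)
cleaned = clean(integrated)
transformed = select_and_transform(cleaned)
patterns = evaluate(transformed)
report = represent(patterns)
print("\n".join(report))
```

Each function maps to one phase, so a phase can be swapped out (for example, a different imputation rule in `clean`) without touching the others.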
1.2.1 Importance of Data Mining
- ◦ Useful in predictive analysis.
- ◦ Storing and managing data in multidimensional systems.
- ◦ Identifying hidden patterns.
- ◦ Knowledge representation in desired formats, etc. [11].
1.2.2 Applications of Data Mining
- Fraud Detection
- ◦ Data mining identifies patterns, i.e., user-specific patterns, and builds a model based on valid and invalid states. Using data mining techniques, one can classify records based on fraudulent and non-fraudulent patterns [14].
- Marketing Analysis
- ◦ It is based on association mining, i.e., identifying users’ preferences. Such techniques reveal the purchasing habits of users and allow different items, their prices, etc., to be compared [13].
- Customer Relationship Management
- ◦ Every organization closely observes and maintains this segment, popularly known as CRM. Here, users/customers can be distinguished by their loyalty to the organization. Customer data can be collected and analyzed to get the desired results [13].
- Banking and Finance
- ◦ The banking and finance sector holds huge amounts of client data. Banking and financial software systems help managers identify the right client segments and loyal clients. These systems process ‘n’ transactions, more than a person can handle manually. They store and process a large volume of data and produce the desired results in less time [13].
- Healthcare Industries
- ◦ Everyone is concerned about health. Different parameters and values help healthcare professionals diagnose disease. Data on patients, diseases, and symptoms can be processed to get an accurate prediction. Software systems used in the healthcare industry process a large number of observed values and compare them with stored patterns to draw an accurate conclusion [13].
- Educational Purpose
- ◦ Using data mining, one can identify students’ interests in different fields. It also helps improve teaching methodology in line with new trends [13].
- Crime Investigation
- ◦ Data mining helps identify patterns that recur across different crimes. Crimes, criminals, and the characteristics of their crimes are analyzed under this category. A large volume of stored data can be processed to identify relationships among criminals. Face recognition, fingerprint recognition, etc., are also used in investigations [14].
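The association mining mentioned under Marketing Analysis is commonly quantified with two measures, support and confidence. The following is a minimal sketch under assumed data: the baskets and the function names are hypothetical and serve only to show how the measures are computed.

```python
# Hypothetical market baskets, one set of purchased items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_milk_support = support({"bread", "milk"}, baskets)
bread_to_milk = confidence({"bread"}, {"milk"}, baskets)
print(f"support(bread, milk) = {bread_milk_support:.2f}")
print(f"confidence(bread -> milk) = {bread_to_milk:.2f}")
```

Here bread and milk appear together in two of the four baskets, and two of the three baskets containing bread also contain milk, which is the kind of purchasing habit such rules are meant to surface.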
1.2.3 Databases
A database is a collection of records. Databases, their structures, and their records vary with the application. The following types of databases are used in many applications [15].
- Transactional Database: A popular type of database that consists of rows and columns, where each row is known as a transaction. A transaction has the following parameters:
- Transaction id
- Timestamp
- List of items
- Item description
- The transaction id is a unique identifier generated by the system. Transactional databases are mostly related to financial matters such as banking transactions, booking a movie ticket, booking a flight, etc. [16].
- Multimedia Database: The data integration phase of the KDD process integrates data from multiple sources, and that data could be in the form of text, documents, video, images, audio, etc. Storing these different data types (multimedia data) requires high-dimensional space, which is a characteristic of a multimedia database [17]. Examples include:
- Video-on-demand
- Digital libraries
- Animations
- Images.
- Spatial Database: Like multimedia and transactional databases, a spatial database stores geographical information, such as maps and the positions of objects. Geographic coordinates are useful in determining topographic data [17].
- Time-Series Database: As its name suggests, a time-series database holds information about a specific item with respect to time, e.g., weekly, monthly, or yearly values. Such patterns help predict the trends and movements of an item over a particular time period, as shown in Figure 1.2.
Figure 1.2 Time series database.
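To make the transactional and time-series views concrete, the sketch below models a transaction with the four parameters listed above and then aggregates transactions by week. The class name, field names, and sample data are hypothetical, and grouping by ISO week is just one possible choice of time granularity.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    transaction_id: str        # unique identifier (system-generated in practice)
    timestamp: datetime        # when the transaction took place
    items: tuple[str, ...]     # list of items
    description: str           # item description


# Hypothetical sample data.
transactions = [
    Transaction("T1", datetime(2022, 1, 3, 10, 15), ("ticket",), "movie booking"),
    Transaction("T2", datetime(2022, 1, 5, 18, 40), ("ticket", "snack"), "movie booking"),
    Transaction("T3", datetime(2022, 1, 12, 9, 5), ("ticket",), "flight booking"),
]


def weekly_item_counts(records, item):
    """Count occurrences of item per (ISO year, ISO week), in time order."""
    counts = Counter()
    for record in records:
        iso = record.timestamp.isocalendar()
        counts[(iso[0], iso[1])] += record.items.count(item)
    return dict(sorted(counts.items()))


series = weekly_item_counts(transactions, "ticket")
for (year, week), count in series.items():
    print(f"{year}-W{week:02d}: {count}")
```

The resulting mapping from week to count is a simple time series for one item; monthly or yearly series follow the same pattern with a different grouping key.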
1.3 Issues in Data Mining
Data mining involves aspects such as user interfaces, mining methods, security, performance, and data sources. The following is a discussion of the various trade-offs in data mining [3–5, 14].
- ◦ User interface design
- As discussed for the KDD process, discovered knowledge needs to be represented using good, accurate visualization. The user interface design issue concerns the interaction required between users and the system, as well as information rendering. It requires analysts and programmers to work at different conceptual levels.
- ◦...
| Publication date (per publisher) | 26 January 2022 |
|---|---|
| Language | English |
| Subject area | Computer Science ► Databases ► Data Warehouse / Data Mining |
| | Mathematics / Computer Science ► Computer Science ► Networks |
| | Computer Science ► Theory / Studies ► Artificial Intelligence / Robotics |
| | Engineering ► Electrical Engineering / Energy Technology |
| Keywords | Biomedical Engineering • Computer Science • Database & Data Warehousing Technologies • Data Mining • Databases • Electrical & Electronics Engineering • Intelligent Systems & Agents • Machine Learning • Medical Informatics & Biomedical Information Technology |
| ISBN-10 | 1-119-79250-9 / 1119792509 |
| ISBN-13 | 978-1-119-79250-5 / 9781119792505 |
Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a