Fundamentals of Big Data Network Analysis for Research and Industry - Hyunjoung Lee, Il Sohn

Fundamentals of Big Data Network Analysis for Research and Industry (eBook)

Hyunjoung Lee, Il Sohn (Authors)

eBook Download: EPUB

2015
John Wiley & Sons (Publisher)
978-1-119-01549-9 (ISBN)

Fundamentals of Big Data Network Analysis for Research and Industry

Hyunjoung Lee, Institute of Green Technology, Yonsei University, Republic of Korea

Il Sohn, Material Science and Engineering, Yonsei University, Republic of Korea

Presents the methodology of big data analysis using examples from research and industry

There are large amounts of data everywhere, and the ability to pick out crucial information is increasingly important. Contrary to popular belief, not all information is useful; big data network analysis assumes that data is not only large, but also meaningful, and this book focuses on the fundamental techniques required to extract essential information from vast datasets.

Featuring case studies drawn largely from the iron and steel industries, this book offers practical guidance that will enable readers to easily understand big data network analysis. Particular attention is paid to the methodology of network analysis, offering information on the method of data collection, on research design and analysis, and on the interpretation of results. A variety of network analysis programs, including UCINET, NetMiner, R, NodeXL, and Gephi, are covered in detail.

Fundamentals of Big Data Network Analysis for Research and Industry looks at big data from a fresh perspective, and provides a new approach to data analysis.

This book:

Explains the basic concepts in understanding big data and filtering meaningful data
Presents big data analysis within the networking perspective
Features methodology applicable to research and industry
Describes in detail the social relationship between big data and its implications
Provides insight into identifying patterns and relationships between seemingly unrelated big data

Fundamentals of Big Data Network Analysis for Research and Industry will prove a valuable resource for analysts, research engineers, industrial engineers, marketing professionals, and anyone working with large accumulated datasets who wants to analyze and identify potential relationships among them.

Preface ix

About the Authors xi

List of Figures xiii

List of Tables xvii

1 Why Big Data? 1

1.1 Big Data 1

1.2 What Creates Big Data? 6

1.3 How Do We Use Big Data? 9

1.4 Essential Issues Related to Big Data 13

References 14

2 Basic Programs for Analyzing Networks 15

2.1 UCINET 15

2.2 NetMiner 20

2.3 R 22

2.4 Gephi 28

2.5 NodeXL 31

References 32

3 Understanding Network Analysis 35

3.1 Defining Social Network Analysis 35

3.2 Basic SNA Concepts 37

3.2.1 Basic Terminology 37

3.2.2 Representation of a Network 38

3.3 Social Network Data 40

3.3.1 One-Mode and Two-Mode Networks 40

3.3.2 Attributes and Weights 42

3.3.3 Network Data Form 42

References 44

4 Research Methods Using SNA 45

4.1 SNA Research Procedures 46

4.2 Identifying the Research Problem and Developing Hypotheses 47

4.2.1 Identifying the Research Problem 47

4.2.2 Developing Hypotheses 47

4.3 Research Design 49

4.3.1 Defining the Network Model 49

4.3.2 Establishing Network Boundaries 51

4.3.3 Measurement Evaluation 52

4.4 Acquisition of Network Data 54

4.4.1 Survey 54

4.4.2 Interview, Observation, and Experiment 55

4.4.3 Existing Data 56

4.5 Data Cleansing 58

4.5.1 Extraction of the Node and Link 59

4.5.2 Merging and Separation of Data 59

4.5.3 Directional Transformation in the Link 61

4.5.4 Transformation of the Weights in Links 64

4.5.5 Transformation of the Two-Mode Network to a One-Mode Network 66

References 69

5 Position and Structure 71

5.1 Position 71

5.1.1 Degree Centrality 72

5.1.2 Closeness Centrality 82

5.1.3 Betweenness Centrality 84

5.1.4 Prestige Centrality 85

5.1.5 Broker 88

5.2 Cohesive Subgroup 91

5.2.1 Component 91

5.2.2 Community 92

5.2.3 Clique 93

5.2.4 k-Core 95

References 96

6 Connectivity and Role 97

6.1 Connection Analysis 98

6.1.1 Connectivity 98

6.1.2 Reciprocity 99

6.1.3 Transitivity 102

6.1.4 Assortativity 104

6.1.5 Network Properties 104

6.2 Role 104

6.2.1 Structural Equivalence 105

6.2.2 Automorphic Equivalence 107

6.2.3 Role Equivalence 109

6.2.4 Regular Equivalence 111

6.2.5 Block Modeling 115

References 117

7 Data Structure in NetMiner 119

7.1 Sample Data 119

7.1.1 01.Org_Net_Tiny1 120

7.1.2 02.Org_Net_Tiny2 120

7.1.3 03.Org_Net_Tiny3 121

7.2 Main Concept 122

7.2.1 Data Structure 122

7.2.2 Creating Data 124

7.2.3 Inserting Data 125

7.2.4 Importing Data 129

7.3 Data Preprocessing 130

7.3.1 Change of Link 130

7.3.2 Extraction and Reordering of the Node and Link 133

7.3.3 Data Merge and Split 136

Reference 140

8 Network Analysis Using NetMiner 141

8.1 Centrality and Cohesive Subgroup 141

8.1.1 Centrality 141

8.1.2 Cohesive Subgroup 147

8.2 Connectivity and Equivalence 153

8.2.1 Connectivity 153

8.2.2 Equivalence 156

8.3 Visualization and Exploratory Analysis 161

8.3.1 Visualization 161

8.3.2 Transformation of the Two-Mode Network to a One-Mode Network 168

Appendix A Visualization 171

A.1 Spring Algorithm 171

A.2 Multidimensional Scaling Algorithm 173

A.3 Cluster Algorithm 173

A.4 Layered Algorithm 174

A.5 Circular Algorithm 174

A.6 Simple Algorithm 175

References 176

Appendix B Case Study: Knowledge Structure of Steel Research 179

Index 193

List of Figures

1.1	Hard-disk drive average cost per gigabyte (unit: US$)

2.1	UCINET 6 interface

2.2	Results of density and degree centrality using UCINET. (a) Density and (b) degree centrality

2.3	Visualization using NetDraw

2.4	NetMiner4 work environment

2.5	Results of density and degree centrality analyses in NetMiner. (a) Density and (b) degree centrality

2.6	NetMiner data structure and data set. (a) Data structure and (b) data set

2.7	The R interface

2.8	The Gephi interface

2.9	Gephi data laboratory and preview screens

2.10	NodeXL interface

3.1	A network graph and matrix. (a) Graph (b) matrix

3.2	(a) Path and (b) degree

3.3	Cut-point and bridges of a network component

3.4	Structure of the network data

3.5	Transformation of a two-mode network into a one-mode network

4.1	Research procedure

4.2	PSY’s tweets

4.3	Visualization of the extracted node and link. (a) Visual network of the extracted node. Node attribute: total export amount >US$5000 million. (b) Visual network of the extracted link. Link attribute: export amount >US$2500 million

4.4	Visualization of the two-mode and one-mode networks. (a) two-mode network (export: products–countries) and (b) one-mode network (export: countries–countries)

5.1	Visual representation of iron and steel trade

5.2	Visualization of the non-directional trade relationship

5.3	Visualization of the trade relationship with direction

5.4	Visualization of betweenness centrality

5.5	Type of broker

5.6	(Strong vs. weak) component

5.7	Results of component analysis. (a) Weak component and (b) strong component

5.8	Community

5.9	Results of community analysis

5.10	Clique, n-clique, n-clan, k-plex, and k-core

5.11	Results of clique, n-clique, n-clan, k-plex, and k-core

6.1	Walk, trail, path

6.2	Results of link connectivity

6.3	Type of dyad relationship

6.4	Type of triad relationship

6.5	Triad isomorphism classes

6.6	Assortative relationship

6.7	Network properties

6.8	Structural equivalence

6.9	Dendrogram of structural equivalence. (a) Import relationship dendrogram and (b) export relationship dendrogram

6.10	Automorphic equivalence

6.11	Regular equivalence

6.12	Block modeling. (a) Visualization of network, (b) matrix (node by node), (c) block-node affiliation matrix (node by group), (d) block image matrix (group by group), and (e) visualization of block image matrix

6.13	Results of block modeling. (a) Block-node affiliation matrix (node by group), (b) block image matrix (group by group), and (c) visualization of block image matrix

7.1	Hierarchical structure of NetMiner data

7.2	Attribute of node and link

7.3	New project type

7.4	Workfile

7.5	Create the network and node set. (a) Create new 1-mode network, (b) create new sub nodeset, and (c) create new 2-mode network

7.6	Insert nodes and node’s attributes. (a) Insert new node and (b) insert new node attribute

7.7	Insert links and link’s attributes. (a) Insert new link and (b) insert new link attribute

7.8	Data import

7.9	Symmetrize

7.10	Transpose

7.11	Dichotomize

7.12	Reverse

7.13	Normalize

7.14	Recode. (a) Input variable, (b) dialog box for recode, (c) recoding rules, and (d) output of recoding

7.15	Self-loop

7.16	Extraction of node and link. (a) QuerySet and (b) new workfile

7.17	Neighbor node. (a) Output summary and (b) ego network details

7.18	Merge. (a) Main process for merge, (b) one-mode networks before the merge, and (c) one-mode network after the merge

7.19	Split. (a) Main process for split and (b) one-mode networks after the split

8.1	Degree. (a) [R]Main, (b) [T]Degree, and (c) [T]Node Type

8.2	[M]Spring map of degree

8.3	Degree centrality. (a) [R]Main, (b) [T]Degree centrality vector, (c) [M]Spring (node size: in degree centrality), and (d) [M]Concentric

8.4	Closeness centrality. (a) [R]Main, (b) [T]Closeness centrality vector, (c) [M]Spring (node size: in-closeness centrality), and (d) [M]Concentric

8.5	Betweenness centrality. (a) [R]Main, (b) [T]Betweenness centrality vector, (c) [M]Spring (node size: betweenness centrality), and (d) [M]Concentric

8.6	Prestige centrality. (a) [R]Main, (b) [T]Eigenvector centrality vector, (c) [T]Reflected/Derived/Constant, (d) [M]Spring (node size: Eigenvector centrality), and (e) [M]Concentric

8.7	Brokerage. (a) [R]Main, (b) [T]Brokerage, (c) [M]Spring (node size: total score of brokerage), and (d) [M]Concentric

8.8	Component. (a) Input data and main process, (b) [R]Main, (c) [T]Component partition vector, and (d) [M]Clustered

8.9	Modularity (community). (a) [R]Main and (b) [T]Community Partition

8.10	Betweenness (community). (a) [T]Community Cluster Matrix, (b) [T]Permutation Vector, (c) [C]Dendrogram, and (d) [M]Clustered

8.11	Clique. (a) [R]Main, (b) [T]Clique Affiliation Matrix, and (c) [M]Spring

8.12	n-Clique. (a) [R]Main, (b) [T]n-Clique Affiliation Matrix, and (c) [M]Spring (n-clique 2)

8.13	k-core. (a) [R]Main and (b) [T]k-Core Affiliation Matrix

8.14	Connectivity. (a) [T]Node Connectivity Matrix and (b) [M]Spring

8.15	Reciprocity and transitivity. (a) Dyad census and (b) triad census

8.16	Assortativity. (a) [R]Main – Degree, (b) [R]Main – Team, (c) [T]Assortativity – Degree, and (d) [T]Assortativity – Team

8.17	Network properties

8.18	Structural equivalence (profile). (a) [T]Profile Matrix and (b) [M]MDS

8.19	Structural equivalence (CONCOR). (a) [T]CONCOR Matrix and (b) [M]MDS

8.20	Role equivalence. (a) [T]Triad Role...

Publication date (per publisher)	16.11.2015
Language	English
Subject areas	Computer Science ► Databases ► Data Warehouse / Data Mining
	Mathematics / Computer Science ► Computer Science ► Networks
	Mathematics / Computer Science ► Mathematics
	Natural Sciences
Keywords	Big Data • Business & Management • Business Statistics & Math • Computer Science • Database & Data Warehousing Technologies • Data Mining • Data Mining Statistics • Gephi • Industry Analysis • NetMiner • network analysis • NodeXL • R • Statistics • Steel Research • trading data • UCINET
ISBN-10	1-119-01549-9 / 1119015499
ISBN-13	978-1-119-01549-9 / 9781119015499

EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, as experience shows it frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.