Strategies in Biomedical Data Science (eBook)
John Wiley & Sons (publisher)
978-1-119-25618-2 (ISBN)
Strategies in Biomedical Data Science provides medical professionals with much-needed guidance toward managing the increasing deluge of healthcare data. Beginning with a look at our current top-down methodologies, this book demonstrates the ways in which both technological development and more effective use of current resources can better serve both patient and payer. The discussion explores the aggregation of disparate data sources, current analytics and toolsets, the growing necessity of smart bioinformatics, and more as data science and biomedical science grow increasingly intertwined. You'll dig into the unknown challenges that come along with every advance, and explore the ways in which healthcare data management and technology will inform medicine, politics, and research in the not-so-distant future. Real-world use cases and clear examples are featured throughout, and coverage of data sources, problems, and potential mitigations provides necessary insight for forward-looking healthcare professionals.
Big Data has been a topic of discussion for some time, with much attention focused on problems and management issues surrounding truly staggering amounts of data. This book offers a lifeline through the tsunami of healthcare data, to help the medical community turn their data management problem into a solution.
- Consider the data challenges personalized medicine entails
- Explore the available advanced analytic resources and tools
- Learn how bioinformatics as a service is quickly becoming reality
- Examine the future of IoT and the deluge of personal device data
The sheer amount of healthcare data being generated will only increase as both biomedical research and clinical practice trend toward individualized, patient-specific care. Strategies in Biomedical Data Science provides expert insight into the kind of robust data management that is becoming increasingly critical as healthcare evolves.
JAY A. ETCHINGS is the director of operations at Arizona State University's Research Computing program, where he is responsible for developing innovative architectures to progress fluid technical environments supporting highly computational workloads, peta-scale data analysis, next-generation cyber capabilities, and emerging network innovations.
An essential guide to healthcare data problems, sources, and solutions
Cover
Title Page
Copyright
Contents
Foreword
Notes
Acknowledgments
Introduction
Who Should Read This Book?
What’s in This Book?
How to Contact Us
Chapter 1 Healthcare, History, and Heartbreak
Top Issues in Healthcare
Mergers and Partnerships
Cybersecurity and Data Security
Securing Multitenant Hosts
Insider Threats
Data Integrity
Encryption
Homomorphic Encryption
The Proliferation of Devices and Apps
New Sources of Data Both Public and Private
Data Management
The Precision Medicine Initiative
Biosimilars, Drug Pricing, and Pharmaceutical Compounding
Promising Areas of Innovation
The Internet of Things
Data Visualization and Imaging
Data Storage
Data Analytics
Compute Capabilities
Cloud
Conclusion
Notes
Chapter 2 Genome Sequencing: Know Thyself, One Base Pair at a Time
Challenges of Genomic Analysis
The Language of Life
A Brief History of DNA Sequencing
DNA Sequencing and the Human Genome Project
Select Tools for Genomic Analysis
GenBank
The R Project
Genome Analysis Toolkit
Molecular Evolutionary Genetics Analysis
Bowtie
Conclusion
Notes
Chapter 3 Data Management
Bits about Data
Data Types
Structured Data
Unstructured Data
Semistructured Data
Polystructured Data
Data Security and Compliance
Controls and Responsibility
National Institute of Standards and Technology, Federal Information Processing Standards, and Federal Information Security Management Act
University Research Data Life Cycle
Data Storage
Object Storage
Integrated Rule-Oriented Data System
SwiftStack
Genomic Sequencing with Object Storage
Multiregion Management
Storage Policy Management
Example Use Cases
Multigeneration Hardware Support
Data Placement
Gradual Capacity Adjust
Undelete and Delete Prev
OpenStack Swift Architecture
Applications: How to Access Object Storage
Conclusion
Notes
Chapter 4 Designing a Data-Ready Network Infrastructure
Research Networks: A Primer
ESnet at 30: Evolving toward Exascale and Raising Expectations
ESnet Then and Now
Internet2 Innovation Platform
Advances in Networking
InfiniBand and Microsecond Latency
What Is RDMA, and What Are Its Benefits?
How Is InfiniBand Different from Traditional Network Protocols?
The Future of High-Performance Fabrics
Intel Omni-Path Key Fabric Features and Innovations
Network Function Virtualization
Software-Defined Networking
OpenDaylight
End-to-End Scenarios for Common Usage in Large Carrier, Enterprise, and Research Networks
OpenDaylight Architecture
Test Environment
Performance Results
NETCONF
OVSDB
BGP
PCEP
Key Factors That Affect Performance
Additional References
Conclusion
Notes
Chapter 5 Data-Intensive Compute Infrastructures
Big Data Applications in Health Informatics
Sources of Big Data in Health Informatics
Infrastructure for Big Data Analytics
Service-Oriented Architecture Combined with Cloud Computing
Hierarchical Structure of Systems
Fundamental System Properties
GPU-Accelerated Computing and Biomedical Informatics
How Do Accelerators Impact Applications?
CPU versus GPU*
Conclusion
Notes
Chapter 6 Cloud Computing and Emerging Architectures
Cloud Basics
Essential Characteristics*
Service Models
Deployment Models
Challenges Facing Cloud Computing Applications in Biomedicine
Hybrid Campus Clouds
Research as a Service
Federated Access Web Portals
Cluster Homogeneity
Emerging Architectures (Zeta Architecture)
A Brief History of Enterprise Architectures
Isolated Workloads Come at a Cost
Goals with a New Approach
The Google Example
Integration of the Zeta Architecture
Zeta and Application Architectures
Streaming Applications
Conclusion
Notes
Chapter 7 Data Science
NoSQL Approaches to Biomedical Data Science
Graph Databases
Using Splunk for Data Analytics
The Healthcare Challenge
Data Analytics in Healthcare
Statistical Analysis of Genomic Data with Hadoop
Extracting and Transforming Genomic Data
Processing eQTL Data
Input Data
Software Tools
NGCC Hadoop Cluster Configuration
Steps to Process Data
Statistical Analysis in Spark
Create Hive Tables
Generating Master SNP Files for Cases and Controls
Input Data Format
Generating Gene Expression Files for Cases and Controls
Input Data Format
Cleaning Raw Data Using MapReduce
Input Data Format
MapReduce Code: Controls
Cases
Transpose Data Using Python
Controls
Cases
Statistical Analysis Using Spark
Launching Spark Applications
Configuration
Controls
Cases
Final Output Size
Hive Tables with Partitions
Controls
Cases
Summary
Conclusion
Notes
Appendix: A Brief Statistics Primer
Foundations
Population and Sample
Random Variables
Discrete Random Variables
Continuous Random Variables
Expected Value and Variance
Regression Analysis
Ordinary Least Squares (OLS)
Estimator Accuracy
Goodness of Fit
Multivariate Linear Regression
OLS Estimators
Relationship Strength
Logistic Regression
Estimating Parameters
So Why Logistic Regression?
References
Chapter 8 Next-Generation Cyberinfrastructures
Next-Generation Cyber Capability
NGCC Design and Infrastructure
Systems Architecture
Enterprise
Physical Infrastructure
Big Data
Logical Infrastructure
TransCORE Framework
Mixed Capacity
Staff Resources
Resource Management (Space Allocation)
Conclusion
Design
Deployment
Note
Conclusion
Appendix A The Research Data Management Survey: From Concepts to Practice
Appendix B Central IT and Research Support
Appendix C HPC Working Example: Using Parallelization Programs Such as GNU Parallel and OpenMP, with Serial Tools
Appendix D HPC and Hadoop: Bridging HPC to Hadoop
Appendix E Bioinformatics + Docker: Simplifying Bioinformatics Tools Delivery with Docker Containers
Glossary
About the Author
About the Contributors
Index
EULA
| Publication date (per publisher) | 3 January 2017 |
|---|---|
| Series | SAS Institute Inc |
| | Wiley and SAS Business Series |
| Foreword | Ken Buetow |
| Language | English |
| Subject areas | Mathematics / Computer Science ► Computer Science ► Databases |
| | Medicine / Pharmacy ► General / Reference |
| | Medicine / Pharmacy ► Healthcare |
| | Medicine / Pharmacy ► Medical Specialties |
| | Natural Sciences ► Biology |
| Keywords | Big Data Analytics • Big data in healthcare • Bioinformatics • biomedical data careers • biomedical data generation • biomedical engineering • biomedical research analytics • Healthcare Analytics • healthcare data challenges • healthcare data management • healthcare data solutions • healthcare data sources • healthcare data volume • health device data • Jay Etchings • Medical Data • medical data and politics • medical data management • Medical Informatics & Biomedical Information Technology • personalized medicine data • Strategies in Biomedical Data Science: Driving Force for Innovation |
| ISBN-10 | 1-119-25618-6 / 1119256186 |
| ISBN-13 | 978-1-119-25618-2 / 9781119256182 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read it only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but it is of limited use on small screens (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You will need a
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You will need a
Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.