Integrative Cluster Analysis in Bioinformatics (eBook)

eBook Download: EPUB
2015
John Wiley & Sons (Publisher)
978-1-118-90655-2 (ISBN)


Integrative Cluster Analysis in Bioinformatics - Basel Abu-Jamous, Rui Fa, Asoke K. Nandi
€101.99 incl. VAT
(CHF 99.60)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the euro price, including VAT.
  • Download available immediately

Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high-throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and for future drug discovery.

This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review of clustering analysis in bioinformatics from the fundamentals through to state-of-the-art techniques and applications.

Key Features:

  • Offers a contemporary review of clustering methods and applications in the field of bioinformatics, with particular emphasis on gene expression analysis
  • Provides an excellent introduction to molecular biology with computer scientists and information engineering researchers in mind, laying out the basic biological knowledge behind the application of clustering analysis techniques in bioinformatics
  • Explains the structure and properties of many types of high-throughput datasets commonly found in biological studies
  • Discusses how clustering methods and their possible successors could be used to accelerate biological discoveries in the future
  • Includes a companion website hosting a selected collection of code and links to publicly available datasets
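To make "cluster analysis of gene expression" concrete, here is a minimal sketch of one partitional method, Lloyd's k-means, grouping genes by the shape of their expression profiles. This is an illustration under stated assumptions, not code from the book or its companion website: the function name `kmeans`, the Euclidean distance, the random initialisation and the six-gene toy dataset are all hypothetical choices made for this example.

```python
import math
import random


def kmeans(profiles, k, n_iter=100, seed=0):
    """Lloyd's k-means on expression profiles (one list of floats per gene).

    Hypothetical helper for illustration. Returns (labels, centroids):
    a cluster index for every profile and the final cluster centres.
    """
    rng = random.Random(seed)
    # Initialise the centroids as k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: 6 genes measured at 4 time points.
expression = [
    [1.0, 2.0, 3.0, 4.0],  # genes 0-2 rise over time
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.1, 3.9],
    [4.0, 3.0, 2.0, 1.0],  # genes 3-5 fall over time
    [4.1, 2.9, 2.1, 0.8],
    [3.9, 3.2, 1.9, 1.1],
]
labels, centroids = kmeans(expression, k=2)
print(labels)
```

The rising and falling genes should end up with different labels; which group is called 0 depends on the random seed. k-means is only one of many families of methods the book covers, and real expression data would normally be normalised before clustering.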


Asoke K. Nandi, Department of Electronic & Computer Engineering, Brunel University London, UK
Prof. Nandi is Chair and Head of the Electronic and Computer Engineering Department at Brunel University London, UK. He leads the Signal Processing and Communications Research Group, with interests in signal processing, machine learning and communications research. He is a Finland Distinguished Professor at the University of Jyväskylä, Finland. In 1983 Professor Nandi was a member of the UA1 team at CERN that discovered the three fundamental particles known as W⁺, W⁻ and Z⁰, providing evidence for the unification of the electromagnetic and weak forces, which was recognised by the Nobel Committee for Physics in 1984. He has authored or co-authored more than 190 journal papers and two books. The Google Scholar h-index of his publications is 54. In 2010 he received the Glory of Bengal Award for his outstanding achievements in scientific research, and in 2012 he was awarded the IEEE Heinrich Hertz Award. Prof. Nandi is a Fellow of the IEEE.

Rui Fa, Department of Electronic & Computer Engineering, Brunel University London, UK
Dr Fa is a Senior Research Fellow in the Department of Electronic & Computer Engineering at Brunel University London, UK. Prior to this, he held research positions at the University of York and the University of Leeds, working on radar signal processing and wireless communication projects. His current research interests include bioinformatics, machine learning, Bayesian statistics, statistical signal processing and network science. Dr Fa received his Bachelor's and Master's degrees in electronic and electrical engineering from Nanjing University of Science and Technology, China, in 2000 and 2003, respectively, and his PhD in electrical engineering from the University of Newcastle, UK, in 2007.

Basel Abu-Jamous, Department of Electronic & Computer Engineering, Brunel University London, UK
Basel Abu-Jamous is currently studying for his PhD in electrical engineering & electronics at Brunel University London, UK. His research interests include bioinformatics, computational biology, and the broader areas of information engineering and machine learning. He received his B.Sc. degree in computer engineering from the University of Jordan, Amman, Jordan, in 2010, and his M.Sc. degree in information and intelligence engineering from the University of Liverpool, Liverpool, UK, in 2011. He was awarded the Sir Robin Saxby Prize for the 2010/2011 academic year based on his performance in the M.Sc. programme.

Preface

List of Symbols

About the Authors

Part One Introduction

1 Introduction to Bioinformatics

2 Computational Methods in Bioinformatics

Part Two Introduction to Molecular Biology

3 The Living Cell

4 Central Dogma of Molecular Biology

Part Three Data Acquisition and Pre-processing

5 High-throughput Technologies

6 Databases, Standards and Annotation

7 Normalisation

8 Feature Selection

9 Differential Expression

Part Four Clustering Methods

10 Clustering Forms

11 Partitional Clustering

12 Hierarchical Clustering

13 Fuzzy Clustering

14 Neural Network-based Clustering

15 Mixture Model Clustering

16 Graph Clustering

17 Consensus Clustering

18 Biclustering

19 Clustering Methods Discussion

Part Five Validation and Visualisation

20 Numerical Validation

21 Biological Validation

22 Visualisations and Presentations

Part Six New Clustering Frameworks Designed for Bioinformatics

23 Splitting-Merging Awareness Tactics (SMART)

24 Tightness-tunable Clustering (UNCLES)

Appendix

Index

1 Introduction to Bioinformatics


1.1 Introduction


Interesting research fields emerge through the collaboration of researchers from different, sometimes distant, disciplines. Examples include biochemistry, biophysics, quantum information science, systems engineering, mechatronics, business information systems, management information systems, geophysics, biomedical engineering, cybernetics, art history and media technology, among others. This marriage between disciplines yields findings that blend the views of different areas on the same subject or set of data.

The stimuli leading to such collaborations are numerous. For example, one discipline may develop tools that generate types of data that require another discipline to analyse. In other cases, one field peels back a layer of unknowns and discovers that significant parts of its scope actually rest on the principles of another field; low-level biological studies of the chemical interactions in cells, for instance, gave rise to biochemistry as an interdisciplinary field. Other interdisciplinary fields emerged because disciplines contribute complementary parts of the same target system or address different sides of the same research question; for example, mechatronics engineering aims at building systems that have both mechanical and electronic parts, such as modern automobiles. Interdisciplinary areas like business information systems and management information systems emerged because of the high demand for information systems that target business and management needs; although generic information systems would meet many of those requirements, a dedicated field focusing on such applications is more efficient given such high demand.

The interdisciplinary field on which this book focuses is bioinformatics. Its emergence was driven by the rapidly growing volume of raw biological data produced by the high-throughput techniques developed over the last couple of decades. The scale of these high-throughput data is orders of magnitude larger than what can be analysed manually. Consequently, information engineers were recruited to contribute to data analysis with their computational methods. Cycles of computational analysis, sharing of results, interdisciplinary discussion and abstraction have led, and are still leading, to many key discoveries in biology and medicine. This success has drawn many information engineers towards biology and many biologists towards information engineering, meeting in a potentially rich intersection that has itself grown large enough to establish the field of bioinformatics.

1.2 The “Omics” Era


A new suffix has entered the English language in this era of high-throughput data expansion: “-omics”, together with its relatives “-ome” and “-omic”. This started in the 1930s, when the entire set of genes carried by a chromosome was called the genome, a blend of the words “gene” and “chromosome” (OED, 2014). Consequently, the analysis of the entire genome was called genomics, and many well-known research journals carry the term “genome” or “genomics” in their titles, such as Genomics, Genome Research, Genome Biology, BMC Genomics, Genome Medicine and the Journal of Genetics and Genomics (JGG), among others.

The -ome suffix was not kept exclusive to the genome; rather, it has been generalised to denote the complete set of some type of molecule or object. The proteome is the complete set of proteins in a cell, tissue or organism. Similarly, the transcriptome, metabolome, glycome and lipidome are the complete sets of transcripts, metabolites, glycans (carbohydrates) and lipids. Large-scale studies of the proteome, transcriptome, metabolome, glycome and lipidome are known, respectively, as proteomics, transcriptomics, metabolomics, glycomics and lipidomics. The -ome suffix was further generalised to complete sets of objects other than basic molecules. For example, the microbiome is the complete set of microorganisms (e.g. bacteria and microscopic fungi) in a given environment such as a building, a soil sample or the human gut (Kembel et al., 2014). Further omic fields have also emerged, such as agrigenomics (the application of genomics in agriculture) and pharmacogenomics and pharmacoproteomics (the application of genomics and proteomics to pharmacology), among others.

All of these omic fields involve high-throughput datasets that call for information-engineering methods, and they therefore lie at the core of bioinformatics research. An even higher level of omics analysis involves the integrative analysis of many types of omic datasets. OMICS: A Journal of Integrative Biology, for example, targets research studies that consider such collective analysis at levels ranging from single cells to societies.

More types of high-throughput omic datasets are expected to emerge, and the role of bioinformatics as an interdisciplinary field will become more important still. This is not only because each of those omic datasets is massive when considered individually, but also because of the amount of information hidden in the relationships between these generally heterogeneous datasets, which requires more sophisticated computational methods to analyse.

1.3 The Scope of Bioinformatics


The scope of bioinformatics includes the development of methods, techniques and tools which target storage, retrieval, organisation, analysis and presentation of high-throughput biological data.

1.3.1 Areas of Molecular Biology Subject to Bioinformatics Analysis


Broadly speaking, every part of molecular biology that produces high-throughput data is subject to bioinformatics analysis, whereas low-throughput data that can be analysed manually are not. The omics fields described in the previous section are therefore included in bioinformatics analysis. This covers DNA, RNA and protein sequence analysis; gene and protein expression; the genetics of diseases, including cancers, and of special phenotypes; the analysis of gene regulation, chemical interaction regulation, enzymatic regulation and other types of regulation; the analysis of signals flowing in cells; networks of genetic, protein and other molecular interactions; comparative analysis of the diversity of genomes between individuals or organisms within an environment or across different environments; and others.

1.3.2 Data Storage, Retrieval and Organisation


The human genome is a linear thread of more than three billion base pairs (letters). In 2012, more than four years after the project began, the 1000 Genomes Project Consortium announced that it had completed sequencing the genomes of 1092 individuals from fourteen different populations (The 1000 Genomes Project Consortium, 2014). Moreover, the genomes of thousands of organisms other than humans have been sequenced and stored during the last two to three decades. As for gene expression data, tens of thousands of massive microarray datasets have been generated in the last two decades. Add to that the increasing amounts of data generated for protein expression, DNA binding and other types of high-throughput measurement. Data generation has not stopped and is expected to increase rapidly owing to major advances in technology and falling costs. It is therefore crucial to store such volumes of data efficiently, in a way that allows quick, simultaneous access by large numbers of researchers from different parts of the world.
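As a rough illustration of why storage efficiency matters at this scale, the sketch below packs a DNA string into two bits per base instead of one byte per character, and then does the back-of-the-envelope arithmetic for three billion bases. This is a simplified sketch, not how any particular repository stores genomes: real archives use standard formats such as FASTA, BAM/CRAM and VCF, usually with compression. The function names `pack_dna` and `unpack_dna` are hypothetical, and the scheme assumes only the letters A, C, G and T (an unknown base such as N would raise a `KeyError`).

```python
# Two bits are enough to distinguish the four nucleotides A, C, G and T.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def pack_dna(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # Base i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        out[i // 4] |= ENCODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_dna(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "GATTACA"
packed = pack_dna(seq)
print(len(seq), "bases ->", len(packed), "bytes")
print(unpack_dna(packed, len(seq)))

# Back-of-the-envelope: three billion bases at 2 bits each.
human_bases = 3_000_000_000
print(human_bases * 2 // 8 / 1e6, "MB")  # 750 MB, versus 3,000 MB at one byte per base
```

Packing cuts the raw size by a factor of four, yet a single genome still needs hundreds of megabytes, which is why archives combine compact formats with compression and store individual genomes largely as differences from a reference.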

Given the current trend of making most generated high-throughput datasets publicly available in centralised databases, it becomes essential to standardise the way in which data are organised, annotated and labelled. Standardisation enhances information exchange and mutual understanding between research groups around the world.

Taken together, the scope of bioinformatics includes designing and implementing appropriate databases for storing high-throughput biological data, building means of accessing those databases such as web services and network applications, organising data at different levels using standard formats and annotations, and, of course, maintaining and enhancing the availability and scalability of these data repositories.
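One widely used standard format is FASTA: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. Because every tool agrees on this layout, a few lines of code are enough to read records shared by any group. The sketch below is a minimal, assumption-laden reader (the function name `read_fasta` and the example records are hypothetical); production code would normally rely on an established library such as Biopython's `Bio.SeqIO`.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Minimal sketch: sequence lines that appear before the first header
    are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another hypothetical record
GGCC
"""
records = list(read_fasta(io.StringIO(example)))
for header, seq in records:
    print(header, len(seq))
```

The header line is where annotations such as identifiers and descriptions conventionally live, so a shared convention for that line is as important as the sequence layout itself.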

1.3.3 Data Analysis


Elaine Mardis, Professor of Genetics at the Genome Institute at Washington University and a collaborator in the 1000 Genomes Project, titled her “musing” published in Genome Medicine in 2010 “The $1,000 genome, the $100,000 analysis?” (Mardis, 2010). Mardis discussed the tremendous drop in the cost of sequencing the complete genome of an individual human, from hundreds of millions of dollars to a few thousand, and the expectation that it would reach the $1,000 mark. Based on many facts and observations, she argued that the cost of data analysis, which does not seem to be dropping, will constitute the major part of the total cost, rather than the cost of data generation.
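Taking the two figures in the title at face value (they are rhetorical round numbers, not measured costs), the share of the total per-genome cost that would go to analysis is:

```latex
\frac{C_{\text{analysis}}}{C_{\text{sequencing}} + C_{\text{analysis}}}
  = \frac{100\,000}{1\,000 + 100\,000}
  \approx 0.99
```

On those numbers, roughly 99% of the total would be spent on analysis rather than on generating the data.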

Only a small proportion of the 20 000–25 000 human genes has been well described and understood, while many gaps remain in our understanding of the vast majority of them. Identifying the sequence of a gene from a thousand individuals and measuring...

Publication date (per publisher) 16 April 2015
Language English
Subject areas Mathematics / Computer Science: Computer Science
Natural Sciences: Biology
Engineering: Electrical Engineering / Power Engineering
Keywords Bioinformatics • Biomedical Engineering • Clustering • Clustering validation • Computational Bioengineering • Computational Biology • Electrical & Electronics Engineering • Genetics • High-throughput data analysis • Integrative cluster analysis • microarray • Omics • Pattern Analysis • Pattern Recognition • Signal Processing • Unsupervised Learning
ISBN-10 1-118-90655-1 / 1118906551
ISBN-13 978-1-118-90655-2 / 9781118906552
EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy protection scheme intended to protect the eBook against misuse. The eBook is authorised to your personal Adobe ID when it is downloaded. You can then read it only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB well suited to mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using OverDrive Media Console; in our experience, problems with Adobe DRM occur frequently with it.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether you use Apple or Android, you can read this eBook. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfil eBook orders from other countries.
