Bioinformatics and Functional Genomics (eBook)
John Wiley & Sons (publisher)
978-1-118-58176-6 (ISBN)
The bestselling introduction to bioinformatics and genomics - now in its third edition
Widely received in its previous editions, Bioinformatics and Functional Genomics offers the most broad-based introduction to this explosive new discipline. Now in a thoroughly updated and expanded third edition, it continues to be the go-to source for students and professionals involved in biomedical research.
This book provides up-to-the-minute coverage of the fields of bioinformatics and genomics. Features new to this edition include:
- Extensive revisions and a slight reorder of chapters for a more effective organization
- A brand new chapter on next-generation sequencing
- An expanded companion website, also updated as and when new information becomes available
- Greater emphasis on a computational approach, with clear guidance on how software tools work and introductions to the use of command-line tools such as software for next-generation sequence analysis, the R programming language, and NCBI search utilities
The book is complemented by lavish illustrations and more than 500 figures and tables - many newly created for the third edition to enhance clarity and understanding. Each chapter includes learning objectives, a problem set, a pitfalls section, boxes explaining key techniques and mathematics/statistics principles, a summary, recommended reading, and a list of freely available software. Readers may visit a related Web page for supplemental information such as PowerPoints and audiovisual files of lectures, and videocasts of how to perform many basic operations: www.wiley.com/go/pevsnerbioinformatics.
Bioinformatics and Functional Genomics, Third Edition serves as an excellent single-source textbook for advanced undergraduate and beginning graduate-level courses in the biological sciences and computer sciences. It is also an indispensable resource for biologists in a broad variety of disciplines who use the tools of bioinformatics and genomics to study particular research problems; bioinformaticists and computer scientists who develop computer algorithms and databases; and medical researchers and clinicians who want to understand the genomic basis of viral, bacterial, parasitic, or other diseases.
Jonathan Pevsner, PhD, is a Professor in the Department of Neurology at Kennedy Krieger Institute, an internationally recognized institution dedicated to improving the lives of children with neurodevelopmental disorders. He holds a primary faculty appointment as Professor in the Department of Psychiatry and Behavioral Sciences (Johns Hopkins University School of Medicine). He holds joint or secondary appointments in the Department of Neuroscience, the Institute of Genetic Medicine, and the Division of Health Sciences Informatics (Johns Hopkins School of Medicine), and the Department of Molecular Microbiology and Immunology (Johns Hopkins Bloomberg School of Public Health). He has taught bioinformatics courses since 2000 at the Johns Hopkins School of Medicine, and was awarded Teacher of the Year honors by the Graduate Student Association in both 2001 and 2006, the Professors' Award for Excellence in Teaching awarded by the medical faculty (2003), Teacher of the Year (Advanced Academic Programs, 2009), and Teaching Excellence Award in the Johns Hopkins Bloomberg School of Public Health (2011). In 2013 his lab used whole genome sequencing and reported a mutation that causes a rare disease, Sturge-Weber syndrome, as well as a commonly occurring port-wine stain birthmark.
Part I Analyzing DNA, RNA, and Protein Sequences
1 Introduction
2 Access to Sequence Data and Related Information
3 Pairwise Sequence Alignment
4 Basic Local Alignment Search Tool (BLAST)
5 Advanced Database Searching
6 Multiple Sequence Alignment
7 Molecular Phylogeny and Evolution
Part II Genomewide Analysis of DNA, RNA, and Protein
8 DNA: The Eukaryotic Chromosome
9 Analysis of Next-Generation Sequence Data
10 Bioinformatic Approaches to Ribonucleic Acid (RNA)
11 Gene Expression: Microarray and RNA-seq Data Analysis
12 Protein Analysis and Proteomics
13 Protein Structure
14 Functional Genomics
Part III Genome Analysis
15 Genomes Across the Tree of Life
16 Completed Genomes: Viruses
17 Completed Genomes: Bacteria and Archaea
18 Eukaryotic Genomes: Fungi
19 Eukaryotic Genomes: From Parasites to Primates
20 Human Genome
21 Human Disease
Glossary
Self-Test Quiz: Solutions
Author Index
Subject Index
The first third of this book covers essential topics in bioinformatics. Chapter 1 provides an overview of the approaches we take, including the use of web-based and command-line software. We describe how to access sequences (Chapter 2). We then align them in a pairwise fashion (Chapter 3) or compare them to members of a database using BLAST (Chapter 4), including specialized searches of protein or DNA databases (Chapter 5). We next perform multiple sequence alignment (Chapter 6) and visualize these alignments as phylogenetic trees with an evolutionary perspective (Chapter 7).
The upper image shows the connectivity of the internet (from the Wikipedia entry for “internet”), while the lower image shows a map of human protein interactions (from the Wikipedia entry for “Protein–protein interaction”). We seek to understand biological principles on a genome-wide scale using the tools of bioinformatics.
Sources: Upper: Dcrjsr, 2002. Licensed under the Creative Commons Attribution 3.0 Unported license. Lower: The Opte Project, 2006. Licensed under the Creative Commons Attribution 2.5 Generic license.
CHAPTER 1
Introduction
Penetrating so many secrets, we cease to believe in the unknowable. But there it sits nevertheless, calmly licking its chops.
— H.L. Mencken
LEARNING OBJECTIVES
After reading this chapter you should be able to:
- define the term bioinformatics;
- explain the scope of bioinformatics;
- explain why globins are a useful example to illustrate this discipline; and
- describe web-based versus command-line approaches to bioinformatics.
Bioinformatics represents a new field at the interface of the ongoing revolutions in molecular biology and computers. I define bioinformatics as the use of computer databases and computer algorithms to analyze proteins, genes, and the complete collection of deoxyribonucleic acid (DNA) that comprises an organism (the genome). A major challenge in biology is to make sense of the enormous quantities of sequence data and structural data that are generated by genome-sequencing projects, proteomics, and other large-scale molecular biology efforts. The tools of bioinformatics include computer programs that help to reveal fundamental mechanisms underlying biological problems related to the structure and function of macromolecules, biochemical pathways, disease processes, and evolution.
According to a National Institutes of Health (NIH) definition, bioinformatics is “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral, or health data, including those to acquire, store, organize, analyze, or visualize such data.” The related discipline of computational biology is “the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, behavioral, and social systems.” Another definition from the National Human Genome Research Institute (NHGRI) is that “Bioinformatics is the branch of biology that is concerned with the acquisition, storage, display, and analysis of the information found in nucleic acid and protein sequence data.”
The NIH Bioinformatics Definition Committee findings are reported at http://www.bisti.nih.gov/docs/CompuBioDef.pdf (WebLink 1.1 at http://bioinfbook.org). The NHGRI definition is available at http://www.genome.gov/19519278 (WebLink 1.2).
Russ Altman (1998) and Altman and Dugan (2003) offer two definitions of bioinformatics. The first involves information flow following the central dogma of molecular biology (Fig. 1.1). The second definition involves information flow that is transferred based on scientific methods. This second definition includes problems such as designing, validating, and sharing software; storing and sharing data; performing reproducible research workflows; and interpreting experiments.
Figure 1.1 A first perspective of the field of bioinformatics is the cell. Bioinformatics has emerged as a discipline as biology has become transformed by the emergence of molecular sequence data. Databases such as the European Molecular Biology Laboratory (EMBL), GenBank, the Sequence Read Archive, and the DNA Database of Japan (DDBJ) serve as repositories for quadrillions (10^15) of nucleotides of DNA sequence data (see Chapter 2). Corresponding databases of expressed genes (RNA) and protein have been established. A main focus of the field of bioinformatics is to study molecular sequence data to gain insight into a broad range of biological problems.
While the discipline of bioinformatics focuses on the analysis of molecular sequences, genomics and functional genomics are two closely related disciplines. The goal of genomics is to determine and analyze the complete DNA sequence of an organism, that is, its genome. The DNA encodes genes that can be expressed as ribonucleic acid (RNA) transcripts and then, in many cases, further translated into protein. Functional genomics describes the use of genome-wide assays to study gene and protein function. For humans and other species, it is now possible to characterize an individual’s genome, collection of RNA (transcriptome), proteome, and even the collections of metabolites and epigenetic changes, and the catalog of organisms inhabiting the body (the microbiome) (Topol, 2014).
The aim of this book is to explain both the theory and practice of bioinformatics and genomics. The book is especially designed to help the biology student use computer programs and databases to solve biological problems related to proteins, genes, and genomes. Bioinformatics is an integrative discipline, and our focus on individual proteins and genes is part of a larger effort to understand broad issues in biology such as the relationship of structure to function, development, and disease. For the computer scientist, this book explains the motivations for creating and using algorithms and databases.
Organization of the Book
There are three main sections of the book. Part I (Chapters 2–7) explains how to access biological sequence data, particularly DNA and protein sequences (Chapter 2). Once sequences are obtained, we show how to compare two sequences (pairwise alignment; Chapter 3) and how to compare a sequence against the many sequences in a database (primarily by the Basic Local Alignment Search Tool or BLAST; Chapters 4 and 5). We introduce multiple sequence alignment (Chapter 6) and show how multiply aligned proteins or nucleotides can be visualized in phylogenetic trees (Chapter 7). Chapter 7 therefore introduces the subject of molecular evolution.
Part II describes functional genomics approaches to DNA, RNA, and protein and the determination of gene function (Chapters 8–14). The central dogma of biology states that DNA is transcribed into RNA then translated into protein. Chapter 8 introduces chromosomes and DNA, while Chapter 9 describes next-generation sequencing technology (emphasizing practical data analysis). We next examine bioinformatic approaches to RNA (Chapter 10), including both noncoding and coding RNAs. We then describe the measurement of mRNA (i.e., gene expression profiling) using microarrays and RNA-seq. Again we focus on practical data analysis (Chapter 11). From RNA we turn to consider proteins from the perspective of protein families, and the analysis of individual proteins (Chapter 12) and protein structure (Chapter 13). We conclude the second part of the book with an overview of the rapidly developing field of functional genomics (Chapter 14), which integrates contemporary approaches to characterizing the genome, transcriptome, and proteome.
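The flow of information in the central dogma can be illustrated with a few lines of Python. This is only a minimal sketch and is not taken from the book: the function names, the toy DNA sequence, and the deliberately partial codon table (a full table has 64 entries) are all assumptions made for illustration.

```python
# Minimal sketch of the central dogma: DNA -> RNA -> protein.
# Assumption: only the codons needed for the toy sequence are listed;
# a complete standard genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GUG": "V",
    "CAC": "H",
    "CUG": "L",
    "ACU": "T",
    "CCU": "P",
    "GAG": "E",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon.

    Codons missing from the partial table are reported as 'X'.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence chosen for illustration only.
dna = "ATGGTGCACCTGACTCCTGAGTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

In practice a library such as Biopython would supply the complete codon table; the hand-written dictionary here just keeps the sketch self-contained.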
Part III covers genome analysis across the tree of life (Chapters 15–21). Since 1995, the genomes have been sequenced for several thousand viruses, bacteria, and archaea as well as eukaryotes such as fungi, animals, and plants. Chapter 15 provides an overview of the study of completed genomes. We describe bioinformatics resources for the study of viruses (Chapter 16) and bacteria and archaea (Chapter 17; these are two of the three main branches of life). Next we explore the genomes of a variety of eukaryotes including fungi (Chapter 18), organisms from parasites to primates (Chapter 19) and then the human genome (Chapter 20). Finally, we explore bioinformatic approaches to human disease (Chapter 21).
The third part of the book, spanning the tree of life from the perspective of genomics, depends strongly on the tools of bioinformatics from the first two parts of the book. I felt that this book would be incomplete if it introduced bioinformatics without also applying its tools and principles to the genomes of all life.
Bioinformatics: The Big Picture
We can summarize the fields of bioinformatics and genomics with three perspectives. The first perspective on bioinformatics is the cell (Fig. 1.1). Here we follow the central dogma. A focus of the field of bioinformatics is the collection of DNA (the genome), RNA (the transcriptome), and protein sequences (the proteome) that have been amassed. These millions–quadrillions of molecular sequences present both great opportunities and great challenges. A bioinformatics approach to molecular sequence data involves the...
| Publication date (per publisher) | 17 August 2015 |
|---|---|
| Language | English |
| Subject area | Computer Science ► Other Topics ► Bioinformatics; Natural Sciences ► Biology ► Genetics / Molecular Biology; Technology |
| Keywords | Bioinformatics • Bioinformatics & Computational Biology • Bioinformatics and Computer Simulation in the Life Sciences • BLAST • Cell & Molecular Biology • Evolution • functional genomics • genomics • human disease • Life Sciences • medical genetics • Medical Science • Medicine • Next-generation sequencing • Phylogeny • Proteomics • Sequence Analysis |
| ISBN-10 | 1-118-58176-8 / 1118581768 |
| ISBN-13 | 978-1-118-58176-6 / 9781118581766 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.
File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, so EPUB is also a good fit for mobile reading devices.
System requirements:
PC/Mac: You can read this eBook on a PC or Mac.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on either Apple or Android devices.
Buying eBooks from abroad
For tax law reasons we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.