Bioinformatics and Functional Genomics (eBook)

eBook Download: PDF
2015 | 3rd edition
John Wiley & Sons (publisher)
978-1-118-58169-8 (ISBN)

Bioinformatics and Functional Genomics - Jonathan Pevsner
120.99 incl. VAT
(CHF 118.20)
The eBook is sold by Lehmanns Media GmbH (Berlin) at the price in euros incl. VAT.
  • Download available immediately

The bestselling introduction to bioinformatics and genomics - now in its third edition

Widely received in its previous editions, Bioinformatics and Functional Genomics offers the most broad-based introduction to this explosive new discipline. Now in a thoroughly updated and expanded third edition, it continues to be the go-to source for students and professionals involved in biomedical research.

This book provides up-to-the-minute coverage of the fields of bioinformatics and genomics. Features new to this edition include:

  • Extensive revisions and a slight reorder of chapters for a more effective organization
  • A brand new chapter on next-generation sequencing
  • An expanded companion website, also updated as and when new information becomes available
  • Greater emphasis on a computational approach, with clear guidance on how software tools work and introductions to the use of command-line tools such as software for next-generation sequence analysis, the R programming language, and NCBI search utilities

The book is complemented by lavish illustrations and more than 500 figures and tables, many newly created for the third edition to enhance clarity and understanding. Each chapter includes learning objectives, a problem set, a pitfalls section, boxes explaining key techniques and mathematics/statistics principles, a summary, recommended reading, and a list of freely available software. Readers may visit a related Web page for supplemental information such as PowerPoints and audiovisual files of lectures, and videocasts of how to perform many basic operations: www.wiley.com/go/pevsnerbioinformatics.

Bioinformatics and Functional Genomics, Third Edition serves as an excellent single-source textbook for advanced undergraduate and beginning graduate-level courses in the biological sciences and computer sciences. It is also an indispensable resource for biologists in a broad variety of disciplines who use the tools of bioinformatics and genomics to study particular research problems; bioinformaticists and computer scientists who develop computer algorithms and databases; and medical researchers and clinicians who want to understand the genomic basis of viral, bacterial, parasitic, or other diseases.



Jonathan Pevsner, PhD, is a Professor in the Department of Neurology at Kennedy Krieger Institute, an internationally recognized institution dedicated to improving the lives of children with neurodevelopmental disorders. He holds a primary faculty appointment as Professor in the Department of Psychiatry and Behavioral Sciences (Johns Hopkins University School of Medicine). He holds joint or secondary appointments in the Department of Neuroscience, the Institute of Genetic Medicine, and the Division of Health Sciences Informatics (Johns Hopkins School of Medicine), and the Department of Molecular Microbiology and Immunology (Johns Hopkins Bloomberg School of Public Health). He has taught bioinformatics courses since 2000 at the Johns Hopkins School of Medicine, and was awarded Teacher of the Year honors by the Graduate Student Association in both 2001 and 2006, the Professors' Award for Excellence in Teaching awarded by the medical faculty (2003), Teacher of the Year (Advanced Academic Programs, 2009), and Teaching Excellence Award in the Johns Hopkins Bloomberg School of Public Health (2011). In 2013 his lab used whole genome sequencing and reported a mutation that causes a rare disease, Sturge-Weber syndrome, as well as a commonly occurring port-wine stain birthmark.

Bioinformatics and Functional Genomics
Contents in Brief
Contents
Preface to the third edition
About the Companion Website
PART I Analyzing DNA, RNA, and Protein Sequences
1 Introduction
Organization of the Book
Bioinformatics: The Big Picture
A Consistent Example: Globins
Organization of the Chapters
Suggestions For Students and Teachers: Web Exercises, Find-a-Gene, and Characterize-a-Genome
Bioinformatics Software: Two Cultures
Web-Based Software
Command-Line Software
Bridging the Two Cultures
New Paradigms for Learning Programming for Bioinformatics
Reproducible Research in Bioinformatics
Bioinformatics and Other Informatics Disciplines
Advice for Students
Suggested Reading
References
2 Access to Sequence Data and Related Information
Introduction to Biological Databases
Centralized Databases Store DNA Sequences
Contents of DNA, RNA, and Protein Databases
Organisms in GenBank/EMBL-Bank/DDBJ
Types of Data in GenBank/EMBL-Bank/DDBJ
Genomic DNA Databases
DNA-Level Data: Sequence-Tagged Sites (STSs)
DNA-Level Data: Genome Survey Sequences (GSSs)
DNA-Level Data: High-Throughput Genomic Sequence (HTGS)
RNA data
RNA-Level Data: cDNA Databases Corresponding to Expressed Genes
RNA-Level Data: Expressed Sequence Tags (ESTs)
RNA-Level Data: UniGene
Access to Information: Protein Databases
UniProt
Central Bioinformatics Resources: NCBI and EBI
Introduction to NCBI
The European Bioinformatics Institute (EBI)
Ensembl
Access to Information: Accession Numbers to Label and Identify Sequences
The Reference Sequence (RefSeq) Project
RefSeqGene and the Locus Reference Genomic Project
The Consensus Coding Sequence (CCDS) Project
The Vertebrate Genome Annotation (VEGA) Project
Access to Information via Gene Resource at NCBI
Relationship Between NCBI Gene, Nucleotide, and Protein Resources
Comparison of NCBI’s Gene and UniGene
NCBI’s Gene and HomoloGene
Command-Line Access to Data at NCBI
Using Command-Line Software
Accessing NCBI Databases with EDirect
EDirect Example 1
EDirect Example 2
EDirect Example 3
EDirect Example 4
EDirect Example 5
EDirect Example 6
EDirect Example 7
Access to Information: Genome Browsers
Genome Builds
The University of California, Santa Cruz (UCSC) Genome Browser
The Ensembl Genome Browser
The Map Viewer at NCBI
Examples of How to Access Sequence Data: Individual Genes/Proteins
Histones
HIV-1 pol
How to Access Sets of Data: Large-Scale Queries of Regions and Features
Thinking About One Gene (or Element) Versus Many Genes (Elements)
The BioMart Project
Using the UCSC Table Browser
Custom Tracks: Versatility of the BED File
Galaxy: Reproducible, Web-Based, High-Throughput Research
Access to Biomedical Literature
Example of PubMed Search
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
3 Pairwise Sequence Alignment
Introduction
Protein Alignment: Often More Informative than DNA Alignment
Definitions: Homology, Similarity, Identity
Gaps
Pairwise Alignment, Homology, and Evolution of Life
Scoring Matrices
Dayhoff Model Step 1 (of 7): Accepted Point Mutations
Dayhoff Model Step 2 (of 7): Frequency of Amino Acids
Dayhoff Model Step 3 (of 7): Relative Mutability of Amino Acids
Dayhoff Model Step 4 (of 7): Mutation Probability Matrix for the Evolutionary Distance of 1 PAM
Dayhoff Model Step 5 (of 7): PAM250 and Other PAM Matrices
Dayhoff Model Step 6 (of 7): From a Mutation Probability Matrix to a Relatedness Odds Matrix
Dayhoff Model Step 7 (of 7): Log-Odds Scoring Matrix
Practical Usefulness of PAM Matrices in Pairwise Alignment
Important Alternative to PAM: BLOSUM Scoring Matrices
Pairwise Alignment and Limits of Detection: The “Twilight Zone”
Alignment Algorithms: Global and Local
Global Sequence Alignment: Algorithm of Needleman and Wunsch
Step 1: Setting Up a Matrix
Step 2: Scoring the Matrix
Step 3: Identifying the Optimal Alignment
Local Sequence Alignment: Smith and Waterman Algorithm
Rapid, Heuristic Versions of Smith–Waterman: FASTA and BLAST
Basic Local Alignment Search Tool (BLAST)
Pairwise Alignment with Dotplots
The Statistical Significance of Pairwise Alignments
Statistical Significance of Global Alignments
Statistical Significance of Local Alignments
Percent Identity and Relative Entropy
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
4 Basic Local Alignment Search Tool (BLAST)
Introduction
BLAST Search Steps
Step 1: Specifying Sequence of Interest
Step 2: Selecting BLAST Program
Step 3: Selecting a Database
Step 4a: Selecting Optional Search Parameters
Step 4b: Selecting Formatting Parameters
Stand-Alone BLAST
BLAST Algorithm Uses Local Alignment Search Strategy
BLAST Algorithm Parts: List, Scan, Extend
BLAST Algorithm: Local Alignment Search Statistics and E Value
Making Sense of Raw Scores with Bit Scores
BLAST Algorithm: Relation Between E and p Values
BLAST Search Strategies
General Concepts
Principles of BLAST Searching
How to Evaluate the Significance of Results
How to Handle Too Many Results
How to Handle Too Few Results
BLAST Searching with Multidomain Protein: HIV-1 Pol
Using BLAST for Gene Discovery: Find-a-Gene
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
5 Advanced Database Searching
Introduction
Specialized BLAST Sites
Organism-Specific BLAST Sites
Ensembl BLAST
Wellcome Trust Sanger Institute
Specialized BLAST-Related Algorithms
WU BLAST 2.0
European Bioinformatics Institute (EBI)
Specialized NCBI BLAST Sites
BLAST of Next-Generation Sequence Data
Finding Distantly Related Proteins: Position-Specific Iterated BLAST (PSI-BLAST) and DELTA-BLAST
PSI-BLAST Errors: Problem of Corruption
Reverse Position-Specific BLAST
Domain Enhanced Lookup Time Accelerated BLAST (DELTA-BLAST)
Assessing Performance of PSI-BLAST and DELTA-BLAST
Pattern-Hit Initiated BLAST (PHI-BLAST)
Profile Searches: Hidden Markov Models
HMMER Software: Command-Line and Web-Based
BLAST-Like Alignment Tools to Search Genomic DNA Rapidly
Benchmarking to Assess Genomic Alignment Performance
PatternHunter: Nonconsecutive Seeds Boost Sensitivity
BLASTZ
Enredo and Pecan
MegaBLAST and Discontinuous MegaBLAST
BLAST-Like Tool (BLAT)
LAGAN
SSAHA2
Aligning Next-Generation Sequence (NGS) Reads to a Reference Genome
Alignment Based on Hash Tables
Alignment Based on the Burrows–Wheeler Transform
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
6 Multiple Sequence Alignment
Introduction
Definition of Multiple Sequence Alignment
Typical Uses and Practical Strategies of Multiple Sequence Alignment
Benchmarking: Assessment of Multiple Sequence Alignment Algorithms
Five Main Approaches to Multiple Sequence Alignment
Exact Approaches to Multiple Sequence Alignment
Progressive Sequence Alignment
Iterative Approaches
Consistency-Based Approaches
Structure-Based Methods
Benchmarking Studies: Approaches, Findings, Challenges
Databases of Multiple Sequence Alignments
Pfam: Protein Family Database of Profile HMMs
SMART
Conserved Domain Database
Integrated Multiple Sequence Alignment Resources: InterPro and iProClass
Multiple Sequence Alignment Database Curation: Manual Versus Automated
Multiple Sequence Alignments of Genomic Regions
Analyzing Genomic DNA Alignments via UCSC
Analyzing Genomic DNA Alignments via Galaxy
Analyzing Genomic DNA Alignments via Ensembl
Alignathon Competition to Assess Whole-Genome Alignment Methods
Perspective
Pitfalls
Advice for Students
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
7 Molecular Phylogeny and Evolution
Introduction to Molecular Evolution
Principles of Molecular Phylogeny and Evolution
Goals of Molecular Phylogeny
Historical Background
Molecular Clock Hypothesis
Positive and Negative Selection
Neutral Theory of Molecular Evolution
Molecular Phylogeny: Properties of Trees
Topologies and Branch Lengths of Trees
Tree Roots
Enumerating Trees and Selecting Search Strategies
Type of Trees
Species Trees versus Gene/Protein Trees
DNA, RNA, or Protein-Based Trees
Five Stages of Phylogenetic Analysis
Stage 1: Sequence Acquisition
Stage 2: Multiple Sequence Alignment
Stage 3: Models of DNA and Amino Acid Substitution
Stage 4: Tree-Building Methods
Distance-Based
Phylogenetic Inference: Maximum Parsimony
Model-Based Phylogenetic Inference: Maximum Likelihood
Tree Inference: Bayesian Methods
Stage 5: Evaluating Trees
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
PART II Genomewide Analysis of DNA, RNA, and Protein
8 DNA: The Eukaryotic Chromosome
Introduction
Major Differences between Eukaryotes and Bacteria and Archaea
General Features of Eukaryotic Genomes and Chromosomes
C Value Paradox: Why Eukaryotic Genome Sizes Vary So Greatly
Organization of Eukaryotic Genomes into Chromosomes
Analysis of Chromosomes Using Genome Browsers
Analysis of Chromosomes Using BioMart and biomaRt
Example 1
Example 2
Example 3
Example 4
Example 5
Analysis of Chromosomes by the ENCODE Project
Critiques of ENCODE: the C Value Paradox Revisited and the Definition of Function
Repetitive DNA Content of Eukaryotic Chromosomes
Eukaryotic Genomes Include Noncoding and Repetitive DNA Sequences
Interspersed Repeats (Transposon-Derived Repeats)
Processed Pseudogenes
Simple Sequence Repeats
Segmental Duplications
Blocks of Tandemly Repeated Sequences
Gene Content of Eukaryotic Chromosomes
Definition of Gene
Finding Genes in Eukaryotic Genomes
Finding Genes in Eukaryotic Genomes: EGASP Competition
Three Resources for Studying Protein-Coding Genes: RefSeq, UCSC Genes, GENCODE
Protein-Coding Genes in Eukaryotes: New Paradox
Regulatory Regions of Eukaryotic Chromosomes
Databases of Genomic Regulatory Factors
Ultraconserved Elements
Nonconserved Elements
Comparison of Eukaryotic DNA
Variation in Chromosomal DNA
Dynamic Nature of Chromosomes: Whole-Genome Duplication
Chromosomal Variation in Individual Genomes
Structural Variation: Six Types
Inversions
Mechanisms of Creating Duplications, Deletions, and Inversions
Models for Creating Gene Families
Chromosomal Variation in Individual Genomes: SNPs
Techniques to Measure Chromosomal Change
Array Comparative Genomic Hybridization
SNP Microarrays
Next-Generation Sequencing
Perspective
Pitfalls
Advice to Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
9 Analysis of Next-Generation Sequence Data
Introduction
DNA Sequencing Technologies
Sanger Sequencing
Next-Generation Sequencing
Cyclic Reversible Termination: Illumina
Pyrosequencing
Sequencing by Ligation: Color Space with ABI SOLiD
Ion Torrent: Genome Sequencing by Measuring pH
Pacific Biosciences: Single-Molecule Sequencing with Long Read Lengths
Complete Genomics: Self-Assembling DNA Nanoarrays
Analysis of Next-Generation Sequencing of Genomic DNA
Overview of Next-Generation Sequencing Data Analysis
Topic 1: Experimental Design and Sample Preparation
Topic 2: From Generating Sequence Data to FASTQ
Finding and Viewing FASTQ files
Quality Assessment of FASTQ data
FASTG: A Richer Format than FASTQ
Topic 3: Genome Assembly
Competitions and Critical Evaluations of the Performance of Genome Assemblers
The End of Assembly: Standards for Completion
Topic 4: Sequence Alignment
Alignment of Repetitive DNA
Genome Analysis Toolkit (GATK) Workflow: Alignment with BWA
Topic 5: The SAM/BAM Format and SAMtools
Calculating Read Depth
Finding and Viewing BAM/SAM files
Compressed Alignments: CRAM File Format
Topic 6: Variant Calling: Single-Nucleotide Variants and Indels
Topic 7: Variant Calling: Structural Variants
Topic 8: Summarizing Variation: The VCF Format and VCFtools
Finding and Viewing VCF files
Topic 9: Visualizing and Tabulating Next-Generation Sequence Data
Topic 10: Interpreting the Biological Significance of Variants
Topic 11: Storing Data in Repositories
Specialized Applications of Next-Generation Sequencing
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
10 Bioinformatic Approaches to Ribonucleic Acid (RNA)
Introduction to RNA
Noncoding RNA
Noncoding RNAs in the Rfam Database
Transfer RNA
Ribosomal RNA
Small Nuclear RNA
Small Nucleolar RNA
MicroRNA
Short Interfering RNA
Long Noncoding RNA (lncRNA)
Other Noncoding RNA
Noncoding RNAs in the UCSC Genome and Table Browser
Introduction to Messenger RNA
mRNA: Subject of Gene Expression Studies
Low- and High-Throughput Technologies to Study mRNAs
Analysis of Gene Expression in cDNA Libraries
Full-Length cDNA Projects
BodyMap2 and GTEx: Measuring Gene Expression Across the Body
Microarrays and RNA-Seq: Genome-Wide Measurement of Gene Expression
Stage 1: Experimental Design for Microarrays and RNA-seq
Stage 2: RNA Preparation and Probe Preparation
Stage 3: Data Acquisition
Hybridization of Labeled Samples to DNA Microarrays
Data acquisition for RNA-seq
Stage 4: Data Analysis
Stage 5: Biological Confirmation
Microarray and RNA-seq Databases
Further Analyses
Interpretation of RNA Analyses
The Relationship between DNA, mRNA, and Protein Levels
The Pervasive Nature of Transcription
eQTLs: Understanding the Genetic Basis of Variation in Gene Expression through Combined RNA-seq and DNA-seq
Perspective
Pitfalls
Advice to Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
11 Gene Expression: Microarray and RNA-seq Data Analysis
Introduction
Microarray Analysis Method 1: GEO2R at NCBI
GEO2R Executes a Series of R Scripts
GEO2R Identifies the Chromosomal Origin of Regulated Transcripts
GEO2R Normalizes Data
GEO2R uses RMA Normalization for Accuracy and Precision
Fold Change (Expression Ratios)
GEO2R Performs > 22,000 Statistical Tests
GEO2R Offers Corrections for Multiple Comparisons
Microarray Analysis Method 2: Partek
Importing Data
Quality Control
Adding Sample Information
Sample Histogram
Scatter Plots and MA Plots
Working with Log2 Transformed Microarray Data
Exploratory Data Analysis with Principal Components Analysis (PCA)
Performing ANOVA in Partek
From t-test to ANOVA
Microarray Analysis Method 3: Analyzing a GEO Dataset with R
Setting up the Analyses
Reading CEL Files and Normalizing with RMA
Identifying Differentially Expressed Genes (Limma)
Microarray Analysis and Reproducibility
Microarray Data Analysis: Descriptive Statistics
Hierarchical Cluster Analysis of Microarray Data
Partitioning Methods for Clustering: k-Means Clustering
Multidimensional Scaling Compared to Principal Components Analysis
Clustering Strategies: Self-Organizing Maps
Classification of Genes or Samples
RNA-Seq
Setting up a TopHat and Cufflinks Sample Protocol
TopHat to Map Reads to a Reference Genome
Cufflinks to Assemble Transcripts
Cuffdiff to Determine Differential Expression
CummeRbund to Visualize RNA-seq Results
RNA-seq Genome Annotation Assessment Project (RGASP)
Functional Annotation of Microarray Data
Perspective
Pitfalls
Advice for Students
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
12 Protein Analysis and Proteomics
Introduction
Protein Databases
Community Standards for Proteomics Research
Evaluating the State-of-the-Art: ABRF analytic challenges
Techniques for Identifying Proteins
Direct Protein Sequencing
Gel Electrophoresis
Mass Spectrometry
Four Perspectives on Proteins
Perspective 1: Protein Domains and Motifs: Modular Nature of Proteins
Added Complexity of Multidomain Proteins
Protein Patterns: Motifs or Fingerprints Characteristic of Proteins
Perspective 2: Physical Properties of Proteins
Accuracy of Prediction Programs
Proteomic Approaches to Phosphorylation
Proteomic Approaches to Transmembrane Regions
Introduction to Perspectives 3 and 4: Gene Ontology Consortium
Perspective 3: Protein Localization
Perspective 4: Protein Function
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
13 Protein Structure
Overview of Protein Structure
Protein Sequence and Structure
Biological Questions Addressed by Structural Biology: Globins
Principles of Protein Structure
Primary Structure
Secondary Structure
Tertiary Protein Structure: Protein-Folding Problem
Structural Genomics, the Protein Structure Initiative, and Target Selection
Protein Data Bank
Accessing PDB Entries at NCBI Website
Integrated Views of Universe of Protein Folds
Taxonomic System for Protein Structures: SCOP Database
CATH Database
Dali Domain Dictionary
Comparison of Resources
Protein Structure Prediction
Homology Modeling (Comparative Modeling)
Fold Recognition (Threading)
Ab Initio Prediction (Template-Free Modeling)
A Competition to Assess Progress in Structure Prediction
Intrinsically Disordered Proteins
Protein Structure and Disease
Perspective
Pitfalls
Advice for Students
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
14 Functional Genomics
Introduction to Functional Genomics
The Relationship Between Genotype and Phenotype
Eight Model Organisms for Functional Genomics
1. The Bacterium Escherichia coli
2. The Yeast Saccharomyces cerevisiae
3. The Plant Arabidopsis thaliana
4. The Nematode Caenorhabditis elegans
5. The Fruit Fly Drosophila melanogaster
6. The Zebrafish Danio rerio
7. The Mouse Mus musculus
8. Homo sapiens: Variation in Humans
Functional Genomics Using Reverse and Forward Genetics
Reverse Genetics: Mouse Knockouts and the β-Globin Gene
Reverse Genetics: Knocking Out Genes in Yeast Using Molecular Barcodes
Reverse Genetics: Random Insertional Mutagenesis (Gene Trapping)
Reverse Genetics: Insertional Mutagenesis in Yeast
Reverse Genetics: Gene Silencing by Disrupting RNA
Forward Genetics: Chemical Mutagenesis
Comparison of Reverse and Forward Genetics
Functional Genomics and the Central Dogma
Approaches to Function and Definitions of Function
Functional Genomics and DNA: Integrating Information
Functional Genomics and RNA
Functional Genomics and Protein
Proteomics Approaches to Functional Genomics
Functional Genomics and Protein: Critical Assessment of Protein Function Annotation
Protein–Protein Interactions
Yeast Two-Hybrid System
Protein Complexes: Affinity Chromatography and Mass Spectrometry
Protein–Protein Interaction Databases
From Pairwise Interactions to Protein Networks
Assessment of Accuracy
Choice of Data
Experimental Organism
Variation in Pathways
Categories of Maps
Pathways, Networks, and Integration: Bioinformatics Resources
Perspective
Pitfalls
Advice for Students
Web Resources
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
PART III Genome Analysis
15 Genomes Across the Tree of Life
Introduction
Five Perspectives on Genomics
Brief History of Systematics
History of Life on Earth
Molecular Sequences as the Basis of the Tree of Life
Role of Bioinformatics in Taxonomy
Prominent Web Resources
Ensembl Genomes
NCBI Genome
Genome Portal of DOE JGI and the Integrated Microbial Genomes
Genomes On Line Database (GOLD)
UCSC
Genome-Sequencing Projects: Chronology
Brief Chronology
1976–1978: First Bacteriophage and Viral Genomes
1981: First Eukaryotic Organellar Genome
1986: First Chloroplast Genomes
1992: First Eukaryotic Chromosome
1995: Complete Genome of Free-Living Organism
1996: First Eukaryotic Genome
1997: Escherichia coli
1998: First Genome of Multicellular Organism
1999: Human Chromosome
2000: Fly, Plant, and Human Chromosome 21
2001: Draft Sequences of Human Genome
2002: Continuing Rise in Completed Genomes
2003: HapMap
2004: Chicken, Rat, and Finished Human Sequences
2005: Chimpanzee, Dog, Phase I HapMap
2006: Sea Urchin, Honeybee, dbGaP
2007: Rhesus Macaque, First Individual Human Genome, ENCODE Pilot
2008: Platypus, First Cancer Genome, First Personal Genome Using NGS
2009: Bovine, First Human Methylome Map
2010: 1000 Genomes Pilot, Neandertal, Exome Sequencing to Find Disease Genes
2011: A Vision for the Future of Genomics
2012: Denisovan Genome, Bonobo, and 1000 Genomes Project
2013: The Simplest Animal and a 700,000-Year-Old Horse
2014: Mouse ENCODE, Primates, Plants, and Ancient Hominids
2015: Diversity in Africa
Genome Analysis Projects: Introduction
Large-Scale Genomics Projects
Criteria for Selection of Genomes for Sequencing
Genome Size
Cost
Relevance to Human Disease
Relevance to Basic Biological Questions
Relevance to Agriculture
Sequencing of One Versus Many Individuals from a Species
Role of Comparative Genomics
Resequencing Projects
Ancient DNA Projects
Metagenomics Projects
Genome Analysis Projects: Sequencing
Genome-Sequencing Centers
Trace Archive: Repository for Genome Sequence Data
HTGS Archive: Repository for Unfinished Genome Sequence Data
Genome Analysis Projects: Assembly
Four Approaches to Genome Assembly
Genome Assembly: From FASTQ to Contigs with Velvet
Comparative Genome Assembly: Mapping Contigs to Known Genomes
Finishing: When Has a Genome Been Fully Sequenced?
Genome Assembly: Measures of Success
Genome Assembly: Challenges
Genome Analysis Projects: Annotation
Annotation of Genes in Eukaryotes: Ensembl Pipeline
Annotation of Genes in Eukaryotes: NCBI Pipeline
Core Eukaryotic Genes Mapping Approach (CEGMA)
Assemblies from the Genome Reference Consortium
Assembly Hubs and Transfers at UCSC, Ensembl, and NCBI
Annotation of Genes in Bacteria and Archaea
Genome Annotation Standards
Perspective
Pitfalls
Advice for Students
Discussion Questions
Problems/Computer Lab
Self-Test Quiz
Suggested Reading
References
16 Completed Genomes: Viruses 791
Introduction 791
International Committee on Taxonomy of Viruses (ICTV) and Virus Species 792
Classification of Viruses 794
Classification of Viruses Based on Morphology 794
Classification of Viruses Based on Nucleic Acid Composition 794
Classification of Viruses Based on Genome Size 794
Classification of Viruses Based on Disease Relevance 796
Diversity and Evolution of Viruses 798
Metagenomics and Virus Diversity 800
Bioinformatics Approaches to Problems in Virology 801
Human Immunodeficiency Virus (HIV) 802
NCBI and LANL Resources for HIV-1 802
Influenza Virus 807
Measles Virus 810
Ebola Virus 811
Herpesvirus: From Phylogeny to Gene Expression 812
The Pairwise Sequence Comparison (PASC) Tool 816
Giant Viruses 818
Comparing Genomes with MUMmer 819
Perspectives 821
Pitfalls 822
Advice for Students 822
Web Resources 822
Discussion Questions 823
Problems/Computer Lab 823
Self-Test Quiz 824
Suggested Reading 825
References 825
17 Completed Genomes: Bacteria and Archaea 833
Introduction 833
Classification of Bacteria and Archaea 834
Classification of Bacteria by Morphological Criteria 836
Classification of Bacteria and Archaea Based on Genome Size and Geometry 837
Classification of Bacteria and Archaea Based on Lifestyle 841
Classification of Bacteria Based on Human Disease Relevance 844
Classification of Bacteria and Archaea Based on Ribosomal RNA Sequences 845
Classification of Bacteria and Archaea Based on Other Molecular Sequences 846
The Human Microbiome 847
Analysis of Bacterial and Archaeal Genomes 850
Nucleotide Composition 853
Finding Genes 855
Interpolated Context Model (ICM) 858
GLIMMER3 860
Challenges of Bacterial and Archaeal Gene Prediction 861
Gene Annotation 861
Lateral Gene Transfer 863
Comparison of Bacterial Genomes 866
TaxPlot 866
MUMmer 869
Perspective 870
Pitfalls 871
Advice for Students 871
Web Resources 871
Discussion Questions 872
Problems/Computer Lab 872
Self-Test Quiz 872
Suggested Reading 873
References 873
18 Eukaryotic Genomes: Fungi 883
Introduction 883
Description and Classification of Fungi 884
Introduction to Budding Yeast Saccharomyces Cerevisiae 885
Sequencing Yeast Genome 887
Features of Budding Yeast Genome 887
Exploring Typical Yeast Chromosome 890
Web Resources for Analyzing a Chromosome 890
Exploring Variation in a Chromosome with Command-Line Tools 893
Finding Genes in a Chromosome with Command-Line Tools 894
Properties of Yeast Chromosome XII 896
Gene Duplication and Genome Duplication of S. cerevisiae 896
Comparative Analyses of Hemiascomycetes 901
Comparative Analyses of Whole-Genome Duplication 902
Identification of Functional Elements 904
Analysis of Fungal Genomes 905
Fungi in the Human Microbiome 906
Aspergillus 907
Candida albicans 907
Cryptococcus neoformans: Model Fungal Pathogen 908
Atypical Fungus: Microsporidial Parasite Encephalitozoon cuniculi 909
Neurospora crassa 909
First Basidiomycete: Phanerochaete chrysosporium 911
Fission Yeast Schizosaccharomyces pombe 911
Other Fungal Genomes 912
Ten Leading Fungal Plant Pathogens 912
Perspective 912
Pitfalls 913
Advice for Students 913
Web Resources 913
Discussion Questions 913
Problems/Computer Lab 914
Self-Test Quiz 915
Suggested Reading 916
References 916
19 Eukaryotic Genomes: From Parasites to Primates 923
Introduction 923
Protozoans at Base of Tree Lacking Mitochondria 926
Trichomonas 926
Giardia lamblia: A Human Intestinal Parasite 927
Genomes of Unicellular Pathogens: Trypanosomes and Leishmania 926
Trypanosomes 928
Leishmania 930
The Chromalveolates 931
Malaria Parasite Plasmodium falciparum 931
More Apicomplexans 934
Astonishing Ciliophora: Paramecium and Tetrahymena 935
Nucleomorphs 938
Kingdom Stramenopila 940
Plant Genomes 942
Overview 942
Green Algae (Chlorophyta) 944
Arabidopsis thaliana Genome 946
The Second Plant Genome: Rice 949
Third Plant: Poplar 950
Fourth Plant: Grapevine 951
Giant and Tiny Plant Genomes 951
Hundreds More Land Plant Genomes 951
Moss 952
Slime and Fruiting Bodies at the Feet of Metazoans 952
Social Slime Mold Dictyostelium discoideum 952
Metazoans 953
Introduction to Metazoans 953
900 MYA: The Simple Animal Caenorhabditis elegans 954
900 MYA: Drosophila melanogaster (First Insect Genome) 955
900 MYA: Anopheles gambiae (Second Insect Genome) 957
900 MYA: Silkworm and Butterflies 958
900 MYA: Honeybee 959
900 MYA: A Swarm of Insect Genomes 959
840 MYA: A Sea Urchin on the Path to Chordates 960
800 MYA: Ciona intestinalis and the Path to Vertebrates 961
450 MYA: Vertebrate Genomes of Fish 962
350 MYA: Frogs 965
320 MYA: Reptiles (Birds, Snakes, Turtles, Crocodiles) 965
180 MYA: The Platypus and Opossum Genomes 967
100 MYA: Mammalian Radiation from Dog to Cow 969
80 MYA: The Mouse and Rat 970
5–50 MYA: Primate Genomes 973
Perspective 976
Pitfalls 977
Advice for Students 977
Web Resources 978
Discussion Questions 978
Problems/Computer Lab 978
Self-Test Quiz 979
Suggested Reading 980
References 980
20 Human Genome 993
Introduction 993
Main Conclusions of Human Genome Project 994
Gateways to Access the Human Genome 995
NCBI 995
Ensembl 995
University of California at Santa Cruz Human Genome Browser 997
NHGRI 997
Wellcome Trust Sanger Institute 1000
Human Genome Project 1000
Background of Human Genome Project 1000
Strategic Issues: Hierarchical Shotgun Sequencing to Generate Draft Sequence 1002
Human Genome Assemblies 1002
Broad Genomic Landscape 1004
Long-Range Variation in GC Content 1005
CpG Islands 1005
Comparison of Genetic and Physical Distance 1006
Repeat Content of Human Genome 1007
Transposon-Derived Repeats 1008
Simple Sequence Repeats 1009
Segmental Duplications 1009
Gene Content of Human Genome 1010
Noncoding RNAs 1011
Protein-Coding Genes 1011
Comparative Proteome Analysis 1011
Complexity of Human Proteome 1014
25 Human Chromosomes 1015
Group A (Chromosomes 1–3) 1017
Group B (Chromosomes 4, 5) 1018
Group C (Chromosomes 6–12, X) 1019
Group D (Chromosomes 13–15) 1019
Group E (Chromosomes 16–18) 1020
Group F (Chromosomes 19, 20) 1020
Group G (Chromosomes 21, 22, Y) 1020
Mitochondrial Genome 1021
Human Genome Variation 1022
SNPs, Haplotypes, and HapMap 1022
Viewing and Analyzing SNPs and Haplotypes 1024
HaploView 1024
HapMap Browser 1024
Integrative Genomics Viewer (IGV) 1024
NCBI dbSNP 1024
PLINK 1028
SNPduo 1026
Major Conclusions of HapMap Project 1030
The 1000 Genomes Project 1031
Variation: Sequencing Individual Genomes 1034
Perspective 1035
Pitfalls 1036
Advice for Students 1037
Discussion Questions 1037
Problems/Computer Lab 1037
Self-Test Quiz 1039
Suggested Reading 1040
References 1040
21 Human Disease 1047
Human Genetic Disease: A Consequence of DNA Variation 1047
A Bioinformatics Perspective on Human Disease 1048
Garrod’s View of Disease 1050
Classification of Disease 1051
NIH Disease Classification: MeSH Terms 1053
Categories of Disease 1056
Allele Frequencies and Effect Sizes 1056
Monogenic Disorders 1057
Complex Disorders 1060
Genomic Disorders 1061
Environmentally Caused Disease 1065
Disease and Genetic Background 1066
Mitochondrial Disease 1066
Somatic Mosaic Disease 1068
Cancer: A Somatic Mosaic Disease 1069
Disease Databases 1072
OMIM: Central Bioinformatics Resource for Human Disease 1072
Human Gene Mutation Database (HGMD) 1075
ClinVar and Databases of Clinically Relevant Variants 1076
GeneCards 1077
Integration of Disease Database Information at the UCSC Genome Browser 1077
Locus-Specific Mutation Databases and LOVD 1077
The PhenCode Project 1080
Limitations of Disease Databases: The Growing Interpretive Gap 1081
Human Disease Genes and Amino Acid Substitutions 1081
Approaches to Identifying Disease-Associated Genes and Loci 1082
Linkage Analysis 1083
Genome-Wide Association Studies 1083
Identification of Chromosomal Abnormalities 1086
Human Genome Sequencing 1087
Genome Sequencing to Identify Monogenic Disorders 1087
Genome Sequencing to Solve Complex Disorders 1087
Research Versus Clinical Sequencing and Incidental Findings 1088
Disease-Causing Variants in Apparently Normal Individuals 1090
Human Disease Genes in Model Organisms 1091
Human Disease Orthologs in Nonvertebrate Species 1092
Human Disease Orthologs in Rodents 1094
Human Disease Orthologs in Primates 1095
Functional Classification of Disease Genes 1096
Perspective 1099
Pitfalls 1099
Advice for Students 1099
Discussion Questions 1100
Problems/Computer Lab 1098
Self-Test Quiz 1101
Suggested Reading 1102
References 1102
Glossary 1111
Self-Test Quiz: Solutions 1139
Author Index 1141
Subject Index 1145
EULA 1161

Publication date (per publisher) 17 August 2015
Language English
Subject areas Computer Science > Other Topics > Bioinformatics
Natural Sciences > Biology > Genetics / Molecular Biology
Technology
Keywords Bioinformatics • Bioinformatics & Computational Biology • BLAST • Cell & Molecular Biology • Evolution • functional genomics • genomics • human disease • Life Sciences • medical genetics • Medical Science • Medicine • Next-generation sequencing • Phylogeny • Proteomics • Sequence Analysis
ISBN-10 1-118-58169-5 / 1118581695
ISBN-13 978-1-118-58169-8 / 9781118581698
PDF (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read it only on devices that are also registered to your Adobe ID.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost any device but is of only limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which in our experience frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not, however, compatible with the Amazon Kindle.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.