Basic Applied Bioinformatics (eBook)
John Wiley & Sons (publisher)
978-1-119-24441-7 (ISBN)
An accessible guide that introduces students in all areas of life sciences to bioinformatics
Basic Applied Bioinformatics provides practical guidance in bioinformatics and helps students to optimize parameters for data analysis and then to draw accurate conclusions from the results. In addition to parameter optimization, the text also familiarizes students with relevant terminology. Basic Applied Bioinformatics is written as an accessible guide for graduate students studying bioinformatics, biotechnology, and other related sub-disciplines of the life sciences.
This accessible text outlines the basics of bioinformatics, including downloading molecular sequences (nucleotide and protein) from databases; BLAST analyses; primer design and quality checking; multiple sequence alignment (global and local, using freely available software); phylogenetic tree construction (using the UPGMA, NJ, MP, ME, and FM algorithms and the MEGA7 suite); prediction of protein structures; genome annotation; RNA-Seq data analyses and identification of differentially expressed genes; and similar advanced bioinformatics analyses. The authors, Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, and Mir Asif Iquebal, are noted experts in the field and have come together to provide updated information on bioinformatics.
Salient features of this book include:
- Accessible and updated information on bioinformatics tools
- A practical step-by-step approach to molecular-data analyses
- Information pertinent to the study of a variety of disciplines, including biotechnology, zoology, bioinformatics, and other related fields
- Worked examples, glossary terms, problems and solutions
Basic Applied Bioinformatics gives students studying bioinformatics, agricultural biotechnology, animal biotechnology, medical biotechnology, microbial biotechnology, and zoology an updated introduction to the growing field of bioinformatics.
About the Authors
Chandra Sekhar Mukhopadhyay is an Assistant Scientist (Senior Scale) at the School of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU) at Ludhiana, Punjab, India.
Ratan Kumar Choudhary is an Assistant Professor at the School of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU) at Ludhiana, Punjab, India.
Mir Asif Iquebal is a Scientist at the Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research-Indian Agricultural Statistics and Research Institute (ICAR-IASRI) at Pusa, New Delhi, India.
PREFACE
ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS
SECTION I Molecular Sequences and Structures
1 Retrieval of Sequence(s) from the NCBI Nucleotide Database
2 Retrieval of Protein Sequence from UniProtKB
3 Downloading Protein Structure
4 Visualizing Protein Structure
5 Sequence Format Conversion
6 Nucleotide Sequence Analysis Using Sequence Manipulation Suite (SMS)
7 Detection of Restriction Enzyme Sites
SECTION II Sequence Alignment
8 Dot Plot Analysis
9 Needleman-Wunsch Algorithm (Global Alignment)
10 Smith-Waterman Algorithm (Local Alignment)
11 Sequence Alignment Using Online Tools
SECTION III Basic Local Alignment Search Tools
12 Basic Local Alignment Search Tool for Nucleotide (BLASTn)
13 Basic Local Alignment Search Tool for Amino Acid Sequences (BLASTp)
14 BLASTx
15 tBLASTn
16 tBLASTx
SECTION IV Primer Designing and Quality Checking
17 Primer Designing - Basics
18 Designing PCR Primers Using the Primer3 Online Tool
19 Quality Checking of the Designed Primers
20 Primer Designing for SYBR Green Chemistry of qPCR
SECTION V Molecular Phylogenetics
21 Construction of Phylogenetic Tree: Unweighted-Pair Group Method with Arithmetic Mean (UPGMA)
22 Construction of Phylogenetic Tree: Fitch Margoliash (FM) Algorithm
23 Construction of Phylogenetic Tree: Neighbor-Joining Method
24 Construction of Phylogenetic Tree: Maximum Parsimony Method
25 Construction of Phylogenetic Tree: Minimum Evolution Method
26 Construction of Phylogenetic Tree Using MEGA7
27 Interpretation of Phylogenetic Trees
SECTION VI Protein Structure Prediction
28 Prediction of Secondary Structure of Protein
29 Prediction of Tertiary Structure of Protein: Sequence Homology
30 Protein Structure Prediction Using Threading Method
31 Prediction of Tertiary Structure of Protein: Ab Initio Approach
32 Validation of Predicted Tertiary Structure of Protein
SECTION VII Molecular Docking and Binding Site Prediction
33 Prediction of Transcription Binding Sites
34 Prediction of Translation Initiation Sites
35 Molecular Docking
SECTION VIII Genome Annotation
36 Genome Annotation in Prokaryotes
37 Genome Annotation in Eukaryotes
SECTION IX Advanced Biocomputational Analyses
38 Concepts of Real-Time PCR Data Analysis
39 Overview of Microarray Data Analysis
40 Single Nucleotide Polymorphism (SNP) Mining Tools
41 In Silico Mining of Simple Sequence Repeats (SSR) Markers
42 Basics of RNA-Seq Data Analysis
43 Functional Annotation of Common Differentially Expressed Genes
44 Identification of Differentially Expressed Genes (DEGs)
45 Estimating MicroRNA Expression Using the miRDeep2 Tool
46 miRNA Target Prediction
Appendices
Appendix A: Usage of Internet for Bioinformatics
Appendix B: Important Web Resources for Bioinformatics Databases and Tools
Appendix C: NCBI Database: A Brief Account
Appendix D: EMBL Databases and Tools: An Overview
Appendix E: Basics of Molecular Phylogeny
Appendix F: Evolutionary Models of Molecular Phylogeny
GLOSSARY
REFERENCES
WEBLIOGRAPHY
INDEX
CHAPTER 5
Sequence Format Conversion
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
5.1 INTRODUCTION
A computer file format is a defined way of encoding data for storage in a file. Biological sequence formats are a family of such file formats, each designed to make sequence files readable by specific programs.
Note: Biological sequences are generally written in a monospaced font such as Courier New. This enables us to arrange the sequences uniformly in each line of the text.
At the base level, sequence formats are stored and inter‐converted as ASCII (American Standard Code for Information Interchange) text, in which each character is represented by a numeric code: the characters A–Z are encoded by 65–90 and a–z by 97–122. A sequence format is thus a required arrangement of characters, symbols, and keywords that specifies the sequence, ID name, comments, and so on.
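The ASCII code ranges quoted above can be checked with a few lines of Python. This is only an illustrative sketch. The helper name `ascii_range` is hypothetical and is not part of any tool described in this chapter.

```python
def ascii_range(first, last):
    """Return the ASCII codes of the first and last character of a range."""
    return [ord(first), ord(last)]

upper = ascii_range("A", "Z")                    # codes for the upper-case letters
lower = ascii_range("a", "z")                    # codes for the lower-case letters
base_codes = {base: ord(base) for base in "ACGT"}

# Upper- and lower-case letters differ by a fixed offset of 32.
case_offset = ord("a") - ord("A")

print(upper, lower, base_codes, case_offset)
```

The fixed offset between the two ranges is what makes case conversion of a sequence a simple character-by-character operation.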
The sequence formats are needed for two purposes:
- Different programs recognize different formats, so a sequence often has to be converted from one format to another before a given program can use it.
- Presentations of the molecular sequence are sometimes required in a particular format.
Commonly used sequence formats.
| 1. IG/Stanford | 7. Fitch | 13. Plain/Raw |
| 2. GenBank/GB | 8. Pearson/Fasta | 14. PIR/CODATA |
| 3. NBRF | 9. Zuker (in‐only) | 15. MSF |
| 4. EMBL | 10. Olsen (in‐only) | 16. ASN.1 |
| 5. GCG | 11. Phylip3.2 | 17. PAUP |
| 6. DNAStrider | 12. Phylip | 18. Pretty (out‐only) |
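To make the idea of a format concrete, the sketch below moves a sequence between two of the formats listed above: Pearson/Fasta (a `>` header line followed by sequence lines) and Plain/Raw (the bare sequence). It is a minimal illustration and not how ReadSeq works internally. The function names, the record name `example_1`, and the 60-character line width are assumptions chosen for the example.

```python
def parse_fasta(text):
    """Split a single-record FASTA string into (header, raw sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA '>' header line")
    header = lines[0][1:]            # drop the leading '>'
    sequence = "".join(lines[1:])    # join the wrapped sequence lines
    return header, sequence

def to_fasta(header, sequence, width=60):
    """Write a raw sequence as FASTA, wrapping lines at `width` characters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped) + "\n"

# Hypothetical record used only for illustration.
record = ">example_1 demo\nACGTAC\nGTTT\n"
header, raw = parse_fasta(record)
print(header)
print(raw)
print(to_fasta(header, raw, width=5), end="")
```

Keeping the parsed record as a plain (header, sequence) pair is a common design choice: every output format then only needs its own writer.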
5.2 OBJECTIVE
To convert a given molecular sequence from one format to other sequence formats, such as GenBank (NCBI), EMBL, PIR, etc.
5.3 PROCEDURE
The online program ReadSeq (by Don Gilbert) will be used to convert the sequence formats. ReadSeq accepts the following formats: FASTA, Abstract Syntax Notation (ASN.1), National Biomedical Research Foundation (NBRF), EMBL, Fitch (phylogenetic analysis), GenBank, GCG, DNA Strider, IntelliGenetics, Multiple Sequence Format (MSF), Protein Information Resource (PIR), and eight additional specialised formats.
- Open the online ReadSeq sequence conversion tool using the URL: http://www-bimas.cit.nih.gov/molbio/readseq/
- Paste a molecular sequence (nucleotide or amino acid) in any supported format into the text box. The software determines the format of the input sequence automatically (Figure 5.1).
- Click on the drop‐down menu just above the text box (on the left side) and select the desired output format.
- Choose any additional formatting options:
  - Altering the case of the output sequence: click on one of the radio buttons “MiXeD case”, “UPPER” or “lower” case.
  - Removal of the gaps: click on the check box to remove existing gaps in the input sequence.
- Click on the “Submit” button to get the output.
- Use the “reset” button to erase all the input data and start afresh with default settings.
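The two additional formatting options in the procedure above (case alteration and gap removal) can be mimicked offline with a short function. This is a sketch under stated assumptions, not ReadSeq's own code: the function name `apply_output_options` is hypothetical, "mixed" is assumed to leave the case unchanged, and `-` and `.` are assumed to be the gap characters.

```python
def apply_output_options(sequence, case="mixed", remove_gaps=False, gap_chars="-."):
    """Apply case and gap options to a sequence string.

    case: "mixed" (leave as is), "upper" or "lower".
    gap_chars: characters treated as gaps (an assumption for this sketch).
    """
    if remove_gaps:
        sequence = "".join(ch for ch in sequence if ch not in gap_chars)
    if case == "upper":
        return sequence.upper()
    if case == "lower":
        return sequence.lower()
    if case == "mixed":
        return sequence
    raise ValueError("case must be 'mixed', 'upper' or 'lower'")

aligned = "acG-t.A"  # hypothetical gapped input
print(apply_output_options(aligned, case="upper", remove_gaps=True))
print(apply_output_options(aligned, case="lower"))
```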
FIGURE 5.1 Homepage of the ReadSeq biosequence format conversion tool.
The International Union of Pure and Applied Chemistry (IUPAC) nucleic acid code has been adopted to specify a single nucleotide, or a group of nucleotides, by a single letter:
| A = adenine | U = uracil | M = A or C (amino) | D = G or A or T |
| C = cytosine | R = G or A (purine) | S = G or C | H = A or C or T |
| G = guanine | Y = T or C (pyrimidine) | W = A or T | V = G or C or A |
| T = thymine | K = G or T (keto) | B = G or T or C | N = A or G or C or T (any) |
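The nucleotide code table above maps naturally onto a lookup dictionary. The sketch below copies the table into Python and uses it to test whether a sequence is compatible with a pattern that contains ambiguity codes. The names `IUPAC_NUCLEOTIDES`, `expand` and `matches` are hypothetical, chosen only for this illustration.

```python
# IUPAC nucleic acid codes, copied from the table above.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}

def expand(code):
    """Return the set of bases that a single IUPAC code stands for."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])

def matches(pattern, sequence):
    """True if every base of `sequence` is allowed by the code at the same position."""
    if len(pattern) != len(sequence):
        return False
    return all(base.upper() in expand(code) for code, base in zip(pattern, sequence))

print(sorted(expand("R")))
print(matches("GANTC", "gattc"))
print(matches("RY", "AA"))
```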
IUPAC amino acid codes:
| A = Alanine | G = Glycine | M = Methionine | S = Serine |
| C = Cysteine | H = Histidine | N = Asparagine | T = Threonine |
| D = Aspartic Acid | I = Isoleucine | P = Proline | V = Valine |
| E = Glutamic Acid | K = Lysine | Q = Glutamine | W = Tryptophan |
| F = Phenylalanine | L = Leucine | R = Arginine | Y = Tyrosine |
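The amino acid table can be used in the same way, for example to check that a protein sequence contains only the 20 one-letter codes listed above before it is converted. This is a minimal sketch. The names `AMINO_ACIDS`, `residue_names` and `invalid_residues` are hypothetical, and codes outside the table are simply reported as invalid here.

```python
# One-letter amino acid codes, copied from the table above.
AMINO_ACIDS = {
    "A": "Alanine", "C": "Cysteine", "D": "Aspartic Acid", "E": "Glutamic Acid",
    "F": "Phenylalanine", "G": "Glycine", "H": "Histidine", "I": "Isoleucine",
    "K": "Lysine", "L": "Leucine", "M": "Methionine", "N": "Asparagine",
    "P": "Proline", "Q": "Glutamine", "R": "Arginine", "S": "Serine",
    "T": "Threonine", "V": "Valine", "W": "Tryptophan", "Y": "Tyrosine",
}

def residue_names(sequence):
    """Translate one-letter codes into full amino acid names."""
    return [AMINO_ACIDS[residue] for residue in sequence.upper()]

def invalid_residues(sequence):
    """Return the sorted characters of `sequence` that are not in the table."""
    return sorted({r for r in sequence.upper() if r not in AMINO_ACIDS})

print(residue_names("mk"))
print(invalid_residues("MKXB"))
```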
5.3.1 Other online sequence conversion tools
- FMTSeq: This is a more elaborate version of ReadSeq. It also handles data for ClustalW, Zuker, ELEX (I/O files) and so on. URL: http://www.bioinformatics.org/JaMBW/1/2/
- EMBOSS: This has several features, including cutseq, pasteseq, nthseq, extractseq, and so on. URL: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
- EMBOSS Seqret: This is another sequence format conversion tool available online, offering several output formats for conversion. The URL is as follows: http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/
5.4 QUESTIONS
- 1. Identify the sequence format given below:
(A)
>DL;readseq‐43434_tmp_1
readseq‐43434_tmp_1 100 bases
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagagg
cgccatcatccggggcatccccggcttctgggccaatgccattgcgaacc*
(B)
LOCUS readseq‐13129_tmp_1 100 bp
ORIGIN
1 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
61 cggggcatccccggcttctgggccaatgccattgcgaacc
//
(C)
>readseq‐14738_tmp_1 100 bp
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
cggggcatccccggcttctgggccaatgccattgcgaacc
(D)
ID readseq‐10695_tmp_1 standard; DNA; UNC; 100 BP.
SQ Sequence 100 BP;
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc 60
cggggcatccccggcttctgggccaatgccattgcgaacc 100
(E)
readseq‐946_tmp_1 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
readseq‐946_tmp_1 cggggcatccccggcttctgggccaatgccattgcgaacc
(F)
1 100
readseq‐26 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagagg
cgccatcatccggggcatccccggcttctgggccaatgccattgcgaacc
- 2. Download a nucleotide sequence of your interest from NCBI Nucleotide. Then convert it to the following formats:
a. Clustal b. EMBL c. Phylip
- 3. Given below is an amino acid sequence (GenBank: BAA36473.1) in lower case. Convert it to upper case and show in PIR format:
QTEKLERRRKPHLDRRGAIIRGIPGFWANAIANHPQMSALITDQDE
- 4. Suppose you have custom sequenced a cloned product. How will you open the sequence file, and to which format will you convert it to do basic biocomputational analysis (i.e., using BLAST, Alignment, in silico translation (if applicable), etc.)?
- 5. What are the uses of sequence format conversion? A DNA sequence is presented below in some of the commonly used formats. Name each format.
(A)
>readseq‐26104_tmp_1 204 bp
ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc
ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag
atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca
aatacccgggctataaatatcgac
(B)
LOCUS readseq‐11577_tmp_1 204 bp
ORIGIN
1 ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc
61 ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag
121 atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca
181 aatacccgggctataaatatcgac
(C)
ID readseq‐2117_tmp_1 standard; DNA; UNC; 204 BP.
SQ Sequence 204 BP;
ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc 60
ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag 120
atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca 180
aatacccgggctataaatatcgac 204
//
(D)
///
ENTRY readseq‐18456_tmp_1
TITLE readseq‐18456_tmp_1 204 bases
SEQUENCE
5 10 15 20 25 30
1 c c a t g a a c g c c t t c a t t g t g t g g t c t c g t g
31 a a c g a a g a c g a a a g g t g g c t c t a g a g a a t c
61 c c a a a a t g a a a a a c t c a g a c a t c a g c a a g c
91 a g c t g g g a t a t g a g t g g a a a a g g c t t a c a g
121 a t g c t g a a a a g c g c c c a t t c t t t g a g g a g g
151 c a c a g a g a c t a c t a g c c a t a c a c c g a g a c a
181 a a t a c...
| Publication date (per publisher) | 15 September 2017 |
|---|---|
| Language | English |
| Subject areas | Computer Science ► Other Topics ► Bioinformatics |
| | Studies ► Cross-Sectional Subjects ► Epidemiology / Medical Biometry |
| | Natural Sciences ► Biology |
| | Engineering ► Medical Technology |
| Keywords | Basic Applied Bioinformatics • Bioinformatics • Bioinformatics & Computational Biology • bioinformatics reliable data analysis • BLAST analyses • Cell & Molecular Biology • examples of bioinformatics • Genetics • genome annotation • introduction to agricultural biotechnology • introduction to animal biotechnology • introduction to bioinformatics • introduction to medical biotechnology • introduction to microbial biotechnology • introduction to zoology • Life Sciences • MEGA6 suite • Mir Asif Iquebal • molecular sequences • Multiple sequence alignment • Nucleotide • phylogenetic tree construction • prediction of protein structures • primer designing and its quality checking • problems in bioinformatics • Chandra Sekhar Mukhopadhyay • protein sequences • Ratan Kumar Choudhary |
| ISBN-10 | 1-119-24441-2 / 1119244412 |
| ISBN-13 | 978-1-119-24441-7 / 9781119244417 |