Basic Applied Bioinformatics - Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, Mir Asif Iquebal

Basic Applied Bioinformatics (eBook)

Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, Mir Asif Iquebal (Authors)

eBook Download: EPUB

2017
John Wiley & Sons (Publisher)
978-1-119-24441-7 (ISBN)

An accessible guide that introduces students in all areas of life sciences to bioinformatics

Basic Applied Bioinformatics provides practical guidance in bioinformatics and helps students optimize parameters for data analysis and then draw accurate conclusions from the results. In addition to parameter optimization, the text also familiarizes students with relevant terminology. Basic Applied Bioinformatics is written as an accessible guide for graduate students studying bioinformatics, biotechnology, and other related sub-disciplines of the life sciences.

This accessible text outlines the basics of bioinformatics, including downloading molecular sequences (nucleotide and protein) from databases; BLAST analyses; primer design and quality checking; multiple sequence alignment (global and local, using freely available software); phylogenetic tree construction (using the UPGMA, NJ, MP, ME, and FM algorithms and the MEGA7 suite); prediction of protein structures; genome annotation; RNA-Seq data analyses; identification of differentially expressed genes; and similar advanced bioinformatics analyses. The authors, Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, and Mir Asif Iquebal, are noted experts in the field and have come together to provide updated information on bioinformatics.

Salient features of this book include:

Accessible and updated information on bioinformatics tools
A practical step-by-step approach to molecular-data analyses
Information pertinent to the study of a variety of disciplines, including biotechnology, zoology, bioinformatics, and other related fields
Worked examples, glossary terms, problems and solutions

Basic Applied Bioinformatics gives students studying bioinformatics, agricultural biotechnology, animal biotechnology, medical biotechnology, microbial biotechnology, and zoology an updated introduction to the growing field of bioinformatics.

About the Authors

Chandra Sekhar Mukhopadhyay is an Assistant Scientist (Senior Scale) at the School of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU) at Ludhiana, Punjab, India.

Ratan Kumar Choudhary is an Assistant Professor at the School of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University (GADVASU) at Ludhiana, Punjab, India.

Mir Asif Iquebal is a Scientist at the Centre for Agricultural Bioinformatics, Indian Council of Agricultural Research-Indian Agricultural Statistics and Research Institute (ICAR-IASRI) at Pusa, New Delhi, India.

PREFACE
ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS

SECTION I Molecular Sequences and Structures
1 Retrieval of Sequence(s) from the NCBI Nucleotide Database
2 Retrieval of Protein Sequence from UniProtKB
3 Downloading Protein Structure
4 Visualizing Protein Structure
5 Sequence Format Conversion
6 Nucleotide Sequence Analysis Using Sequence Manipulation Suite (SMS)
7 Detection of Restriction Enzyme Sites

SECTION II Sequence Alignment
8 Dot Plot Analysis
9 Needleman-Wunsch Algorithm (Global Alignment)
10 Smith-Waterman Algorithm (Local Alignment)
11 Sequence Alignment Using Online Tools

SECTION III Basic Local Alignment Search Tools
12 Basic Local Alignment Search Tool for Nucleotide (BLASTn)
13 Basic Local Alignment Search Tool for Amino Acid Sequences (BLASTp)
14 BLASTx
15 tBLASTn
16 tBLASTx

SECTION IV Primer Designing and Quality Checking
17 Primer Designing - Basics
18 Designing PCR Primers Using the Primer3 Online Tool
19 Quality Checking of the Designed Primers
20 Primer Designing for SYBR Green Chemistry of qPCR

SECTION V Molecular Phylogenetics
21 Construction of Phylogenetic Tree: Unweighted-Pair Group Method with Arithmetic Mean (UPGMA)
22 Construction of Phylogenetic Tree: Fitch Margoliash (FM) Algorithm
23 Construction of Phylogenetic Tree: Neighbor-Joining Method
24 Construction of Phylogenetic Tree: Maximum Parsimony Method
25 Construction of Phylogenetic Tree: Minimum Evolution Method
26 Construction of Phylogenetic Tree Using MEGA7
27 Interpretation of Phylogenetic Trees

SECTION VI Protein Structure Prediction
28 Prediction of Secondary Structure of Protein
29 Prediction of Tertiary Structure of Protein: Sequence Homology
30 Protein Structure Prediction Using Threading Method
31 Prediction of Tertiary Structure of Protein: Ab Initio Approach
32 Validation of Predicted Tertiary Structure of Protein

SECTION VII Molecular Docking and Binding Site Prediction
33 Prediction of Transcription Binding Sites
34 Prediction of Translation Initiation Sites
35 Molecular Docking

SECTION VIII Genome Annotation
36 Genome Annotation in Prokaryotes
37 Genome Annotation in Eukaryotes

SECTION IX Advanced Biocomputational Analyses
38 Concepts of Real-Time PCR Data Analysis
39 Overview of Microarray Data Analysis
40 Single Nucleotide Polymorphism (SNP) Mining Tools
41 In Silico Mining of Simple Sequence Repeats (SSR) Markers
42 Basics of RNA-Seq Data Analysis
43 Functional Annotation of Common Differentially Expressed Genes
44 Identification of Differentially Expressed Genes (DEGs)
45 Estimating MicroRNA Expression Using the miRDeep2 Tool
46 miRNA Target Prediction

Appendices
Appendix A: Usage of Internet for Bioinformatics
Appendix B: Important Web Resources for Bioinformatics Databases and Tools
Appendix C: NCBI Database: A Brief Account
Appendix D: EMBL Databases and Tools: An Overview
Appendix E: Basics of Molecular Phylogeny
Appendix F: Evolutionary Models of Molecular Phylogeny

GLOSSARY
REFERENCES
WEBLIOGRAPHY
INDEX

CHAPTER 5
Sequence Format Conversion

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

5.1 INTRODUCTION

A computer file format is a defined way of encoding data for storage in a file. Biological sequence formats are a family of such file formats, each designed to make sequence files readable by specific programs.

Note: Biological sequences are generally written in the Courier New font. Because it is a fixed-width font, every line of sequence holds the same number of characters and the lines align uniformly.

At the lowest level, the system stores and inter-converts sequence formats as ASCII (American Standard Code for Information Interchange) text, in which every character is encoded as a number: the characters A–Z are encoded by 65–90 and a–z by 97–122. A sequence format is therefore a prescribed arrangement of characters, symbols, and keywords that specifies the sequence, its ID or name, comments, and so on.
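The ASCII ranges quoted above can be checked directly. The short Python sketch below is only an illustration; the function names ascii_codes and to_upper_by_code are hypothetical and are not part of any sequence tool.

```python
def ascii_codes(text: str) -> list:
    """Return the ASCII code of every character in text."""
    return [ord(ch) for ch in text]


def to_upper_by_code(text: str) -> str:
    """Upper-case a-z by shifting codes 97-122 down by 32 to 65-90."""
    return "".join(
        chr(ord(ch) - 32) if 97 <= ord(ch) <= 122 else ch
        for ch in text
    )


print(ascii_codes("ACGT"))       # [65, 67, 71, 84]
print(ascii_codes("acgt"))       # [97, 99, 103, 116]
print(to_upper_by_code("acgt"))  # ACGT
```

Because upper-case and lower-case letters differ only by a fixed offset of 32, changing the case of a sequence never alters the sequence itself.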

Sequence format conversion is needed for two purposes:

1. Different programs recognize different formats, so a sequence must often be converted from one format to another before a given program can use it.
2. A molecular sequence sometimes has to be presented in a particular format.

Commonly used sequence formats:

1. IG/Stanford
2. GenBank/GB
3. NBRF
4. EMBL
5. GCG
6. DNAStrider
7. Fitch
8. Pearson/Fasta
9. Zuker (in-only)
10. Olsen (in-only)
11. Phylip3.2
12. Phylip
13. Plain/Raw
14. PIR/CODATA
15. MSF
16. ASN.1
17. PAUP
18. Pretty (out-only)

5.2 OBJECTIVE

To convert a given molecular sequence from its current format into other sequence formats, such as GenBank (NCBI), EMBL, and PIR.

5.3 PROCEDURE

The online program ReadSeq (by Don Gilbert) will be used to convert the sequence formats. ReadSeq accepts the following formats: FASTA, Abstract Syntax Notation (ASN.1), National Biomedical Research Foundation (NBRF), EMBL, Fitch (phylogenetic analysis), GenBank, GCG, DNA Strider, IntelliGenetics (IG), Multiple Sequence Format (MSF), Protein Information Resource (PIR), and eight additional specialized formats.
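How ReadSeq itself tells these formats apart is not described here. As a rough illustration only, the Python sketch below guesses among four of the listed formats from the first non-blank line of the input. The function name guess_format and its rules are assumptions made for this example; they rely only on the well-known opening keywords of each format and are not ReadSeq's implementation.

```python
import re


def guess_format(text: str) -> str:
    """Guess a sequence format from the first non-blank line (heuristic)."""
    for line in text.splitlines():
        if line.strip():
            first = line.lstrip()
            break
    else:
        return "unknown"  # no non-blank line at all

    # PIR/NBRF headers start with '>', a two-character type code, then ';'
    # (for example '>P1;' for a protein or '>DL;' for linear DNA).
    if re.match(r">[A-Z][A-Z0-9];", first):
        return "PIR/NBRF"
    if first.startswith(">"):
        return "FASTA"
    if first.startswith("LOCUS"):
        return "GenBank"
    if re.match(r"ID\s", first):
        return "EMBL"
    return "unknown"


print(guess_format(">seq1 demo\nACGT\n"))  # FASTA
```

A check this simple can be fooled, for instance by a FASTA identifier that happens to look like a PIR type code, so it should be read as a sketch of the idea rather than a reliable detector.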

1. Open the online ReadSeq sequence conversion tool using the URL: http://www-bimas.cit.nih.gov/molbio/readseq/
2. Paste a molecular sequence (nucleotide or amino acid sequence), in any of the accepted formats, into the text box. The software detects the format of the input sequence automatically (Figure 5.1).
3. Click on the drop-down menu just above the text box (on the left side) and select the desired output format.
4. Set the additional formatting options, if required:
   a. Altering the case of the output sequence: click on one of the radio buttons “MiXeD case”, “UPPER” or “lower” case.
   b. Removal of the gaps: click on the check box to remove existing gaps in the input sequence.
5. Click on the “Submit” button to get the output.
6. Use the “reset” button to erase all the input data and start afresh with the default settings.

FIGURE 5.1 Homepage of the ReadSeq biosequence format conversion tool.
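The conversion and the two formatting options described in the procedure can be imitated offline. The Python sketch below converts a single FASTA record to the NBRF/PIR layout (a ">XX;identifier" line, a description line, and the sequence terminated by "*"), with optional case change and gap removal. It is a minimal sketch under stated assumptions, not ReadSeq's code: the function names, the parameters type_code, case, remove_gaps and width, and the treatment of "-" and "." as gap characters are all choices made for this example.

```python
def parse_fasta(text):
    """Return (identifier, description, sequence) of the first FASTA record."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # this sketch handles only the first record
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    identifier, _, description = header.partition(" ")
    return identifier, description.strip(), "".join(chunks)


def fasta_to_pir(text, type_code="DL", case="mixed", remove_gaps=False, width=50):
    """Convert one FASTA record to NBRF/PIR text.

    type_code: PIR sequence type, e.g. 'DL' (linear DNA) or 'P1' (protein).
    case: 'mixed' (unchanged), 'upper' or 'lower'.
    """
    identifier, description, seq = parse_fasta(text)
    if remove_gaps:
        seq = seq.replace("-", "").replace(".", "")  # assumed gap characters
    if case == "upper":
        seq = seq.upper()
    elif case == "lower":
        seq = seq.lower()
    # Break the sequence into fixed-width lines and end it with '*'.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)] or [""]
    lines[-1] += "*"
    title = description or identifier
    return f">{type_code};{identifier}\n{title}\n" + "\n".join(lines) + "\n"


example = ">seq1 demo\nacg-TTa\nGG--cc\n"
print(fasta_to_pir(example, case="upper", remove_gaps=True))
```

For a protein sequence, passing type_code="P1" gives the corresponding protein header.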

The International Union of Pure and Applied Chemistry (IUPAC) nucleic acid code has been adopted to specify a single nucleotide, or a group of nucleotides, by a single letter:

A = adenine
C = cytosine
G = guanine
T = thymine
U = uracil
R = G or A (purine)
Y = T or C (pyrimidine)
K = G or T (keto)
M = A or C (amino)
S = G or C
W = A or T
B = G or T or C
D = G or A or T
H = A or C or T
V = G or C or A
N = A or G or C or T (any)
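The ambiguity codes above lend themselves to pattern searching. The Python sketch below stores the table as a dictionary and expands an IUPAC pattern into a regular expression so that it can be located in a DNA sequence. The names IUPAC_NUCLEOTIDES, iupac_to_regex and find_sites, as well as the example pattern and sequence, are hypothetical and chosen only for illustration.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern: str) -> str:
    """Expand an IUPAC pattern into a regular expression string.

    Raises KeyError if the pattern contains a letter outside the table.
    """
    parts = []
    for code in pattern.upper():
        bases = IUPAC_NUCLEOTIDES[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of all (also overlapping) matches."""
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(iupac_to_regex("GTYRAC"))                    # GT[CT][AG]AC
print(find_sites("GTYRAC", "aaGTTAACccGTCGACtt"))  # [2, 10]
```

The lookahead group is used so that overlapping occurrences are also reported; positions are counted from zero, as is usual in Python.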

IUPAC amino acid codes:

A = Alanine
C = Cysteine
D = Aspartic Acid
E = Glutamic Acid
F = Phenylalanine
G = Glycine
H = Histidine
I = Isoleucine
K = Lysine
L = Leucine
M = Methionine
N = Asparagine
P = Proline
Q = Glutamine
R = Arginine
S = Serine
T = Threonine
V = Valine
W = Tryptophan
Y = Tyrosine
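Before converting a protein sequence, it can be useful to confirm that it contains only the twenty one-letter codes listed above. The Python sketch below is one possible way to do so; the names AMINO_ACIDS, invalid_residues and spell_out are hypothetical, and characters such as "X", "*" or "-" are deliberately reported as invalid because they are not in the table.

```python
# One-letter IUPAC amino acid codes and the residues they denote.
AMINO_ACIDS = {
    "A": "Alanine", "C": "Cysteine", "D": "Aspartic Acid",
    "E": "Glutamic Acid", "F": "Phenylalanine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "K": "Lysine", "L": "Leucine",
    "M": "Methionine", "N": "Asparagine", "P": "Proline",
    "Q": "Glutamine", "R": "Arginine", "S": "Serine", "T": "Threonine",
    "V": "Valine", "W": "Tryptophan", "Y": "Tyrosine",
}


def invalid_residues(sequence: str) -> list:
    """Return the sorted, distinct characters that are not in the table."""
    return sorted({ch for ch in sequence.upper() if ch not in AMINO_ACIDS})


def spell_out(sequence: str) -> list:
    """Translate one-letter codes into full residue names."""
    return [AMINO_ACIDS[ch] for ch in sequence.upper()]


print(spell_out("mkv"))          # ['Methionine', 'Lysine', 'Valine']
print(invalid_residues("MKX*"))  # ['*', 'X']
```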

5.3.1 Other online sequence conversion tools

FMTSeq: This is a more elaborate version of ReadSeq. It additionally offers data manipulation for ClustalW, Zuker, ELEX (I/O files), and so on. URL: http://www.bioinformatics.org/JaMBW/1/2/
EMBOSS: This suite has several sequence-editing programs, including cutseq, pasteseq, nthseq, extractseq, and so on. URL: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
EMBOSS Seqret: This is another sequence format conversion tool available online, offering several output formats for conversion. URL: http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/

5.4 QUESTIONS

1. Identify the sequence format given below:

(A)
>DL;readseq-43434_tmp_1
readseq-43434_tmp_1 100 bases
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagagg
cgccatcatccggggcatccccggcttctgggccaatgccattgcgaacc*

(B)
LOCUS readseq-13129_tmp_1 100 bp
ORIGIN
1 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
61 cggggcatccccggcttctgggccaatgccattgcgaacc
//

(C)
>readseq-14738_tmp_1 100 bp
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
cggggcatccccggcttctgggccaatgccattgcgaacc

(D)
ID readseq-10695_tmp_1 standard; DNA; UNC; 100 BP.
SQ Sequence 100 BP;
cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc 60
cggggcatccccggcttctgggccaatgccattgcgaacc 100

(E)
readseq-946_tmp_1 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagaggcgccatcatc
readseq-946_tmp_1 cggggcatccccggcttctgggccaatgccattgcgaacc

(F)
1 100
readseq-26 cagacggaaaagctggagcgcaggcgcaagccccacctggaccgcagagg
cgccatcatccggggcatccccggcttctgggccaatgccattgcgaacc
2. Download a nucleotide sequence of your interest from NCBI Nucleotide. Then convert it to the following formats:

a. Clustal b. EMBL c. Phylip
3. Given below is an amino acid sequence (GenBank: BAA36473.1) in lower case. Convert it to upper case and show it in PIR format:
qteklerrrkphldrrgaiirgipgfwanaianhpqmsalitdqde
4. Suppose you have custom sequenced a cloned product. How will you open the sequence file, and to which format will you convert it to do basic biocomputational analysis (i.e., using BLAST, Alignment, in silico translation (if applicable), etc.)?
5. What are the uses of sequence format conversion? A DNA sequence is presented below in some of the commonly used formats. Name each format.
(A)
>readseq-26104_tmp_1 204 bp
ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc
ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag
atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca
aatacccgggctataaatatcgac

(B)
LOCUS readseq-11577_tmp_1 204 bp
ORIGIN
1 ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc
61 ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag
121 atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca
181 aatacccgggctataaatatcgac

(C)
ID readseq-2117_tmp_1 standard; DNA; UNC; 204 BP.
SQ Sequence 204 BP;
ccatgaacgccttcattgtgtggtctcgtgaacgaagacgaaaggtggctctagagaatc 60
ccaaaatgaaaaactcagacatcagcaagcagctgggatatgagtggaaaaggcttacag 120
atgctgaaaagcgcccattctttgaggaggcacagagactactagccatacaccgagaca 180
aatacccgggctataaatatcgac 204
//

(D)
///
ENTRY readseq-18456_tmp_1
TITLE readseq-18456_tmp_1 204 bases
SEQUENCE
5 10 15 20 25 30
1 c c a t g a a c g c c t t c a t t g t g t g g t c t c g t g
31 a a c g a a g a c g a a a g g t g g c t c t a g a g a a t c
61 c c a a a a t g a a a a a c t c a g a c a t c a g c a a g c
91 a g c t g g g a t a t g a g t g g a a a a g g c t t a c a g
121 a t g c t g a a a a g c g c c c a t t c t t t g a g g a g g
151 c a c a g a g a c t a c t a g c c a t a c a c c g a g a c a
181 a a t a c...

Publication date (per publisher)	15 September 2017
Language	English
Subject areas	Computer Science ► Other Topics ► Bioinformatics
	Studies ► Cross-Sectional Subjects ► Epidemiology / Medical Biometry
	Natural Sciences ► Biology
	Engineering ► Medical Technology
Keywords	Basic Applied Bioinformatics • Bioinformatics • Bioinformatics & Computational Biology • bioinformatics reliable data analysis • BLAST analyses • Cell & Molecular Biology • Chandra Sekhar Mukhopadhyay • examples of bioinformatics • Genetics • genome annotation • introduction to agricultural biotechnology • introduction to animal biotechnology • introduction to bioinformatics • introduction to medical biotechnology • introduction to microbial biotechnology • introduction to zoology • Life Sciences • MEGA6 suite • Mir Asif Iquebal • molecular sequences • Multiple sequence alignment • Nucleotide • phylogenetic tree construction • prediction of protein structures • primer designing and its quality checking • problems in bioinformatics • protein sequences • Ratan Kumar Choudhary
ISBN-10	1-119-24441-2 / 1119244412
ISBN-13	978-1-119-24441-7 / 9781119244417

EPUB (Adobe DRM)

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at download time, and you can then read it only on devices that are also registered to that Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and non-fiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which in our experience frequently runs into problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need an Adobe ID and a free app.
