Tutorials in Chemoinformatics (eBook)
John Wiley & Sons (Publisher)
978-1-119-13797-9 (ISBN)
30 tutorials and more than 100 exercises in chemoinformatics, supported by online software and data sets
Chemoinformatics is widely used in both academic and industrial chemical and biochemical research worldwide. Yet, until this unique guide, there were no books offering practical exercises in chemoinformatics methods. Tutorials in Chemoinformatics contains more than 100 exercises in 30 tutorials exploring key topics and methods in the field. It takes an applied approach to the subject with a strong emphasis on problem-solving and computational methodologies.
Each tutorial is self-contained and includes exercises for students to work through using a variety of software packages. Most of the tutorials are divided into three sections devoted to theoretical background, algorithm description and software applications, respectively, with the last section providing step-by-step software instructions. Throughout, three types of software tools are used: in-house programs developed by the authors, open-source programs and commercial programs that are available for free or at a modest cost to academics. The in-house software and data sets are available on a dedicated companion website.
Key topics and methods covered in Tutorials in Chemoinformatics include:
- Data curation and standardization
- Development and use of chemical databases
- Structure encoding by molecular descriptors, text strings and binary fingerprints
- The design of diverse and focused libraries
- Chemical data analysis and visualization
- Structure-property/activity modeling (QSAR/QSPR)
- Ensemble modeling approaches, including bagging, boosting, stacking and random subspaces
- 3D pharmacophore modeling and pharmacological profiling using shape analysis
- Protein-ligand docking
- Implementation of algorithms in a high-level programming language
Tutorials in Chemoinformatics is an ideal supplementary text for advanced undergraduate and graduate courses in chemoinformatics, bioinformatics, computational chemistry, computational biology, medicinal chemistry and biochemistry. It is also a valuable working resource for medicinal chemists, academic researchers and industrial chemists looking to enhance their chemoinformatics skills.
Edited by
Alexandre Varnek, PhD, is a professor of theoretical chemistry at the University of Strasbourg, France, where he heads the Laboratory of Chemoinformatics and is Director of two MSc programs: Chemoinformatics and In Silico Drug Design. Professor Varnek's research focuses on developing new approaches and tools for virtual screening and 'in silico' design of new compounds and chemical reactions.
Title Page
Copyright Page
Contents
List of Contributors
Preface
About the Companion Website
Part 1 Chemical Databases
Chapter 1 Data Curation
Theoretical Background
Software
Step-by-Step Instructions
Conclusion
References
Chapter 2 Relational Chemical Databases: Creation, Management, and Usage
Theoretical Background
Step-by-Step Instructions
Conclusion
References
Chapter 3 Handling of Markush Structures
Theoretical Background
Step-by-Step Instructions
Conclusion
References
Chapter 4 Processing of SMILES, InChI, and Hashed Fingerprints
Theoretical Background
Algorithms
Step-by-Step Instructions
Conclusion
References
Part 2 Library Design
Chapter 5 Design of Diverse and Focused Compound Libraries
Introduction
Data Acquisition
Implementation
Compound Library Creation
Compound Library Analysis
Normalization of Descriptor Values
Visualizing Descriptor Distributions
Decorrelation and Dimension Reduction
Partitioning and Diverse Subset Calculation
Partitioning
Diverse Subset Selection
Combinatorial Libraries
Combinatorial Enumeration of Compounds
Retrosynthetic Approaches to Library Design
References
Part 3 Data Analysis and Visualization
Chapter 6 Hierarchical Clustering in R
Theoretical Background
Algorithms
Instructions
Hierarchical Clustering Using Fingerprints
Hierarchical Clustering Using Descriptors
Visualization of the Data Sets
Alternative Clustering Methods
Conclusion
References
Chapter 7 Data Visualization and Analysis Using Kohonen Self-Organizing Maps
Theoretical Background
Algorithms
Instructions
Conclusion
References
Part 4 Obtaining and Validation QSAR/QSPR Models
Chapter 8 Descriptors Generation Using the CDK Toolkit and Web Services
Theoretical Background
Algorithms
Step-by-Step Instructions
Conclusion
References
Chapter 9 QSPR Models on Fragment Descriptors
Abbreviations
DATA
ISIDA_QSPR input
Data Split Into Training and Test Sets
Substructure Molecular Fragment (SMF) Descriptors
Regression Equations
Forward and Backward Stepwise Variable Selection
Parameters of Internal Model Validation
Applicability Domain (AD) of the Model
Storage and Retrieval Modeling Results
Analysis of Modeling Results
Root-Mean Squared Error (RMSE) Estimation
Setting the Parameters
Analysis of n-Fold Cross-Validation Results
Loading Structure-Data File
Descriptors and Fitting Equation
Variables Selection
Consensus Model
Model Applicability Domain
n-Fold External Cross-Validation
Saving and Loading of the Consensus Modeling Results
Statistical Parameters of the Consensus Model
Consensus Model Performance as a Function of Individual Models Acceptance Threshold
Building Consensus Model on the Entire Data Set
Loading Input Data
Loading Selected Models and Choosing their Applicability Domain
Reporting Predicted Values
Analysis of the Fragments Contributions
References
Chapter 10 Cross-Validation and the Variable Selection Bias
Theoretical Background
Step-by-Step Instructions
Conclusion
References
Chapter 11 Classification Models
Theoretical Background
Algorithms
Step-by-Step Instructions
Conclusion
References
Chapter 12 Regression Models
Theoretical Background
Step-by-Step Instructions
Conclusion
References
Chapter 13 Benchmarking Machine-Learning Methods
Theoretical Background
Step-by-Step Instructions
Conclusion
References
Chapter 14 Compound Classification Using the scikit-learn Library
Theoretical Background
Algorithms
Step-by-Step Instructions
Naïve Bayes
Decision Tree
Support Vector Machine
Notes on Provided Code
Conclusion
References
Part 5 Ensemble Modeling
Chapter 15 Bagging and Boosting of Classification Models
Theoretical Background
Algorithm
Conclusion
References
Chapter 16 Bagging and Boosting of Regression Models
Theoretical Background
Algorithm
Step-by-Step Instructions
Conclusion
References
Chapter 17 Instability of Interpretable Rules
Theoretical Background
Algorithm
Step-by-Step Instructions
Conclusion
References
Chapter 18 Random Subspaces and Random Forest
Theoretical Background
Algorithm
Step-by-Step Instructions
Conclusion
References
Chapter 19 Stacking
Theoretical Background
Algorithm
Step-by-Step Instructions
Conclusion
References
Part 6 3D Pharmacophore Modeling
Chapter 20 3D Pharmacophore Modeling Techniques in Computer-Aided Molecular Design Using LigandScout
Introduction
Theory: 3D Pharmacophores
Representation of Pharmacophore Models
Hydrogen-Bonding Interactions
Hydrophobic Interactions
Aromatic and Cation-π Interactions
Ionic Interactions
Metal Complexation
Ligand Shape Constraints
Pharmacophore Modeling
Manual Pharmacophore Construction
Structure-Based Pharmacophore Models
Ligand-Based Pharmacophore Models
3D Pharmacophore-Based Virtual Screening
3D Pharmacophore Creation
Annotated Database Creation
Virtual Screening-Database Searching
Hit-List Analysis
Tutorial: Creating 3D-Pharmacophore Models Using LigandScout
Creating Structure-Based Pharmacophores From a Ligand-Protein Complex
Description: Create a Structure-Based Pharmacophore Model
Create a Shared Feature Pharmacophore Model From Multiple Ligand-Protein Complexes
Description: Create a Shared Feature Pharmacophore and Align it to Ligands
Create Ligand-Based Pharmacophore Models
Description: Ligand-Based Pharmacophore Model Creation
Tutorial: Pharmacophore-Based Virtual Screening Using LigandScout
Virtual Screening, Model Editing, and Viewing Hits in the Target Active Site
Description: Virtual Screening and Pharmacophore Model Editing
Analyzing Screening Results with Respect to the Binding Site
Description: Analyzing Hits in the Active Site Using LigandScout
Parallel Virtual Screening of Multiple Databases Using LigandScout
Virtual Screening in the Screening Perspective of LigandScout
Description: Virtual Screening Using LigandScout
Conclusions
Acknowledgments
References
Part 7 The Protein 3D-Structures in Virtual Screening
Chapter 21 The Protein 3D-Structures in Virtual Screening
Introduction
Description of the Example Case
Thrombin and Blood Coagulation
Active Thrombin and Inactive Prothrombin
Thrombin as a Drug Target
Thrombin Three-Dimensional Structure: The 1OYT PDB File
Modeling Suite
Overall Description of the Input Data Available on the Editor Website
Exercise 1: Protein Analysis and Preparation
Step 1: Identification of Molecules Described in the 1OYT PDB File
Step 2: Protein Quality Analysis of the Thrombin/Inhibitor PDB Complex Using MOE Geometry Utility
Step 3: Preparation of the Protein for Drug Design Applications
Step 4: Description of the Protein-Ligand Binding Mode
Step 5: Detection of Protein Cavities
Exercise 2: Retrospective Virtual Screening Using the Pharmacophore Approach
Step 1: Description of the Test Library
Step 2.1: Pharmacophore Design, Overview
Step 2.2: Pharmacophore Design, Flexible Alignment of Three Thrombin Inhibitors
Step 2.3: Pharmacophore Design, Query Generation
Step 3: Pharmacophore Search
Exercise 3: Retrospective Virtual Screening Using the Docking Approach
Step 1: Description of the Test Library
Step 2: Preparation of the Input
Step 3: Re-Docking of the Crystallographic Ligand
Step 4: Virtual Screening of a Database
Conclusion
General Conclusion
References
Part 8 Protein-Ligand Docking
Chapter 22 Protein-Ligand Docking
Introduction
Description of the Example Case
Methods
Ligand Preparation
Protein Preparation
Docking Parameters
Description of Input Data Available on the Editor Website
Exercises
A Quick Start with LeadIT
Re-Docking of Tacrine into AChE
Preparation of AChE From 1ACJ PDB File
Docking of Neutral Tacrine, then of Positively Charged Tacrine
Docking of Positively Charged Tacrine in AChE in Presence of Water
Conclusions
Cross-Docking of Tacrine-Pyridone and Donepezil Into AChE
Preparation of AChE From 1ACJ PDB File
Cross-Docking of Tacrine-Pyridone Inhibitor and Donepezil in AChE in Presence of Water
Re-Docking of Donepezil in AChE in Presence of Water
Conclusions
General Conclusions
Annex: Screen Captures of LeadIT Graphical Interface
References
Part 9 Pharmacophorical Profiling Using Shape Analysis
Chapter 23 Pharmacophorical Profiling Using Shape Analysis
Introduction
Description of the Example Case
Aim and Context
Description of the Searched Data Set
Description of the Query
Methods
ROCS
VolSite and Shaper
Other Programs for Shape Comparison
Description of Input Data Available on the Editor Website
Exercises
Preamble: Practical Considerations
Ligand Shape Analysis
What are ROCS Output Files?
Binding Site Comparison
Conclusions
References
Part 10 Algorithmic Chemoinformatics
Chapter 24 Algorithmic Chemoinformatics
Introduction
Similarity Searching Using Data Fusion Techniques
Introduction to Virtual Screening
The Three Pillars of Virtual Screening
Molecular Representation
Similarity Function
Search Strategy (Data Fusion)
Fingerprints
Count Fingerprints
Fingerprint Representations
Bit Strings
Feature Lists
Generation of Fingerprints
Similarity Metrics
Search Strategy
Completed Virtual Screening Program
Benchmarking VS Performance
Scoring the Scorers
How to Score
Multiple Runs and Reproducibility
Adjusting the VS Program for Benchmarking
Analyzing Benchmark Results
Conclusion
Introduction to Chemoinformatics Toolkits
Theoretical Background
A Note on Graph Theory
Basic Usage: Creating and Manipulating Molecules in RDKit
Creation of Molecule Objects
Molecule Methods
Atom Methods
Bond Methods
An Example: Hill Notation for Molecules
Canonical SMILES: The Canon Algorithm
Theoretical Background
Recap of SMILES Notation
Canonical SMILES
Building a SMILES String
Canonicalization of SMILES
The Initial Invariant
The Iteration Step
Summary
Substructure Searching: The Ullmann Algorithm
Theoretical Background
Backtracking
A Note on Atom Order
The Ullmann Algorithm
Sample Runs
Summary
Atom Environment Fingerprints
Theoretical Background
Implementation
The Hashing Function
The Initial Atom Invariant
The Algorithm
Summary
References
Index
EULA
| Publication date (per publisher) | 14.6.2017 |
|---|---|
| Language | English |
| Subject area | Mathematics / Computer Science ► Computer Science ► Theory / Studies |
| | Natural Sciences ► Chemistry ► Analytical Chemistry |
| | Natural Sciences ► Chemistry ► Industrial Chemistry |
| | Technology |
| Keywords | 3D pharmacophore modeling • Bioinformatics • Bioinformatics & Computational Biology • Chemical Informatics • Chemistry • Chemoinformatics • chemoinformatics algorithms • chemoinformatics and drug design • chemoinformatics ensemble modeling • chemoinformatics exercises • Chemoinformatics for Drug Discovery • chemoinformatics for industrial chemists • chemoinformatics for pharmaceutical research • chemoinformatics guide • chemoinformatics in biochemistry • chemoinformatics modeling • chemoinformatics practice • chemoinformatics products • chemoinformatics research • chemoinformatics software • chemoinformatics statistical modeling • chemoinformatics text • chemoinformatics tutorials • Computational Biology • computational biology algorithms • computational biology exercises • computational biology software • free chemoinformatics software • how to design chemoinformatics databases • Life Sciences • medicinal chemistry chemoinformatics • molecular descriptors in qsar/qspr • Molecular Graphics • Molecular Modeling • Pharmaceutical & Medicinal Chemistry • practical chemoinformatics • Protein Modeling • structure-property/activity modeling • Virtual Screening |
| ISBN-10 | 1-119-13797-7 / 1119137977 |
| ISBN-13 | 978-1-119-13797-9 / 9781119137979 |
Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook against misuse. The eBook is authorized to your personal Adobe ID at the time of download. You can then read the eBook only on devices that are also registered to your Adobe ID.
File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables and figures. A PDF can be displayed on almost any device, but is only of limited suitability for small displays (smartphone, eReader).
System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a
eReader: This eBook can be read with (almost) all eBook readers. It is, however, not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a
Buying eBooks from abroad
For tax-law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.