Machine Learning in Protein Science (eBook)

Efficient Prediction of Protein Structures and Properties

Jinjin Li, Yanqiang Han (Authors)

eBook Download: EPUB
2025
436 pages
Wiley-VCH (Publisher)
978-3-527-84235-3 (ISBN)

Harness the power of machine learning for quick and efficient calculations of protein structures and properties

Machine Learning in Protein Science is a unique and practical reference that shows how to employ machine learning approaches for full quantum mechanical (FQM) calculations of protein structures and properties, thereby saving costly computing time and making this technology available for routine users.

Machine Learning in Protein Science provides comprehensive coverage of topics including:

  • Machine learning models and algorithms, from deep neural networks (DNNs) and transfer learning (TL) to hybrid unsupervised and supervised learning
  • Protein structure prediction with AlphaFold, including predicting the effects of point mutations
  • Modeling and optimization of the catalytic activity of enzymes
  • Property calculations (energy, force field, stability, protein-protein interaction, thermostability, molecular dynamics)
  • Protein design and large language models (LLMs) of protein systems

Machine Learning in Protein Science is an essential reference on the subject for biochemists, molecular biologists, theoretical chemists, biotechnologists, and medicinal chemists, as well as students in related programs of study.

Jinjin Li is a Professor at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University in Shanghai, China. She performed postdoctoral work at the University of Illinois, USA and was a Senior Research Fellow at the University of California, USA.

Yanqiang Han is an Assistant Professor at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University in Shanghai, China.

Chapter 1
Introduction


1.1 Background and Motivation


Proteins are the molecular machines that power life. Every cell in a living organism contains a vast array of proteins, each responsible for specific tasks, from catalyzing chemical reactions to maintaining structural integrity and regulating gene expression. The study of proteins is essential for understanding the fundamental processes of life, ranging from cellular metabolism to disease pathology. At the molecular level, proteins are composed of long chains of amino acids that fold into specific three-dimensional structures, a process known as protein folding (Ptitsyn, 1991; Richardson and Richardson, 1992). The shape of a protein determines its function, because only a specific conformation allows it to interact with other molecules, catalyze biochemical reactions, and maintain cellular processes (Figure 1.1).

Figure 1.1 (a) The primary structure of a protein is its linear sequence of amino acids and can be understood as a string. (b) The secondary structure refers to how the peptide chain twists and folds locally, based on the primary structure, to form local three-dimensional structures. (c) The tertiary structure arises when multiple secondary-structure elements pack together and fold into a complete three-dimensional protein structure. (d) The quaternary structure refers to the assembly of multiple folded chains, each with its own tertiary structure, into a complex.

However, despite the critical role of proteins in cellular function, a major challenge in molecular biology remains: understanding how proteins achieve their three-dimensional shapes and how mutations that alter these structures can lead to disease. For decades, researchers have attempted to predict protein structures from their amino acid sequences, but this task has proven extraordinarily complex. The sequence of amino acids in a protein is like a string of letters from an alphabet, yet the way this string folds into a specific shape is governed by intricate physical and chemical interactions that are not obvious from the sequence alone.
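The string analogy can be made concrete with a minimal Python sketch. It treats a protein as a string over the 20 standard one-letter amino acid codes and converts it into a one-hot matrix, one common way of turning a sequence into numeric input for an ML model. The example peptide and the helper name `one_hot_encode` are illustrative assumptions and are not taken from the book.

```python
# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-element row per residue (hypothetical helper)."""
    rows = []
    for aa in sequence.upper():
        if aa not in INDEX:
            raise ValueError(f"unknown residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1  # mark the position of this residue type
        rows.append(row)
    return rows


peptide = "MKTAYIAK"  # hypothetical example sequence, not from the book
encoded = one_hot_encode(peptide)
print(len(encoded), len(encoded[0]))  # residues x alphabet size
```

Such an encoding carries only residue identity and position. The physical and chemical interactions that determine the fold are not explicit in it, which is why structure prediction from sequence is hard.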

In the past, the understanding of protein structures relied heavily on experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM). These techniques can provide high-resolution information on the structure of proteins, but they are time-consuming, expensive, and often require high-quality samples, which are not always available. Moreover, they struggle to capture the dynamic nature of proteins, which constantly change shape during their interactions with other molecules. These challenges have led researchers to seek out computational approaches that can predict protein structure from sequence, simulate protein dynamics, and investigate the effects of mutations on protein function (Figure 1.2).

Figure 1.2 The three-dimensional structural model of proteins is usually predicted by bioinformatics software based on the amino acid sequence of proteins or analyzed through experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR), or cryo-electron microscopy. Different colors represent different secondary structures of proteins.

Computational protein biology has progressed rapidly in recent years. The development of new algorithms and the exponential growth of computational power have paved the way for more efficient techniques. Among the most groundbreaking advances in this field is the application of machine learning (ML) and artificial intelligence (AI) to predict protein structures and functions (Jumper et al., 2021; Rives et al., 2021). Predicting a protein’s structure from its sequence, without the need for experimental data, has long been one of the “holy grails” of computational biology. ML models, particularly those based on deep learning, have shown great promise in this area, outperforming traditional methods in accuracy and speed. One of the most notable breakthroughs in this domain is AlphaFold, a deep learning algorithm developed by DeepMind. AlphaFold’s ability to predict protein structures with near-experimental accuracy has revolutionized the field and demonstrated the potential of AI-driven approaches in protein science (Figure 1.3).

Figure 1.3 The detailed structure of the binding sites between one drug molecule and a protein molecule demonstrates how drugs interact with proteins, which is crucial for drug design and understanding protein function.

The success of AlphaFold (Jumper et al., 2021), heralded as a major milestone in structural biology, highlights the potential of ML to solve long-standing problems in computational biology. AlphaFold uses deep neural networks trained on large datasets of known protein structures to predict the three-dimensional structure of a protein from its amino acid sequence. The algorithm has achieved unprecedented accuracy, predicting the structures of a wide range of proteins with remarkable precision. It offers a glimpse into the future of protein research, where ML models can be used not only to predict protein structure but also to simulate protein function, understand the effects of mutations, and design novel proteins with desired properties.

Despite the significant strides made in protein structure prediction, several challenges remain. While AlphaFold can predict the structure of individual proteins, the prediction of protein–protein interactions (PPIs), protein–ligand binding, and the dynamic behavior of proteins in complex biological environments are still open problems. These processes are crucial for understanding cellular signaling pathways, enzyme catalysis, and drug design (Krasner, 1972). In particular, predicting how proteins interact with one another and how their structures change in response to different conditions is a complex task that requires a deeper understanding of the molecular forces at play. Moreover, protein interactions often occur in crowded cellular environments, making it difficult to model them accurately using traditional computational methods (Zheng et al., 2020).

Furthermore, the impact of mutations on protein structure and function remains a significant challenge. Mutations in DNA can lead to changes in the amino acid sequence of a protein, which in turn may alter its structure and function. Some mutations can lead to loss of function, while others may result in gain of function, causing diseases such as cancer, neurodegenerative disorders, and genetic diseases. Being able to predict the effects of mutations on protein structure and function is crucial for understanding disease mechanisms and developing therapeutic strategies. Although ML models have shown promise in predicting the effects of mutations, there is still much to be done in terms of improving the accuracy and robustness of these predictions.
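At the sequence level, a point mutation is a single-residue substitution. The sketch below applies one written in the common shorthand of wild-type residue, 1-based position, and mutant residue (for example `K2R`). The function name `apply_point_mutation` and the example sequence are assumptions made for illustration. The code changes only the string and says nothing about the structural or functional effect, which is what the ML models discussed here aim to predict.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'K2R' to a sequence (hypothetical helper)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")
    wild_type = match.group(1)
    position = int(match.group(2))  # 1-based residue number
    mutant = match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    # Strings are immutable, so build a new one around the substituted residue.
    return sequence[:position - 1] + mutant + sequence[position:]


wild_type_seq = "MKTAYIAK"  # hypothetical example sequence, not from the book
mutant_seq = apply_point_mutation(wild_type_seq, "K2R")
print(mutant_seq)  # MRTAYIAK
```

Checking the stated wild-type residue against the sequence guards against off-by-one numbering errors, a frequent source of mistakes when mutation lists and sequences come from different sources.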

In addition to structure and mutation prediction, protein function annotation remains one of the most important challenges in bioinformatics. While the genome sequencing revolution has provided us with vast amounts of sequence data, the function of many proteins remains unknown. The process of assigning a biological function to a protein based on its sequence is known as function annotation. Traditionally, function annotation has relied on experimental techniques, such as gene knockout experiments, to determine the role of a protein in a biological context. However, these methods are time-consuming and expensive. Computational methods, particularly those based on ML, have the potential to accelerate the process of function annotation by predicting the biological role of a protein based on its sequence, structure, or interaction with other molecules.

The need for accurate, high-throughput methods for protein function annotation has become even more urgent in the context of personalized medicine. With the increasing availability of genomic data, there is a growing demand for tools that can predict how genetic variations in individuals affect protein function. The ability to link specific genetic mutations to disease-causing proteins can provide valuable insights into the molecular basis of disease and guide the development of targeted therapies. In this regard, ML has the potential to revolutionize the way we approach drug discovery and personalized medicine by enabling the rapid identification of disease-related proteins and the design of therapies that target these proteins.

The integration of quantum mechanical calculations into protein research represents another promising avenue for improving the accuracy of protein predictions. Quantum mechanics, which describes the behavior of matter at the atomic and subatomic levels, provides a powerful framework for modeling the interactions between atoms and molecules. By applying quantum mechanical methods to protein systems, researchers can gain a deeper understanding of the forces that govern protein folding, stability, and interactions. Quantum mechanical calculations are particularly useful for studying the detailed electronic structure of proteins, including the behavior of electrons and the formation of chemical bonds. However, these calculations are computationally expensive and often require specialized software and hardware. As a result, they have been limited to small systems or simplified models. The challenge lies in developing methods that combine the accuracy of quantum...

Publication date (per publisher): 7 January 2025
Language: English
Subject area: Natural Sciences > Chemistry > Organic Chemistry
ISBN-10 3-527-84235-7 / 3527842357
ISBN-13 978-3-527-84235-3 / 9783527842353