1
Genetics and Genomics
Anu Bashamboo and Ken McElreavey
Human Developmental Genetics, Institut Pasteur, Paris, France
Introduction
The Human Genome Project was completed in 2003, but it is only now that we are truly in the genomic era. Next‐generation sequencing (NGS), which allows genome‐wide detection of variants, is transforming our understanding of pediatric and endocrine diseases on an unprecedented scale by identifying mutations that are pathogenic or confer disease risk: new genes that cause human disease are being identified at a rate of three per week. We all differ in our DNA sequence, and medical geneticists aim to understand the significance of this genetic diversity in health and disease, an effort that has led to the age of genomic medicine.
Understanding genetic diversity is essential to understanding the biology of diseases of various kinds, from simple Mendelian or monogenic disorders to more complex multifactorial disease, and to understanding how we respond to treatment at both population and individual levels. We now have the capacity to study the human genome as an entity rather than one gene at a time, and medical and clinical genetics have become part of the broader field of genomic or precision medicine, which seeks to apply large‐scale analysis of the human genome to provide an individualized, knowledge‐based approach to medical care.
Many web resources and web‐based tools have been developed to help clinicians navigate and interpret the tremendous amount of genomic data being generated (Table 1.1).
Table 1.1 Commonly used databases in human genetic and genomic analysis.
| Resource | Description | URL |
| --- | --- | --- |
| National Center for Biotechnology Information (NCBI) | A portal that provides access to a wealth of biomedical and genomic information. Includes PubMed, OMIM, dbSNP, ClinVar and expression data sets, together with a suite of tools for data and sequence analysis (e.g. BLAST) | http://www.ncbi.nlm.nih.gov |
| ClinGen | Authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research | https://www.clinicalgenome.org |
| Ensembl | Genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Annotates genes, computes multiple alignments, predicts regulatory function and collects disease data | http://www.ensembl.org |
| University of California, Santa Cruz (UCSC), genome browser | Genome browser offering access to genome sequence data from vertebrate and invertebrate species and major model organisms. Integrated with a large collection of analysis tools | https://genome.ucsc.edu |
| GeneCards | Provides comprehensive information on all human genes. It integrates gene data from ~125 web sources, including genomic, transcriptomic, proteomic, genetic, clinical and functional information | http://www.genecards.org |
| Human Gene Mutation Database (HGMD) | Collates published gene lesions responsible for human inherited disease | www.hgmd.cf.ac.uk/ac |
| Mouse Genome Informatics at the Jackson Laboratory | International database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease | http://www.informatics.jax.org |
| DECIPHER database | Collects clinical information about rare genomic variants and displays this information on the human genome map | https://decipher.sanger.ac.uk |
| Exome Aggregation Consortium (ExAC) browser | Exome data set from >60,000 unrelated individuals. Provides both a reference set of allele frequencies and constraint metrics indicating whether a gene is tolerant or intolerant to variation | http://exac.broadinstitute.org |
| dbSNP | Genetic variation within and across different species. Not limited to SNPs, it contains a range of molecular variation | http://www.ncbi.nlm.nih.gov/SNP |
| F‐SNP | Provides integrated information about the functional effects of SNPs obtained from 16 bioinformatics tools and databases. Helps identify and focus on SNPs with potential pathological effect to human health | http://compbio.cs.queensu.ca/F‐SNP |
| Biological General Repository for Interaction Datasets (BioGRID) | Database of protein–protein interactions, genetic interactions, chemical interactions, and post‐translational modifications | http://thebiogrid.org |
| PhenomicDB | A multi‐organism phenotype–genotype database including human, mouse, fruit fly, C. elegans, and other model organisms | http://www.phenomicdb.de |
| Phencode | Connects human phenotype and clinical data in various locus‐specific mutation databases with data on genome sequences, evolutionary history and function in the UCSC Genome Browser | http://phencode.bx.psu.edu |
| Human Epigenome Atlas | Includes human reference epigenomes and the results of their integrative and comparative analyses. Provides details of locus‐specific epigenomic states like histone marks and DNA methylation across tissues and cell types, developmental stages, physiological conditions, genotypes and disease states | http://www.genboree.org/epigenomeatlas |
| Encyclopedia of DNA Elements (ENCODE) | Catalogue of functional elements in the human genome, including elements that act at the protein and RNA levels and regulatory elements that control cells and circumstances in which a gene is active | https://www.encodeproject.org |
| Genomics England 100,000 Genomes Project | The project will sequence 100,000 genomes from around 70,000 people. Participants are National Health Service (UK) patients with a rare disease, plus their families, and patients with cancer | www.genomicsengland.co.uk/the‐100000‐genomes‐project |
The suffix ‘‐omics’ refers to the collective characterization and quantification of pools of biological molecules that translate into the structure, function and dynamics of an organism. Genomics can be divided into comparative genomics, the study of the relationship of genome structure and function across different biological species or strains; functional genomics, which describes gene and protein functions and interactions; metagenomics, the study of genetic material recovered directly from environmental samples; and epigenomics, the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome.
Basic Concepts in Human Genetics and Genomics
Genes and Chromosomes
Genetic information is stored in DNA in the chromosomes within the cell nucleus. DNA is a polymeric nucleic acid macromolecule built from nucleotides, each composed of a five‐carbon sugar (deoxyribose), a nitrogen‐containing base and a phosphate group. The bases are of two types, purines and pyrimidines. In DNA, there are two purine bases, adenine (A) and guanine (G), and two pyrimidine bases, thymine (T) and cytosine (C). DNA is organized in a helical structure in which two polynucleotide chains run in opposite directions, held together by hydrogen bonds between pairs of bases, A of one chain pairing with T of the other and G with C. In the coding sequence of a gene, each set of three bases constitutes a codon that encodes a particular amino acid. Genome refers to the totality of genetic information carried by a cell or an organism, whereas genotype is the genetic constitution of an individual cell or organism. With the exception of cells that develop into gametes (the germline), all cells that contribute to the body are termed somatic cells.
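The base‐pairing and codon rules described above can be illustrated with a short Python sketch. It is offered only as an illustration and is not taken from the chapter: the function names (`reverse_complement`, `codons`, `translate`), the example sequence and the five‐entry codon table are assumptions chosen for brevity. The table is a small excerpt of the standard genetic code, which also contains stop codons that end translation; a real analysis would use the complete 64‐codon table.

```python
from typing import List

# Watson-Crick pairing as described in the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Illustrative excerpt of the standard genetic code (DNA codons).
# Assumption: only five of the 64 codons are listed, to keep the sketch short.
CODON_EXCERPT = {
    "ATG": "Met",
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "Stop",
}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, which runs in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def codons(seq: str) -> List[str]:
    """Split a coding sequence into consecutive three-base codons.

    Any trailing one or two bases that do not form a full codon are dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq: str) -> List[str]:
    """Translate codons with the excerpt table, stopping at a stop codon.

    Codons missing from the excerpt are reported as '?'.
    """
    residues = []
    for codon in codons(seq):
        residue = CODON_EXCERPT.get(codon, "?")
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


if __name__ == "__main__":
    coding = "ATGTTTGGCAAATAA"  # hypothetical example sequence
    print(reverse_complement(coding))
    print(codons(coding))
    print(translate(coding))
```

For the hypothetical sequence `ATGTTTGGCAAATAA`, the sketch splits the strand into five codons and translates the first four (Met, Phe, Gly, Lys) before stopping at `TAA`. Applying `reverse_complement` twice returns the original sequence, reflecting the antiparallel, complementary arrangement of the two chains.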
The human genome contained in the nucleus of somatic cells consists of 46 chromosomes arranged in 23 pairs: 22 pairs, termed autosomes, are common to both males and females, and the remaining pair comprises the sex chromosomes, two X chromosomes in females and an X and a Y chromosome in males. Homologous chromosomes...