The Application of Comparative Genomics in Response to COVID-19

Part A

The COVID-19 pandemic continues to affect millions of people across the globe, claiming lives and posing a major public health threat. As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to mutate, comparative genomics has been pivotal both in elucidating key mechanisms underlying its pathogenesis and in characterizing variants across different locations. In the study by Khan et al. (2020), the researchers analyze SARS-CoV-2 genomes from 13 countries and map mutations in the major coronavirus proteins in comparison with other known coronavirus strains. The novel coronavirus poses a major threat to human health, particularly because of its high mutation rate and its potential to adapt to new treatments. Consequently, it is difficult to establish effective therapies for managing symptoms in infected populations. According to the study, the mutation rate of such viruses can be as high as 10⁶ times that of their host, suggesting high evolvability and, potentially, virulence.

Khan et al. (2020) used comparative genomics to characterize SARS-CoV-2 isolates from different locations across the globe, combining several techniques. First, genome sequences were retrieved from the NCBI database, and the researchers used Molecular Evolutionary Genetics Analysis (MEGA) version 10.1.8 for multiple sequence alignment and visualization of related SARS coronaviruses and of the SARS-CoV-2 isolates from the selected locations. Second, the researchers applied model building, molecular docking, and molecular dynamics simulations to different SARS-CoV-2 mutants. These analyses yielded the complete nucleotide sequences of 13 isolates, which showed approximately 82% identity with the previously characterized SARS-CoV. The 13 new sequences shared approximately 99% similarity with one another. Hence, the phylogenetic analysis indicated that the newly identified isolates belonged to a single clade.
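To make the identity figures above more concrete, the sketch below computes pairwise percent identity between sequences that have already been aligned to the same length. It is a simplified illustration, not the procedure MEGA uses: it assumes gap-gap columns are skipped and a gap opposite a residue counts as a mismatch (definitions of identity vary between tools). The function names and the toy sequences are hypothetical and do not come from the study.

```python
from __future__ import annotations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap ('-') are skipped,
    and a gap opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named aligned sequences."""
    names = list(alignment)
    result = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            result[(first, second)] = percent_identity(alignment[first], alignment[second])
    return result


if __name__ == "__main__":
    # Hypothetical toy alignment, not real SARS-CoV-2 data.
    toy = {
        "isolate_1": "ATGCGTAC-TA",
        "isolate_2": "ATGCGTACGTA",
        "reference": "ATGAGTTCGTA",
    }
    for pair, pid in identity_matrix(toy).items():
        print(pair, round(pid, 1))
```

Pairs of isolates with high mutual identity and lower identity to an outgroup are what a phylogenetic tree then groups into a clade; real analyses use proper distance models rather than raw identity.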

The findings of Khan et al. (2020) suggest that the high mutation rate of SARS-CoV-2 could be a primary predictor of the varying symptoms and mortality observed across different locations. The studied SARS-CoV-2 isolates carried several single amino acid mutations at different positions. Furthermore, the authors detected one amino acid mutation in the transmembrane domain of the envelope protein in the isolate from one geographic location. They suggest that such mutations may disrupt virus-host interaction dynamics and could help explain the different infection rates among the studied locations, since different kinds of mutations can affect pathogenicity in different ways.
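Single amino acid mutations of this kind are usually reported in a "reference residue, position, variant residue" notation. The sketch below lists such substitutions from a reference protein and a variant protein that are already aligned. It is an illustrative assumption rather than the authors' pipeline: positions are counted along the ungapped reference, and insertions or deletions are skipped so that only substitutions are reported. The function name and toy sequences are hypothetical.

```python
from __future__ import annotations


def find_substitutions(reference: str, variant: str) -> list[str]:
    """List single amino acid substitutions such as 'I6L'.

    Both sequences must be pre-aligned to the same length. Positions are
    numbered along the reference with gaps removed; columns containing a
    gap (insertions/deletions) are not reported.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    ref_pos = 0
    for ref_aa, var_aa in zip(reference.upper(), variant.upper()):
        if ref_aa != "-":
            ref_pos += 1  # advance only on real reference residues
        if ref_aa == "-" or var_aa == "-":
            continue  # indel column, not a substitution
        if ref_aa != var_aa:
            changes.append(f"{ref_aa}{ref_pos}{var_aa}")
    return changes


if __name__ == "__main__":
    # Hypothetical toy protein fragments, not real viral sequences.
    print(find_substitutions("MKTAYIAKQR", "MKTAYLAKQR"))
```

Running such a comparison for each isolate against a common reference gives the per-location mutation lists that structural methods (modeling, docking, simulation) can then examine.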

Part B

Genome-wide bioinformatic analyses of SARS-CoV-2 infection, specifically RNA-based analyses that identify isoforms, protein domains, and conserved domains, have helped establish how the internal environment of infected respiratory cells changes. For example, a recent study demonstrated that some immunoregulatory genes, such as IL32, CSF2, and IL-6, were differentially expressed, while immunoregulatory transposable elements were upregulated. Moreover, comparative genomics has helped identify conserved interactions between human RNA-binding proteins and SARS-CoV-2. Such findings help elucidate host mechanisms that could serve as therapeutic targets to reduce the severity of the disease.
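"Differentially expressed" is often summarized first as a log2 fold change between conditions. The sketch below shows only that arithmetic: read counts are normalized to counts per million (CPM), a pseudocount is added, and the log2 ratio of infected to control is taken. This is an assumed, simplified illustration; dedicated tools model count variability and test statistical significance, which this sketch does not do. The gene names and counts are hypothetical placeholders, not data from any study.

```python
from __future__ import annotations

import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Normalize raw read counts to counts per million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(
    control: dict[str, int],
    infected: dict[str, int],
    pseudocount: float = 1.0,
) -> dict[str, float]:
    """log2(infected / control) on CPM values, for genes present in both samples.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    c_norm = cpm(control)
    i_norm = cpm(infected)
    shared = sorted(set(control) & set(infected))
    return {
        gene: math.log2((i_norm[gene] + pseudocount) / (c_norm[gene] + pseudocount))
        for gene in shared
    }


if __name__ == "__main__":
    # Hypothetical counts; positive values mean higher expression after infection.
    control = {"gene_A": 100, "gene_B": 100}
    infected = {"gene_A": 300, "gene_B": 100}
    for gene, lfc in log2_fold_changes(control, infected).items():
        print(gene, round(lfc, 3))
```

Note that after normalization gene_B appears downregulated even though its raw count is unchanged, because gene_A takes a larger share of the library; this is one reason proper normalization matters in expression studies.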

In bioinformatics, protein isoforms are similar proteins that perform related roles within cells; they contribute to the evolvability of species and, in turn, to biological diversity. For instance, one gene can encode multiple isoforms through alternative splicing. This phenomenon may be a key factor behind the observed differences among strains of the COVID-19 virus. Tripathi et al. (2019) define protein domains as “independent evolutionary units, which define protein function.” Research on domain-based methods has shown that domain-domain interactions play a key role in predicting protein-protein interactions, which underpin nearly all key processes in living organisms. Conserved domains, in turn, are identical or similar sequences in DNA, RNA, or proteins that recur across species over the course of molecular evolution (Khan et al., 2020). Hence, conserved domains allow researchers to identify patterns and functional properties shared across species and to link them to more detailed information about each species.
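The idea of sequences conserved across species can be illustrated with a multiple sequence alignment: a column is treated as conserved when the most common residue reaches a chosen fraction of the sequences, and runs of consecutive conserved columns hint at conserved regions. This is a deliberately simplified sketch under those assumptions; conserved domain databases rely on profile-based searches rather than a plain column scan. The function names, threshold values, and toy alignment are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def conserved_columns(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """1-based alignment columns whose most common residue (not a gap)
    occurs in at least `threshold` fraction of the sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    conserved = []
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(alignment) >= threshold:
            conserved.append(col + 1)
    return conserved


def conserved_blocks(columns: list[int], min_length: int = 3) -> list[tuple[int, int]]:
    """Group consecutive conserved columns into (start, end) runs of at least min_length."""
    blocks = []
    start = prev = None
    for col in sorted(columns):
        if start is None:
            start = prev = col
        elif col == prev + 1:
            prev = col  # extend the current run
        else:
            if prev - start + 1 >= min_length:
                blocks.append((start, prev))
            start = prev = col  # begin a new run
    if start is not None and prev - start + 1 >= min_length:
        blocks.append((start, prev))
    return blocks


if __name__ == "__main__":
    # Hypothetical toy protein alignment from three "species".
    toy = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKHR"]
    cols = conserved_columns(toy)
    print(cols)
    print(conserved_blocks(cols, min_length=2))
```

Lowering `threshold` tolerates occasional substitutions, which is closer to how "conserved" is used in practice, where similar rather than strictly identical residues are often accepted.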