Integrated network-based multiple computational analyses for identification of co-expressed candidate genes associated with neurological manifestations of COVID-19

The present study is a novel approach of integrated network-based multiple computational analyses of two networks, viz. TN and CGN, to find the ‘disease-related regulatory genes’ associated with functional (transcriptional and translational) cellular entities necessary for understanding the molecular basis of brain pathophysiological phenotypes of COVID-19. To achieve this goal, we proceeded with the most robust approaches through multiple screening steps including (a) finding the two sets of predictive ‘target genes’, evaluated from TN and CGN with their PPIs having STRING ‘combined scores’ (SPPICS) as a priori analysis, (b) evaluating functional associations by ‘semantic similarity scores’ (SSS) of the two sets of ‘target genes’, (c) screening ‘target genes’ by accumulating PPIs having both SPPICS and SSS, selected with given threshold values for the respective PPI scores, to find ‘candidate genes’, (d) formulating integrated scores (WHMS) combining SPPICS and SSS to give weightage to PPIs of ‘candidate genes’ for further categorisation, and (e) assimilating ‘annotation terms’ (symptoms/diseases) with genes among ‘candidate nodes’ through a posteriori enrichment analysis to obtain a functional module. Notably, ‘target nodes’ for symptoms and diseases evaluated from TN were manually curated and integrated with suitable Enrichr annotations for better interpretation. The classification statistics (ROC-AUC) and cut-off values (optimal thresholds for ROC) identified the PPIs with their association scores (SPPICS, SSS, WHMS) for the respective steps most accurately (AUC > 0.8) with minimum false positive interpretation. Furthermore, all 21 ‘candidate genes’ appeared to be co-expressed. They proved almost equally ‘indispensable’ when screened for their controllability property in the CGN. Finally, the ‘candidate genes’ were categorised based on pairwise analysis of WHMS, SSS and SPPICS values to find ‘prevalent’ vs. ‘non-prevalent’ ‘candidate genes’ with their pattern (‘is_a’ vs. ‘part_of’) of relationship with neurological manifestations in COVID-19. The pathophysiological relevance of the ‘prevalent’ ‘candidate genes’ to COVID-19 has been discussed thoroughly.

In our study, two networks, TN (Fig. 3a) and CGN (Fig. 4b), were analysed to find the ‘target nodes’ that together satisfied three properties for COVID-19, viz. ‘hub’, ‘bottleneck’ and ‘driver’. Separate studies indicate that host proteins targeted by viral proteins show the node properties of hubs and high betweenness centrality25 and of ‘indispensable’ driver controllability25,26 in a host protein network. The ‘target genes’ (TG) evaluated from TN (TG-TN) showed node properties of hub-bottleneck (HB or ‘date-hubs’, i.e., both hub and bottleneck), driver, or both (HB and driver) (Fig. 3d and e). All ‘target genes’ evaluated from CGN (TG-CGN) showed node properties of both HB and driver (Fig. 4b and c). In fact, the number of driver nodes, rather than the driver nodes themselves, appears crucial for maintaining the controllability of a network25,26. In our study, the finally selected 21 ‘candidate genes’ appeared to be ‘indispensable’, as the number of driver nodes in the CGN increased (4.81% for total drivers, 7.96% for drivers that are not hub-bottlenecks, i.e., ‘pure-drivers’) after removal of one of the ‘candidate genes’ (Fig. 5e).
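The ‘indispensable’ test described above can be illustrated with a minimal sketch. It assumes the structural-controllability formulation in which the minimum number of driver nodes of a directed network equals the number of nodes minus the size of a maximum matching (with a floor of one), and it labels a node ‘indispensable’ when its removal raises that number. The function names, the toy graphs and the use of a directed edge list are illustrative assumptions, not the pipeline used in the study.

```python
def max_matching_size(nodes, edges):
    """Size of a maximum matching in the bipartite (out-copy -> in-copy)
    view of a directed graph, found with simple augmenting paths."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # in-copy v -> out-copy u currently matched to it

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in matched_by or try_augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in nodes)


def driver_count(nodes, edges):
    """Minimum number of driver nodes: unmatched nodes, but at least one."""
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify_removal(nodes, edges, node):
    """Compare the driver count before and after deleting one node."""
    rest = [n for n in nodes if n != node]
    rest_edges = [(u, v) for u, v in edges if node not in (u, v)]
    before = driver_count(nodes, edges)
    after = driver_count(rest, rest_edges)
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"


# Hypothetical toy graph: a directed chain A -> B -> C -> D.
chain_nodes = ["A", "B", "C", "D"]
chain_edges = [("A", "B"), ("B", "C"), ("C", "D")]
for n in chain_nodes:
    print(n, classify_removal(chain_nodes, chain_edges, n))
```

In the chain, deleting an interior node splits the path and forces an extra driver, which is the behaviour the text calls ‘indispensable’. A real PPI network is usually undirected, so applying this sketch would first require a choice of edge directions.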

Next, the SPPICS were applied to construct possible PPI connections of new genes in the TN (Fig. 3a) and CGN (Fig. 4b) networks related to the brain in COVID-19. The SPPICS provides a quantitative measurement of physical and functional PPI evidence derived from available online resources. However, its calculation does not include experimental evidence of functional entities related to regulatory mechanisms in the physiological context of cells. The Gene Ontology resources provide a model of a hierarchically organised (ancestor-descendant relationship) directed acyclic graph (DAG), with GO-terms as nodes and functional associations as directed edges within each hierarchy, expressed by ‘is_a’ (subtype) and ‘part_of’ (component) relationships that describe gene/protein functionality (molecular function, cellular component and biological process). GO-based biological process (GO-BP) provides cohesive evidence on protein interactions, related to both physical and functional networks of molecular events in cellular physiology27. The ‘candidate genes’ for a disease show common biological pathway(s)18. Therefore, in our study, the functional associations among ‘target nodes’ were analysed quantitatively by semantic comparison of GO-BP annotations, through computing similarities between gene-pairs (SSS-I measurement) (Figs. 3d and 4d) and clustering gene/symptoms/disease/module-pairs (SSS-II measurement) into known pathways (Figs. 3b, c, e, 4e and 5d).
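To make the idea of a gene-pair semantic similarity concrete, the following is a minimal sketch on a toy ontology. It is not the measure used in the study, which is not specified in this section: here term similarity is assumed to be the Jaccard overlap of ancestor sets in the DAG, and gene-level similarity is assumed to be a best-match average over the terms annotated to each gene. The term identifiers, the `parents` mapping and the function names are all hypothetical.

```python
def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def term_similarity(a, b, parents):
    """Jaccard overlap of the two ancestor sets (an assumed, simple measure)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def gene_similarity(terms_a, terms_b, parents):
    """Best-match average: each term is paired with its closest term
    from the other gene, and the best scores are averaged."""
    best_a = [max(term_similarity(a, b, parents) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b, parents) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical toy DAG: child term -> list of parent terms
# (edges stand in for 'is_a' / 'part_of' links).
parents = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1"],
    "t4": ["t1", "t2"],
}

# Two hypothetical genes annotated with toy terms.
print(gene_similarity(["t3", "t4"], ["t3"], parents))
```

The score depends only on the ontology and the annotations, not on the surrounding network, which mirrors the network-independent behaviour reported for SSS-I.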

The conventional SSS-I provided pairwise ‘direct association’ based on comparative assessment of the GO-BP terms associated with two ‘target genes’ (Figs. 3d and 4d). It has been reported that genes and their functionally connected co-expressed genes show tissue-specific expression and regulation, and exhibit pleiotropic effects, i.e., they share common symptoms and diseases29,30. Based on this concept, the estimation of SSS-II values was newly introduced in our study (Figs. 3e and 4e). The SSS-II values provided pairwise ‘indirect association’ based on the summed contribution of comparative assessment of the GO-BP terms of gene-clusters (connected genes) against targeted gene-pairs. Our data indicated that the classification of both SSS-I and SSS-II values was statistically robust (AUC: 0.91 and 0.93, respectively) over their different ranges of values, with respective ROC threshold values (0.40 and 0.71) accurate enough to interpret the results most stringently (Fig. 5b). Interestingly, the gene-pairs found as common PPIs in the TN and CGN networks showed the same SSS-I values, whereas their SSS-II values varied between networks. For example, the CTNNB1–AKT1 gene-pair among the ‘target genes’, found as a common PPI in both TN (Fig. 3a) and CGN (Fig. 4b), showed an equal SSS-I value (0.487) (Figs. 3d and 4d). The SSS-II values of this gene-pair differed between TN (0.783) and CGN (0.805) (Figs. 3e and 4e). Additionally, certain gene-pairs with considerable (above-threshold) SSS-I values appeared to have low (below-threshold) or zero (‘null functional similarity’) SSS-II values, including AKT1–FGFR1 (SSS-I: 0.485, above threshold, Fig. 3d; SSS-II: 0.69, below threshold, Fig. 3e) and C9orf72–SQSTM1 (SSS-I: 0.462, above threshold, Fig. 3d; SSS-II: 0, Fig. 3e). Therefore, only the gene-pairs with significant values of both SSS-I and SSS-II were considered for better interpretation of the results in our study.
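How an AUC and an ‘optimal threshold for ROC’ can be obtained from a set of scored gene-pairs is sketched below. The section does not state which criterion defined the optimal cut-off, so the sketch assumes Youden's J statistic (true positive rate minus false positive rate); the scores and labels are invented toy values, not data from the study.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative
    (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Scan every observed score as a cut-off (predict positive if
    score >= cut-off) and keep the one maximising J = TPR - FPR.
    On ties the lowest cut-off is kept."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Hypothetical similarity scores with true/false interaction labels.
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.50, 0.40, 0.60, 0.70, 0.80]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

print(auc(toy_scores, toy_labels))
print(youden_threshold(toy_scores, toy_labels))
```

A gene-pair would then count as ‘above threshold’ when its score is at or above the selected cut-off, which is how values such as 0.485 and 0.69 are compared against 0.40 and 0.71 in the text.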

Irrespective of the network, the SSS-I values of gene-pairs/PPIs might depict the global, existing ‘is_a’ and/or ‘part_of’ semantic similarity available in the GO-BP annotation data and would therefore remain the same, representing generalised pathophysiological functions for any disease condition. In contrast, the SSS-II values for gene-pairs varied because of the different constituents of the ‘gene-clusters’, which provided the ‘is_a’ and/or ‘part_of’ functional relationship by sharing common GO-BP annotation terms, reflecting the discrete or pleiotropic effects of genes among networks (Table 2). In particular, a zero SSS-II value for a gene-pair indicated that the ‘gene-clusters’ (connected genes) against that gene-pair were not well supported by current literature-based evidence related to COVID-19 neurological symptoms. Therefore, the SSS-II values might provide a better disease-specific metric for the disassembly of homeostatic genetic connectivity that is perturbed during COVID-19 insult.

A better-quality PPI network improves the prediction accuracy in determining the ‘candidate genes’ for a disease. The STRING database comprises genes from prior knowledge and thereby provides a PPI model with certain limitations. The SSS-based PPI network includes only genes having sufficient annotation information and so carries a GO annotation bias. Integrating two scores, viz. SPPICS of a STRING-based PPI network and SSS of an anatomy-based gene network, by weighting each score with its ‘accuracy value of ROC’ and then summing them, is reported to improve network quality by filtering out false positive interactions28. In our study, the same weighting principle (‘accuracy value of ROC’) was applied to compute the weighted scores of SPPICS, SSS-I and SSS-II, followed by calculation of their harmonic mean to derive the integrated scores (WHMS) for those gene-pairs that (a) had all three individual scores (SPPICS, SSS-I, SSS-II) and (b) had at least one score above its respective threshold level (Fig. 5b). The integrated scores of a total of 21 gene-pairs showed a statistically strong fit (AUC > 0.9) and the most accurate (95%) interactions (Fig. 5b and solid edges in Fig. 5f), and provided 21 ‘candidate genes’ (Fig. 5c and f) associated with neurological insults (Fig. 5f) in COVID-19. All 21 ‘candidate genes’ (Fig. 5c) appeared to be derived from RNA-Seq data (Fig. 4b) and were thus considered co-expressed genes of COVID-19 in the brain.
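The integration step can be sketched as follows. The exact WHMS formula is not reproduced in this section, so the sketch assumes the standard weighted harmonic mean, sum(w) / sum(w / s), with the ROC accuracy of each score used as its weight; a literal reading of the text (a plain harmonic mean of the products w × s) would give different numbers. Treating a zero score as yielding a zero integrated score is also an assumption. All weights, thresholds and scores below are hypothetical placeholders.

```python
def is_eligible(scores, thresholds):
    """Criteria (a) and (b): all three scores are present and at least
    one of them lies above its own threshold."""
    if any(s is None for s in scores):
        return False
    return any(s > t for s, t in zip(scores, thresholds))


def weighted_harmonic_mean(scores, weights):
    """Standard weighted harmonic mean; returns 0.0 if any score is
    zero or negative, since the harmonic mean is undefined there."""
    if any(s <= 0 for s in scores):
        return 0.0
    return sum(weights) / sum(w / s for w, s in zip(weights, scores))


# Hypothetical inputs, ordered as (SPPICS, SSS-I, SSS-II).
roc_accuracy = (0.90, 0.85, 0.88)   # assumed weights, not values from the study
thresholds = (0.70, 0.40, 0.71)     # first value is a placeholder

pair_scores = (0.95, 0.49, 0.78)    # invented scores for one gene-pair
if is_eligible(pair_scores, thresholds):
    print(weighted_harmonic_mean(pair_scores, roc_accuracy))
```

Because a harmonic mean is pulled towards its smallest term, a gene-pair needs reasonably high values on all three scores to obtain a high integrated score, which fits the stated aim of filtering out false positive interactions.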

All 21 gene-pairs/PPIs of ‘candidate genes’ showed SSS-I values (Figs. 3d, 4d, vide Point 4.2 in the results section) above the respective threshold value and therefore represented an ‘is_a’ functional relationship (Table 2) in the semantic similarity of GO-BP annotations for generalised pathophysiological functions irrespective of disease. Based on the threshold value of the integrated PPI scores (WHMS), the 21 pairwise ‘candidate genes’ were classified as ‘prevalent’ and ‘non-prevalent’ ‘candidate genes’ (Table 2). Six pairs of seven ‘prevalent’ ‘candidate genes’ showed strong database-dependent putative interaction scores (SPPICS) (Figs. 3a and 4b) and also had SSS-II values (Figs. 3e and 4e) above the threshold levels, representing an ‘is_a’ relationship (Table 2) with neuro-pathological manifestations in COVID-19. The ‘non-prevalent’ ‘candidate genes’ were found to have varied SPPICS (strong and weak) and different relationships (‘is_a’ and ‘part_of’) among their gene-pairs (Table 2). The ‘prevalent’ ‘candidate genes’ (ADAM10, ADAM17, AKT1, CTNNB1, ESR1, FGFR1, PIK3CA) might have the most prominent pathophysiological relevance in COVID-19.

The pathophysiological action of SARS-CoV-2 in brain tissue cells begins with its binding to ACE2 receptors of the cell membrane. After viral endocytosis is over, ADAM17 directs the shedding of the ectodomain of the receptors31 and enhances the formation of TNF-α leading to escalation of the cytokine storm1. Dysfunction of ADAMs can also exacerbate Alzheimer’s disease condition through the misfolded Aβ pathology32, ischaemic stroke33 and vascular thrombosis34 via ACE2 and TNF-α receptors. Recently, ADAM10 and ADAM17 have been marked as the risk factors for cerebral infarction and hippocampal sclerosis related epilepsy35, respectively. In diabetic patients, an elevated activity of ADAM17 is found to enhance COVID-19 susceptibility36 through the AKT1-mediated pathway.

AKT1 encodes protein kinase B, which is a part of the PI3K–NF-κB signalling pathway, involved in aberrant expression of IL10 and inflammation in severe coronavirus infection32. AKT1 can induce tumour formation through the upregulation of RNA binding protein EIF4G137, coronavirus exit from endosomes via valosin-containing protein VCP38 and MAPT-associated tau protein formation in dementia-like cognitive impairment39. The altered AKT1-signalling pathway is also evident in ATM-associated autism spectrum disorders that may exaggerate COVID-1940.

CTNNB1 expresses β-catenin related to the Wnt-signalling pathway and gets downregulated in COVID-1941 through the activation of glycogen synthase kinase 3β in the prefrontal cortex and dorsal hippocampus42. Defects in the formation of β-catenin cause disruption of the blood–brain barrier43 leading to the development of cerebrovascular thrombosis44, headache45, stroke46 and epileptic seizure47 during or in the aftermath of COVID-19. Stress-induced Dickkopf-1 protein formation prevents CTNNB1 gene function in the hippocampus, thereby impairing memory48. Uncontrolled interactions of CTNNB1 with PSEN149 and GLI250 are linked to skin tumorigenesis, which may be suggestive for their possible involvement in COVID-19. Moreover, abnormalities in PSEN151 and GLI252 functions, associated with the CTNNB1 gene are likely to be implicated in developing Alzheimer’s disease- and holoprosencephaly-like features in COVID-19.

The ESR1 gene encodes estrogen receptor 1, which occurs primarily in the medial preoptic area and ventromedial nucleus of the hypothalamus and regulates diverse reproductive functions of both males and females53. ESR1 seems to share CTNNB1-mediated54 and AKT1-mediated55 signalling pathways to accelerate cancer and neurodegeneration, respectively. Moreover, estrogen inhibits inflammation and immune responses in COVID-19 and lowers COVID-19 susceptibility in females relative to males, because of its higher concentration and the greater number of ESR1 receptors in target tissues56.

In the adult brain, the PIK3CA gene product PI3K via the AKT1-pathway may exaggerate neurodegeneration in Alzheimer’s disease57, and FGFR1 dysregulation leads to ischemic stroke58 and holoprosencephaly59. Moreover, synchronised PIK3CA mutation and FGFR1 alteration are associated with ESR1-positive breast cancer60. Since COVID-19 develops inflammatory burst and lymphopenia, SARS-CoV-2-associated illness therefore may aggravate cancer prognosis59.

Notably, two prevalent genes, CTNNB1 and AKT1, appeared to be common to both TN (Fig. 3a) and CGN (Fig. 4b). Both genes showed SSS-II (network-specific semantic similarity score) values greater than the threshold values in the respective cases (Figs. 3e and 4e), and were therefore functionally interlinked (Fig. 5f). CTNNB1 appeared as the lone gene having both HB and driver node properties in TN. Interestingly, CTNNB1 was the only gene that formed a ‘tripartite open network’, linking with eight symptoms that in turn remained connected with eight diseases (Fig. 3a). CTNNB1 in TN was connected with (a) five symptoms (viz. cerebral ischemia, vascular thrombosis, intracranial hypertension, seizures and epileptic seizures) in the central nervous system (CNS), (b) two symptoms (viz. hypertonia and fatigue) in the peripheral nervous system (PNS) and (c) one psychiatric symptom (viz. behavioral disorder). Moreover, three of the symptoms connected with CTNNB1 in the present tripartite network also occurred in other diseases co-occurring with COVID-19, viz. (a) cerebral ischemia in alobar, lobar and semilobar holoprosencephaly, Behçet disease, early infantile epileptic encephalopathy, MELAS and meningioma; (b) vascular thrombosis in alobar, lobar and semilobar holoprosencephaly, amyotrophic lateral sclerosis and MELAS; (c) intracranial hypertension in MELAS. However, no data are available yet on the remaining five symptoms in any other diseases never challenged with SARS-CoV-2. This suggests that certain neurological symptoms of COVID-19 are intermingled with other diseases and need special clinical attention.

In conclusion, the present study has two limitations: (a) the status of COVID-19 patients, who had mixed implications of neurological symptoms/manifestations during hospitalisation in most cases, long-term reports in a few cases and no details at all in other cases as reported in the literature (Table 1), and (b) the use of a small cohort in the transcriptomic dataset of patients having SARS-CoV-2 in brain autopsy samples23, the only one available during the study period.
