Contact Info
Orbuculum
Bengaluru, India
hi@orbuculum.xyz

Deep learning based preventive solution for COVID-19

Abstract

With the COVID-19 pandemic under way, it is imperative to develop preventive methods that can help control the outbreak and protect people. Our research focuses on preventive solutions against this virus. A vital part of prevention is identifying the receptor in the human body to which the virus’s Spike protein binds. Our AI network predicted this receptor to be ACE2 with an accuracy of 97.7%, and its performance has been validated thoroughly. Using this insight, we identified the target organs that are highly susceptible to COVID-19. In collaboration with a senior Ayurvedic doctor and scientist, we combined our findings about target organs with his existing research on herbs that can specifically boost the immunity of individual organs, and selected herbs that focus only on the target organs of COVID-19 in a much more effective manner than existing immunity boosters. Using his proprietary formulation and extraction pipeline, these herbs were processed into tablets to ensure maximum potency. These preventive supplements are available under the brand name GenoVeda Saar.

Introduction

Coronaviruses (CoV) are a group of common, ancient, and diverse viruses. They infect many mammalian and avian species and cause respiratory, gastrointestinal, and central nervous system diseases [1, 2].

COVID-19 is a member of the coronavirus family, first identified at a seafood market in Wuhan, China. Because of its low pathogenicity and highly contagious nature, the WHO declared COVID-19 a pandemic. To contain it, it is vital to understand the reasons behind the pandemic behaviour of this coronavirus.

One of the vital steps in understanding how a virus infects host cells is identifying its receptor. The receptor can indicate the range of viral hosts and the possibility of inter-species infection, and it is a primary target for antiviral intervention. Coronaviruses recognize an eclectic collection of host receptors. Hence, they can infect a diverse set of hosts, which makes them a threat to both humans and animals.

Coronaviruses belong to the Coronaviridae family in the order Nidovirales. They can be classified into at least three major genera, α, β, and γ [3], as shown in Figure 1.

    1. Prototypic α-genus coronaviruses include human coronavirus NL63 (HCoV-NL63), porcine transmissible gastroenteritis coronavirus (TGEV), and porcine respiratory coronavirus (PRCV).
    2. Prototypic β-genus coronaviruses include severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), mouse hepatitis coronavirus (MHV), and bovine coronavirus (BCoV).
    3. Prototypic γ-genus coronaviruses include avian infectious bronchitis virus (IBV).

Figure 1: Receptor recognition pattern for different classes of Coronaviruses

Figure 2: Showing sequence alignment across different mammalian histone proteins with conserved and non-conserved regions.

Contemporary research is trying to understand the evolution of the COVID-19 virus using sequence alignment methods. In these methods, the degree of similarity between the amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages. The absence of substitutions, or the presence of only very conservative substitutions (that is, substitutions of amino acids whose side chains have similar biochemical properties), in a particular region of the sequence suggests that this region has structural or functional importance, as shown in Figure 2.
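The idea of reading conservation off an alignment can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length with `-` as the gap character, and it scores each column only by residue identity (it does not model conservative substitutions). The function name and the toy alignment are hypothetical and are not taken from the study.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that carry the most common (non-gap) residue in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]  # ignore gaps
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MK-LA"]
conservation = column_conservation(toy_alignment)
```

Columns with a score of 1.0 are fully conserved in the toy alignment; lower scores mark positions where substitutions or gaps occur.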

However, sequence alignment methods have several limitations:

  • They miss positional dependence among the amino acids in a given sequence
  • They miss patterns that are present globally but look like noise locally across multiple sequences
  • They are computationally expensive when comparing a given sequence with multiple sequences
  • They are unable to capture all the variations in the protein sequences of a single class obtained from different samples

Furthermore, it is evident from Figure 1 that proteins within the same class can have different receptor-binding functionality, whereas proteins across different classes can share a common receptor. So, when proteins are judged similar by sequence alignment methods, it is difficult to ascertain that they will also have high functional similarity.

We propose a new holistic feature representation and train a neural network to predict the receptor that the COVID-19 Spike protein will bind to in humans.

Data preparation

All Orthocoronavirinae protein sequences were downloaded from the UniProtKB database. The following organisms belonging to the α, β, and γ genera, as shown in Figure 1, are filtered from the entire data:

  • Infectious bronchitis virus
  • Human coronavirus NL63 (HCoV-NL63)
  • Transmissible gastroenteritis virus
  • Porcine respiratory coronavirus
  • Bovine coronavirus
  • Middle East respiratory syndrome-related coronavirus
  • Murine hepatitis virus
  • SARS coronavirus

These protein sequences are grouped by receptor type, namely:

  1. Binds sugar
  2. Binds APN
  3. Binds DPP4
  4. Binds ACE2

Because too few samples were available for the Binds CEACAM1 receptor class, it was not included when creating the model data.

Network training and data preparation

The whole data set is split into three portions: train, validation, and test. The class with the minimum number of samples (Cminsample) is identified; from each class, 90% of Cminsample samples are taken to form the trainValid data, and the remaining 10% of Cminsample samples are added to the test data. From the trainValid data, 70% of each class’s samples go to the train set and 30% to the validation set. The network is trained on the train set, validated on the validation set, and tested on the test set.
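The split described above can be sketched as follows. This is one possible reading, not the study's actual code: it assumes that the 90% and 10% portions of Cminsample are drawn per class after shuffling, and it uses integer arithmetic for the percentages. The function name, the seed, and the toy class sizes are hypothetical.

```python
import random


def balanced_split(samples_by_class, seed=0):
    """Split samples into train/validation/test sets, sizing every class
    by the smallest class (Cminsample) so the sets stay balanced."""
    rng = random.Random(seed)
    c_min = min(len(samples) for samples in samples_by_class.values())
    n_train_valid = (c_min * 9) // 10      # 90% of Cminsample per class
    n_train = (n_train_valid * 7) // 10    # 70% of trainValid per class

    train, valid, test_set = [], [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        picked = shuffled[:c_min]          # use only Cminsample per class
        train_valid = picked[:n_train_valid]
        held_out = picked[n_train_valid:]  # remaining 10% of Cminsample
        train += [(s, label) for s in train_valid[:n_train]]
        valid += [(s, label) for s in train_valid[n_train:]]
        test_set += [(s, label) for s in held_out]
    return train, valid, test_set


# Toy data with made-up identifiers and class sizes.
toy = {
    "Binds sugar": [f"sugar_{i}" for i in range(40)],
    "Binds APN": [f"apn_{i}" for i in range(25)],
    "Binds ACE2": [f"ace2_{i}" for i in range(10)],
}
train, valid, test_set = balanced_split(toy)
```

With the toy sizes above, Cminsample is 10, so each class contributes 6 train, 3 validation, and 1 test sample. Sizing every class by the smallest one keeps the classes balanced at the cost of leaving some samples from larger classes unused.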

Results

The trained neural network predicts that the receptor to which COVID-19’s Spike protein binds in humans is Binds ACE2, with 97.7% accuracy. The following sections detail how the neural network’s performance is validated using different standard metrics.


Figure 3: Neural Network’s training and validation losses reducing consistently during training.

Training and Validation loss

Loss is a metric that captures the deviation of the network’s prediction from the ground truth label. For example, given a protein sequence with a sugar receptor, if the network predicts it to have an ACE2 receptor, then the loss is high, since the network wrongly predicted the protein’s receptor functionality.

During training, the network tries to learn receptor-specific patterns in the protein sequences, which help it identify the receptor of an unknown protein sequence once it is trained. The loss should therefore decrease during training: as the network learns these patterns, it makes more correct predictions.
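The example of a sugar-binding sequence predicted as ACE2 can be made concrete with a small sketch. The text does not state which loss function was used, so cross-entropy is assumed here purely for illustration; the class order and the probability values are made up.

```python
import math

# Class order is an assumption made for this illustration.
CLASSES = ["Binds sugar", "Binds APN", "Binds DPP4", "Binds ACE2"]


def cross_entropy(predicted_probs, true_label):
    """Negative log of the probability assigned to the true class."""
    p_true = predicted_probs[CLASSES.index(true_label)]
    return -math.log(max(p_true, 1e-15))  # clip to avoid log(0)


# Ground truth is "Binds sugar" in both cases.
wrong = cross_entropy([0.05, 0.05, 0.05, 0.85], "Binds sugar")  # favours ACE2
right = cross_entropy([0.85, 0.05, 0.05, 0.05], "Binds sugar")  # favours sugar
```

The wrong prediction yields a loss of about 3.0, while the correct one yields about 0.16, which is the behaviour a decreasing training loss reflects.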

Confusion Matrix

The confusion matrix helps capture the following aspects of the trained network:

  • True Positive(TP)
  • False Negative(FN)
  • False Positive(FP)
  • True Negative(TN)

Taking Binds sugar as the positive class and Binds ACE2 as the negative class:

True positive: the network predicts that a protein sequence has sugar receptor functionality, and the ground truth is the same.

False negative: the network predicts that a protein sequence has ACE2 receptor functionality, but the ground truth is that the protein has sugar as its receptor.

False positive: the network predicts that a protein sequence has sugar receptor functionality, but the ground truth is that the protein has ACE2 as its receptor.

True negative: the network predicts that a protein sequence has ACE2 receptor functionality, and the ground truth is that the protein has ACE2 as its receptor.

All the values along the diagonal of the confusion matrix represent true positives for the respective classes.

Figure 4: Neural Network’s Confusion matrix on test data
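The four counts can be read off a confusion matrix as in the following sketch. The rows-as-ground-truth, columns-as-prediction layout is an assumption, and the labels and predictions are toy values rather than the study's test data.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix where rows are ground truth and columns are predictions."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def one_vs_rest_counts(matrix, k):
    """Return (TP, FN, FP, TN) treating class k as the positive class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                        # diagonal entry
    fn = sum(matrix[k]) - tp                 # rest of the true-class row
    fp = sum(row[k] for row in matrix) - tp  # rest of the predicted column
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


classes = ["Binds sugar", "Binds ACE2"]
truth = ["Binds sugar", "Binds sugar", "Binds sugar", "Binds ACE2", "Binds ACE2"]
preds = ["Binds sugar", "Binds sugar", "Binds ACE2", "Binds sugar", "Binds ACE2"]

matrix = confusion_matrix(truth, preds, classes)
sugar_counts = one_vs_rest_counts(matrix, classes.index("Binds sugar"))
```

In this toy case the matrix is `[[2, 1], [1, 1]]`, and for Binds sugar the counts are TP = 2, FN = 1, FP = 1, TN = 1.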

Recall

Recall is the proportion of actual positives that the network identifies correctly, that is, TP / (TP + FN).

Network’s recall for each of the receptor classes:

  • Binds ACE2: 0.99
  • Binds APN: 0.98
  • Binds DPP4: 0.99
  • Binds sugar: 0.99

F1 Score

The F1 score is the harmonic mean of recall and precision, where precision is the proportion of the network’s positive predictions that are actually correct.
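How precision, recall, and the F1 score relate to each other can be shown with a short sketch. The counts used here are invented for illustration and are not the study's results; the function name is hypothetical.

```python
def precision_recall_f1(tp, fn, fp):
    """Compute precision, recall, and their harmonic mean (F1) for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts: 8 true positives, 2 false negatives, no false positives.
precision, recall, f1 = precision_recall_f1(tp=8, fn=2, fp=0)
```

Here precision is 1.0 and recall is 0.8, giving an F1 score of about 0.89. Because it is a harmonic mean, F1 is pulled toward the lower of the two values.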

Network’s F1 score for each of the receptor classes:

  • Binds ACE2: 0.97
  • Binds APN: 0.98
  • Binds DPP4: 0.97
  • Binds sugar: 0.98

Log Loss

One way to raise the network’s confidence in the correct receptor is to maximize the predicted probability for that receptor, which automatically reduces the predicted probabilities for the other receptors. LogLossScore measures this separation between the predicted probabilities of the correct receptor and the other receptors: the higher the separation, the lower the LogLossScore. LogLossScore is bounded below by 0 and has no upper bound. A LogLossScore of 0 indicates maximum separation across classes. The network achieved a LogLossScore of 0.08.
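The relationship between probability separation and log loss can be sketched as below. It assumes the standard multi-class definition (the mean negative log probability of the correct class); the probability vectors are made up, and the function is a plain-Python illustration rather than the code used in the study.

```python
import math


def log_loss(true_indices, predicted_probs, eps=1e-15):
    """Mean negative log probability assigned to the correct class."""
    total = 0.0
    for true_index, probs in zip(true_indices, predicted_probs):
        p = min(max(probs[true_index], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(true_indices)


# Two samples whose correct classes are index 0 and index 3.
confident = log_loss([0, 3], [[0.97, 0.01, 0.01, 0.01],
                              [0.02, 0.02, 0.02, 0.94]])
uncertain = log_loss([0, 3], [[0.40, 0.20, 0.20, 0.20],
                              [0.25, 0.25, 0.25, 0.25]])
```

The well-separated predictions give a score of about 0.05, while the poorly separated ones give about 1.15.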

Conclusion

The AI neural network can predict the receptor to which COVID-19’s Spike protein binds with high confidence and robust validation. These results align with results validated experimentally by other researchers [4]. We hope the identification of this receptor gives researchers a new direction, not only in drug discovery against COVID-19 but also in identifying non-human hosts that are susceptible to COVID-19, so that human exposure to these hosts can be limited to contain its spread. Our future research will focus on predicting mutations of the COVID-19 protein sequences. This will aid researchers not only in formulating better drugs but also in discovering the potential lethality of COVID-19. Furthermore, it can help researchers trace the animal source of origin of the virus accurately.

Orbuculum’s actionable outcomes from research insights
Targeted organs and distribution of receptors to which COVID-19 can bind

Other research works support our insights about ACE2 receptor binding: the expression and distribution of ACE2 in the human body may indicate the potential infection routes of COVID-19. Using single-cell RNA sequencing (scRNA-Seq) and single-cell transcriptomes from public databases, researchers analyzed the ACE2 RNA expression profile at single-cell resolution. High ACE2 expression was identified in type II alveolar cells (AT2) of the lung, upper and stratified epithelial cells of the oesophagus, absorptive enterocytes from the ileum and colon, cholangiocytes, myocardial cells, kidney proximal tubule cells, and bladder urothelial cells. These findings indicate that organs with high ACE2-expressing cells should be considered at potentially high risk of COVID-19 infection [5].

Designing novel preventive solution for COVID-19:

Orbuculum’s Chronic disease prediction product can also be used to predict the risk of COVID-19 infection from a user’s DNA sequence data. Furthermore, using these research insights, we have come up with a new preventive solution for COVID-19. Ayurvedic literature mostly mentions non-specific immunity boosters. These are quite generalised and not very effective against viruses with relatively high pathogenicity. A few specific immunity boosters exist, but they are used only for certain diseases such as asthma. Another popular generalised immunity booster is Vitamin C, but it has been disregarded after a lot of research because it was found to be quite ineffective in actually boosting immunity and fighting pathogens.

Our collaborator is a senior Ayurvedic scientist and doctor. In more than 20 years of research, he has identified herbs that are capable of specifically boosting the innate and adaptive immunity of individual organs. By collaborating with him, we were able to combine our findings with his research and develop a formulation that specifically boosts the immunity of the organs targeted by COVID-19. To ensure maximum potency, he designed a proprietary formulation and extraction pipeline in which the medicines are made without any artificial or binding element, so that they remain potent enough to work against deadly viruses. The novel approach to targeted immunity enhancement will be explained in depth in a separate document.

References
  1. Perlman S, Netland J. 2009. Coronaviruses post-SARS: update on replication and pathogenesis. Nat Rev Microbiol 7:439-450. doi:10.1038/nrmicro2147.
  2. Li WH, Wong SK, Li F, Kuhn JH, Huang IC, Choe H, Farzan M. 2006. Animal origins of the severe acute respiratory syndrome coronavirus: insight from ACE2-S-protein interactions. J Virol 80:4211-4219. doi:10.1128/JVI.80.9.4211-4219.2006
  3. González JM, Gomez-Puertas P, Cavanagh D, Gorbalenya AE, Enjuanes L. 2003. A comparative sequence analysis to revise the current taxonomy of the family Coronaviridae. Arch Virol 148:2207-2235. doi:10.1007/s00705-003-0162-
  4. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
  5. High expression of ACE2 receptor of 2019-nCoV on the epithelial cells of oral mucosa