CORONAVIRUSES
Coronaviruses, particularly betacoronaviruses, are among the usual causative agents of the common cold and other respiratory infections, together with common cold picornaviruses, influenza viruses, adenoviruses, human respiratory syncytial virus, and parainfluenza viruses. Newly emerged strains caused epidemics of severe acute respiratory syndrome (SARS): SARS-CoV-1 in 2003 and SARS-CoV-2 in 2019 (reviewed here).
PANDEMIC ORIGIN
With 15 million diagnosed cases and 600,000 deaths caused by CoV-2 by 20 July 2020, tens of thousands of publications have tried to elucidate the origin of the new coronavirus, which is known to be hosted by many mammals, and specifically by bats and pangolins in China. The virus has been broadly studied in China since 2003, and many new strains have been reported, some of them artificially prepared (here). Attempts to explain the path from an animal host to humans have not been successful. The closest animal coronavirus published so far, Bat CoV RaTG13, has 96% RNA homology (also called % similarity or % identity). As the epidemic started close to the high-security Virology Institute in Wuhan, an escape from the laboratory is among the scenarios considered, also in view of the numerous such cases and the low level of security in Chinese laboratories (here, here), and in view of the known cases of selling animals from the labs to meat markets (here); an escape was also considered by the director of the coronavirus laboratory in Wuhan, Dr. Zheng-Li Shi (here).
POSSIBLE UNIQUE STRUCTURAL FEATURES IN SARS-CoV-2
Naturally, the first structural feature considered important for high infectivity is the sequence of the spike protein, which binds to the human receptor ACE2. It was found that an insertion of 12 nucleotides into the viral RNA resulted in four extra amino acids at positions 681-684 of the spike protein, which may have improved the contagiousness of the virus; these four extra residues are unique to this human virus and are not found in coronaviruses from any other species (here). Additional elements seem to be highly important for the virus, including the envelope protein E, which seems highly conserved, with many coronaviruses having 100% homology in its amino acid sequence, whereas CoV-2 seems to be rather different (here).
Thus, CoV-2 does differ from other coronaviruses in various aspects. Other regions of the viral RNA have been considered as the source of the special properties of CoV-2, including the first 265 bases in the 5’-untranslated region (5’-UTR). The current text examines whether the SARS-CoV-2 virus differs from other coronaviruses in the mutation rate (mutation extent, diversity) of its 5’-UTR.
COMPARING THE MUTATION RATE IN THE 5’-UTR WITH THE MUTATION RATE IN THE WHOLE GENOME FOR SEVERAL CORONAVIRUSES
The sequences of several coronaviruses are compared here using BLAST of NCBI (here), and the divergence among their 5’-UTR segments is compared with the divergence among their whole genomes, thereby examining the question whether the starting segment of 260 bases is more conserved than the whole genome in coronaviruses.
The following 11 frequently mentioned coronaviruses are considered (accession numbers and publishing dates are given):
1. SARS-CoV-2 (MT192773), Mar 2020,
2. Bat coronavirus (DQ648857), 2005,
3. Bat SARS-like coronavirus (GQ153547), 2010,
4. SARS-CoV-2 (MT764166.1), Jul 2020,
5. Bat CoV RaTG13 (MN996532.1), Jan 2020, so far closest to CoV-2,
6. Human coronavirus OC43 (NC_005147.1), 2005, mild common cold symptoms,
7. SARS-CoV-1 SIN25000 (AY283794.1), 2003, 1st SARS epidemic,
8. Human CoV 229E (MF542265.1), 2016, mild common cold symptoms,
9. Bat SARS-like coronavirus (MG772934.1), 2018,
10. Bat SARS-like coronavirus SHC014 (KC881005), 2013, replicates in human but is not virulent,
11. Mouse SARS-like coronavirus SARS-CoV MA-15 (DQ497008), 2007, virulent in mouse and converted to human-virulent by incorporating the spike from bat SHC014, making the chimera SHC014-MA15 in 2015.
Doublets formed from the above genomes, the xth and the yth, are compared below (x,y) by BLAST to obtain the homology (% identity) in their whole genomes, as well as in their first 260 bases. The genome sizes n1 and n2 of the compared viruses are given below (n1/n2), followed by the homology h1% in the whole sequence and the homology h2% in the 260-base segment of the 5’-UTR. The mutation extent may be characterized by the % divergence, i.e. the percentage of differing positions (100 - h), so that if the homology is 90%, the divergence is 10%. The ratio R of the divergence in the 260-base segment to the divergence in the whole RNA genome is calculated and given in curly brackets below:
R = (100 - h2)/(100 - h1).
In the first stage, CoV-2 is compared with several other coronaviruses, and in the second stage, various non-CoV-2 coronaviruses are compared to each other:
CoV-2 vs others
----------------------------------------------------------
(x,y)     n1/n2          h1        h2        R
----------------------------------------------------------
(1,2)     29890/29741    81.12%    90.31%    {0.51}
(1,3)     29890/29704    80.85%    90.31%    {0.60}
(1,4)     29890/29902    99.92%    98.85%
(1,5)     29890/29855    96.11%    96.75%    {0.84}
(1,6)     29890/30738    65.28%    n.d.
(1,7)     29890/29711    80.26%    90.16%    {0.50}
(4,7)     29902/29711    80.24%    89.39%    {0.54}
(1,8)     29890/27271    64.19%    n.d.
(1,9)     29890/29732    87.22%    93.75%    {0.49}
(1,10)    29890/29787    80.56%    89.49%    {0.54}
(1,11)    29890/29726    80.24%    89.88%    {0.52}
----------------------------------------------------------
Non-CoV-2 vs each other
----------------------------------------------------------
(x,y)     n1/n2          h1        h2        R
----------------------------------------------------------
(2,3)     29741/29704    90.32%    96.11%    {0.40}
(2,5)     29741/29855    80.90%    89.87%    {0.53}
(2,7)     29741/29711    89.59%    95.51%    {0.43}
(2,9)     29741/29732    82.03%    88.49%    {0.64}
(3,5)     29704/29855    80.83%    89.54%    {0.55}
(3,7)     29704/29711    89.25%    97.17%    {0.26}
(3,9)     29704/29732    83.21%    88.03%    {0.71}
(5,6)     29855/30738    65.50%    n.d.
(5,7)     29855/29711    80.12%    89.96%    {0.51}
(5,9)     29855/29732    87.13%    95.12%    {0.38}
(5,10)    29855/29787    80.45%    88.94%    {0.57}
(5,11)    29855/29726    80.10%    89.36%    {0.53}
(6,7)     30738/29711    66.35%    n.d.
(6,8)     30738/27271    65.89%    n.d.
(7,9)     29711/29732    81.26%    87.76%    {0.65}
----------------------------------------------------------
It can be seen that the common cold-like viruses 6 and 8 are the most different from all other viruses, and from each other as well, corresponding to their great evolutionary distance; BLAST could not determine the % homology for their short 260-base segments (n.d.). The closest to each other, of course, are the two CoV-2 viruses, 1 and 4, even though they are not identical.
The mutual homologies (% identities) h1 among the genomes of different coronavirus species are in the range of 65% to 96% (= 4% to 35% divergence). The two closest species in the group (apart from the two CoV-2 viruses 1 and 4, which have 99.9% homology) are human SARS-CoV-2 and bat CoV RaTG13 (viruses 1 and 5), which share 96.1% of their sequence. Such a difference between coronaviruses may correspond to up to about 100 years of normal separate evolution (here), but quicker events can be considered, including recombination, an accelerated mutation rate, or artificial intervention.
The homologies h2 among the 5’-UTR segments are always higher than the corresponding h1 values (except for the doublet of two CoV-2 strains; for doublets involving the remote viruses 6 and 8, h2 could not be determined). In short, the mutation extent of the coronavirus 5’-UTR is lower than the mutation extent of the whole genome, which is consistent with the functional importance of the starting segment.
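The ratio R defined above can be recomputed directly from the tabulated h1 and h2 values. The following is a minimal Python sketch; the function name `divergence_ratio` and the selection of doublets are illustrative assumptions, not part of the original analysis.

```python
def divergence_ratio(h1: float, h2: float) -> float:
    """Return R = (100 - h2) / (100 - h1); h1 and h2 are % identities."""
    if h1 >= 100.0:
        raise ValueError("whole-genome identity of 100% gives zero divergence")
    return (100.0 - h2) / (100.0 - h1)

# (h1, h2) pairs copied from the tables above for a few doublets
pairs = {
    (1, 2): (81.12, 90.31),
    (1, 3): (80.85, 90.31),
    (1, 4): (99.92, 98.85),
    (1, 5): (96.11, 96.75),
    (1, 9): (87.22, 93.75),
    (3, 7): (89.25, 97.17),
}

# R rounded to two decimals, as in the tables
ratios = {pair: round(divergence_ratio(h1, h2), 2)
          for pair, (h1, h2) in pairs.items()}

for pair, r in ratios.items():
    print(pair, r)
```

Recomputing in this way reproduces most of the listed R values. For doublet (1,3), the tabulated h1 and h2 give about 0.51 rather than the listed 0.60, so that entry may deserve a recheck. For the two CoV-2 strains (1,4), R comes out far above 1, reflecting that their 5’-UTR segments differ more than their whole genomes.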
COMPARING COV-2 WITH OTHER CORONAVIRUSES IN REGARD TO THE 5’-UTR DIVERGENCE
Sequence divergence % of 5’-UTR
When the mutation extents in the 5’-UTR and in the whole genome are compared, (100 - h2)/(100 - h1), ratios R of about 0.5 are obtained, showing that the initial segment has diverged only about half as much as the whole genome. Specifically, R values for doublets comprising CoV-2 are in the range of 0.49 to 0.84, the mean value being 0.57; R values for doublets comprising only non-CoV-2 viruses are in the range of 0.26 to 0.71, the mean value being 0.52. Thus, CoV-2 exhibits a slightly higher mutation extent in the 5’-UTR than the other coronaviruses, but the difference is too small to be considered significant. The difference of 0.05 (ΔR = 0.57 - 0.52) between the group comprising CoV-2 and the group comprising only other viruses is small in relation to the whole observed R range of 0.26 to 0.84; moreover, the ranges of both groups, 0.26-0.71 and 0.49-0.84, broadly overlap.
Importantly, R for doublet (1,7) is 0.50, and R for doublet (4,7) is 0.54, so that ΔR between two doublets that differ only in the CoV-2 strain used is 0.04. Consequently, a difference ΔR of 0.05 between two groups, of which one comprises CoV-2 and the other does not, is not significant.
Incidentally, the two strains of CoV-2 (1,4) differ in their 5’-UTR segments more than in their whole genomes, which may or may not result from slight sequencing errors.
Thus, SARS-CoV-2 does not differ from other coronaviruses in the mutation rate of its 5’-UTR, when measured by the sequence divergence of the 5’-UTR relative to the whole genome.
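The group comparison above can be reproduced with a short Python sketch that summarizes the R values exactly as listed in the two tables. The helper name `summarize` is an illustrative assumption; doublets without an R entry are omitted.

```python
from statistics import mean

# R values copied from the tables (doublets without an R entry are omitted)
r_cov2 = [0.51, 0.60, 0.84, 0.50, 0.54, 0.49, 0.54, 0.52]
r_other = [0.40, 0.53, 0.43, 0.64, 0.55, 0.26,
           0.71, 0.51, 0.38, 0.57, 0.53, 0.65]

def summarize(values):
    """Return the minimum, maximum and mean of a list of R values."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

s_cov2 = summarize(r_cov2)
s_other = summarize(r_other)

# Difference between the group means
delta_r = s_cov2["mean"] - s_other["mean"]

# Interval shared by the two ranges (lower bound, upper bound)
overlap = (max(s_cov2["min"], s_other["min"]),
           min(s_cov2["max"], s_other["max"]))

print(f"CoV-2 group:     {s_cov2}")
print(f"non-CoV-2 group: {s_other}")
print(f"difference of means: {delta_r:.2f}, shared range: {overlap}")
```

With the tabulated values, the group means come out close to those quoted in the text, their difference is roughly 0.05, and the two ranges share the interval from 0.49 to 0.71.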
Nucleotide replacements in 5’-UTR
The numbers of base changes (NBC) in the first 260 bases were compared as follows. A doublet from the CoV-2 group and a doublet from the non-CoV-2 group were chosen so that both have nearly the same overall genome homology h1; the NBC was then calculated for each doublet from its h2 value. For example, the CoV-2 doublet (1,9) has h1 of 87.22%, and the non-CoV-2 doublet (5,9) has nearly the same h1 of 87.13%; the NBC values are calculated as NBC = (100 - h2)*260/100, namely:
NBC(1,9) = (100 - 93.75)*260/100 = 16 for the CoV-2 doublet, and
NBC(5,9) = (100 - 95.12)*260/100 = 13 for the non-CoV-2 doublet.
This means that SARS-CoV-2 (virus 1, MT192773) differs from bat SARS-like coronavirus MG772934.1 (virus 9) in 16 bases of 260 in the 5’-UTR, whereas two other coronaviruses, 5 and 9 (also having about 87% genome homology), differ from each other in 13 bases of 260 in the 5’-UTR.
Four matched pairs of doublets among the considered cases provided four comparisons as follows:
(1,2), h1 = 81.12%, versus (7,9), h1 = 81.26%: NBC = 25 bases for CoV-2 versus 32 for non-CoV-2
(1,3), h1 = 80.85%, versus (3,5), h1 = 80.83%: NBC = 25 bases for CoV-2 versus 27 for non-CoV-2
(1,7), h1 = 80.26%, versus (5,7), h1 = 80.12%: NBC = 26 bases for CoV-2 versus 26 for non-CoV-2
(1,9), h1 = 87.22%, versus (5,9), h1 = 87.13%: NBC = 16 bases for CoV-2 versus 13 for non-CoV-2
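The four comparisons can be recomputed from the tabulated h2 values with the NBC formula. The sketch below is only illustrative; the names `base_changes` and `matched` are assumptions introduced here.

```python
SEGMENT_LENGTH = 260  # bases of the 5'-UTR segment compared in the text

def base_changes(h2_percent: float, length: int = SEGMENT_LENGTH) -> int:
    """NBC = (100 - h2) * length / 100, rounded to whole bases."""
    return round((100.0 - h2_percent) * length / 100.0)

# Each entry: (CoV-2 doublet, its h2), (non-CoV-2 doublet, its h2).
# The h2 values are copied from the tables above.
matched = [
    (((1, 2), 90.31), ((7, 9), 87.76)),
    (((1, 3), 90.31), ((3, 5), 89.54)),
    (((1, 7), 90.16), ((5, 7), 89.96)),
    (((1, 9), 93.75), ((5, 9), 95.12)),
]

# Pairs of (NBC for the CoV-2 doublet, NBC for the non-CoV-2 doublet)
nbc = [(base_changes(a[1]), base_changes(b[1])) for a, b in matched]

for (a, b), (n_a, n_b) in zip(matched, nbc):
    print(f"{a[0]} vs {b[0]}: {n_a} vs {n_b} bases")
```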
When CoV-2 doublets are compared with non-CoV-2 doublets, the divergence of the 5’-UTR is higher for CoV-2 in 1 case (16 bases versus 13 bases), the same in 1 case (26 versus 26), and lower for CoV-2 in 2 cases (25 versus 27, and 25 versus 32). All these differences between CoV-2 and non-CoV-2 are in accordance with random changes, and they do not imply an unexpectedly increased number of mutations in the 5’-UTR of CoV-2 (for example, when the probabilities of the base differences are evaluated by using the Poisson distribution, or otherwise).
So, SARS-CoV-2 does not differ from other coronaviruses in the mutation rate of its 5’-UTR, when measured by the number of base changes.
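One possible way to make the Poisson argument concrete is sketched below. It assumes, purely for illustration, that the count seen in the matched non-CoV-2 doublet (13 bases) can serve as the expected number of changes, and asks how likely a count of 16 or more would be under that expectation. This choice of the expected value, and the function name `poisson_tail`, are assumptions of the sketch and not a method stated in the text.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X following a Poisson distribution with mean lam."""
    cdf_below_k = sum(math.exp(-lam) * lam**i / math.factorial(i)
                      for i in range(k))
    return 1.0 - cdf_below_k

# Assumed expectation: 13 changes (non-CoV-2 doublet 5,9);
# observed: 16 changes (CoV-2 doublet 1,9).
p_value = poisson_tail(16, 13.0)
print(f"P(X >= 16 | mean 13) = {p_value:.3f}")
```

Under these assumptions the probability is a little under one in four, far from any conventional threshold for an unusual excess, which agrees with the reading that the observed differences are within random variation.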
Insertions – deletions
The alignment of the 260-base 5’-segments of CoV-2 virus 1 and non-CoV-2 virus 2 shows one 2-base deletion and one 1-base insertion, besides 25 single-base replacements. The alignment of two non-CoV-2 viruses, 7 and 9, shows one 2-base deletion, one 2-base insertion, and one 1-base insertion, besides about 32 single-base replacements.
SARS-CoV-2 does not seem to differ from other coronaviruses in the mutation rate of its 5’-UTR, when assessed by the deletion-insertion events in the 5’-UTR.
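Counts of replacements and insertion-deletion events such as those above can be read off a gapped pairwise alignment. The following Python sketch shows one way to do so; the two aligned strings are short hypothetical toy sequences, not the viral 5’-UTR segments, and the function name `count_events` is an assumption.

```python
def count_events(aligned_a: str, aligned_b: str) -> dict:
    """Count events in a gapped pairwise alignment ('-' marks a gap).

    A run of gaps in aligned_a counts as one insertion in b relative to a;
    a run of gaps in aligned_b counts as one deletion in b relative to a;
    every mismatched base pair counts as one substitution.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    previous = "match"
    for a, b in zip(aligned_a, aligned_b):
        if a == "-":
            current = "insertions"
        elif b == "-":
            current = "deletions"
        elif a != b:
            current = "substitutions"
        else:
            current = "match"
        if current == "substitutions":
            counts[current] += 1
        elif current in ("insertions", "deletions") and current != previous:
            counts[current] += 1  # count each gap run once
        previous = current
    return counts

# Hypothetical toy alignment (not real viral data)
toy_a = "ACGT--ACGTAC"
toy_b = "ACCTGGAC-TAC"
events = count_events(toy_a, toy_b)
print(events)
```

In the toy alignment there is one substitution, one 2-base insertion (counted as a single event), and one 1-base deletion.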
CONCLUSIONS
The origin of SARS-CoV-2 has not been explained so far, and neither has the origin of SARS-CoV-1. Although an escape of CoV-2 from one of the Wuhan labs seems hard to refute, the origin of its genome remains unclear. The genome may or may not have been artificially edited; many publications address the mysterious origin of the virus (for example, here, here, here, here, here), and while they do not support a possible artificial intervention in its structure, their findings do not disprove such an intervention, and still less a possible lab escape.
Whatever the origin of the CoV-2 genome sequence, the comparison of the mutation rate in its 5’-untranslated region with that of other coronaviruses does not indicate any unexpected difference. The 5’-UTR has diverged about half as much as the whole genome in all checked coronaviruses, and the results do not indicate that SARS-CoV-2 is less conserved in its 5’-UTR than other coronaviruses.