To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories. Based on the identified breakpoints in each genome, only the major non-recombinant region is kept in each genome while other regions are masked. Google Scholar. Effect of closure of live poultry markets on poultry-to-person transmission of avian influenza A H7N9 virus: an ecological study. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. RegionB showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. 5 (NRR1) are conservative in the sense that NRR1 is more likely to be non-recombinant than NRR2 or NRA3. Discovery and genetic analysis of novel coronaviruses in least horseshoe bats in southwestern China. 62,63), the GTR+ model and 100bootstrap replicateswas inferred for each BFR >500nt. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Lancet 395, 949950 (2020). We compiled a dataset including 27human coronavirus OC43 virus genomes and ten related animal virus genomes (six bovine, three white-tailed deer and one canine virus). Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. The Artic Network receives funding from the Wellcome Trust through project no. J. Virol. Evol. Specifically, using a formal Bayesian approach42 (see Methods), we estimate a fast evolutionary rate (0.00169 substitutions per siteyr1, 95% highest posterior density (HPD) interval (0.00131,0.00205)) for SARS viruses sampled over a limited timescale (1year), a slower rate (0.00078 (0.00063,0.00092) substitutions per siteyr1) for MERS-CoV on a timescale of about 4years and the slowest rate (0.00024 (0.00019,0.00029) substitutions per siteyr1) for HCoV-OC43 over almost five decades. Intragenomic rearrangements involving 5-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, Crystal structure of the CoV-Y domain of SARS-CoV-2 nonstructural protein 3, Association of underlying comorbidities and progression of COVID-19 infection amongst 2586 patients hospitalised in the National Capital Region of India: a retrospective cohort study, Molecular characterization of horse nettle virus A, a new member of subgroup B of the genus Nepovirus, Molecular phylogeny of coronaviruses and host receptors among domestic and close-contact animals reveals subgenome-level conservation, crossover, and divergence. Extended Data Fig. For coronaviruses, however, recombination means that small genomic subregions can have independent origins, identifiable if sufficient sampling has been done in the animal reservoirs that support the endemic circulation, co-infection and recombination that appear to be common. Its origin and direct ancestral viruses have not been . This leaves the insertion of polybasic. Researchers in the UK had just set the scientific world . Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the Sprotein. 5. & Li, X. Crossspecies transmission of the newly identified coronavirus 2019nCoV. Extended Data Fig. The SARS-CoV divergence times are somewhat earlier than dates previously estimated15 because previous estimates were obtained using a collection of SARS-CoV genomes from human and civet hosts (as well as a few closely related bat genomes), which implies that evolutionary rates were predominantly informed by the short-term SARS outbreak scale and probably biased upwards. NTD, N-terminal domain; CTD, C-terminal domain. Nat. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. However, inconsistency in the nomenclature limits uniformity in its epidemiological understanding. PubMed performed codon usage analysis. According to GISAID . Coronavirus: Pangolins found to carry related strains. Novel Coronavirus (2019-nCoV) Situation Report 1, 21 January 2020 (World Health Organization, 2020). Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. [12] Biol. Gorbalenya, A. E. et al. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. 5). Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 18791999), 1969 (95% HPD: 19302000) and 1982 (95% HPD: 19482009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. Adv. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. Biol. acknowledges support by the Research FoundationFlanders (Fonds voor Wetenschappelijk OnderzoekVlaanderen (nos. Visual exploration using TempEst39 indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. 1. It is available as a command line tool and a web application. =0.00025. Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST42 v.1.10.4. All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification. 88, 70707082 (2014). This boundary appears to be rarely crossed. 382, 11991207 (2020). 4 we compare these divergence time estimates to those obtained using the MERS-CoV-centred rate priors for NRR1, NRR2 and NRA3. Microbiol. MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. For the HCoV-OC43, MERS-CoV and SARS datasets we specified flexible skygrid coalescent tree priors. . D.L.R. 30, 21962203 (2020). As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. After removal of A1 and A4, we named the new region A. Press, 2009). Regions AC were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. =0.00075 and one with a mean of 0.00024 and s.d. Now, the two researchers used genomic sequencing to compare the DNA of the new coronavirus in humans with that in animals and found a 99% match with pangolins. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Proc. Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis. Collectively our analyses point to bats being the primary reservoir for the SARS-CoV-2 lineage. To estimate non-synonymous over synonymous rate ratios for the concatenated coding genes, we used the empirical Bayes Renaissance countingprocedure67. The first available sequence data6 placed this novel human pathogen in the Sarbecovirus subgenus of Coronaviridae7, the same subgenus as the SARS virus that caused a global outbreak of >8,000 cases in 20022003. Anderson, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. Instead, similarity in codon usage metrics between the SARS-CoV-2 and eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). Viruses 11, 979 (2019). 56, 152179 (1992). GARD identified eight breakpoints that were also within 50nt of those identified by 3SEQ. and JavaScript. Google Scholar. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. performed Srecombination analysis. 2, vew007 (2016). PLoS Pathog. In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Maclean, O. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history40. Alternatively, combining 3SEQ-inferred breakpoints, GARD-inferred breakpoints and the necessity of PI signals for inferring recombination, we can use the 9.9-kb region spanning nucleotides 11,88521,753 (NRR2) as a putative non-recombining region; this approach is breakpoint-conservative because it is conservative in identifying breakpoints but not conservative in identifying non-recombining regions. Aiewsakun, P. & Katzourakis, A. Time-dependent rate phenomenon in viruses. Identifying the origins of an emerging pathogen can be critical during the early stages of an outbreak, because it may allow for containment measures to be precisely targeted at a stage when the number of daily new infections is still low. We thank T. Bedford for providing M.F.B. In the absence of any reasonable prior knowledge on the TMRCA of the sarbecovirus datasets (which is required for grid specification in a skygrid model), we specified a simpler constant size population prior. This produced non-recombining alignment NRA3, which included 63 of the 68genomes. A., Lytras, S., Singer, J. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. The coronavirus genome that these researchers had assembled, from pangolin lung-tissue samples, contained some gene regions that were ninety-nine per cent similar to equivalent parts of the SARS . Membrebe, J. V., Suchard, M. A., Rambaut, A., Baele, G. & Lemey, P. Bayesian inference of evolutionary histories under time-dependent substitution rates. Duchene, S. et al. 92, 433440 (2020). If stopping an outbreak in its early stages is not possibleas was the case for the COVID-19 epidemic in Hubeiidentification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. Nature 558, 180182 (2018). Our third approach involved identifying breakpoints and masking minor recombinant regions (with gaps, which are treated as unobserved characters in probabilistic phylogenetic approaches). The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding . stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. Boni, M.F., Lemey, P., Jiang, X. et al. 82, 48074811 (2008). The pangolin coronaviruses show lower similarity to SARS-CoV-2 than bat coronavirus RaTG13 across the whole genome, but higher similarity in the spike receptor binding domain, although the similarity at either scale remains too low to implicate . Another similarity between SARS-CoV and SARS-CoV-2 is their divergence time (4070years ago) from currently known extant bat virus lineages (Fig. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. Except for specifying that sequences are linear, all settings were kept to their defaults. Our results indicate the presence of a single lineage circulating in bats with properties that allowed it to infect human cells, as previously described for bat sarbecoviruses related to the first SARS-CoV lineage29,30,31. & Holmes, E. C. Recombination in evolutionary genomics. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (17301958) to 1877 (17461986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. A new coronavirus associated with human respiratory disease in China. While such models have recently been made available, we lack the information to calibrate the rate decline over time (for example, through internal node calibrations44). is funded by The National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. Posada, D., Crandall, K. A. Lin, X. et al. Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown22,25, have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. Softw. 4. 3). To examine temporal signal in the sequenced data, we plotted root-to-tip divergence against sampling time using TempEst39 v.1.5.3 based on a maximum likelihood tree. All custom code used in the manuscript is available at https://github.com/plemey/SARSCoV2origins. Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). However, the coronavirus isolated from pangolin is similar at 99% in a specific region of the S protein, which corresponds to the 74 amino acids involved in the ACE (Angiotensin Converting Enzyme . The coverage threshold and consensus sequence generation threshold were set to 20 and 90 respectively. Uncertainty measures are shown in Extended Data Fig. 04:20. CAS We named the length-sorted BFRs as: BFRA (ntpositions 13,29119,628, length=6,338nt), BFRB (ntpositions 3,6259,150, length=5,526nt), BFRC (ntpositions 9,26111,795, length=2,535nt), BFRD (ntpositions 27,70228,843, length=1,142nt) and six further regions (EJ). All four of these breakpoints were also identified with the tree-based recombination detection method GARD35. Note that breakpoints can be shared between sequences if they are descendants of the same recombination events. RegionsB and C span nt3,6259,150 and 9,26111,795, respectively. Nature 583, 286289 (2020). Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019), with the light and dark coloured version based on the HCoV-OC43 and MERS-CoV centred priors, respectively. J. Gen. Virol. Martin, D. P., Murrell, B., Golden, M., Khoosal, A. Suchard, M. A. et al. Viruses 11, 174 (2019). TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. Published. At present, we analyzed the diversity of SARS-CoV-2 viral genomes in India to know the evolutionary patterns of viruses in the country through their pangolin lineage and GISAID-Clade. The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist propertieswith respect to their ability to infect a range of mammalian cellsthat facilitated its jump to humans and may do so again. In our analyses of the sarbecovirus datasets, we incorporated the uncertainty of the sampling dates when exact dates were not available. Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus. The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). R. Soc. PubMed Central Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. 3 Priors and posteriors for evolutionary rate of SARS-CoV-2. Holmes, E. C., Dudas, G., Rambaut, A. PubMed Share . In Extended Data Fig. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. (Yes, Pango is a tongue-in-cheek reference to pangolins, which were briefly suspected to have had a role in the coronavirus's originseveral of the team's computational tools are named after. performed recombination analysis for non-recombining regions1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. A pneumonia outbreak associated with a new coronavirus of probable bat origin. A hypothesis of snakes as intermediate hosts of SARS-CoV-2 was posited during the early epidemic phase54, but we found no evidence of this55,56; see Extended Data Fig. Boxes show 95% HPD credible intervals. These residues are also in the Pangolin Guangdong 2019 sequence. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses22,33; specifying all past events in their correct temporal order34 is challenging and not shown here. Eight other BFRs <500nt were identified, and the regions were named BFRAJ in order of length. Biol. 5. Python 379 102 pangoLEARN Public Store of the trained model for pangolin to access. A reduced sequence set of 25sequences chosen to capture the breadth of diversity in the sarbecoviruses (obvious recombinants not involving the SARS-CoV-2 lineage were also excluded) was used because GARD is computationally intensive. A single 3SEQ run on the genome alignment resulted in 67 out of 68sequences supporting some recombination in the past, with multiple candidate breakpoint ranges listed for each putative recombinant. Trends Microbiol. Press, H.) 3964 (Springer, 2009). Subsequently a bat sarbecovirusRaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Provincewas reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity2. Posterior means with 95% HPDs are shown in Supplementary Information Table 2. Lond. Extensive diversity of coronaviruses in bats from China. Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike proteins variable loop region and CTD region excluding the variable loop. Lie, P., Chen, W. & Chen, J.-P. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. The command line tool is open source software available under the GNU General Public License v3.0. Mol. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel Coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. J. Virol. Thank you for visiting nature.com. This underscores the need for a global network of real-time human disease surveillance systems, such as that which identified the unusual cluster of pneumonia in Wuhan in December 2019, with the capacity to rapidly deploy genomic tools and functional studies for pathogen identification and characterization. Phylogenetic supertree reveals detailed evolution of SARS-CoV-2, Origin and cross-species transmission of bat coronaviruses in China, Emerging SARS-CoV-2 variants follow a historical pattern recorded in outgroups infecting non-human hosts, Inferring the ecological niche of bat viruses closely related to SARS-CoV-2 using phylogeographic analyses of Rhinolophus species, Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2, A Bayesian approach to infer recombination patterns in coronaviruses, Metagenomic identification of a new sarbecovirus from horseshoe bats in Europe, A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic, Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape, https://github.com/plemey/SARSCoV2origins, https://doi.org/10.1101/2020.04.20.052019, https://doi.org/10.1101/2020.02.10.942748, https://doi.org/10.1101/2020.05.28.122366, http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339, http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331. 1c). RegionC showed no PI signals within it. Lam, T. T. et al. While it is possible that pangolins, or another hitherto undiscovered species, may have acted as an intermediate host facilitating transmission to humans, current evidence is consistent with the virus having evolved in bats resulting in bat sarbecoviruses that can replicate in the upper respiratory tract of both humans and pangolins25,32. Biol. Lancet 395, 565574 (2020). Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. However, formal testing using marginal likelihood estimation41 does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. Mol. We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. Holmes, E. C., Rambaut, A. Virus Evol. CNN . For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus3. G066215N, G0D5117N and G0B9317N)) and by the European Unions Horizon 2020 project MOOD (no. In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. 1) and thus likely to be the product of recombination, acquiring a divergent variable loop from a hitherto unsampled bat sarbecovirus28. He, B. et al. the development of viral diversity. The genetic distances between SARS-CoV-2 and RaTG13 (bottom) demonstrate that their relationship is consistent across all regions except for the variable loop. 26 March 2020. 24, 490502 (2016). Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists.
Can You Drink Coffee When You Have Covid,
Crawford County Arkansas Police Scanner,
Invest 92l Spaghetti Models,
Ssbci Florida Application,
Zodiac Starforce Lawsuit,
Articles P