A leading US expert in influenza viruses has discovered that early sequences of the coronavirus genome were deleted from a global database at the request of Chinese researchers.
Professor Jesse Bloom, who works at the Fred Hutchinson Cancer Research Center in Seattle, found a project by Wuhan University which sequenced 34 positive COVID-19 cases from January 2020, as well as 16 cases in early February in which researchers looked into diagnosing a SARS-CoV-2 infection using a technique known as nanopore sequencing.
While the results of their research were published in March as a pre-print, and in June following peer review, the genomic sequences obtained during the course of that research - and uploaded to the US-maintained Sequence Read Archive (SRA) within the National Institutes of Health - were removed by a process that could have only taken place if the SRA staff were asked to do so, according to The Telegraph.
But when I went to Sequence Read Archive, I found entire project was gone! (Note that as detailed below, this does *not* imply malfeasance by NIH. Sequence Read Archive policy allows submitters to delete by e-mail request.) (3/n) pic.twitter.com/fEzOaVYZLZ— Bloom Lab (@jbloom_lab) June 22, 2021
The sequences, which have been recovered from cloud storage and published in a pre-print, have been described by experts as “the most important data” on the origins of Covid-19 in more than a year.
The recovered data does not support either the “natural origins” or “lab leak” theory about the pandemic’s source, scientists say. However, it suggests the virus was circulating in Wuhan earlier than previously thought, and could perhaps point toward answers on the origins of Sars-CoV-2 - answers that could not only help end this pandemic but prevent the next one.
The emergence of the sequences also suggests there is more data from the early days of the epidemic that China is sitting on, and which may be recoverable by investigators.
Bloom writes in a lengthy Twitter thread: "Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats. Therefore, we’d expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case! Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries. @lpipes @ras_nielsen give nice technical analysis at https://academic.oup.com/mbe/article/38/4/1537/6028993."
Therefore, we’d expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case! (8/n)— Bloom Lab (@jbloom_lab) June 22, 2021
Same result if we use other bat coronaviruses like RpYN06 or RmYN02. To see this, go to https://t.co/4qZbDRjFvw for an interactive plot that allows you to select the bat coronavirus outgroup and mouse over points for strain details. (11/n)— Bloom Lab (@jbloom_lab) June 22, 2021
There are also broader implications. First, fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China ordered to destroy early samples: https://t.co/3Uol5gdwON (16/n) pic.twitter.com/ajtm8SxfVu— Bloom Lab (@jbloom_lab) June 22, 2021
The NIH confirmed the removal of the data, telling The Telegraph that it had "reviewed the submitting investigator’s request to withdraw the data," and removed it.
"The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues," said a spokesperson, adding "Submitting investigators hold the rights to their data and can request withdrawal of the data."
Bloom published his findings on the preprint server bioRxiv.