
Scientist finds deleted coronavirus sequences from Wuhan

Researcher in Seattle found 13 sequences in Google Cloud that had mysteriously disappeared last year

Carl Zimmer, New York. Published 26.06.21, 02:34 AM

Representational image. Shutterstock

About a year ago, more than 200 data entries from the genetic sequencing of early cases of Covid-19 in Wuhan disappeared from an online scientific database.

Now, by rooting through files stored on Google Cloud, a researcher in Seattle reports that he has recovered 13 of those original sequences — intriguing new information for discerning when and how the virus may have spilled over from a bat or another animal into humans.


The new analysis, released on Tuesday, bolsters earlier suggestions that a variety of coronaviruses may have been circulating in Wuhan before the initial outbreaks linked to animal and seafood markets in December 2019.

As the Biden administration investigates the contested origins of the virus, known as SARS-CoV-2, the study neither strengthens nor discounts the hypothesis that the pathogen leaked out of a famous Wuhan lab. But it does raise questions about why original sequences were deleted, and suggests that there may be more revelations to recover from the far corners of the Internet.

“This is a great piece of sleuth work for sure, and it significantly advances efforts to understand the origin of SARS-CoV-2,” said Michael Worobey, an evolutionary biologist at the University of Arizona who was not involved in the study.

Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center who wrote the new report, called the deletion of these sequences suspicious. It “seems likely that the sequences were deleted to obscure their existence”, he wrote in the paper, which has not yet been peer-reviewed or published in a scientific journal.

Dr Bloom and Dr Worobey belong to an outspoken group of scientists who have called for more research into how the pandemic began.

In a letter published in May, they complained that there wasn’t enough information to determine whether it was more likely that a lab leak spread the coronavirus, or that it leapt to humans from contact with an infected animal outside of a lab.

The genetic sequences of viral samples hold crucial clues about how SARS-CoV-2 shifted to our species from another animal, most likely a bat. Most precious of all are sequences from early in the pandemic, because they take scientists closer to the original spillover event.

As Dr Bloom was reviewing what genetic data had been published by various research groups, he came across a March 2020 study with a spreadsheet that included information on 241 genetic sequences collected by scientists at Wuhan University. The spreadsheet indicated that the scientists had uploaded the sequences to an online database called the Sequence Read Archive, managed by the US government’s National Library of Medicine.

But when Dr Bloom looked for the Wuhan sequences in the database earlier this month, his only result was “no item found”.

Puzzled, he went back to the spreadsheet for any further clues. It indicated that the 241 sequences had been collected by a scientist named Aisi Fu at Renmin Hospital in Wuhan. Searching medical literature, Dr Bloom eventually found another study posted online in March 2020 by Dr Fu and colleagues, describing a new experimental test for SARS-CoV-2. The Chinese scientists published it in a scientific journal three months later.

In that study, the scientists wrote that they had looked at 45 samples from nasal swabs taken “from outpatients with suspected Covid-19 early in the epidemic”. They then searched the swabs for a portion of SARS-CoV-2’s genetic material. The researchers did not publish the actual sequences of the genes they fished out of the samples. Instead, they published only some of the mutations in the viruses.

But a number of clues indicated to Dr Bloom that the samples were the source of the 241 missing sequences. The papers included no explanation as to why the sequences had been uploaded to the Sequence Read Archive, only to disappear later.

Perusing the archive, Dr Bloom figured out that many of the sequences were stored as files on Google Cloud. Each sequence was kept in its own file, and the file names all followed the same basic format, he reported.

Dr Bloom swapped the identifying code of a missing Wuhan sequence into that format. Suddenly, he had the sequence. All told, he managed to recover 13 sequences from the cloud this way.

With this new data, Dr Bloom looked back once more at the early stages of the pandemic. He combined the 13 sequences with other published sequences of early coronaviruses, hoping to make progress on building the family tree of SARS-CoV-2.

Working out all the steps by which SARS-CoV-2 evolved from a bat virus has been a challenge because scientists still have a limited number of samples to study. Some of the earliest samples come from the Huanan Seafood Wholesale Market in Wuhan, where an outbreak occurred in December 2019.

But those market viruses actually have three extra mutations that are missing from SARS-CoV-2 samples collected weeks later.

In other words, those later viruses look more like coronaviruses found in bats, supporting the idea that there was some early lineage of the virus that did not pass through the seafood market.

Dr Bloom found that the deleted sequences he recovered from the cloud also lack those extra mutations.

New York Times News Service
