Family tree shows the path of the virus


The new coronavirus, which has now been in the Netherlands for over two weeks, has been introduced into our country several times. This can be seen in the family tree of the coronavirus that geneticists have compiled. Geneticists now produce these family tree analyses at lightning speed, enabling them to follow the course of the epidemic almost live.

The outbreak of the new coronavirus is investigated by testing people with symptoms for the presence of the virus. The test detects a small, characteristic piece of viral RNA. While the GGDs (municipal health services) do their best to trace contacts of people who test positive, geneticists search for the source of the infections by analyzing the complete genetic material of the virus. This provides an important additional clue as to how the epidemic is unfolding. For example, it may answer the question of whether the outbreak is still being fed from source areas abroad, or whether the virus has already taken root in the population to such an extent that a distinct local variant has developed. That information also played a role in the considerations behind the new corona measures that the Dutch government took this week.

Virologist Marion Koopmans of Erasmus MC in Rotterdam and her team have already collected dozens of genetic fingerprints of Dutch coronaviruses. They use small changes in the genetic code of the virus to build a family tree. Through this analysis, the researchers were able to see that many of the infections with the new coronavirus in the Netherlands do indeed closely resemble virus variants circulating in Northern Italy. A smaller share overlaps with viruses from source areas in Germany, France and South Korea.
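As a rough illustration of how small differences in the genetic code can be used to group viruses, the sketch below counts the positions at which two aligned sequences differ and looks up the closest reference for a sample. The sequences, region names and helper functions are invented for illustration. Real analyses work on full genomes of roughly 30,000 letters and use statistical tree-building methods rather than this simple comparison.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_reference(sample: str, references: dict) -> tuple:
    """Return (name, differences) for the reference most similar to the sample."""
    name = min(references, key=lambda n: hamming(sample, references[n]))
    return name, hamming(sample, references[name])


# Hypothetical toy sequences (not real virus data).
references = {
    "region_A": "ACGTACGTACGT",
    "region_B": "ACGTTCGTACGA",
    "region_C": "TCGTACGAACGT",
}
sample = "ACGTTCGTTCGA"

print(closest_reference(sample, references))
```

In this toy case the sample differs from the `region_B` reference at a single position, so that is where it would be placed. As Koopmans stresses, such a match is a clue and not proof of the route of infection.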

“In this way we want to help with source tracing,” says Koopmans. Source tracing is mainly done through contact tracing around people with a confirmed infection, by systematically asking them where they have been and with whom they have had (intensive) contact in the past fourteen days. But it is not always possible to find out where someone contracted the virus. “We linked the genetic data we collected to the information we received from GGD and RIVM. This sometimes provides new clues about where to find the source,” says Koopmans. “But this method of analysis to monitor patterns of an epidemic is certainly not foolproof. It is an experimental method, you cannot derive everything from it and you have to be careful about drawing big conclusions.”

At the beginning of this week, the Rotterdam team added the first 25 sequences of the virus to the international database used by Nextstrain. On Thursday, another 48 were added. “We have more, we will add them later,” says Koopmans. In Nextstrain, the Dutch viruses get a clear place among the foreign variants. In the family tree, the Dutch sequences sit close together, in three or four clusters.

What does Koopmans see when she looks at it? “You can immediately see whether the group of people who are sick in a country all have the same virus. And that is clearly not the case here in the Netherlands. In addition, we also see a few sequences from other countries in the Dutch clusters. This indicates that the same clusters also exist in Italy and France, for example. That is actually good news, because it means that the Dutch viruses can still be traced as introductions from other countries. We now have the second or third generation of coronaviruses here in the Netherlands, and because they are not very far removed from the original viruses, I think that the measures that RIVM and the government are now taking may still have an effect on the epidemic.”

Animated film

But it is something to keep a close eye on, Koopmans emphasizes. “As the virus spreads in one region day by day, you get an increasingly strong regional signal, a cluster of viruses that are very similar. Then they are no longer strongly linked to viruses from source areas. We therefore continue to pay close attention to whether Netherlands-only clusters emerge. When that happens, the epidemic has entered another phase here.”

The growth of the family tree and the spread of the virus can be watched as an animated film on the Nextstrain website. The virus can be traced on the basis of its genetics. That is important information for epidemiologists. It answers questions such as: where exactly do new infections come from, and how long has the virus been circulating in a certain area? The virus’s RNA acts as a molecular clock, “ticking” on average at a rate of just over one mutation per month. These small differences and the pattern of the changes give geneticists information about where and when certain variants may have arisen.
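The arithmetic behind such a molecular clock can be sketched in a few lines. The example below assumes a constant rate of exactly one mutation per month (the article says "just over one") and an average month of 30.4 days. The function name, the sample date and the mutation count are made up for illustration, and real estimates carry considerable uncertainty.

```python
from datetime import date, timedelta

MUTATIONS_PER_MONTH = 1.0  # assumption: rounded down from "just over one"
DAYS_PER_MONTH = 30.4      # assumption: average length of a month


def estimated_ancestor_date(sample_date: date, mutations: int,
                            rate: float = MUTATIONS_PER_MONTH) -> date:
    """Estimate when a lineage split off, given the mutations it has accumulated."""
    months = mutations / rate
    return sample_date - timedelta(days=round(months * DAYS_PER_MONTH))


# Hypothetical sample: collected on 12 March 2020, three mutations away
# from the ancestor it is being compared with.
print(estimated_ancestor_date(date(2020, 3, 12), 3))
```

With these made-up inputs, three mutations correspond to roughly three months, which places the split around mid-December 2019. A faster assumed rate would move the estimate closer to the sampling date.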

The Nextstrain software, developed by the bioinformaticians Trevor Bedford and Richard Neher, also illustrates this well for non-expert eyes. On the Nextstrain website you can see how, over three months, SARS-CoV-2 branched out from China into new variants, shown as a tree with multicolored branches.

This new virus has spread incredibly fast. Because the virus is completely new, no one has antibodies against it, and there is no vaccine that can stop it. Isolation is the only effective method to stop the virus. Despite drastic measures in China and other countries aimed at achieving this, the virus could not be stopped. No country will escape the virus now.

How accurately the Netherlands can track the virus through these genetic fingerprints also depends on how well foreign colleagues share their data. That could be a lot better. More than 400 SARS-CoV-2 sequences have so far been included in the GISAID database from which Nextstrain draws its data. Isn’t that very little, given that the number of confirmed infections has risen to more than 130,000 worldwide? “Yes, that is very little compared with the total,” says Emma Hodcroft of Nextstrain on the phone from Switzerland. “We never do sequencing ourselves,” says Hodcroft, who works as a developer at Nextstrain. “We depend on other researchers to share their results. The more sequences we have, the more we can piece together. But there are still blank spots on the map, including places where major outbreaks are now underway, such as Iran. That makes it more difficult to see connections between new sources of infection.”

The picture of the European epidemic is also incomplete. “We don’t have nearly as many samples as we would like from European countries,” says Hodcroft, “but we remain hopeful that they will come. We often know that countries have already collected the sequences, but it is of course important that they also share them with us.”

It is difficult to build ‘one story’ on the basis of a few data points. For example, the database contains one virus genome dated January 28 from the small early outbreak at an auto parts company in Bavaria, and two sequences of viruses from the start of the Lombardy outbreak in the second half of February. These German and Italian viruses were as alike as two peas in a pod. Nextstrain founder Trevor Bedford subsequently suggested in a tweet that an undetected virus from the early Bavarian outbreak may have been the seed of the disastrous Italian outbreak. That immediately drew a storm of indignation from colleagues.

“Yes, that was an unfortunate misstep,” said Hodcroft. “He had not made it clear in his tweet that there are of course two scenarios, either of which could be true. The similarity of the viruses could mean that they are in the same chain of infection, so that the virus went from Bavaria to Italy. But the other scenario is that the infections in Germany and Italy each happened to start from a separate introduction of the virus from China.”

So while the risk of overinterpretation is always lurking, Nextstrain’s research can also quickly debunk wild theories. This happened, for example, when Chinese researchers claimed that the virus was splitting into two separate strains, one of which would be a lot more aggressive and deadly than the other. On the basis of their database, Nextstrain scientists were able to make short work of that claim.

American epidemic

The genome analysis also proved its worth in the United States, where it took a long time before people were tested for the new coronavirus. Trevor Bedford found that the virus had been circulating unnoticed in his own Seattle area for a long time, after he discovered that a genome from February 27 was very similar to the sequence of a virus found in Washington State six weeks earlier in a man who had returned from Wuhan. Meanwhile, tests and other virus sequences collected from patients have confirmed that a major epidemic is ongoing in the northwestern state, with 30 deaths already.

Another striking conclusion you can draw from the data in the Nextstrain software is that the virus variants the United States is currently facing for the most part do not come from Europe. US President Trump’s measure to bar all air travelers from the Schengen countries is based on the mistaken assumption that the US infections came from European sources. In the future, the measure is likely to protect Europe from introductions from the US, if the epidemic gets worse there.

The genome database of the new coronavirus has also provided a good picture of when the virus spread to humans. Using the molecular clock, that moment could be calculated back to mid-November 2019. This roughly matches the first report of a patient with an ‘unexplained lung disease’ on December 1 in the Chinese city of Wuhan, the epicenter from which the epidemic spread. “The earliest virus sequence in our files is from December 21, 2019, obtained from a sample of a 65-year-old man in Wuhan who was found to be infected with the new virus. At that date it was not yet known what kind of virus it was, let alone that the virus had already been isolated. But the sequence was added later, when the virus was sequenced from a preserved sample.”
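One common way to calculate back with a molecular clock is to plot the number of mutations in each sample against its sampling date, fit a straight line, and extend that line back to the point of zero mutations. The sketch below shows the idea with a plain least-squares fit. All numbers are invented and chosen to lie exactly on a line, the function names are hypothetical, and this is not necessarily the method the Nextstrain team uses.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_origin(reference: date, days, mutations) -> date:
    """Extend the fitted line back to the day with zero mutations."""
    slope, intercept = fit_line(days, mutations)
    days_at_zero = -intercept / slope
    return reference + timedelta(days=round(days_at_zero))


# Invented toy data: sampling day (relative to the reference date) and
# number of mutations relative to a hypothetical root sequence.
reference = date(2020, 1, 1)
days = [10, 40, 70, 100]
mutations = [1, 2, 3, 4]

print(estimate_origin(reference, days, mutations))
```

The slope of the fitted line is the clock rate (here one mutation per 30 days), and the point where the line crosses zero is the estimated date of the common ancestor. With real, noisy data the estimate comes with a wide margin of error.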

History is interesting, but the focus is now mainly on the current situation, says Hodcroft. “We would prefer to include sequences from recent samples so that we can follow the epidemic as closely as possible to the present.”

Sequencing a virus genome is specialist work and also takes time, which means that the analysis lags behind current events. A genome sequence of the virus cannot be determined for everyone who has tested positive, says Koopmans. “It depends on the amount of virus someone is carrying.” Of course she would also like to have more sequences from other European countries, to put the Dutch outbreak in better perspective. After all, with every new cluster that is added to the database, the branches in the tree shift. “That is exactly why you have to be very careful when drawing conclusions,” says Koopmans. “But: with more sequences, the analysis will get better.”

Update 21:17: In an earlier version of this article, virologist Marion Koopmans stated that the epidemic was still contained. Partly in light of the interview with RIVM director Jaap van Dissel in NRC, in which he states that the outbreak in Brabant can no longer be contained, the passages above have been adjusted.

