What type of genomes do viruses have?




















The second group is that of dsDNA viruses that infect algae, invertebrates and vertebrates, and that have genome sizes that range from to kbp.

This second genome-size group includes giant viruses like the Phycodnaviridae, the Iridoviridae, and the Asfarviridae. It has been suggested that these eukaryotic viral families share a common ancestor with the largest-genome giant viruses, supporting the idea of an additional branch of life (Iyer et al.).

However, it has been argued that giant viruses such as the Marseilleviridae have in fact enlarged their genomes with eukaryotic sequences acquired through horizontal gene transfer (Moreira and Brochier-Armanet; Boyer et al.).

Finally, the third and smallest genome-size group includes dsDNA viruses that infect prokaryotes and vertebrates. It is possible that the small genome size of prokaryotic viruses is constrained by their small capsids (Krupovic et al.).

However, the Myoviridae and the Siphoviridae, the only two families that cross-infect both Bacteria and Archaea, have genomes that range from 10 to kbp.

Together with the Podoviridae, these two families of tailed bacteriophages appear to be an ancient and genetically connected viral group (Hendrix). There is no evidence of cross-infection between prokaryotes and eukaryotes, which may suggest a domain-specific origin of viruses. The smallest genomes of dsDNA viruses (5 to 7 kbp) are those of the Polyomaviridae and the Papillomaviridae, which infect mammals and birds (de Villiers et al.).

Therefore, the largest genomes of dsDNA viruses are found in those that infect eukaryotes. Although both viral types exhibit a difference of one order of magnitude in their genome sizes, the smallest viral genomes are found in ssDNA viruses.

It thus appears that the genomes of ssDNA viruses are subject to the same restrictions that hinder size increases in RNA viral genomes, most likely due to the lack of repair mechanisms (Reanney). Both viral types exhibit comparable behavior, including high mutation rates, large population sizes, low levels of horizontal gene transfer, little gene duplication, overlapping reading frames and, often, little recombination (Duffy and Holmes; Holmes).

Unlike DNA viruses, RNA viral families infect a wide range of phylogenetically diverse eukaryotic hosts, an evolutionary dispersal that may explain why some of them have coevolved with their invertebrate vectors (Gray and Banerjee; Lobo et al.).

One of the viral families that infect multiple hosts is the Reoviridae, which also exhibit multiple segmentation and large genomes (see Figure 3). It has been suggested that segments of the dsRNA genome of the Reoviridae probably recombine through complementation when two or more viruses co-infect a single cell (Reanney; Froissart et al.).

It has been argued that the Reoviridae cannot undergo major increases in genome size, since this would require complex molecular machinery, including unwinding proteins, DNA-dependent ATPases, and nucleases, which are not encoded by RNA viruses (Reanney).

It is somewhat surprising that, with the exception of only two known examples, all RNA viral families appear to be restricted to eukaryotic hosts.

It has been speculated that the latter could be derived from eukaryotic viruses (Holmes). RNA-mediated silencing is a highly conserved mechanism that was probably present in the last common ancestor of eukaryotes (Cerutti and Casas-Mollano), which may indicate an ancient evolutionary relationship between nucleated cells and RNA viruses, whose origin could thus be placed some time near the actual emergence of eukaryotic microbes.

As reviewed above, it has been argued that viruses were the first living entities and that RNA viruses or viroids may be direct descendants of the RNA World. Our results suggest that these schemes may be mistaken. Although the results presented here may be severely affected by methodological issues, including biased representations of viral diversity, our data show that, in terms of genome size and organization, RNA viruses do not have the simplest and smallest genomes of all known viruses, as is generally believed, and that they may in fact be more closely tied to the evolutionary history of their eukaryotic hosts.

Our results also suggest that, since retroviruses appear to be restricted to plants and vertebrates, they could not have played a role in the evolutionary transition from primitive cellular RNA genomes to the DNA-based genetic systems of extant cells, nor can the viral reverse transcriptase be considered an evolutionary vestige of the polymerase that played a role in this transition.

The results presented here demonstrate that viral genome sizes are not randomly distributed, but do not appear to be correlated with the antiquity of their hosts. Therefore, viruses may be ancient, but not primitive.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We are indebted to Dr.

Agol, V. Which came first, the virus or the cell?
Baltimore, D. Expression of animal virus genomes.
Berkhout, B. The interplay between virus infection and the cellular RNA interference machinery. FEBS Lett.
Beutner, R. Life's Beginning on the Earth.
Boyer, M. Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms.
Caprari, S. Sequence and structure analysis of distantly-related viruses reveals extensive gene transfer between viruses and hosts and among viruses. Viruses 10.
Cerutti, H. On the origin and functions of RNA-mediated silencing: from protists to man.
Colson, P. Gene repertoire of amoeba-associated giant viruses. Intervirology 53.
Crandall, K. Phylogenomics and molecular evolution of polyomaviruses.
Daros, J. Viroids: an Ariadne's thread into the RNA labyrinth. EMBO Rep.
Classification of papillomaviruses. Virology, 17.
The Bacteriophage and its Behavior.
Duffy, S.

To perform a comprehensive analysis, we first explored the diversity of known viruses and their hosts within the NCBI database (see Materials and methods).

We then created distributions on a number of metrics, namely genome length, gene length, gene density, percentage of noncoding DNA (or RNA), functional gene category abundances, and gene order. We have provided brief introductions to these metrics in the following subsections.

A central and revealing piece of information is the genome length. As more and more complete genomes have become available, we have learned that genome lengths of cellular organisms vary quite extensively, specifically by six orders of magnitude (Phillips et al.).

Because these studies focused on cellular organisms, and because genome length information is generally inaccessible through metagenomic studies, large-scale analyses that systematically capture viral genome length distributions in light of different classification systems and in relation to other genomic parameters are lacking.

One such genomic parameter is the number of genes encoded per unit of genome length, also referred to as gene density (Keller and Feuillet; Hou et al.). Another set of missing distributions involves gene lengths, and here too, it is important to see how they vary across different viral classification categories.

One of the most surprising discoveries of the past several decades was the rich and enormous diversity of noncoding DNA in the human genome (Elgar and Vavouri). Moreover, genomes vary widely in their noncoding percentages.

Hence, the noncoding percentage of the genome is thought to correlate with the phenotypic complexity of the organism, and consequently, much of the investigation into noncoding fractions of genomes has been focused on higher eukaryotes. Even less is known about the noncoding fraction of viral genomes. These sequences were shown to be responsible for viral evasion of host immunity by inhibition of protein kinase R, a cellular protein responsible for the inactivation of viral protein synthesis (Mathews and Shenk). In ovine herpesvirus, miRNAs have been shown to maintain viral latency (Riaz et al.).

These are just a few examples in which viral noncoding elements have been shown to enable viral escape from host immunity, as well as to regulate the viral life cycle and viral persistence (Tycowski et al.). Despite many interesting studies exploring the topic of cellular noncoding DNA (Mattick and Makunin; Morris; Mattick), there are no studies, to our knowledge, that reveal the statistics of the noncoding percentage of viral genomes.

There are detailed studies on the counts of cellular genes belonging to each broad functional category (Molina and van Nimwegen; Grilli et al.).

These studies have helped us better understand the scaling of functional categories across different clades of organisms. In fact, one intriguing conclusion was that for prokaryotic genomes there exists a universal organization that governs the relative number of genes in each category (Molina and van Nimwegen). Such depictions of viral genomes, however, are largely lacking. Thus, we set out to better understand how viral genes are distributed across different functional categories and how these distributions might differ across various viral groups.

Viral genome organization is a topic that has great depth but limited breadth. While this highly detailed approach is indispensable for studying individual viruses, a simplified illustration of genome organization is a requirement of any high-throughput visualization and comparison of genomes.

The latter approach could help us uncover general rules governing genomic organization, in the same way that synteny, or conserved gene order, has been used to compare animal genomes (Telford and Copley; Jaillon et al.).

We used the largest available dataset of completed viral genomes, from the National Center for Biotechnology Information (NCBI) viral genomes resource (Brister et al.).

These viruses were included for further analysis and, unless noted otherwise, constitute our dataset in this study.

(A) Percentage of viruses infecting hosts from the three domains of life. (E) Distributions of host phyla or supergroups infected by (1) eukaryotic, (2) bacterial, and (3) archaeal viruses. (F) Histograms of the number of known viruses infecting host species. The median and mean number of viruses infecting a host species are provided in each plot.

Further exploration of the largest fraction of the eukaryotic virome i.

In contrast to prokaryotes, which are predominantly host to viruses with double-stranded genomes, eukaryotes are host to a higher number of viruses with single-stranded genomes.

Why are double-stranded DNA viruses, despite their high prevalence in the bacterial and archaeal world, only the third largest group of viruses infecting eukaryotes in this database? One proposed explanation is the physical separation of transcriptional processes from the cytoplasm by way of the eukaryotic nucleus (Koonin et al.).

More than half of viruses with complete genomes have not been assigned to any viral order under the ICTV classification (Figure 2D4).

About one third of all known viruses are assigned to the Caudovirales order, while the other orders are in the minority. The vast majority of the bacterial viruses are categorized as part of the Caudovirales order (Figure 2D2), but the majority of archaeal and eukaryotic viruses remain unassigned to any order.

Before any further exploration of this dataset, we aimed to assess its diversity and possible sources of bias (Figure 2E-F). It was immediately clear, for example, that archaeal viruses were heavily under-sampled.

In contrast, bacterial viruses infect hosts from a diverse array of bacterial phyla (Figure 2E2). However, even for bacterial viruses, there are host phyla whose viruses are entirely missing from the database, for example Synergistes and Acidobacteria, whose members are typically unculturable soil bacteria. Given that the isolation and characterization of archaeal and bacterial viruses has traditionally depended on the culturing of their hosts, the majority of viruses with unculturable hosts remain unexplored.

Moreover, the eukaryotic viruses in the database infect hosts primarily from the Viridiplantae or the Opisthokonta supergroups (Figure 2E1). Among Viridiplantae, the majority of hosts belong to the Streptophytina group (land plants), and within the Opisthokonta supergroup, the majority of viruses are metazoan. We further examine the distribution of viruses from the Opisthokonta supergroup in Figure 2—figure supplement 1.

We continued to explore host diversity at a finer resolution and mapped out the number of viruses that infect each host species (Figure 2F). As expected, organisms such as Staphylococcus aureus, Escherichia coli, and Solanum lycopersicum, which are host species with medical, research or agricultural relevance, have many known viruses and are outliers in the skewed distributions shown in Figure 2F.

However, the median number of viruses known to infect a eukaryotic or a prokaryotic host species is approximately 1 (Figure 2F).
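The per-host counts behind a histogram like Figure 2F can be sketched as follows. This is only an illustration: the virus names, the placeholder host, and the pairing format are hypothetical and do not come from the dataset. It shows how a few heavily studied hosts can pull the mean up while the median stays at 1.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (virus, host species) pairs, invented for illustration only.
virus_host_pairs = [
    ("phage_A", "Escherichia coli"),
    ("phage_B", "Escherichia coli"),
    ("phage_C", "Escherichia coli"),
    ("phage_D", "Staphylococcus aureus"),
    ("virus_E", "Solanum lycopersicum"),
    ("virus_F", "Host species X"),
    ("virus_F", "Host species X"),  # duplicate record, counted once below
]

def viruses_per_host(pairs):
    """Count distinct viruses recorded for each host species."""
    return Counter(host for _, host in set(pairs))

counts = viruses_per_host(virus_host_pairs)
per_host = list(counts.values())
print("median:", median(per_host), "mean:", mean(per_host))
```

In this toy input the well-studied host contributes three viruses and every other host contributes one, so the distribution is right-skewed in the same qualitative way as described above.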

This signifies that even for host species already represented in our collection, the number of known viruses is likely an underestimate, considering the larger numbers of viruses known to infect the more heavily studied host species.

Genome lengths for all fully sequenced viral genomes varied widely, by three orders of magnitude (Figure 3A, Table 1).

According to the Host Domain classification, prokaryotic viruses tend to have longer genomes than eukaryotic viruses (Figure 3—source data 1, Figure 3—figure supplement 1). However, this difference is better explained by the Nucleotide Type classification, as the median RNA virus genome length is about one quarter of the median DNA virus genome length.

Thus, the comparison between prokaryotic and eukaryotic viral genome lengths is confounded by the fact that the prokaryotic virome, as represented by this database, is primarily composed of DNA viruses, whereas only about half of the eukaryotic virome consists of DNA viruses (Figure 2C4).

(A) Box plots of genome lengths (log10) across all viruses included in our dataset (top), further partitioned based on the Baltimore classification categories (bottom).

The number of viruses included in each group is denoted by N. Distributions of genome lengths associated with eukaryotic, bacterial and archaeal viruses are shown in salmon, blue, and teal, respectively. ICTV viral families with only a few members are omitted. Distributions of genome lengths across different classification systems along with various statistics are shown in Figure 3—figure supplement 1.

Note that the bimodal distribution of eukaryotic ssDNA viruses, which also appears in the next figure, arises from the Begomoviruses, which are plant viruses with circularized monopartite and bipartite genomes (Melgarejo et al.).

(C) Median gene length is plotted against the number of genes for each genome for all genomes in our dataset, color-coded according to different classification systems.

(D) Number of genes per genome length (gene density) for dsDNA viruses, based on the overlay of Host Domain (bottom) and ICTV family (top) classification categories. Pearson correlations and their statistical significance (two-tailed t-test P values) are denoted.

Genome length statistics for viral groups across different classification systems rounded to the nearest kilobase. Only median values are reported in this table.

Genome length data are rounded to the nearest kilobase. N corresponds to the number of viruses from which data are obtained.

With respect to viral genome lengths, the Baltimore classification seems to offer the most explanatory power. Knowing whether a viral genome is DNA- or RNA-based already provides a strong indication of viral genome length, especially for RNA viruses, where the standard deviation is just a few kilobases (Figure 3—source data 1).

Across all Baltimore groups, dsDNA viruses have the genome lengths with the largest standard deviation; however, considering the limited range of genome lengths associated with the other Baltimore groups, it is very likely that a larger viral genome will be composed of dsDNA (Figure 3A).

We provide a more detailed view of genome length distributions by layering different classification systems, first applying the Baltimore classification, followed by the Host Domain and the ICTV family classifications (Figure 3B, Figure 3—source data 1).
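The idea of layering classification systems can be illustrated with a small grouping routine. This is a minimal sketch under stated assumptions: the record fields (`baltimore`, `host_domain`, `length_bp`) and all lengths are invented for the example and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records; field names and lengths are illustrative only.
records = [
    {"baltimore": "dsDNA", "host_domain": "Bacteria", "length_bp": 40_000},
    {"baltimore": "dsDNA", "host_domain": "Bacteria", "length_bp": 50_000},
    {"baltimore": "dsDNA", "host_domain": "Eukaryota", "length_bp": 150_000},
    {"baltimore": "ssRNA(+)", "host_domain": "Eukaryota", "length_bp": 9_000},
    {"baltimore": "ssRNA(+)", "host_domain": "Eukaryota", "length_bp": 7_000},
    {"baltimore": "ssRNA(+)", "host_domain": "Eukaryota", "length_bp": 10_000},
]

def layered_medians(rows, keys):
    """Median genome length for every combination of the given classification keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["length_bp"])
    return {group: median(lengths) for group, lengths in groups.items()}

by_host = layered_medians(records, ["host_domain"])
by_both = layered_medians(records, ["baltimore", "host_domain"])
print(by_host)
print(by_both)
```

With these made-up numbers, grouping by host domain alone makes eukaryotic viruses look shorter, while grouping by Baltimore class first reverses the comparison within dsDNA viruses. That is the kind of confounding a layered view is meant to expose.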

Finally, it is worth noting that capsid dimension, surprisingly, does not seem to correlate with viral genome size, and to different degrees, many viruses are shown to under-utilize the capsid volume (Brandes and Linial).

In viewing the relationship between median gene length and number of genes per viral genome (Figure 3C), two different coding strategies become apparent.

For example, many of the RNA genomes we examined closely contained genes that encode polyproteins or that rely on ribosomal frameshifting or codon read-through, among other non-canonical translational mechanisms. As in the case of genome lengths, by examining only the ICTV or the Host Domain classifications it would be difficult to draw meaningful conclusions about the observed patterns, and in the case of the Host Domain classification, our conclusions would be confounded by the disproportionate ratio of RNA to DNA viruses that are known to infect each host domain in this database.

However, the layering of these classification systems offers new insights, which we will discuss in the following paragraphs. We follow others (Keller and Feuillet; Hou et al.). We observed a strong linear correlation between dsDNA viral genome lengths and the number of genes encoded by these genomes (Figure 3D). The mean and median gene densities for bacterial, archaeal and eukaryotic dsDNA viral genomes are approximately 1.
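For readers who want to reproduce this kind of comparison, gene density and the Pearson correlation between genome length and gene count can be computed along the following lines. This is a sketch, not the authors' code: the toy genomes are invented, and expressing density as genes per kilobase is an assumption, since the unit is not preserved in the text above.

```python
from math import sqrt
from statistics import mean

# Hypothetical (genome length in bp, number of annotated genes) pairs.
genomes = [(40_000, 55), (48_000, 70), (100_000, 130), (170_000, 240)]

def gene_density(length_bp, n_genes):
    """Genes per kilobase of genome (the per-kilobase unit is an assumption)."""
    return n_genes / (length_bp / 1000)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

lengths = [length for length, _ in genomes]
gene_counts = [n for _, n in genomes]
densities = [gene_density(length, n) for length, n in genomes]
r = pearson_r(lengths, gene_counts)
print("densities (genes/kb):", densities)
print("Pearson r:", r)
```

A significance test such as the two-tailed t-test mentioned in the figure legend would be an additional step and is left out of this sketch.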

This trend follows what we see across cellular genomes, since prokaryotic genes and proteins are shown to be significantly shorter than eukaryotic ones (Milo and Phillips; Brocchieri and Karlin).

Instead of showing absolute viral counts on the y-axes, the counts are normalized by the total number of viruses in each viral category (denoted as N inside each plot). The mean of each distribution is denoted as a dot on the boxplot. For all histograms, bin numbers and bin widths are systematically decided by the Freedman-Diaconis rule (Reich et al.).
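The Freedman-Diaconis rule sets the bin width to 2 × IQR / n^(1/3), where IQR is the interquartile range and n the number of observations. A minimal sketch follows; the quantile convention (`method="inclusive"`) and the fallback for a zero IQR are choices made here for illustration, and the authors' plotting software may have used different conventions.

```python
from math import ceil
from statistics import quantiles

def freedman_diaconis(values):
    """Return (bin_width, n_bins) using width = 2 * IQR / n**(1/3)."""
    n = len(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate case: fall back to a single bin (a choice made for this sketch).
        return None, 1
    width = 2 * iqr / n ** (1 / 3)
    n_bins = ceil((max(values) - min(values)) / width)
    return width, max(n_bins, 1)

# Hypothetical sample of eight values.
width, n_bins = freedman_diaconis([0, 1, 2, 3, 4, 5, 6, 10])
print(width, n_bins)
```

Because the width depends on the IQR rather than the standard deviation, a few outlying genomes have little effect on the binning, which suits skewed distributions.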

Viral schematics on the right of the figure are modified from ViralZone (Hulo et al.). Key statistics describing these distributions can be found in Table 1 and Figure 4—source data 1.

Median gene length statistics for viral groups across different classification systems, rounded to the nearest base. It is important to clarify that the median values in this table represent the median of median gene lengths.

So far we have primarily focused on the coding fractions of viral genomes. Thus, we created distributions of the noncoding percentage of viral genomes (see Materials and methods, Figure 5, Table 1, Figure 5—source data 1).
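One plausible way to compute a noncoding percentage from gene coordinates is sketched below. The exact procedure is described in the Materials and methods, which are not reproduced here, so the details are assumptions: genes are given as 1-based inclusive (start, end) intervals, and overlapping genes are merged so that shared bases are counted once.

```python
def merged_coding_length(intervals):
    """Bases covered by at least one gene; intervals are 1-based, inclusive."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval before opening a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping gene: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered

def noncoding_percentage(genome_length, gene_intervals):
    """Percentage of the genome not covered by any annotated gene."""
    coding = merged_coding_length(gene_intervals)
    return 100 * (genome_length - coding) / genome_length

# Hypothetical 1,000-base genome with two overlapping genes and one separate gene.
pct = noncoding_percentage(1000, [(1, 300), (250, 500), (701, 900)])
print(pct)
```

Merging matters for compact viral genomes with overlapping reading frames, where simply summing gene lengths would overstate the coding fraction.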

Interestingly, both retroviral groups had relatively high noncoding DNA percentages. This is likely due to the presence of defunct retroviral genes. This high noncoding percentage can be explained by the fact that this virus genome contains three pseudogenes previously coding for env , pol and gag proteins.

The counts of viruses are normalized by the total number of viruses in each viral category (denoted as N inside each plot).

Viral schematics are modified from ViralZone (Hulo et al.). Key statistics describing these distributions can be found in Table 1 and Figure 5—source data 1.

Percent noncoding DNA or RNA for viral groups across different classification systems, rounded to the nearest percentage.

We categorized viral genes into several major functional categories (Figure 6; see Materials and methods). These include structural genes, such as capsid and tail genes; metabolic genes; and informational genes, which we define as those involved in replication, transcription or translation of the viral genetic code, among other categories.

When reporting the relative abundance of different functional gene categories, we will normalize the number of genes belonging to each functional category by the total number of labeled genes.

(A) Abundances of functional gene categories across 8 viral groups, normalized to the number of labeled genes in each viral group (the total number of genes in each viral group is shown above the panel, with the number of labeled genes in brackets).
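The normalization described here, dividing by labeled genes rather than by all genes, can be sketched as follows. The category names and the use of `None` for unlabeled genes are conventions invented for this example.

```python
from collections import Counter

def category_abundances(gene_labels):
    """Relative abundance of each category among labeled genes only.

    Unlabeled genes (None) are excluded from the denominator.
    """
    labeled = [label for label in gene_labels if label is not None]
    if not labeled:
        return {}
    total = len(labeled)
    return {cat: n / total for cat, n in Counter(labeled).items()}

# Hypothetical viral group: six genes, two of them without a functional label.
labels = ["structural", "structural", "informational", "metabolic", None, None]
abundances = category_abundances(labels)
print(abundances)
```

Excluding unlabeled genes keeps groups with many hypothetical proteins comparable with well-annotated ones, at the cost of assuming that the labeled genes are representative.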

A few examples of the types of genes contained in each functional subcategory are provided. For example, across all three viral groups, roughly half of all genes are structural. Similarly, dsDNA viruses of eukaryotes and bacteria in this database, despite having different genomic properties and morphologies, surprisingly have very similar distributions of gene functional category and subcategory abundances.

The major difference between these two viral groups, as expected from our knowledge of viral morphologies, is that a larger portion of eukaryotic dsDNA viral genes are envelope and matrix genes, whereas a greater portion of bacterial dsDNA genes are portal and tail-associated genes.

By further zooming in on bacterial dsDNA viruses, it is again interesting to see that the Myoviridae, Siphoviridae, and Podoviridae viral groups, with their different morphologies and wide range of hosts, have very similar functional gene category abundances, even at the level of subcategories.

To explore viral genome organization we developed a coarse-grained method for visualizing a large number of genomes in one snapshot. We first defined genome organization as the order in which genes appear across a genome. Genes with similar functions are grouped and are represented by the same letter (Figure 7). Therefore each viral genome, analogous to a nucleotide sequence, is compactly described by a sequence of letters that represent its gene order (Figure 7), which we will refer to as the gene order sequence.

Because we aimed to study gene order sequences across different viral groups, we focused on genes whose functions are universally required, namely structural genes. Each genome is summarized by a sequence of letters, with each letter corresponding to a gene, positioned in the order that it appears on the genome. Note that the letters serve only to denote genes with similar functions. Structural genes are assigned colors, whereas other genes are denoted in black.

Across all three panels, each row corresponds to the gene order sequence for a given virus, and thus, the length of the sequence denotes the number of genes within a given genome.
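The gene order sequence can be illustrated with a short encoding routine. The letter assignments, the placeholder for non-structural genes, and the input format are all hypothetical; the figure uses its own letter scheme.

```python
# Hypothetical mapping from structural gene function to a single letter.
CATEGORY_LETTERS = {"capsid": "C", "tail": "T", "portal": "P"}

def gene_order_sequence(genes, letters=CATEGORY_LETTERS, other="x"):
    """Encode a genome as one letter per gene, in genomic order.

    genes: iterable of (start_position, function) pairs.
    Functions without an assigned letter are written as `other`.
    """
    ordered = sorted(genes, key=lambda gene: gene[0])
    return "".join(letters.get(function, other) for _, function in ordered)

# Hypothetical genome with four annotated genes, listed out of order.
genome = [(900, "tail"), (100, "portal"), (400, "capsid"), (650, "polymerase")]
sequence = gene_order_sequence(genome)
print(sequence)
```

The length of the resulting string equals the number of genes, matching the row lengths described above, and such strings can then be compared across genomes with ordinary sequence tools.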

The left two columns accompanying each panel provide further information on hosts and viral morphologies.

Among RNA viruses and certain DNA viruses, the genome is often divided into separate parts, in which case it is called segmented. For RNA viruses, each segment often codes for only one protein, and the segments are usually found together in one capsid. However, not all segments need to be in the same virion for the virus to be infectious, as demonstrated by the brome mosaic virus and several other plant viruses.

A viral genome, irrespective of nucleic acid type, is almost always either single-stranded or double-stranded. Surrounding the nucleic acid is a protein coat in the form of a capsid, made of small subunits that are assembled in a certain way. That is what all viruses have. Some viruses also have an envelope, which they obtain as they emerge from the cell. Viruses are very interesting in that they can only replicate inside a living cell, so they must have a living host cell in order to multiply.


