Microorganisms and their viruses are increasingly recognized as drivers of myriad ecosystem processes. However, our knowledge of their roles is limited by the inability of culture-dependent and culture-independent (e.g., metagenomics) methods to be fully implemented at scales relevant to the diversity found in nature. Here we combine advances in bioinformatics (shared k-mer analyses) and social networking (regression modeling) to develop an annotation- and assembly-free visualization and analytical strategy for comparative metagenomics that uses all the data in a unified statistical framework. Application to 32 Pacific Ocean viromes, the first large-scale quantitative viral metagenomic dataset, tested existing and generated further hypotheses about ecological drivers of viral community structure. Highly computationally scalable, this new approach enables diverse sequence-based large-scale comparative studies.
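To make the idea of an annotation- and assembly-free comparison concrete, the sketch below scores pairs of samples by the k-mers they share, working directly on raw reads. It is a simplified illustration and not the pipeline used in the study: the Jaccard index as the similarity score, the function names, and the toy reads are all assumptions introduced here, and the regression modeling step is not shown.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads, k):
    """Pool the k-mers of every read in a sample into one set."""
    pooled = set()
    for read in reads:
        pooled |= kmers(read.upper(), k)
    return pooled


def shared_fraction(a, b):
    """Jaccard index: shared k-mers divided by all distinct k-mers.

    Assumption: the study may weight shared k-mers differently;
    Jaccard is used here only because it is simple to verify.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(samples, k):
    """Map every ordered pair of sample names to its similarity score.

    `samples` is a dict of sample name -> list of read strings
    (a hypothetical input layout chosen for this example).
    """
    sets = {name: sample_kmers(reads, k) for name, reads in samples.items()}
    return {
        (x, y): shared_fraction(sets[x], sets[y])
        for x in sets
        for y in sets
    }


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    toy = {"A": ["ACGTAC"], "B": ["ACGTTT"], "C": ["GGGGGG"]}
    for pair, score in sorted(similarity_matrix(toy, 3).items()):
        print(pair, round(score, 3))
```

The resulting pairwise scores form a sample-by-sample matrix that could serve as the input to a network or regression analysis. Storing full k-mer sets in memory does not scale to real viromes, so a practical version would rely on a dedicated k-mer counting tool.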
