proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes
Fullam A, Letunic I, Schmidt TSB, Ducarmon QR, Karcher N, Khedkar S, Kuhn M, Larralde M, Maistrenko OM, Malfertheiner L, Milanese A, Rodrigues JFM, Sanchis-Lopez C, Schudoma C, Szklarczyk D, Sunagawa S, Zeller G, Huerta-Cepas J, von Mering C, Bork P, Mende DR
Nucleic Acids Res 51(D1):D760-D766 (2023).
Abstract
The interpretation of genomic, transcriptomic and other microbial ‘omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.