Abstract

Improvements in sequencing technologies have considerably expanded the amount of available genomic data over the last twenty years. One of the key steps in analysing this data is the prediction of protein-coding regions in genomic sequences, known as Open Reading Frames (ORFs), which span from a start codon to a stop codon. A recent comparison of several ORF prediction methods (Korandla et al., 2019) has shown that Prodigal (Hyatt et al., 2010), a prokaryotic gene finder that uses dynamic programming, is one of the highest-performing ab-initio ORF finders. Pyrodigal is a Python package that provides Cython bindings and an interface to Prodigal, making it easier to use in Python applications.
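To make the definition of an ORF concrete, here is a minimal sketch of a naive scanner that reports every stretch running from an `ATG` start codon to the next in-frame stop codon on the forward strand. It is only an illustration of the concept and is not Prodigal's algorithm: Prodigal scores candidate genes with dynamic programming, also considers alternative start codons such as `GTG` and `TTG`, and searches both strands. The function name `find_orfs` and its `min_codons` parameter are hypothetical and not part of the Pyrodigal API.

```python
from __future__ import annotations

# Hypothetical illustration only: naive forward-strand ORF scan, not Prodigal's method.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    In each of the three reading frames, an ORF opens at the first ATG and
    closes at the next in-frame stop codon (the stop codon is included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it spans enough codons (start + stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGGG")` returns `[(2, 14)]`, i.e. the substring `ATGAAATTTTAG` in the third reading frame. A real gene finder must additionally decide which of many overlapping candidates are genuine genes, which is the problem Prodigal addresses.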

Pyrodigal has been used in the following publications:

  1. Accurate de novo identification of biosynthetic gene clusters with GECCO (preprint).
  2. Protein Structure Informed Bacteriophage Genome Annotation with Phold (preprint).
  3. skDER and CiDDER: two scalable approaches for microbial genome dereplication.
  4. FastAAI: efficient estimation of genome average amino acid identity and phylum-level relationships using tetramers of universal proteins.
  5. zol and fai: large-scale targeted detection and evolutionary investigation of gene clusters.

Papers