Abstract
Improvements in sequencing technologies have seen the amount of available genomic data
expand considerably over the last twenty years. One of the key steps in analysing this data is the
prediction of protein-coding regions in genomic sequences, known as Open Reading Frames
(ORFs), which span between a start and a stop codon. A recent comparison of several ORF
prediction methods (Korandla et al., 2019) has shown that Prodigal (Hyatt et al., 2010), a
prokaryotic gene finder that uses dynamic programming, is one of the highest performing
ab-initio ORF finders. Pyrodigal is a Python package that provides Cython bindings and an
interface to Prodigal to make it easier to use in Python applications.
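
To make the notion of an ORF as a span between a start and a stop codon concrete, here is a minimal sketch in plain Python. It is a naive forward-strand scan written for illustration only: it is not Prodigal's dynamic programming method and does not use the Pyrodigal API. The function name `find_orfs`, the `min_codons` parameter, and the choice of start codons (ATG, GTG, TTG, the common bacterial ones) are assumptions made for this example.

```python
# Naive illustration of the ORF definition; NOT Prodigal's algorithm.
# Assumed for this sketch: common bacterial start codons and the three
# standard stop codons, scanned on the forward strand only.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (begin, end) pairs for forward-strand ORFs.

    Coordinates are 0-based with an exclusive end. Each ORF runs from the
    first start codon seen in a reading frame to the next in-frame stop
    codon (stop included). `min_codons` counts the stop codon as well.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        begin = None  # position of the currently open start codon, if any
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if begin is None and codon in START_CODONS:
                begin = i
            elif begin is not None and codon in STOP_CODONS:
                if (i + 3 - begin) // 3 >= min_codons:
                    orfs.append((begin, i + 3))
                begin = None  # close the ORF and look for the next start
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGCC"
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```

A real gene finder also has to handle the reverse strand, genes running off the sequence edges, and the choice between competing start codons, which is where a scoring-based approach such as Prodigal's comes in.
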
Pyrodigal has been used in the following publications:
- Accurate de novo identification of biosynthetic gene clusters with GECCO (preprint).
- Protein Structure Informed Bacteriophage Genome Annotation with Phold (preprint).
- skDER and CiDDER: two scalable approaches for microbial genome dereplication.
- FastAAI: efficient estimation of genome average amino acid identity and phylum-level relationships using tetramers of universal proteins.
- zol and fai: large-scale targeted detection and evolutionary investigation of gene clusters.