Abstract
Carbohydrate-active enzymes (CAZymes) allow microbes to digest both host-derived (e.g. mucins) and diet-derived (e.g. fibre) glycans, but bioinformatics tools for profiling CAZymes and interpreting their substrate preferences in microbial community data are lacking. To address this, we developed Cayman (Carbohydrate active enzymes profiling of metagenomes), an easy-to-use command-line tool for profiling CAZyme abundances in metagenomic data. It takes shotgun reads as input and outputs a matrix of CAZyme abundances, expressed as reads per kilobase per million mapped reads (RPKM), for the input sample.
By additionally providing an integrated, manually curated, hierarchical substrate annotation scheme, we facilitate the interpretation of the resulting CAZyme profiles by grouping CAZyme families into higher-level, biologically meaningful substrate groups, e.g. dietary fibre. Both CAZyme and substrate annotations are transferred via sequence homology.
We showcase the tool on large-scale bacterial gut genome collections and human gut metagenomic datasets, demonstrating its utility for pinpointing bacterial species with specific substrate utilisation patterns (e.g. mucin-foraging) and for showing how glycan substrate utilisation inferred from faecal metagenomes can differ across host lifestyles and health states. Cayman is broadly applicable to (meta-)genomic data from a variety of microbial communities.
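To make the RPKM unit mentioned above concrete, the following is a minimal sketch of the standard RPKM normalisation. It is not Cayman's own code: the function name `rpkm`, its parameters, and the toy gene names are hypothetical, and Cayman's exact counting and normalisation rules may differ from this textbook definition.

```python
def rpkm(read_counts, gene_lengths_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads.

    read_counts: dict mapping gene id -> number of reads assigned to the gene
    gene_lengths_bp: dict mapping gene id -> gene length in base pairs
    total_mapped_reads: total number of mapped reads in the sample
    (All names here are illustrative assumptions, not Cayman's API.)
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")

    per_million = total_mapped_reads / 1_000_000
    result = {}
    for gene, count in read_counts.items():
        length_kb = gene_lengths_bp[gene] / 1_000
        if length_kb <= 0:
            raise ValueError(f"gene {gene!r} has non-positive length")
        # Normalise by gene length first, then by sequencing depth.
        result[gene] = count / length_kb / per_million
    return result


# Toy example with made-up numbers: 100 reads on a 2 kb gene
# in a sample with one million mapped reads.
example = rpkm({"gene_1": 100}, {"gene_1": 2_000}, 1_000_000)
```

Dividing by gene length and by sequencing depth makes abundances comparable between genes of different sizes and between samples sequenced to different depths.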
Cayman profiles carbohydrate-active enzymes (CAZymes) from metagenomes by quantifying them against a gene catalogue. CAZyme abundances are then aggregated at different levels, guided by our curated substrate annotations. GMGC: Global Microbial Gene Catalog, ORFs: Open Reading Frames, PL: Polysaccharide Lyase, GH: Glycoside Hydrolase.
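The aggregation step described above can be sketched as follows. This is an illustration under stated assumptions, not Cayman's implementation: the function `aggregate_by_substrate`, the placeholder family names (`GH_A`, `GH_B`, `PL_C`, `GH_D`), and the family-to-substrate mapping are all hypothetical. In particular, how a family annotated with several substrates contributes to each group is an assumption; here its abundance is counted in full for every group it belongs to.

```python
from collections import defaultdict


def aggregate_by_substrate(family_abundance, family_to_substrates,
                           unannotated_label="unannotated"):
    """Sum per-family abundances into higher-level substrate groups.

    family_abundance: dict mapping CAZyme family -> abundance (e.g. RPKM)
    family_to_substrates: dict mapping CAZyme family -> list of substrate groups
    Families missing from the mapping are collected under unannotated_label.
    (Assumption: a multi-substrate family counts fully towards each group.)
    """
    totals = defaultdict(float)
    for family, abundance in family_abundance.items():
        groups = family_to_substrates.get(family) or [unannotated_label]
        for group in groups:
            totals[group] += abundance
    return dict(totals)


# Hypothetical placeholder families and mapping, for illustration only.
toy_mapping = {
    "GH_A": ["dietary fibre"],
    "GH_B": ["dietary fibre", "mucin"],
    "PL_C": ["mucin"],
}
toy_abundance = {"GH_A": 10.0, "GH_B": 5.0, "PL_C": 2.0, "GH_D": 1.0}
substrate_profile = aggregate_by_substrate(toy_abundance, toy_mapping)
```

Because multi-substrate families are counted in each of their groups, the group totals in this sketch can sum to more than the total family abundance; a fractional split would be an equally plausible alternative.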