annot_toolbox

complement_gff

This Python script supplements a reference GFF file with features from a second GFF file. It was designed specifically for adding ORFs (e.g. from EMBOSS getorf) to an existing annotation without disturbing or modifying the original annotation in any other way.

complement_gff.py

This script was written for Python 3. To run it from a UNIX command line: python3 complement_gff.py

See expected inputs and behavior with complement_gff.py --help

The minimal invocation is: complement_gff.py -r <ref.gff> -s <supplemental_genes.gff> -o <output.gff>

By default, supplemental features are appended to the bottom of the output GFF. The --sorted option adds significant runtime but places supplemental genes at their correct genomic positions. The order of feature types may differ from the original order in the reference GFF.
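As a rough illustration of what positional sorting involves, the sketch below orders GFF feature lines by sequence ID and start coordinate. It is not the code used by complement_gff.py; the function name sort_gff_lines and the example records are hypothetical, and it assumes tab-separated GFF with nine columns.

```python
def sort_gff_lines(lines):
    """Return comment lines first, then feature lines ordered by position."""
    headers = [ln for ln in lines if ln.startswith("#")]
    features = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def position(line):
        cols = line.rstrip("\n").split("\t")
        # GFF columns: 0 = seqid, 3 = start, 4 = end (1-based, inclusive).
        # Longer features sort first when starts are equal.
        return (cols[0], int(cols[3]), -int(cols[4]))

    # sorted() is stable, so features with identical coordinates
    # keep the relative order they had in the input.
    return headers + sorted(features, key=position)


# Hypothetical records: one reference gene and one supplemental ORF.
example = [
    "##gff-version 3",
    "chr1\tref\tgene\t500\t900\t.\t+\t.\tID=geneB",
    "chr1\tgetorf\tgene\t100\t300\t.\t+\t.\tID=orf1",
    "chr1\tgetorf\tCDS\t100\t300\t.\t+\t0\tID=orf1.cds;Parent=orf1",
]
for line in sort_gff_lines(example):
    print(line)
```

Because a gene and its CDS can share identical coordinates, their relative order after sorting depends on tie-breaking, which is one way the order of feature types can end up differing from the reference file.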

Note that this script merges features into the reference GFF. For the original use case, exon and CDS features had start and stop positions identical to those of their gene features, so no child feature of a gene had separate start and end points. If you use this script to supplement polyexonic genes, exons that do not overlap reference features may be orphaned if their parent genes do overlap reference features. No message is printed if this happens, so consider this your warning.

For more robust supplementation of genes, see AGAT. agat_sp_complement_annotations.pl correctly handles orphan genes, but the parser may make modifications to reference annotations. Any changes to reference features are described in the output.
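Since orphaned features are not reported, one option is to check the output file afterwards. The sketch below is not part of complement_gff.py: it is a minimal check that assumes GFF3-style ID= and Parent= attributes in column 9, and the names parse_attributes and find_orphans are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_orphans(gff_lines):
    """Return feature lines whose Parent ID is absent from the file."""
    ids = set()
    children = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed lines rather than guessing
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        if "Parent" in attrs:
            # GFF3 allows several comma-separated parents.
            children.append((line.rstrip("\n"), attrs["Parent"].split(",")))
    # Flag a feature if any of its listed parents is missing.
    return [ln for ln, parents in children
            if any(p not in ids for p in parents)]


# Hypothetical merged output: the supplemental gene was dropped, its exon kept.
merged = [
    "chr1\tref\tgene\t100\t900\t.\t+\t.\tID=refgene1",
    "chr1\tref\texon\t100\t900\t.\t+\t.\tID=refexon1;Parent=refgene1",
    "chr1\tsupp\texon\t1200\t1300\t.\t+\t.\tID=exon2;Parent=suppgene1",
]
orphans = find_orphans(merged)
for line in orphans:
    print("orphan:", line)
```

To check a real file, the lines of the output GFF can be passed in directly, for example by opening the file and handing the file object to find_orphans.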