
Annotation of Bacterial Genomes

This is Part 5 of the tutorial series NGS Workflow for Genome Assembly to Annotation for Hybrid Bacterial Data.

We’ll be using hybrid sequencing data (Illumina and Nanopore). This tutorial series has five parts.

Disclaimer: This post is a work in progress. This is the genome assembly and annotation workflow that I use for microbial genomics. Previously, I used this template to teach different classes at OSU as well as at other training facilities.


After getting a good assembly, you want a good annotation. Essentially, you want to know where the different genes are, along with other important features of the genome. There are many tools for this, such as Bakta, RAST, Beav, and Prokka. Let’s start with Bakta.

Bakta

Bakta is a standard tool for annotating bacterial genomes. Along with basic genomic features like CDSs, genes, and RNAs, Bakta can also annotate antimicrobial resistance genes using NCBI’s AMRFinderPlus.

After installing Bakta, make sure to install the database as well:

bakta_db download --output /path/to/database --type full

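Remember where you saved the database (the path you gave to the --output flag). Depending on the Bakta version, the download may be unpacked into a subdirectory of that path (for example db), so check which directory actually holds the database files before passing it to --db. Once the database is downloaded, you can run Bakta to annotate the assembly: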

bakta --db /path/to/database assembly.fasta --output output_dir/ 
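Bakta names its output files after a prefix, so it can help to set the prefix yourself and know in advance what the JSON, GFF3, and other files will be called. Below is a minimal sketch of a wrapper that does this, using Bakta's --prefix and --threads options. The function name run_bakta, the BAKTA_CMD variable, and the thread count are my own choices for illustration, not part of Bakta.

```shell
# Sketch: annotate one assembly with an explicit output prefix and thread count.
# All paths are placeholders. BAKTA_CMD is a hypothetical helper variable:
# set BAKTA_CMD=echo for a dry run that only prints the arguments.
run_bakta() {
    db="$1"        # directory holding the Bakta database
    assembly="$2"  # assembled genome in FASTA format
    outdir="$3"    # directory for the annotation results
    prefix="$4"    # output files are named <prefix>.json, <prefix>.gff3, ...
    ${BAKTA_CMD:-bakta} --db "$db" --output "$outdir" --prefix "$prefix" --threads 8 "$assembly"
}

# Example (placeholder paths):
# run_bakta /path/to/database assembly.fasta output_dir contigs
```

With the prefix set to contigs, the JSON file in the output directory would be called contigs.json.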

Bakta also produces a figure of your annotated genome showing the locations of genes and major elements, such as rRNA and tRNA loci. You can redraw it from the JSON output with bakta_plot:

bakta_plot contigs.json

However, it is often useful to browse the annotated assembly interactively. For that, we will use the JSON file from the Bakta output directory (contigs.json in this example; the actual name follows the output prefix).

Bakta has a web app where you can interactively view and explore the Circos plot: https://bakta.computational.bio/viewer

Explore the Bakta output files to understand the predicted genes, their functions, and the metabolic pathways present in the microbial genome.
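A quick first look at the output is to count how many features of each type were annotated. Here is a small sketch that does this on the GFF3 file Bakta writes, relying only on the GFF3 format itself (tab-separated, feature type in column 3). The function name count_feature_types and the example path are placeholders I made up.

```shell
# Sketch: count features per type (CDS, tRNA, rRNA, ...) in a GFF3 file.
# GFF3 is tab-separated; column 3 holds the feature type, and lines
# starting with '#' are comments or directives.
count_feature_types() {
    awk -F '\t' '
        /^##FASTA/ { exit }          # stop at the embedded sequence section, if present
        /^#/ || NF < 9 { next }      # skip comments and incomplete lines
        { counts[$3]++ }             # tally by feature type
        END { for (t in counts) printf "%s\t%d\n", t, counts[t] }
    ' "$1" | sort
}

# Example (placeholder path):
# count_feature_types output_dir/assembly.gff3
```

The same idea works for the TSV output, but check its header line first to see which column holds the feature type.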

Beav

There is often more to a genome than its genes. Let’s say we want to better understand the genomic context, i.e. we would like to find mobile genetic elements (ICEs, prophages, integrons), secretion systems, biosynthetic gene clusters, operons, origins of replication, and so on. Usually, you have to run a separate tool for each of these.

However, in the Weisberg lab, we developed another bacterial genome annotation pipeline called Beav, which can automate all of these.

After installing Beav, you need to download its databases, just as with Bakta:

beav_db

After the database is installed, you can run Beav using the following command:

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna
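If you have several genomes, you can wrap the same command in a loop. This is only a sketch: it reuses the three options from the command above and nothing else, and the function name annotate_with_beav and the BEAV_CMD variable are hypothetical helpers, not part of Beav.

```shell
# Sketch: run Beav on several genomes, one after another.
# BEAV_CMD is a hypothetical helper variable: set BEAV_CMD=echo for a dry run.
annotate_with_beav() {
    tiger_db="$1"   # BLAST database passed to --tiger_blast_database
    shift           # remaining arguments are genome FASTA files
    for genome in "$@"; do
        # Skip files that are missing or empty instead of failing midway
        if [ ! -s "$genome" ]; then
            echo "skipping $genome: missing or empty" >&2
            continue
        fi
        ${BEAV_CMD:-beav} --input "$genome" --threads 8 --tiger_blast_database "$tiger_db"
    done
}

# Example (placeholder paths):
# annotate_with_beav /path/to/databases/blast/refseq_genomic.fna genomes/*.fna
```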

Alright, we have come to the end of the tutorial series! In this series I have introduced the essential steps and tools for quality assessment, taxa/contamination detection, assembly of short-read, long-read, and hybrid data, assessing the assembly, and annotating it. Of course, the devil is in the details. I will come back to this tutorial series and improve it. Please let me know if you have any questions.

