Annotation of Bacterial Genomes

This is Part 5 of the tutorial series: NGS Workflow for Genome Assembly to Annotation for Hybrid Bacterial Data

We’ll be using hybrid sequencing data (Illumina and Nanopore). This tutorial has five parts.

Part 1: Downloading and preparing data.
Part 2: Assembly with short-reads.
Part 3: Assembly with long-reads.
Part 4: Hybrid assembly (long- and short-reads).
Part 5: Bacterial genome annotation.

Disclaimer: This post is a work in progress. This is the genome assembly and annotation workflow that I use for microbial genomics. I have previously used this template to teach classes at OSU and at other training facilities.

After getting a good assembly, you want a good annotation. Essentially, you want to know where the genes and other important features of the genome are. There are many tools for this, such as Bakta, RAST, Beav, and Prokka. Let’s start with Bakta.

Bakta

Bakta is a standard tool for annotating bacterial genomes. Along with basic genomic features such as CDSs, genes, and RNAs, Bakta can also annotate antimicrobial resistance genes using NCBI’s AMRFinderPlus.

After installing Bakta, make sure to download its database as well:

bakta_db download --output /path/to/database --type full
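Before annotating, it can help to confirm where the database files actually ended up. The sketch below is only an illustration: it assumes that a usable Bakta database directory contains a version.json file and that the download may place the files in a db (or db-light) subdirectory of the --output path. The helper name resolve_bakta_db is hypothetical, and the BAKTA_DB environment variable is, as far as I know, read by Bakta as an alternative to the --db flag.

```shell
#!/bin/sh
# Hypothetical helper: find the directory that actually holds the database.
# Assumption: a valid Bakta database directory contains version.json.
resolve_bakta_db() {
    for candidate in "$1" "$1/db" "$1/db-light"; do
        if [ -f "$candidate/version.json" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

DB_ROOT="/path/to/database"   # the path given to bakta_db --output

if BAKTA_DB=$(resolve_bakta_db "$DB_ROOT"); then
    # Assumption: Bakta picks up BAKTA_DB when --db is not given
    export BAKTA_DB
    echo "Using Bakta database at $BAKTA_DB"
else
    echo "No Bakta database found under $DB_ROOT" >&2
fi
```

If the helper reports a subdirectory, that subdirectory is the path to pass to --db.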

Remember the location you pass to the --output flag, because you will need it for annotation. Once the database is downloaded, you can run Bakta and annotate the assembly:

bakta --db /path/to/database assembly.fasta --output output_dir/
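In practice I find it handy to keep the paths in variables and to set the output prefix and thread count explicitly. Here is a minimal sketch of that idea. The variable values are placeholders, and --prefix and --threads are Bakta options as far as I know (check bakta --help for your version). The prefix is set to contigs so that the output files are named contigs.*, including the contigs.json file used for plotting.

```shell
#!/bin/sh
# Placeholder values; replace with your own paths.
DB_DIR="/path/to/database"
ASSEMBLY="assembly.fasta"
OUT_DIR="output_dir"
PREFIX="contigs"   # output files become contigs.gff3, contigs.json, ...
THREADS=8

# Build a printable copy of the command so it can be checked before running
BAKTA_CMD="bakta --db $DB_DIR --output $OUT_DIR --prefix $PREFIX --threads $THREADS $ASSEMBLY"
echo "$BAKTA_CMD"

# Run only when bakta is installed and the assembly file is non-empty
if command -v bakta >/dev/null 2>&1 && [ -s "$ASSEMBLY" ]; then
    bakta --db "$DB_DIR" --output "$OUT_DIR" --prefix "$PREFIX" \
          --threads "$THREADS" "$ASSEMBLY"
else
    echo "Skipping run: bakta or $ASSEMBLY not available" >&2
fi
```

Printing the command first is a cheap way to catch a wrong path before a long annotation run starts.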

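Bakta also produces a figure of your annotated genome showing the locations of genes and major elements, such as rRNA and tRNA loci. The figure is drawn from the JSON file in the output directory (contigs.json here; the file name follows the output prefix):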

bakta_plot contigs.json

However, it is often useful to browse the annotated assembly interactively. For that, we will use the same contigs.json file.

Bakta has a web app where you can interactively view and explore the Circos plot: https://bakta.computational.bio/viewer

Explore the Bakta output files to understand the predicted genes, their functions, and the metabolic pathways present in the microbial genome.
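One quick way to start exploring is to count how many features of each type were annotated. The sketch below reads the GFF3 output, where column 3 holds the feature type. The file path output_dir/contigs.gff3 and the function name count_feature_types are assumptions for illustration; adjust them to your own output directory and prefix.

```shell
#!/bin/sh
# Count annotated features per type (column 3 of a GFF3 file).
# Comment lines start with '#'; an embedded ##FASTA section ends the features.
count_feature_types() {
    awk -F '\t' '
        /^##FASTA/ { exit }             # stop at the sequence section
        /^#/       { next }             # skip header and comment lines
        NF >= 9    { n[$3]++ }          # valid feature rows have 9 columns
        END        { for (t in n) print t "\t" n[t] }
    ' "$1" | sort
}

GFF="output_dir/contigs.gff3"   # assumed location of the Bakta GFF3 output

if [ -s "$GFF" ]; then
    count_feature_types "$GFF"
else
    echo "GFF3 file not found: $GFF" >&2
fi
```

The result is a two-column table of feature type and count, which is a useful sanity check against what you expect for your organism.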

Beav

A genome often contains more than genes. Let’s say we want to better understand the genomic context, i.e., we would like to find mobile genetic elements (ICEs, prophages, integrons), secretion systems, biosynthetic gene clusters, operons, origins of replication, and so on. Usually, you have to run a separate tool for each of these.

However, in the Weisberg lab, we developed another bacterial genome annotation pipeline called Beav, which automates all of these.

After installing Beav, as with Bakta, you need to download the databases:

beav_db

After the databases are installed, you can run Beav using the following command:

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna
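If you have several assemblies, the same command can be generated for each of them. This is a rough sketch that only prints the commands so you can review them first. The genomes directory and the function name print_beav_commands are hypothetical, and the flags are the same ones used in the command above.

```shell
#!/bin/sh
# Placeholder values; replace with your own paths.
GENOME_DIR="genomes"
TIGER_DB="/path/to/databases/blast/refseq_genomic.fna"
THREADS=8

# Print one beav command per .fna file in the given directory (dry run)
print_beav_commands() {
    for fna in "$1"/*.fna; do
        [ -e "$fna" ] || continue   # no matching files: print nothing
        echo "beav --input $fna --threads $THREADS --tiger_blast_database $TIGER_DB"
    done
}

print_beav_commands "$GENOME_DIR"
```

Once the printed commands look right, they can be saved to a file and run one by one, or submitted to a scheduler.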

Alright, we have come to the end of the tutorial series! In this series, I have introduced the essential steps and tools for quality assessment, taxonomic and contamination detection, assembly of short-read, long-read, and hybrid data, assembly assessment, and annotation. Of course, the devil is in the details. I will come back to this tutorial series and improve it. Please let me know if you have any questions.

Discover more from Arafat Rahman