This is Part 5 of the tutorial series: NGS Workflow for Genome Assembly to Annotation for Hybrid Bacterial Data
We’ll be using hybrid sequencing data (Illumina and Nanopore). This tutorial has five parts.
- Part 1: Downloading and preparing data
- Part 2: Assembly with short-reads.
- Part 3: Assembly with long-reads.
- Part 4: Hybrid assembly (long- and short-reads).
- Part 5: Bacterial genome annotation.
Disclaimer: This post is a work in progress. This is the genome assembly and annotation workflow that I use for microbial genomics. Previously, I used this template to teach different classes at OSU as well as at other training facilities.
After getting a good assembly, you want a good annotation. Essentially, you want to know where the different genes and other important features of the genome are. There are many tools for this, such as Bakta, RAST, Beav, and Prokka. Let’s start with Bakta.
Bakta
Bakta is a standard tool for annotating bacterial genomes. Along with annotating basic genomic features such as CDSs, genes, and RNAs, Bakta can also annotate antimicrobial resistance genes using NCBI’s AMRFinderPlus.
After installing Bakta, make sure to download its database as well:
bakta_db download --output /path/to/database --type full
Remember where you saved the database (the path given to the --output flag). Once the database is downloaded, you can run Bakta and annotate the assembly:
bakta --db /path/to/database assembly.fasta --output output_dir/
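If you have several assemblies to annotate (for example, the short-read, long-read, and hybrid assemblies from the earlier parts), a small loop saves typing. The sketch below is only one way to do it: the function name annotate_assemblies and the <sample>_bakta directory naming are my own placeholders, and it assumes Bakta's --prefix and --threads options are available in your installed version.

```shell
# Run Bakta on each FASTA file given after the database path.
# Each assembly gets its own output directory, named after the file.
# THREADS is an optional environment variable (defaults to 8 here).
annotate_assemblies() {
  db="$1"
  shift
  threads="${THREADS:-8}"
  for fasta in "$@"; do
    # Strip the directory and the last extension: asm/hybrid.fasta -> hybrid
    sample=$(basename "${fasta%.*}")
    bakta --db "$db" \
      --output "${sample}_bakta" \
      --prefix "$sample" \
      --threads "$threads" \
      "$fasta" || return 1   # stop at the first failed run
  done
}
```

Usage would look like `annotate_assemblies /path/to/database short.fasta long.fasta hybrid.fasta`, which keeps the three annotations in separate directories so they are easy to compare.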
Bakta also produces a figure of your annotated genome with the locations of genes and major elements, such as rRNA and tRNA loci. The bakta_plot command draws it from the JSON file that Bakta writes:
bakta_plot contigs.json
However, it is often useful to browse the annotated assembly interactively. For that, we will use the contigs.json file. Bakta has a web app where you can interactively view and explore the Circos plot: https://bakta.computational.bio/viewer
Explore the Bakta output files to understand the predicted genes, their functions, and the metabolic pathways present in the microbial genome.
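A quick way to get a first overview is to count how many features of each type were annotated. The sketch below reads the GFF3 file from the output directory and relies only on the GFF3 standard, where column 3 holds the feature type. The function name count_feature_types is a placeholder of mine, and the file name in the usage line assumes Bakta named its outputs after the input file.

```shell
# Count annotated features by type (CDS, tRNA, rRNA, ...) in a GFF3 file.
# Prints "type<TAB>count", sorted by type name.
count_feature_types() {
  awk -F'\t' '
    /^##FASTA/ { exit }      # stop before the embedded sequence section
    /^#/       { next }      # skip comment and directive lines
    NF == 9    { n[$3]++ }   # column 3 is the feature type
    END        { for (t in n) print t "\t" n[t] }
  ' "$1" | LC_ALL=C sort
}
```

For example, `count_feature_types output_dir/assembly.gff3` gives a one-line-per-type summary that you can compare between assemblies of the same genome.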
Beav
There are often other things in the genome apart from genes. Let’s say we want to better understand the genomic context, i.e., we would like to find mobile genetic elements (ICEs, prophages, integrons), secretion systems, biosynthetic gene clusters, operons, origins of replication, etc. Usually, you have to run a separate tool for each of these.
However, in the Weisberg lab, we developed another bacterial genome annotation pipeline called Beav, which automates all of these.
After installing Beav, as with Bakta, you need to download the database:
beav_db
After the database is installed, you can run beav using the following command:
beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna
Alright, we have come to the end of the tutorial series! In this series, I have introduced the essential steps and tools for quality assessment, taxon/contamination detection, assembly of short-read, long-read, and hybrid data, assembly assessment, and annotation. Of course, the devil is in the details. I will come back to this tutorial series and improve it! Please let me know if you have any questions.