
How to Find OTUs Using Mothur

An OTU (operational taxonomic unit) is a convenient way for researchers to group sequences from different samples into taxonomic units based on their sequence similarity.

It is convenient because one does not need to determine the actual taxonomic classification of each sequence before moving on to analyze the diversity of the samples.

Mothur is a popular tool used for many analyses in microbial ecology. I use the following commands to generate OTUs from an aligned input dataset:

# Filter the alignment
filter.seqs(fasta=input.aln.fasta, trump=.)

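The input is a sequence alignment in FASTA format (input.aln.fasta).

The trump option removes every column of the alignment in which any sequence contains the given character. With trump=., any column where at least one sequence has a "." (the character mothur uses for missing data, such as positions beyond the ends of a sequence) is removed.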
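To make the effect of trump=. concrete, here is a minimal Python sketch (not mothur's own code) that drops every column in which any sequence contains the trump character. The function name trump_filter and the toy alignment are hypothetical.

```python
def trump_filter(alignment, trump_char="."):
    """Drop every alignment column in which ANY sequence has trump_char.

    alignment: dict mapping sequence name -> aligned sequence string
    (all strings must have the same length).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    (length,) = lengths
    # Keep a column only if no sequence has the trump character there.
    keep = [
        col for col in range(length)
        if all(seq[col] != trump_char for seq in alignment.values())
    ]
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}


aln = {
    "seqA": "..ACGT-A",
    "seqB": ".TACGTTA",
    "seqC": "GTACG-TA",
}
print(trump_filter(aln))  # {'seqA': 'ACGT-A', 'seqB': 'ACGTTA', 'seqC': 'ACG-TA'}
```

The first two columns are removed because at least one sequence has a "." there; columns containing ordinary "-" gaps are kept.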

Let’s make a distance matrix:

# Make a lower-triangle distance matrix in PHYLIP format
dist.seqs(fasta=current, calc=onegap, output=lt)

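Setting a parameter to current tells mothur to use the most recent file of that type from the current session, so the output of the previous step becomes the input of this one.

OTUs are often defined at a 97% sequence similarity threshold, which corresponds to a distance cutoff of 0.03.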
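The calc=onegap option counts a run of consecutive gaps as a single difference rather than one difference per gapped position. The Python sketch below illustrates that idea under stated assumptions: it skips columns that are gaps in both sequences, treats both "-" and "." as gaps, and uses (mismatches + gap runs) / (matches + mismatches + gap runs) as the distance. mothur's exact rules (for example for gaps at the sequence ends, which its countends option controls) may differ, and onegap_distance is a hypothetical name.

```python
GAP_CHARS = set("-.")  # assumption: both '-' and '.' are treated as gaps


def onegap_distance(a, b):
    """Distance between two aligned sequences, counting each gap run once.

    Simplified sketch: columns that are gaps in both sequences are skipped;
    distance = (mismatches + gap runs) / (matches + mismatches + gap runs).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = mismatches = gap_runs = 0
    open_gap_in = None  # which sequence ("a" or "b") has an ongoing gap run
    for x, y in zip(a, b):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # empty in both sequences: not compared
        if x_gap or y_gap:
            side = "a" if x_gap else "b"
            if open_gap_in != side:
                gap_runs += 1  # only the first position of a run counts
            open_gap_in = side
            continue
        open_gap_in = None  # a base in both sequences ends any gap run
        if x.upper() == y.upper():
            matches += 1
        else:
            mismatches += 1
    compared = matches + mismatches + gap_runs
    return (mismatches + gap_runs) / compared if compared else 0.0


print(round(onegap_distance("ACGTACGT", "ACG---GT"), 4))  # 0.1667
```

The three-base gap counts as one difference, giving 1/6. Counting every gapped position separately would instead give 3/8 under the same formula.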

# Make OTUs
cluster(phylip=current, cutoff=0.03)
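mothur's cluster command supports several algorithms (for example method=furthest, method=average, and OptiClust, which recent versions use by default). The sketch below shows only the simplest one to reason about, furthest-neighbor (complete-linkage) clustering, where two clusters merge only if every pair of their members is within the cutoff. It is a simplified illustration, not mothur's implementation; the function name and toy distances are hypothetical.

```python
from itertools import combinations


def furthest_neighbor_otus(names, dist, cutoff=0.03):
    """Complete-linkage clustering of sequence names at a distance cutoff.

    names:  list of sequence names
    dist:   dict mapping frozenset({name1, name2}) -> pairwise distance
    cutoff: 0.03 corresponds to 97% similarity (distance = 1 - similarity)
    """
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Furthest neighbor: the distance between two clusters is the
        # LARGEST distance between any member of c1 and any member of c2.
        return max(dist[frozenset((p, q))] for p in c1 for q in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break  # any further merge would exceed the cutoff
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees j > i
    return clusters


# Hypothetical pairwise distances between four sequences.
pairs = {
    ("s1", "s2"): 0.01, ("s1", "s3"): 0.02, ("s2", "s3"): 0.04,
    ("s1", "s4"): 0.10, ("s2", "s4"): 0.12, ("s3", "s4"): 0.11,
}
dist = {frozenset(p): d for p, d in pairs.items()}
print(furthest_neighbor_otus(["s1", "s2", "s3", "s4"], dist))
# [['s1', 's2'], ['s3'], ['s4']]
```

Here s3 is within 0.03 of s1 but 0.04 away from s2, so under complete linkage it stays in its own OTU.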

Once we have the OTUs, we may want summary statistics, and a rarefaction analysis is also useful. Finally, we can get a consensus sequence for each OTU for further downstream analysis:

# Create OTU table
make.shared(list=current, group=meta.group)

# Do rarefaction
rarefaction.shared(shared=current)
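rarefaction.shared is intended for inter-sample rarefaction: it resamples whole samples (groups) rather than individual sequences and tracks how many distinct OTUs are seen as more samples are pooled. A rough Python sketch of that idea follows, assuming the shared table has been reduced to the set of OTUs present in each group. Averaging over random orderings is an approximation of the approach, not mothur's code, and the names are hypothetical.

```python
import random


def intersample_rarefaction(shared, iters=1000, seed=1):
    """Mean number of distinct OTUs seen after pooling 1, 2, ... samples.

    shared: dict mapping group (sample) name -> set of OTU labels present.
    Samples are drawn without replacement by shuffling their order.
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    groups = list(shared)
    totals = [0] * len(groups)
    for _ in range(iters):
        rng.shuffle(groups)  # a random order of samples
        seen = set()
        for k, group in enumerate(groups):
            seen |= shared[group]  # pool one more sample
            totals[k] += len(seen)
    return [t / iters for t in totals]


shared = {
    "A": {"Otu1", "Otu2"},
    "B": {"Otu2", "Otu3"},
    "C": {"Otu1", "Otu4"},
}
curve = intersample_rarefaction(shared)
print(curve[0], curve[-1])  # 2.0 4.0
```

A curve that is still rising when the last sample is added suggests that more samples would reveal more OTUs.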

# Get OTU consensus sequences
consensus.seqs(fasta=current, list=current)
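Given an aligned FASTA file and a list file, consensus.seqs produces one consensus sequence per OTU. As a simplified illustration, the sketch below takes a plain majority vote in each alignment column (ties go to the character seen first, and gaps are voted on like bases). mothur's own rules, for example in how it reports ambiguous positions, may differ, and the OTU labels Otu1, Otu2, ... are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_seqs):
    """Majority-rule consensus of equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Counter.most_common(1) picks the most frequent character in each column;
    # on a tie it returns the one encountered first.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


def otu_consensus(otus, fasta):
    """otus: list of OTUs, each a list of sequence names (like one list-file line)."""
    return {
        f"Otu{i + 1}": majority_consensus([fasta[name] for name in members])
        for i, members in enumerate(otus)
    }


fasta = {"s1": "ACGTA", "s2": "ACGTT", "s3": "ACCTT", "s4": "GGGGG"}
otus = [["s1", "s2", "s3"], ["s4"]]
print(otu_consensus(otus, fasta))  # {'Otu1': 'ACGTT', 'Otu2': 'GGGGG'}
```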

The group file (meta.group) is a tab-separated file whose first column is the sequence name and whose second column is the group that sequence belongs to, for example:

Strain_1       Species_1
Strain_2       Species_1
Strain_3       Species_2
Strain_4       Species_2
Strain_5       Species_3
Strain_6       Species_3
...            ...
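make.shared combines the OTU assignments in the list file with the group file to count how many sequences from each group fall into each OTU. Below is a minimal sketch of that bookkeeping, assuming the group file uses one tab between name and group and that names contain no whitespace. The function names and example data are hypothetical, and the real shared file has its own column layout.

```python
from collections import defaultdict


def parse_group_file(text):
    """Read 'sequence<TAB>group' lines into a dict."""
    seq_to_group = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, group = line.split("\t")
        seq_to_group[name.strip()] = group.strip()
    return seq_to_group


def make_shared_table(otus, seq_to_group):
    """Count sequences from each group in each OTU: {group: [n_OTU1, n_OTU2, ...]}."""
    table = defaultdict(lambda: [0] * len(otus))
    for idx, members in enumerate(otus):
        for name in members:
            table[seq_to_group[name]][idx] += 1
    return dict(table)


group_text = "Strain_1\tSpecies_1\nStrain_2\tSpecies_1\nStrain_3\tSpecies_2\n"
otus = [["Strain_1", "Strain_3"], ["Strain_2"]]
print(make_shared_table(otus, parse_group_file(group_text)))
# {'Species_1': [1, 1], 'Species_2': [1, 0]}
```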

However, with the advent of more precise amplicon sequencing methods, amplicon sequence variants (ASVs) are becoming more popular. To find the unique sequences in mothur, use the following command:

unique.seqs(fasta=input.aln.fasta, format=name)
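unique.seqs collapses identical sequences into one representative and, with format=name, records which original sequences each representative stands for in a names file (the representative name, a tab, then a comma-separated list of the sequences it represents). The sketch below mimics that bookkeeping in Python. Keeping the first name seen as the representative is an assumption of this sketch, not necessarily mothur's rule, and the function names are hypothetical. Note that this is dereplication only; ASV methods typically go on to denoise these unique sequences.

```python
def dereplicate(fasta):
    """Collapse identical sequences.

    Returns (unique, names): unique maps each representative name to its
    sequence; names maps each representative to every name sharing it.
    """
    rep_for_seq = {}  # sequence -> representative name
    names = {}
    for name, seq in fasta.items():
        if seq in rep_for_seq:
            names[rep_for_seq[seq]].append(name)
        else:
            rep_for_seq[seq] = name  # first name seen becomes representative
            names[name] = [name]
    unique = {rep: seq for seq, rep in rep_for_seq.items()}
    return unique, names


def names_lines(names):
    """Format like a names file: representative<TAB>comma-separated names."""
    return [f"{rep}\t{','.join(members)}" for rep, members in names.items()]


fasta = {"s1": "ACGT", "s2": "ACGA", "s3": "ACGT", "s4": "ACGT"}
unique, names = dereplicate(fasta)
print(unique)  # {'s1': 'ACGT', 's2': 'ACGA'}
print("\n".join(names_lines(names)))
```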

