A tool to visualize FASTQ alignment with quality scores

Do you need to eyeball a FASTQ alignment file, with the quality score of each base highlighted?

Introducing fastqviz, a Streamlit app that does just that. I made it quite some time ago to visualize amplicon data for my own project.

You can upload your FASTQ file. The fastqviz viewer will show the alignment, with color highlighting the quality of each base (pink is high quality, dark is low). Scroll sideways to explore the full length of the sequences.
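To give a rough idea of what per-base quality highlighting involves, here is a minimal Python sketch. It is not the fastqviz code: it assumes plain four-line FASTQ records with Phred+33 quality encoding, and the function names `read_fastq` and `quality_bin`, as well as the bin thresholds, are made up for illustration.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1} does not start with '@'")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Assumption: Phred+33 encoding, so score = ASCII code minus 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores


def quality_bin(score: int) -> str:
    """Map a Phred score to a coarse bin that a viewer could color.

    The thresholds (30 and 20) are arbitrary choices for this sketch.
    """
    if score >= 30:
        return "high"
    if score >= 20:
        return "medium"
    return "low"


# A single toy record: ID line, bases, separator, quality string.
example = ["@read1", "ACGT", "+", "II5#"]
for read_id, seq, scores in read_fastq(example):
    for base, score in zip(seq, scores):
        print(read_id, base, score, quality_bin(score))
```

A viewer would then map each bin (or the raw score) to a background color for the corresponding base.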

An argument against the “DIY computational drug discovery” trend in Bangladesh

I’m seeing a trend: bioscience students from Bangladesh (and South Asia in general) are increasingly interested in publishing papers on computational drug design. Many bright students, undergraduates or fresh graduates, are actually publishing drug-design papers in good journals.

I have also done similar projects. But now, I think it is a bad trend.

While it is completely sound to do computational simulations for drug discovery, most of the published articles I’m seeing seem to be motivated by getting a “publication”, in the hope that these “publications” will help open up opportunities for higher studies abroad.

It is understandable: since our universities in Bangladesh do not provide enough good opportunities to gain research experience under the supervision of a good mentor, many bright students are inclined to jump into such “do it yourself drug design and publish it” endeavors.

I have a few arguments to make against this trend.

How to make a co-phylogeny plot: an easy tanglegram in R

Tanglegrams are co-phylogeny plots, a very powerful visualization tool for examining co-evolution. Here is a tutorial on how to make them in R.

A tanglegram is a representation of co-phylogeny in which two phylogenetic trees are linked. This method is very useful for visualizing traits shared by both trees. For example, it can be used to visualize host-pathogen (or host-symbiont) evolution and to check whether there is any phylogenetic concordance between the two trees.

I needed to visualize the co-phylogeny of phylogenetic trees reconstructed from chromosomal and symbiotic genes. Surprisingly, I didn’t find any straightforward solution in R for drawing a tanglegram. In particular, I wanted to leverage the beautiful ggtree library. After trying out several methods, I found that the following approach works well for me so far.

In this post, I’m going to use two toy trees in the following Newick format. Note that they have the same isolates but different tree topologies (since, supposedly, different gene sets were used to reconstruct them).

A note on learning computational biology

Many people ask me about learning bioinformatics. So, I’m going to put some good learning resources in this note.

If you are a complete beginner, don’t aim to ‘understand’ everything discussed in a course, lecture, or book. It’s okay to be partially ignorant and still move forward. Try to go through 60-70% of the content of the following sources within one to two months. The objective at this stage is to get a good understanding of core bioinformatics concepts and terminology.

Complete Beginner

1. Bioinformatics Methods I and II, offered by the University of Toronto on the massive open online course (MOOC) platform Coursera.org, has pretty good materials (video + tutorial).

2. On Shikkhok.com, a MOOC platform in the Bengali language, there is a very short course on bioinformatics, বায়োইনফরমেটিক্স পরিচিতি (Introduction to Bioinformatics), offered by Bio-Bio-1 Foundation.

3. Reading books is the best way. I’ve found ‘Essential Bioinformatics’ by Jin Xiong to be an easy-to-understand book.

Achieving Expertise: How to Find the Elusive Origin of Replication Site in DNA?

This write-up was for a writing assignment in the Coursera.org MOOC (Massive Open Online Course) English Composition I: Achieving Expertise, offered by Duke University. There were four assignments; I submitted the first two and later became very busy with my M.S. thesis work. The second assignment was to select a random picture and write an explanation of what it offers on achieving expertise.



Bioinformatics is a multidisciplinary subject that uses computational techniques, employing various mathematical and statistical methods, to answer biological questions. One important domain of bioinformatics is genome analysis, and one murky question it tries to answer is how to find the origin of DNA replication. An expert bioinformatician might want to answer this question. The image above is a cartoon illustrating that some complex analysis is required to solve this puzzle.

Here, the title of the image is “Where in the Genome Does DNA Replication Begin?” (1). It is a cartoon drawn by Randall Christopher, a visiting artist for the coursera.org massive open online course (MOOC) Bioinformatics Algorithm I, 2013-14 session. It was taken from the week-one part of a cartoon series connected with the course content.

In this picture, two men are walking on an island. The island is small and surrounded by a blue sea. An ancient ship floats on the sea in the background. The island has a number of coconut trees, arranged in two lines side by side. These two lines of trees are of varying length and form a helical pattern. In the foreground of the image, a box is half-buried in the beach. The text “DNAA” is written on the outside of the box. A bizarre, large golden bug is sitting on top of the box. A cryptic message is written on the body of the bug in the English alphabet, but it does not convey anything meaningful to the reader.

These two men are walking in opposite directions in the middle of the canvas. It can be assumed that they reached the island on the ship behind them. The man in front is dressed like a Western character in a Hollywood film, and the man behind him wears a pirate-like outfit. One of his eyes is covered by a black bandana. Their footsteps form a circular path resembling the Greek letter theta (θ), although the front half of that theta is smaller in area. Within the smaller half of the theta-like path they are strolling along, there are three wooden boxes, which also have the text ‘DNAA’ written on the outside. The body language of the two men indicates that they are searching for something hidden along this path. Some flies are hovering above their heads.

To explicate this image, the title also needs to be understood clearly. DNA, chemically called deoxyribonucleic acid, is found in all living cells (with some exceptions, such as mature red blood cells) and is the information-carrying component of the cell. DNA functions as the blueprint of life. The genome is the complete DNA sequence, written using a four-letter alphabet: A, T, G and C. Replication is the copying mechanism of DNA during cell division. DNA is a double-stranded molecule, and there is a specific point where replication begins by separating the two strands. This separation forms a small loop in the DNA molecule, and replication proceeds from there in opposite directions. In ancient single-celled organisms (e.g. bacteria), DNA is a circular molecule, and as replication proceeds, the loop grows. This process gives the DNA the shape of an imbalanced Greek theta. In this image, the circular path depicts the DNA molecule. This conjecture is also corroborated by the helical arrangement of the coconut trees in the background, because we know that DNA is a helical molecule. The two ‘wildly dressed’ investigators are searching for something in the opened-strand loop of the circular DNA, the smaller half of the theta, which depicts their search for the origin of replication.

Another important element in this image is the DNAA boxes within the loop. The DNAA boxes are a sequence motif, a specific pattern of nucleotides, which is recognized by the DNA replication machinery and is situated near the origin of replication. The idea of the golden bug is taken from Edgar Allan Poe’s short story ‘The Gold-Bug’ (2), in which the protagonist has to solve a cryptic message; in this image, such a message is written on the body of the golden bug sitting on the half-buried DNAA box in the foreground. Here, the protagonists are two bioinformaticians, and the cryptic message on the body of the gold bug is an analogy for the hidden pattern in a DNA sequence. That the gold bug sits on the DNAA box indicates that these bioinformaticians have to find where the DNAA box is located by deciphering this hidden pattern.

This is exactly how bioinformaticians work. They aim to break the cryptic code of life, understand it, and try to answer elusive questions, such as understanding the pattern of the DNAA box sequence, which helps them find the origin of replication.
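As a small illustration of what searching for a hidden pattern can look like in practice, here is a minimal Python sketch that finds the most frequent k-mers (substrings of length k) in a stretch of DNA. It is only a toy under stated assumptions: the function name `most_frequent_kmers` and the example sequence are made up for illustration, and a realistic search for such boxes would probably also need to consider reverse complements and approximate matches, which this sketch ignores.

```python
from collections import Counter
from typing import List


def most_frequent_kmers(sequence: str, k: int) -> List[str]:
    """Return all k-mers that occur most often in `sequence`, sorted alphabetically."""
    if k <= 0 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    # Slide a window of width k over the sequence and count every substring.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    top = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == top)


# Toy sequence, not taken from a real genome.
toy_window = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
print(most_frequent_kmers(toy_window, 4))
```

In this toy sequence, CATG and GCAT each occur three times, more often than any other 4-mer. A pattern that repeats unusually often within a short window is the kind of clue the two investigators in the cartoon are looking for.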

1) Bioinformatics Algorithm I (2013-14). coursera.org. https://class.coursera.org/bioinformatics-001
2) Edgar Allan Poe (1891). The Gold Bug. G. Routledge & Sons, Limited.