A note on learning computational biology

Many people ask me about learning bioinformatics, so I’m collecting some good learning resources in this note.


If you are a complete beginner, don’t aim to ‘understand’ everything discussed in a course, lecture, or book. It’s okay to stay partially ignorant as long as you keep moving forward. Try to get through 60-70% of the content of the following sources within one to two months. The objective at this stage is to get a good grasp of core bioinformatics concepts and terminology.

Complete Beginner

1. Bioinformatics Methods I and II, offered by the University of Toronto on the massive open online course (MOOC) platform Coursera.org, have pretty good materials (video + tutorial). One can download the course materials from this Google Drive folder.

2. On Shikkhok.com, a Bengali-language MOOC platform, there is a very short course on bioinformatics, বায়োইনফরমেটিক্স পরিচিতি (“Introduction to Bioinformatics”), offered by Bio-Bio-1 Foundation.

3. Reading books is the best way. I’ve found ‘Essential Bioinformatics’ by Jin Xiong to be an easy-to-understand book.

Intermediate

1. Start reading computational biology/bioinformatics-related research papers. It’s a good idea to read a research paper, redo all the analyses described in it with its data, and try to generate the same or similar results. This process is called reproduction, and it is very helpful for understanding how a real bioinformatics project is done. Here’s a list of bioinformatics journals.

The journal Nature has a series of educational articles where experts describe different concepts in Bioinformatics, Statistics and Data Visualizations. Dr. Xianjun Dong from Harvard University has compiled an index of those papers in a PDF document. I encourage everyone to use this resource as a syllabus.

2. Learn coding. The goal is to write small scripts that automate boring mouse-clicking tasks and save time. For example, running a single BLAST search on NCBI is easy, but running BLAST on 20+ sequences by hand is madness. So learn Python: it is currently one of the most widely used languages for bioinformatics programming (and for data science, too), and it is very easy to learn and use. I teach a Python for Bioinformatics online course on cBLAST (an online course forum run by the University of Dhaka). Here are some other good resources for learning Python:

  • Codecademy.com has a great learning environment.
  • This site contains several slides in the ‘Introduction to Programming for Bioinformatics in Python’ section at the very bottom of the page. I actually learnt Python from these slides. Just type the commands, try to get the same answers, and do the exercises. It’s very easy to understand.
  • Rosalind.info is a site where one can learn and improve bioinformatics programming skills. You can learn Python in its ‘Python Village’ section. After that, I suggest solving problems in the ‘Bioinformatics Stronghold’. The structure of Rosalind.info is very interesting: the problems start out easy and get harder as you solve them.

Also, have a look at the Rosalind Country Ranking: Bangladesh is currently in 3rd position worldwide!
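To give a feel for the kind of small script meant here, below is a minimal sketch in the style of an early Rosalind exercise: it parses FASTA-formatted text and reports the length and GC content of every sequence in one go, instead of checking them one by one. The function names (`parse_fasta`, `gc_content`) and the toy sequences are my own illustrative choices, not taken from any of the resources above.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # A sequence may span several lines; collect the pieces.
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq):
    """Return the percentage of G and C bases in a sequence."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy input; in practice you would read a .fasta file from disk.
EXAMPLE_FASTA = """>seq1
ATGCGC
>seq2
ATAT
AAGC
"""

if __name__ == "__main__":
    # One tab-separated line per sequence: name, length, GC percentage.
    for name, seq in parse_fasta(EXAMPLE_FASTA).items():
        print(f"{name}\t{len(seq)}\t{gc_content(seq):.1f}")
```

The same loop works unchanged for 2 sequences or 2,000, which is the whole point of scripting a task rather than clicking through it.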

3. Learn R. More on this later.

Advanced

RNA-Seq

I have used different packages such as DESeq2, edgeR, and limma for RNA-Seq data analysis. For starters, it is really important to understand the between-library normalization problem these packages try to solve. StatQuest has some great videos explaining these things. Here’s an index of their YouTube channel:

Video Explaining RNA-Seq Normalization Methods
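To make the normalization problem concrete, here is a simplified sketch of the median-of-ratios idea that DESeq2 uses to estimate per-library size factors. It is written in plain Python for illustration only and is not the package's actual code; edgeR's default (TMM) is a different method. The function names and the toy count matrix are assumptions of mine.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: list of rows, one row per gene, one raw count per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample, because the
    # geometric mean (computed via logs) is undefined when a count is zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of each gene's geometric mean across samples (a pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        # The median is robust to a few strongly changed genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy data: sample 2 was sequenced about twice as deeply as
# sample 1, so raw counts differ even though expression is the same.
TOY_COUNTS = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # contains a zero, so it is skipped when estimating factors
]

if __name__ == "__main__":
    print(size_factors(TOY_COUNTS))
    for row in normalize(TOY_COUNTS):
        print(row)
```

In the toy data the first three genes end up with equal normalized values in both samples, which is the desired outcome: the difference in raw counts came from sequencing depth, not from biology.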

Pipeline for doing RNA-Seq analysis

There are also some pipelines I have followed for this analysis. Here are some links to them:

[Updated: April 8, 2019]
