How to make Co-phylogeny plot: easy tanglegram in R

Apr 20, 2022


A tanglegram is a representation of co-phylogeny in which two phylogenetic trees are drawn facing each other and the same taxa in both trees are linked by lines. This method is super useful for visualizing common traits shared by both trees. For example, it can be used to visualize host-pathogen (or host-symbiont) evolution and to check whether there is any phylogenetic concordance between the two trees.

I needed to visualize the co-phylogeny of phylogenetic trees reconstructed from chromosomal and symbiotic genes. Surprisingly, I didn't find any straightforward solution in R for drawing tanglegrams. In particular, I wanted to leverage the beautiful ggtree package. After trying out several methods, I found that the following approach works well for me so far. I have released a small R package based on it, which can be found on GitHub.

In this post, I'm going to use two toy trees in the following Newick format. Note that they contain the same isolates but have different tree topologies (presumably because different gene sets were used to reconstruct them).

The datasets used in this tutorial can be downloaded from here.

Tree 1: (((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);
Tree 2: (((((((F:8,I:18,G:4):2,C:5):3,M:6):3,E:21):10,((A:2,B:2):4):3):13,((K:5,L:2):20,(H:18,J:11):2):17):4,D:56);

We might want to visualize one or more features (e.g., genotype) associated with the isolates in both trees. Our meta file looks like this (only isolates A to F are annotated):

Isolate	Genotype
A	Green
B	Green
C	Green
D	Green
E	Red
F	Red
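If you want to follow along without the download, the sketch below writes the two toy trees and the meta table above to a temporary directory using base R only. The file names (tree1.nwk, tree2.nwk, meta.csv) are assumptions chosen to mirror the paths used below; adjust them to your setup.

```r
# Write the toy inputs from this post to a temporary directory (base R only).
# The file names are assumptions that mirror the paths used in the tutorial.
out_dir <- tempdir()

tree1_nwk <- "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
tree2_nwk <- "(((((((F:8,I:18,G:4):2,C:5):3,M:6):3,E:21):10,((A:2,B:2):4):3):13,((K:5,L:2):20,(H:18,J:11):2):17):4,D:56);"

writeLines(tree1_nwk, file.path(out_dir, "tree1.nwk"))
writeLines(tree2_nwk, file.path(out_dir, "tree2.nwk"))

# Meta table: the first column must hold the same names as the tree tip labels
meta <- data.frame(
  Isolate  = c("A", "B", "C", "D", "E", "F"),
  Genotype = c("Green", "Green", "Green", "Green", "Red", "Red")
)
write.csv(meta, file.path(out_dir, "meta.csv"), row.names = FALSE)
```

You can then replace '~/path/to/meta.csv' and the tree paths in the code below with file.path(out_dir, "meta.csv") and so on.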

Let’s load the necessary packages.

library(ggplot2)
library(ggtree)
library(phangorn)
library(dplyr)

The best thing about ggtree is that you can attach any feature(s) associated with the isolates from a simple CSV file. Just make sure that the first column of your meta file contains the same isolate names as the tree tip labels. Let's load the meta file and both phylogenetic trees. I also like to midpoint-root the trees at this point.

# Meta file
meta <- read.table('~/path/to/meta.csv', sep = ',', header = TRUE)

# Load tree 1
tree1 <- read.tree('~/path/to/tree1.nwk')
tree1 <- midpoint(tree1)

# Load tree 2
tree2 <- read.tree('~/path/to/tree2.nwk')
tree2 <- midpoint(tree2)
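Because the tanglegram links tips by label, it can help to check up front that both trees contain the same isolates and that the isolates in the meta file occur in the trees. With ape loaded (phangorn attaches it), tree1$tip.label already holds the labels; the sketch below instead pulls them straight from the Newick text with a regular expression so that it runs in base R. newick_tips() is a hypothetical helper that assumes unquoted labels, each followed by a branch length, and unlabeled internal nodes, as in the toy trees.

```r
# Hypothetical helper: extract tip labels from a Newick string.
# Assumes unquoted labels, each immediately followed by ":<branch length>",
# and unlabeled internal nodes (true for the toy trees in this post).
newick_tips <- function(nwk) {
  regmatches(nwk, gregexpr("[^(),:;\\s]+(?=:)", nwk, perl = TRUE))[[1]]
}

tree1_nwk <- "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
tree2_nwk <- "(((((((F:8,I:18,G:4):2,C:5):3,M:6):3,E:21):10,((A:2,B:2):4):3):13,((K:5,L:2):20,(H:18,J:11):2):17):4,D:56);"
meta_isolates <- c("A", "B", "C", "D", "E", "F")

tips1 <- newick_tips(tree1_nwk)
tips2 <- newick_tips(tree2_nwk)

# Same isolates in both trees? (order does not matter)
setequal(tips1, tips2)                              # TRUE for the toy trees
# No duplicated tip labels within either tree?
anyDuplicated(tips1) == 0 && anyDuplicated(tips2) == 0   # TRUE
# Isolates listed in meta but absent from the trees
setdiff(meta_isolates, union(tips1, tips2))         # character(0)
```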

Let’s combine the meta feature dataset with both phylogenetic trees and visualize how they look.

t1 <- ggtree(tree1) %<+% meta + geom_tiplab()
t2 <- ggtree(tree2) %<+% meta + geom_tiplab()

t1
t2

Tree 1

Tree 2
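Conceptually, %<+% attaches the meta columns to the tree data by matching the first column of the meta file against the tip labels, much like a left join. The base-R sketch below is not ggtree's implementation; it only illustrates the consequence that isolates missing from the meta file (G to M in the toy example) get NA in Genotype. tip_data is a hypothetical stand-in for a few tip rows of t1$data.

```r
# Hypothetical stand-in for a few tip rows of a ggtree data frame (t1$data)
tip_data <- data.frame(label = c("A", "B", "E", "G", "M"),
                       isTip = TRUE)

meta <- data.frame(Isolate  = c("A", "B", "C", "D", "E", "F"),
                   Genotype = c("Green", "Green", "Green", "Green", "Red", "Red"))

# Left join: keep every tip, add meta columns where the label matches
annotated <- merge(tip_data, meta, by.x = "label", by.y = "Isolate",
                   all.x = TRUE, sort = FALSE)

# Sort by label for display; G and M have no meta entry, so Genotype is NA
annotated <- annotated[order(annotated$label), ]
annotated
```

Those NA values are why the subsetting step further down wraps the comparison in which().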

Now we are going to draw both trees in a single figure. We also want to mirror tree 2 so that its tips face tree 1, which means changing the x-coordinates of tree 2.

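# Extract the plotting data frames behind both ggtree objects
d1 <- t1$data
d2 <- t2$data

# Tag each data frame with the tree it came from
d1$tree <- 't1'
d2$tree <- 't2'

# Mirror tree 2 and shift it to the right of tree 1
d2$x <- max(d2$x) - d2$x + max(d1$x) + max(d1$x)*0.3
pp <- t1 + geom_tree(data=d2)
pp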

In the above code block, we extract the underlying data frame from each ggtree object and update the x-coordinates of tree 2 with max(d2$x) - d2$x + max(d1$x) + max(d1$x)*0.3. The first two terms mirror tree 2, so its root ends up on the right and its tips face tree 1; adding max(d1$x) and max(d1$x)*0.3 shifts it past the tips of tree 1, leaving a gap of 30% of tree 1's width. You can try different values depending on the branch-length units of your trees to get a good visualization (in particular, I suggest tuning the max(d1$x)*0.3 term).

Two phylogenetic trees, face-to-face.
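To see what the flip does, here is the same arithmetic wrapped in a small base-R function and applied to toy coordinates. flip_x() and gap_frac are hypothetical names; gap_frac = 0.3 corresponds to the max(d1$x)*0.3 term. The deepest tip of tree 2 lands at max(x1) * (1 + gap_frac), its root moves to the far right, and the distances between nodes are preserved, only mirrored.

```r
# Mirror tree 2 horizontally and place it to the right of tree 1.
# x2, x1: node x-coordinates (distance from the root) of tree 2 and tree 1.
# gap_frac: gap between the deepest tips of the two trees, as a fraction of
# tree 1's width (0.3 reproduces the max(d1$x)*0.3 term above).
flip_x <- function(x2, x1, gap_frac = 0.3) {
  max(x2) - x2 + max(x1) + max(x1) * gap_frac
}

x1 <- c(0, 4, 10)   # toy tree 1: root at 0, deepest tip at 10
x2 <- c(0, 5, 20)   # toy tree 2: root at 0, deepest tip at 20

flip_x(x2, x1)
# [1] 33 28 13   (root at 33, deepest tip at 13, i.e. 3 units right of tree 1)
```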

Let's stack d1 and d2 into a single data frame and keep only the tips, so that we can use the tip coordinates to draw connections between the two trees.

dd <- bind_rows(d1, d2) %>% 
  filter(isTip == TRUE)
dd1 <- as.data.frame(dd)
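geom_line(aes(x, y, group=label)) draws one line through all rows that share a label, so the linking assumes that every isolate occurs exactly once in each tree. Here is a quick base-R check, shown on a hypothetical toy version of dd1 in which isolate D is deliberately missing from tree 2.

```r
# Hypothetical toy version of dd1: tip rows of both trees, stacked.
# Isolate D is deliberately present only in tree 1.
dd1 <- data.frame(
  label = c("A", "B", "C", "D", "A", "B", "C"),
  x     = c(10, 10, 9, 8, 13, 13, 15),
  y     = c(1, 2, 3, 4, 2, 3, 1),
  tree  = c("t1", "t1", "t1", "t1", "t2", "t2", "t2")
)

# How often each label occurs in each tree
counts <- table(dd1$label, dd1$tree)
counts

# Labels that do not occur exactly once in each tree cannot form a
# clean two-point line with group = label
names(which(rowSums(counts != 1) > 0))   # "D"
```

On the real data, the same table(dd1$label, dd1$tree) call should show a 1 in every cell.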

Now we connect the tips of both trees for the isolates that have the feature we are interested in (here, Genotype == 'Green'). Each line links the same isolate in the two trees.

green_tree <- dd1[which(dd1$Genotype == 'Green'), c('label', 'x', 'y', 'tree')]
pp + geom_line(aes(x, y, group=label), data=green_tree, color='#009E73')

Connecting the isolates
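The which() in the subsetting step matters: isolates without a meta entry have NA in Genotype, so == returns NA for them, and indexing a data frame with NA produces rows of NAs; which() keeps only the TRUE positions. The base-R sketch below, on a hypothetical toy version of dd1, also shows %in% as an alternative that treats NA as no match and extends to several genotypes at once.

```r
# Hypothetical toy version of dd1 after %<+%; isolate G has no meta entry
dd1 <- data.frame(
  label    = c("A", "E", "G", "A", "E", "G"),
  Genotype = c("Green", "Red", NA, "Green", "Red", NA),
  tree     = rep(c("t1", "t2"), each = 3)
)

# '==' yields NA for G: TRUE FALSE NA TRUE FALSE NA
dd1$Genotype == "Green"
# which() keeps only the TRUE positions, so G is dropped cleanly
green <- dd1[which(dd1$Genotype == "Green"), ]

# %in% returns FALSE for NA and accepts several genotypes at once
linked <- dd1[dd1$Genotype %in% c("Green", "Red"), ]
unique(linked$label)   # "A" "E"
```

With ggplot2 you could then color each link by its genotype, e.g. pp + geom_line(aes(x, y, group=label, color=Genotype), data=linked); treat this call as an untested starting point.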

Lily asked in the comment section whether it is possible to connect all the tip labels of both trees. The previous code chunk connects only the tips selected by a subset of meta. To connect all tips, no subsetting is needed:

pp + geom_line(aes(x, y, group=label), data=dd1)

Here, label is the tip-label column shared by both trees, so group=label pairs the two tips of each isolate. You can check it with head(dd1).

This may look messy because I am using arbitrary toy trees. However, if your co-phylogeny has a particular pattern, the tanglegram will show it.

Due to the popularity of this tutorial, I have released a small R package that helps you draw simple tanglegrams from two ggtree objects.

The package is called TangleR and is currently available on GitHub.

You can install it from R with the following commands:

library("devtools")
install_github('acarafat/tangler')

Here’s how to use this TangleR package:

library(ggtree)
library(ape)      # provides read.tree()
library(tangler)

# Load meta
meta <- read.csv('tree_meta.csv', header = TRUE)


# Load tree 1
t1 <- read.tree('tree1.nwk')

# Draw tree 1 with ggtree, attach meta, and color the tips by Genotype
tree1 <- ggtree(t1) %<+% meta +
  geom_tippoint(aes(color=Genotype))

# Load tree 2
t2 <- read.tree("tree2.nwk")
tree2 <- ggtree(t2) %<+% meta

# Draw Tanglegram
simple.tanglegram(tree1, tree2, Genotype, Green, tiplab = TRUE)

I hope this is useful. Please let me know if you need any other features or find bugs! Thanks!

Comments

20 responses to “How to make Co-phylogeny plot: easy tanglegram in R”

Eric

June 2, 2024

Hi, thank you for this great article, there's no easier way to create a tanglegram than yours! I am trying to use the new TangleR package and the example on this page doesn't work. I'm getting this error:

Error in unique.default(x, nmax = nmax) :
unique() applies only to vectors

Any idea where it could come from?
Thanks !
Eric

1. Arafat
  
  June 3, 2024
  
  Thank you so much for reporting the issue. There was a small bug in the simple.tanglegram function, which has been corrected. Please test it again, and let me know if it has been fixed! Best, Arafat!
  
Jianshu

March 30, 2024

Hi, in my case, several tips in one tree have the same name (tip label), and that name also appears in the second tree. I want to link all such tips in tree 1 to the matching tip in tree 2. With the code provided, tips with the same label within tree 1 get linked to each other: of two same-named tips in tree 1, only one is linked to tree 2, and the other is linked to its twin in tree 1. How should I adjust the code to get what I want?

1. Arafat
  
  April 2, 2024
  
  Hello! The strain names have to be unique in ggtree. But you can add a second column to your meta file that holds the shared names, show it with geom_tiplab(aes(label = <that column>)), and then use the new column to connect the lines. This might work, please try!
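A minimal base-R sketch of that idea, assuming a hypothetical meta column shared_name that holds the (possibly repeated) name while the tip labels themselves stay unique. With group=shared_name, geom_line() would thread a single line through every tip carrying that name, including two tips in the same tree, which matches the behavior described in the question. Building explicit pairs instead, one row for every tree-1 tip matched with every tree-2 tip of the same name, avoids that.

```r
# Hypothetical tip coordinates: labels are unique, shared_name may repeat
tips1 <- data.frame(label = c("X1", "X2", "Y1"),
                    shared_name = c("X", "X", "Y"),
                    x = c(10, 10, 10), y = c(1, 2, 3))
tips2 <- data.frame(label = c("X3", "Y2"),
                    shared_name = c("X", "Y"),
                    x = c(13, 13), y = c(2, 1))

# Many-to-many merge: one row per (tree-1 tip, tree-2 tip) pair sharing a name
pairs <- merge(tips1, tips2, by = "shared_name", suffixes = c("_t1", "_t2"))
pairs[, c("label_t1", "label_t2", "x_t1", "y_t1", "x_t2", "y_t2")]
```

Each row of pairs could then be drawn as one segment with ggplot2, e.g. pp + geom_segment(aes(x = x_t1, y = y_t1, xend = x_t2, yend = y_t2), data = pairs); this layer is an untested suggestion.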
  
Rehemah Gwokyalya

October 1, 2023

Hi, this is really great. Thanks for sharing it with us. I am wondering if it is possible to subset based on what is present in both data sets, i.e., x > 0 in both the A and B data sets?

1. Arafat
  
  October 2, 2023
  
  Hi, thanks for commenting. If I understand your question correctly, you can make a new column based on a condition of interest, and use the new column to subset and plot connected lines.
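A minimal base-R sketch of that suggestion. valueA, valueB, and both_positive are hypothetical column names standing in for the two data sets and the derived condition.

```r
# Hypothetical meta with one numeric measurement per data set
meta <- data.frame(Isolate = c("A", "B", "C", "D"),
                   valueA  = c(1.2, 0, 3.1, 0.5),
                   valueB  = c(0.4, 2.0, 0, 0.7))

# New column: TRUE only when the condition holds in both data sets
meta$both_positive <- meta$valueA > 0 & meta$valueB > 0
meta$Isolate[meta$both_positive]   # "A" "D"
```

After attaching meta with %<+%, the new column can drive the subset in the same way as Genotype in the post, e.g. dd1[which(dd1$both_positive), c('label', 'x', 'y', 'tree')].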
  
Manuela

July 14, 2023

Hi! Is there a way to rotate nodes on the trees? I rotate them before adding the meta data to the tree file but I keep getting the same tree order.

Also, is there also a way to make the lines different colors, like make each clade a different color?

Thank you

1. Arafat
  
  July 19, 2023
  
  I think this is possible.
  
  For rotation, you have to rotate them after adding meta data to the tree, or after you connect two trees side by side.
  
  To selectively colorize different clades, you can use this tutorial: https://yulab-smu.top/treedata-book/chapter6.html
  
Dylan H. Cohen

October 26, 2022

Is there any way to connect the line from the end of the tip label to the other tree, so that the line does not cross through the tip label? Thanks!

1. Arafat
  
  November 17, 2022
  
  Sorry for the late response. I think that is possible. I’ll try and come back to this. Thanks for the idea!
  
2. Arafat
  
  February 7, 2024
  
  Yes, if you check how it is plotting the line, geom_line(aes(x, y, group=label), data=green_tree, color='#009E73'), you just need to update the x coordinate in the green_tree data set.
  
  For each label, there are two entries here, one for tree 1 and another for tree 2.
  
  If you add a constant to the x-coordinate for tree 1 and subtract the same constant from the x-coordinate for tree 2, that will do!
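A minimal base-R sketch of that adjustment on a hypothetical toy green_tree. The offset value is an assumption; in practice it would be roughly the width of the tip labels in branch-length units. The gap set by max(d1$x)*0.3 in the post has to be larger than twice the offset, otherwise the shifted endpoints cross over.

```r
# Hypothetical toy version of green_tree: one row per isolate per tree
green_tree <- data.frame(label = c("A", "B", "A", "B"),
                         x     = c(10, 10, 13, 13),
                         y     = c(1, 2, 2, 1),
                         tree  = c("t1", "t1", "t2", "t2"))

# Assumed offset: about the width of the tip labels, in branch-length units
offset <- 1

# Move tree-1 endpoints right (past the labels) and tree-2 endpoints left
green_tree$x <- ifelse(green_tree$tree == "t1",
                       green_tree$x + offset,
                       green_tree$x - offset)
green_tree$x   # 11 11 12 12
```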
  
  Sorry for the late reply 🙁
  
Astrid

October 12, 2022

What if you have two different trees that do not have the same tip labels? For example, a phylogenetic tree based on a core genome and a protein-based tree, where you want to show which genomes in the core tree harbour the protein in the second tree. Thanks!

1. Arafat
  
  November 17, 2022
  
  That is definitely possible. Please give me some time, and I’ll update the main post. Sorry for the late reply. Best.
  
Fernando Hayashi

July 27, 2022

I was not able to find how the variable dd1 was created. Could you please clarify?

1. Arafat
  
  July 27, 2022
  
  Sorry about that! I just added it to the main post:
  
  dd <- bind_rows(d1, d2) %>%
  filter(isTip == TRUE)
  dd1 <- as.data.frame(dd)
  
  1. Fernando Hayashi
    
    July 27, 2022
    
    Thank you very much!
    
Alex

May 25, 2022

Hi, really clear description of how to draw tanglegrams within R. Would it be possible to extend this to more than two trees?

1. Arafat
  
  May 25, 2022
  
  That is a really interesting idea. I think it is technically possible. Let me try it, and I'll come back to you soon! Thanks for reading 🙂
  
Lily

May 11, 2022

Hi, this is great thanks. How would you edit your code to link the tips from the two trees not by your meta file, but by matching tiplabels? Thanks! Lily

1. Arafat
  
  May 25, 2022
  
  Yes, you can do that. You do not need to subset anything; just use the following command in R instead:
  
  pp + geom_line(aes(x, y, group=label), data=dd1)
  
  Please check the article, I have updated it!
  
  Thanks for reading 🙂
  