Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve tip lengths in dendrogram (similarly to plot.hclust()) #368

Open
almeidasilvaf opened this issue Mar 25, 2024 · 2 comments
Open

Comments

@almeidasilvaf
Copy link

Hi, @thomasp85

First of all, thank you for this great package!

I've been trying to use {ggraph} to create a dendrogram that preserves tip lengths as plot.hclust() does, but I can't seem to make it work. I was wondering if there is already an easy to do this with geom_edge_elbow().

In the reprex below, I demonstrate that I can do what I want with a hacky way by using data produced by {ggdendro}, but I don't like this approach. Is there a more elegant way of doing the same using {ggraph}?

suppressPackageStartupMessages(library(ggplot2))

# Simulate data
simdata <- matrix(
    rnorm(100 * 50, mean = 10, sd = 2),
    nrow = 100
)
cormat <- cor(t(simdata))
hc <- hclust(as.dist(1 - cormat), method = "average")

# 1) The base R way
plot(hc, labels = FALSE)

# 2) The {ggraph} way
p1 <- ggraph::ggraph(
    hc, layout = "dendrogram",
    height = .data$height
) +
    ggraph::geom_edge_elbow()

p1

# 3) The {ggdendro} way
p2 <- ggdendro::ggdendrogram(hc, labels = FALSE)
p2

# 4) The hacky way: Extracting x, y, xend, and yend coords with {ggdendro}
ddata <- ggdendro::dendro_data(hc, type = "rectangle")$segments
head(ddata)
#>           x         y      xend      yend
#> 1 39.324219 1.0234122 15.273438 1.0234122
#> 2 15.273438 1.0234122 15.273438 1.0006401
#> 3 15.273438 1.0006401  6.015625 1.0006401
#> 4  6.015625 1.0006401  6.015625 0.9593656
#> 5  6.015625 0.9593656  3.125000 0.9593656
#> 6  3.125000 0.9593656  3.125000 0.8316230

ddata$yend[ddata$yend < 0.05] <- ddata$y[ddata$yend < 0.05] - 0.05

p3 <- ggplot(ddata) +
    geom_segment(
        aes(x = .data$x, y = .data$y, xend = .data$xend, yend = .data$yend), 
        linewidth = 0.3
    ) +
    coord_cartesian(ylim = c(0, NA))

p3

Created on 2024-03-25 with reprex v2.1.0

Best,
Fabricio

@KlausVigo
Copy link

Hi @almeidasilvaf,
if you apply hclust the tree is ultrametric, that means that the distance from each tip to the root is equal. So p2 is actually preserving the edge lengths, while p1 is shortening external edges. The external edges in p1 are also all the same, but if you don't want the edges to be shortened, use

plot(hc, hang=-1)

In phylogenetic hclust(..., method = "average") is better known as UPGMA. plot.hclust additionally multiplies the edges by a factor of two, so interpretation of plot.hclust is different to phylogenetic trees and easy to mislead the reader relying on default values. Here in comparison the ape representation:

library(ape)
plot(as.phylo(hc))
axisPhylo()
max(cophenetic(hc))
# The distance of the root from the tips is half the cophenetic distance.

Kind regards,
Klaus

@almeidasilvaf
Copy link
Author

Hi, @KlausVigo

That's interesting, I didn't know that hclust() generated ultrametric trees by default. In my case, I really want to generate plots like plot.hclust() does (with shortened external edges) from hclust objects using the ggplot2 system. Would you know a way of extracting the coordinates that plot.hclust() uses?

I could also rely on phylo objects (although, in my case, I am not working with phylogenies, it's really hierarchically clustered dissimilarity matrices) and create the plots with the {ggtree} package. The problem is that {ggtree} is extremely heavy and can easily lead to a huge increase in complexity in the dependency graph of the package where I want to implement this functionality.

Best,
Fabricio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants