Better cooperation of as_data_frame() and graph_from_data_frame() ? #223

Open
mbojan opened this issue Aug 30, 2017 · 6 comments · May be fixed by #758
Labels
bug an unexpected problem or unintended behavior

@mbojan

mbojan commented Aug 30, 2017

It would be great to have the following functionality back:

dlist <- as_data_frame(g, what="both")
graph_from_data_frame(dlist$edges, vertices=dlist$vertices)

In other words, using (parts of) the output of as_data_frame() directly in graph_from_data_frame(). I recall this was possible before the recent API changes. The above works if g has vertex names, but fails if it does not. See the following reprex:

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
g <- make_graph(~ A, B -- C, C -- D)
V(g)$a <- letters[1:4]
g1 <- delete_vertex_attr(g, "name")

# 1
dlist <- as_data_frame(g, what="both")
# Works if `g` has names
r <- graph_from_data_frame(dlist$edges, vertices=dlist$vertices, directed=is_directed(g))
identical_graphs(g, r) # Why?
#> [1] FALSE
identical(dlist, as_data_frame(r, what="both"))
#> [1] TRUE

# 2
dlist1 <- as_data_frame(g1, what="both")
# Won't work if `g` does not have names
graph_from_data_frame(dlist1$edges, vertices=dlist1$vertices, directed=is_directed(g1))
#> Error in graph_from_data_frame(dlist1$edges, vertices = dlist1$vertices, : Some vertex names in edge list are not listed in vertex data frame

In particular:

  1. I think the main reason that this connection is lost is that as_data_frame( . , what="vertices") does not include the vertex IDs as the first column anymore.
  2. I don't understand why g and r are not identical_graphs in # 1. Both edge and vertex data frames are identical. Am I missing something?
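
A possible workaround for point 1, as a minimal sketch: prepend the numeric vertex IDs as the first column of the vertex data frame, so that graph_from_data_frame() can match them against the edge endpoints, and drop the resulting `name` attribute afterwards. The column name `id` and the object names `verts` and `r1` are illustrative assumptions, and this has not been checked against every igraph version.

```r
library(igraph, warn.conflicts = FALSE)

g <- make_graph(~ A, B -- C, C -- D)
V(g)$a <- letters[1:4]
g1 <- delete_vertex_attr(g, "name")

dlist1 <- as_data_frame(g1, what = "both")

# Put the vertex IDs in the first column; graph_from_data_frame()
# matches edge endpoints against that column (as strings).
verts <- data.frame(id = seq_len(vcount(g1)), dlist1$vertices)

r1 <- graph_from_data_frame(dlist1$edges, vertices = verts,
                            directed = is_directed(g1))

# The IDs end up as a `name` attribute; remove it to get an unnamed graph.
r1 <- delete_vertex_attr(r1, "name")
```

This restores the structure and the `a` attribute of `g1`, but it does not by itself guarantee that identical_graphs(g1, r1) is TRUE.
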
@clhunsen
Contributor

  2. I don't understand why g and r are not identical_graphs in # 1. Both edge and vertex data frames are identical. Am I missing something?

We have the same problem in our scripts. This seems weird as the same function call to identical_graphs works in other places.

What can we do to help debug this?


@bockthom, @hechtlC: FYI.

@clhunsen
Contributor

Quick update to this: When I try a cascade of directed = TRUE and as.undirected(r), it works...

g <- make_graph(~ A, B -- C, C -- D)
V(g)$a <- letters[1:4]
dlist <- as_data_frame(g, what="both")
r <- graph_from_data_frame(dlist$edges, vertices=dlist$vertices, directed=TRUE)
r <- as.undirected(r)
identical_graphs(g, r)
# [1] TRUE

I hope this helps identify the problem.

@clhunsen
Contributor

clhunsen commented Feb 8, 2018

Any news on this?

@krlmlr
Contributor

krlmlr commented Mar 31, 2023

Thanks, confirmed.

The first problem (identical_graphs() returning FALSE in # 1) seems to be due to a spurious difference between an empty character vector and NULL; I need to take a look.

The second problem (the error in # 2) can be fixed with a more careful implementation of graph_from_data_frame().

library(igraph, warn.conflicts = FALSE)

g <- make_graph(~ A, B -- C, C -- D)
V(g)$a <- letters[1:4]
g1 <- delete_vertex_attr(g, "name")

# 1
dlist <- as_data_frame(g, what = "both")
# Works if `g` has names
r <- graph_from_data_frame(dlist$edges, vertices = dlist$vertices, directed = is_directed(g))
identical_graphs(g, r) # Why?
#> [1] FALSE
waldo::compare(unclass(g)[1:9], unclass(r)[1:9])
#> `names(old[[9]][[4]])` is a character vector ()
#> `names(new[[9]][[4]])` is absent

# 2
dlist1 <- as_data_frame(g1, what = "both")
# Won't work if `g` does not have names
graph_from_data_frame(dlist1$edges, vertices = dlist1$vertices, directed = is_directed(g1))
#> Error in graph_from_data_frame(dlist1$edges, vertices = dlist1$vertices, : Some vertex names in edge list are not listed in vertex data frame

Created on 2023-03-31 with reprex v2.0.2
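
To illustrate the kind of difference waldo reports above, here is a base-R sketch with two empty lists that differ only in whether a zero-length `names` attribute is present. It is an analogy, not a reproduction of igraph's internal data structure; `plain_empty` and `named_empty` are illustrative names.

```r
# An empty list with no names attribute at all.
plain_empty <- list()

# An empty list that still carries a zero-length names attribute.
named_empty <- list(a = 1)[0]

names(plain_empty)                   # NULL: names are absent
names(named_empty)                   # character(0): names are present but empty
identical(plain_empty, named_empty)  # FALSE, although both lists are empty
```
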

@krlmlr
Contributor

krlmlr commented Apr 1, 2023

The first problem is now tracked in #756.

@krlmlr
Contributor

krlmlr commented Apr 2, 2023

For the second problem, there are a lot of tests, and probably downstream packages, that rely on graph_from_data_frame() coercing integer vertices to strings. Effectively, this makes it difficult to have graph_from_data_frame() construct an unnamed graph, which seems to be a prerequisite for the roundtrip.

After having tinkered with it, I'd like to change the following in graph_from_data_frame():

  • Only accept data frames and character vectors in the vertices argument
  • Explicitly watch out for a $name attribute in vertices
  • Interpret numeric inputs as vertex IDs in the first two columns of d

We can also extend as_data_frame() to offer an option to always return the edges as numeric vertex IDs.
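
Until such an option exists, numeric endpoints can already be obtained from a named graph with as_edgelist(names = FALSE). A minimal sketch (edge attributes are left out for brevity, and `edges_by_id` is an illustrative name):

```r
library(igraph, warn.conflicts = FALSE)

g <- make_graph(~ A, B -- C, C -- D)

# names = FALSE returns numeric vertex IDs even though `g` is named.
el <- as_edgelist(g, names = FALSE)
edges_by_id <- data.frame(from = el[, 1], to = el[, 2])
```
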

If we want to be backward-compatible, we could require numeric vertex IDs in the edges to be wrapped in I(). It would be cleanest to make this a breaking change, but we need to assess the impact first.
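
A rough sketch of how the I() convention could be detected, assuming that "wrapped" means the column carries the "AsIs" class. Both `is_id_column()` and `graph_from_id_edges()` are hypothetical helpers written for illustration; neither is part of igraph, and the eventual implementation may look quite different.

```r
library(igraph, warn.conflicts = FALSE)

# Hypothetical: a column counts as vertex IDs only if it is numeric
# and was explicitly wrapped in I(), which adds the "AsIs" class.
is_id_column <- function(x) inherits(x, "AsIs") && is.numeric(x)

# Hypothetical: build an unnamed graph on `n` vertices from ID-based edges.
graph_from_id_edges <- function(d, n, directed = TRUE) {
  stopifnot(is_id_column(d[[1]]), is_id_column(d[[2]]))
  # add_edges() expects endpoints interleaved as from1, to1, from2, to2, ...
  pairs <- as.vector(rbind(as.numeric(d[[1]]), as.numeric(d[[2]])))
  add_edges(make_empty_graph(n = n, directed = directed), pairs)
}

d <- data.frame(from = I(c(2, 3)), to = I(c(3, 4)))
g_ids <- graph_from_id_edges(d, n = 4, directed = FALSE)
```

Plain numeric columns (without I()) would fail the `is_id_column()` check here, which mirrors the idea that they keep being coerced to vertex names for backward compatibility.
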

@krlmlr krlmlr linked a pull request Apr 2, 2023 that will close this issue
@krlmlr krlmlr added this to the triage milestone Feb 20, 2024