
AFLW calc_scags_wide() computation time and errors due to interp package #13

Open
harriet-mason opened this issue Jul 1, 2022 · 0 comments


The AFLW example from the UseR! talk and working paper no longer runs. The code for the example is below.

library(fitzRoy)
library(dplyr)
library(interp)
library(cassowaryr)

# Get the data (used below to find one scatter plot that computes instantly and one that is slow)
aflw <- fetch_player_stats(2020, comp = "AFLW")

aflw_num <- aflw %>%
  select_if(is.numeric)

aflw_num <- aggregate(aflw_num[,5:37],
                      list(aflw$player.player.player.surname),
                      mean)

# This calculation, used in the paper and package, is estimated to take 4 h and returns an error after 30 min
AFLW_scags <- calc_scags_wide(aflw_num[,c(2:34)])

There are two issues with this calculation.

  1. The computation time has increased from approximately 12 minutes to 4 hours. This is due to a handful of plots that were previously instantaneous under tripack now taking over a minute each. There is no clear visual feature that causes this, and the only way to find out whether a plot has a large number of edges is to run the Delaunay triangulation. Tracing the slowdown back to its source gives:
    cassowaryr::scree > alphahull::delvor > interp::tri.mesh > interp:::shull.deltri
    The function interp::tri.mesh is a replacement for tripack::tri.mesh (this is where the packages differ), but interp:::shull.deltri is not written in R; it is a wrapper for a C++ function. So even though the shull library is open source, the fact that it is written in C++ makes these errors hard to fix. An example of a scatter plot that was previously instantaneous and now takes over a minute is below.
# three vectors (one is problematic)
a <- aflw_num$goals
b <- aflw_num$behinds
c <- aflw_num$handballs
# a,b is fast, a,c is slow (1.5mins)

# convert a and c into the input that reaches shull
# inside cassowaryr::scree()
ac <- cbind(cassowaryr:::unitize(a), cassowaryr:::unitize(c))
dupes <- paste(a, c, sep =",")
ac <- ac[!duplicated(dupes),]

#inside alphahull::delvor()
y <- NULL
AC <- xy.coords(ac, y)
tri.obj1 <- interp::tri.mesh(AC) #1min 20seconds
tri.obj2 <- tripack::tri.mesh(AC) #instantaneous
  2. There is an "error counting arcs" error that I cannot check for pre-emptively and that tripack did not return.
# Scatter plot that throws an error
d <- aflw_num$totalPossessions
e <- aflw_num$contestedPossessions
scree(d,e) # returns an error from the shull.deltri function

Since the error comes from the C++ code, I can't debug it.
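
Until the underlying problem is fixed, one possible stopgap is to run each pair of variables separately, recording the elapsed time and catching any error, so that a single bad pair does not abort the whole run and the slow pairs can be listed. The sketch below uses base R only. `timed_try` and `scan_pairs` are hypothetical helpers, not part of cassowaryr, and the stub function only stands in for a pairwise call such as `scree(x, y)`.

```r
# Hypothetical helper (not part of cassowaryr): call f(x, y), record the
# elapsed time, and capture any error message instead of stopping.
timed_try <- function(f, x, y) {
  err <- NA_character_
  value <- NULL
  elapsed <- system.time(
    value <- tryCatch(f(x, y), error = function(e) {
      err <<- conditionMessage(e)  # keep the message, return NULL as the value
      NULL
    })
  )[["elapsed"]]
  list(value = value, error = err, elapsed = elapsed)
}

# Hypothetical helper: apply timed_try to every pair of columns and
# return one row per pair with the elapsed time and error message (if any).
scan_pairs <- function(data, f) {
  pairs <- utils::combn(names(data), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    res <- timed_try(f, data[[p[1]]], data[[p[2]]])
    data.frame(x = p[1], y = p[2], elapsed = res$elapsed,
               error = res$error, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy data and a stub that fails on some input, standing in for a call
# that errors inside compiled code.
toy <- data.frame(u = c(1, 2, 3), v = c(2, 4, 6), w = c(0, 0, 0))
stub <- function(x, y) {
  if (all(y == 0)) stop("error counting arcs")
  cor(x, y)
}
report <- scan_pairs(toy, stub)

# With the real data this might look like (assuming scree(x, y) as used above):
# report <- scan_pairs(aflw_num[, 2:34], cassowaryr::scree)
```

This only identifies the pairs that are slow or that error; it does not interrupt a slow call and does not address the cause in the C++ code.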
