Improved R support (object serialization, environment isolation) #5022

Open
D3SL opened this issue May 2, 2024 · 0 comments
Labels
enhancement Polish or UX improvements


D3SL commented May 2, 2024

If I understand correctly, R blocks work by running Rscript on their contents and serializing any input or output as a CSV file. If so, each R block can run completely arbitrary code, natively at that, but at the cost of complete isolation: a block can only ingest or export a single tabular object.

This means, for example, that it's not possible to set a variable in either the R global environment or Mage itself for use in other blocks, to output non-dataframe objects, or to output multiple objects (even bundled in a list).

I think there are multiple ways to address this. The most comprehensive is using rpy2 combined with reticulate for an actual two-way connection between R and Mage's internal Python environments. R's qs package, which can serialize pretty much every type of R object, may also be helpful.

In parallel, for dataframes in particular, using the Apache Arrow format would be a significant optimization over CSV. Packages for it exist in both R and Python.
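To make the CSV cost concrete beyond raw speed: CSV carries no type information, so every value crossing a block boundary comes back as text and has to be re-parsed or guessed. The Python sketch below uses only the standard library and does not reflect Mage's actual handoff code; the column names and values are made up for illustration. Arrow-based formats store a schema alongside the data, which avoids this re-parsing step.

```python
import csv
import io
from datetime import date

# Hypothetical block output: one table with mixed column types.
rows = [
    {"id": 1, "score": 0.5, "seen": date(2024, 5, 2), "note": None},
    {"id": 2, "score": 1.5, "seen": date(2024, 5, 9), "note": "ok"},
]


def csv_round_trip(records):
    """Write records to CSV text and read them back, as a CSV handoff would."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


restored = csv_round_trip(rows)

# Every value is now a string; the missing note has become an empty string.
for name, value in restored[0].items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

After the round trip, the integer, float, date, and missing value are all plain strings, so the receiving block has to infer the types again.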

@wangxiaoyou1993 added the enhancement (Polish or UX improvements) label on May 9, 2024

2 participants