If I understand correctly, R blocks work by running Rscript on their contents and serializing any input or output as a CSV file. If so, each R block can run completely arbitrary code, natively at that, but at the cost of complete isolation and of being able to ingest or export only a single tabular object per block.
This means, for example, that it's not possible to set a variable in either the R global environment or Mage itself for use in other blocks, to output non-dataframe objects, or to output multiple objects (even bundled in a list).
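To make the limitation concrete, here is a rough sketch of what a CSV handoff does to values. It uses only Python's standard library and is not Mage's actual implementation; `csv_round_trip` and the sample rows are made up for illustration.

```python
import csv
import io


def csv_round_trip(rows):
    """Write a list of dicts to CSV text and read it back,
    roughly as a CSV-based handoff between blocks would."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


# Hypothetical block output: typed scalars plus a nested (non-tabular) value.
original = [
    {"id": 1, "score": 0.5, "tags": ["a", "b"]},
    {"id": 2, "score": 1.5, "tags": []},
]

restored = csv_round_trip(original)

# Every value comes back as a plain string; the list is only its text form.
print(restored[0])
```

The integers, floats, and lists all come back as strings, so anything that is not already a flat table has to be re-parsed or is lost on the other side.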
I think there are multiple ways to address this. The most comprehensive is to use rpy2 combined with reticulate for an actual two-way connection between R and Mage's internal Python environment. R's qs package, which can serialize pretty much any type of R object, may also be helpful.
In parallel, for dataframes in particular, using the Apache Arrow format would be a significant optimization over CSV. Packages for it exist in both R and Python.