Suggestion #18

statquant · 2021-08-12T14:52:32Z

Hello, I wanted to suggest package additions and removals

nanotime is the first package to handle nanoseconds in R, I think it should be used as a replacement for clock
clustermq is a package that leverages zeromq to send functions to workers on a grid with 0 files involved
qs is a binary format like fst that supports all objects (vs data.frames for fst) though no random access
lubridate is (I think) fairly slow so I am surprised it’s on the list

Many thanks for this universe, I did not know kit, it's great to discover new packages
Regards

SebKrantz · 2021-08-12T15:06:25Z

Hello, thank you! I already considered nanotime, but did not include it for now because my thinking was it provides a specialized class that few people require and those that require it know about it. But I can include it for sure. Lubridate and ggplot2 are on the list because I haven’t quite found convenient replacements for them, and the fastverse should still be somewhat well rounded. I don’t know about the other packages, but you can send a pull request to the development branch, making a new category for reading and writing files.

Otherwise I’ll look at them during the weekend...

statquant · 2021-08-12T15:10:36Z

Hello, I personally never really found lubridate very helpful, totally agree with ggplot2 that should just be there because there is nothing else quite like it. Will do PR (adding links to each repo too while we're at it).

SebKrantz · 2021-08-12T15:16:12Z

Great, thanks! So what do you use for standard Date and POSIXct manipulation? I know clock does it, but is mostly geared towards its own set of classes.

statquant · 2021-08-12T15:22:00Z

For Date I use base::Date, or rather now data.table::IDate, which makes sure the internal representation is integer so that data.table can use it much faster. Most of the time I only need to add or remove a number of days. What operations do you usually have to do?
I use a mix of POSIXct and nanotime; POSIXct values are interchangeable with nanotime. Note there is nanoduration too, which works as a proxy for "time of day" and is very useful.
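
To make the storage point concrete, here is a minimal sketch of the difference between base Date and IDate. It assumes data.table is installed for the IDate part (the code is guarded and skips that part otherwise); the variable names are only illustrative.

```r
# Base Date is stored as a double (days since 1970-01-01).
d <- as.Date("2021-08-12")
base_storage <- typeof(d)
d_plus_week  <- d + 7              # adding days is plain arithmetic

# IDate (from data.table) keeps the same day count as an integer.
# Guarded so the sketch still runs when data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  id <- data.table::as.IDate("2021-08-12")
  idate_storage <- typeof(id)
  id_plus_week  <- id + 7L         # integer arithmetic on the day count
  print(c(base = base_storage, idate = idate_storage))
  print(id_plus_week)
}
print(d_plus_week)
```

Adding or removing days works the same way for both classes; only the underlying storage type differs.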

nickforr · 2021-08-12T15:38:37Z

Hope nobody minds me jumping into this issue but is it worth mentioning the arrow package alongside fst and qs, as the parquet format gives options for sharing binary files with python etc (apologies if fst does this and I’ve missed that)?

SebKrantz · 2021-08-12T15:46:30Z

Thanks @nickforr, as I said, a category for reading and writing files can be added featuring arrow, vroom etc. Just make a PR to the development branch. Also mention the number of dependencies, you can use fastverse_deps(pck, recursive = TRUE).
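
As an illustration of what a recursive dependency count means (as opposed to counting only direct dependencies), here is a small base-R sketch on a made-up dependency map. The names pkgA to pkgE and the helper recursive_deps() are hypothetical and not part of fastverse; on real packages, with a CRAN mirror configured, base R offers tools::package_dependencies(pkg, recursive = TRUE) for the same purpose.

```r
# Hypothetical toy dependency map: these are made-up names, not real packages.
direct_deps <- list(
  pkgA = c("pkgB", "pkgC"),
  pkgB = c("pkgD"),
  pkgC = c("pkgD", "pkgE"),
  pkgD = character(0),
  pkgE = character(0)
)

# Collect every package reachable from `pkg`, i.e. its recursive dependencies.
recursive_deps <- function(pkg, deps) {
  seen  <- character(0)
  queue <- deps[[pkg]]
  while (length(queue) > 0) {
    nxt   <- queue[1]
    queue <- queue[-1]
    if (!nxt %in% seen) {
      seen  <- c(seen, nxt)
      queue <- c(queue, deps[[nxt]])   # follow the dependencies of nxt too
    }
  }
  sort(seen)
}

n_direct    <- length(direct_deps[["pkgA"]])
n_recursive <- length(recursive_deps("pkgA", direct_deps))
print(c(direct = n_direct, recursive = n_recursive))
```

In this toy map pkgA has two direct dependencies but four recursive ones, which is why the two counts for the same package can differ a lot.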

@statquant I know about and have mentioned IDate, but it’s a data.table thing that is not totally portable. As an economist I deal a lot with monthly and quarterly data where I use a mix of lubridate and xts/zoo. We can keep this thread open, I definitely don’t mind good packages like nanotime being added. I’m not yet convinced lubridate should be removed. I also have not benchmarked it tbh, I just know that dependency-wise it is definitely different from the rest of the tidyverse and it serves a lot of common tasks.

SebKrantz · 2021-08-12T18:08:59Z

Just one note to both of you: you need to fork, implement, and send a PR on the development branch. You cannot fork the "main" or "CRAN-Version" branch and send a PR from those to development, as that will include other stuff I don't want in development.

SebKrantz · 2021-08-15T20:34:00Z

I've added nanotime, qs and arrow for now, but perhaps you can still improve on my descriptions and add the links - as you find time.

eddelbuettel · 2021-08-15T20:42:50Z

(Came here late via the commit you just made.... thanks for that)

> So what do you use for standard Date and POSIXct manipulation?

anytime::anydate() and anytime::anytime() really do all I need or want, and never need a format (on sane inputs). Others are slowly copying its design---I need to check when base R did this but it now also 'guesses' over two or three plausible formats on some converters. And lubridate, apparently, also does (or will). But like @statquant I never found a use for lubridate, likely because when it first started it was very slow. The C++ rewrites and pruning of dependencies made it better.

And nanotime is good when you have to deal with sub-microsecond timestamps, as is now common with high-frequency trading data. Its S4 class is pretty sane (thanks to @lsilvest who rewrote my more basic S3 class and then added a ton more useful features). So yes, adding it here makes sense even if it (sadly) is not quite as minimal in its dependencies. At least recursively it still doesn't blow up. I haven't looked at clock at all as we covered the same ground earlier for our needs so 🤷‍♂️
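
A minimal sketch of the "never need a format" point, for readers who have not used anytime. It assumes the anytime package is installed (the call is guarded and skipped otherwise), and the input string is just an example.

```r
# Base R needs an explicit format for a compact input such as "20210815".
base_date <- as.Date("20210815", format = "%Y%m%d")
print(base_date)

# anytime guesses the format itself; guarded so the sketch runs without it.
if (requireNamespace("anytime", quietly = TRUE)) {
  any_date <- anytime::anydate("20210815")
  print(any_date)
}
```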

BenoitLondon · 2021-08-16T02:02:38Z

Hello!
This "package" is a nice idea, I myself used the (defunct?) pkgverse package to build my own package universe...
Sorry to bump on this issue but I wanted to suggest some packages.
Maybe they are too specialized or do not meet your coding standards...

SebKrantz · 2021-08-16T08:51:41Z

Thanks @eddelbuettel for clarifying this about lubridate and nanotime. I think it is good then to have all these packages here. I recall from my use that I didn't find lubridate terribly slow, and indeed it has both C and C++ functions.

Thanks also @BenoitLondon for these suggestions. I did not know about pkgverse, but it's a nice idea. I could create a function fastverse_child() allowing the creation of a 0-dependency extensible verse like the fastverse - for a future release.

Regarding the packages you suggested, I am happy to add stringdist.

The others I think don't qualify because (1) speedglm and ranger are packages to estimate specific kinds of statistical models. The fastverse focuses on general purpose statistical computing and data manipulation, and for good reason: we are talking about more than 50 packages in the estimation category: from various fast lm's, glm's, panel data and time series models (e.g. Kalman Filter), various fast machine learning models (random forests have at least two faster implementations, there are several fast knn and other classifiers). We could add an "Estimation" category as an extra feature at the end of the README file, but then we should try to be comprehensive and also need to move in broad strokes, e.g. just listing the packages under several categories 'linear models', 'time series', 'classifiers', 'imputation' etc. At the moment I certainly don't have time to check out all those packages and determine their dependencies, but if you want to undertake a comprehensive mapping of fast and low-dependency estimation packages I can add it to the README. Just picking out two packages here is definitely not an option, and estimation packages will never be added to the documentation under ?fastverse_extend.

For me, future also does not qualify because it is a parallel computing package. Parallel implementation alone makes nothing fast, it depends on the code that is being parallelized, and C/C++ level parallelism (as you have in data.table, fst, roll etc.) also does a significantly better job at that. In any case, the fastverse includes packages allowing you to write 'fast code' for statistical computing and data manipulation. Everything else is for the "High-Performance Computing" Task View on CRAN. redux appears to be in the same category, although I don't fully understand it.

SebKrantz · 2021-08-16T11:46:32Z

Now added stringdist and the links.

BenoitLondon · 2021-08-16T17:00:37Z

thanks! Yeah I think the ability to create several of our own *-verses is quite nice, depending on what you're working on.

some examples :

data-verse (which could be fastverse)
web-verse (with httr, rvest etc)
pkg-dev-verse (with testing pkg, usethis etc)
ML-verse (with ranger speedglm h2o etc)

Can create a new issue for this if you want?

SebKrantz · 2021-08-16T17:34:46Z

Thank you, yes I can add it as an extra, but the purpose of my package is not to make a verse-creating package, but to emphasize packages with certain desirable properties. Full flexibility to customize this verse, both globally and for specific projects, has already been granted (see vignette). The disadvantage of creating wholly separate verses is that it requires creating a source package which is not available on CRAN, whereas simply adding a configuration file inside a project directory is much easier. So I'll keep it in the back of my head and implement it if feasible. I don't think an extra issue is necessary. Thanks.

BenoitLondon · 2021-08-16T17:46:38Z

sure, makes sense thanks!

SebKrantz · 2021-08-23T23:00:01Z

So @BenoitLondon I have just pushed an update to github which includes a function fastverse_child that does what you want. Feel free to check it out and give feedback.

emmansh · 2021-09-03T13:05:17Z

@SebKrantz this organization is a great initiative. If I may, I would like to point out {rrapply}, which is a great package for dealing with lists. It provides great speed with no dependencies.

SebKrantz · 2021-09-13T18:55:06Z

Thanks @emmansh, this package is interesting. I will check it out.

s3alfisc · 2022-05-21T12:44:32Z

Hi @SebKrantz, maybe this is out of scope for the fastverse, but I wanted to point you towards the dqrng package, which provides very fast sampling of random numbers. Here is a benchmark I did a while back:

library(dqrng)
library(bench)

m <- 1000
n <- 99999
all <- m * n
bm <- bench::mark(samp = sample(x = c(1, -1), size = all, replace = TRUE),
                  dqsamp = dqsample(x = c(1,-1), size = all, replace = TRUE),
                  check = FALSE, 
                  iterations = 3)
bm

# # A tibble: 2 x 13
#   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
# 1 samp          6.37s    6.59s     0.153    1.12GB    0.153     3     3     19.56s
# 2 dqsamp        1.07s    1.43s     0.723    1.12GB    0.482     3     2      4.15s
# # ... with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

SebKrantz · 2022-05-22T19:56:32Z

Thanks @s3alfisc, yes, random number generation is part of general purpose statistical computing and I am happy to include it.

t-wojciech · 2022-09-16T16:06:57Z

Here is a new package ast2ast that translates some functions from R to C++, so they are faster. I think it's worth looking into.

SebKrantz · 2022-09-17T14:43:48Z

Thanks @t-wojciech. I also recently became aware of several approaches to compiling R to make it faster. I'll investigate and think about featuring such packages in the fastverse over the coming weeks.

t-wojciech · 2023-03-07T10:02:31Z

rpolars provides bindings to Polars. It's still in the early stages (not available on CRAN), but it promises to be interesting.

tony-aw · 2023-06-25T16:22:50Z

> Hello, thank you! I already considered nanotime, but did not include it for now because my thinking was it provides a specialized class that few people require and those that require it know about it. But I can include it for sure. Lubridate and ggplot2 are on the list because I haven’t quite found convenient replacements for them, and the fastverse should still be somewhat well rounded. I don’t know about the other packages, but you can send a pull request to the development branch, making a new category for reading and writing files.
>
> Otherwise I’ll look at them during the weekend...

Sorry to jump in like this. But regarding an alternative to ggplot2: what do you think of the vegabrite R package (https://github.com/vegawidget/vegabrite)? It looks promising (but it's still somewhat experimental).

SebKrantz · 2023-06-29T07:13:15Z

Thanks, it's interesting indeed, especially for interactive visualization in R. However, it imports vegawidget, and through that incurs 34 dependencies. So given that this is experimental and has a high dependency count, it is not really a fastverse candidate. But I agree with you, a lightweight and more performant system for complex graphics in R would be very nice.

tony-aw · 2023-06-29T07:59:47Z

Hi, thank you for your response.
Yes, including recursive dependencies the number is indeed high, mostly due to the recursive dependencies of the htmlwidgets dependency. If only that package could reduce its dependencies....

tony-aw · 2023-10-06T21:05:55Z

By the way, stringr has 7 dependencies, not 3:
cli, glue (≥ 1.6.1), lifecycle (≥ 1.0.3), magrittr, rlang (≥ 1.0.0), stringi (≥ 1.5.3), vctrs.
Why would stringr be in the list, considering it's just a wrapper around stringi, though with unnecessarily many dependencies?

SebKrantz · 2023-10-23T19:47:36Z

Agreed, it could be removed, I sometimes still use it because of the more convenient API.

tony-aw · 2023-10-23T19:52:28Z

The function names and arguments of 'stringi' and 'stringr' are quite similar, or do you mean something else? Also, sorry if this is a stupid question, but what has the API got to do with the fastverse? It's about high speed and minimal dependencies, right?

waynelapierre · 2024-01-14T19:04:42Z

Some suggestions:

remove magrittr. it is slow and has been obsolete for a while: https://michaelbarrowman.co.uk/post/the-new-base-pipe/
nowadays very few R users resort to Java for speed
it makes no sense to resort to Julia for speed as it is slow and bloated
many packages are just wrappers of the real fast ones (stringr to stringi, tidytable to data.table, etc.). since this repo is called fastverse not tidyverse, you might want to keep only the real fast ones to avoid confusing readers.

SebKrantz · 2024-01-16T22:38:33Z

Thanks, I have adjusted the README a bit, putting stringr, snakecase and lubridate into the notes below each section. I want to keep magrittr due to the reasons mentioned here. Bindings to faster languages and data.table wrappers were moved to the end of the README.

SebKrantz added a commit that referenced this issue Apr 10, 2023

Add r-polars and Tidier.jl (#18).

290e6c2

SebKrantz added a commit that referenced this issue Apr 10, 2023

Merge pull request #81 from fastverse/development

d949835

Add r-polars and Tidier.jl (#18).

SebKrantz added a commit that referenced this issue Apr 10, 2023

Merge pull request #82 from fastverse/development

4f8b6fa

Add r-polars and Tidier.jl (#18).

SebKrantz added a commit that referenced this issue Jan 16, 2024

Updating README according to some suggesions (#18).

b7719a1
