Building a data.table community of packages with “Seal of Approval” #5723

Open
TysonStanley opened this issue Nov 3, 2023 · 19 comments · May be fixed by #6046

Comments

@TysonStanley
Member

TysonStanley commented Nov 3, 2023

With the goal of building a community of packages that have similar philosophies and syntax that are separate from data.table (and outside of data.table scope #5722), we would like to set up a “Seal of Approval” (play on the mascots of data.table) process. The process for a package receiving the Seal of Approval could be: 

  • The package developer submits a standard application, indicating that their package is on CRAN, is actively maintained, and has at least one of the features listed below.
  • A small number of volunteers from the data.table community review the application to ensure that the criteria are met. (This does not include testing or vetting the package's functionality; reviewers only check that the package is an appropriate choice for the Seal.)
  • There is some variety of public community vote to approve the package. 

Approval will include being listed as a Seal of Approval package on the data.table repository and an SVG of the “seal” that they can include on their own repository/package logo. The initial idea would be packages that do at least one of the following:

  • Built on the same principles as data.table (data.table principles #5693)
  • Extends the functionality of data.table to other contexts (e.g., to databases) with similar syntax
  • Uses data.table on the backend of the package

Possible examples of this could include packages that have few dependencies (e.g. tinytest), extend functionality (e.g. dtplyr, tidytable, tidyfast), and packages that use data.table on the backend (e.g. modelsummary).
 
This process would hopefully help other developers feel more connected to data.table and be more likely to want to support it. Things for us to decide on are:

  1. Does this idea resonate with the data.table community?
  2. If so, does the list of criteria make sense for the purposes of the Seal of Approval?
  3. What else should be considered in designing the Seal of Approval?
@jangorecki
Member

I believe data.table was made to play nicely with any package, by following many conventions from base R.
Creating a "seal approved" label may give the impression that some packages work better with data.table, while others don't work well or don't work at all with data.table...

Rather than having a community of packages, I would prefer all packages to be in the community.

@TysonStanley
Member Author

Thanks for your feedback on it. I agree that data.table is nicely designed to work well with all sorts of packages (and in ways that are not always obvious!). I don't think our intention would be to say that certain packages work best with data.table and others don't. The goal would be to help build the community around data.table. This was just one idea for how we could engage more R users and bring them into the data.table repository. We would also hope that it would spawn more ideas for how to use data.table with other packages, across more situations. The documentation (and other resources) on data.table are vast, but I think there are still a lot of users who don't find it (or learn how to use it) early enough.

Do you have other suggestions on how to make entry into data.table use and development easier?

@AngelFelizR

This comment was marked as off-topic.

@MichaelChirico
Member

Is there anything besides an approval process {data.table} maintainers would be committing to as part of this?

Would the approval be granted in perpetuity / renewed regularly / granted with possible revocation under "certain circumstances" (which)?

@jangorecki

This comment was marked as off-topic.

@AngelFelizR

This comment was marked as off-topic.

@jangorecki

This comment was marked as off-topic.

@MichaelChirico
Member

MichaelChirico commented Nov 4, 2023

@AngelFelizR The r-contributors slack (r-contributors.slack.com) hosted a book club on learning C for R users last year:

https://github.com/r-devel/c-book-club/

I believe there are videos still available; try asking in the #book-club-modern-c channel there. Otherwise, expressing new interest is a way to get the book club running a second time (others have already inquired).


As for data.table's own C code, I think the most straightforward stuff would be:

  • cj.c
  • coalesce.c
  • fifelse.c
  • idatetime.c
  • nafill.c
  • rbindlist.c
  • transpose.c

BTW, I quite like the recent improvements to GitHub's in-browser code-reading experience: you can click through on function calls to find their definitions, see where symbols are defined, and hover over them for their types.


Lastly, keep in mind that there's a ton of R code in data.table to improve as well! Over 8,000 lines already.

@AngelFelizR

@MichaelChirico, thank you for your advice. I hope to contribute C code in the long term to continue progressing this amazing project.

I want to be prepared enough that we can take on the task of making data.table work with data on disk.

I am here because the data.table survey asked if I wanted to contribute, so I started reading the issues.

@tdhock
Member

tdhock commented Nov 6, 2023

Another way to contribute, even without knowing much about how data.table works (in C or otherwise), is to look at the open issues and try to reproduce a bug report, then add a comment on the issue explaining what you did and whether the issue is reproducible. If you can make a simpler example than the one reported, that is even better.

@tdhock
Member

tdhock commented Mar 28, 2024

To make this a concrete proposal:

  1. we will add a section in README.md entitled "Seal of Approval" with a brief statement explaining that this is a list of packages which are built using data.table, etc.
  2. the approval process is the same as for anything. If you want to list your package under Seal of Approval, submit a PR that changes README.md (add your package to the list). Make sure the change has a link to your package web page, and also a brief description of how data.table is used, and what new/interesting/unique functionality your package provides.
  3. The data.table maintainers will judge submissions based on relevance -- does this package provide a new/interesting/unique functionality beyond what is provided by data.table? Also I believe Seal of Approval packages should be outside of data.table scope. A data.table maintainer will merge the PR if there is consensus, just like for any other PR.

For example, I have been developing https://cran.r-project.org/package=nc which provides named capture regex functionality, using and outputting data.tables, and I would like that package to be included under Seal of Approval.

Another example would be the mlr3 packages which are built using data.table.

I see the Seal of Approval as a way of building community, by increasing awareness about how widely-used data.table is among other R packages.

@TysonStanley
Member Author

I think this will ultimately be a pretty low lift while allowing more public connections to the community.

@tdhock
Member

tdhock commented Mar 29, 2024

Glad to see some positive feedback on my proposal.
It would also be cool to have a logo with a sea lion giving a thumbs up. Does anybody have graphics/art skills? @MaraDestefanis?
My vision is that README.md should have a one-line mention of each package, with a link to a blog post on https://rdatatable-community.github.io/The-Raft/ that gives further details. That would entail a little extra work for the package author: writing the blog post. (But no extra work for data.table devs, who just review the PR with a change to README.md.)

@MaraDestefanis

Hey @tdhock I'm stepping in and I can give the logo a shot. It would be great to have it in high definition, if possible. Can you provide that? Also, is there anything else you need from me for the blog?

@tdhock
Member

tdhock commented Apr 2, 2024

Hi Mara, the existing logo graphics files are in https://github.com/Rdatatable/data.table/tree/master/.graphics. Is that high enough definition?

@MaraDestefanis

MaraDestefanis commented Apr 3, 2024 via email

tdhock linked a pull request Apr 3, 2024 that will close this issue

@MaraDestefanis

MaraDestefanis commented Apr 11, 2024 via email

@kbodwin

kbodwin commented May 7, 2024

Hi all,

I'm reaching out with a couple ideas/options for the Seal of Approval process, to see if we can find one that everyone agrees on.

On this repository:

@jangorecki expressed some concern with a list on this repository's ReadMe, because it implies some kind of closed community, instead of data.table being accessible to anyone.

I propose that we simply add a Seal-of-Approval.md file in this repo that contains a simple list of packages that have gotten approval. Then, we can link to this .md file at the bottom of the ReadMe, in the Community section, and reserve all additional details for blog posts on The Raft instead of having them clog up this repo.

Approval process:

At first, @TysonStanley had suggested that approval would be initiated with a PR to this repo, and @MichaelChirico was wondering about the expectations of maintainers for reviewing.

I want to suggest a reverse-order:

  • Folks seeking Seal of Approval would submit by PR to The Raft.
  • Here is a sample draft for what would be submitted.
  • I would act as a first screen, and then if the proposal seems reasonable, I would PR this repo with the relevant information.
  • I could also, if desired, tag reviewer(s) in my PR.

This would be kind of a mini "journal-style" process that would maybe take some of the burden off the maintainers.

Longevity:

Michael also asked whether this approval is granted in perpetuity or not. I think, just workload-wise, we wouldn't commit to periodic re-reviews. However, if someone were to alert us to an issue with a package (say, it's no longer actively maintained), we'd take it off the list at the maintainers' discretion.

Type of SoA Packages:

I've come up with four types of packages that might merit approval; in principle, a submitter would have to justify the package falling in one or more of these categories. I'd love feedback if anything seems amiss:

  • An extension package: Adds to the internal functionality of data.table

  • An application package: Uses data.table to accomplish a particular task or analysis.

  • A bridge package: Translates data.table syntax to different syntax or provides helper functions for transitioning between data.table and another object type.

  • A partner package: Not necessarily directly connected to data.table, but deliberately follows the core philosophies of data.table.


So, tl;dr, in this proposal:

  • Someone submits info on their package to The Raft
  • I review it, then PR this repo adding the package name to a simple .md list, and supply relevant info for maintainers
  • Maintainers merge PR at their discretion
  • Package is then published as a blog post to The Raft

Let me know if this sounds workable to you, or if you have other suggestions! :)

7 participants