Discussion: Bump minimum needed R version to R 4.0 across the ecosystem? #401

Closed
IndrajeetPatil opened this issue May 11, 2024 · 43 comments · Fixed by #408

Comments

@IndrajeetPatil
Member

IndrajeetPatil commented May 11, 2024

I think this will be a good idea for a few reasons:

  • It has been 5 years since R 3.6 came out (2019-04-26); that's enough time for people to have upgraded to a newer R version.

  • An increasing number of dependencies are using the base pipe (introduced in R 4.0), which means the new versions of dependencies can't even be installed on R 3.6. If you haven't noticed that, it's because I have been carving out exceptions for them in the workflows.

  • Very few of our test suites pass on R 3.6 (e.g. datawizard fails), and some don't even run (e.g. see, because the graphics engine for R 3.6 is not supported by vdiffr).

  • It's holding us back from using the base pipe, which both we (as developers) and our users have become used to.

All of this combined makes it quite a hurdle to keep supporting this R version. Even the tidyverse no longer supports it:

[image]

@bwiernik
Contributor

Let's do it!

@vincentarelbundock

Frankly, I don't see the benefit. Only Suggests packages require it, and carve outs for those are easy.

Over the past few months, I've heard from several users still on 3.6 because of (bad) institutional policies, and it would be a shame to leave them behind.

If there was a compelling computational or language reason, sure. But we can use base pipe in the website vignettes, without enforcing version 4.

That said, my view on this is not deeply held. It's not a super big deal, and I will of course support the majority.

@mattansb
Member

Generally I am pro native pipe (:

But

  • The tidyverse still does support 3.6 (dplyr even supports 3.5).

  • How would reverse depends/suggests/imports be affected? We want to allow devs to continuously rely on us...

@vincentarelbundock

  • How would reverse depends/suggests/imports be affected? We want to allow devs to continuously rely on us...

This would break marginaleffects and at least 3 or 4 packages that rely on marginaleffects, forcing several developers to update their packages.

Many of the packages that depend on insight allow earlier versions, and they would all need to be updated by their maintainers.

I would hate to ask people to work on this without giving them any tangible benefit.

Yeah, I think this is not a great move.

@IndrajeetPatil
Member Author

The tidyverse still does support 3.6 (dplyr even supports 3.5).

Well, they don't check on those R versions in their CI. So, for all intents and purposes, they don't support it. They are just waiting for someone to complain (cf. tidyverse/purrr#1045) before the version is bumped.

That's not the user experience I want our users to have. This is why I have still retained R 3.6 checks in our CI. Without checking that the package works and tests pass in CI, we might as well support R 2.0 🤷

We want to allow devs to continuously rely on us

That's not a compelling enough reason for me. There is too much lethargy among developers about bumping the R version because some cluster on some university's server might still be using it; guess what, they might still be using R 3.0.

Those users can continue to use the older versions of our packages from the archive. And waiting for 5 years is not being too aggressive. Matrix already has bumped to R > 4.4, emmeans to R > 4.1, etc. We have been quite conservative in this regard.


That said, if you want to continue to support R 3.6, then you should also make sure that all tests pass on this R version in our CI. Otherwise, we are just making empty promises.

@IndrajeetPatil
Member Author

This would break marginaleffects and at least 3 or 4 packages that rely on marginaleffects, forcing several developers to update their packages.

@vincentarelbundock Can you clarify what you mean by "break" here? Because CRAN doesn't check packages on R 3.6.

@vincentarelbundock

Matrix is not bumped to 4.4. See the thread on R-devel. The number indicated on the CRAN website is the version of R in which Matrix was compiled.

@vincentarelbundock

@vincentarelbundock Can you clarify what you mean by "break" here? Because CRAN doesn't check packages on R 3.6.

I mean that developers who import an easystats package will no longer be able to support the users they want to support, and will have to release a new package indicating new version requirements.

@IndrajeetPatil
Member Author

Matrix is not bumped to 4.4. See the thread on R-devel. The number indicated on the CRAN website is the version of R in which Matrix was compiled.

Are you sure? Because the entire CI of easystats is currently in turmoil because of this issue:

  ! Could not solve package dependencies:
  * deps::.: Can't install dependency BayesFactor
  * BayesFactor: Can't install dependency MatrixModels
  * MatrixModels: Can't install dependency Matrix (>= 1.6-0)
  * Matrix: Needs R >= 4.5
  * Matrix: Needs R >= 4.4.0

[Screenshot 2024-05-11 at 19 33 27]

@vincentarelbundock

See this thread and in particular MM's post (but others too): https://stat.ethz.ch/pipermail/r-devel/2024-April/083377.html

@IndrajeetPatil
Member Author

I mean that developers who import an easystats package will no longer be able to support the users they want to support, and will have to release a new package indicating new version requirements.

Yes, but what if these developers wish to continue to support this user base for the next 5 years? How long are we supposed to indulge them? What about the maintenance cost and burden involved in supporting such legacy versions that probably not even 1% of the entire user base is still on?

We are basically condemning ourselves to not benefit from any improvements in R for at least 5–7 years.

@IndrajeetPatil
Member Author

I would hate to ask people to work on this without giving them any tangible benefit.

The tangible benefits are improvements in the syntax of the language itself, reduced maintenance overhead from not having to backport newer functions, improvements in performance, etc.

@IndrajeetPatil
Member Author

This is exactly why I wanted us as an organization to come up with a policy for R version support, which we still haven't done. This means I always have to be the bad cop and bring up this topic, and I need to because I am the one primarily maintaining the CI infrastructure.

#295

If we come up with a policy and give developers using easystats some guarantees about how long we will support a given R version, we need not go through this routine every six months.

@vincentarelbundock

My argument is that there has been no "must have" feature added to R in a loooong time.

My view is that supporting old versions is essentially costless, and that we should keep supporting them "forever", or until a really compelling new feature is added to R that truly makes developers' lives considerably easier. I don't see any such new feature in the last 10 years or so.

(Again, we can use base pipe on the website.)

You ask "why shouldn't we support even older versions?" I answer "no" because I don't want to do that work. But status quo is free.

I really didn't mean for this to devolve into a long debate, so will let others discuss and decide.

@IndrajeetPatil
Member Author

My view is that supporting old versions is essentially costless

No, it's not.

The longer we go on like this, the more R versions we need to check:

          - { os: ubuntu-latest, r: "devel" }
          - { os: ubuntu-latest, r: "release" }
          - { os: ubuntu-latest, r: "oldrel-1" }
          - { os: ubuntu-latest, r: "oldrel-2" }
          - { os: ubuntu-latest, r: "oldrel-3" }
          - ... # other R versions
          - { os: ubuntu-latest, r: "3.6" }

It is an enormous effort to make sure ten different R packages continue to work on legacy R versions. This is because of the breaking changes that are introduced in every major or minor release (e.g., change in serialization format, change in random number generation, stringsAsFactors behaviour, etc.), and tests need to be adjusted for the before and after behaviour, or they at least need to be skipped appropriately. This is a ton of work.
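
As a hedged illustration of one such change: R 4.0.0 switched the default of `stringsAsFactors` in `data.frame()` from `TRUE` to `FALSE`. The sketch below uses base R only, and the variable names are made up for illustration. It puts a version-dependent expectation next to a version-independent one.

```r
# Relying on the default gives a version-dependent result:
# factor on R < 4.0.0, character on R >= 4.0.0.
df_default <- data.frame(x = c("a", "b"))
expected_class <- if (getRversion() >= "4.0.0") "character" else "factor"
stopifnot(identical(class(df_default$x), expected_class))

# Setting the argument explicitly behaves the same on every R version.
df_explicit <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)
stopifnot(is.character(df_explicit$x))
```

Each test that touches such behaviour needs either the explicit form or a version branch like the one above, which is where the maintenance cost comes from.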

@IndrajeetPatil
Member Author

IndrajeetPatil commented May 11, 2024

Yeah, I am convinced it's a bad idea to continue to support R < 4.0, and manually maintain this freakshow for years to come:

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          pak-version: devel
          upgrade: "TRUE"
          cache-version: 8
          extra-packages: |
            any::rcmdcheck
            any::BH
            any::RcppEigen
            BayesFactor=?ignore-before-r=100.0.0
            car=?ignore-before-r=100.0.0
            Matrix=?ignore-before-r=100.0.0
            MatrixModels=?ignore-before-r=100.0.0
            lme4=?ignore-before-r=100.0.0
            quantreg=?ignore-before-r=100.0.0
            TMB=?ignore-before-r=100.0.0
            ivprobit=?ignore-before-r=100.0.0
            mhurdle=?ignore-before-r=100.0.0
            brms=?ignore-before-r=4.3.0
            estimability=?ignore-before-r=4.3.0
            effects=?ignore-before-r=4.3.0
            nestedLogit=?ignore-before-r=4.3.0
            FactoMineR=?ignore-before-r=4.3.0
            factoextra=?ignore-before-r=4.3.0
            emmeans=?ignore-before-r=4.3.0
            bayesQR=?ignore-before-r=4.2.0
            MuMIn=?ignore-before-r=4.2.0
            ape=?ignore-before-r=4.1.0
            car=?ignore-before-r=4.1.0
            drc=?ignore-before-r=4.1.0
            EGAnet=?ignore-before-r=4.1.0
            ggpubr=?ignore-before-r=4.1.0
            Hmisc=?ignore-before-r=4.1.0
            mediation=?ignore-before-r=4.1.0
            rstatix=?ignore-before-r=4.1.0
            PROreg=?ignore-before-r=4.1.0
            rmcorr=?ignore-before-r=4.1.0
            rms=?ignore-before-r=4.1.0
            randomForest=?ignore-before-r=4.1.0
            pbkrtest=?ignore-before-r=4.1.0
            afex=?ignore-before-r=4.1.0
            car=?ignore-before-r=4.1.0
            ICS=?ignore-before-r=4.1.0
            ivreg=?ignore-before-r=4.1.0
            AER=?ignore-before-r=4.1.0
            WRS2=?ignore-before-r=4.1.0
            tinytable=?ignore-before-r=4.1.0
            survey=?ignore-before-r=4.1.0
            epiR=?ignore-before-r=4.0.0
            energy=?ignore-before-r=4.0.0
            gam=?ignore-before-r=4.0.0
            gdtools=?ignore-before-r=4.0.0
            flextable=?ignore-before-r=4.0.0
            ftExtra=?ignore-before-r=4.0.0
            gsl=?ignore-before-r=4.0.0
            fastICA=?ignore-before-r=4.0.0
            metafor=?ignore-before-r=4.0.0
            metadat=?ignore-before-r=4.0.0
            metaplus=?ignore-before-r=4.0.0
            multgee=?ignore-before-r=4.0.0
            panelr=?ignore-before-r=4.0.0
            ICSOutlier=?ignore-before-r=4.0.0
            multimode=?ignore-before-r=4.0.0
            jtools=?ignore-before-r=4.0.0
            mmrm=?ignore-before-r=4.0.0
            rsvd=?ignore-before-r=4.0.0
            sparsepca=?ignore-before-r=4.0.0
            qqconf=?ignore-before-r=4.0.0
            qqplotr=?ignore-before-r=4.0.0
            rtdists=?ignore-before-r=4.0.0
            VGAM=?ignore-before-r=4.0.0
            ggside=?ignore-before-r=4.0.0
          needs: check

@mattansb
Member

My view is that supporting old versions is essentially costless

No, it's not.

I agree with this.

I still think we should give a proper heads-up to maintainers of packages that use easystats about this change.

(FWIW @IndrajeetPatil I don't think you're the bad cop - you're the righteous cop!)

@IndrajeetPatil
Member Author

I still think we should give a proper heads-up to maintainers of packages that use easystats about this change.

Having a stated policy about supported R versions on the website is the heads-up. This is also what tidyverse does: they don't email every maintainer before they bump R versions; they just need to respect their policy.

@mattansb
Member

🤷‍♂️

Let's do it then!

@IndrajeetPatil
Member Author

We've been trying to do it since Sept '22 😭

#295

@bwiernik
Contributor

bwiernik commented May 12, 2024

I agree. For precedent, tidyverse's declared policy is also to support the current version and 4 previous versions https://www.tidyverse.org/blog/2019/04/r-version-support/

So that would be currently back to R 4.0. While earlier versions may be supported on individual packages, they make no attempt to maintain support there and will bump the version up if an older version breaks anything.

@etiennebacher
Member

etiennebacher commented May 12, 2024

An increasing number of dependencies are using the base pipe (introduced in R 4.0)

I didn't read the whole thread yet, but I just want to point out that the base pipe was introduced in 4.1.0, not 4.0.0, so bumping the requirement to 4.0.0 wouldn't enable using pipe in the source code. (Search the sentence "R now provides a simple native forward pipe syntax" in the NEWS: https://cran.r-project.org/doc/manuals/r-release/NEWS.html)
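
A quick way to check this locally is to ask the parser directly. The helper below has a hypothetical name and is not part of any package. It only reports whether the running R session can parse `|>`.

```r
# Hypothetical helper: TRUE if this R session can parse the native pipe.
supports_native_pipe <- function() {
  parsed <- tryCatch(
    parse(text = "c(1, 2, 3) |> sum()"),
    error = function(e) NULL
  )
  !is.null(parsed)
}

# The result should agree with the version named in the NEWS entry.
stopifnot(identical(supports_native_pipe(), getRversion() >= "4.1.0"))
```

Because the failure happens at parse time, a package whose source contains `|>` cannot even be installed on an older R, regardless of what its DESCRIPTION declares.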

@IndrajeetPatil
Member Author

4.0. While earlier versions may be supported on individual packages, they make no attempt to maintain support there and will bump the version up if an older version breaks anything.

And that's exactly the part that I don't agree with because we are offloading what should be the developer responsibility (making sure things work for all supported R versions) to users (and not just other developers) who need to go through the painful process of debugging why things are no longer working (e.g. tidyverse/purrr#1045, r-lib/rcmdcheck#220, etc.).

@DominiqueMakowski
Member

I'm pro bump

@strengejacke
Member

Not quite related, but a bit... I think we should stop requiring the very latest versions of packages in our DESCRIPTION files, at least for other packages. If we really need to test new changes in recent updates of other packages, we can use testthat::skip_if_not_installed(pkg, minimum_version). The casual R user does not update packages once a week, not even once a month (and even more rarely do they update R itself!). This regularly causes issues when teaching R and using the easystats eco-system.

E.g., after the latest see update it took nearly half an hour to fix installation issues (scales 1.3.0 was required but, despite R 4.1 being installed, it wouldn't update: package version not available). This is really somewhat annoying...
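
The skip-based approach mentioned above can be sketched as follows, assuming testthat is installed. The test body is only a placeholder, and scales 1.3.0 is used because it is the version mentioned in this comment.

```r
library(testthat)

test_that("behaviour that needs a recent scales release", {
  # Skips (rather than fails) when scales is missing or older than 1.3.0,
  # so DESCRIPTION does not have to demand the newest version.
  skip_if_not_installed("scales", minimum_version = "1.3.0")

  # Placeholder expectation; a real test would exercise the new behaviour.
  expect_true(is.function(scales::label_number))
})
```

With this pattern the version requirement lives in the test suite, so users on an older scales can still install the package.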

@strengejacke
Member

On topic: we could still depend on R 3.6, but show a startup message on not-supported R versions.
Like, "this package probably can be loaded in R 3.6, but we can only test functionality down to R 4.0, so you may run into problems. It is recommended to update R to 4.0 or higher". If we show this msg once a session or on every startup, it doesn't require us to care about older R versions, and the users have an idea why something probably is not working.
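
A rough sketch of such a startup message, assuming it would live in a package's `R/zzz.R`. The helper name `unsupported_r_message()` and the exact wording are made up for illustration.

```r
# Hypothetical helper: returns a note for untested R versions, else NULL.
unsupported_r_message <- function(r_version = getRversion(),
                                  tested_from = "4.0.0") {
  r_version <- as.numeric_version(r_version)
  if (r_version >= tested_from) {
    return(NULL)
  }
  paste0(
    "This package can probably be loaded in R ", as.character(r_version),
    ", but functionality is only tested down to R ", tested_from,
    ". Updating R is recommended."
  )
}

# Standard package hook, run on library(pkg); users can silence the
# message with suppressPackageStartupMessages().
.onAttach <- function(libname, pkgname) {
  msg <- unsupported_r_message()
  if (!is.null(msg)) packageStartupMessage(msg)
}
```

For example, `unsupported_r_message("3.6.3")` returns the note, while any version from 4.0.0 onwards returns `NULL` and nothing is printed.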

@IndrajeetPatil
Member Author

This is a completely separate issue, and has nothing to do with the R upgrade.

We literally have one external hard dependency for which we specify the minimum needed R version: ggplot2. And, sure, we can drop package requirements.

Are you installing source packages? If you use RSPM binaries, typically available in a couple of days after CRAN update, package installation tends to be quite rapid.

All other package version requirements come via our own packages, which is going to be the case until we have 1.0 releases. After that, we will no longer need to ask users to install the latest versions because the API is guaranteed to be stable.

@strengejacke
Member

We literally have one external hard dependency for which we specify the minimum needed R version: ggplot2. And, sure, we can drop package requirements.

Yes, but even this one alone makes easystats::install_latest() fail - so you can't keep packages up to date (and you can't install see, which is definitely an important package when it comes to visualization).

Are you installing source packages?

Just RStudio on Windows. scales won't update, despite having been on CRAN for quite a long time, I think.

@IndrajeetPatil
Member Author

Set this to https://packagemanager.posit.co/cran/latest, and see how quick the installation process is.
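
The same repository can also be set from code instead of through the IDE settings, for example in an R session or in `.Rprofile`. This is a minimal sketch using the URL given above.

```r
# Point the CRAN repository at Posit Public Package Manager.
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))

# Confirm which repository install.packages() will use.
getOption("repos")[["CRAN"]]
```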

Screenshot 2024-05-13 at 09 57 36

This is also what we do in our GHA workflows: https://github.com/easystats/workflows/blob/c45e44ee6990211411d8d3ba489d5a2c9f6184b8/.github/workflows/R-CMD-check.yaml#L57

@strengejacke
Member

Ah, cool, didn't know that!

@strengejacke
Member

On topic: we could still depend on R 3.6, but show a startup message on not-supported R versions. Like, "this package probably can be loaded in R 3.6, but we can only test functionality down to R 4.0, so you may run into problems. It is recommended to update R to 4.0 or higher". If we show this msg once a session or on every startup, it doesn't require us to care about older R versions, and the users have an idea why something probably is not working.

Wanted to raise attention to my suggestion again...

@IndrajeetPatil
Member Author

we could still depend on R 3.6, but show a startup message on not-supported R versions

I beg to differ.

As a user, I would be way more pissed if a package did this to me because this message implies foresight and negligence on the author's side. You intentionally let me download something that you know may not be working as expected. I'd much rather the author gatekeep me into using only package versions that are known to work with those R versions.

Note

This is how I feel. But the majority wins, so if others feel strongly about continuing to support R 3.6, I am happy to bite the bullet.


P.S. For the sake of the Socratic debate, why stop at R 3.6? This could also work with any historical R version, no? E.g.

"this package probably can be loaded in R 3.0, but we can only test functionality down to R 4.0, so you may run into problems. It is recommended to update R to 4.0 or higher"

Our packages have never failed to install on older R versions; only the tests didn't pass. So we could set the Depends field in DESCRIPTION to an even older R version.

@mattansb
Member

I'm pro 4.0. Got to get with the times...

@IndrajeetPatil IndrajeetPatil changed the title Bump minimum needed R version to R 4.0 across the ecosystem? Discussion: Bump minimum needed R version to R 4.0 across the ecosystem? May 16, 2024
@IndrajeetPatil
Member Author

Closing in favour of #405

@strengejacke
Member

Just a short comment:

Even the tidyverse no longer supports it:
(#401 (comment))

On topic: we could still depend on R 3.6, but show a startup message on not-supported R versions.
Like, "this package probably can be loaded in R 3.6, but we can only test functionality down to R 4.0, so you may run into problems. It is recommended to update R to 4.0 or higher". If we show this msg once a session or on every startup, it doesn't require us to care about older R versions, and the users have an idea why something probably is not working.
(#401 (comment))

I don't think this is a contradiction:

dplyr: R >= 3.5 (!)
tibble: R >= 3.4 (!)
...

and so on.

If we're referring to the tidyverse rules (which we do not in our policy document, for reasons...), we should probably follow my suggestion. :-)

@strengejacke
Member

@IndrajeetPatil
Member Author

Thanks, yes, I am also waiting for my LinkedIn poll to conclude in two days, and then we can compare notes.

This is why I re-opened this issue: to make a more data-informed decision.

@bwiernik
Contributor

I think a reasonable middle choice would be to put hard requirements on most of the packages ("we support these versions and provide a strong guarantee that they will work") but be more flexible with insight because of its use as a foundation by other packages (we officially support versions >= 4 but will allow install/import on older versions without guarantees, and will up the stated minimum version when we become aware of a conflict rather than fixing it).

@strengejacke
Member

but be more flexible with insight because of its use as a foundation by other packages

What would be sufficient to qualify as "foundation by other packages"? parameters and performance are imported (or suggested) by quite a lot of (prominent) packages outside our easyverse.

@strengejacke
Member

Polls from Twitter and LinkedIn suggest ~14% to 23% of R users probably have to use R 3.x

Conservative estimate maybe 5-10%. Or the opposite: users on social media are more technically inclined, so the actual proportion could be up to 25% (though I doubt that).

@IndrajeetPatil
Member Author

There are some new data points that prompted me to re-open this issue and that, I think, should lead us to revise our R version support policy.

Screenshot 2024-05-22 at 09 58 47

This makes me quite comfortable about having dropped the R 3.5 support since the overwhelming majority of our users are going to be on RStudio (as opposed to VS Code, PyCharm, etc.).

  • At least based on an informal (and maybe even unrepresentative) poll on LinkedIn, the percentage of users on R 3.6 (or below) is above 5% (which would have been my personal threshold to cut off support).
Screenshot 2024-05-22 at 09 59 04

Given these updates, I would prefer if we change our policy to support at least five previous R versions; in other words, we can remove support for R 3.6 after R 4.5 is out. But this also means that each maintainer needs to fix the builds for R 3.6.
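
To make the arithmetic of this proposal concrete, here is a small sketch. The vector of release series is written out by hand (R 4.4 was the current release at the time of this discussion), and `supported_series()` is a hypothetical helper, not an easystats function.

```r
# Minor release series in chronological order, hard-coded for illustration.
release_series <- c("3.5", "3.6", "4.0", "4.1", "4.2", "4.3", "4.4")

# Hypothetical helper: the current series plus the n previous ones.
supported_series <- function(series, n_previous) {
  tail(series, n_previous + 1)
}

# Four previous versions: the window starts at 4.0.
supported_series(release_series, n_previous = 4)

# Five previous versions: the window starts at 3.6.
supported_series(release_series, n_previous = 5)

# Once 4.5 is released, the five-previous window starts at 4.0.
supported_series(c(release_series, "4.5"), n_previous = 5)
```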

An interesting suggestion I received on LinkedIn was the following:

Its [sic] a good idea to support prev major (3.x) till current major (4.x) reaches atleast 4.5

I am not sure how regular R is in terms of its release frequency to turn this suggestion into a version support document, but I found it to be interesting nonetheless.

@strengejacke
Member

I'm fine with both: testing the latest 4 releases with Depends >= 3.6 (but not reliably tested), or testing versions down to R 3.6. I think a substantial part of the work is up to you, Indra, namely making the GitHub Actions work so that we can test on R 3.6. Thus, I would let you decide which way to go.

@IndrajeetPatil
Member Author

I would prefer to continue to support this version until R 4.5 (or whatever the next minor version will be) is out.

@etiennebacher Can you please update the version support document to reflect that we support the last five versions, and not four? Thanks.
