
Dependency management (file requirements.txt and friends) is far too lenient: Too old versions and incompatible new versions. #3734

Open
aknrdureegaesr opened this issue Jan 10, 2024 · 12 comments

@aknrdureegaesr
Contributor

Environment

Python Version: 3.9

Nikola Version: Git b96f05a

Operating System: Debian Bullseye

Description:

The present Nikola dependency version requirements are far too lenient.

I wish to raise two separate problems with the present requirements*.txt files:

Allowing too new versions

Nikola allows very new versions of its dependencies. When some library introduces incompatible changes and upgrades to a new major version (in full compliance with semver), an unsuspecting user who runs pip install Nikola[extras] might be the first person in the world who actually mixes that version with Nikola. And it may break.

In such a scenario, I see no fault with our dependency, nor with the user, so the fault lies with Nikola.

A moral equivalent of this happened to me when a pull request of mine collided with a new, incompatible version of lxml that came out between the time I set up my venv and the time the pull request was tested, giving rise to #3732.

Allowing too old versions

I fiddled a bit to find the oldest versions of all Nikola dependencies that would install on my machine and still satisfy all declared version constraints. The result is requirements-also_doesntwork.txt. Running pytest with those ancient packages installed fails.

Analyzing that particular failure, I found that the requirement Pygments>=1.6 had been added to requirements.txt with git eab48d8 on 2014-07-17. Later, on 2020-03-19 (git 7b792fe) and again on 2022-04-24 (git 7e2fd4f), code was added that uses pygments.formatters._formatter_cache. That attribute does not yet exist in Pygments 1.6. So ever since then, Nikola has been boasting that it can run with that old Pygments version while in fact it could not. I believe this to be just the tip of an iceberg, and that many more compatibility problems of this sort are hiding in the code.
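For illustration, here is a minimal sketch of how that mismatch surfaces (the attribute name comes from the commits above; the surrounding probe code is mine, not Nikola's):

```python
# Minimal probe, assuming only what the commits above establish:
# Pygments 1.6 predates the private _formatter_cache attribute,
# so accessing it there raises AttributeError.
import pygments.formatters

try:
    cache = pygments.formatters._formatter_cache
except AttributeError:
    print("This Pygments version is older than requirements.txt admits.")
```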

@Kwpolska
Member

I agree that the minimum dependencies listed in our requirements.txt file may be too old, and it would be a good idea to update the minimum versions there — perhaps to something at most a year old, or perhaps to the most recent major version (while verifying this specific setup works).

On the other hand, the maximum dependency versions should not be limited, unless the package has a track record of breaking backwards compatibility. There should be no pinning, especially in requirements.txt (unless the package’s maintainers break backwards compatibility very often, in which case we should reconsider the dependency).

The problem with dependencies in Python is caused by Python’s package management being awful, 14 competing standards and all. Python ties its packages to an environment — which might be system-wide (including user packages, but that doesn’t matter) or virtual.

There are three main ways in which someone may install Nikola: in a venv, system/user-wide with pip, or using their system package manager. If someone is using a venv exclusively to run Nikola, everything would be fine, and we could be using pinned dependencies or very narrow ranges (maintenance burden notwithstanding). But if someone is using a system-wide install, we could cause total breakage if we demand very specific versions.

Things are even worse if you include distro packages in the mix: Nikola is packaged in some distros, like Fedora, Arch Linux, or Gentoo. The way Python works means there is only one version of Pygments in the distro repositories. This one version in Arch Linux’s repos must work not only for Nikola, but also for 84 other things (and that doesn’t include transitive dependencies). Some distros are more eager to upgrade things, while others are more conservative, so we can’t just support the latest version — we must have some leeway.

Package versioning is hard, and different entities have different approaches to backwards compatibility. Microsoft will fight tooth and nail to keep things compatible (I have done some cursed package combinations in the .NET land, mixing 5-year-old versions with the latest-and-greatest), but CPython is eager to break stuff (#3719 is one example, but many things are removed with a 2-year warning). The packages Nikola depends on have many different maintainers with different views on backwards compatibility. Some of them follow semver, some follow a scheme detached from backwards compatibility, and some of them don’t have a predictable version numbering scheme. And even then, if we say >=1.2.0, <1.3.0, things can still break if the maintainer introduces a regression. The only predictable thing would be to pin specific versions, but that would get us removed from Linux distros and would make us break people’s systems. We instead choose to hope the latest version doesn’t break — and it often doesn’t, while also making it more likely for us to have support for new Python versions without requiring a new Nikola release.

To sum up: some things can certainly be improved, and the versions in requirements.txt are a lie that needs fixing. But we can’t pin dependencies, use very tight dependency ranges, or use <= unless we absolutely need it, because that would cause more trouble for our users and downstreams.

@aknrdureegaesr
Contributor Author

aknrdureegaesr commented Jan 10, 2024

I have no "magic bullet" for the upper version limit problem.

My personal intuition would be: let us trust that everybody is using semantic versioning. That means "pin to the same major version we tested". We could also have something automatic that alerts us when a dependency releases a new major version, so we can test it before relaxing our requirements to accept it.
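Expressed in requirements.txt syntax, that intuition would look roughly like this (package names and bounds are illustrative, not a concrete proposal for the actual files):

```
lxml>=4.5.2,<5
Pygments>=2.12,<3
```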

So much for the theory. In practice, semantic versioning is largely not followed in the Python world: 23% of all of our dependencies and transitive dependencies sport a version number of 0.something. If you actually read semver 2.0.0 (rule 4), that translates to "this package does not claim to have any stable API yet".

So much for the bad news. As for the good news, I have at least some idea for the other end of the problem.

We could generate a requirements-oldest.txt, hopefully on the fly as part of our automatic tests. That file would pin all the oldest versions we think should still work. (The idea is that it can be generated by simple string replacement: replace every >= with ==; see the sketch below.) Then, in one automatic test run, we install all those oldest versions into a single Python environment and, of course, see whether our tests still pass. If not, some new code has introduced what you call a "lie", and we need to tighten our lower bound on some dependency.
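A minimal sketch of that generation step (file names as discussed in this thread; the helper name is made up):

```python
# Derive requirements-oldest.txt from requirements.txt by turning
# every lower bound into an exact pin, as proposed above.
from pathlib import Path

def pin_lower_bounds(src: str = "requirements.txt",
                     dst: str = "requirements-oldest.txt") -> None:
    Path(dst).write_text(Path(src).read_text().replace(">=", "=="))

pin_lower_bounds()
```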

Does that sound like something you'd merge if I managed to code it?

@Kwpolska
Member

I would accept testing of requirements-oldest.txt on the oldest supported Python version on Linux (the newest one is unlikely to work with it). I would also accept pinning all dependencies to <= the latest major version as of ~2 months ago, or to the specific version that was current ~12 months ago (with no >= anywhere).

@felixfontein
Contributor

I don't like upper version pins (unless needed for explicitly known incompatibilities); I've seen them cause problems with dependent projects and users too often...

If we do use upper version pins, we really need test infrastructure that keeps track of new major releases and runs the tests against them.
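As a rough sketch of what such infrastructure could do, something along these lines would flag pins that a new major release has outgrown (this assumes the packaging library and PyPI's JSON API; nothing here is settled):

```python
# Compare each requirement's specifier set against the newest release
# on PyPI and report requirements the latest version no longer satisfies.
import json
import urllib.request

from packaging.requirements import Requirement
from packaging.version import Version

def latest_version(package: str) -> Version:
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as response:
        return Version(json.load(response)["info"]["version"])

with open("requirements.txt") as handle:
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments, and pip options
        req = Requirement(line)
        newest = latest_version(req.name)
        if not req.specifier.contains(newest):
            print(f"{req.name}: latest {newest} is outside '{req.specifier}'")
```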

@aknrdureegaesr
Contributor Author

"with no >= anywhere" Why not, Chris?

I thought rather the opposite: judiciously ruling out ancient versions seems to be the no-brainer here, though it takes some effort in practice. It's the future versions that are the difficult part, even conceptually. Or at least that's what I thought.

Not ruling out ancient versions essentially claims, e.g., "Nikola is fine with any version of Pygments that ever existed." Why would we want to claim that if we know it is not true?

@aknrdureegaesr
Contributor Author

Fortunately, PyPI is quite scriptable and can be scraped easily. Here is a list of all versions of the primary requirements of Nikola currently available on PyPI:

pypi_requirements_available_versions.json
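For reference, the scraping boils down to something like this (a sketch rather than the actual script; PyPI's JSON API exposes every release of a package):

```python
# List every released version PyPI knows for a given package.
import json
import urllib.request

def available_versions(package: str) -> list[str]:
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as response:
        return list(json.load(response)["releases"])

print(available_versions("Pygments"))
```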

@Kwpolska
Member

"with no >= anywhere" Why not, Chris?

I might have messed up the equality signs. I’m against pinning the maximum/upper versions.

@aknrdureegaesr
Contributor Author

Over the last few days, my PC has tried many combinations of dependency versions to see whether the Nikola tests still pass with them. "Experimental computer science." It is not done yet. But the oldest requirements known to me that make the tests pass are (output of pip freeze):

aiohttp==3.8.6
aiosignal==1.3.1
argh==0.31.0
asttokens==2.4.1
async-timeout==4.0.3
atomicwrites==1.4.1
attrs==23.2.0
Babel==2.12.0
beautifulsoup4==4.12.2
bleach==6.1.0
blinker==1.3
certifi==2023.11.17
charset-normalizer==3.3.2
cloudpickle==3.0.0
comm==0.2.1
coverage==7.4.0
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
docutils==0.19
doit==0.33.1
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
feedparser==6.0.2
flake8==7.0.0
freezegun==0.2.5
frozenlist==1.4.1
ghp-import==0.2.0
hsluv==5.0.0
html5lib==0.2
idna==3.6
importlib-metadata==7.0.1
ipykernel==6.25.2
ipython==8.18.1
ipython-genutils==0.2.0
jedi==0.19.1
Jinja2==3.1.0
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
jupyter-client==8.6.0
jupyter-core==5.7.1
jupyterlab-pygments==0.3.0
lxml==4.5.2
Mako==1.0.9
Markdown==3.0
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mccabe==0.7.0
micawber==0.2.2
mistune==3.0.2
more-itertools==10.2.0
multidict==6.0.4
natsort==4.0.0
nbclient==0.9.0
nbconvert==7.14.1
nbformat==5.9.2
nest-asyncio==1.5.9
notebook==6.0.0
packaging==23.2
pandocfilters==1.5.0
parso==0.8.3
pathtools==0.1.2
pexpect==4.9.0
phpserialize==1.3
piexif==1.0.0
Pillow==9.2.0
platformdirs==4.1.0
pluggy==1.3.0
prometheus-client==0.19.0
prompt-toolkit==3.0.43
psutil==5.9.7
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pycodestyle==2.11.1
pyflakes==3.2.0
pygal==2.0.11
Pygments==2.12.0
pyinotify==0.9.6
Pyphen==0.8
PyRSS2Gen==1.1
pytest==4.3.0
pytest-cov==2.9.0
python-dateutil==2.8.2
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.32.1
requests==2.31.0
rpds-py==0.17.1
ruamel.yaml==0.15.98
Send2Trash==1.8.2
sgmllib3k==1.0.0
six==1.16.0
smartypants==2.0.1
soupsieve==2.5
stack-data==0.6.3
terminado==0.18.0
tinycss2==1.2.1
toml==0.7.1
tornado==6.4
traitlets==5.14.1
typing-extensions==4.9.0
typogrify==2.0.2
Unidecode==0.4.20
urllib3==2.1.0
watchdog==0.7.1
wcwidth==0.2.13
webencodings==0.5.1
yarl==1.9.4
zipp==3.17.0

These are the tests run without coverage reports (to speed things up).

@aknrdureegaesr
Contributor Author

Now that #3744 has been merged, it is possible to create one version of requirements-oldest.txt by simply replacing all >= with == in the three requirements files. This approach is not concerned with the age of the versions included; my script has simply searched for old versions that still make the tests succeed.

Are you interested in the script itself, or only in the result (the latter being in #3744)?

@aknrdureegaesr
Contributor Author

(The script is far from perfect. There could be even older versions that also make the tests succeed, in particular if several of our dependencies depend on certain versions of each other without announcing that fact to pip.)

@Kwpolska
Member

You can post the script in this issue for future reference, we probably don’t need to add it to the repository.

@aknrdureegaesr
Contributor Author

aknrdureegaesr commented Jan 17, 2024

find_oldest_working_dependencies.zip actually has two scripts.

  • The first goes to PyPI and fetches all versions of our dependencies. The result is a JSON file.
  • The second tries to decrease version numbers and runs the tests, to find out which lowest versions still work (a rough sketch appears below). It needs no dependencies itself, just plain Python (tested with 3.9).

The scripts are meant to sit in a subdirectory of scripts.

Deficiencies of the second script:

  • It will not terminate even when the algorithm can no longer find any lower working versions (endless loop).
  • It writes a file (easily parseable) that could be used, if the program was interrupted, to continue where it left off. However, parsing that file was never implemented, as I improved the format after one long run. Instead, there is one-off code with known "good" versions.
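For future reference alongside the zip, the heart of the second script's search is roughly the following (a simplified sketch, not the attached code; function names are mine):

```python
# Step one dependency down through progressively older releases and keep
# the oldest version with which the test suite still passes.
import subprocess

def tests_pass() -> bool:
    # Run the suite without coverage reports, as in the experiments above.
    return subprocess.run(["python", "-m", "pytest", "-x", "-q"]).returncode == 0

def oldest_working(package: str, versions_newest_first: list[str]) -> str:
    oldest_ok = versions_newest_first[0]
    for version in versions_newest_first[1:]:
        subprocess.run(["pip", "install", f"{package}=={version}"], check=True)
        if not tests_pass():
            break  # too old: keep the previous (newer) version
        oldest_ok = version
    return oldest_ok
```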
