
test_conda_downgrade and test_python2_update exhibit mixed behavior of results #317

Open
conda-bot opened this issue Oct 6, 2023 · 16 comments · May be fixed by conda-forge/libsolv-feedstock#85
Labels
type::bug (describes erroneous operation, use severity::* to classify the type), type::testing (issues about tests or the test infrastructure)

Comments

@conda-bot

conda-bot commented Oct 6, 2023

The Tests workflow failed on 2023-10-10 07:12 UTC

Full run: https://github.com/conda/conda-libmamba-solver/actions/runs/6465510729

(This post will be updated if another test fails, as long as this issue remains open.)

conda-bot added the type::bug and type::testing labels on Oct 6, 2023
@jaimergp

We have been seeing this a couple of times with some flakiness, where the Python version is downgraded to 3.5 instead of 3.6.

Traceback (most recent call last):
  File "/opt/conda-libmamba-solver-src/tests/test_modified_upstream.py", line 888, in test_conda_downgrade
    assert pkg.version == "3.6.2"
AssertionError: assert '3.5.4' == '3.6.2'
  - 3.6.2
  + 3.5.4

Probably worth taking a look, especially because of the mixed behavior across platforms and channels (it only fails on Linux + conda-forge; maybe due to different "pressures" in the packaging landscape over there).

jaimergp changed the title from "Scheduled tests failed" to "test_conda_downgrade exhibits mixed behavior of results in Linux x conda-forge" on Oct 10, 2023
jaimergp linked a pull request on Oct 19, 2023 that will close this issue
@jaimergp

Also observed in Windows x Python 3.8 x conda-forge, so not unique to Linux.

jaimergp changed the title from "test_conda_downgrade exhibits mixed behavior of results in Linux x conda-forge" to "test_conda_downgrade exhibits mixed behavior of results" on Oct 19, 2023
@jaimergp

Note that the failing assertions were skipped in #323, but we should still investigate and make this fully pass in a consistent way.

@jaimergp

I gave this a try today in #378 but it's not enough. I left the following bash loop running for a while:

i=0
while pytest "tests/core/test_solve.py::test_conda_downgrade[libmamba]"; do 
  i=$((i+1));
  echo ATTEMPT $i;
done

It ran FOR A WHILE (pun very much intended) and, eventually, at ATTEMPT 137:

FAILED tests/core/test_solve.py::test_conda_downgrade[libmamba] - AssertionError: assert '3.5.4' == '3.6.2'

To be clear: it ran successfully for 136 attempts before failing. This is with #378 checked out and this conda info:

$ conda info

     active environment : base
    active env location : /opt/conda
            shell level : 1
       user config file : /home/test_user/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 23.9.1.dev70+g67806fda9
    conda-build version : 3.27.0
         python version : 3.11.6.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=aarch64
                          __glibc=2.36=0
                          __linux=5.15.49=0
                          __unix=0=0
       base environment : /opt/conda  (read only)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-aarch64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/conda/pkgs
                          /home/test_user/.conda/pkgs
       envs directories : /home/test_user/.conda/envs
                          /opt/conda/envs
               platform : linux-aarch64
             user-agent : conda/23.9.1.dev70+g67806fda9 requests/2.31.0 CPython/3.11.6 Linux/5.15.49-linuxkit-pr debian/12 glibc/2.36 solver/libmamba conda-libmamba-solver/23.9.4.dev19+gdc4e14c libmambapy/1.5.3
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

... this is going to be fun 😂 Where is the randomness coming from?

@jaimergp

Reproduced again after 107 attempts on 5f39175.

@jaimergp

With the following form (fixing the seed), you can reproduce in <20 attempts (better than ~150 before!):

i=1
while PYTHONHASHSEED=0 CONDA_VERBOSITY=3 pytest "tests/core/test_solve.py::test_conda_downgrade[libmamba]" -vvvs 2>&1 | tee -a ../conda-libmamba-solver-src/logs.txt; do 
  i=$((i+1));
  if [ $i -eq 200 ]; then break; fi;
  echo ATTEMPT $i >> ../conda-libmamba-solver-src/logs.txt;
done

This allowed me to iterate faster and try 933c0fb. With that change I haven't been able to reproduce it in several rounds of 200 attempts. 🤷 I also observed in the logs that the average time spent on each test jumped from 16s (sorted) to ~50s (unsorted), because when unsorted the solver has to backtrack a lot. Even if the runs take longer, the solver eventually finds a solution... it's just that it can be a different one than expected. IOW, the excess backtracking explains why it sometimes finds a Python 3.5 solution (or even 2.7) instead of the expected 3.6.
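For context, here is a minimal Python sketch (not from the original thread) of why pinning PYTHONHASHSEED matters: CPython salts string hashes per process unless the seed is fixed, so any ordering that depends on hashes can change between otherwise identical runs.

import os
import subprocess
import sys

# Illustrative only: sort a set of build strings by hash in child processes
# and count how many distinct orderings we get across runs.
snippet = "print(sorted({'py27', 'py34', 'py35', 'py36'}, key=hash))"

for seed in ("random", "0"):
    orderings = {
        subprocess.run(
            [sys.executable, "-c", snippet],
            env={**os.environ, "PYTHONHASHSEED": seed},
            capture_output=True,
            text=True,
        ).stdout
        for _ in range(5)
    }
    # With a random seed the ordering usually differs between processes;
    # with PYTHONHASHSEED=0 it is always the same.
    print(f"PYTHONHASHSEED={seed}: {len(orderings)} distinct ordering(s)")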

Anyway, let's see what the PR is saying now.

@jaimergp

Well, #378 didn't work. We are seeing it in main again. I've run more tests and now I have full logs for the whole libsolv decision process. With a fixed Python hash seed, the issue cannot be coming from how containers are sorted anyway (we would observe it either all the time or never, not just sometimes).

These are all passing and you can see identical libsolv logs:

Number 14 failed:

If we diff 13 and 14, we can see how at some point libsolv branches out differently when selecting conda variants:

[screenshot: side-by-side diff of the libsolv logs for attempts 13 and 14]

Note in the scrollbar how the previous steps are identical (no diff colors).

Zooming in with text:

  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py35hf9359ed_0 (score: 1)
  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py27h6ae6dc7_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 1)
  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 64)
  info     libsolv  Selecting variant [a] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py27h6ae6dc7_0 (score: -1)
  info     libsolv  Selecting variant [a] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py27h6ae6dc7_0 (score: -1)
  info     libsolv  creating a branch [data=223395]:
- info     libsolv    - conda-4.3.30-py36h5d9f9f4_0
  info     libsolv    - conda-4.3.30-py35hf9359ed_0
+ info     libsolv    - conda-4.3.30-py34h69bfab2_0
  info     libsolv    - conda-4.3.30-py27h6ae6dc7_0
- info     libsolv  installing conda-4.3.30-py36h5d9f9f4_0
+ info     libsolv  installing conda-4.3.30-py35hf9359ed_0
  info     libsolv  installing conda-build-3.12.1-py37_0
  info     libsolv  prune_to_best_version_conda 13

For some reason the py36 variant is not even listed in the 14th attempt's logs, so libsolv ends up choosing the next one, py35.

Additionally, if we compare some successful logs, sometimes we see that not all conda variants were considered in the aforementioned branch. However, as long as py36 is there, it'll be considered first and then the test passes. The problem arises when py36 is not in the branch.

So far, I don't know exactly which strategy is followed to create a branch. The mamba docs provide some info, but I don't see anything there that points to things being dropped.

My next guess: createbranch in libsolv mentions "weak" deps here and there, so I'm going to see if our LOCK | WEAK jobs can get by with simply LOCK.

@jaimergp

My next guess: createbranch in libsolv mentions "weak" deps here and there, so I'm going to see if our LOCK | WEAK jobs can get by with simply LOCK.

Doesn't seem to change much.


I've also checked whether the locally exported synthetic channels (part of the test_solve.py infra, the get_solver_N helpers) were being written in a consistent order or not. They are identical across runs. For the sake of completeness I also tried adding sort_keys=True to the corresponding json.dump call, but that didn't change much either.
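For illustration, this is roughly what that last experiment amounts to (a hedged sketch; the file name and payload are made up, not the actual get_solver_N output):

import json

# Made-up stand-in for the synthetic repodata written by the test helpers.
repodata = {
    "packages": {
        "conda-4.3.30-py36h5d9f9f4_0.tar.bz2": {"version": "4.3.30"},
        "conda-4.3.30-py27h6ae6dc7_0.tar.bz2": {"version": "4.3.30"},
    }
}

with open("repodata.json", "w") as fh:
    # sort_keys=True makes the on-disk key order deterministic regardless of
    # the in-memory insertion order.
    json.dump(repodata, fh, indent=2, sort_keys=True)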

What does seem to help is reducing the number of conda records considered. The failing test asks for a conda downgrade via conda<4.4.10. If I set it to something equally reasonable such as conda>=4,<4.4.10, I can't reproduce it (which doesn't mean that the bug is gone). This makes me think that the bug is somehow a function of the number of records considered as candidates for a spec.
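To picture how much the candidate pool changes between those two specs, here is a hedged sketch using conda's VersionOrder for the comparisons (the version list is an invented sample, not real repodata):

from conda.models.version import VersionOrder

# Invented sample of conda versions; real channels carry many more records,
# including 1.x/2.x/3.x builds that only the broader spec admits.
sample_versions = ["3.19.0", "4.0.5", "4.2.13", "4.3.30", "4.4.9", "4.4.11"]

def matches(version, lower=None, upper="4.4.10"):
    v = VersionOrder(version)
    if lower is not None and v < VersionOrder(lower):
        return False
    return v < VersionOrder(upper)

broad = [v for v in sample_versions if matches(v)]              # conda<4.4.10
narrow = [v for v in sample_versions if matches(v, lower="4")]  # conda>=4,<4.4.10
print(f"{len(broad)} vs {len(narrow)} candidate versions")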

I traced the code back to the following chain of functions:

  1. libmambapy.Solver.solve()
  2. libsolv's solver_solve -> solver_run_sat -> resolve_jobrules -> selectandinstall.
  3. In selectandinstall, we will end up calling the createbranch function that prints the list that is sometimes missing the py36 variant of conda. Before that, there are several functions that prune and reorder the list of candidate records. One of them is policy_filter_unwanted, which ends up calling prune_to_best_version. Since we only see v4.3.30 in the offending branch, I'm assuming the bug is there.
  4. Interestingly, prune_to_best_version is patched in the conda libsolv package. So maybe the heisenbug is in this patch!

At this point, to continue debugging, I'm going to need to start recompiling libsolv and see where the missing conda records are. There seems to be some non-determinism in that part of the code.

@jaimergp

jaimergp commented Nov 15, 2023

OK, forget about (4) above: I added some logging to a custom libsolv build (yay!), and now we can pinpoint where the conda records are being dropped:

info     libsolv  - conda-4.3.8-py35_0 [10653]
info     libsolv  - conda-4.3.8-py36_0 [10654]
info     libsolv  - conda-4.3.9-py27_0 [10655]
info     libsolv  - conda-4.3.9-py34_0 [10656]
info     libsolv  - conda-4.3.9-py35_0 [10657]
info     libsolv  - conda-4.3.9-py36_0 [10658]
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py27h6ae6dc7_0 vs (b) conda-4.3.30-py34h69bfab2_0 (score: 1)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 1)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 2)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py35hf9359ed_0 (score: 128)
info     libsolv  BEFORE prune future
info     libsolv  - considering conda-4.3.30-py36h5d9f9f4_0
info     libsolv  - considering conda-4.3.30-py35hf9359ed_0
info     libsolv  - considering conda-4.3.30-py34h69bfab2_0
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE reorder future
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE prune_yumobs
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE createbranch
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0

IOW, the culprit is prune_dq_for_future_installed. Records that fail the replaces_installed_package test are not kept in the queue and are dropped. It seems that some records fail this particular check:

  if (!s->obsoletes)
    return 0;

, where s is a Solvable (i.e. a record). No clue yet what the obsoletes stuff is, or why only some records have it. More importantly, the records that did pass the test did so through an earlier check:

  FOR_PROVIDES(p2, pp2, s->name)
    {
      s2 = pool->solvables + p2;
      if (s2->repo == installed && s2->name == s->name && !(noupdate && MAPTST(noupdate, p - installed->start)))
        return 1;
    }

Of those &&-chained checks, the one that seems to fail in some cases is the last one: !(noupdate && MAPTST(noupdate, p - installed->start)). For the life of me, I don't know what that is doing.

However, I realized this chunk of code was only added recently (libsolv 0.7.25, released in September), which matches the lifetime of this bug. It made it to conda-forge on Sep 29th, but it was never released in defaults (which is stuck with 0.7.24). That's why we only observed it in the conda-forge CI.

ANYWAY, if that's true (I am rerunning local tests) we can probably patch it out in the conda-forge feedstock. No changes needed in this repo.

@jaimergp

For completeness, here's a logfile with extended logging. This is the patched function I used:

static int
replaces_installed_package(Pool *pool, Id p, Map *noupdate)
{
  Repo *installed = pool->installed;
  Solvable *s = pool->solvables + p, *s2;
  Id p2, pp2;
  Id obs, *obsp;

  if (s->repo == installed && !(noupdate && MAPTST(noupdate, p - installed->start)))
  {
    POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1148\n");
    return 1;
  }
  FOR_PROVIDES(p2, pp2, s->name)
    {
      s2 = pool->solvables + p2;

      if (s2->repo == installed) {
        POOL_DEBUG(SOLV_DEBUG_POLICY, "  - s2: %s\n", pool_solvid2str(pool, p2));
        POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1155\n");
        if (s2->name == s->name) {
          POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1157\n");
          if (!(noupdate && MAPTST(noupdate, p - installed->start))) {
            POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1159\n");
            return 1;
          }
        } 
      }
    }
  if (!s->obsoletes)
    {
      POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1167\n");
      return 0;
    }
  obsp = s->repo->idarraydata + s->obsoletes;
  while ((obs = *obsp++) != 0)
    {
      FOR_PROVIDES(p2, pp2, obs)
        {
          s2 = pool->solvables + p2;
          if (s2->repo != pool->installed || (noupdate && MAPTST(noupdate, p - installed->start)))
            continue;
          if (!pool->obsoleteusesprovides && !pool_match_nevr(pool, s2, obs))
            continue;
          if (pool->obsoleteusescolors && !pool_colormatch(pool, s, s2))
            continue;
          POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1177\n");
          return 1;
        }
    }
  POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1181\n");
  return 0;
}

@jaimergp

Hm, I found an example of a test running on defaults (hence libsolv 0.7.24, in principle exempt from the bug) that failed. However, every other example I could find was on conda-forge channels 🤔

I fetched all recent logs with these gh one-liners (in the conda-libmamba-solver repo):

$ gh run list --json "databaseId,conclusion" -w tests --jq '.[] | select (.conclusion=="failure") | .databaseId' --limit 200 | xargs -L1 gh run view --log-failed >> failed_logs.txt
$ gh run list --json "databaseId,conclusion" -w "Upstream tests" --jq '.[] | select (.conclusion=="failure") | .databaseId' --limit 200 | xargs -L1 gh run view --log-failed >> failed_logs_upstream.txt

And then searched for == '3.6.2', which is the assertion we are making:

macos, Python 3.9, conda-forge, upstream-unit-2   2023-11-15T09:20:43.5517450Z FAILED tests/core/test_solve.py::test_conda_downgrade[libmamba] - AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-10-19T08:17:45.8305762Z AssertionError: assert '3.5.4' == '3.6.2'
Linux, Python 3.10, conda-forge                   2023-10-19T08:38:22.4137804Z AssertionError: assert '3.5.4' == '3.6.2'
MacOS, Python 3.10, conda-forge                   2023-10-17T06:44:00.8011930Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.10, conda-forge                 2023-10-11T07:06:00.9548642Z AssertionError: assert '3.5.4' == '3.6.2'
Linux, Python 3.10, conda-forge                   2023-10-10T07:12:24.6626504Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-10-02T06:57:23.4930384Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-09-29T21:22:53.0153188Z AssertionError: assert '3.5.4' == '3.6.2'

So it looks like it's mostly conda-forge. Maybe some logs are missing, but the point stands: conda-forge errors are far more prominent than defaults ones (only one known case).
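As a side note, the tally above can be produced mechanically. A hedged sketch (the parsing assumes the job name, which mentions the channel, appears on the same gh run view --log-failed line, which may not hold for every workflow; the file names are the ones produced by the commands above):

from collections import Counter
from pathlib import Path

counts = Counter()
for name in ("failed_logs.txt", "failed_logs_upstream.txt"):
    path = Path(name)
    if not path.exists():
        continue
    for line in path.read_text(errors="ignore").splitlines():
        if "== '3.6.2'" in line:
            # Group by whichever channel string shows up on the failing line.
            counts["conda-forge" if "conda-forge" in line else "defaults/other"] += 1

print(counts)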

I do see that the companion function to the one we want to patch out also does some if (MAPTST(&solv->noupdate, p - solv->installed->start)) checks, but that part of the code has been there for 10 years. So honestly, no clue :)

In an abundance of caution I am going to run these experiments overnight:

  • Cross-diff all 200 null attempts with the patch applied (none of them "failed"), just to see whether the logs arrived at the solution via different backtracking paths; see the sketch after this list. Usual diffs for "identical" runs are within 100-150 lines; when the solver does something different, you easily end up in the thousands of lines.
  • Run the reproduction shell script on a defaults-based Docker image (with libsolv 0.7.24), for 1000 iterations max. If this doesn't reproduce it, I think we are safe?
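Here is a hedged sketch of what the cross-diff in the first bullet could look like (the runs/attempt_*.log layout is hypothetical, and the 500-line threshold is just a cut-off above the usual 100-150 lines of noise):

import difflib
from pathlib import Path

logs = sorted(Path("runs").glob("attempt_*.log"))
for a, b in zip(logs, logs[1:]):
    diff = list(
        difflib.unified_diff(
            a.read_text().splitlines(),
            b.read_text().splitlines(),
            lineterm="",
        )
    )
    # Diffs in the thousands of lines suggest a different backtracking path.
    if len(diff) > 500:
        print(f"{a.name} vs {b.name}: {len(diff)} diff lines")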

If the above results in no reproducers, I'm inclined to let the feedstock patch that function out and close this issue. If it keeps happening after the release (either in defaults or in the patched conda-forge build), then we reopen and reassess.

@jaimergp

In an abundance of caution I am going to run these experiments overnight.

No reproducers. I think we are fine :)

@jaimergp

@jaimergp

Sweet, another run where the same as above happens in all three Windows jobs.

@jaimergp

Adding to the overall flakiness, I've seen tests/core/test_solve.py::test_solve_1[libmamba] fail sometimes by downgrading to Python 2.6 instead of 2.7:

  At index 6 diff: 'channel-1/osx-64::python-2.6.8-6' != 'channel-1/osx-64::python-2.7.5-0'
  Full diff:
    (
     'channel-1/osx-64::openssl-1.0.1c-0',
     'channel-1/osx-64::readline-6.2-0',
     'channel-1/osx-64::sqlite-3.7.13-0',
     'channel-1/osx-64::system-5.8-1',
     'channel-1/osx-64::tk-8.5.13-0',
     'channel-1/osx-64::zlib-1.2.7-0',
  -  'channel-1/osx-64::python-2.7.5-0',
  ?                              ^ ^ ^
  +  'channel-1/osx-64::python-2.6.8-6',
  ?                              ^ ^ ^
  -  'channel-1/osx-64::numpy-1.7.1-py27_0',
  ?                                    ^
  +  'channel-1/osx-64::numpy-1.7.1-py26_0',
  ?                                    ^
    )

This is happening again in test_solve.py, where tests are supposed to be more robust because synthetic local repodata is used (instead of live remote repodata). So this might be an issue with the whole "export in-memory repodata to disk" hacks in the testing helpers upstream.

jaimergp changed the title from "test_conda_downgrade exhibits mixed behavior of results" to "test_conda_downgrade and test_python2_update exhibit mixed behavior of results" on Jan 16, 2024
@jaimergp

I've been seeing some more tests/core/test_solve.py::test_python2_update[libmamba] flaky failures lately.
