
test_conda_downgrade and test_python2_update exhibit mixed behavior of results #317

Open
conda-bot opened this issue Oct 6, 2023 · 16 comments · May be fixed by conda-forge/libsolv-feedstock#85
Labels
type::bug (describes erroneous operation, use severity::* to classify the type), type::testing (issues about tests or the test infrastructure)

Comments

@conda-bot

conda-bot commented Oct 6, 2023

The Tests workflow failed on 2023-10-10 07:12 UTC

Full run: https://github.com/conda/conda-libmamba-solver/actions/runs/6465510729

(This post will be updated if another test fails, as long as this issue remains open.)

conda-bot added the type::bug and type::testing labels on Oct 6, 2023
@jaimergp

We have been seeing this a couple of times with some flakiness, where the Python version is downgraded to 3.5 instead of 3.6.

Traceback (most recent call last):
  File "/opt/conda-libmamba-solver-src/tests/test_modified_upstream.py", line 888, in test_conda_downgrade
    assert pkg.version == "3.6.2"
AssertionError: assert '3.5.4' == '3.6.2'
  - 3.6.2
  + 3.5.4

Probably worth taking a look, especially because of the mixed behavior across platforms and channels (it only fails on Linux + conda-forge; maybe due to different "pressures" in the packaging landscape over there).

jaimergp changed the title from "Scheduled tests failed" to "test_conda_downgrade exhibits mixed behavior of results in Linux x conda-forge" on Oct 10, 2023
jaimergp linked a pull request on Oct 19, 2023 that will close this issue
@jaimergp

Also observed in Windows x Python 3.8 x conda-forge, so not unique to Linux.

jaimergp changed the title from "test_conda_downgrade exhibits mixed behavior of results in Linux x conda-forge" to "test_conda_downgrade exhibits mixed behavior of results" on Oct 19, 2023
@jaimergp

Note that the failing assertions were skipped in #323, but we should still investigate and make this fully pass in a consistent way.

@jaimergp

I gave this a try today in #378 but it's not enough. I left the following bash loop running for a while:

i=0
while pytest "tests/core/test_solve.py::test_conda_downgrade[libmamba]"; do 
  i=$((i+1));
  echo ATTEMPT $i;
done

It ran FOR A WHILE (pun very much intended) and, eventually, at ATTEMPT 137:

FAILED tests/core/test_solve.py::test_conda_downgrade[libmamba] - AssertionError: assert '3.5.4' == '3.6.2'

To be clear: it ran successfully for 136 attempts before failing. This is with #378 checked out and this conda info:

$ conda info

     active environment : base
    active env location : /opt/conda
            shell level : 1
       user config file : /home/test_user/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 23.9.1.dev70+g67806fda9
    conda-build version : 3.27.0
         python version : 3.11.6.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=aarch64
                          __glibc=2.36=0
                          __linux=5.15.49=0
                          __unix=0=0
       base environment : /opt/conda  (read only)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-aarch64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/conda/pkgs
                          /home/test_user/.conda/pkgs
       envs directories : /home/test_user/.conda/envs
                          /opt/conda/envs
               platform : linux-aarch64
             user-agent : conda/23.9.1.dev70+g67806fda9 requests/2.31.0 CPython/3.11.6 Linux/5.15.49-linuxkit-pr debian/12 glibc/2.36 solver/libmamba conda-libmamba-solver/23.9.4.dev19+gdc4e14c libmambapy/1.5.3
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False

... this is going to be fun 😂 Where is the randomness coming from?

@jaimergp

Reproduced again after 107 attempts on 5f39175.

@jaimergp

With the following form (fixing the seed), you can reproduce in <20 attempts (better than ~150 before!):

i=1
while PYTHONHASHSEED=0 CONDA_VERBOSITY=3 pytest "tests/core/test_solve.py::test_conda_downgrade[libmamba]" -vvvs 2>&1 | tee -a ../conda-libmamba-solver-src/logs.txt; do 
  i=$((i+1));
  if [ $i -eq 200 ]; then break; fi;
  echo ATTEMPT $i >> ../conda-libmamba-solver-src/logs.txt;
done

This allowed me to iterate faster and try 933c0fb. With that change I haven't been able to reproduce it in several rounds of 200 attempts. 🤷 I also observed in the logs that the average time spent on each test jumped from 16s (sorted) to ~50s (unsorted), because when unsorted the solver has to backtrack a lot. Even if the runs take longer, the solver eventually finds a solution... it's just that it can be a different one than expected. IOW, the excess backtracking explains why it sometimes finds a Python 3.5 solution (or even 2.7) instead of the expected 3.6.
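For context, here is a minimal Python sketch (not from the original thread) of why pinning PYTHONHASHSEED matters: CPython salts string hashes per process unless the seed is fixed, so any ordering that depends on hashes can change between otherwise identical runs.

import os
import subprocess
import sys

# Illustrative only: sort a set of build strings by hash in child processes
# and count how many distinct orderings we get across runs.
snippet = "print(sorted({'py27', 'py34', 'py35', 'py36'}, key=hash))"

for seed in ("random", "0"):
    orderings = {
        subprocess.run(
            [sys.executable, "-c", snippet],
            env={**os.environ, "PYTHONHASHSEED": seed},
            capture_output=True,
            text=True,
        ).stdout
        for _ in range(5)
    }
    # With a random seed the ordering usually differs between processes;
    # with PYTHONHASHSEED=0 it is always the same.
    print(f"PYTHONHASHSEED={seed}: {len(orderings)} distinct ordering(s)")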

Anyway, let's see what the PR is saying now.

@jaimergp

Well, #378 didn't work. We are seeing it in main again. I've run more tests and now I have full logs for the whole libsolv decision process. With a fixed Python hash seed, the issue cannot be coming from how containers are sorted anyway (we would observe it either all the time or never, not just sometimes).

These are all passing and you can see identical libsolv logs:

Number 14 failed:

If we diff 13 and 14, we can see how at some point libsolv branches out differently when selecting conda variants:

[screenshot: side-by-side diff of the libsolv logs for attempts 13 and 14]

Note in the scrollbar how the previous steps are identical (no diff colors).

Zooming in with text:

  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py35hf9359ed_0 (score: 1)
  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py27h6ae6dc7_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 1)
  info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 64)
  info     libsolv  Selecting variant [a] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py27h6ae6dc7_0 (score: -1)
  info     libsolv  Selecting variant [a] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py27h6ae6dc7_0 (score: -1)
  info     libsolv  creating a branch [data=223395]:
- info     libsolv    - conda-4.3.30-py36h5d9f9f4_0
  info     libsolv    - conda-4.3.30-py35hf9359ed_0
+ info     libsolv    - conda-4.3.30-py34h69bfab2_0
  info     libsolv    - conda-4.3.30-py27h6ae6dc7_0
- info     libsolv  installing conda-4.3.30-py36h5d9f9f4_0
+ info     libsolv  installing conda-4.3.30-py35hf9359ed_0
  info     libsolv  installing conda-build-3.12.1-py37_0
  info     libsolv  prune_to_best_version_conda 13

For some reason the py36 variant is not even listed in the 14th attempt's logs, so libsolv ends up choosing the next one, py35.

Additionally, if we compare some successful logs, sometimes we see that not all conda variants were considered in the aforementioned branch. However, as long as py36 is there, it'll be considered first and then the test passes. The problem arises when py36 is not in the branch.

So far, I don't know exactly which strategy is followed to create a branch. The mamba docs provide some info, but I don't see anything there that points to things being dropped.

My next guess: createbranch in libsolv mentions "weak" deps here and there, so I'm going to see if our LOCK | WEAK jobs can get by with simply LOCK.

@jaimergp

My next guess: createbranch in libsolv mentions "weak" deps here and there, so I'm going to see if our LOCK | WEAK jobs can get by with simply LOCK.

Doesn't seem to change much.


I've also checked whether the locally exported synthetic channels (part of the test_solve.py infra, the get_solver_N helpers) were being written in a consistent order or not. They are identical across runs. For the sake of completeness I also tried adding sort_keys=True to the corresponding json.dump call, but that didn't change much either.
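For illustration, this is roughly what that last experiment amounts to (a hedged sketch; the file name and payload are made up, not the actual get_solver_N output):

import json

# Made-up stand-in for the synthetic repodata written by the test helpers.
repodata = {
    "packages": {
        "conda-4.3.30-py36h5d9f9f4_0.tar.bz2": {"version": "4.3.30"},
        "conda-4.3.30-py27h6ae6dc7_0.tar.bz2": {"version": "4.3.30"},
    }
}

with open("repodata.json", "w") as fh:
    # sort_keys=True makes the on-disk key order deterministic regardless of
    # the in-memory insertion order.
    json.dump(repodata, fh, indent=2, sort_keys=True)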

What does seem to help is reducing the number of conda records considered. The failing test asks for a conda downgrade via conda<4.4.10. If I set it to something equally reasonable such as conda>=4,<4.4.10, I can't reproduce it (which doesn't mean that the bug is gone). This makes me think that the bug is somehow a function of the number of records considered as candidates for a spec.
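To picture how much the candidate pool changes between those two specs, here is a hedged sketch using conda's VersionOrder for the comparisons (the version list is an invented sample, not real repodata):

from conda.models.version import VersionOrder

# Invented sample of conda versions; real channels carry many more records,
# including 1.x/2.x/3.x builds that only the broader spec admits.
sample_versions = ["3.19.0", "4.0.5", "4.2.13", "4.3.30", "4.4.9", "4.4.11"]

def matches(version, lower=None, upper="4.4.10"):
    v = VersionOrder(version)
    if lower is not None and v < VersionOrder(lower):
        return False
    return v < VersionOrder(upper)

broad = [v for v in sample_versions if matches(v)]              # conda<4.4.10
narrow = [v for v in sample_versions if matches(v, lower="4")]  # conda>=4,<4.4.10
print(f"{len(broad)} vs {len(narrow)} candidate versions")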

I traced the code back to the following chain of functions:

  1. libmambapy.Solver.solve()
  2. libsolv's solver_solve -> solver_run_sat -> resolve_jobrules -> selectandinstall.
  3. In selectandinstall, we will end up calling the createbranch function that prints the list that is sometimes missing the py36 variant of conda. Before that, there are several functions that prune and reorder the list of candidate records. One of them is policy_filter_unwanted, which ends up calling prune_to_best_version. Since we only see v4.3.30 in the offending branch, I'm assuming the bug is there.
  4. Interestingly, prune_to_best_version is patched in the conda libsolv package. So maybe the heisenbug is in this patch!

At this point, to continue debugging, I'm going to need to start recompiling libsolv and see where the missing conda records are. There seems to be some non-determinism in that part of the code.

@jaimergp

jaimergp commented Nov 15, 2023

OK, forget about (4) above: I added some logging to a custom libsolv build (yay!), and now we can pinpoint where the conda records are being dropped:

info     libsolv  - conda-4.3.8-py35_0 [10653]
info     libsolv  - conda-4.3.8-py36_0 [10654]
info     libsolv  - conda-4.3.9-py27_0 [10655]
info     libsolv  - conda-4.3.9-py34_0 [10656]
info     libsolv  - conda-4.3.9-py35_0 [10657]
info     libsolv  - conda-4.3.9-py36_0 [10658]
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py27h6ae6dc7_0 vs (b) conda-4.3.30-py34h69bfab2_0 (score: 1)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py35hf9359ed_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 1)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py36h5d9f9f4_0 (score: 2)
info     libsolv  Selecting variant [b] of (a) conda-4.3.30-py34h69bfab2_0 vs (b) conda-4.3.30-py35hf9359ed_0 (score: 128)
info     libsolv  BEFORE prune future
info     libsolv  - considering conda-4.3.30-py36h5d9f9f4_0
info     libsolv  - considering conda-4.3.30-py35hf9359ed_0
info     libsolv  - considering conda-4.3.30-py34h69bfab2_0
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE reorder future
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE prune_yumobs
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0
info     libsolv  BEFORE createbranch
info     libsolv  - considering conda-4.3.30-py27h6ae6dc7_0

IOW, the culprit is prune_dq_for_future_installed. Records that fail the replaces_installed_package test are not kept in the queue and are dropped. It seems that some records fail this particular check:

  if (!s->obsoletes)
    return 0;

, where s is a Solvable (i.e. a record). No clue yet what the obsoletes stuff is, or why only some records have it. More importantly, the records that did pass the test did so through an earlier check:

  FOR_PROVIDES(p2, pp2, s->name)
    {
      s2 = pool->solvables + p2;
      if (s2->repo == installed && s2->name == s->name && !(noupdate && MAPTST(noupdate, p - installed->start)))
        return 1;
    }

Of those &&-chained checks, the one that seems to fail in some cases is the last one: !(noupdate && MAPTST(noupdate, p - installed->start)). For the life of me, I don't know what that is doing.

However, I realized this chunk of code was only added recently (libsolv 0.7.25, released in September), which matches the lifetime of this bug. It made it to conda-forge on Sep 29th, but it was never released in defaults (which is stuck with 0.7.24). That's why we only observed it in the conda-forge CI.

ANYWAY, if that's true (I am rerunning local tests) we can probably patch it out in the conda-forge feedstock. No changes needed in this repo.

@jaimergp

For completeness, here's a logfile with extended logging. This is the patched function I used:

static int
replaces_installed_package(Pool *pool, Id p, Map *noupdate)
{
  Repo *installed = pool->installed;
  Solvable *s = pool->solvables + p, *s2;
  Id p2, pp2;
  Id obs, *obsp;

  if (s->repo == installed && !(noupdate && MAPTST(noupdate, p - installed->start)))
  {
    POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1148\n");
    return 1;
  }
  FOR_PROVIDES(p2, pp2, s->name)
    {
      s2 = pool->solvables + p2;

      if (s2->repo == installed) {
        POOL_DEBUG(SOLV_DEBUG_POLICY, "  - s2: %s\n", pool_solvid2str(pool, p2));
        POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1155\n");
        if (s2->name == s->name) {
          POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1157\n");
          if (!(noupdate && MAPTST(noupdate, p - installed->start))) {
            POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1159\n");
            return 1;
          }
        } 
      }
    }
  if (!s->obsoletes)
    {
      POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1167\n");
      return 0;
    }
  obsp = s->repo->idarraydata + s->obsoletes;
  while ((obs = *obsp++) != 0)
    {
      FOR_PROVIDES(p2, pp2, obs)
        {
          s2 = pool->solvables + p2;
          if (s2->repo != pool->installed || (noupdate && MAPTST(noupdate, p - installed->start)))
            continue;
          if (!pool->obsoleteusesprovides && !pool_match_nevr(pool, s2, obs))
            continue;
          if (pool->obsoleteusescolors && !pool_colormatch(pool, s, s2))
            continue;
          POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1177\n");
          return 1;
        }
    }
  POOL_DEBUG(SOLV_DEBUG_POLICY, "  - 1181\n");
  return 0;
}

@jaimergp

Hm, I found an example of a test running on defaults (hence libsolv 0.7.24, in principle exempt from the bug) that failed. However, every other example I could find was on conda-forge channels 🤔

I fetched all recent logs with these gh one-liners (in the conda-libmamba-solver repo):

$ gh run list --json "databaseId,conclusion" -w tests --jq '.[] | select (.conclusion=="failure") | .databaseId' --limit 200 | xargs -L1 gh run view --log-failed >> failed_logs.txt
$ gh run list --json "databaseId,conclusion" -w "Upstream tests" --jq '.[] | select (.conclusion=="failure") | .databaseId' --limit 200 | xargs -L1 gh run view --log-failed >> failed_logs_upstream.txt

And then searched for == '3.6.2', which is the assertion we are making:

macos, Python 3.9, conda-forge, upstream-unit-2   2023-11-15T09:20:43.5517450Z FAILED tests/core/test_solve.py::test_conda_downgrade[libmamba] - AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-10-19T08:17:45.8305762Z AssertionError: assert '3.5.4' == '3.6.2'
Linux, Python 3.10, conda-forge                   2023-10-19T08:38:22.4137804Z AssertionError: assert '3.5.4' == '3.6.2'
MacOS, Python 3.10, conda-forge                   2023-10-17T06:44:00.8011930Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.10, conda-forge                 2023-10-11T07:06:00.9548642Z AssertionError: assert '3.5.4' == '3.6.2'
Linux, Python 3.10, conda-forge                   2023-10-10T07:12:24.6626504Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-10-02T06:57:23.4930384Z AssertionError: assert '3.5.4' == '3.6.2'
Windows, Python 3.8, conda-forge                  2023-09-29T21:22:53.0153188Z AssertionError: assert '3.5.4' == '3.6.2'

So it looks like it's mostly conda-forge. Maybe some logs are missing, but the point stands: conda-forge errors are far more prominent than defaults ones (only one known case).
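As a side note, the tally above can be produced mechanically. A hedged sketch (the parsing assumes the job name, which mentions the channel, appears on the same gh run view --log-failed line, which may not hold for every workflow; the file names are the ones produced by the commands above):

from collections import Counter
from pathlib import Path

counts = Counter()
for name in ("failed_logs.txt", "failed_logs_upstream.txt"):
    path = Path(name)
    if not path.exists():
        continue
    for line in path.read_text(errors="ignore").splitlines():
        if "== '3.6.2'" in line:
            # Group by whichever channel string shows up on the failing line.
            counts["conda-forge" if "conda-forge" in line else "defaults/other"] += 1

print(counts)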

I do see that the companion function to the one we want to patch out also does some if (MAPTST(&solv->noupdate, p - solv->installed->start)) checks, but that part of the code has been there for 10 years. So honestly, no clue :)

In an abundance of caution I am going to run these experiments overnight:

  • Cross-diff all 200 null attempts with the patch applied (none of them "failed"), just to see whether the logs arrived at the solution via different backtracking paths; see the sketch after this list. Usual diffs for "identical" runs are within 100-150 lines; when the solver does something different, you easily end up in the thousands of lines.
  • Run the reproduction shell script on a defaults-based Docker image (with libsolv 0.7.24), for 1000 iterations max. If this doesn't reproduce it, I think we are safe?
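Here is a hedged sketch of what the cross-diff in the first bullet could look like (the runs/attempt_*.log layout is hypothetical, and the 500-line threshold is just a cut-off above the usual 100-150 lines of noise):

import difflib
from pathlib import Path

logs = sorted(Path("runs").glob("attempt_*.log"))
for a, b in zip(logs, logs[1:]):
    diff = list(
        difflib.unified_diff(
            a.read_text().splitlines(),
            b.read_text().splitlines(),
            lineterm="",
        )
    )
    # Diffs in the thousands of lines suggest a different backtracking path.
    if len(diff) > 500:
        print(f"{a.name} vs {b.name}: {len(diff)} diff lines")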

If the above results in no reproducers, I'm inclined to let the feedstock patch that function out and close this issue. If it keeps happening after the release (either in defaults or in the patched conda-forge build), then we reopen and reassess.

@jaimergp

In an abundance of caution I am going to run these experiments overnight.

No reproducers. I think we are fine :)

@jaimergp

@jaimergp

Sweet, another run where the same as above happens in all three Windows jobs.

@jaimergp

Adding to the overall flakiness, I've seen tests/core/test_solve.py::test_solve_1[libmamba] fail sometimes by downgrading to Python 2.6 instead of 2.7:

  At index 6 diff: 'channel-1/osx-64::python-2.6.8-6' != 'channel-1/osx-64::python-2.7.5-0'
  Full diff:
    (
     'channel-1/osx-64::openssl-1.0.1c-0',
     'channel-1/osx-64::readline-6.2-0',
     'channel-1/osx-64::sqlite-3.7.13-0',
     'channel-1/osx-64::system-5.8-1',
     'channel-1/osx-64::tk-8.5.13-0',
     'channel-1/osx-64::zlib-1.2.7-0',
  -  'channel-1/osx-64::python-2.7.5-0',
  ?                              ^ ^ ^
  +  'channel-1/osx-64::python-2.6.8-6',
  ?                              ^ ^ ^
  -  'channel-1/osx-64::numpy-1.7.1-py27_0',
  ?                                    ^
  +  'channel-1/osx-64::numpy-1.7.1-py26_0',
  ?                                    ^
    )

This is happening again in test_solve.py, where tests are supposed to be more robust because synthetic local repodata is used (instead of live remote repodata). So this might be an issue with the whole "export in-memory repodata to disk" hacks in the testing helpers upstream.

jaimergp changed the title from "test_conda_downgrade exhibits mixed behavior of results" to "test_conda_downgrade and test_python2_update exhibit mixed behavior of results" on Jan 16, 2024
@jaimergp

I've been seeing some more tests/core/test_solve.py::test_python2_update[libmamba] flaky failures lately.
