Flaky unit tests: joblib parallel ValueError #167

Closed
jeremyliweishih opened this issue Oct 30, 2019 · 12 comments

Labels
bug Issues tracking problems with existing features.

@jeremyliweishih
Contributor

Why are our results for 3.6 and 3.7 consistent but not 3.5?

@jeremyliweishih
Contributor Author

This issue may be related to dict insertion order: dicts don't preserve insertion order on Python 3.5, but do on 3.6+.
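
A minimal sketch of what that could look like (hypothetical example, not code from evalml): dict iteration order is arbitrary on 3.5 but follows insertion order on 3.6+, so anything that iterates a dict of parameters or pipelines could behave differently from run to run on 3.5.

    # Hypothetical illustration: on Python 3.5, plain dicts do not preserve
    # insertion order, and string-key order can change between interpreter runs
    # because of hash randomization. On 3.6+ insertion order is kept, which
    # would fit the consistent 3.6/3.7 results.
    params = {"n_estimators": 10, "max_depth": 3, "n_jobs": -1}
    print(list(params))  # 3.6+: ['n_estimators', 'max_depth', 'n_jobs']; 3.5: arbitrary

    # collections.OrderedDict gives the same order on every version:
    from collections import OrderedDict
    params = OrderedDict([("n_estimators", 10), ("max_depth", 3), ("n_jobs", -1)])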

@dsherry
Contributor

dsherry commented Jan 6, 2020

We should write a summary here of what appears to be inconsistent

@jeremyliweishih
Contributor Author

I can't seem to find the other ticket, but this issue might also be related to (or fix) the CircleCI inconsistency with parallelization in 3.5.

Errors that could pop up can be found here.

@dsherry dsherry changed the title 3.5 Inconsistency Unit tests failing on Python 3.5 Jan 8, 2020
@dsherry
Contributor

dsherry commented Jan 8, 2020

Awesome, thanks.

What I see in the logs is that some of our unit tests are failing on python 3.5 but still passing on 3.6 and 3.7.

This issue may be related to dict insertion order: dicts don't preserve insertion order on Python 3.5, but do on 3.6+.

That would make sense.

Next questions/tasks:

  • Decide: how important is it that we support python 3.5 in evalml? (@kmax12: thoughts?)
  • Reproduce this locally, debug and verify the root cause

@dsherry dsherry added the bug Issues tracking problems with existing features. label Jan 8, 2020
@dsherry
Contributor

dsherry commented Jan 9, 2020

I started looking into this.

Summary: able to repro unreliably/occasionally. Still not sure of the root cause.

From sifting through the CircleCI results, it looks like this happens a small percentage of the time. Maybe 10-20%.

Note: I filed #311 to track some warning messages I saw in the unit tests. May be related, unsure.

Stack trace

evalml/tests/automl_tests/test_auto_regression_search.py:83:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evalml/automl/auto_base.py:165: in search
    self._do_iteration(X, y, pbar, raise_errors)
evalml/automl/auto_base.py:261: in _do_iteration
    raise e
evalml/automl/auto_base.py:258: in _do_iteration
    score, other_scores = pipeline.score(X_test, y_test, other_objectives=self.additional_objectives)
evalml/pipelines/pipeline_base.py:257: in score
    y_predicted = self.predict(X)
evalml/pipelines/pipeline_base.py:205: in predict
    return self.estimator.predict(X_t)
evalml/pipelines/components/estimators/estimator.py:17: in predict
    return self._component_obj.predict(X)
test_python/lib/python3.5/site-packages/sklearn/ensemble/_forest.py:782: in predict
    for e in self.estimators_)
test_python/lib/python3.5/site-packages/joblib/parallel.py:1004: in __call__
    if self.dispatch_one_batch(iterator):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
...
>               islice = list(itertools.islice(iterator, big_batch_size))
E               ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize.

test_python/lib/python3.5/site-packages/joblib/parallel.py:808: ValueError

Some possibilities

  • Bug with the py3.5 version of sklearn's random forest, which uses joblib.Parallel internally. I've noticed the stack traces seem to always mention the line test_python/lib/python3.5/site-packages/sklearn/ensemble/_forest.py:782 (which, if I got the right version, is here). The sklearn version used by CircleCI for py3.5 is 0.22.1.
  • Perhaps the Docker container is interfering with joblib somehow (e.g. how it detects available CPUs). The final frame in the stack trace is in joblib code.

Stuff I tried
I grabbed this failed "linux python 3.5 unit tests" job and used CircleCI's "rerun with SSH" (super handy). Once inside I activated the test_python venv and ran the pytest cmd triggered by make circleci-test.

I tried running some of the unit tests which had failed individually, with no luck. It was only when I ran all of them at once that I was able to repro some failures. But the tests which failed changed a bit each time and seemed unpredictable.

Next steps

  • Make sure this bug is worth the effort: are we going to continue to support python 3.5?
  • What evalml estimator and dataset is this failure occurring for? Can we repro this by calling the estimator directly? What about with other data?
  • Can we repro this on either mac or windows?
  • Continue to try to reliably repro. Perhaps write a similar unit test which wraps sklearn's random forest directly instead of calling evalml (see the sketch below).
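
A rough repro sketch for that last bullet (my own guess at the shape, not code from the repo; dataset and sizes are made up): hammer sklearn's random forest directly with n_jobs=-1, which is roughly what the failing evalml tests end up doing through pipeline.score/predict.

    # Hypothetical repro: the failing frame is joblib.Parallel inside
    # sklearn/ensemble/_forest.py's predict(), so exercise that path directly.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 10)
    y = rng.rand(200)

    for i in range(50):
        est = RandomForestRegressor(n_estimators=10, n_jobs=-1, random_state=i)
        est.fit(X, y)
        est.predict(X)  # the ValueError surfaced from joblib inside predict()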

Not the cause

  • I found an issue online with the same ValueError which said: "If you move to a 64-bit build of Python, sys.maxsize will jump from 2^31 - 1 to 2^63 - 1." This had me wondering if the test was using 32-bit python. I verified we are using 64-bit python (see the one-liner after this list), so that's not it.
  • I noticed we're using the -n flag on pytest. Perhaps this issue is exposing a bug in the way pytest spins up parallel workers -- wait, never mind, because this test failed before the -n flag was added by Jeremy on his branch.
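
For reference, the 64-bit check is just something like this (one way to do it, not necessarily the exact command that was run):

    # sys.maxsize is 2^63 - 1 on a 64-bit build and 2^31 - 1 on a 32-bit build.
    import sys
    print(sys.maxsize == 2**63 - 1)  # True on 64-bit python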

@dsherry dsherry changed the title Unit tests failing on Python 3.5 Linux python 3.5 unit tests are failing randomly Jan 9, 2020
@kmax12
Contributor

kmax12 commented Jan 9, 2020

in terms of the necessity to support 3.5...

currently 5-10% of our featuretools downloads come from python 3.5. i also looked at a few other ml-related libraries:

  • scikit-learn: ~10%
  • pandas: 10-15%
  • xgboost: 20-30%
  • numpy: 10%

so, my thought would be that yes, we should try to support it since there are people using it. if maintaining it is slowing us down drastically, we could revisit that.

check out whatever package you want here: https://pypistats.org/packages/pandas

@dsherry
Contributor

dsherry commented Jan 31, 2020

Just saw another instance of this failure on my PR, here. It hasn't magically gone away :) We should dig into this soon.

@dsherry
Contributor

dsherry commented Mar 3, 2020

We're removing support for python 3.5 in #435.

But note @angela97lin mentioned she's seen this failure on python 3.6 💩 Updating issue name to correspond.

RE comment in #435, I wonder if this issue has something to do with our use of OrderedDict... probably not, just adding to the list of possibilities.

@angela97lin do you have any info / links / repro with the 3.6 failure you saw? Was it local or on circleci?

@dsherry dsherry changed the title Linux python 3.5 unit tests are failing randomly Linux python 3.5/3.6 unit tests are failing randomly with joblib parallel ValueError Mar 3, 2020
@dsherry dsherry changed the title Linux python 3.5/3.6 unit tests are failing randomly with joblib parallel ValueError Flaky linux python 3.5/3.6 unit tests: joblib parallel ValueError Mar 3, 2020
@dsherry dsherry changed the title Flaky linux python 3.5/3.6 unit tests: joblib parallel ValueError Flaky linux python 3.6 unit tests: joblib parallel ValueError Mar 3, 2020
@dsherry dsherry changed the title Flaky linux python 3.6 unit tests: joblib parallel ValueError Flaky unit tests: joblib parallel ValueError Mar 3, 2020
@angela97lin
Contributor

Sure! I've only run into it via my random_state PR for python 3.6, so I've been trying to debug. Here's that PR: #431

From Slack thread:
My guess is that it has something to do with n_jobs=-1, since in the stack trace we get ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize. It's likely that when n_jobs=-1, the stop argument passed in becomes negative and triggers this exception. The error goes away when n_jobs is a positive integer.
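
To make that guess concrete (illustration only, simplified; this is not joblib's actual code): islice raises exactly this error whenever its stop argument is negative, so any code path that derives the batch stop from a bad effective worker count would fail the same way.

    # A negative stop reproduces the exact ValueError from the stack trace.
    import itertools

    tasks = iter(range(10))
    batch_stop = -1  # hypothetically what a bad effective n_jobs could produce
    list(itertools.islice(tasks, batch_stop))
    # ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize.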

Here’s a run where I ran into this: https://app.circleci.com/jobs/github/FeatureLabs/evalml/12366

It seems to only happen on CircleCI, so I wonder if that has anything to do with the issue?

@dsherry
Contributor

dsherry commented Mar 4, 2020

Status
We're able to repro this on python 3.6 on @angela97lin's PR #441. Circleci failure is here. We can't repro by running individual tests; we have to run them all.

We were previously seeing this failure only on python 3.5. Now Angela tweaking the random_state causes this to fail on python 3.6 only. This makes me think there's a race condition which has to do with the ordering of calls to the random number generator. It's quite helpful that it appears to be failing consistently on python 3.6 on Angela's random_state branch.

Next steps

  • Dylan check numpy/sklearn package versions on 3.6 vs 3.7, and use that to check their changelogs
  • Dylan try to get another reproducer, off master
  • Dylan try rerunning all tests in circleci via ssh, see if that fails
  • Dylan try messing with docker config in circleci job, potential fix (see the diagnostic sketch after this list)
    Could add something like the following (docker doc, circleci doc) to the unit_tests circleci config to limit the test job to 8 cpus:
    docker:
      - command: ['--cpuset', '0-7']
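
The diagnostic sketch mentioned above (my own suggestion, not something that has been run yet): print what joblib resolves n_jobs=-1 to inside the CircleCI container, since the --cpuset idea is really about whether the detected CPU count matches what the container actually allows.

    # Hypothetical diagnostic to run inside the CircleCI container via SSH.
    import joblib

    print(joblib.cpu_count())           # CPUs joblib detects in the container
    print(joblib.effective_n_jobs(-1))  # workers Parallel(n_jobs=-1) would use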

@dsherry
Contributor

dsherry commented Mar 10, 2020

We should reevaluate whether this is still an issue now that @christopherbunn merged #407.

@dsherry
Contributor

dsherry commented Mar 30, 2020

I haven't seen this issue since #407 was merged. Closing.

@dsherry dsherry closed this as completed Mar 30, 2020