Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] Accept numpy generators as random_state #6174

Merged
merged 7 commits into from Nov 9, 2023

Conversation

david-cortes
Copy link
Contributor

This PR adds support for numpy generators as possible inputs for random_state. Generators are the recommended way of drawing random numbers from numpy, while RandomState is deprecated. Lots of other software such as SciPy have moved towards Generator and/or allow both (example: scipy.sparse.random).

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this! I definitely support it, but think some changes are needed in this PR.


Lots of other software such as SciPy have moved towards Generator

In the future, we'd appreciate if claims like this were supported with some evidence. I do see scipy/scipy#11680 from 3 years ago, looks like scipy has been supporting this for a while.

Although I think it'd be more accurate to say they "added support for" Generator than "moved to" it, as scipy still supports RandomState:

https://github.com/scipy/scipy/blob/6ce4aaf4bc5222e3d1355cd8a392e88f45b4f959/scipy/_lib/_util.py#L63-L64


RandomState is deprecated

That may be true, but I don't support raising lightgbm's oldest support numpy version just to support it.

As of this writing that's numpy==1.16.6:

'numpy==1.16.6' \

It looks to me that np.random.Generator didn't make it into numpy until 1.17.0:

numpy/numpy#13163

That's why the CI job where we test lightgbm's compatibility with the oldest versions it supports is failing on this PR:

Traceback (most recent call last):
  File "./examples/python-guide/advanced_example.py", line 11, in <module>
    import lightgbm as lgb
  File "/usr/local/lib/python3.6/site-packages/lightgbm/__init__.py", line 13, in <module>
    from .sklearn import LGBMClassifier, LGBMModel, LGBMRanker, LGBMRegressor
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 391, in <module>
    class LGBMModel(_LGBMModelBase):
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 414, in LGBMModel
    importance_type: str = 'split',
AttributeError: module 'numpy.random' has no attribute 'Generator'

(build link)

Can you please introduce this in a way that's backwards-compatible with versions of numpy that didn't yet have np.random.Generator? That'd mean the following changes:

  1. add a block in compat.py which defines a np_random_Generator within a try-catch block, like this:

"""cpu_count()"""
try:
from joblib import cpu_count
def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
return cpu_count(only_physical_cores=only_physical_cores)
except ImportError:
try:
from psutil import cpu_count
def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
return cpu_count(logical=not only_physical_cores) or 1
except ImportError:
from multiprocessing import cpu_count
def _LGBMCpuCount(only_physical_cores: bool = True) -> int:
return cpu_count()

  1. use that compat.np_random_Generator in code here, similar to how compat.pd_DataFrame is used:

if not isinstance(X, (pd_DataFrame, dt_DataTable)):
_X, _y = _LGBMCheckXY(X, y, accept_sparse=True, force_all_finite=False, ensure_min_samples=2)

  1. put all of the type hints touched in this PR in double quotes, so their evaluation will be deferred to type-checking time and not cause import errors in environments with older versions of numpy

like this:

dtype: "np.typing.DTypeLike",

as described in https://mypy.readthedocs.io/en/stable/runtime_troubles.html

@@ -534,11 +534,12 @@ def test_non_serializable_objects_in_callbacks(tmp_path):
assert gbm.booster_.attr_set_inside_callback == 40


def test_random_state_object():
@pytest.mark.parametrize("rng_constructor", [np.random.RandomState, np.random.default_rng])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to add in my review... thanks very much for adding this to this test!

@david-cortes
Copy link
Contributor Author

Added a compat entry and quoted type hints for running with older numpy.

@david-cortes
Copy link
Contributor Author

I'm not sure why the linter is complaining about seemingly unrelated files. This is what I found in the logs:

python-package/lightgbm/basic.py:805: error: Incompatible return value type (got "tuple[Any, list[str], list[str] | list[int], list[list[Any]]]", expected "tuple[Any, list[str], list[str], list[list[Any]]]")  [return-value]
python-package/lightgbm/basic.py:2939: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:2953: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4619: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4633: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4843: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/basic.py:4859: error: Array constructor argument 1 of type "map[int]" is not convertible to the array element type "Iterable[c_char_p]"  [misc]
python-package/lightgbm/engine.py:294: error: Incompatible types in assignment (expression has type "list[tuple[str, str, float, bool]] | list[tuple[str, str, float, bool, float]]", variable has type "list[tuple[str, str, float, bool]]")  [assignment]
Found 8 errors in 2 files (checked 9 source files)

None of those files are being modified here.

@jameslamb
Copy link
Collaborator

Those mypy errors don't actually cause the lint CI job to fail:

mypy \
--config-file=./python-package/pyproject.toml \
./python-package \
|| true

That job is failing on this PR for reasons unrelated to this PR, caused by the latest release of {lintr}, which will be fixed once we merge #6180.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! This looks good to me. I'll take care of updating it to latest master and merging it once the unrelated CI issues are fixed.

@jameslamb jameslamb merged commit 501e6e6 into microsoft:master Nov 9, 2023
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants