Refactor: Remove Cython - Reduce tech debt #346

Open · wants to merge 85 commits into base: main

Conversation

Contributor

@eddiebergman eddiebergman commented Dec 18, 2023

This PR first converts everything to Python so that a type checker can be run across everything and linters can be used properly. It also includes updating everything to pytest.

Note: This PR is unreviewable. The strategy has been to minimally change tests while refactoring, to ensure that tested behaviour still passes. Some notable exceptions are where functionality was removed or the sampling strategy was changed and the expected values differ. Unfortunately, the formatter has rendered those tests unreviewable as well.

Update: All SMAC unit tests passed without code changes, so it seems mostly backwards compatible.

Some notable changes:

  • No more unbounded Normals
  • No more quantized hyperparameters
  • Sampling is 3x faster, neighbor generation 3.5x faster, and validation 2x faster (see scripts/benchmark_sampling.py)
  • More code re-use between classes
  • Pure Python: jumping to definition and viewing docstrings should work in editors
  • Defined data structures to capture the hierarchy in conditional spaces, moving the logic out of ConfigurationSpace
  • Better defined public API for ConfigurationSpace (see the sketch after this list)
  • Better support for custom hyperparameters, i.e. backed by scipy distributions (see BetaFloat/Int); should require a lot less code to do so
  • Better defined public API for hyperparameters
  • A whole load of deprecations with recommended alternatives
  • Categoricals/Ordinals support arbitrary objects, e.g. no more need to specify Category([True, False, "None"]) (now handled in the follow-up feat(Hyperparameters): Allow arbitrary objects in category ordinal #359)
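
As a rough illustration of the refactored public API mentioned above, here is a minimal sketch. The exact signatures of Float, Categorical and ConfigurationSpace.add are assumptions based on names appearing elsewhere in this PR, not a definitive reference.

from ConfigSpace import Categorical, ConfigurationSpace, Float

# Build a small space with the pure-Python implementation.
cs = ConfigurationSpace(seed=0)
cs.add(Float("learning_rate", bounds=(1e-4, 1e-1), log=True))  # `add` as used in the updated tests below
cs.add(Categorical("optimizer", ["adam", "sgd"]))

# Sampling goes through the (now roughly 3x faster) pure-Python path.
configs = cs.sample_configuration(100)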

TODO:


@eddiebergman eddiebergman changed the title from "Refactor: Update types for Cython" to "Refactor: Update types for Cython [In Progress]" on Dec 18, 2023

codecov bot commented Dec 19, 2023

Codecov Report

Attention: Patch coverage is 92.06843%, with 51 lines in your changes missing coverage. Please review.

Project coverage is 78.91%. Comparing base (93abc5b) to head (2a29120).
Report is 2 commits behind head on main.

❗ Current head 2a29120 differs from the pull request's most recent head f84cbd4. Consider uploading reports for the commit f84cbd4 to get more accurate results.

Files Patch % Lines
ConfigSpace/forbidden.py 82.35% 17 Missing and 1 partial ⚠️
ConfigSpace/conditions.py 93.04% 7 Missing and 1 partial ⚠️
ConfigSpace/configuration_space.py 64.28% 3 Missing and 2 partials ⚠️
ConfigSpace/hyperparameters/normal_float.py 88.09% 4 Missing and 1 partial ⚠️
ConfigSpace/hyperparameters/normal_integer.py 85.71% 4 Missing and 1 partial ⚠️
ConfigSpace/hyperparameters/categorical.py 90.90% 3 Missing ⚠️
ConfigSpace/hyperparameters/ordinal.py 91.42% 3 Missing ⚠️
ConfigSpace/hyperparameters/uniform_integer.py 96.29% 2 Missing ⚠️
ConfigSpace/hyperparameters/numerical.py 97.05% 1 Missing ⚠️
ConfigSpace/hyperparameters/uniform_float.py 96.96% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #346      +/-   ##
==========================================
+ Coverage   73.54%   78.91%   +5.37%     
==========================================
  Files          29       45      +16     
  Lines        2831     4767    +1936     
  Branches      629     1005     +376     
==========================================
+ Hits         2082     3762    +1680     
- Misses        632      806     +174     
- Partials      117      199      +82     

☔ View full report in Codecov by Sentry.

@eddiebergman eddiebergman changed the title from "Refactor: Update types for Cython [In Progress]" to "Refactor: Remove Cython - Reduce tech debt" on Mar 28, 2024
Contributor

@mfeurer mfeurer left a comment

And here's a first batch of comments. I skimmed roughly 40% so far, and will probably take a bit more time...

.github/workflows/pytest.yml (resolved)
.github/workflows/pytest.yml (resolved)
.pre-commit-config.yaml (resolved)
@@ -1,12 +1,12 @@
 [project]
 name = "ConfigSpace"
-version = "0.7.2"
+version = "0.8.0"
Contributor

Since this is a major change, how about we go up to 1.0?

Contributor Author

I'm good with that

pyproject.toml (resolved)
(np.array([]), (0, 10), 10, np.array([])),
],
)
def test_quantize(
Contributor

There's no test for the function that quantizes on a log scale. But as asked above, I'm not sure if that's actually still needed.

Contributor Author

Yeah, that's true, there are no tests for it. The parameters that use it are tested though, and I figured that's good enough.

"test_searchspaces",
"mini_autosklearn_original.pcs",
)
cs.seed(2)
Contributor

What's the reason to change the seeds here?

Contributor Author

Whoops, that was probably left in from debugging. There's no reason it should have been changed in the end.

all_neighbors = _test_get_one_exchange_neighbourhood(hp)
all_neighbors = [neighbor["a"] for neighbor in all_neighbors]
assert pytest.approx(np.mean(all_neighbors), abs=1e-2) == 5.44
assert pytest.approx(np.var(all_neighbors), abs=1e-2) == 9.14
Contributor

Do you have any idea why the variance is twice as high as in the previous implementation?

Contributor Author

So after some investigation, I found out why: the problem was with the test helper it calls.

The first part of _test_get_one_exchange_neighbourhood calculates a num_neighbors, which for this uniform int becomes 10.

In the second part, this value was previously never passed in, so the num_neighbors parameter went unused. In this PR I noticed that it wasn't used and must have passed it in, thinking I had accidentally removed it.

I'm not sure what the intended behaviour is:

  1. Do we remove the for loop calculating num_neighbors? In which case, I get something close to the previous mean and variance
  2. Do we pass it in? In which case the variance is ~9 as we compute a neighbor configuration for every possible integer.

As a side note, the parameter name is misleading as the num_neighbors parameter to get_one_exchange_neighbourhood is actually the number of neighbor configs per hyperparameter, not the total number of neighbors to retrieve.

The test relies on this function:

def _test_get_one_exchange_neighbourhood(hp):
    cs = ConfigurationSpace()
    num_neighbors = 0
    if not isinstance(hp, list):
        hp = [hp]

    for hp_ in hp:
        cs.add(hp_)
        if np.isinf(hp_.get_num_neighbors()):
            num_neighbors += 4
        else:
            num_neighbors += hp_.get_num_neighbors()

    cs.seed(1)
    config = cs.get_default_configuration()
    all_neighbors = []
    for i in range(100):
        neighborhood = get_one_exchange_neighbourhood(
            config,
            i,
            # Before: num_neighbors was left unset, which defaulted to 4
            # num_neighbors=4,
            # After updating: the test passes in the `num_neighbors` computed above
            num_neighbors=num_neighbors,
        )

Contributor Author

@mfeurer this needs a resolution.

all_neighbors = [neighbor["a"] for neighbor in all_neighbors]
assert hp.default_value == 3

assert pytest.approx(np.mean(all_neighbors), abs=1e-2) == 5.77
Contributor

Do you have any idea why this value is so much higher than in the previous implementation?

Contributor Author

See above

assert hp.default_value == 3

assert pytest.approx(np.mean(all_neighbors), abs=1e-2) == 5.77
assert pytest.approx(np.var(all_neighbors), abs=1e-2) == 8.39
Contributor

Similar as above, why would the variance increase so much here?

Contributor Author

And above

Contributor

@mfeurer mfeurer left a comment

Part 2: I finished looking at all the tests. Looks really great, I've only got a few questions this time.

wrong_shape_2 = np.array([3, 5, 7]).reshape(1, -1)
wrong_shape_3 = np.array([3, 5, 7]).reshape(-1, 1)

assert c1.pdf_values(point_1)[0] == pytest.approx(0.7559999999999997)
Contributor

Why did these five values increase by a factor of five?

Contributor Author

Ah yes, this I didn't know how to handle properly.

The previous implementation returned the pdf in the value space (even when considering vectorized values).

def _pdf(self, vector: np.ndarray) -> np.ndarray:
    """
    Computes the probability density function of the parameter in
    the transformed (and possibly normalized, depends on the parameter
    type) space. As such, one never has to worry about log-normal
    distributions, only normal distributions (as the inverse_transform
    in the pdf method handles these).

    Parameters
    ----------
    vector: np.ndarray
        the (N, ) vector of inputs for which the probability density
        function is to be computed.

    Returns
    -------
    np.ndarray(N, )
        Probability density values of the input vector
    """
    ub = self._inverse_transform(self.upper)
    lb = self._inverse_transform(self.lower)
    alpha = self.alpha
    beta = self.beta
    return spbeta(alpha, beta, loc=lb, scale=ub - lb).pdf(vector) \
        * (ub - lb) / (self._upper - self._lower)

You can see from the bottom of that function that it normalizes with respect to self.upper and self.lower, numbers in value space, in this case reducing it by a factor of 10 (and not 5 as you mentioned).

The newer implementation essentially has hyperparameters backed by some distribution in vectorized space, which has no idea what its vectorized values mean in value space. The new values reflect this in that they are not normalized by the lower and upper bounds in value space. I figured this was okay, as a pdf only has a relative meaning when compared to other pdfs, and none of them have this value-space normalization any more.

Here's the relevant snippet from the code. I did originally pass in the _pdf_norm to rescale things, but I figured it was better to have the pdf refer to the pdf in vectorized space and be agnostic to value space.

@dataclass
class ScipyContinuousDistribution(Distribution):
    rv: rv_continuous_frozen

    lower_vectorized: f64
    upper_vectorized: f64
    _max_density: float
    _pdf_norm: float = 1

    def sample_vector(
        self,
        n: int,
        *,
        seed: RandomState | None = None,
    ) -> Array[f64]:
        return self.rv.rvs(size=n, random_state=seed).astype(f64)  # type: ignore

    def max_density(self) -> float:
        return float(self._max_density)

    def pdf_vector(self, vector: Array[f64]) -> Array[f64]:
        pdf = self.rv.pdf(vector) / self._pdf_norm

        # By definition, we don't allow NaNs in the pdf
        return np.nan_to_num(pdf, nan=0)  # type: ignore

As a side benefit, this means it's first of all easier to implement new hyperparameters, as you do not need to consider the interplay of the pdf w.r.t. vectorized and value space, and secondly, if we just rely on the pdf returned by the underlying distribution, things become a lot more uniform in terms of implementation.
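
To make the scaling concrete, here is a small sketch of where the old factor comes from, assuming a value range of width 10 and a vectorized range of [0, 1]; the concrete numbers are illustrative, not taken from the test above.

from scipy.stats import beta as spbeta

a, b = 3.0, 2.0
lb, ub = 0.0, 1.0       # vectorized-space bounds
value_width = 10.0      # self.upper - self.lower in value space (illustrative)

rv = spbeta(a, b, loc=lb, scale=ub - lb)

new_pdf = rv.pdf(0.5)                            # this PR: pdf purely in vectorized space
old_pdf = rv.pdf(0.5) * (ub - lb) / value_width  # before: extra value-space normalization

# The reported values grow by exactly the width of the value range.
assert abs(new_pdf / old_pdf - value_width) < 1e-9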

test/test_hyperparameters.py (outdated, resolved)
src/ConfigSpace/api/types/float.py (outdated, resolved)
src/ConfigSpace/api/types/integer.py (outdated, resolved)
assert cs["algo2_subspace:algo2_param1"].default_value == "Y"


def test_acts_as_mapping_2():
Contributor

Hey, what's the reason to rename this test and add a number?

Contributor Author

There was an issue where two tests in the same file had the same name but belonged to different test classes. Converting to pytest and removing the classes caused it to fail, and I think this was me being lazy.

Renamed to test_configuration_space_acts_as_mapping and test_configuration_acts_as_mapping

test/test_configuration_space.py (outdated, resolved)
assert cond.vector_value == cond_.vector_value

# Test invalid conditions:
self.assertRaises(TypeError, EqualsCondition, hp2, "parent", 0)
Contributor

What's the reason to remove this check?

Contributor Author

Seems I removed these checks in the base Condition class when reworking them. I honestly find crowding every __init__ with several isinstance(param, ThingIThinkItToBe) checks that raise TypeError("Usually a long message to tell the user they called it wrong") to be a bit noisy, providing little value beyond what your editor/type checker can tell you before running anything. The exception is when the type doesn't convey the information, e.g. a float between 0 and 1.

It also raises the blurry question of which parameters of which objects we should check. If you're set on these being required, I can add the checks back in.
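
For context, a minimal sketch of the kind of constructor guard being discussed; the class, argument names and message here are illustrative, not the removed code itself.

from ConfigSpace.hyperparameters import Hyperparameter

class SomeCondition:
    def __init__(self, child, parent, value):
        # The style of check in question: validate argument types up front
        # and raise a TypeError with a long explanatory message.
        if not isinstance(parent, Hyperparameter):
            raise TypeError(
                f"Argument 'parent' is not an instance of Hyperparameter, got {type(parent)!r}",
            )
        self.child = child
        self.parent = parent
        self.value = value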

assert not condition.evaluate({"blkrest": "0"})
assert condition.evaluate({"blkrest": "1"})

def test_get_parents(self):
Contributor

Why did you drop this test?

Contributor Author

Stray casualty; I added it back in and it passes.

@sonovice

For everyone else stuck due to type errors etc.:
You can pull in the current state of this PR to your poetry env with

poetry add git+https://github.com/automl/ConfigSpace.git@refs/pull/346/merge

Might help in your case.
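
If you are not using poetry, the equivalent pip install from the same PR ref should also work (untested here, so treat it as a suggestion):

pip install "git+https://github.com/automl/ConfigSpace.git@refs/pull/346/merge"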
