
Add probabilistic classification to hiclass #119

Open · wants to merge 59 commits into base: main

Conversation

LukasDrews97 (Collaborator):

Add probabilistic classification via calibration to hiclass using the following methods:

  • Platt Scaling
  • Isotonic Regression
  • Beta calibration
  • (Inductive/Cross) Venn-ABERS calibration
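
A rough usage sketch of the calibration API, pieced together from the diff excerpts below; the ``calibration_method`` parameter and the ``calibrate``/``predict_proba`` methods are assumptions based on this PR and may differ from the final merged interface:

    from sklearn.linear_model import LogisticRegression
    from hiclass import LocalClassifierPerNode

    # Toy data: each label row is a path from root to leaf.
    X_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    y_train = [
        ["Animal", "Mammal"],
        ["Animal", "Mammal"],
        ["Animal", "Reptile"],
        ["Animal", "Reptile"],
    ]

    clf = LocalClassifierPerNode(
        local_classifier=LogisticRegression(),
        calibration_method="ivap",  # one of "ivap", "cvap", "platt", "isotonic"
    )
    clf.fit(X_train, y_train)
    clf.calibrate(X_train, y_train)     # assumed calibration entry point added by this PR
    proba = clf.predict_proba(X_train)  # calibrated probabilities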

@@ -96,6 +105,8 @@ def __init__(
If True, skip scikit-learn's checks and sample_weight passing for BERT.
classifier_abbreviation : str, default=""
The abbreviation of the local hierarchical classifier to be displayed during logging.
calibration_method : {"ivap", "cvap", "platt", "isotonic"}, str, default=None
Collaborator:

Maybe there is a better way to represent the multiple possible values here. Perhaps Union["ivap", "cvap", "platt", "isotonic"]? I am not sure without building the documentation to see how it renders.
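
For the docstring itself, the brace notation ``{"ivap", "cvap", "platt", "isotonic"}`` is the numpydoc convention for a fixed set of allowed values, so it should render correctly. If the goal is instead a type annotation in the signature, ``typing.Literal`` is one option; a minimal sketch with an illustrative stub class (not the actual hiclass class):

    from typing import Literal, Optional

    class LocalClassifier:  # illustrative stub only
        def __init__(
            self,
            calibration_method: Optional[
                Literal["ivap", "cvap", "platt", "isotonic"]
            ] = None,
        ) -> None:
            self.calibration_method = calibration_method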


else:
calibrators = Parallel(n_jobs=self.n_jobs)(
delayed(logging_wrapper)(
Collaborator:

Have you tested the parallel logging in the cluster? It used to be the case that messages were repeated multiple times.

)
proba = calibrator.predict_proba(X)

y[:, 0] = calibrator.classes_[np.argmax(proba, axis=1)]
Collaborator:

If you need to use the predictions, wouldn't it be better to call the already implemented predict method? I imagine it could simplify the code here and avoid redundancy.
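
A self-contained sanity check of that equivalence, assuming the calibrator follows the scikit-learn convention that ``predict()`` returns ``classes_[argmax(predict_proba)]``:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_true = np.array([0, 0, 1, 1])
    calibrator = LogisticRegression().fit(X, y_true)

    # Manual argmax over predict_proba, as in the diff excerpt above ...
    proba = calibrator.predict_proba(X)
    manual = calibrator.classes_[np.argmax(proba, axis=1)]

    # ... matches the estimator's own predict() for scikit-learn-style classifiers.
    assert np.array_equal(manual, calibrator.predict(X))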

@@ -5,6 +5,8 @@
from .LocalClassifierPerLevel import LocalClassifierPerLevel
from .LocalClassifierPerNode import LocalClassifierPerNode
from .LocalClassifierPerParentNode import LocalClassifierPerParentNode
from .LocalClassifierPerLevel import LocalClassifierPerLevel
Collaborator:

Suggested change
from .LocalClassifierPerLevel import LocalClassifierPerLevel

X : {array-like, sparse matrix} of shape (n_samples, n_features)
The calibration input samples. Internally, its dtype will be converted
to ``dtype=np.float32``. If a sparse matrix is provided, it will be
converted into a sparse ``csc_matrix``.
Collaborator:

Suggested change
converted into a sparse ``csc_matrix``.
converted into a sparse ``csr_matrix``.

Comment on lines +265 to +266
y_true = make_leveled(y_true)
y_true = classifier._disambiguate(y_true)
Collaborator:

I am not sure I follow why make_leveled and _disambiguate need to be called here.

predecessor = list(self.classifier.hierarchy_.predecessors(node))[0]
except NetworkXError:
# skip empty levels
continue
Collaborator:

Perhaps it would be better to use an if/else here, since the try/except might hide real errors.
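
A minimal sketch of that suggestion using a plain networkx graph (the graph and node names are illustrative, not the actual hiclass attributes):

    import networkx as nx

    hierarchy = nx.DiGraph([("root", "a"), ("a", "b")])

    for node in ["root", "a", "b", "missing"]:
        # Explicit membership check instead of catching NetworkXError,
        # so unrelated exceptions are not silently swallowed.
        if node not in hierarchy:
            continue  # skip empty levels
        predecessors = list(hierarchy.predecessors(node))
        predecessor = predecessors[0] if predecessors else None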

@@ -189,10 +350,82 @@ def test_predict_sparse(fitted_logistic_regression):
assert_array_equal(ground_truth, prediction)


def test_predict_proba(fitted_logistic_regression):
Collaborator:

Maybe you can use pytest.mark.parametrize to reduce redundancy when the same tests appear in different files.
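
A sketch of what the parametrized version could look like; the ``calibration_method`` argument and the ``calibrate``/``predict_proba`` calls are assumptions based on the API sketched in this PR, and the toy data is illustrative only:

    import pytest
    from sklearn.linear_model import LogisticRegression
    from hiclass import (
        LocalClassifierPerLevel,
        LocalClassifierPerNode,
        LocalClassifierPerParentNode,
    )

    # One parametrized test instead of near-identical copies in each test file.
    @pytest.mark.parametrize(
        "classifier_class",
        [LocalClassifierPerNode, LocalClassifierPerParentNode, LocalClassifierPerLevel],
    )
    def test_predict_proba(classifier_class):
        X = [[1.0], [2.0], [3.0], [4.0]]
        y = [["a", "b"], ["a", "c"], ["d", "e"], ["d", "f"]]
        clf = classifier_class(
            local_classifier=LogisticRegression(), calibration_method="ivap"
        )
        clf.fit(X, y)
        clf.calibrate(X, y)           # assumed calibration entry point from this PR
        proba = clf.predict_proba(X)  # calibrated probabilities per hierarchy level
        assert proba is not None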

Collaborator:

Why is this wrapper necessary?

Comment on lines +19 to +25
positive_label = 1
unique_labels = np.unique(y)
assert len(unique_labels) <= 2

y = np.where(y == positive_label, 1, 0)
y = y.reshape(-1) # make sure it's a 1D array

Collaborator:

Maybe these lines can be replaced with the binary_only estimator tag (https://scikit-learn.org/stable/developers/develop.html#estimator-tags); see the sketch after the suggested change.

Suggested change
positive_label = 1
unique_labels = np.unique(y)
assert len(unique_labels) <= 2
y = np.where(y == positive_label, 1, 0)
y = y.reshape(-1) # make sure it's a 1D array
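
A minimal sketch of the estimator-tag route (the class name is hypothetical); note that the tag only tells scikit-learn's common checks that the estimator supports binary targets, it does not itself validate y:

    from sklearn.base import BaseEstimator, ClassifierMixin

    class _BinaryCalibrator(BaseEstimator, ClassifierMixin):  # hypothetical name
        def _more_tags(self):
            # Declares to scikit-learn's estimator checks that only binary
            # targets are supported.
            return {"binary_only": True}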

mirand863 (Collaborator) commented May 3, 2024

Hi @LukasDrews97,

Just a quick request from someone in France who reached out to me via e-mail. Would it be possible to add a threshold to remove labels that have low probability?
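
A hedged sketch of what such a threshold could look like as a post-processing step; the function name and the use of an empty string for removed labels are illustrative only, not part of this PR:

    import numpy as np

    def apply_threshold(labels, probabilities, threshold=0.7):
        # Mask out predicted labels whose calibrated probability is below the threshold.
        labels = np.asarray(labels, dtype=object)
        keep = np.asarray(probabilities) >= threshold
        labels[~keep] = ""  # empty string as a placeholder for dropped labels
        return labels

    labels = np.array([["Animal", "Mammal"], ["Animal", "Reptile"]], dtype=object)
    probabilities = np.array([[0.95, 0.40], [0.90, 0.85]])
    print(apply_threshold(labels, probabilities))
    # [['Animal' ''] ['Animal' 'Reptile']]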
