
Fix confusion matrix using only predictions as source for labels #249

Open · wants to merge 16 commits into base: master
Conversation

@levkk (Contributor) commented Oct 17, 2022

Fix the confusion matrix incorrectly deriving its label set from the predictions only, instead of from both the predictions and the ground truth. Ideally we should expose a Scikit-learn-like API that lets the caller pass in all the labels explicitly, in case the labels in the test set are not all-inclusive (which would be a mistake in train/test partitioning, but can happen).

I'm somewhat confused by the way the API is written: the argument to the confusion_matrix method is called ground_truth, but shouldn't it be the predicted points instead?
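To illustrate the bug being fixed, here is a minimal standalone sketch (not linfa's actual API; names and signatures are illustrative) of a confusion matrix whose label axis is built from the union of both sources, so a class that appears in the ground truth but is never predicted still gets a row and column:

```rust
use std::collections::BTreeSet;

// Hypothetical sketch: derive the label axis from BOTH predictions and
// ground truth. Deriving it from predictions alone would silently drop
// any class the model never predicts.
fn confusion_matrix(predictions: &[usize], ground_truth: &[usize]) -> (Vec<usize>, Vec<Vec<usize>>) {
    // Union of labels from both sources, in sorted order.
    let labels: Vec<usize> = predictions
        .iter()
        .chain(ground_truth.iter())
        .copied()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    let index = |l: usize| labels.iter().position(|&x| x == l).unwrap();
    let n = labels.len();
    // Rows are indexed by the true label, columns by the predicted label.
    let mut matrix = vec![vec![0usize; n]; n];
    for (&p, &t) in predictions.iter().zip(ground_truth) {
        matrix[index(t)][index(p)] += 1;
    }
    (labels, matrix)
}

fn main() {
    // Class 2 never appears in the predictions, but it must still appear
    // in the label axis because it occurs in the ground truth.
    let (labels, m) = confusion_matrix(&[0, 1, 1, 0], &[0, 1, 2, 0]);
    assert_eq!(labels, vec![0, 1, 2]);
    assert_eq!(m[2][1], 1); // the true 2 was misclassified as 1
    println!("{:?}", m);
}
```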

@codecov-commenter

Codecov Report

Base: 39.24% // Head: 39.26% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (3356d42) compared to base (5ebe23c).
Patch coverage: 60.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
+ Coverage   39.24%   39.26%   +0.02%     
==========================================
  Files          92       92              
  Lines        6085     6089       +4     
==========================================
+ Hits         2388     2391       +3     
- Misses       3697     3698       +1     
Impacted Files Coverage Δ
src/dataset/mod.rs 29.03% <50.00%> (-0.60%) ⬇️
src/metrics_classification.rs 38.36% <100.00%> (-0.63%) ⬇️
algorithms/linfa-nn/src/linear.rs 45.16% <0.00%> (-1.72%) ⬇️
src/correlation.rs 29.57% <0.00%> (-1.41%) ⬇️
algorithms/linfa-svm/src/classification.rs 46.49% <0.00%> (-0.88%) ⬇️
...rithms/linfa-trees/src/decision_trees/algorithm.rs 36.60% <0.00%> (-0.45%) ⬇️
algorithms/linfa-nn/tests/nn.rs 78.04% <0.00%> (ø)
algorithms/linfa-linear/src/glm/mod.rs 52.77% <0.00%> (ø)
... and 3 more



@YuhanLiin (Collaborator)

The argument is ground_truth because self is the predicted points. The point about using labels from both sources still stands though.

@@ -323,6 +323,18 @@ pub trait Labels {
fn labels(&self) -> Vec<Self::Elem> {
self.label_set().into_iter().flatten().collect()
@YuhanLiin (Collaborator) commented Oct 19, 2022
For some reason this method doesn't dedup the final vector. It should union all of the HashSets together instead of flattening them. Alternatively we could just change the return type to HashSet, but that might be too invasive.
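A small sketch of the issue (the function names and types are illustrative stand-ins, not linfa's actual Labels trait): flattening per-target label sets repeats any label that appears in more than one set, whereas unioning the sets first deduplicates:

```rust
use std::collections::HashSet;

// What the current code does: flatten repeats shared labels.
fn labels_flattened(label_sets: &[HashSet<u32>]) -> Vec<u32> {
    label_sets.iter().flatten().copied().collect()
}

// What the review suggests: union all the HashSets before collecting.
fn labels_deduped(label_sets: &[HashSet<u32>]) -> Vec<u32> {
    label_sets
        .iter()
        .fold(HashSet::new(), |acc, s| acc.union(s).copied().collect())
        .into_iter()
        .collect()
}

fn main() {
    let sets = vec![
        HashSet::from([0, 1]),
        HashSet::from([1, 2]), // label 1 appears in both sets
    ];
    assert_eq!(labels_flattened(&sets).len(), 4); // 1 is duplicated
    assert_eq!(labels_deduped(&sets).len(), 3); // {0, 1, 2}
}
```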

@@ -323,6 +323,18 @@ pub trait Labels {
fn labels(&self) -> Vec<Self::Elem> {
self.label_set().into_iter().flatten().collect()
}

fn combined_labels(&self, other: Vec<Self::Elem>) -> Vec<Self::Elem> {
@YuhanLiin (Collaborator)
Better to have this method take &impl Labels or &Self as input. Then you can call label_set on both self and the input and union all the HashSets before converting the result into a Vec.
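The suggested signature might look like the following sketch, where the trait and types are simplified stand-ins for linfa's actual Labels trait (in particular, label_set here returns a single HashSet rather than one per target):

```rust
use std::collections::HashSet;

// Simplified sketch of the suggested design: combined_labels takes
// another Labels impl, unions the two label sets, and collects a Vec.
trait Labels {
    type Elem: Eq + std::hash::Hash + Clone;

    fn label_set(&self) -> HashSet<Self::Elem>;

    fn combined_labels<L: Labels<Elem = Self::Elem>>(&self, other: &L) -> Vec<Self::Elem> {
        self.label_set()
            .union(&other.label_set())
            .cloned()
            .collect()
    }
}

// Toy implementor standing in for a targets container.
struct Targets(Vec<u32>);

impl Labels for Targets {
    type Elem = u32;
    fn label_set(&self) -> HashSet<u32> {
        self.0.iter().copied().collect()
    }
}

fn main() {
    let predictions = Targets(vec![0, 1]);
    let ground_truth = Targets(vec![1, 2]);
    let mut combined = predictions.combined_labels(&ground_truth);
    combined.sort();
    assert_eq!(combined, vec![0, 1, 2]);
}
```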
