[FEATURE/BUG] Enable definition of training and validation set for privacy-preserving machine learning #377

prasser · 2022-01-29T14:48:43Z

Is your feature request related to a problem? Please describe.

The privacy-preserving machine learning framework in ARX uses k-fold cross-validation to quantify the performance of privacy-preserving models. This can lead to misleading estimates, because the training and validation sets both influence the optimization process performed during anonymization.

Describe the solution you'd like
As an alternative, it would be good to let users specify a training and a validation set in such a way that only the training set influences the anonymization process. This can easily be done in ARX by using the "research subset" feature, which allows selecting a subset of the records in a dataset that are then anonymized. What would need to be added is a feature that allows specifying that machine learning performance is determined based on the set of records that is not included in the research subset.
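
A minimal sketch of the split described above, written in plain Java and deliberately not using the ARX API: one set of row indices plays the role of the research subset (training), and its complement serves as the validation set. The class and method names (`HoldoutSplit`, `selectTrainingRows`, `complement`) are hypothetical and only illustrate the idea; they are not part of ARX.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class HoldoutSplit {

    /**
     * Hypothetical helper: randomly picks round(numRows * fraction) row
     * indices to act as the training set (the "research subset").
     */
    public static Set<Integer> selectTrainingRows(int numRows, double fraction, long seed) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            rows.add(i);
        }
        // Shuffle with a fixed seed so the split is reproducible
        Collections.shuffle(rows, new Random(seed));
        int size = (int) Math.round(numRows * fraction);
        return new TreeSet<>(rows.subList(0, size));
    }

    /**
     * Hypothetical helper: the validation set is every row index that is
     * NOT in the training set, so it never influences anonymization.
     */
    public static Set<Integer> complement(int numRows, Set<Integer> training) {
        Set<Integer> validation = new TreeSet<>();
        for (int i = 0; i < numRows; i++) {
            if (!training.contains(i)) {
                validation.add(i);
            }
        }
        return validation;
    }

    public static void main(String[] args) {
        int numRows = 1000; // assumed dataset size, for illustration only
        Set<Integer> training = selectTrainingRows(numRows, 0.8, 42L);
        Set<Integer> validation = complement(numRows, training);
        System.out.println("training rows   : " + training.size());
        System.out.println("validation rows : " + validation.size());
    }
}
```

In this sketch the two sets are disjoint and together cover all rows, which is the property the proposal relies on: the model is trained on anonymized records from the subset and evaluated only on records outside it.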

prasser · 2022-01-29T14:49:07Z

@srcds or @idhamari might want to take a look at this.

idhamari · 2022-01-29T18:36:18Z

@prasser @srcds sounds interesting, I will have a look at this next week.

idhamari · 2022-05-10T09:40:58Z

@prasser

When a user selects random records by clicking "Select Randomly" (e.g. with a fraction of 0.80), the output view does not reflect the selection:

    System.out.println("this.model.getOutput().getNumRows()                   : " + this.model.getOutput().getNumRows());
    System.out.println("this.model.getOutput().getView().getNumRows()         : " + this.model.getOutput().getView().getNumRows());
    System.out.println("this.model.getInputConfig().getResearchSubset.size()  : " + this.model.getInputConfig().getResearchSubset().size());

I get the output:

    this.model.getOutput().getNumRows()                   : 30162
    this.model.getOutput().getView().getNumRows()         : 30162
    this.model.getInputConfig().getResearchSubset.size()  : 24070

I am using the output view to obtain the training and testing subsets, so this causes a problem.

prasser · 2022-05-10T10:50:42Z

This is OK and the expected behaviour.

prasser · 2022-06-19T21:01:00Z

An implementation of this is provided in the following branch: feature-training-test

Might still need a little bit of polishing, though.

jenno-verdonck · 2022-10-05T12:13:47Z

I was wondering whether this feature can already be used in its current state?

prasser · 2022-10-05T16:09:59Z

Yes, should work. We would be happy to receive feedback.

prasser · 2022-10-05T16:14:22Z

To be clear, the feature lives in this branch:

https://github.com/arx-deidentifier/arx/tree/feature-training-test

prasser added the bug and enhancement labels on Jan 29, 2022

prasser self-assigned this on Jan 29, 2022

prasser mentioned this issue on May 4, 2022: Ia fix 377 core #383 (Closed)

This was referenced on Jun 13, 2022: Ia fix 377 core #389 (Merged); Fix issue 377 #394 (Closed)
