Staged Processing #1

Open
adewes opened this issue Aug 18, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

Member

adewes commented Aug 18, 2020

Motivation

Many privacy-enhancing transformations require multiple stages. Generalizing attributes, for example, requires us to first define a generalization hierarchy and then, in a second step, apply this hierarchy to the data items. To support this, we need to process items in stages.

Examples:

  • Generalization hierarchy:

    • Stage 1:
      • Analyze value distribution in items.
    • Stage 2:
      • Generalize items with the given distribution.
  • k-Anonymity:

    • Stage 1:
      • Analyze attribute frequencies.
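
To make the two-stage pattern concrete, here is a minimal sketch of the generalization example. It is an illustration under stated assumptions, not existing Kodex code: the `Item` type, the `analyze` and `generalize` functions, the `minCount` threshold, and the `"other"` placeholder are all hypothetical, and a simple frequency cutoff stands in for a real generalization hierarchy.

```go
package main

import "fmt"

// Item is a hypothetical data item: attribute name -> value.
type Item map[string]string

// analyze is stage 1: it counts how often each value of attr occurs.
// Items that do not carry the attribute are ignored.
func analyze(items []Item, attr string) map[string]int {
	counts := make(map[string]int)
	for _, item := range items {
		if v, ok := item[attr]; ok {
			counts[v]++
		}
	}
	return counts
}

// generalize is stage 2: values that stage 1 saw fewer than minCount times
// are replaced by the placeholder "other" (an assumption of this sketch).
// The input items are copied, not modified.
func generalize(items []Item, attr string, counts map[string]int, minCount int) []Item {
	out := make([]Item, 0, len(items))
	for _, item := range items {
		copied := make(Item, len(item))
		for k, v := range item {
			copied[k] = v
		}
		if v, ok := item[attr]; ok && counts[v] < minCount {
			copied[attr] = "other"
		}
		out = append(out, copied)
	}
	return out
}

func main() {
	items := []Item{
		{"city": "Berlin"}, {"city": "Berlin"}, {"city": "Potsdam"},
	}
	// Stage 1 has to see every item before stage 2 can start.
	counts := analyze(items, "city")
	for _, item := range generalize(items, "city", counts, 2) {
		fmt.Println(item["city"])
	}
}
```

The point of the sketch is the ordering constraint: stage 2 cannot emit a single item until stage 1 has consumed all of them, which is why plain one-pass stream processing is not sufficient here.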

Implementation Proposal

To enable such staged processing, we plan to make the following additions to the Kodex stream processing mechanisms:

  • Add a numerical stage attribute to the Config model.
  • Add a Batch model that stores information about the processing of a given stage for a number of items.
  • Add an internal buffering mechanism (using internal channels) that enables us to buffer items for multi-stage processing.
  • Make the group store functionality currently implemented in the anonymization/aggregation action available to all actions as a means to perform distributed, parallel computation on data items.
  • Change the scheduler to enable staged processing of data items.
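
One possible reading of these additions is sketched below. None of the names are taken from the Kodex code base: `Action`, `Batch`, and `runStaged` are hypothetical, the stage attribute is assumed to live on each action's configuration, and the group store is left out entirely. The sketch only shows how a channel-backed buffer could let a scheduler replay the same items once per stage.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical data item.
type Item map[string]string

// Action is a hypothetical configured action with a numeric stage attribute.
type Action struct {
	Name  string
	Stage int
	Run   func(Item)
}

// Batch records how many items were processed in a given stage.
type Batch struct {
	Stage int
	Items int
}

// runStaged buffers all items in a channel and replays the buffer once per
// distinct stage, in ascending stage order. Within a stage, every action of
// that stage sees every item before the next stage starts.
func runStaged(items []Item, actions []Action) []Batch {
	// Collect the distinct stage numbers used by the actions.
	seen := make(map[int]bool)
	var stages []int
	for _, a := range actions {
		if !seen[a.Stage] {
			seen[a.Stage] = true
			stages = append(stages, a.Stage)
		}
	}
	sort.Ints(stages)

	// Fill the initial buffer.
	buffer := make(chan Item, len(items))
	for _, item := range items {
		buffer <- item
	}
	close(buffer)

	batches := make([]Batch, 0, len(stages))
	for _, stage := range stages {
		next := make(chan Item, len(items)) // buffer for the following stage
		n := 0
		for item := range buffer {
			for _, a := range actions {
				if a.Stage == stage {
					a.Run(item)
				}
			}
			next <- item
			n++
		}
		close(next)
		buffer = next
		batches = append(batches, Batch{Stage: stage, Items: n})
	}
	return batches
}

func main() {
	counts := make(map[string]int)
	actions := []Action{
		{Name: "analyze", Stage: 1, Run: func(i Item) { counts[i["city"]]++ }},
		{Name: "report", Stage: 2, Run: func(i Item) { fmt.Println(i["city"], counts[i["city"]]) }},
	}
	items := []Item{{"city": "Berlin"}, {"city": "Berlin"}, {"city": "Potsdam"}}
	for _, b := range runStaged(items, actions) {
		fmt.Println("stage", b.Stage, "items", b.Items)
	}
}
```

This version keeps the whole buffer in memory and runs sequentially. A real implementation would presumably need to bound or persist the buffer and coordinate stages across parallel workers, which is where the group store would come in.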

adewes added the enhancement label on Aug 18, 2020