Improvement to dataplane selection and configuration #4031

Open
10 of 12 tasks
paullatzelsperger opened this issue Mar 21, 2024 · 0 comments
Labels: breaking-change (Will require manual intervention for version update), dpf (Feature related to the Data Plane Framework), enhancement (New feature or request), story (Overarching issue with linked sub-issues)

paullatzelsperger commented Mar 21, 2024

Feature Request

This is a story issue that spans several subtasks and aims to improve the data plane capabilities in the aspects detailed below.

1. The transferType as key indicator

In the provider data plane, the transferType (sometimes referred to as "format") should dictate the physical destination of a data pipeline:

  • in PULL transfers: the DataSink is inferred, e.g. HttpData-PULL means that the data is piped into an HttpDataSink.
  • in PUSH transfers: the DataSink is inferred, but additional data is required, e.g. for AmazonS3-PUSH the data is pushed into an S3BucketDataSink, but the bucket name, region, etc. must be provided by the consumer.

Note that the source of the data pipeline is always determined by the DataAddress that is associated with the Asset.

From this, we can derive the following requirements:

  1. There must be a way to infer the data destination type from the transferType, i.e. the provider DP must be able to infer from HttpData-PULL that it must instantiate an HttpDataSink. This must be extensible.
  2. The data plane registration should contain only a list of explicit mappings transferType -> destinationType. To achieve that, the filtering based on the sourceType should be dropped, and an explicit mapping should replace the two disjoint lists ("allowedTransferType", "allowedDestinationTypes").
  3. Changes to the data request: TransferRequestMessage#dataDestination should be optional. DataAddress#getType could be made optional.
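The second requirement can be illustrated with a minimal sketch. It assumes the `<type>-<FLOW>` naming seen in the examples above (HttpData-PULL, AmazonS3-PUSH). The names `FlowType`, `TransferTypes`, and `TransferTypeMappings`, as well as the destination type strings, are hypothetical and not taken from the EDC code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in EDC under these names.
enum FlowType { PULL, PUSH }

final class TransferTypes {
    private TransferTypes() {}

    // Assumes the "<type>-<FLOW>" convention, e.g. "HttpData-PULL".
    // Throws IllegalArgumentException for anything else.
    static FlowType flowTypeOf(String transferType) {
        int idx = transferType.lastIndexOf('-');
        if (idx < 0) {
            throw new IllegalArgumentException("Not a valid transferType: " + transferType);
        }
        return FlowType.valueOf(transferType.substring(idx + 1));
    }

    // PUSH transfers need consumer-supplied details (bucket name, region, ...),
    // PULL transfers do not.
    static boolean requiresConsumerDestination(String transferType) {
        return flowTypeOf(transferType) == FlowType.PUSH;
    }
}

// One explicit transferType -> destinationType mapping instead of two disjoint lists.
final class TransferTypeMappings {
    private final Map<String, String> destinationTypes = new LinkedHashMap<>();

    TransferTypeMappings register(String transferType, String destinationType) {
        TransferTypes.flowTypeOf(transferType); // reject malformed keys early
        destinationTypes.put(transferType, destinationType);
        return this;
    }

    Optional<String> destinationTypeFor(String transferType) {
        return Optional.ofNullable(destinationTypes.get(transferType));
    }
}
```

With a single map, a registration cannot advertise a transferType for which no destination type is known, which is the kind of invalid configuration that two independent lists allow.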

2. Additional filtering when building the catalog

In the current implementation, the inclusion of a Distribution for an asset in the catalog is solely based on the physical capability of a data plane (determined by its corresponding DataPlaneInstance). That means that if a data plane can handle a certain format, the corresponding Distribution is included in the catalog. In some situations, users may want to restrict the transfer of certain assets to specific data planes, for example due to security concerns. Note that this could cause some assets not to be included in the catalog at all!

To achieve that, the data plane selector is extended with a getCandidates() method, which dynamically evaluates at runtime the set of data planes that can satisfy a data offering.
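As a rough sketch of what such a method could look like: the issue names only getCandidates(), so the signature, the `DataPlane` and `Offering` records, and the use of `BiPredicate` restrictions below are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins for a registered data plane and a data offering.
record DataPlane(String id, Set<String> transferTypes) {}
record Offering(String assetId, String transferType) {}

final class CandidateSelector {
    private final List<DataPlane> dataPlanes;
    // Extra, user-defined restrictions (e.g. security rules) on top of physical capability.
    private final List<BiPredicate<DataPlane, Offering>> restrictions;

    CandidateSelector(List<DataPlane> dataPlanes,
                      List<BiPredicate<DataPlane, Offering>> restrictions) {
        this.dataPlanes = List.copyOf(dataPlanes);
        this.restrictions = List.copyOf(restrictions);
    }

    // Evaluated on every call, so changes in the restriction logic take effect at runtime.
    List<DataPlane> getCandidates(Offering offering) {
        return dataPlanes.stream()
                // physical capability: the data plane supports the transferType
                .filter(dp -> dp.transferTypes().contains(offering.transferType()))
                // additional filtering: every restriction must accept the pairing
                .filter(dp -> restrictions.stream().allMatch(r -> r.test(dp, offering)))
                .collect(Collectors.toList());
    }
}
```

An empty result corresponds to the case noted above, where no Distribution can be offered and the asset may not appear in the catalog at all.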

3. Automatic mapping of the transferType

As stated before, it is necessary to infer the physical DataSink from the transferType. To achieve that, an extensible directory of transferType -> Class<DataSink> is to be provided.
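One possible shape for such a directory is sketched below. The `DataSink` interface and the two sink classes are empty stand-ins declared only to keep the example self-contained, and `DataSinkDirectory` with its `register`/`resolve` methods is a hypothetical name, not an existing EDC API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Empty stand-ins so the sketch compiles on its own.
interface DataSink {}
final class HttpDataSink implements DataSink {}
final class S3BucketDataSink implements DataSink {}

// Hypothetical extensible directory: transferType -> Class<DataSink>.
final class DataSinkDirectory {
    private final Map<String, Class<? extends DataSink>> sinks = new ConcurrentHashMap<>();

    // Extensions contribute their own entries; conflicting registrations are rejected.
    void register(String transferType, Class<? extends DataSink> sinkType) {
        Class<? extends DataSink> previous = sinks.putIfAbsent(transferType, sinkType);
        if (previous != null && previous != sinkType) {
            throw new IllegalStateException(
                    transferType + " is already mapped to " + previous.getSimpleName());
        }
    }

    // Empty if no extension has registered a sink for the transferType.
    Optional<Class<? extends DataSink>> resolve(String transferType) {
        return Optional.ofNullable(sinks.get(transferType));
    }
}
```

Rejecting conflicting registrations is a design choice of this sketch; an implementation could equally allow later extensions to override earlier ones.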

Which Areas Would Be Affected?

Data Plane, Data Plane Selector

Why Is the Feature Desired?

Compliance with DSP (transferType as sole arbiter), and avoiding invalid configurations in the data plane registration.

Solution Proposal

The following list of subtasks will realize this feature:

paullatzelsperger added the breaking-change, dpf, and enhancement labels on Mar 21, 2024
paullatzelsperger added this to the Milestone 15 milestone on Mar 21, 2024
github-actions bot added the triage (all new issues awaiting classification) label on Mar 21, 2024
ndr-brt added the story label and removed the triage label on Mar 27, 2024
ndr-brt self-assigned this on Apr 15, 2024
ndr-brt modified the milestones: Milestone 15, Milestone 16 on May 15, 2024