Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spec-ify-assets-def] replace dicts inside AssetsDefinition with dict of AssetSpecs #21937

Merged
merged 2 commits into from
May 23, 2024

Conversation

sryza
Copy link
Contributor

@sryza sryza commented May 17, 2024

Summary & Motivation

This PR is the star of the "[spec-ify-assets-def]" stack.

It replaces all of these properties on AssetsDefinition:

    _asset_deps: Mapping[AssetKey, AbstractSet[AssetKey]]
    _group_names_by_key: Mapping[AssetKey, str]
    _metadata_by_key: Mapping[AssetKey, ArbitraryMetadataMapping]
    _tags_by_key: Mapping[AssetKey, Mapping[str, str]]
    _freshness_policies_by_key: Mapping[AssetKey, FreshnessPolicy]
    _auto_materialize_policies_by_key: Mapping[AssetKey, AutoMaterializePolicy]
    _code_versions_by_key: Mapping[AssetKey, Optional[str]]
    _descriptions_by_key: Mapping[AssetKey, str]
    _owners_by_key: Mapping[AssetKey, Sequence[str]]

with just:

    _specs_by_key: Mapping[AssetKey, AssetSpec]

It should have no effect on behavior or public APIs.

A couple things to think about here:

  • Perf. Every time we copy an AssetsDefinition, we need to go back and forth between the two representations. I have followup PRs that make this no longer the case, but that are more invasive.
  • AssetSpec vs. AssetDefinition vs. AssetNode. At times, we've discussed the possibility that AssetSpec should not be the internal object of record that we use to describe assets. This change does not lock us into AssetSpec being the internal object of record. By consolidating all these dicts into one place, this change should make it easier to swap it out for something else in the future if we choose to do so.

How I Tested These Changes

@sryza sryza changed the title [spec-ify-assets-def] replace dicts inside AssetsDefinition with dict… [spec-ify-assets-def] replace dicts inside AssetsDefinition with dict of AssetSpecs May 17, 2024
@sryza sryza force-pushed the assets-def-constructor-validation branch from 7895327 to 377d349 Compare May 17, 2024 20:03
@sryza sryza force-pushed the assets-def-constructor-validation branch from 377d349 to 362059e Compare May 17, 2024 20:16
@sryza sryza force-pushed the assets-def-constructor-validation branch from 362059e to 8a48268 Compare May 17, 2024 21:01
@sryza sryza force-pushed the assets-def-constructor-validation branch from 8a48268 to 78661d0 Compare May 18, 2024 00:08
Base automatically changed from assets-def-constructor-validation to master May 18, 2024 14:16
Copy link
Collaborator

@smackesey smackesey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I heartily approve this direction. Some inline comments. I also wonder whether we can get rid of a lot of the non-public (*_by_key) properties in a follow-up and instead access the specs with a get() method. That's the pattern that AssetGraph and AssetLayer now use.

)
)

self._specs_by_key = {spec.key: spec for spec in normalized_specs}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at this super-carefully but sort of seems like this logic should be contained in specs_from_asset_key_params or another function rather than inlined into constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specs_from_asset_key_params is narrowly focused on converting from the many-dicts representation to the specs representation.

The reason that encapsulating that is important is that, in followup PRs, we'll allow passing in specs to the AssetsDefinition constructor directly. specs_from_asset_key_params will only be needed when many-dicts are passed in, but the validation logic will need to run either way.

is_subset=self._is_subset,
owners_by_key=owners_by_key,
tags_by_key=tags_by_key,
is_subset=self.is_subset,
Copy link
Collaborator

@smackesey smackesey May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't a lot of these by_key dicts already properties you could just read?

Also, it's unfortunate that we have to round-trip through this *_by_key pattern when copying. Great that you have follow-ups that address that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't a lot of these by_key dicts already properties you could just read?

Yes, but each of those properties requires iterating over the specs_by_key dict. So using them would require iterating over that dict once per property, instead of just once.

And I'd like to eventually eliminate those properties (which are private) and expect people to go through the specs.

@smackesey
Copy link
Collaborator

Also, why AssetSpec instead of AssetNode (I understand we're not locked in though)?

@sryza
Copy link
Contributor Author

sryza commented May 20, 2024

Also, why AssetSpec instead of AssetNode

The main reasons:

  • AssetSpec is already involved in AssetsDefinition construction. I.e. it's already an argument to @multi_asset, so seems like it's not a stretch to make it an argument to the AssetsDefinition constructor (even though that's not done in this PR).
  • AssetSpec is public. This means we can make AssetsDefinition.specs_by_key a public method (even though that's not done in this PR)..

@sryza sryza requested a review from smackesey May 20, 2024 22:18
@smackesey
Copy link
Collaborator

I think we will want to transition to AssetNode later (and convert specs to nodes in the constructor) so that our internals are maximally decoupled from public AssetSpec but I think AssetSpec is fine for now. ✅

@schrockn
Copy link
Member

You love to see it

@sryza sryza merged commit 7c1845f into master May 23, 2024
1 check passed
@sryza sryza deleted the specify-internal branch May 23, 2024 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants