Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add featurestore module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class #850

Merged
merged 16 commits into from Nov 30, 2021

Conversation

morgandu
Copy link
Contributor

@morgandu morgandu commented Nov 17, 2021

feat: add featurestore module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 17, 2021
@product-auto-label product-auto-label bot added the api: aiplatform Issues related to the AI Platform API. label Nov 17, 2021
@morgandu morgandu requested a review from lclc19 November 17, 2021 19:22
@morgandu morgandu force-pushed the mor--feature-store-base branch 3 times, most recently from 9af4f69 to 6c97c6e Compare November 17, 2021 21:38

def __init__(
self,
entity_type_name: str,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a common pattern in other resource? I personally find this a bit confuse for overload this arg with both short form and long form, which make a weird case when full path is used, but project/location/featurestore_id also provided. Is this possible to have overloading constructor in Python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python does not support overloading constructor, that's why we have this as the common pattern for all other Vertex SDK resource classes, for example this is Vertex AI SDK Dataset class constructor.

Agree on the confusion of the usage of optional arguments, but that's the language limitation we are working with. We try to compensate by emphasizing the Docstring and Example Usage throughout the SDK.

def __init__(
self,
entity_type_name: str,
featurestore_id: Optional[str] = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably easier to keep the project , location, featurestore_id order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the usage pattern for Vertex SDK is user to initialize the project, see usage example https://github.com/googleapis/python-aiplatform#initialization. And user does not need to specify project and location unless overwritten. That's why Vertex SDK normally ranks project and location to bottom among other arguments and mark as optional.

Also Optional arguments can not go before the Required arguments.

Example: "projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id"
or "my_entity_type_id" when project and location are initialized or passed, with featurestore_id passed.
featurestore_id (str):
Optional. Featurestore to retrieve entityType from.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value if not set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optional featurestore_id defaults to None, see couple of lines above.

featurestore_id: Optional[str] = None,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if it's None, does it mean the value from the entity_type_name name will be used, which implies entity_type_name must be the fully qualified path. And when it's None, then entity_type_name must be the short ID form. Probably adding some explanation to be clear

"""
return _featurestores.Featurestore(self._featurestore_name)

def get_feature(self, feature_id: str) -> "_featurestores.Feature":
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect the feature resource exist (i.e it was created before)? If not, then should the (Feature) update function throw error in that case?

Similar for get_entity_type for featurestore class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both get_* methods will pass in the fully-qualified resource name into resource class constructor. If the resource exists, the method will return the constructed resource class, if the resource does not exist, it will actually hit the service client's get_* method and throw service error.

@morgandu morgandu self-assigned this Nov 18, 2021
FEATURE_NAME_PATTERN_REGEX = (
ENTITY_TYPE_NAME_PATTERN_REGEX
+ r"\/features\/(?P<feature_id>"
+ RESOURCE_ID_PATTERN_REGEX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more check for feature_id -- it cannot be one of reserved words: entity_id, feature_timestamp, arrival_timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vertex SDK is relying on the service for this type of value validation, most validation we are doing is to assist certain usage within the SDK which is to prepare the value to be passed into the service, i.e. these pattern validations essentially are to assist the limitation due to lack of overloading constructor, and we are validating the arguments and form the final resource name to pass into the service.

However, if the service does not throw an error on the validation you proposed and will cause un alerted error, we can definitely add these in the SDK layer.

Please let me know!


def __init__(
self,
featurestore_name: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the params in project , location, featurestore_id order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

project and location are Optional arguments, whereas featurestore_name is Required argument. The required arguments have to be listed before the optional arguments. There was another discussion of why our project and location are marked as optional.

):
raise ValueError(
f"{featurestore_name} is not in form of a fully-qualified "
f"featurestore resource name nor a featurestore ID."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nor a featurestore ID -> nor a featurestore ID with project and location are initialized or passed.

Copy link
Contributor Author

@morgandu morgandu Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this raise ValueError into featurestore_utils.py and it does not involve project and location.

google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
"""Deletes entity_type resources in this Featurestre given their entity_type IDs.
WARNING: This deletion is permanent.

Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add entity_type_ids in args descripiton

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"""Deletes feature resources in this EntityType given their feature IDs.
WARNING: This deletion is permanent.

Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add feature_ids description

Copy link
Contributor Author

@morgandu morgandu Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ids docstring in both Featurestore class and EntityType class's delete method

google/cloud/aiplatform/_featurestores/entity_type.py Outdated Show resolved Hide resolved
List[EntityTypes] - A list of managed entityType resource objects
"""

cls._featurestore_id = featurestore_utils.validate_and_get_featurestore_resource_id(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mutating the class itself, not an instance of the class, which we shouldn't do because it alters the value for this class and all instances of this class. I assume this is needed because the _list implementation or one of the base methods it depends on isn't satisfactory for nested resources.

The guidance here is to override those methods in a general way so they can be used for nested resources. I'm open to raising NotImplementedError on list in this PR and following up in another PR to capture this change because it would be valuable for more than just FeatureStore including Metadata and Tensorboard resources. Or you can tackle this in this PR.

Copy link
Contributor Author

@morgandu morgandu Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated a couple of unnecessary cls._* =* which seems causing the confusion on mutating the class (Those are only to parse some ids and prepare the parent resource name for retrieving child resources).

I have changed the base _list() method to include an optional parent field. Let me know what you think on this implementation and if this addressed your concern.

One thing I would need more guidance is the forming of _resource_noun for the nested resource, which require the ancestor IDs as part of the _resource_noun, which can not be pre defined as our current pattern.

google/cloud/aiplatform/_featurestores/entity_type.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/feature.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
(compat.V1, featurestore_service_client_v1.FeaturestoreServiceClient),
(compat.V1BETA1, featurestore_service_client_v1beta1.FeaturestoreServiceClient),
(
"online_" + compat.V1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider managing online as a separate client because that's currently the pattern we follow(ie: Endpoint and Prediction), so this would be introducing a new pattern in usage, and it seems like we may want the online client to remain connected (_is_temporary=False) for performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an FeaturestoreOnlineServingClientWithOverride matching the current pattern

return bool(re.compile(r"^" + RESOURCE_ID_PATTERN_REGEX + r"$").match(resource_id))


def validate_featurestore_name(featurestore_name: str) -> Dict[str, str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preference to name these parse instead of validate.

Also, is additional RESOURCE_ID validation the main reason we are not relying on the GAPIC parse methods[1]?

  1. https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform_v1/services/featurestore_service/client.py#L230

Copy link
Contributor Author

@morgandu morgandu Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but now since you point out, I think it is not necessary to have the resource id validation inside the path parsing. Switching to GAPIC parse methods

google/cloud/aiplatform/utils/featurestore_utils.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/utils/featurestore_utils.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/utils/featurestore_utils.py Outdated Show resolved Hide resolved
Copy link
Member

@sasha-gitg sasha-gitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a more informative PR title because it will be added to the release notes.

@morgandu morgandu changed the title Feat: Vertex SDK Feature Store Integration Part I - Base feat: Vertex AI SDK Featurestore release feat: Add Featurestore, EntityType, and Feature resource classes in the featurestores module feat: Add get_*, update_*, delete_*, list_* methods for all resource classes in the featurestores module Nov 24, 2021
@morgandu morgandu changed the title feat: Vertex AI SDK Featurestore release feat: Add Featurestore, EntityType, and Feature resource classes in the featurestores module feat: Add get_*, update_*, delete_*, list_* methods for all resource classes in the featurestores module feat: Vertex AI SDK Featurestore release feat: Add Featurestore, EntityType, and Feature resource classes in the featurestores module feat: Add get, update, delete, list methods in all resource classes in the featurestores module feat: Add search_features in Featurestore class Nov 24, 2021
@morgandu
Copy link
Contributor Author

morgandu commented Nov 25, 2021

  • Addressed feedback to remove parent resource as property, instead utilize gca_resource i.e. self.resource_name when needed.
  • Removed _resource_noun in list() methods
  • Cleaned up unit tests according to the changes.
  • Added integration test, though in this PR we can only cover list() and search_features() methods in Featurestore class.

@morgandu morgandu force-pushed the mor--feature-store-base branch 2 times, most recently from 56285ae to f2bad65 Compare November 25, 2021 01:02
Example: "projects/123/locations/us-central1/featurestores/my_featurestore_id/entityTypes/my_entity_type_id"
or "my_entity_type_id" when project and location are initialized or passed, with featurestore_id passed.
featurestore_id (str):
Optional. Featurestore to retrieve entityType from.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if it's None, does it mean the value from the entity_type_name name will be used, which implies entity_type_name must be the fully qualified path. And when it's None, then entity_type_name must be the short ID form. Probably adding some explanation to be clear

google/cloud/aiplatform/_featurestores/entity_type.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/entity_type.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/feature.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/feature.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/feature.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/entity_type.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/utils/featurestore_utils.py Outdated Show resolved Hide resolved
@morgandu morgandu changed the title feat: Vertex AI SDK Featurestore release feat: Add Featurestore, EntityType, and Feature resource classes in the featurestores module feat: Add get, update, delete, list methods in all resource classes in the featurestores module feat: Add search_features in Featurestore class feat: release Vertex AI SDK Featurestore; add Featurestore, EntityType, and Feature resource classes in the featurestores module; add get, update, delete, list methods in all resource classes in the featurestores module; add search_features in Featurestore class Nov 29, 2021
add private _featurestores module
add Featurestore/EntityType/Feature class
add featurestore_utils.py
add FeaturestoreClientWithOverride in utils __init__.py
update compat services and types to include featurestore and featurestore online
add get/update/delete/list/search methods
add unit tests for above methods
add skeleton for create/ingest/read/serve methods
@morgandu morgandu changed the title feat: release Vertex AI SDK Featurestore; add Featurestore, EntityType, and Feature resource classes in the featurestores module; add get, update, delete, list methods in all resource classes in the featurestores module; add search_features in Featurestore class feat: add featurestores module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class Nov 30, 2021
…d forming methods; moved Featurestore.search_features() to Feature.search() and utilize existing class methods; update delete sync wait
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/_featurestores/featurestore.py Outdated Show resolved Hide resolved
featurestore_name=self.resource_name, filter=filter, order_by=order_by,
)

@base.optional_sync()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that this is decorated. I think this is new usage pattern for the SDK and there are three options on usage here to consider. This is worth discussion because this pattern should be used throughout.

1: if this method is decorated then the usage is:

 feature_store.delete_entity_types(..., sync=False)
 feature_store.wait()  # will throw if any of the deletions had issues

In this case there is no need to pass sync to the underlying deletions and instead we can default to asynchronous to make deletions concurrent. Also note that the method needs to be changed to wait on all entity_type deletes not just the last one as it is currently handled.

entity_types = []
for entity_type_id in entity_type_ids:
  entity_type = self.get_entity_type(entity_type_id=entity_type_id)
  entity_type.delete(sync=False)
  entity_types.append(entity_type)

for entity_type in entity_types:
  entity_type.wait()

2: other option is to pass sync directly to the underlying delete calls but not associate them with the FeatureStore:

 feature_store.delete_entity_types(..., sync=False)
 feature_store.wait()  # will NOT throw if any of the deletions had issues

In this case this method should not be decorated with base.optional_sync. There is no need to call wait here and the implementation would look like:

for entity_type_id in entity_type_ids:
  entity_type = self.get_entity_type(entity_type_id=entity_type_id)
  entity_type.delete(sync=sync)

3: is similar to the second option but instead we can return the list of deleted entity types so the user can wait on each one. So usage look like:

deleted_entity_types: List[EntityType] = feature_store.delete_entity_types(..., sync=False)
feature_store.wait()  # will NOT throw if any of the deletions had issues

for entity_type in deleted_entity_types:
  entity_type.wait()

The implementation would look similar to (2) but we compose the list and return it.

entity_types = []
for entity_type_id in entity_type_ids:
  entity_type = self.get_entity_type(entity_type_id=entity_type_id)
  entity_type.delete(sync=sync)
  entity_types.append(entity_type)
return entity_types

Also to unblock this PR, feel free to remove the async support for now and follow up. Up to you.

Copy link
Contributor Author

@morgandu morgandu Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the options. I would vote for option 1 over 2 and 3. The main reason is when we delete child resources from the parent resource, I do want to provide sync option to prevent user to retrieve the delete pending child from the parent, i.e. Featurestore.get_entity_type() when Featurestore.delete_entity_types(). So add a Featurestore.wait() during deleting kind of make sense, and if any of the deletion had issue, it is captured in one place without extra user command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

@sasha-gitg sasha-gitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@morgandu morgandu changed the title feat: add featurestores module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class feat: add featurestore module including Featurestore, EntityType, and Feature classes; add get, update, delete, list methods in all featurestore classes; add search method in Feature class Nov 30, 2021
@morgandu morgandu merged commit 66745a6 into googleapis:main Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: aiplatform Issues related to the AI Platform API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants