feat(vertexai): Elastic Text-Embedding Model demo. #4127

Open: kewent wants to merge 15 commits into main from text-embedding-google_io_2024_new_model

Conversation

@kewent kewent commented May 8, 2024

Description

Fixes: http://b/334964454

Checklist

  • I have followed Contributing Guidelines from CONTRIBUTING.MD
  • Tests pass: go test -v ./... (see Testing)
  • Code formatted: gofmt (see Formatting)
  • Vetting pass: go vet (see Formatting)
  • These samples need a new API enabled in testing projects to pass (let us know which ones)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
  • This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@kewent kewent requested a review from a team as a code owner May 8, 2024 06:01
@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label May 8, 2024
@kewent kewent force-pushed the text-embedding-google_io_2024_new_model branch from f5e6632 to 20f37c5 Compare May 8, 2024 06:12
@kewent kewent changed the title feat:override the 003 models to 004 model feat: Elastic Text-Embedding Model demo. May 10, 2024
@grayside grayside changed the title feat: Elastic Text-Embedding Model demo. feat(vertexai): Elastic Text-Embedding Model demo. May 10, 2024
@grayside
Contributor

This PR is modifying a sample that's currently in conflict with https://github.com/GoogleCloudPlatform/golang-samples/blob/main/aiplatform/text-embeddings/embeddings.go. One of these samples should be removed or the region tag changed.

Comment on lines 31 to 32
apiEndpoint, project, model string, texts []string,
task string, customOutputDimensionality *int) ([][]float32, error) {
Contributor

suggestion: These parameters need additional usage documentation and several should possibly be bundled into a struct. See #4110 and #4113 for examples.

Author

SG, I'll hard-code some of them and leave only project and texts.
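
For illustration, the struct suggestion in this thread could look roughly like the sketch below. The type name `embedOptions`, its fields, the `withDefaults` helper, and the default values are all hypothetical; none of them are taken from this PR or from #4110 and #4113.

```go
package main

// embedOptions is a hypothetical bundle for the arguments of embedTexts
// that change rarely. The field names are illustrative only.
type embedOptions struct {
	// Model is the embedding model ID.
	Model string
	// Task is the task type sent along with every text.
	Task string
	// OutputDimensionality optionally shortens each embedding.
	// A nil pointer leaves the model's default size in place.
	OutputDimensionality *int
}

// withDefaults returns a copy of o with empty fields filled in.
// The defaults are placeholders, not values from this PR.
func (o embedOptions) withDefaults() embedOptions {
	if o.Model == "" {
		o.Model = "example-embedding-model"
	}
	if o.Task == "" {
		o.Task = "example-task"
	}
	return o
}
```

With a bundle like this, the function signature would shrink to something like `embedTexts(project string, texts []string, opts embedOptions)`, and the doc comments on each field carry the usage documentation the reviewer asked for.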

Comment on lines 31 to 32
apiEndpoint, project, model string, texts []string,
task string, customOutputDimensionality *int) ([][]float32, error) {
Contributor

From https://googlecloudplatform.github.io/samples-style-guide/#minimal-arguments, we should be using minimal arguments.

blocker: apiEndpoint is generally not specified in samples and should be removed. Looking at the code, it seems like maybe the problem is there's a missing method in the library to facilitate text embedding? If so we can make this not-blocker.

Author

SG

 	if err != nil {
 		t.Fatal(err)
 	}
-	if len(embeddings) != len(texts) || len(embeddings[0]) != 768 {
-		t.Errorf("len(embeddings), len(embeddings[0]) = %d, %d, want %d, 768", len(embeddings), len(embeddings[0]), len(texts))
+	if len(embeddings) != len(texts) || len(embeddings[0]) != dimensionality {
Contributor

suggestion: consider improving readability by extracting repeating len calls to 'got' variables.
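
A minimal sketch of what extracting the repeated `len` calls might look like. The helper `checkShape` is hypothetical and exists only so the snippet stands alone; in the test itself the same `got` variables would feed `t.Errorf` directly.

```go
package main

import "fmt"

// checkShape compares the number of embeddings and the length of the
// first embedding against the expected values. It is a hypothetical
// helper used here to show the 'got' variable pattern.
func checkShape(embeddings [][]float32, wantCount, wantDim int) error {
	gotCount := len(embeddings)
	gotDim := 0
	if gotCount > 0 {
		// Guard the index so an empty result reports a mismatch
		// instead of panicking.
		gotDim = len(embeddings[0])
	}
	if gotCount != wantCount || gotDim != wantDim {
		return fmt.Errorf("len(embeddings), len(embeddings[0]) = %d, %d, want %d, %d",
			gotCount, gotDim, wantCount, wantDim)
	}
	return nil
}
```

Each length is computed once, so the condition and the failure message cannot drift apart, and an empty result no longer risks an out-of-range index.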

@@ -28,7 +28,8 @@ import (
 )

 func embedTexts(
-	apiEndpoint, project, model string, texts []string, task string) ([][]float32, error) {
+	apiEndpoint, project, model string, texts []string,
+	task string, customOutputDimensionality *int) ([][]float32, error) {
Contributor

From https://googlecloudplatform.github.io/samples-style-guide/#minimal-arguments, we should be using minimal arguments.

suggestion: Remove customOutputDimensionality as a parameter in favor of hard-coding in the function body.

Author

SG
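
As a rough sketch of the hard-coding suggestion, the dimensionality could move from the parameter list into the function body. The function name `buildPredictPayload`, the map keys, the task string, and the value 5 are assumptions for illustration, and plain maps stand in for whatever request types the client library expects.

```go
package main

// buildPredictPayload is a hypothetical helper that prepares the
// per-text instances and the shared parameters for an embedding request.
// Only the texts are passed in; everything else is fixed in the body.
func buildPredictPayload(texts []string) ([]map[string]any, map[string]any) {
	// Assumed, illustrative values. Replace them with the ones the
	// sample is meant to demonstrate.
	const (
		task           = "example-task"
		dimensionality = 5
	)

	instances := make([]map[string]any, len(texts))
	for i, text := range texts {
		instances[i] = map[string]any{
			"content":   text,
			"task_type": task,
		}
	}

	// The key name below is an assumption about the request format.
	params := map[string]any{
		"outputDimensionality": dimensionality,
	}
	return instances, params
}
```

A reader who wants a different embedding size then edits one constant instead of threading a `*int` through every call site.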

snippet-bot bot commented May 14, 2024

Here is the summary of changes.

You are about to delete 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add the snippet-bot:force-run label.

@kewent
Author

kewent commented May 14, 2024

> This PR is modifying a sample that's currently in conflict with https://github.com/GoogleCloudPlatform/golang-samples/blob/main/aiplatform/text-embeddings/embeddings.go. One of these samples should be removed or the region tag changed.

Removed duplicated files under ./text-embeddings/

@kewent kewent requested a review from grayside May 14, 2024 00:53
Contributor

@grayside grayside left a comment

I think I missed this feedback about specifying the endpoint on my last review, sorry. This should be the last adjustment.

ctx := context.Background()

apiEndpoint := "us-central1-aiplatform.googleapis.com:443"
Contributor

blocker: samples do not specify the API Endpoint unless specifically needed for the sample. Setting the region explicitly is typical.
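
One possible reading of this comment, sketched under assumptions: the sample names the region explicitly and derives anything region-specific from it, rather than carrying a literal endpoint string. The helper names `regionalEndpoint` and `modelResource` are hypothetical, and the resource path layout is an assumption not shown in this PR. Whether the client still needs an endpoint override at all depends on the library.

```go
package main

import "fmt"

// regionalEndpoint derives the service address from a region, following
// the pattern of the literal string in the reviewed line.
func regionalEndpoint(location string) string {
	return fmt.Sprintf("%s-aiplatform.googleapis.com:443", location)
}

// modelResource builds the resource name for a publisher model in the
// given project and region. The path layout is an assumption.
func modelResource(project, location, model string) string {
	return fmt.Sprintf("projects/%s/locations/%s/publishers/google/models/%s",
		project, location, model)
}
```

With `location := "us-central1"` declared once in the sample body, both values follow from it, so changing the region is a one-line edit.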

Labels
samples Issues that are directly related to samples.

4 participants