New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically start Embeddings indexing #4091
Conversation
@@ -310,6 +310,11 @@ const register = async ( | |||
searchViewProvider.initialize() | |||
} | |||
|
|||
if (localEmbeddings) { | |||
// kick-off embeddings initialization | |||
localEmbeddings.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question to reviewers - is this the right place to start this? We start symf
indexing in the block above, and the call to .start()
takes <1ms in local testing, so this should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds acceptable to me.
if (!loadedOk) { | ||
// failed to load the index, let's see if we should start indexing | ||
if (this.autoIndexingEnabled && this.modelConfig.provider === 'sourcegraph') { | ||
this.index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert in this part of the code, so I need someone to verify my concern here.
It might be the case that this.eagerlyLoad
will return true, even if there are partial embeddings present for the repo. In that case, we will need check the status and call this.indexRetry
.
I see it mentioned in the test plan that you have covered this case but I just wanted to double-check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for checking and raising this - the existing treatment for partially indexed repos is that we require user action to continue indexing (screenshot), I tested that that this PR doesn't change that logic, I planned to have a separate PR for "automatic restart" once I understand why we don't automatically re-index in the first place (what comes to mind and what I saw in code is that some of the indexing processes can kill the agent, so automatic re-start could make Cody / agent crashloop?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominiccooney (as the original author of this) - is there a specific reason why we don't automatically re-start indexing after the extension is restarted? Would you be ok with making that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this PR to restart indexing on extension startup (when no errors occurred in the current run) if we're using Sourcegraph provider and the feature-flag is enabled.
@@ -310,6 +310,11 @@ const register = async ( | |||
searchViewProvider.initialize() | |||
} | |||
|
|||
if (localEmbeddings) { | |||
// kick-off embeddings initialization | |||
localEmbeddings.start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds acceptable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Start Embedding indexing process on extension startup provided that Sourcegraph embedding API is used instead of OpenAI. Additionally, restarts embedding indexing process if a partial index is found.
Currently behind a dedicated feature flag, default off.
This doesn't remove the "Embedding consent" functionality as it is still needed for OpenAI (the current default).
Test plan