Package reference embeddings #1151

Open: breadchris wants to merge 2 commits into initial-discord-bot from package-reference-embeddings

Conversation

breadchris
Contributor

Add package reference embeddings to the database and generate them from the CLI. This lets us run semantic search over package READMEs.
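
As a rough illustration of what semantic search over stored embeddings means here: the migration in this PR indexes `embedding` with `vector_cosine_ops`, so results would be ordered by cosine distance (1 minus cosine similarity). The sketch below does the same ranking in memory. The function names and the toy 3-dimensional vectors are hypothetical and stand in for real 1536-dimensional embeddings; this is not the code from the PR.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(a, b); 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, rows, limit=5):
    """rows is an iterable of (content, embedding) pairs; closest match first."""
    scored = [
        (cosine_distance(query_embedding, embedding), content)
        for content, embedding in rows
    ]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


# Toy vectors standing in for rows of package.content_embedding.
rows = [
    ("install instructions", [1.0, 0.0, 0.0]),
    ("license text", [0.0, 1.0, 0.0]),
    ("usage examples", [0.9, 0.1, 0.0]),
]
top = rank_by_similarity([1.0, 0.0, 0.0], rows, limit=2)
```

In the database the same ordering would come from the vector index rather than a full scan; an `ivfflat` index returns approximate nearest neighbours, so results can differ slightly from this exact ranking.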

github-actions bot commented Mar 8, 2023

Hasura Semantic Diff

Hasura config files have changed. This comment shows which fields have changed ignoring formatting.

(root level)
+ two map entries added:
  table:
    name: content_embedding
    schema: package
  object_relationships:
  - name: reference_content
    using:
      foreign_key_constraint_on: reference_content_id


array_relationships
  + one list entry added:
    - name: reference_contents
      using:
        foreign_key_constraint_on:
          column: package_id
          table:
            name: reference_content
            schema: package


(root level)
+ three map entries added:
  table:
    name: reference_content
    schema: package
  object_relationships:
  - name: package
    using:
      foreign_key_constraint_on: package_id
  array_relationships:
  - name: content_embeddings
    using:
      foreign_key_constraint_on:
        column: reference_content_id
        table:
          name: content_embedding
          schema: package

diff --git a/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql
new file mode 100644
index 00000000..504ab7e0
--- /dev/null
+++ b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/down.sql
@@ -0,0 +1,2 @@
+DROP TABLE "package"."content_embedding";
+DROP TABLE "package"."reference_content";
diff --git a/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql
new file mode 100644
index 00000000..48c39ca7
--- /dev/null
+++ b/lunatrace/bsl/hasura/migrations/lunatrace/1677850286590_package_reference_embeddings/up.sql
@@ -0,0 +1,25 @@
+CREATE TABLE "package"."reference_content" (
+    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
+    "package_id" uuid NOT NULL REFERENCES "package"."package"("id") ON UPDATE cascade ON DELETE cascade,
+    "url" text NOT NULL,
+    "content" text NOT NULL,
+    "normalized_content" text NOT NULL,
+    "content_type" text NOT NULL,
+    "last_successful_fetch" timestamptz DEFAULT NULL,
+    PRIMARY KEY ("id"),
+    UNIQUE ("package_id", "url")
+);
+
+CREATE TABLE "package"."content_embedding" (
+    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
+    "content_hash" text NOT NULL,
+    "reference_content_id" uuid NOT NULL REFERENCES "package"."reference_content"("id") ON UPDATE cascade ON DELETE cascade,
+    "content" text NOT NULL,
+    "embedding" vector (1536) NOT NULL,
+    PRIMARY KEY ("id"),
+    UNIQUE ("content_hash")
+);
+
+CREATE INDEX ON "package"."content_embedding"
+    USING ivfflat (embedding vector_cosine_ops)
+    WITH (lists = 100);
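
The `UNIQUE ("content_hash")` constraint on `content_embedding` suggests that each chunk of text is embedded at most once. A minimal sketch of how a caller might skip chunks that already have an embedding follows. The hash algorithm (SHA-256 over UTF-8 bytes) and both function names are assumptions for illustration; the PR does not show how `content_hash` is computed.

```python
import hashlib


def content_hash(chunk: str) -> str:
    # Assumption: SHA-256 hex digest of the UTF-8 bytes. The PR does not
    # state which hash function fills the content_hash column.
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def chunks_needing_embedding(chunks, known_hashes):
    """Return (hash, chunk) pairs that are not already stored, without repeats."""
    seen = set(known_hashes)
    pending = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest in seen:
            continue  # already embedded, or a duplicate within this batch
        seen.add(digest)
        pending.append((digest, chunk))
    return pending


# One chunk is already stored; the other appears twice in the new batch.
existing = {content_hash("## Install\nnpm install foo")}
pending = chunks_needing_embedding(
    ["## Install\nnpm install foo", "## Usage\nfoo()", "## Usage\nfoo()"],
    existing,
)
```

Because the constraint covers `content_hash` alone, identical text under two different `reference_content` rows would map to a single embedding row, which is worth keeping in mind when inserting.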

@breadchris breadchris force-pushed the package-reference-embeddings branch from 2a1ee91 to de48522 Compare March 9, 2023 14:25