Upgrade of pgvector to 0.7.0 #7726

Merged
merged 6 commits into from May 20, 2024
Bodobolero
Contributor

@Bodobolero Bodobolero commented May 13, 2024

Upgrade pgvector to 0.7.0.

This PR is based on Heikki's PR #6753 and simply uses pgvector 0.7.0 instead of 0.6.0.

I have now done all planned manual tests.

The pull request is ready to be reviewed and merged, and can be deployed to production together with or after swap enablement.

See (neondatabase/autoscaling#800)

Fixes #6516
Fixes #7780

Documentation input for usage recommendations

maintenance_work_mem

In Neon

maintenance_work_mem is very small by default; it depends on the RAM configured for your compute and can be as low as 64 MB.
To optimize pgvector index build time, you may need to increase it to match your working set size (the size of the tuples used for vector index creation).
You can do so for the current session with

SET maintenance_work_mem='10 GB';

The target value you choose should fit into the memory of your compute size and not exceed 50-60% of available RAM.
The value above has been successfully used on a 7CU endpoint.
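
As a rough sketch of how this setting fits into an index build, the session below checks the current value, raises it, builds an HNSW index, and then restores the default. The table `items` and column `embedding` are hypothetical names used only for illustration, and `4GB` is an arbitrary example value that should be adjusted to your compute size.

```sql
-- Inspect the current value (may be as low as 64 MB on small computes).
SHOW maintenance_work_mem;

-- Raise the limit for this session only; pick a value that stays
-- within roughly 50-60% of the RAM available to the compute.
SET maintenance_work_mem = '4GB';

-- Hypothetical table and column; replace with your own.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Return to the configured default once the build is done.
RESET maintenance_work_mem;
```

Because SET only affects the current session, other connections keep the default value.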

max_parallel_maintenance_workers

max_parallel_maintenance_workers is also small by default (2). For efficient parallel pgvector index creation, increase it with

SET max_parallel_maintenance_workers = 7;

to make use of all the CPUs available, assuming you have configured your endpoint to use 7CU.
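
A minimal sketch of a parallel index build under these settings might look like the following. The table `items` and column `embedding` are hypothetical. The check of `max_parallel_workers` is included because PostgreSQL also caps maintenance workers by that setting, so a low value there could limit the effective parallelism.

```sql
-- Inspect the current limits before changing anything.
SHOW max_parallel_maintenance_workers;
SHOW max_parallel_workers;

-- Allow up to 7 parallel workers for maintenance commands in this session
-- (assumes a 7CU endpoint, as in the text above).
SET max_parallel_maintenance_workers = 7;

-- Hypothetical table and column; replace with your own.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Restore the default for the rest of the session.
RESET max_parallel_maintenance_workers;
```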

ID input for changelog

The pgvector extension in Neon has been upgraded from version 0.5.1 to version 0.7.0.
Please see https://github.com/pgvector/pgvector/ for documentation of the new capabilities in pgvector version 0.7.0.
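
To check which pgvector version a given database has installed, and to move it to the newer version, something along these lines should work. This is a sketch using standard PostgreSQL catalog views; the extension is registered under the name `vector`.

```sql
-- Version currently installed in this database.
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- Version the server would install or update to by default.
SELECT default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';

-- Update the installed extension to the default version.
ALTER EXTENSION vector UPDATE;
```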

If you have existing databases with pgvector 0.5.1 already installed, there is a slight difference in behavior in the following corner cases, even if you don't run ALTER EXTENSION UPDATE:

L2 distance from NULL::vector

For the following script, which compares NULL::vector to non-null vectors, the resulting output changes:

SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING hnsw (val vector_l2_ops);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);

The output is now

   val   
---------
 [1,1,1]
 [1,2,4]
 [1,2,3]
 [0,0,0]
(4 rows)

For the following script

SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);

the output is now

   val   
---------
 [0,0,0]
 [1,2,3]
 [1,1,1]
 [1,2,4]
(4 rows)
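
One way to see why the row order in these two cases is not something to rely on is to look at the sort key itself. The sketch below runs against the table `t` from the scripts above and assumes that `<->` returns NULL when either operand is NULL, which is the usual behavior for strict SQL operators. If that holds, every row has the same NULL distance and the ORDER BY has no meaningful key to sort on.

```sql
-- Show the distance that the ORDER BY clause would sort by.
SELECT val, val <-> NULL::vector AS distance FROM t;

-- For a well-defined ordering, compare against a non-null query vector
-- ('[1,1,1]' is an arbitrary example value).
SELECT * FROM t
WHERE val IS NOT NULL
ORDER BY val <-> '[1,1,1]'::vector;
```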

Changed error messages

If you provide invalid literals for the vector data type, you may get improved or changed error messages, for example:

neondb=> SELECT '[4e38,1]'::vector;
ERROR:  "4e38" is out of range for type vector
LINE 1: SELECT '[4e38,1]'::vector;
               ^

@Bodobolero Bodobolero requested review from a team as code owners May 13, 2024 10:13
@Bodobolero Bodobolero marked this pull request as draft May 13, 2024 10:13

github-actions bot commented May 13, 2024

3078 tests run: 2951 passed, 0 failed, 127 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_timeline_size_quota_on_startup: release

Code coverage* (full report)

  • functions: 31.3% (6384 of 20401 functions)
  • lines: 47.7% (48646 of 101897 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5cab1c1 at 2024-05-18T18:09:39.038Z

@Bodobolero Bodobolero changed the title from "Prepare upgrade of pgvector to 0.7.0" to "Upgrade of pgvector to 0.7.0" May 17, 2024
@Bodobolero Bodobolero requested a review from hlinnaka May 17, 2024 09:11
@Bodobolero Bodobolero marked this pull request as ready for review May 17, 2024 09:11
@Bodobolero Bodobolero added /release-notes Release notes content a/documentation Area: related to documentation labels May 17, 2024
@Bodobolero Bodobolero merged commit a7b84cc into main May 20, 2024
55 checks passed
@Bodobolero Bodobolero deleted the bodobolero/pgvector-v0.7.0 branch May 20, 2024 10:07
Labels
a/documentation Area: related to documentation /release-notes Release notes content
Development

Successfully merging this pull request may close these issues.

pgvector 0.7.0 integration and test Add pgvector 0.7.0
2 participants