
hacky support for Postgres schemas w/ runtime interpolation #3370

Draft · wants to merge 1 commit into main

Conversation

@bgentry commented May 9, 2024

Hi @kyleconroy and @andrewmbenton 👋 @brandur and I have recently been looking more into making River fully support isolation at the Postgres schema level (that is, allowing different River clients to be configured to use different Postgres schemas). Our initial hope was to be able to rely on the search_path connection param, as recommended by pgx and #2635, for this purpose.

Unfortunately, one of our users quickly discovered that this approach is basically incompatible with pgbouncer. As far as I know the only ways to make this work with pgbouncer and sqlc would be:

  1. Explicitly SET search_path prior to any query, within a transaction (see the pgx sketch after this list). This is error-prone and has perf penalties in the form of added round trips.
  2. Configure pgbouncer to isolate client pools for specific schemas at the user level. This is a pain to configure and precludes mixing and matching multiple schemas within a single conn pool. Not great.
  3. Hypothetically, configure the database queries to explicitly specify the schema prefix on every query where it's required. I say hypothetically because sqlc doesn't support this today: the schema can't be parameterized, so it must be spliced directly into the SQL string.
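
For reference, option (1) looks roughly like the following with pgx (a sketch; queryInSchema and the authors query are made up for illustration):

package example

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// queryInSchema pins search_path for a single transaction. SET LOCAL is
// scoped to the transaction, so it behaves correctly even behind pgbouncer
// in transaction-pooling mode, but it costs an extra round trip every time.
func queryInSchema(ctx context.Context, pool *pgxpool.Pool, schema string) (int64, error) {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback(ctx)

	// SET cannot take a bind parameter, so schema must be validated as a
	// plain identifier before being spliced in here.
	if _, err := tx.Exec(ctx, "SET LOCAL search_path TO "+schema); err != nil {
		return 0, err
	}

	var id int64
	if err := tx.QueryRow(ctx, "SELECT id FROM authors LIMIT 1").Scan(&id); err != nil {
		return 0, err
	}
	return id, tx.Commit(ctx)
}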

I decided to spend a chunk of time last night seeing if I could hack a way to achieve option (3) in sqlc. It actually wasn't too difficult to achieve in a hacky way, though doing so did cement my dislike of Go's text/template 😆 This draft PR shows one potential path for achieving this using a new use_schema_placeholder option, which when enabled causes all queries to be passed through a runtime string replacement of the known placeholder prefix sqlc_schema_placeholder. (trailing dot included). It's definitely ugly and probably shouldn't be shipped in any form that resembles this PR, but I did want to at least raise it as a possible proof of concept.
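
Mechanically, the replacement amounts to something like this (a sketch of the shape of the approach, not the PR's actual generated code; withSchema is a hypothetical helper):

package example

import "strings"

// schemaPlaceholder is the known prefix the queries are written against.
const schemaPlaceholder = "sqlc_schema_placeholder."

// withSchema swaps the placeholder prefix for a schema chosen at runtime.
// An empty schema drops the prefix entirely, deferring to search_path.
func withSchema(query, schema string) string {
	if schema == "" {
		return strings.ReplaceAll(query, schemaPlaceholder, "")
	}
	return strings.ReplaceAll(query, schemaPlaceholder, schema+".")
}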

Here's a corresponding River commit showing the queries that I updated to make use of the placeholder prefix. This sqlc PR actually builds and generates fully functional code for using arbitrary schemas specified at runtime, as shown in that commit.

I have lots of ideas for how this could be improved or done differently from what I've done here (caching, maybe making this customizable per-query à la Ecto instead of statically wiring it into Queries, etc.), but rather than spin my wheels on them, I thought I'd use this draft PR to start a discussion around if and how this might be achieved in sqlc, given that the previously recommended approach isn't actually viable for those using pgbouncer in the most common way.

I also want to highlight that this is fairly distinct from allowing arbitrary runtime SQL substitution, because a schema prefix can be strictly validated, avoiding e.g. injection concerns.
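
For example, validation along these lines would reject anything that isn't a plain identifier (a sketch; checkSchema is a hypothetical helper, and the rules could be tightened or loosened):

package example

import (
	"fmt"
	"regexp"
)

// validSchema matches ordinary (unquoted) Postgres identifiers: a letter or
// underscore, then letters, digits, underscores, or $. One sketch of the
// strict validation described above, not an exhaustive rule set.
var validSchema = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_$]*$`)

func checkSchema(schema string) error {
	// Postgres truncates identifiers beyond 63 bytes; reject rather than truncate.
	if len(schema) > 63 || !validSchema.MatchString(schema) {
		return fmt.Errorf("invalid schema name: %q", schema)
	}
	return nil
}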

Related issues:

@kyleconroy (Collaborator)

@bgentry Thanks for kicking off the conversation. I agree that this is a hacky solution, but I think we can figure out something workable. To confirm: when writing your queries, the SQL looks like the following:

-- name: GetAuthor :one
SELECT * FROM sqlc_schema_placeholder.authors
WHERE id = $1 LIMIT 1;

Instead of this, I was thinking we could add support for dynamic identifiers. The queries would look something like this:

-- name: GetAuthor :one
SELECT * FROM "{schema}.authors"
WHERE id = $1 LIMIT 1;

This would add another parameter to the queries. The generated code would look like:

func GetAuthor(ctx context.Context, schema string, id int64) (Author, error) {
    ...
}
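
Presumably the generated body would substitute the identifier before the SQL reaches the driver, something like this purely hypothetical sketch (the Queries/Author scaffolding and pgx wiring stand in for whatever sqlc would actually emit):

package example

import (
	"context"
	"strings"

	"github.com/jackc/pgx/v5"
)

// Author and Queries stand in for sqlc's usual generated scaffolding.
type Author struct {
	ID   int64
	Name string
	Bio  string
}

type Queries struct {
	db *pgx.Conn
}

const getAuthor = `-- name: GetAuthor :one
SELECT id, name, bio FROM "{schema}".authors
WHERE id = $1 LIMIT 1
`

// GetAuthor substitutes the dynamic identifier into the SQL; id stays an
// ordinary $1 bind parameter.
func (q *Queries) GetAuthor(ctx context.Context, schema string, id int64) (Author, error) {
	query := strings.ReplaceAll(getAuthor, "{schema}", schema)
	var i Author
	err := q.db.QueryRow(ctx, query, id).Scan(&i.ID, &i.Name, &i.Bio)
	return i, err
}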

@bgentry (Author) commented May 9, 2024

@kyleconroy I think that would work perfectly well for what we need, and it would be more flexible too (able to support fully qualified my_db.my_schema.table_name identifiers). Given my limited knowledge of sqlc's internals, I'm not really sure how it could be implemented, but maybe you do? 😆

Do you have any thoughts on the query cache side of it? Obviously you wouldn't want to perform the same string substitution repeatedly hundreds of times per second. You could imagine a swappable implementation via an interface like:

type QueryCache interface {
    Substitute(query, placeholder, value string) string
}

A naive implementation could just perform the replacement every single time, whereas a more advanced one could cache the results with a tunable size or an LRU setup.
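
For instance, filling in the types, the two ends of that spectrum might look like this (a sketch; NaiveCache and MapCache are made-up names, and a production version would bound the map or use an LRU):

package example

import (
	"strings"
	"sync"
)

// NaiveCache re-runs the substitution on every call.
type NaiveCache struct{}

func (NaiveCache) Substitute(query, placeholder, value string) string {
	return strings.ReplaceAll(query, placeholder, value)
}

// MapCache memoizes results keyed by (query, placeholder, value). It grows
// without limit; a real implementation would cap its size.
type MapCache struct {
	mu    sync.Mutex
	cache map[string]string
}

func (c *MapCache) Substitute(query, placeholder, value string) string {
	key := query + "\x00" + placeholder + "\x00" + value
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache == nil {
		c.cache = make(map[string]string)
	}
	if s, ok := c.cache[key]; ok {
		return s
	}
	s := strings.ReplaceAll(query, placeholder, value)
	c.cache[key] = s
	return s
}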

@elee1766 commented May 9, 2024

Do you have any thoughts on the query cache side of it?

I'm not him, but I do. I don't think a cache here is as good as you think.

Obviously you wouldn't want to perform the same string substitution repeatedly hundreds of times per second.

I don't think this is obvious either. A string replacement is a simple O(n) copy operation that can be optimized in advance, since we know how many occurrences to replace and, in fact, exactly where they are.

On the other hand, a lookup in a cache/LRU is also an O(n) operation, since it needs to validate key equality; on top of that it requires hashing and extra time shuffling buckets, and it won't necessarily save you any string copies either.

A simple benchmark with a naive strings.ReplaceAll shows this. I bet that some simple preprocessing would make it faster and more memory-efficient than the map. The speed is the same for small queries, and the hash table is slower than replace for a large query, I'm assuming because of the hash + compound key. Making it a nested map might reduce the cost of the compound key, but that would add its own runtime cost.

code:

package subbench_test

import (
	"strings"
	"testing"
)

// genQuery builds a query-sized string with the placeholder embedded between
// filler prefix and suffix bytes.
func genQuery(replacement string, prefix, suffix int) string {
	return strings.Repeat("a", prefix) + replacement + strings.Repeat("b", suffix)
}

func BenchmarkReplaceSmallString(b *testing.B) {
	q := genQuery("{schema}", 30, 250)
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		strings.ReplaceAll(q, "{schema}", "public")
	}
}

func BenchmarkReplaceLargeString(b *testing.B) {
	q := genQuery("{schema}", 1000000, 1000000)

	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		strings.ReplaceAll(q, "{schema}", "public")
	}
}

func BenchmarkCacheSmallString(b *testing.B) {
	q := genQuery("{schema}", 30, 250)
	ans := strings.ReplaceAll(q, "{schema}", "public")
	cache := map[string]string{}
	cache[q+"public"] = ans
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		_ = cache[q+"public"]
	}
}

func BenchmarkCacheLargeString(b *testing.B) {
	q := genQuery("{schema}", 1000000, 1000000)
	ans := strings.ReplaceAll(q, "{schema}", "public")
	cache := map[string]string{}
	cache[q+"public"] = ans
	b.ResetTimer() // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		_ = cache[q+"public"]
	}
}

results:

goos: linux
goarch: amd64
pkg: benchmarks
cpu: 13th Gen Intel(R) Core(TM) i9-13900K
BenchmarkReplaceSmallString-32           9861722               194.3 ns/op           288 B/op          1 allocs/op
BenchmarkReplaceLargeString-32              3223            497422 ns/op         2008308 B/op          1 allocs/op
BenchmarkCacheSmallString-32             9496440               183.6 ns/op           320 B/op          1 allocs/op
BenchmarkCacheLargeString-32                3402            588169 ns/op         2009413 B/op          1 allocs/op
PASS

Edit: upon some more thinking, it could perhaps be possible to create an efficient cache if one were initialized per query. That would remove the large cost of hashing the query, since the query would be compiled into the code; you would only hash your substitutions. The performance would still be debatable depending on query size and substitution size, though: the query for pgx has to be a string anyway, so for large queries that allocation will probably always be your bottleneck, while for small strings substitution is quite fast.
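
To make the preprocessing idea concrete: split each query at its placeholder once, up front, so every substitution is a single copy with no hashing at all (a sketch; preparedQuery is a made-up name):

package example

import "strings"

// preparedQuery splits a query at its placeholder once, at init/generation
// time, so each substitution is a single O(n) copy with no hashing.
type preparedQuery struct {
	parts []string
}

func prepare(query, placeholder string) preparedQuery {
	return preparedQuery{parts: strings.Split(query, placeholder)}
}

// substitute reinserts value at every former placeholder position.
func (p preparedQuery) substitute(value string) string {
	return strings.Join(p.parts, value)
}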
