[WIP] - goose schema command (postgres only) #459

Draft: wants to merge 4 commits into base: master

mfridman (Collaborator) commented Jan 28, 2023

Closes #278

EDIT: I think I'd like to put this under goose beta schema, to denote that this command is unstable and experimental. We can promote it to a stable command, like goose schema, in the future. Hopefully experimental commands like this let us get feedback from users and fix any issues before marking them "stable".

This PR adds a goose schema command that shells out to either pg_dump or docker (latest stable postgres version). By default, this dumps with --schema-only and preserves the raw output, but there is an optional goose flag --clean that effectively does what #345 (comment) describes:

pg_dump --schema-only |\
    grep -v -e '^--' -e '^COMMENT ON' -e '^REVOKE' -e '^GRANT' -e '^SET' \
    -e 'ALTER DEFAULT PRIVILEGES' -e 'OWNER TO' |\
    cat -s
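
For illustration, here is a minimal Go sketch of what a --clean filter equivalent to the pipeline above could look like. The names cleanSchema, shouldDrop, dropPrefixes and dropSubstrings are hypothetical and not taken from this PR; the sketch only assumes the grep -v patterns and the cat -s blank-line squeezing shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// Patterns mirroring the grep -v expressions above (hypothetical names).
// Anchored patterns (^...) become prefix checks, the rest substring checks.
var dropPrefixes = []string{"--", "COMMENT ON", "REVOKE", "GRANT", "SET"}
var dropSubstrings = []string{"ALTER DEFAULT PRIVILEGES", "OWNER TO"}

// shouldDrop reports whether a line would be removed by the grep -v step.
func shouldDrop(line string) bool {
	for _, p := range dropPrefixes {
		if strings.HasPrefix(line, p) {
			return true
		}
	}
	for _, s := range dropSubstrings {
		if strings.Contains(line, s) {
			return true
		}
	}
	return false
}

// cleanSchema filters a pg_dump output line by line, then collapses
// runs of empty lines into a single empty line, like cat -s.
func cleanSchema(dump string) string {
	var out []string
	prevBlank := false
	for _, line := range strings.Split(dump, "\n") {
		if shouldDrop(line) {
			continue
		}
		blank := line == ""
		if blank && prevBlank {
			continue
		}
		out = append(out, line)
		prevBlank = blank
	}
	return strings.Join(out, "\n")
}

func main() {
	dump := "-- header\nSET x = 1;\n\n\nCREATE TABLE t (id integer);\n\n-- c\n\nALTER TABLE t OWNER TO dbuser;\n"
	fmt.Print(cleanSchema(dump))
}
```

Like the grep pipeline, this works on whole lines, so it relies on pg_dump emitting each dropped statement on a single line.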

But why?

It's often desirable to check in the database schema, which can be used to re-create the database in lieu of the migrations.

Taking this a step further: when developers iterate on database changes locally, they might apply SQL to the database manually and then forget to port those changes over to their migration files. In CI, the checked-in migrations can be applied and the schema dumped. That schema can then be compared to the committed schema within a PR. A mismatch means the migration files do not match the intended database schema, and the developer is likely working with incorrect assumptions. This should be caught as early as possible to avoid invalid assumptions about the database state.
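
As a rough sketch of that CI check, the comparison could look something like the following in Go. The function names schemaDigest and firstMismatch and the sample schemas are hypothetical, chosen only for illustration; a real check would read the committed schema file and the freshly dumped schema instead of inline strings, and would exit non-zero on a mismatch.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// schemaDigest returns the hex-encoded SHA-256 of a schema dump,
// in the same form that sha256sum prints.
func schemaDigest(schema string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(schema)))
}

// firstMismatch returns the 1-based line number at which the two
// schemas first differ, or 0 if they are identical.
func firstMismatch(committed, dumped string) int {
	a := strings.Split(committed, "\n")
	b := strings.Split(dumped, "\n")
	for i := 0; i < len(a) || i < len(b); i++ {
		if i >= len(a) || i >= len(b) || a[i] != b[i] {
			return i + 1
		}
	}
	return 0
}

func main() {
	// Hypothetical inputs: the schema checked into the repository and
	// the schema dumped after applying the migrations in CI.
	committed := "CREATE TABLE public.users (\n    id integer NOT NULL\n);\n"
	dumped := "CREATE TABLE public.users (\n    id integer NOT NULL,\n    nickname text\n);\n"

	if schemaDigest(committed) != schemaDigest(dumped) {
		fmt.Printf("schema drift: first difference at line %d\n", firstMismatch(committed, dumped))
		return
	}
	fmt.Println("schema matches migrations")
}
```

Comparing digests gives a cheap yes/no answer, and the line number gives the developer a starting point for finding the missing migration.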

Even if you don't take it this far, just having the schema in one place makes the database easier to reason about. And building this into goose gives everyone a consistent way of dumping the schema.

Example

From the root of this repository:

$ make docker-start-postgres

$ export GOOSE_MIGRATION_DIR=./examples/sql-migrations
$ go run ./cmd/goose up
$ go run ./cmd/goose status
2023/01/28 16:27:17     Applied At                  Migration
2023/01/28 16:27:17     =======================================
2023/01/28 16:27:17     Sat Jan 28 21:19:06 2023 -- 00001_create_users_table.sql
2023/01/28 16:27:17     Sat Jan 28 21:19:06 2023 -- 00002_rename_root.sql
2023/01/28 16:27:17     Sat Jan 28 21:19:06 2023 -- 00003_no_transaction.sql

Then dump the schema produced by the example migrations with pg_dump and hash the cleaned output. The sha256 is:

pg_dump --dbname=testdb --host=localhost --port=5433 --username=dbuser --schema-only |\
    grep -v -e '^--' -e '^COMMENT ON' -e '^REVOKE' -e '^GRANT' -e '^SET' \
    -e 'ALTER DEFAULT PRIVILEGES' -e 'OWNER TO' |\
    cat -s | sha256sum

6380fab48d773d69abcfa38a6c451b704b4b466e2b272b3f96778a2462f9a998

Running go run ./cmd/goose schema --clean | sha256sum produces the same sha256:

6380fab48d773d69abcfa38a6c451b704b4b466e2b272b3f96778a2462f9a998

The cleaned schema output:

SELECT pg_catalog.set_config('search_path', '', false);

CREATE TABLE public.goose_db_version (
    id integer NOT NULL,
    version_id bigint NOT NULL,
    is_applied boolean NOT NULL,
    tstamp timestamp without time zone DEFAULT now()
);

CREATE SEQUENCE public.goose_db_version_id_seq
    AS integer
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE public.goose_db_version_id_seq OWNED BY public.goose_db_version.id;

CREATE TABLE public.post (
    id integer NOT NULL,
    title text,
    body text
);

CREATE TABLE public.users (
    id integer NOT NULL,
    username text,
    name text,
    surname text
);

ALTER TABLE ONLY public.goose_db_version ALTER COLUMN id SET DEFAULT nextval('public.goose_db_version_id_seq'::regclass);

ALTER TABLE ONLY public.goose_db_version
    ADD CONSTRAINT goose_db_version_pkey PRIMARY KEY (id);

ALTER TABLE ONLY public.post
    ADD CONSTRAINT post_pkey PRIMARY KEY (id);

ALTER TABLE ONLY public.users
    ADD CONSTRAINT users_pkey PRIMARY KEY (id);

@mfridman changed the title from "[wip] - pg_dump schema (postgres only)" to "[wip] - goose schema command (postgres only)" Jan 29, 2023
@mfridman changed the title from "[wip] - goose schema command (postgres only)" to "[WIP] - goose schema command (postgres only)" Jan 29, 2023
@bobhenkel

Does this give you one giant SQL file with all the db objects, or is there a way to get one object per file? I'm assuming one big SQL file, as that's typically what people are looking for, though I could see value in a file per db object too.

mfridman (Collaborator, Author) commented Feb 5, 2023

Yep, one big SQL file. For small to medium projects this works quite well. For some projects, though, this command won't work at all, and they'll probably use pg_dump directly.

I think the goal with this command is to satisfy the 80% use case.

  1. How do we categorize "db objects", i.e., what is a db object?
  2. What benefit is there to splitting db objects per file?

Development

Successfully merging this pull request may close these issues.

feature: add functionality to dump database schema
2 participants