Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

Open
alexkreidler opened this issue Feb 27, 2022 · 1 comment
Labels

Comments

@alexkreidler
Copy link

I'm using migra with tusker, a tool that creates a temporary PG database CREATE DATABASE; for both my schema and migrations separately, runs them from my SQL files on disk, and then diffs the output.

When I run tusker diff -r schema migrations, it generates something like this (truncated for brevity):

create schema if not exists "app";

create extension if not exists "pgcrypto" with schema "public" version '1.3';

create table "app"."datasets" (
    "id" uuid not null default generate_ulid_uuid(),
    [...]
);

[...]

CREATE OR REPLACE FUNCTION public.generate_ulid_uuid()
 RETURNS uuid
 LANGUAGE plpgsql
AS $function$
DECLARE
    [...]
END
$function$
;

Generally, this generated diff/migration is good, eg it creates schemas before tables before constraints. But the user defined functions are at the end, and my table as you can see relies on a function, so directly applying this migration without manually reordering it results in an error.

I've looked at some issues that seem related to dependency tracking of DDL statements/db objects: #8 for functions, #142 inherited tables, #7 views.

Based on #8, my impression is that migra currently relies on native Postgres functionality to track the dependencies between DB objects, something like this.

The first bullet from #189:

  • Proper dependency resolution (make a graph of all dependencies and topologically sort it)

makes me think it would be great to implement in Python, eg parsing the DDL with a library, applying some rules to the parsed objects to extract possible references to other objects, then creating a DAG and sorting. I have a few thoughts on that:

  • it could be great to make the dependency sorter a separate library/CLI tools so people with a bunch of SQL scripts could keep them in whatever order they want and then run it.
  • because there are often many valid topological sorts of a given DAG, it could be useful to have some additional rules to sort objects when dependencies are not an issue, like ordering by object type: schemas, tables, constraints, functions, like the existing logic. This "stylistic sort" could make things more consistent and human readable. I don't know how complex it would be to combine those two goals.

I believe that ensuring outputted migrations have a valid DB object creation order, so they can be immediately run would be a serious UX (#25) benefit and hopefully encourage adoption of the tool.

I'd love any feedback anyone has on the next steps towards implementing this. Thanks for this great tool!

@djrobstep
Copy link
Owner

Thanks for filing this issue and your interest in this!

As you note, migra mostly relies on a carefully chosen, mostly static ordering which usually, but not always, spits things out in the right order.

The main obstacle right now to doing a more sophisticated sort is actually not the sorting, which as you've seen is a pretty straightforward topological sort conceptually, but the computing the underlying dependency information - there is some dependency information we are already gathering and tracking (eg table->view), but some we aren't - in your example for instance, we don't track other entities referred to in default expressions.

I like the idea of additional optional sorting - a topological sort should be a stable sort, so once we have a topological sort implemented, you should be able to otherwise sort however you want then do a dependency sort to finish off. The ability to load any script and then sort the definition also sounds like a good idea.

There's also a bit of rearchitecting required in terms of generating the output, because the pipeline needs to go from:

schemas -> differences -> output

to:

schemas -> differences -> topologically sortable intermediate state -> output

If you're keen to have a crack at any particular aspect of this, let me know - adding the extra dependency analysis to schemainspect is probably the most self-contained thing and the highest priority, that's definitely the one I would recommend starting with - but open to ideas. Let me know what you think.

@maximsmol maximsmol added the bug label Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants