Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

alexkreidler · 2022-02-27T20:13:26Z

I'm using migra with tusker, a tool that creates a temporary PG database CREATE DATABASE; for both my schema and migrations separately, runs them from my SQL files on disk, and then diffs the output.

When I run tusker diff -r schema migrations, it generates something like this (truncated for brevity):

create schema if not exists "app";

create extension if not exists "pgcrypto" with schema "public" version '1.3';

create table "app"."datasets" (
    "id" uuid not null default generate_ulid_uuid(),
    [...]
);

[...]

CREATE OR REPLACE FUNCTION public.generate_ulid_uuid()
 RETURNS uuid
 LANGUAGE plpgsql
AS $function$
DECLARE
    [...]
END
$function$
;

Generally, this generated diff/migration is good, eg it creates schemas before tables before constraints. But the user defined functions are at the end, and my table as you can see relies on a function, so directly applying this migration without manually reordering it results in an error.

I've looked at some issues that seem related to dependency tracking of DDL statements/db objects: #8 for functions, #142 inherited tables, #7 views.

Based on #8, my impression is that migra currently relies on native Postgres functionality to track the dependencies between DB objects, something like this.

The first bullet from #189:

Proper dependency resolution (make a graph of all dependencies and topologically sort it)

makes me think it would be great to implement in Python, eg parsing the DDL with a library, applying some rules to the parsed objects to extract possible references to other objects, then creating a DAG and sorting. I have a few thoughts on that:

it could be great to make the dependency sorter a separate library/CLI tools so people with a bunch of SQL scripts could keep them in whatever order they want and then run it.
because there are often many valid topological sorts of a given DAG, it could be useful to have some additional rules to sort objects when dependencies are not an issue, like ordering by object type: schemas, tables, constraints, functions, like the existing logic. This "stylistic sort" could make things more consistent and human readable. I don't know how complex it would be to combine those two goals.

I believe that ensuring outputted migrations have a valid DB object creation order, so they can be immediately run would be a serious UX (#25) benefit and hopefully encourage adoption of the tool.

I'd love any feedback anyone has on the next steps towards implementing this. Thanks for this great tool!

The text was updated successfully, but these errors were encountered:

djrobstep · 2022-03-05T09:54:41Z

Thanks for filing this issue and your interest in this!

As you note, migra mostly relies on a carefully chosen, mostly static ordering which usually, but not always, spits things out in the right order.

The main obstacle right now to doing a more sophisticated sort is actually not the sorting, which as you've seen is a pretty straightforward topological sort conceptually, but the computing the underlying dependency information - there is some dependency information we are already gathering and tracking (eg table->view), but some we aren't - in your example for instance, we don't track other entities referred to in default expressions.

I like the idea of additional optional sorting - a topological sort should be a stable sort, so once we have a topological sort implemented, you should be able to otherwise sort however you want then do a dependency sort to finish off. The ability to load any script and then sort the definition also sounds like a good idea.

There's also a bit of rearchitecting required in terms of generating the output, because the pipeline needs to go from:

schemas -> differences -> output

to:

schemas -> differences -> topologically sortable intermediate state -> output

If you're keen to have a crack at any particular aspect of this, let me know - adding the extra dependency analysis to schemainspect is probably the most self-contained thing and the highest priority, that's definitely the one I would recommend starting with - but open to ideas. Let me know what you think.

maximsmol added the bug label Apr 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

alexkreidler commented Feb 27, 2022

djrobstep commented Mar 5, 2022

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #196

Comments

alexkreidler commented Feb 27, 2022

djrobstep commented Mar 5, 2022