Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres -> postgres include/exclude logic not working correctly #1556

Open
7 tasks done
pob8888 opened this issue Jan 24, 2024 · 1 comment
Open
7 tasks done

postgres -> postgres include/exclude logic not working correctly #1556

pob8888 opened this issue Jan 24, 2024 · 1 comment

Comments

@pob8888
Copy link

pob8888 commented Jan 24, 2024

  • pgloader --version

    3.6.3~devel and also 3.6.2079646 (compiled from src today)
    
  • did you test a fresh compile from the source tree?

yes

  • did you search for other similar issues?

yes - issue #1153 similar, but was not resolved/progressed

  • how can I reproduce the bug?
load database
  from pgsql://adminuser:redacted@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
  into pgsql://adminuser:redacted@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
  with drop schema
  including only table names matching ~/.*/ in schema 'schema1'
  including only table names matching ~/.*/ in schema 'schema2'
    excluding table names matching ~/_staging/ in schema 'schema1'
--  excluding table names matching ~/_staging/ in schema 'schema2'
  cast type "character varying" to "character varying" keep typemod
;

  • pgloader output you obtain
2024-01-24T20:21:37.008000Z LOG pgloader version "3.6.3~devel"
2024-01-24T20:21:37.008000Z LOG Data errors in '/tmp/pgloader/'
2024-01-24T20:21:37.008000Z LOG Parsing commands from file #P"/home/pob/my.dms"
2024-01-24T20:21:37.196002Z LOG Migrating from #<PGSQL-CONNECTION pgsql://adminuser@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007EE2E93}>
2024-01-24T20:21:37.196002Z LOG Migrating into #<PGSQL-CONNECTION pgsql://adminuser@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007EE46A3}>
2024-01-24T20:21:39.052005Z LOG report summary reset
               table name     errors       rows      bytes      total time
-------------------------  ---------  ---------  ---------  --------------
          fetch meta data          0          1                     0.336s
           Create Schemas          0          0                     0.164s
         Create SQL Types          0          0                     0.052s
            Create tables          0          2                     0.144s
           Set Table OIDs          0          1                     0.020s
-------------------------  ---------  ---------  ---------  --------------
"schema1"."films_staging"          0          0                     0.312s
-------------------------  ---------  ---------  ---------  --------------
  COPY Threads Completion          0          4                     0.316s
   Index Build Completion          0          0                     0.000s
          Reset Sequences          0          0                     0.260s
             Primary Keys          0          0                     0.000s
      Create Foreign Keys          0          0                     0.000s
          Create Triggers          0          0                     0.096s
         Install Comments          0          0                     0.000s
-------------------------  ---------  ---------  ---------  --------------
        Total import time          ✓          0                     0.672s

  • data that is being loaded, if relevant
\d schema*.*
                          Table "schema1.films_staging"
 Column  |         Type          | Collation | Nullable |         Default         
---------+-----------------------+-----------+----------+-------------------------
 title   | character varying(20) |           |          | NULL::character varying
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | NULL::character varying

  • How the data is different from what you expected, if relevant

I am expecting it to NOT load the table schema1.films_staging
I am also expecting to see schema1.films_me, schema2.films, schema2.filmes_staging being copied

Instead, it is copying only schema1.films_staging - it is as though the exclude is inverted.

Ultimately, I am trying to achieve something like this config:

load database
  from pgsql://adminuser:redacted@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
  into pgsql://adminuser:redacted@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
  with drop schema
  including only table names matching ~/.*/ in schema 'schema1'
  including only table names matching ~/.*/ in schema 'schema2'
    excluding table names matching ~/_staging/ in schema 'schema1'
    excluding table names matching ~/_staging/ in schema 'schema2'
  cast type "character varying" to "character varying" keep typemod
;

successfully copy all tables from schema1 and schema2, excluding any tables in schema1 or schema2 that match the string *_staging

Further explanation:

src database:

postgres=> \d schema*.*
                     Table "schema1.films_me"
 Column  |         Type          | Collation | Nullable | Default 
---------+-----------------------+-----------+----------+---------
 title   | character varying(20) |           |          | 
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | 

                  Table "schema1.films_staging"
 Column  |         Type          | Collation | Nullable | Default 
---------+-----------------------+-----------+----------+---------
 title   | character varying(20) |           |          | 
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | 

                      Table "schema2.films"
 Column  |         Type          | Collation | Nullable | Default 
---------+-----------------------+-----------+----------+---------
 title   | character varying(20) |           |          | 
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | 

                  Table "schema2.films_staging"
 Column  |         Type          | Collation | Nullable | Default 
---------+-----------------------+-----------+----------+---------
 title   | character varying(20) |           |          | 
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | 

postgres=> 

destination database before running pgloader:

postgres=> \d schema*.*
Did not find any relation named "schema*.*".
postgres=> 

run pgloader my.loader

pob@pob-laptop:~$ ./pgloader my.loader 
2024-01-24T20:25:26.008000Z LOG pgloader version "3.6.2079646"
2024-01-24T20:25:26.008000Z LOG Data errors in '/tmp/pgloader/'
2024-01-24T20:25:26.008000Z LOG Parsing commands from file #P"/home/pob/my.loader"
2024-01-24T20:25:26.208001Z LOG Migrating from #<PGSQL-CONNECTION pgsql://adminuser@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007E2A7A3}>
2024-01-24T20:25:26.208001Z LOG Migrating into #<PGSQL-CONNECTION pgsql://adminuser@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007E2BFB3}>
2024-01-24T20:25:27.924004Z LOG report summary reset
               table name     errors       rows      bytes      total time
-------------------------  ---------  ---------  ---------  --------------
          fetch meta data          0          1                     0.412s
           Create Schemas          0          0                     0.080s
         Create SQL Types          0          0                     0.048s
            Create tables          0          2                     0.148s
           Set Table OIDs          0          1                     0.020s
-------------------------  ---------  ---------  ---------  --------------
"schema1"."films_staging"          0          0                     0.236s
-------------------------  ---------  ---------  ---------  --------------
  COPY Threads Completion          0          4                     0.244s
   Index Build Completion          0          0                     0.000s
          Reset Sequences          0          0                     0.292s
             Primary Keys          0          0                     0.000s
      Create Foreign Keys          0          0                     0.000s
          Create Triggers          0          0                     0.040s
         Install Comments          0          0                     0.000s
-------------------------  ---------  ---------  ---------  --------------
        Total import time          ✓          0                     0.576s
pob@pob-laptop:~$ 

destination database after running pgloader

postgres=> \d schema*.*
                          Table "schema1.films_staging"
 Column  |         Type          | Collation | Nullable |         Default         
---------+-----------------------+-----------+----------+-------------------------
 title   | character varying(20) |           |          | NULL::character varying
 release | date                  |           |          | 
 awards  | character varying(20) |           |          | NULL::character varying

@mathew-maher
Copy link

Facing a similar issue. My set up is slightly different, but the symptom seems to be the same.

I'm running in a docker container, so I didn't do a fresh build locally. My Dockerfile is very basic:
FROM dimitri/pgloader:latest

I have a subset of tables I want to migrate from sqlserver (2016) -> postgres. The source and target DBs are hosted in docker containers as well.

This example only has a single table, as it reproduces the error consistently (my ms.load file can be seen in the log below). I'm setting a fetch limit to avoid hitting out of heap errors.

I run pgloader with -d and -v to see what information i could get:

# uname -a
Linux 01d1a0a03d1c 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 GNU/Linux
# pgloader --version
pgloader version "3.6.7~devel"
compiled with SBCL 2.1.1.debian

# pgloader -d -v ms.load
pgloader version 3.6.7~devel
compiled with SBCL 2.1.1.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2024-01-24T23:30:27.000000Z NOTICE Starting pgloader, log system is ready.
2024-01-24T23:30:27.010000Z INFO Starting monitor
2024-01-24T23:30:27.010000Z LOG pgloader version "3.6.7~devel"
2024-01-24T23:30:27.020000Z INFO Parsed command:
load database
    from mssql://<user>@<host>/<src_db>
    into pgsql://<user>@<host>/<dst_db>

including only table names like 'address' in schema 'application'

WITH prefetch rows = 10000
;

2024-01-24T23:30:27.090000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://<user>@<host>/<dst_db> {1007E429A3}>
2024-01-24T23:30:27.090000Z DEBUG SET client_encoding TO 'utf8'
2024-01-24T23:30:27.090000Z DEBUG SET application_name TO 'pgloader'
2024-01-24T23:30:27.100000Z LOG Migrating from #<MSSQL-CONNECTION mssql://<user>@<host>/<src_db> {1007E41EE3}>
2024-01-24T23:30:27.100000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://<user>@<host>/<dst_db> {1007E429A3}>
Max connections reached, increase value of TDS_MAX_CONN
2024-01-24T23:30:27.120000Z SQL MSSQL: sending query: -- params: dbname
--         table-type-name
--         including
--         filter-list-to-where-clause including
--         excluding
--         filter-list-to-where-clause excluding
  select c.TABLE_SCHEMA,
         c.TABLE_NAME,
         c.COLUMN_NAME,
         c.DATA_TYPE,
         CASE
         WHEN c.COLUMN_DEFAULT LIKE '((%' AND c.COLUMN_DEFAULT LIKE '%))' THEN
             CASE
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'newid()' THEN 'GENERATE_UUID'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,4,len(c.COLUMN_DEFAULT)-6)
                 ELSE SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
             END
         WHEN c.COLUMN_DEFAULT LIKE '(%' AND c.COLUMN_DEFAULT LIKE '%)' THEN
             CASE
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'newid()' THEN 'GENERATE_UUID'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
                 ELSE SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2)
             END
         ELSE c.COLUMN_DEFAULT
         END,
         c.IS_NULLABLE,
         COLUMNPROPERTY(object_id(c.TABLE_NAME), c.COLUMN_NAME, 'IsIdentity'),
         c.CHARACTER_MAXIMUM_LENGTH,
         c.NUMERIC_PRECISION,
         c.NUMERIC_PRECISION_RADIX,
         c.NUMERIC_SCALE,
         c.DATETIME_PRECISION,
         c.CHARACTER_SET_NAME,
         c.COLLATION_NAME

    from INFORMATION_SCHEMA.COLUMNS c
         join INFORMATION_SCHEMA.TABLES t
              on c.TABLE_SCHEMA = t.TABLE_SCHEMA
             and c.TABLE_NAME = t.TABLE_NAME

   where     c.TABLE_CATALOG = 'axelerator_hannover'
         and t.TABLE_TYPE = 'BASE TABLE'
         and ((c.table_schema = 'application' and c.table_name LIKE 'address'))
         

order by c.table_schema, c.table_name, c.ordinal_position;
2024-01-24T23:30:27.150000Z SQL MSSQL: sending query: -- params: including
--         filter-list-to-where-clause including
--         excluding
--         filter-list-to-where-clause excluding
    select schema_name(schema_id) as SchemaName,
           o.name as TableName,
           REPLACE(i.name, '.', '_') as IndexName,
           co.[name] as ColumnName,
           i.is_unique,
           i.is_primary_key,
           i.filter_definition

    from sys.indexes i
         join sys.objects o on i.object_id = o.object_id
         join sys.index_columns ic on ic.object_id = i.object_id
             and ic.index_id = i.index_id
         join sys.columns co on co.object_id = i.object_id
             and co.column_id = ic.column_id

   where schema_name(schema_id) not in ('dto', 'sys')
         and ((schema_name(schema_id) = 'application' and o.name LIKE 'address'))
         

order by SchemaName,
         o.[name],
         i.[name],
         ic.is_included_column,
         ic.key_ordinal;
2024-01-24T23:30:27.170000Z SQL MSSQL: sending query: -- params: dbname
--         including
--         filter-list-to-where-clause including
--         excluding
--         filter-list-to-where-clause excluding
   SELECT
           REPLACE(KCU1.CONSTRAINT_NAME, '.', '_') AS 'CONSTRAINT_NAME'
         , KCU1.TABLE_SCHEMA AS 'TABLE_SCHEMA'
         , KCU1.TABLE_NAME AS 'TABLE_NAME'
         , KCU1.COLUMN_NAME AS 'COLUMN_NAME'
         , KCU2.TABLE_SCHEMA AS 'UNIQUE_TABLE_SCHEMA'
         , KCU2.TABLE_NAME AS 'UNIQUE_TABLE_NAME'
         , KCU2.COLUMN_NAME AS 'UNIQUE_COLUMN_NAME'
         , RC.UPDATE_RULE AS 'UPDATE_RULE'
         , RC.DELETE_RULE AS 'DELETE_RULE'

    FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
         JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
              ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
                 AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
                 AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
         JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
              ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
                 AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
                 AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME

   WHERE KCU1.ORDINAL_POSITION = KCU2.ORDINAL_POSITION
         AND KCU1.TABLE_CATALOG = 'axelerator_hannover'
         AND KCU1.CONSTRAINT_CATALOG = 'axelerator_hannover'
         AND KCU1.CONSTRAINT_SCHEMA NOT IN ('dto', 'sys')
         AND KCU1.TABLE_SCHEMA NOT IN ('dto', 'sys')
         AND KCU2.TABLE_SCHEMA NOT IN ('dto', 'sys')

         and ((kcu1.table_schema = 'application' and kcu1.table_name LIKE 'address'))
         

ORDER BY KCU1.CONSTRAINT_NAME, KCU1.ORDINAL_POSITION;
2024-01-24T23:30:27.180000Z ERROR MSSQL ERROR: %dbsqlexec fail
2024-01-24T23:30:27.180000Z LOG You might need to review the FreeTDS protocol version in your freetds.conf file, see http://www.freetds.org/userguide/choosingtdsprotocol.htm
2024-01-24T23:30:27.180000Z LOG report summary reset
       table name     errors       read   imported      bytes      total time       read      write
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
  fetch meta data          0          0          0                     0.000s    
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
-----------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
2024-01-24T23:30:27.190000Z INFO Stopping monitor
#

I'm not an expert on sqlserver, but I believe there is an issue in the last query using kcu1 rather than KCU1 as the table alias. Running a simple query (just via SQL Server Management Studio):

select *
from INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ORDER BY KCU1.CONSTRAINT_NAME
<returns rows>
select *
from INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
WHERE kcu1.table_name LIKE 'address'

Msg 4104, Level 16, State 1, Line 41
The multi-part identifier "kcu1.table_name" could not be bound.

Removing the including only table names like 'address' in schema 'application' from the ms.load file removes this issue and my migration runs successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants