Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that ingest is using sqlite #289

Open
bartwalczak opened this issue Jan 6, 2023 · 6 comments
Open

Ensure that ingest is using sqlite #289

bartwalczak opened this issue Jan 6, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@bartwalczak
Copy link

Parsing:

con_params = urlparse.urlparse("postgresql://terracotta:pwd@db-data/terracotta")
print(con_params)

Yields:

ParseResult(scheme='postgresql', netloc='terracotta:pwd@db-data', path='/terracotta', params='', query='', fragment='')

No username, no database, etc.

Therefore the terracotta driver initialization code fails with postgres scheme.

@bartwalczak
Copy link
Author

I'm probably wrong in the above, but I still keep getting the 'invalid database path' exception with a seemingly valid postgres string (and the database is available there, I can use psql to connect to it from that container):

terracotta ingest /data/{emission}_{sector}_{year}.tif -o "postgresql://terracotta:pwd@db-data/terracotta"
 [-] Uncaught exception!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/terracotta/scripts/cli.py", line 52, in entrypoint
    cli(obj={})
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/terracotta/scripts/ingest.py", line 70, in ingest
    driver = get_driver(output_file)
  File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/__init__.py", line 106, in get_driver
    meta_store=DriverClass(url_or_path),
  File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/postgresql_meta_store.py", line 54, in __init__
    raise ValueError('invalid database path')
ValueError: invalid database path

@dionhaefner
Copy link
Collaborator

The relevant code path is this:

if '/' in self.url.database.strip('/'):
    raise ValueError('invalid database path')

I.e., you are expected to only provide a database name with no additional path components (we really need to improve this error message).

So this should work:

terracotta ingest /data/{emission}_{sector}_{year}.tif -o "postgresql://terracotta:pwd@db-data"

@bartwalczak
Copy link
Author

bartwalczak commented Jan 6, 2023

But then there is no database name in the url?

And shouldn't the slash be trimmed anyway by this database=con_params.path[1:], # remove leading '/' from urlparse:

        url = URL.create(
            drivername=f'{cls.SQL_DIALECT}+{cls.SQL_DRIVER}',
            username=con_params.username or settings.SQL_USER,
            password=con_params.password or settings.SQL_PASSWORD,
            host=con_params.hostname,
            port=con_params.port,
            database=con_params.path[1:],  # remove leading '/' from urlparse
            query=dict(urlparse.parse_qsl(con_params.query))
        )

@bartwalczak
Copy link
Author

Yes, database name must be present:

#0 1.360 Traceback (most recent call last):
#0 1.360   File "/scripts/create.py", line 18, in <module>
#0 1.360     driver = terracotta.get_driver(db_name(), "postgresql")
#0 1.360   File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/__init__.py", line 106, in get_driver
#0 1.360     meta_store=DriverClass(url_or_path),
#0 1.360   File "/usr/local/lib/python3.10/site-packages/terracotta/drivers/postgresql_meta_store.py", line 52, in __init__
#0 1.360     raise ValueError('database must be specified in PostgreSQL path')
#0 1.360 ValueError: database must be specified in PostgreSQL path

@dionhaefner
Copy link
Collaborator

Okay, I get it now. Looks like we both got confused.

The thing is that terracotta ingest only supports SQLite databases and therefore expects a path on your local system, not a database URL. So the URL you pass is interpreted as a file path and gets normalized to postgresql:/terracotta:pwd@db-data.com/terracotta, which then triggers this exception.

The solution is to use the Python API for ingestion (see our docs).

This restriction should clearly be documented better, apologies.

@bartwalczak
Copy link
Author

Thanks. I am going to close this, but I need to log two other issues (I will create them separately with more details):

  • postgres driver ignores supplied database name
  • postgres driver.create fails when database exists

Thanks a lot for your prompt responses and all your work.

@dionhaefner dionhaefner changed the title urllib.parse does not properly parse postgresql scheme Ensure that ingest is using sqlite Jan 6, 2023
@dionhaefner dionhaefner reopened this Jan 6, 2023
@dionhaefner dionhaefner added the enhancement New feature or request label Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants