Bulk ingest failing #110

Open
constantinius opened this issue Apr 29, 2024 · 7 comments

@constantinius

My bulk ingestion fails with the following exception:

INFO:     172.18.0.2:39582 - "POST /collections/AMEDS_MainCam/bulk_items HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/stac_fastapi/api/middleware.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/brotli_asgi/__init__.py", line 87, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.8/site-packages/stac_fastapi/api/routes.py", line 68, in _endpoint
    await func(request_data, request=request), response_class
  File "/app/./stac_fastapi/pgstac/transactions.py", line 189, in bulk_item_insert
    await dbfunc(conn, "create_items", items_to_insert)
  File "/app/./stac_fastapi/pgstac/db.py", line 100, in dbfunc
    return await conn.fetchval(q, *p)
  File "/usr/local/lib/python3.8/site-packages/asyncpg/connection.py", line 715, in fetchval
    data = await self._execute(query, args, 1, timeout)
  File "/usr/local/lib/python3.8/site-packages/asyncpg/connection.py", line 1794, in _execute
    result, _ = await self.__execute(
  File "/usr/local/lib/python3.8/site-packages/asyncpg/connection.py", line 1892, in __execute
    result, stmt = await self._do_execute(
  File "/usr/local/lib/python3.8/site-packages/asyncpg/connection.py", line 1945, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 207, in bind_execute
asyncpg.exceptions.CheckViolationError: no partition of relation "items" found for row
DETAIL:  Partition key of the failing row contains (collection) = (null).

The collection AMEDS_MainCam does exist and registering items one by one works as well. I use ghcr.io/stac-utils/stac-fastapi-pgstac:2.4.11 and ghcr.io/stac-utils/pgstac:v0.7.10.

The packed Items are as follows:

{
    "items": {
        "item-a": {"id": "item-a", ...},
        "item-b": {"id": "item-b", ...},
        ...
    },
    "method": "insert"
}

Anything I'm doing wrong?

@gbegkas

gbegkas commented Apr 30, 2024

Hello, I had the same issue yesterday. After some trial and error, I found that each item needs a "collection" key-value pair, like this in your example:

{
    "items": {
        "item-a": {"id": "item-a", "collection": "collection-id", ...},
        "item-b": {"id": "item-b", "collection": "collection-id", ...},
        ...
    },
    "method": "insert"
}
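
As a client-side workaround, the collection id can be stamped onto every item before the payload is sent. The following is only a minimal sketch of that idea: the helper name `with_collection` is hypothetical, the items are stripped down to their `id`, and the payload shape is taken from the example above.

```python
import copy
import json


def with_collection(payload, collection_id):
    """Return a copy of a bulk payload with "collection" set on every item.

    Hypothetical helper, not part of stac-fastapi or pgstac.
    """
    patched = copy.deepcopy(payload)
    for item in patched["items"].values():
        # Keep an explicit value if the item already declares one.
        item.setdefault("collection", collection_id)
    return patched


payload = {
    "items": {
        "item-a": {"id": "item-a"},
        "item-b": {"id": "item-b"},
    },
    "method": "insert",
}

# JSON body for POST /collections/AMEDS_MainCam/bulk_items
body = json.dumps(with_collection(payload, "AMEDS_MainCam"))
```

The original payload is left untouched, so the same items could be reused for another collection.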

@constantinius
Author

Thanks @gbegkas, this actually works!

In my opinion, it should not be necessary to write the collection into each item in the packed JSON, as the bulk ingest route is already scoped to a collection. This would match single-item creation, where the collection is also not required.

@jonhealy1
Collaborator

Collection should be required in an item I think: https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#item-fields

@constantinius
Author

Hm, the way I read it, the field is required when the item is part of a collection and not allowed otherwise. But adding an Item is the process of making it part of a collection (I don't think it makes a difference whether that happens via a normal POST <collection>/items or via POST <collection>/bulk_items). So omitting the field should at least be allowed.

@jonhealy1
Collaborator

You're right, if an item is not part of a collection it is ok to not have a collection field but if you're posting items to a collection route then clearly they should have the field specified - even though it should be apparent that they belong to a collection. You could make a pr allowing items posted to a collections route to have the collection field added later maybe? I don't know what people with more stac knowledge would think?

@bitner
Collaborator

bitner commented May 2, 2024

In pgstac, a collection is required as collection is used both as a foreign key as well as being used as the first level of partitioning.
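
To illustrate why a missing collection surfaces as "no partition of relation "items" found for row", here is a toy model of list-partition routing in plain Python. It is not pgstac's schema or code; the names `route_to_partition` and `NoPartitionError` are made up for the illustration, and the only assumption is that each collection has its own partition and that no partition accepts a null key.

```python
class NoPartitionError(Exception):
    """Raised when no partition accepts a row's partition key (toy model)."""


def route_to_partition(partitions, row):
    """Append the row to the partition matching its "collection" key."""
    key = row.get("collection")  # None when the item omits the field
    if key not in partitions:
        raise NoPartitionError(
            f"no partition found for row with (collection) = ({key!r})"
        )
    partitions[key].append(row)


# One partition per existing collection.
partitions = {"AMEDS_MainCam": []}

# An item that names its collection is routed normally.
route_to_partition(partitions, {"id": "item-a", "collection": "AMEDS_MainCam"})

# An item without the field has a null key and cannot be routed.
try:
    route_to_partition(partitions, {"id": "item-b"})
except NoPartitionError as exc:
    error = str(exc)
```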

@constantinius
Author

> You're right, if an item is not part of a collection it is ok to not have a collection field but if you're posting items to a collection route then clearly they should have the field specified - even though it should be apparent that they belong to a collection. You could make a pr allowing items posted to a collections route to have the collection field added later maybe? I don't know what people with more stac knowledge would think?

I'll try my luck

> In pgstac, a collection is required as collection is used both as a foreign key as well as being used as the first level of partitioning.

Totally agree. I think this should be handled by stac-fastapi during ingestion.
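
One possible shape for handling this during ingestion is sketched below. This is not stac-fastapi's actual code; `assign_route_collection` and `CollectionMismatchError` are hypothetical names, and rejecting an item that names a different collection than the route is an assumption about the desired behaviour, not something decided in this thread.

```python
class CollectionMismatchError(ValueError):
    """An item names a different collection than the route it was posted to."""


def assign_route_collection(items, route_collection_id):
    """Fill in "collection" from the route and return the items as a list.

    `items` is the "items" mapping of a bulk payload (item id -> item dict).
    """
    prepared = []
    for item_id, item in items.items():
        declared = item.get("collection")
        if declared is not None and declared != route_collection_id:
            raise CollectionMismatchError(
                f"item {item_id!r} declares collection {declared!r}, "
                f"but was posted to {route_collection_id!r}"
            )
        # Copy the item so the caller's payload is not mutated.
        prepared.append({**item, "collection": route_collection_id})
    return prepared


items = {
    "item-a": {"id": "item-a"},
    "item-b": {"id": "item-b", "collection": "AMEDS_MainCam"},
}
items_to_insert = assign_route_collection(items, "AMEDS_MainCam")
```

With something like this in place before the database call, every row would reach pgstac with a non-null partition key.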
