"Matched" key in context - pgstac #28

Open
jonhealy1 opened this issue Feb 17, 2022 · 10 comments
Labels
documentation Improvements or additions to documentation

Comments

@jonhealy1
Collaborator

I've started using pgstac and I've noticed that there is no 'matched' key in context. Is this being deprecated? I find it handy to have. Obviously querying things is quicker if you don't have to get a total count.

@drnextgis
Contributor

I faced the same issue. It is stated here that:

PGStac will estimate the row count using PostgreSQL Explain which leverages the query planner and will do a full count only when the estimated count is less than 10000.

So in theory it should be fast. @bitner could you please elaborate on why matched is missing?
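The estimate-then-count behaviour quoted above can be sketched roughly as follows. This is an illustrative sketch only, not pgstac's actual code: `matched_count`, `estimate` and `exact` are hypothetical names, and the two callables stand in for "ask the query planner" and "run a full count". Only the 10000 threshold comes from the quoted documentation.

```python
from typing import Callable

# Threshold taken from the documentation quoted above.
ESTIMATE_THRESHOLD = 10_000


def matched_count(
    estimate: Callable[[], int],
    exact: Callable[[], int],
    threshold: int = ESTIMATE_THRESHOLD,
) -> int:
    """Return an exact count for small result sets, an estimate otherwise.

    `estimate` and `exact` are hypothetical stand-ins for a planner
    estimate (cheap) and a full count (potentially expensive).
    """
    estimated = estimate()
    if estimated < threshold:
        # Small result set: the full count is assumed to be affordable.
        return exact()
    # Large result set: skip the full count and report the estimate.
    return estimated
```

The point of the design is that the expensive count only runs when the planner expects few rows, which also means the reported number can be inaccurate whenever the estimate itself is wrong.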

@jonas-eberle

@drnextgis @jonhealy1 The context setting in pgstac is off by default. You can change the default value in the pgstac_settings table to "on", e.g.:

update pgstac.pgstac_settings set value='on' where name='context';

See also information about the pgstac settings: https://github.com/stac-utils/pgstac#pgstac-settings
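For anyone applying this setting from application code rather than psql, a minimal sketch could look like the one below. It assumes a DB-API cursor that uses the `%s` parameter style (as psycopg does). The helper name `set_pgstac_context` is hypothetical, and the only value confirmed in this thread is 'on'.

```python
from typing import Any

# Same statement as above, with the value passed as a bound parameter.
CONTEXT_SQL = "UPDATE pgstac.pgstac_settings SET value = %s WHERE name = 'context'"


def set_pgstac_context(cursor: Any, value: str = "on") -> None:
    """Set the pgstac 'context' setting through a DB-API cursor.

    Assumption: the cursor accepts the `%s` parameter style. The value
    is passed through unchanged; this thread only confirms 'on'.
    """
    cursor.execute(CONTEXT_SQL, (value,))
```

The caller would still need to commit the transaction on the connection that owns the cursor, unless autocommit is enabled.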

@drnextgis
Contributor

Thanks, it did the trick!

@jonhealy1
Collaborator Author

jonhealy1 commented Feb 25, 2022

@jonas-eberle @geospatial-jeff In the pgstac stac-fastapi code in app.py it says that the context extension is enabled:

    extensions = [
        TransactionExtension(
            client=TransactionsClient(),
            settings=settings,
            response_class=ORJSONResponse,
        ),
        QueryExtension(),
        SortExtension(),
        FieldsExtension(),
        TokenPaginationExtension(),
        ContextExtension(),
    ]

https://github.com/stac-utils/stac-fastapi/blob/master/stac_fastapi/pgstac/stac_fastapi/pgstac/app.py

Specifying the context extension here should be enough to enable it. There shouldn't be an extra step to get it to work, because that will only confuse new users. I think this is still an issue.

@drnextgis
Contributor

I do agree with @jonhealy1. Moreover, even if ContextExtension is not enabled, the "context" object is still present in search results, which is also confusing.

@jonhealy1
Collaborator Author

I do agree with @jonhealy1. Moreover, even if ContextExtension is not enabled, the "context" object is still present in search results, which is also confusing.

True

@bitner
Collaborator

bitner commented Feb 25, 2022

There are two parts to the context - returned and matched. Matched can be very expensive to calculate on large datasets with complicated queries. PGStac will try to use the PostgreSQL estimates for the count, but these estimates can be wayyyyyy off in some circumstances. I'm not in the office right now, but there are multiple ways to control context (not just on or off) that don't necessarily just fall neatly into "context enabled / disabled". As of now, the stac-fastapi-pgstac side of this has pretty much been ignored, but there are a number of knobs that can be adjusted on the pgstac side of things. When I'm back in the office next week, I can add some more documentation on the pgstac side of things and would be happy for anyone with thoughts (or better yet PRs) to be able to match expectations on the stac-fastapi-pgstac side of things.

@jonhealy1
Collaborator Author

jonhealy1 commented Feb 27, 2022

The context extension should be shut off in stac-fastapi if it's not enabled in pgstac by default. If people want to use it for complicated queries in large collections then they should be warned that it will affect performance. If they really need to have an exact match count quickly, they may want to look at a nosql backend. I don't think an estimated count that may not be close to the real count is valuable. Turning on the context extension should turn on all of the context features in pgstac. I'm not sure that having some context features enabled is supported by the spec. I think it's either all or none.

@jonhealy1
Collaborator Author

jonhealy1 commented Feb 27, 2022

Maybe I'm wrong about the spec - I don't know - but it is confusing with the way it's set up right now.

@duckontheweb
Contributor

Turning on the context extension should turn on all of the context features in pgstac. I'm not sure that having some context features enabled is supported by the spec. I think it's either all or none.

The only required field for the Context Object is "returned", so the current implementation is compliant as far as I can tell (@philvarner correct me if I'm wrong here). I think there could be some value to having the "returned" value present even without "matched" if a client wants to know how many Features are in the FeatureCollection without having to count them.

Maybe the best solution here is to document this more clearly in the PgSTAC backend so that developers know what they are getting when they deploy the API.
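To make the "returned is required, matched is optional" point concrete, here is a small sketch of how a Context Object could be assembled. `build_context` is a hypothetical helper, not code from stac-fastapi or pgstac. It only illustrates that "returned" can always be derived from the features in the response, while "matched" is included only when a count is available.

```python
from typing import Any, Dict, List, Optional


def build_context(features: List[Any], matched: Optional[int] = None) -> Dict[str, int]:
    """Build a Context Object for a search response (illustrative only).

    "returned" is always present because it is just the number of
    features in the response. "matched" is added only when the backend
    supplied a count.
    """
    context: Dict[str, int] = {"returned": len(features)}
    if matched is not None:
        context["matched"] = matched
    return context
```

A client that reads the object this way should use something like `context.get("matched")` rather than assuming the key exists.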

@gadomski gadomski added the documentation Improvements or additions to documentation label Jan 31, 2023
@gadomski gadomski transferred this issue from stac-utils/stac-fastapi May 11, 2023