Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter using string partial matching #227

Open
bevingtona opened this issue Oct 8, 2020 · 7 comments
Open

Filter using string partial matching #227

bevingtona opened this issue Oct 8, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@bevingtona
Copy link

bevingtona commented Oct 8, 2020

Problem: I'd like to filter a query but I only know part of the string I am looking for...

bcdata::bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(stringr::str_detect(PROTECTED_LANDS_NAME, "SUGARBOWL.*PARK*"))

#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)): object 'PROTECTED_LANDS_NAME' not found

Potential solution:
I could download/collect only the column that I want to filter by, detect the sting that I am looking for and then use that string to filter my query.

my_var <- bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  collect() %>% 
  st_drop_geometry() %>% 
  select(PROTECTED_LANDS_NAME) %>% 
  filter(stringr::str_detect(PROTECTED_LANDS_NAME, "SUGARBOWL.*PARK")) %>% 
  pull()

bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(PROTECTED_LANDS_NAME == my_var)

Problem with this solution: I need to download the geometries of the data set to get the column that I am interested in, which is essentially just downloading the entire dataset, so there is no processing speed advantage.

My question:
Is there a way to drop the geometry before collecting the data? I suspect that this would be very fast and efficient for filtering large datasets using string detection.

@boshek
Copy link
Collaborator

boshek commented Oct 8, 2020

@webgismd do you know if the WFS can return only the attributes and leave the geometry behind?

@boshek boshek added the enhancement New feature or request label Oct 8, 2020
@ateucher
Copy link
Collaborator

ateucher commented Oct 9, 2020

I also wonder if you can partially match a string CQL?

@bevingtona
Copy link
Author

bevingtona commented Oct 9, 2020

I also wonder if you can partially match a string CQL?

@ateucher like this? solr_query=title: sugar
but no idean about how to integrate this :)
https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/search_index/queryTerms.html

@boshek
Copy link
Collaborator

boshek commented Oct 9, 2020

So this works: https://openmaps.gov.bc.ca/geo/pub/wfs/?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&outputFormat=application%2Fjson&typeNames=WHSE_TANTALIS.TA_PARK_ECORES_PA_SVW&SRSNAME=EPSG%3A3005&CQL_FILTER=PROTECTED_LANDS_NAME%20like%20%27%25BEAUMONT%25%27

and dbplyr can translate %like% :

dbplyr::translate_sql(PROTECTED_LANDS_NAME %like% "%BEAUMONT%")
#> <SQL> `PROTECTED_LANDS_NAME` like '%BEAUMONT%'

I think there is a path here.

@ateucher
Copy link
Collaborator

ateucher commented Oct 9, 2020

This also works: https://openmaps.gov.bc.ca/geo/pub/wfs/?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&outputFormat=application/json&typeNames=WHSE_TANTALIS.TA_PARK_ECORES_PA_SVW&SRSNAME=EPSG:3005&CQL_FILTER=(strMatches(PROTECTED_LANDS_NAME, 'SUGARBOWL.*') = true)

Using strMatches from here: https://docs.geoserver.org/stable/en/user/filter/function_reference.html#string-functions.

But this doesn't work:

bcdata::bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(CQL("strMatches(PROTECTED_LANDS_NAME, 'SUGARBOWL.*') = 'true'"))

I think there's an issue sending as a POST vs GET request...

@ateucher
Copy link
Collaborator

ateucher commented Oct 9, 2020

Using strMatches we could make a grepl shim if we can figure out how to issue the query via POST. It might need to be added to the url rather than as a query parameter

ateucher added a commit that referenced this issue Apr 8, 2021
@ateucher ateucher changed the title Collect() without geometry (workarround for filter by string) Filter using string partial matching Apr 21, 2021
@ateucher
Copy link
Collaborator

ateucher commented Apr 21, 2021

I changed the title of this issue since it's not actually possible to get just the attributes of a wfs layer...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants