Trigger a download using multiple predicates #50

Open
damianooldoni opened this issue Feb 15, 2019 · 19 comments

@damianooldoni

I would like to trigger a download based on a query structured as follows (just an example):

basisOfRecord in ['HUMAN_OBSERVATION', 'LITERATURE']
AND
country 
AND
year >= 1000
AND
year <= 2019
AND
hasCoordinate = TRUE

If I try something like this:

from pygbif import occurrences

test_download = occurrences.download(['basisOfRecord = OBSERVATION', 
                            'basisOfRecord = LITERATURE',
                            'basisOfRecord = PRESERVED_SPECIMEN',
                            'basisOfRecord = MATERIAL_SAMPLE',
                            'basisOfRecord = UNKNOWN',
                            'basisOfRecord = HUMAN_OBSERVATION',
                            'country = BE',
                            'year >= 1000',
                            'year <= 2019',
                            'hasCoordinate = TRUE'],
                           pred_type = 'and')

I get a valid but empty occurrence.txt file, because a single occurrence cannot have more than one basisOfRecord value. The query clearly needs two levels of predicates: an OR across the basisOfRecord values, nested inside a general AND across all query keys.
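
To make the two levels concrete, the intended request body can be sketched as plain Python dictionaries, independent of pygbif. The helpers `equals` and `any_of` are hypothetical names used only for illustration, and the upper-case keys are taken from the JSON bodies quoted further down in this thread. This is a sketch of the structure, not of how pygbif builds it.

```python
import json


def equals(key, value):
    """Single 'equals' predicate (hypothetical helper, not part of pygbif)."""
    return {"type": "equals", "key": key, "value": value}


def any_of(key, values):
    """OR together one 'equals' predicate per value (hypothetical helper)."""
    return {"type": "or", "predicates": [equals(key, v) for v in values]}


# Outer AND across query keys, inner OR across the basisOfRecord values.
predicate = {
    "type": "and",
    "predicates": [
        any_of("BASIS_OF_RECORD", ["HUMAN_OBSERVATION", "LITERATURE"]),
        equals("COUNTRY", "BE"),
        {"type": "greaterThanOrEquals", "key": "YEAR", "value": "1000"},
        {"type": "lessThanOrEquals", "key": "YEAR", "value": "2019"},
        equals("HAS_COORDINATE", "TRUE"),
    ],
}

print(json.dumps(predicate, indent=2))
```

Putting every basisOfRecord value directly into the outer AND list, as in the call above, is what makes the result set empty.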

With the rgbif R package I can do this easily. Below is an example with taxon keys and countries held in vectors, whose values are passed comma-separated:

rgbif::occ_download(
  paste0("taxonKey = ", paste(taxon_keys, collapse = ",")), 
  paste0("country = ", paste(countries, collapse = ",")),
  paste0("hasCoordinate = TRUE")
)

Unfortunately, I cannot pass multiple values to pygbif in this way. I am quite new to pygbif, so I am probably missing something. However, I didn't find any example in the documentation that tackles such a situation.
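
For illustration only, the comma-separated convention could be expanded into predicates roughly as follows. `parse` and `OPERATORS` are hypothetical names, not pygbif or rgbif functions; the sketch assumes a query string of the form `key operator value[,value...]` and leaves the key exactly as written, without mapping it to the upper-case API key.

```python
import re

# Operator symbols mapped to GBIF predicate type names.
OPERATORS = {
    "=": "equals",
    ">=": "greaterThanOrEquals",
    "<=": "lessThanOrEquals",
    ">": "greaterThan",
    "<": "lessThan",
}


def parse(expr):
    """Turn 'key op v1,v2' into one predicate, or an OR of predicates.

    Hypothetical helper: a sketch of the idea, not library behaviour.
    """
    match = re.match(r"\s*(\w+)\s*(>=|<=|=|>|<)\s*(.+?)\s*$", expr)
    if match is None:
        raise ValueError("cannot parse query: %r" % expr)
    key, op, raw = match.groups()
    values = [v.strip() for v in raw.split(",")]
    preds = [{"type": OPERATORS[op], "key": key, "value": v} for v in values]
    # A single value stays a plain predicate; several values become an OR.
    return preds[0] if len(preds) == 1 else {"type": "or", "predicates": preds}


print(parse("country = BE,NL"))
print(parse("year >= 1000"))
```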

Python version:
3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)]

pygbif version:

> print(pygbif.__version__):
0.3.0

Any help is welcome. Thanks.

@sckott
Collaborator

sckott commented Feb 15, 2019

thanks for your question @damianooldoni

@stijnvanhoey @peterdesmet you two I think did most of the download methods. Any thoughts on the above?

Here's the JSON body that's sent in that example you gave:

{
  "creator": "<hidden>",
  "notification_address": [
    "<hidden>"
  ],
  "send_notification": "true",
  "created": 2019,
  "predicate": {
    "type": "and",
    "predicates": [
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "OBSERVATION"
      },
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "LITERATURE"
      },
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "PRESERVED_SPECIMEN"
      },
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "MATERIAL_SAMPLE"
      },
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "UNKNOWN"
      },
      {
        "type": "equals",
        "key": "BASIS_OF_RECORD",
        "value": "HUMAN_OBSERVATION"
      },
      {
        "type": "equals",
        "key": "COUNTRY",
        "value": "BE"
      },
      {
        "type": "greaterThanOrEquals",
        "key": "YEAR",
        "value": "1000"
      },
      {
        "type": "lessThanOrEquals",
        "key": "YEAR",
        "value": "2019"
      },
      {
        "type": "equals",
        "key": "HAS_COORDINATE",
        "value": "TRUE"
      }
    ]
  }
}

does that look as expected?

@sckott
Collaborator

sckott commented Feb 22, 2019

any thoughts @stijnvanhoey @peterdesmet ?

@stijnvanhoey
Contributor

stijnvanhoey commented Feb 23, 2019

The current GbifDownload class provides the building blocks required to handle this case, by using the object-oriented approach instead of the occurrences.download() shortcut function:

from pygbif.occurrences.download import GbifDownload

# initiate the download class
gbif_query = GbifDownload('xxxxxxxx', 'xxxxxxxxx') # user name and email

# setup the query
gbif_query.add_predicate('COUNTRY', 'BE', predicate_type='equals')
gbif_query.add_predicate('YEAR', 1000, predicate_type='>=')
gbif_query.add_predicate('YEAR', 2019, predicate_type='<=')
gbif_query.add_predicate('hasCoordinate', True, predicate_type='equals')
# add the multiple values predicate:
gbif_query.add_iterative_predicate('basisOfRecord',
                                   ['LITERATURE', 'OBSERVATION', 'PRESERVED_SPECIMEN',
                                    'MATERIAL_SAMPLE', 'UNKNOWN', 'HUMAN_OBSERVATION'])

# post request download
gbif_query.post_download('xxxxxxxxx', 'xxxxxx') # user name and password

So, it is more a matter of missing documentation...

@stijnvanhoey
Contributor

stijnvanhoey commented Feb 23, 2019

The predicates attribute of the download object (gbif_invasive in my session) shows the predicates as set up:

> gbif_invasive.predicates

[{'key': 'COUNTRY', 'type': 'equals', 'value': 'BE'},
 {'key': 'YEAR', 'type': 'greaterThanOrEquals', 'value': 1000},
 {'key': 'YEAR', 'type': 'lessThanOrEquals', 'value': 2019},
 {'key': 'hasCoordinate', 'type': 'equals', 'value': True},
 {'predicates': [
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'HUMAN_OBSERVATION'},
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'UNKNOWN'},
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'MATERIAL_SAMPLE'},
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'PRESERVED_SPECIMEN'},
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'OBSERVATION'},
   {'key': 'basisOfRecord', 'type': 'equals', 'value': 'LITERATURE'}],
  'type': 'or'}]

These are combined using gbif_invasive.main_pred_type (default: and), and the result is stored in gbif_invasive.payload:

{'created': 2019,
 'creator': <hidden>,
 'notification_address': <hidden>,
 'send_notification': 'true',
 'predicate': {'predicates': [
   {'key': 'COUNTRY', 'type': 'equals', 'value': 'BE'},
   {'key': 'YEAR', 'type': 'greaterThanOrEquals', 'value': 1000},
   {'key': 'YEAR', 'type': 'lessThanOrEquals', 'value': 2019},
   {'key': 'hasCoordinate', 'type': 'equals', 'value': 'TRUE'},
   {'predicates': [
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'HUMAN_OBSERVATION'},
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'UNKNOWN'},
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'MATERIAL_SAMPLE'},
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'PRESERVED_SPECIMEN'},
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'OBSERVATION'},
     {'key': 'basisOfRecord', 'type': 'equals', 'value': 'LITERATURE'}],
    'type': 'or'}],
  'type': 'and'}
}

@damianooldoni can you check whether this is correct and similar to the rgbif request?

@sckott
Collaborator

sckott commented Aug 6, 2019

@damianooldoni any time to take a look at question from @stijnvanhoey ? if not, i'll take a look

@damianooldoni
Author

Yes, @sckott. Actually, it totally slipped my mind.
Indeed, the query posted by @stijnvanhoey is similar to the query sent via rgbif. Here is an example in R:

countries <- c("BE", "NL")
basis_of_record <- c("HUMAN_OBSERVATION", "LITERATURE")
year_begin <- 1990
year_end <- 1991
rgbif::occ_download(
  paste0("basisOfRecord = ", paste(basis_of_record, collapse = ",")), 
  paste0("country = ", paste(countries, collapse = ",")),
  paste0("hasCoordinate = TRUE"),
  paste0("year >= ", year_begin),
  paste0("year <= ", year_end)
)

which results in the following API query:

{
  "type": "and",
  "predicates": [
    {"type": "or", "predicates": [
        {"type": "equals", "key": "BASIS_OF_RECORD", "value": "HUMAN_OBSERVATION"},
        {"type": "equals", "key": "BASIS_OF_RECORD", "value": "LITERATURE"}
      ]},
    {"type": "or", "predicates": [
        {"type": "equals", "key": "COUNTRY", "value": "BE"},
        {"type": "equals", "key": "COUNTRY", "value": "NL"}
      ]},
    {"type": "equals", "key": "HAS_COORDINATE", "value": "TRUE"},
    {"type": "greaterThanOrEquals", "key": "YEAR", "value": "1990"},
    {"type": "lessThanOrEquals", "key": "YEAR", "value": "1991"}]
}

This has the same structure as the query posted by @stijnvanhoey: only the order of the fields within each predicate differs (type-key-value vs key-type-value), which doesn't change anything of course.
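
One way to check this kind of equivalence mechanically is to reduce both predicates to a canonical form before comparing them. The `canonical` helper below is hypothetical and only a sketch: it ignores field order inside each predicate and the order of sub-predicates inside an and/or block, but compares values exactly as given, so 1000 and "1000" still count as different.

```python
import json


def canonical(pred):
    """Canonical string for a predicate (hypothetical helper).

    Ignores field order and the order of sub-predicates.
    """
    if "predicates" in pred:
        children = sorted(canonical(p) for p in pred["predicates"])
        return json.dumps({"type": pred["type"], "predicates": children},
                          sort_keys=True)
    return json.dumps(pred, sort_keys=True)


# Same logical query, written with different field and predicate order.
from_rgbif = {
    "type": "and",
    "predicates": [
        {"type": "equals", "key": "COUNTRY", "value": "BE"},
        {"type": "equals", "key": "HAS_COORDINATE", "value": "TRUE"},
    ],
}
from_pygbif = {
    "predicates": [
        {"key": "HAS_COORDINATE", "type": "equals", "value": "TRUE"},
        {"key": "COUNTRY", "type": "equals", "value": "BE"},
    ],
    "type": "and",
}

print(canonical(from_rgbif) == canonical(from_pygbif))
```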

I will double check the solution provided by @stijnvanhoey and if it works this issue can be closed.

@stijnvanhoey
Contributor

As the result is the same, we should improve the pygbif documentation so that this use case is explained to other users as well. Or we could go further and document the object-oriented way of using pygbif more generally?

@sckott
Collaborator

sckott commented Aug 7, 2019

+1 to improving docs/adding examples

@damianooldoni
Author

I will test it again to be completely sure. Yes, the documentation should be improved as well. I can give it a try.

@damianooldoni
Author

I found that this doesn't work:

gbif_query = GbifDownload(xxxxxxx, xxxxxxxxx) # user name and email
gbif_query.add_iterative_predicate('basisOfRecord', ['LITERATURE', 'HUMAN_OBSERVATION'])
gbif_query.add_iterative_predicate('taxonKey', [1898286, 1894840])
gbif_query.add_predicate('hasCoordinate', 'TRUE', predicate_type='equals')
gbif_query.post_download(xxxxxxx, xxxxxxxxx) # user name and pwd

while this works:

gbif_query.add_iterative_predicate('BASIS_OF_RECORD', ['LITERATURE', 'HUMAN_OBSERVATION'])
gbif_query.add_iterative_predicate('TAXON_KEY', [1898286, 1894840])
gbif_query.add_predicate('HAS_COORDINATE', 'TRUE', predicate_type='equals')
gbif_query.post_download(xxxxxxx, xxxxxxxxx) # user name and pwd

This means that the shortcut function occurrences.download() accepts the usual camelCase parameter names (the same ones rgbif uses), while .add_predicate() and .add_iterative_predicate() require the "raw" upper-case versions of them.

This has to be documented as well or, even better I think, changed. Converting keys automatically (as occurrences.download() does) would let the user keep a single key style (e.g. hasCoordinate rather than HAS_COORDINATE) while writing complex queries.
@stijnvanhoey, @sckott: what do you think?
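
As a rough sketch of what such a conversion could look like, the hypothetical helper `to_api_key` below turns a camelCase name into the upper-case form and leaves names that are already upper-case untouched. pygbif itself may well use a lookup table for this; the regex approach here is only an assumption made for illustration.

```python
import re


def to_api_key(name):
    """Convert e.g. 'hasCoordinate' to 'HAS_COORDINATE' (hypothetical helper).

    Names that are already upper-case are returned unchanged.
    """
    if name.isupper():
        return name
    # Insert an underscore at each lower-case/digit -> upper-case boundary.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).upper()


for key in ["basisOfRecord", "taxonKey", "hasCoordinate", "country", "HAS_COORDINATE"]:
    print(key, "->", to_api_key(key))
```

With a helper like this applied inside .add_predicate() and .add_iterative_predicate(), both key styles would lead to the same request.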

@sckott
Collaborator

sckott commented Aug 8, 2019

converting for the user makes sense, what do you think @stijnvanhoey ?

@sckott
Collaborator

sckott commented Aug 30, 2019

@stijnvanhoey ?

@stijnvanhoey
Contributor

I'm sorry, I agree that using the Darwin Core terms makes much more sense for the user. I would refactor the input handling before updating the documentation.

@sckott
Collaborator

sckott commented Sep 13, 2019

thanks @stijnvanhoey - agree we should refactor. Does one of you have time for this? or should I put it on my to do list?

@stijnvanhoey
Contributor

I won't be able to do it in the coming weeks, so it will probably be November before I can contribute to this. Currently too busy with the remake of the pandas documentation ;-)

@sckott
Collaborator

sckott commented Sep 16, 2019

ok, thanks @stijnvanhoey - pandas docs sounds fun and impt.

I'll probably take a crack at it, but will make sure you two have a look at it

@damianooldoni
Author

Thanks @sckott. I'm just back from two weeks of holidays and I don't see time to do it either. Still, I'm available for review, so ping me if needed.

@glaroc

glaroc commented Mar 2, 2022

+1 for adding this to the docs. I had to search through the issues to find this info.

@CecSve
Contributor

CecSve commented Feb 17, 2023

duplicate of #104

CecSve added a commit that referenced this issue Feb 22, 2023
Updates to the download functions in the occurrence module to fix issues with predicates, download formats, and nested queries:
#105
#108
#92
#102
#103
#104
#50