Query string
- Introduction
- Global full text search
- Field-specific keyword search
- Spatial search
- Combining operators
- Further reading
This document explains in more detail the nature and usage of the query string, the element of the query object that defines the query terms. This is valid for both the search and download API methods.
A query string (also called search query) is just a string that contains at most 2000 Unicode characters. There are several ways in which you can refine a search to find exactly what you are looking for.
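As a minimal sketch of the above, the following Python snippet wraps a query string in a query object and enforces the 2000-character limit. The function name `make_query_object` is hypothetical. The object is assumed to be serialized as JSON, as in the examples in this document, and any other elements of the query object are left out.

```python
import json

MAX_QUERY_LENGTH = 2000  # upper limit on query string length, as stated above


def make_query_object(q: str) -> str:
    """Return a JSON query object for the query string q (hypothetical helper)."""
    # len() counts Unicode code points, which is taken here to mean "characters".
    if len(q) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"query string has {len(q)} characters; the limit is {MAX_QUERY_LENGTH}"
        )
    # json.dumps escapes any double quotes inside q as \".
    return json.dumps({"q": q})


print(make_query_object("noturus placidus"))  # {"q": "noturus placidus"}
```

Letting `json.dumps` produce the object means that any double quotes inside the query string are escaped automatically.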
Global full text search is the simplest search option. It provides a basic keyword search that looks for matching text anywhere in a record. The following search for "noturus placidus" is an example: it retrieves records that contain both terms, "Noturus" and "placidus", in any field:
{"q": "noturus placidus"}
Search terms are case insensitive (so "noturus placidus" will retrieve the same results as "Noturus placidus" or "NOTURUS PLACIDUS"). As a slightly more complex example, to search for all records that contain "mvz" (the abbreviation for the Museum of Vertebrate Zoology), "gymnogyps" (the genus of the rare California condor), and "california" anywhere in the record, you could use this query object:
{"q": "mvz gymnogyps california"}
Looking for quoted content (like an exact set of terms) or punctuated values (like a URN) is a little tricky. You have to enclose the string you are looking for in escaped quotes (\" at the beginning and end of the term). For example, the following search object looks for any records that contain the exact string urn:occurrence:Arctos:CUMV:Amph:14908:2243803:
{"q":"\"urn:occurrence:Arctos:CUMV:Amph:14908:2243803\""}
This one, in turn, looks for records that contain the exact phrase "postcranial skeleton":
{"q":"\"postcranial skeleton\""}
You can also limit keyword searches to match only specific values of Darwin Core terms. To do so, provide the name of the Darwin Core term in lower case immediately before the search text, separated from it by a colon (":"). For example, the earlier global query for Noturus placidus retrieves records with these terms in the scientific name field, but also in the comment fields or any other field. To get only those records with genus Noturus and specific epithet placidus, we could use this query:
{"q": "genus:noturus specificepithet:placidus"}
Or, if we already know the globally unique identifier for an occurrence record (iptrecordid), we could use this query:
{"q": "iptrecordid:7108667e-1483-4d04-b204-6a44a73a5219"}
The following Darwin Core terms are indexed and available for searching:
- institutioncode
- collectioncode
- catalognumber
- dctype (dcterms:type)
- license (dcterms:license)
- iptlicense (eml:intellectualRights)
- haslicense (dcterms:license or eml:intellectualRights has a license designated) {'0','1'}
- basisofrecord {PreservedSpecimen, FossilSpecimen, MaterialSample, Occurrence, MachineObservation, HumanObservation}
- isfossil (dwc:basisOfRecord is FossilSpecimen or collection is a paleo collection) {'0','1'}
- hasmedia (has dwc:associatedMedia) {'0','1'}
- iptrecordid (same as dwc:occurrenceID)
- recordedby
- recordnumber
- fieldnumber
- establishmentmeans
- wascaptive (dwc:establishmentMeans or occurrenceRemarks suggests it was captive) {'0','1'}
- wasinvasive (was the organism recorded to be invasive where and when it occurred) {'0','1'}
- sex (standardized sex from original sex field or extracted from elsewhere in the record)
- lifestage (lifeStage from original sex field or extracted from elsewhere in the record)
- preparations
- hastissue (has dwc:preparations that suggests tissue is available) {'0','1'}
- reproductivecondition
- eventdate
- year
- month
- day
- startdayofyear
- enddayofyear
- continent
- country
- stateprovince
- county
- municipality
- island
- islandgroup
- waterbody
- locality
- geodeticdatum
- georeferencedby
- georeferenceverificationstatus
- location (a Google GeoField of the dwc:decimalLatitude, dwc:decimalLongitude)
- mappable (has valid dwc:decimalLatitude, dwc:decimalLongitude) {'0','1'}
- bed
- formation
- group
- member
- typestatus
- hastypestatus (dwc:typeStatus is populated) {'0','1'}
- kingdom
- phylum
- class
- order
- family
- genus
- specificepithet
- infraspecificepithet
- scientificname
- vernacularname
- lengthinmm (length measurement extracted from the record) {number}
- massing (mass measurement extracted from the record) {number}
- hasmass (was a value for mass extracted?) {'0','1'}
- haslength (was a value for length extracted?) {'0','1'}
- haslifestage (does the record have life stage) {'0','1'}
- hassex (does the record have sex) {'0','1'}
- gbifdatasetid (GBIF identifier for the data set)
- gbifpublisherid (GBIF identifier for the data publishing organization)
- lastindexed (date the record was most recently indexed into VertNet) {'YYYY-MM-DD'}
- networks {MaNIS, ORNIS, HerpNET, FishNet, VertNet, Arctos, Paleo}
- migrator (the version of the migrator used to process the data set) {'YYYY'-'MM'-'DD'}
- orgcountry (the country where the organization is located)
- orgstateprovince (the first-level administrative unit where the organization is located)
- rank (a higher number means the record is more complete with respect to georeferences, scientific names, and event dates) (see rec_rank() in https://github.com/VertNet/post-harvest-processor/blob/master/lib/vn_utils.py) {1-12}
- vntype {specimen, observation}
- hashid (a value to distribute records in 10k bins) {0-9998}
- verbatim_record (the whole record)
NOTE: Because year is a number field, it can be searched using less than/greater than comparison operators ("<", "<=", ">", ">=") in addition to the colon (which is equivalent to "=").
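The field-specific syntax described above can be sketched with a small Python helper. The name `field_term` is hypothetical, and the helper does not check whether a term is actually indexed or whether it is a number field; the comparison operators are only documented here for year.

```python
COMPARISON_OPERATORS = (":", "<", "<=", ">", ">=")


def field_term(name: str, value, op: str = ":") -> str:
    """Build one field-specific term such as genus:noturus or year>=1900."""
    if op not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    # Term names are written in lower case, immediately before the operator.
    return f"{name.lower()}{op}{value}"


terms = [
    field_term("genus", "noturus"),
    field_term("specificepithet", "placidus"),
    field_term("year", 1950, ">="),
]
# Terms are separated by a single space.
q = " ".join(terms)
print(q)  # genus:noturus specificepithet:placidus year>=1950
```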
The API allows searching for records within a specified distance (in meters) of a given point, represented by a pair of coordinates. This is done with the distance operator, a function that returns the distance in meters between two points passed as arguments. One of the points should be the location field of the record and the other the point to use as the center. The query then only needs to state that the distance between the two must be less than a certain value.
Example: search for all records within 2 kilometers of the point 33.529, -105.694:
{"q":"distance(location,geopoint(33.529,-105.694))<2000"}
This will first build the geopoint spatial feature from the given coordinates, then calculate the distance between that geopoint and the location field of the records and return only those that match distance<2000.
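The spatial query above can be generated from a center point and a radius. This is a minimal sketch: the function name `within_distance` is hypothetical, and the range checks on the coordinates are an added safeguard, not something the API is documented to require.

```python
def within_distance(lat: float, lon: float, meters: float) -> str:
    """Build a query string for records within `meters` of the point (lat, lon)."""
    # Sanity checks (an assumption of this sketch, not a documented API rule).
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if meters <= 0:
        raise ValueError(f"distance must be positive: {meters}")
    # Compare the distance between the record's location field and the
    # center point against the radius.
    return f"distance(location,geopoint({lat},{lon}))<{meters}"


print(within_distance(33.529, -105.694, 2000))
# distance(location,geopoint(33.529,-105.694))<2000
```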
Query string terms can be combined by using the Boolean operators AND, OR, and NOT. If used, they must be written in upper case. NOT must always appear before the value it modifies, while AND and OR should be used between values. If multiple search keywords are provided but no Boolean operators are specified, AND is used by default.
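These placement rules can be sketched as three small Python helpers. The names `all_of`, `any_of`, and `negate` are hypothetical. The helpers only join terms with upper-case operators and add no parentheses, so grouping in mixed expressions is left to the caller.

```python
def all_of(*terms: str) -> str:
    """Join terms with AND, placed between the values."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms with OR, placed between the values."""
    return " OR ".join(terms)


def negate(term: str) -> str:
    """Put NOT before the value it modifies."""
    return f"NOT {term}"


print(all_of("mvz", "gymnogyps", "california"))  # mvz AND gymnogyps AND california
print(all_of("noturus", negate("placidus")))     # noturus AND NOT placidus
print(any_of("year:1990", "year:1991"))          # year:1990 OR year:1991
```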
Here are some examples:
Search for records with all three terms "mvz", "gymnogyps" and "california":
{"q": "mvz AND gymnogyps AND california"}
Search for records with the term "Noturus" but not "placidus":
{"q": "noturus AND NOT placidus"}
Search for records from years 1990 or 1991:
{"q": "year:1990 OR year:1991"}
Search for georeferenced, 20th-21st century records of the black-footed ferret from either Colorado or Kansas (note the use of parentheses to group together the two possible values for the "stateprovince" field):
{"q":"genus:Mustela specificepithet:nigripes stateprovince:(colorado OR kansas) year>=1900 mappable:1"}
If you would like to learn more about query strings, you will want to read the official documentation from Google.