Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with Socrata's $query syntax #154

Open
danielOKeefe opened this issue Apr 20, 2018 · 11 comments
Open

Issue with Socrata's $query syntax #154

danielOKeefe opened this issue Apr 20, 2018 · 11 comments
Labels

Comments

@danielOKeefe
Copy link

Some valid Socrata urls do not work with RSocrata, specifically:

'https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no',

Steps to replicate:

read.socrata('https://data.cityofchicago.org/resource/ktn3-fjzw.csv?$query=select log_no',
app_token = '${YOUR_APP_TOKEN}')

danielOKeefe pushed a commit to danielOKeefe/RSocrata that referenced this issue Apr 20, 2018
@benjaminrobinson
Copy link

benjaminrobinson commented May 18, 2018

I've had some success creating my own personal functions for interacting with the Socrata API using the httr and dplyr packages...Super helpful and quick way to access the data. Here's an example:

Contains this function to do multiple gsubs at once:

mgsub <- function (pattern, replacement, x, ...)  {
    if (length(pattern) != length(replacement)) {
        stop("pattern and replacement do not have the same length.")
    }
    result <- x
    for (i in 1:length(pattern)) {
        result <- gsub(pattern[i], replacement[i], result, ...)
    }
    result
}

And this is an example of a function to access data from a specific dataset. Not a generalizable function but I could see if you knew the 4x4 how it could be. Uses httr and dplyr mainly:

getSLBasicInfo471 <- function (limit = 999999999, select = "", where = "", order = "", 
    group = "", having = "", offset = "", q = "", query = "", 
    bom = "", output = "df", nameType = "")  {
    webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?")
    if (limit != 999999999) {
        webstring <- paste0(webstring, "&$limit=", limit)
    }
    else {
        webstring <- paste0(webstring, "&$limit=", 999999999)
    }
    if (any(select != "")) {
        webstring <- paste0(webstring, "&$select=", select)
    }
    if (where != "") {
        webstring <- paste0(webstring, "&$where=", where)
    }
    if (order != "") {
        webstring <- paste0(webstring, "&$order=", order)
    }
    if (group != "") {
        webstring <- paste0(webstring, "&$group=", group)
    }
    if (having != "") {
        webstring <- paste0(webstring, "&$having=", having)
    }
    if (offset != "") {
        webstring <- paste0(webstring, "&$offset=", offset)
    }
    if (q != "") {
        webstring <- paste0(webstring, "&$q=", q)
    }
    if (query != "") {
        webstring <- paste0("https://opendata.usac.org/resource/7nws-3i6r.csv?$query=", 
            query)
    }
    if (bom != "") {
        webstring <- paste0(webstring, "&$$bom=", bom)
    }
    webstring <- mgsub(c(" ", "'"), c("%20", "%27"), webstring)
    if (output == "df") {
        usac <- suppressMessages(webstring %>% GET() %>% content %>% 
            as.data.frame)
        }
    return(usac)
    if (output == "url") {
        return(webstring)
    }
}

Is this helpful?

@sunshine52
Copy link

A clue? I am getting a version of this error, when using an otherwise valid $query in RSocrata. It seems to add an extraneous $order=:id" clause at the end of the statement. I wonder if this causing the errors with $query

Steps to duplicate:

Here is query that runs in my browser window but fails in RSocrata:
https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC

q <- "https://data.cityofnewyork.us/resource/gszd-efwt.csv?$query=select * where violation_location_house='635' and violation_location_street_name='WEST 42 STREET' order by violation_date DESC"
df <- read.socrata(q )

Here is Socrata error message

2018-12-18 14:31:26.962 getResponse: Error in httr GET: 400 https://data.cityofnewyork.us/resource/gszd-efwt.csv?%24query=select%20%2A%20where%20violation_location_house%3D%27635%27%20and%20violation_location_street_name%3D%27WEST%2042%20STREET%27%20order%20by%20violation_date%20DESC&$order=:id <---------------
Error in getResponse(validUrl, email, password) : Bad Request (HTTP 400).

When I put the error message url into the browser I get the following error
message | "If $query is used, all options - [$order] should not be specified in $query."
When I delete the &$order clause in browser, query runs fine.

--
 

@benjaminrobinson
Copy link

benjaminrobinson commented Dec 18, 2018

According to the SoQL documentation:
The $query parameter allows you to combine multiple SoQL clauses together into a single parameter, for convenience. Similar to SQL, clauses must be specified in a specific order. It also identifies the SoQL $query as: "A full SoQL query string, all as one parameter."

From my understanding of the query option, its basically a combination of all the other pieces of the SODA so you would have to put the order statement into your query. So instead of saying $query = 'SELECT * WHERE city = 'New York'' for your query and then having $order=state you would have $query=SELECT * WHERE city = 'New York' ORDER BY state' as one complete query. Hope that make sense!

Don't know why your RSocrata query isn't working though...I really just recommend making a flavor of the custom function I have above...

@sunshine52
Copy link

sunshine52 commented Dec 18, 2018 via email

@sunshine52
Copy link

The problem seems to be in the Rsocrata read.socrata function.
It checks for an "$order" parameter and if it doesn't exist and there is no count query, it adds 'order=:id'. This causes the "$query" queries to fail. There may need to be an if clause or an else if clause that finds a "$query statement and leaves that url untouched.

This is 9 lines into read.socrata.
if(!orderTest & !countTest) # sort by Socrata unique identifier
validUrl <- paste(validUrl, if(is.null(parsedUrl$query)) {'?'} else {"&"}, '$order=:id', sep='')

@nicklucius
Copy link
Contributor

nicklucius commented Jan 25, 2019

Hi @sunshine52, thank you for the details!

The dev branch has a PR that should address this issue. Are you able to check if that version of the package resolves the issue? It will be included with our next CRAN release.

@sunshine52
Copy link

sunshine52 commented Jan 25, 2019 via email

@nicklucius
Copy link
Contributor

Good to hear, @sunshine52. When we move the code to the master branch, this issue will close out.

@Ben-Cox
Copy link

Ben-Cox commented Dec 21, 2019

This still seems to be an issue, I've tried the 'nightly' dev version and still get the bad request error in R:
read.socrata("https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'")

However this works in my web browser:

https://data.wa.gov/resource/auvb-4rvk.json?$query=SELECT tagcode WHERE returnyear=2019 AND locationname='1A (BUOY10 - BRIDGE)'

The read.socrata function is still pasting &$order=:id to the end of the query string.

@sunshine52
Copy link

sunshine52 commented Dec 21, 2019 via email

@Ben-Cox
Copy link

Ben-Cox commented Dec 23, 2019

Well its working on my machine now too, and I didn't change anything. I'll chock it up to a Christmas miracle. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants