Source and Rsocrata column names and order are different #163

Open
finestjava opened this issue Mar 1, 2019 · 4 comments

Comments

@finestjava

finestjava commented Mar 1, 2019

For years I have been using the Chicago Police Department "Crimes 2001 to Present" data set through direct TSV for Excel downloads: https://data.cityofchicago.org/resource/6zsd-86xi.csv

I just started using RSocrata for access and am finding that the output is completely different: it adds completely new columns and changes the capitalization of others.

What's up with this?

Thanks for some insight.

@tomschenkjr
Contributor

tomschenkjr commented Mar 1, 2019 via email

@geneorama
Member

Here's one example for food inspections.

CSV query: https://data.cityofchicago.org/resource/4ijn-s7e5.csv?$where=inspection_date>'2018-12-31T00:00:00'
CSV columns: 
[1] "Inspection.ID" "DBA.Name" "AKA.Name" "License.." "Facility.Type" 
[6] "Risk" "Address" "City" "State" "Zip" 
[11] "Inspection.Date" "Inspection.Type" "Results" "Violations" "Latitude" 
[16] "Longitude" "Location" 
JSON query: https://data.cityofchicago.org/resource/4ijn-s7e5.json?$where=inspection_date>'2018-12-31T00:00:00'
JSON columns: 
[1] "zip" "address" "city" "violations" 
[5] "latitude" "inspection_date" "dba_name" "aka_name" 
[9] "inspection_id" "risk" "location.latitude" "location.needs_recoding"
[13] "location.longitude" "facility_type" "state" "inspection_type" 
[17] "results" "license_" "longitude"

The change in column order and names combined makes it difficult to compare the two outputs.
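
To make the comparison concrete, here is a minimal Python sketch that maps the CSV headers onto the JSON field names and reports what is left unmatched. The normalization rule (lowercase, then collapse each run of non-alphanumeric characters into a single underscore) is only a heuristic that happens to fit the names listed above; it is not a documented Socrata rule, and `normalize_csv_name` is a hypothetical helper, not part of RSocrata.

```python
import re


def normalize_csv_name(name: str) -> str:
    """Lowercase a CSV header and collapse each run of non-alphanumerics to '_'.

    Heuristic only: it fits the food inspections columns in this thread.
    """
    return re.sub(r"[^0-9a-z]+", "_", name.lower())


# Column names copied from the CSV and JSON listings above.
csv_cols = ["Inspection.ID", "DBA.Name", "AKA.Name", "License..", "Facility.Type",
            "Risk", "Address", "City", "State", "Zip",
            "Inspection.Date", "Inspection.Type", "Results", "Violations",
            "Latitude", "Longitude", "Location"]
json_cols = ["zip", "address", "city", "violations",
             "latitude", "inspection_date", "dba_name", "aka_name",
             "inspection_id", "risk", "location.latitude",
             "location.needs_recoding", "location.longitude", "facility_type",
             "state", "inspection_type", "results", "license_", "longitude"]

normalized = [normalize_csv_name(c) for c in csv_cols]
json_set = set(json_cols)

# JSON field names rearranged into the CSV's column order.
shared_in_csv_order = [c for c in normalized if c in json_set]
# Columns that appear on only one side after normalization.
csv_only = [c for c in normalized if c not in json_set]
json_only = sorted(json_set - set(normalized))

print(shared_in_csv_order)
print(csv_only)
print(json_only)
```

With these two listings, the only leftovers are the CSV's single `Location` column on one side and the flattened `location.*` fields on the other; every other column pairs up.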

Also @levyj this is the example I mentioned

@geneorama
Member

Actually this is a duplicate of #32, although this is better worded / documented. I would prefer to keep this one open because it is more active.

@geneorama
Member

geneorama commented Apr 5, 2019

I happened to open an issue about this (with Socrata) earlier this week in the course of doing other work.

Socrata does not support column ordering in the JSON response, but we could implement it using their views endpoint. We have talked about it, and as I recall there was hesitancy to rely on the views endpoint because it's not documented and there's no guarantee that it will always be there for every data set.

I've edited Socrata's response a bit, but it was essentially this:

If you want to set the column order to match the dataset, the views API does have that info and it's what I would recommend in this case. For example, https://data.cityofchicago.org/views/4ijn-s7e5/columns would give column-specific information.

I do want to confirm that we do not explicitly maintain column ordering in the JSON endpoint. For example, if you look at the SODA 2.1 endpoint for the dataset, https://data.cityofchicago.org/resource/cwig-ma7x.json, you can see that this endpoint orders the columns alphabetically. This will be the future state of all dataset endpoints.

This is using my food inspections example.
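
As a rough illustration of the views-based approach, the Python sketch below reorders JSON field names using column metadata. It makes no network call. The `fieldName` and `position` keys are assumptions about what the undocumented views endpoint returns, the metadata excerpt is hand-written for illustration rather than copied from a real response, and `order_by_view_metadata` is a hypothetical helper, not an RSocrata function.

```python
def order_by_view_metadata(json_cols, view_columns):
    """Sort JSON field names by the position recorded in the view metadata.

    Names missing from the metadata (for example flattened sub-fields such as
    'location.latitude') keep their relative order and are appended at the end.
    """
    # Assumed metadata shape: one dict per column with 'fieldName' and 'position'.
    position = {col["fieldName"]: col["position"] for col in view_columns}
    known = sorted((n for n in json_cols if n in position),
                   key=lambda n: position[n])
    unknown = [n for n in json_cols if n not in position]
    return known + unknown


# Hypothetical, hand-written metadata excerpt (not a real API response).
view_columns = [
    {"fieldName": "inspection_id", "position": 1},
    {"fieldName": "dba_name", "position": 2},
    {"fieldName": "aka_name", "position": 3},
    {"fieldName": "license_", "position": 4},
]

json_cols = ["aka_name", "license_", "location.latitude", "dba_name", "inspection_id"]
ordered = order_by_view_metadata(json_cols, view_columns)
print(ordered)
```

Appending unknown names instead of dropping them means a missing or partial metadata response degrades to the current behavior, which matters given the concern that the views endpoint may not exist for every data set.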
