Number of columns not consistent for JSON and CSV from Socrata #184

Open
nicklucius opened this issue Oct 23, 2019 · 2 comments

@nicklucius
Contributor

The test for this started failing recently.

The section of code:

test_that("Warn instead of fail if X-SODA2-* headers are missing", {
  expect_warning(dfCsv <- read.socrata("https://data.healthcare.gov/resource/enx3-h2qp.csv?$limit=1000"),
                info="https://github.com/Chicago/RSocrata/issues/118")
  expect_warning(dfJson <- read.socrata("https://data.healthcare.gov/resource/enx3-h2qp.json?$limit=1000"),
                info="https://github.com/Chicago/RSocrata/issues/118")
  expect_silent(df <- read.socrata("https://odn.data.socrata.com/resource/pvug-y23y.csv"))
  expect_silent(df <- read.socrata("https://odn.data.socrata.com/resource/pvug-y23y.json"))
  expect_equal("data.frame", class(dfCsv), label="class", info="https://github.com/Chicago/RSocrata/issues/118")
  expect_equal("data.frame", class(dfJson), label="class", info="https://github.com/Chicago/RSocrata/issues/118")
  expect_equal(150, ncol(dfCsv), label="columns", info="https://github.com/Chicago/RSocrata/issues/118")
  expect_equal(140, ncol(dfJson), label="columns", info="https://github.com/Chicago/RSocrata/issues/118")
})

The actual failing test message:

>   expect_equal(150, ncol(dfCsv), label="columns", info="https://github.com/Chicago/RSocrata/issues/118")
Error: columns not equal to ncol(dfCsv).
1/1 mismatches
[1] 150 - 146 == 4
https://github.com/Chicago/RSocrata/issues/118

I thought this might be useful, but it didn't help me:

> setdiff(colnames(dfJson), colnames(dfCsv))
[1] "url"   "url.1" "url.2" "url.3"
> setdiff(colnames(dfCsv), colnames(dfJson))
 [1] "network_url"                                          
 [2] "plan_brochure_url"                                    
 [3] "summary_of_benefits_url"                              
 [4] "drug_formulary_url"                                   
 [5] "adult_dental"                                         
 [6] "premium_scenarios"                                    
 [7] "standard_plan_cost_sharing"                           
 [8] "X_73_percent_actuarial_value_silver_plan_cost_sharing"
 [9] "X_87_percent_actuarial_value_silver_plan_cost_sharing"
[10] "X_94_percent_actuarial_value_silver_plan_cost_sharing"

It looks like the URL columns are different in name only, but the other six columns are missing in the JSON. Not sure if this is related to this issue, or if this is something else?
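For anyone repeating the comparison above, here is a minimal sketch that gathers both `setdiff()` directions and the column-count gap in one place. The helper name `compare_columns` and the toy data frames are hypothetical stand-ins for `dfCsv` and `dfJson`, not part of RSocrata.

```r
# Toy stand-ins for dfCsv and dfJson; the column names are illustrative only.
df_csv <- data.frame(plan_id = 1, network_url = "a", adult_dental = "x",
                     stringsAsFactors = FALSE)
df_json <- data.frame(plan_id = 1, url = "a", stringsAsFactors = FALSE)

# Hypothetical helper: summarise how the column sets of two data frames differ.
compare_columns <- function(a, b) {
  list(
    only_in_a = setdiff(colnames(a), colnames(b)),   # columns b lacks
    only_in_b = setdiff(colnames(b), colnames(a)),   # columns a lacks
    shared    = intersect(colnames(a), colnames(b)), # columns present in both
    ncol_diff = ncol(a) - ncol(b)                    # positive when a is wider
  )
}

cmp <- compare_columns(df_csv, df_json)
print(cmp)
```

Running the same helper on the real `dfCsv` and `dfJson` would show at a glance which names are merely renamed and which are absent from one format.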

Originally posted by @geneorama in #118 (comment)

@geneorama
Member

I don't understand exactly why we're testing the number of columns for #118.

In fact, I don't understand what was special about these data sets or what error they were causing. I see the comment from @hrect in #118 (comment) that these data sets replicate an error, but I don't know why they were causing it.

@nicklucius do you know?

@nicklucius nicklucius mentioned this issue Oct 23, 2019
@nicklucius
Contributor Author

@geneorama this dataset has so many columns that including the names/types in the HTTP response headers would push the header size over Socrata's limit, so Socrata omits the names/types from the headers in this case. That used to break read.socrata() and throw an error. The development in #118 fixed the break so that it now warns the user and coerces the columns to character.
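
To make the fallback described above concrete, here is a rough sketch of the warn-and-coerce idea in base R. It is not RSocrata's actual implementation: the helper name `column_types_from_headers`, the lower-cased header key `x-soda2-types`, and the assumption that the header value is a JSON array of strings are all illustrative guesses.

```r
# Hypothetical helper, not RSocrata's real code: pick column types from the
# response headers, falling back to character when the type header is absent.
column_types_from_headers <- function(headers, n_cols) {
  # Assumed key: the X-SODA2-Types header, stored lower-cased in a named list.
  types_json <- headers[["x-soda2-types"]]
  if (is.null(types_json)) {
    warning("X-SODA2-Types header is missing; treating all columns as character.")
    return(rep("character", n_cols))
  }
  # Assumed format: a JSON array of strings such as ["text","number"].
  quoted <- regmatches(types_json, gregexpr('"[^"]*"', types_json))[[1]]
  gsub('"', "", quoted, fixed = TRUE)
}

# Header present: the types come straight from the header value.
typed <- column_types_from_headers(list("x-soda2-types" = '["text","number"]'), 2)

# Header missing: record that a warning was raised, then keep the fallback.
warned <- FALSE
fallback <- withCallingHandlers(
  column_types_from_headers(list(), 3),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```

This is why the test expects a warning rather than an error for the wide healthcare.gov dataset, while the narrower ODN dataset is expected to load silently.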
