add DOF Property Charges Balance #290
I've pushed some (very) rough-and-ready code to the following branch: https://github.com/nycdb/nycdb/tree/dev-291-dataset-dof-property-charge
It has been tested (successfully) on a partial load of 600k or so records (1 percent of the total of 62M, which is still downloading). However, it's not ready for others to test yet, due to some apparent underlying weirdness in the existing codebase that I'd like to run by @austensen (or someone else) before going into much detail here. As to the weirdness: it has to do with the forced CamelCase munging of column names, which apparently has unintended side effects. It should be easy enough to resolve (sometime in the coming days, after the hackathon).
Hi @wstlabs! Do you mind clarifying exactly what you mean by the forced CamelCase munging of column names? Alternatively, there's some documentation on the column-name munging we do at https://github.com/nycdb/nycdb/blob/main/src/ADDING_NEW_DATASETS.md#-note- (see the bulleted list "Some examples of how column names are transformed:"). I wonder if that adds enough context to answer your questions.
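To make the discussion concrete, here is a minimal sketch of the *kind* of header normalization being talked about. The rule used here (lowercase, then drop every non-alphanumeric character) is an illustrative assumption, not nycdb's actual implementation; the linked ADDING_NEW_DATASETS.md note is the authority on the real rules. The function name `munge_column_name` and the sample headers are hypothetical.

```python
import re


def munge_column_name(name: str) -> str:
    """Hypothetical header normalizer (an assumption, not nycdb's code):
    strip surrounding whitespace, lowercase, and drop every character
    that is not a lowercase letter or digit (spaces, underscores, etc.)."""
    return re.sub(r"[^0-9a-z]", "", name.strip().lower())


if __name__ == "__main__":
    # Hypothetical raw headers: under this rule the underscored and the
    # CamelCase spellings collapse to the same munged name.
    for raw in ["ChargeAmount", "Charge_Amount", "Due Date", "BBL"]:
        print(f"{raw!r} -> {munge_column_name(raw)!r}")
```

Under this assumed rule, a config key written as `charge_amount` would never equal the munged header `chargeamount`, which is the sort of mismatch described in the next comment.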
Basically, the CamelCase munging seems to conflict with the (seemingly more important) explicit field declarations in the dataset config file (…). At least, my assumption was that the config provides the explicit schema: since it presents an explicit mapping of column names to types, that would definitely seem to be its purpose. But it seems that's not the "real" schema that ends up being used. Or perhaps it is, in terms of column types, but not column names, which are still automunged internally per the above description. Here's how it plays out in this case:
(1) The raw file contains some field names with underscores, e.g. …
(2) Which apparently overrides the settings in the config file (…).
(3) So you'd think, "Fine, I'll bring the config file in line with the automunged names, to make everyone happy." Unfortunately, no: it also apparently wants the field names in the CSV header to match the automunged names (meaning I had to edit the CSV and change underscored names to CamelCase throughout) in order to get the file to load.
I'm assuming that's not the way things are meant to be done. But at least the file (or a 1 percent sample of it) does load, with close-to-correct column types. That's a good sign: it seems it should be pretty easy to get this dataset integrated once the above weirdness is resolved.
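One way to surface the conflict described above, rather than hand-editing the CSV, is to compare the field names declared in the config against the munged CSV header before loading. The sketch below assumes the same hypothetical munging rule (lowercase, drop non-alphanumerics), which is not nycdb's actual code. It also assumes the config's fields can be read as a plain name-to-type dict. The function `find_schema_mismatches` and the sample column names are likewise hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, Optional


def munge(name: str) -> str:
    # Hypothetical rule (assumption): lowercase and drop non-alphanumerics.
    return re.sub(r"[^0-9a-z]", "", name.strip().lower())


def find_schema_mismatches(
    declared: Dict[str, str], header: List[str]
) -> Dict[str, Optional[str]]:
    """For each declared field whose name does not literally equal a munged
    header name, report the munged header name it would correspond to,
    or None if no header column matches even after munging."""
    munged_header = {munge(col): col for col in header}
    report: Dict[str, Optional[str]] = {}
    for field in declared:
        if field in munged_header:
            continue  # declared name already matches the munged header
        key = munge(field)
        report[field] = key if key in munged_header else None
    return report


if __name__ == "__main__":
    # Hypothetical config fields (name -> type) and raw CSV header line.
    declared = {"bbl": "text", "charge_amount": "numeric", "due_date": "date"}
    header = next(csv.reader(io.StringIO("BBL,Charge_Amount,Due_Date\n")))
    print(find_schema_mismatches(declared, header))
    # {'charge_amount': 'chargeamount', 'due_date': 'duedate'}
```

A non-empty report means the config and the munged header disagree. Each value is the name the config would have to use under the assumed rule, and `None` flags a declared field with no counterpart in the file at all.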
Department of Finance dataset showing how much properties owe to the city. It can be helpful in identifying buildings under financial distress.
Dataset: https://data.cityofnewyork.us/City-Government/DOF-Property-Charges-Balance/scjx-j6np
dataset/table name:
dof_property_charges
Each task can be completed by a different person - comment below to claim a part of it