Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dataset: US_Counties10 #28

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

whilei
Copy link

@whilei whilei commented Dec 8, 2023

Thanks for this great lib. I've added US_Counties10, and thought I'd PR in case that's something you'd like to have here too.

du -sh data/US_Counties10.gz 
1.7M    data/US_Counties10.gz

Signed-off-by: ia <isaac.ardis@gmail.com>

add US_Counties10 reference in README

Signed-off-by: ia <isaac.ardis@gmail.com>

fix misnamed func: s/US_Counties/US_Counties10/g

Signed-off-by: ia <isaac.ardis@gmail.com>

README.md: add county to location doc

Signed-off-by: ia <isaac.ardis@gmail.com>
@whilei
Copy link
Author

whilei commented Dec 8, 2023

Reasons you may not want to merge this include that US counties are not exactly well standardized.
For example, Alaska has Boroughs and Municipalities and Census Areas, no counties; Luisiana: parishes, no counties. https://github.com/kjhealy/fips-codes/blob/master/county_fips_master.csv

But, with that said, the feature does in fact add US Counties, so as long as the user is actually expecting Counties -- rather than some ambiguous area smaller than states but larger than cities, conventionally and idiomatically "counties" -- this will return those as they actually are.

A workaround for this could be to group all the possible subdivision types under an assumed alias of County, potentially yielding more conventionally-expected results.

cat data/US_Counties10.geojson | jj -p | grep TYPE | grep -v TYPE_EN | sort | uniq -c
     13         "TYPE": "Borough", 
     11         "TYPE": "Census Area", 
     41         "TYPE": "City", 
      4         "TYPE": "City and Borough", 
   3008         "TYPE": "County", 
      3         "TYPE": "(County Equivalent)", 
      1         "TYPE": "District of Columbia", 
      2         "TYPE": "Municipality", 
     77         "TYPE": "Municipio", 
     64         "TYPE": "Parish", 

@@ -122,6 +122,7 @@ var testdata = []struct {
SubRegion: "Northern America",
Province: "Alaska",
ProvinceCode: "US-AK",
County: "", // unknown
Copy link
Author

@whilei whilei Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#28 (comment)
That's why this is empty. Here, TYPE = Municipality|Borough|Census Area.

@sams96
Copy link
Owner

sams96 commented Dec 8, 2023

Hi, thanks for the PR. I'm not really looking to add more datasets into the lib, but I would be happy to add the counties field to allow you to get that info when using datasets that aren't included. There should probably be a way to get any arbitrary field from the geojson, but I'd be happy to add this for now.

Copy link

codecov bot commented Dec 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9db03ae) 98.58% compared to head (4193ef2) 98.63%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #28      +/-   ##
==========================================
+ Coverage   98.58%   98.63%   +0.04%     
==========================================
  Files           2        2              
  Lines         141      146       +5     
==========================================
+ Hits          139      144       +5     
  Misses          1        1              
  Partials        1        1              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@whilei
Copy link
Author

whilei commented Dec 9, 2023

Its fine with me either way, I'm happy to use a fork for what I need it for. I think moving toward arbitrary fields from whatever datasets are provided is a very good idea. It would move the library away from being a provider of data structures (which could become a nightmare, and is the real limiter for adding new datasets), and more towards the business of just handling the geometries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants