Add additional methods to base classes to let users support additional sources #77

Open
ghing opened this issue May 18, 2021 · 7 comments

Comments

@ghing
Contributor

ghing commented May 18, 2021

This is somewhat related to #2.

I find this project to be extremely useful and a great framework for a task that I have to do often. In my projects, I've found myself using the base classes and concepts from this project when I want to download and process data from other Census Bureau API sources.

However, for non-ACS sources, I find myself reimplementing many of the methods on my geotype downloader classes from scratch, because the behavior I need can't be achieved by calling super() and then adding logic.

I think adding these methods to BaseGeoTypeDownloader could make adding additional data sources easier, both in this project, and for other users in their own projects:

  • BaseGeoTypeDownloader.get_api_client(): This would be called from the constructor to set self.api, allowing subclasses to supply a customized subclass of census.Census that supports additional API endpoints.
  • BaseGeoTypeDownloader.get_field_type_map(): This would be similar to BaseGeoTypeDownloader.get_raw_field_map(), except that it would map raw field names to the types passed to pd.Series.astype(). Like get_raw_field_map(), it would be called from BaseGeoTypeDownloader.process() when setting column types after reading in the raw table. The implementation could check for a FIELD_TYPES attribute on the table configuration class and, if none exists, fall back to the existing ACS logic that infers types from the field name suffix. Explicit type mappings would make it possible to support non-ACS tables whose field names don't follow the ACS suffix convention.
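A minimal sketch of the proposed get_field_type_map() hook follows. Everything in it is hypothetical: the class is a stand-in rather than the project's actual BaseGeoTypeDownloader, AcsConfig and Sf1Config are invented table configurations, and the regex is an assumed approximation of the ACS suffix rule (estimates end in E and margins of error in M, as in B01001_001E). The map holds plain Python types so the same dictionary would also be valid input for pandas.

```python
import re
from typing import Dict, Iterable, List

# Assumed approximation of the ACS naming rule: numeric detail-table fields end
# in "E" (estimate) or "M" (margin of error), e.g. "B01001_001E".
ACS_NUMERIC_FIELD = re.compile(r"_\d{3,4}[EM]$")


class BaseGeoTypeDownloader:
    """Hypothetical stand-in, not the project's actual base class."""

    def __init__(self, config):
        # `config` plays the role of the table configuration class.
        self.config = config

    def get_field_type_map(self, raw_fields: Iterable[str]) -> Dict[str, type]:
        """Map raw field names to types that pd.Series.astype() would accept."""
        explicit = getattr(self.config, "FIELD_TYPES", None)
        if explicit is not None:
            # Tables that declare FIELD_TYPES bypass the suffix heuristic.
            return dict(explicit)
        # Fallback: treat ACS-style estimate/margin fields as floats.
        return {f: float for f in raw_fields if ACS_NUMERIC_FIELD.search(f)}

    def process(self, rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
        """Cast raw string values using the field type map."""
        if not rows:
            return []
        type_map = self.get_field_type_map(rows[0].keys())
        return [
            {k: (type_map[k](v) if k in type_map else v) for k, v in row.items()}
            for row in rows
        ]


class AcsConfig:
    """Hypothetical ACS table config: no FIELD_TYPES, so suffixes decide."""


class Sf1Config:
    """Hypothetical decennial table config: "P001001" has no E/M suffix."""

    FIELD_TYPES = {"P001001": int}


if __name__ == "__main__":
    rows = [{"NAME": "Alabama", "B01001_001E": "1000", "state": "01"}]
    print(BaseGeoTypeDownloader(AcsConfig).process(rows))
```

In a pandas-based process(), the returned dictionary could be passed directly to DataFrame.astype(), which accepts a mapping of column names to types; the list-of-dicts version here only keeps the sketch dependency-free.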
@ghing
Contributor Author

ghing commented May 18, 2021

Here are some examples from customized subclasses I've implemented for my own data loading project to support additional sources. They might be useful to understand what I'm talking about in this issue.

@palewire
Contributor

palewire commented May 18, 2021

I love the idea of integrating features you've added, but I think we'd probably be best off taking things on a case by case basis, with a clear vision for what new use case the individual change would allow.

Is there a feature addition you would propose for the end user? Is it supporting a data source beyond ACS? Something else?

@ghing
Contributor Author

ghing commented May 18, 2021

@palewire to clarify, this is less adding a feature or support for a specific data source in the CLI, and instead making backward-compatible changes to the Python API that would make it easier for users of the Python API to add support for other data sources from the Census Bureau's API in their own projects. These changes may also make it easier to add support for additional sources in this tool (i.e. #2).

I ran into the need for this when writing code to download and process data from the self-response rate endpoint. That support definitely doesn't need to be in this library/the CLI, but it would be great to make it easier to use code and conventions in this package to support consuming data from other Census API endpoints.

The changes I describe above address these two needs (for Python API users, not CLI users):

  • The ability to use a custom subclass of census.Census to add support for additional Census Bureau API endpoints that aren't currently supported by the census package.
  • The ability to correctly interpret field types for tables that don't follow the ACS field-name suffix convention.

I'm not sure whether the approaches I've taken in my code are the best way to address these needs, but I wanted to document them in case you all have run into this internally when thinking about how to pull data from the Census API for sources that aren't ACS tables.
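The first of these needs (a custom client class) can be met with a small overridable hook. The sketch below is hypothetical throughout: StubCensusClient stands in for census.Census so the example runs offline, and SelfResponseClient.get_response_rates() is an invented placeholder for an endpoint the census package doesn't cover.

```python
from typing import Any, Dict, List


class StubCensusClient:
    """Hypothetical offline stand-in for census.Census."""

    def __init__(self, api_key: str):
        self.api_key = api_key


class SelfResponseClient(StubCensusClient):
    """Hypothetical client subclass adding an endpoint the stock client lacks."""

    def get_response_rates(self, fields: List[str]) -> List[Dict[str, Any]]:
        # Placeholder: a real client would query the API endpoint here.
        return []


class BaseGeoTypeDownloader:
    """Hypothetical stand-in showing only the proposed get_api_client() hook."""

    def __init__(self, api_key: str):
        # The constructor delegates client construction to an overridable hook
        # instead of hard-coding the client class.
        self.api = self.get_api_client(api_key)

    def get_api_client(self, api_key: str) -> StubCensusClient:
        # The real base class would presumably return census.Census(api_key).
        return StubCensusClient(api_key)


class SelfResponseDownloader(BaseGeoTypeDownloader):
    """Subclass that swaps in the extended client without rewriting __init__."""

    def get_api_client(self, api_key: str) -> SelfResponseClient:
        return SelfResponseClient(api_key)


if __name__ == "__main__":
    downloader = SelfResponseDownloader("YOUR_API_KEY")
    print(type(downloader.api).__name__)  # prints "SelfResponseClient"
```

Subclasses override one small method instead of copying the whole constructor, which is the kind of reimplementation the issue description is trying to avoid.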

@palewire
Contributor

palewire commented May 18, 2021

Gotcha. I'm not opposed to such changes, I just want them to be pegged to new features for the user of this library, which I think could bring some focus to the work. That way the edits aren't academic but are integrated with the code here from the start. In other words, I don't want to prematurely optimize.

For instance, if we set the goal of integrating the three- and one-year samples from the ACS into this library, could adding that feature naturally include some of the refactoring you propose?

@ghing
Contributor Author

ghing commented May 18, 2021

Supporting other ACS releases wouldn't require these changes. Supporting decennial tables, like SF1, would require a way to specify field types on a per-source or per-table basis.

A hook to support a different client class wouldn't be required by either of these additions. That's only needed for supporting API data sources that aren't supported by the census package.

@palewire
Copy link
Contributor

Got it. With the decennial census coming out this year, maybe it's a good time to figure out SF1. Have you integrated it downstream in any of your stuff?

@ghing
Contributor Author

ghing commented May 18, 2021

I haven't integrated SF1 yet, but I'm likely going to be using some tables (e.g. P1) soon. I'll update this issue with any relevant findings or bits of example code.
