Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approx median fixes #20

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

sastoudt
Copy link
Contributor

Deals with #17 and #18

  • allows moe inputs, triggers simulation of new moe
  • clears up use of jam values (returned instead of None when median falls in upper or lower bin)
  • plenty of warnings/errors to aid with jam value usage along with tests for these cases

@sastoudt
Copy link
Contributor Author

Added a quick fix for #24

README.rst Outdated
.. code-block:: python

>>> moe_example = [
dict(min=math.nan, max=9999, n=6, moe=1),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to advise leaving first min and the last max null, we need to be totally consistent and do that in every single example. I also think that we should the default built in None type variable rather than nan so that we're hewing as closely as possible to the standard Python library.

README.rst Outdated
@@ -131,25 +131,27 @@ Approximating medians

Estimate a median and approximate the margin of error. Follows the U.S. Census Bureau's official guidelines for estimation. Useful for generating medians for measures like household income and age when aggregating census geographies.

Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys:
Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys with an optional fourth key for margin of error inputs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we first describe this input. We need to be emphatically clear that the first min and the last max must be None type objects and briefly explain why.

README.rst Outdated
dict(min=200000, max=math.nan, n=18, moe=10)
]
>>> import numpy
>>> numpy.random.seed(711355)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the seed in example unless we are the point about reproducibility.

Find the value for the dataset you are estimating by referring to `the bureau's reference material <https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.html>`_.

If you have an associated "jam values" for your dataset provided in the `American Community Survey's technical documentation <https://www.documentcloud.org/documents/6165752-2017-SummaryFile-Tech-Doc.html#document/p20/a508561>`_, input the pair as a list to the `jam_values` keyword argument.
Then if the median falls in the first or last bin, the jam value will be returned instead of `None`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a jam values example here that does not use the simulation method.

README.rst Outdated
]
>>> import numpy
>>> numpy.random.seed(711355)
>>> census_data_aggregator.approximate_median(moe_example, design_factor=1, sampling_percentage=5*2.5, simulations=50, jam_values=[2499, 200001])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example probably doesn't need the jam value inputs


.. code-block:: python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we believe that the moe-based method is superior, we should make it the first and default example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants