publish v0.5.1 (#141)
* update package version number as well

* Allow non-binary incidence (#123)

* Allow non-binary incidence

* style

* update tests to pass

* add some progress indication

* tidy up validation script, use histogram for a histogram

* fix render and some typos

* increment version

* deprecate py2.7

* Multiprocess (#130)

* [Bugfix] Allow seed and meta geography to be the same (#139)

* Fixes bug where if the seed geography is the same as the meta_geography, pandas has a small panic attack and the run will fail.

* add cytoolz to the "requirements"

* fix another activitysim change

* Absolute bounds (#136)


* adding upper/lower bounds to weighting use case

* #137, #134, #133, #131

Co-authored-by: Jamie Cook <jamie.cook@veitchlister.com.au>
Co-authored-by: Blake Rosenthal <blake.rosenthal@rsginc.com>
Co-authored-by: Ben Stabler <bstabler@users.noreply.github.com>
Co-authored-by: Leah Flake <leah.flake@rsginc.com>
5 people committed Aug 26, 2021
1 parent b664d22 commit 47ece66
Showing 12 changed files with 70 additions and 48 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -16,7 +16,7 @@ install:
- conda info -a
- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
- conda activate test-environment
- conda install pytest pytest-cov coveralls pycodestyle
- conda install pytest pytest-cov coveralls pycodestyle cytoolz
- pip install .
- pip freeze

5 changes: 3 additions & 2 deletions docs/application_configuration.rst
@@ -320,7 +320,7 @@ These settings control the functionality of the PopulationSim algorithm. The set
| | | The maximum expansion factor may have to be adjusted upwards if the target |br| |
| | | is much greater than the seed number of households. |br| |
+--------------------------------------+------------+---------------------------------------------------------------------------------+
| MAX_BALANCE_ITERATIONS_SIMULTANEOUS | Integer | Number of simultaneous list balancer iterations |
| MAX_BALANCE_ITERATIONS_SIMULTANEOUS | Integer | Number of list balancer iterations. The default may be more than is needed. |
+--------------------------------------+------------+---------------------------------------------------------------------------------+
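MAX_BALANCE_ITERATIONS_SIMULTANEOUS is an upper limit, so a balancer that checks for convergence can stop well before reaching it. The loop below is a generic iterative proportional fitting (IPF) sketch, not PopulationSim's simultaneous balancer; `list_balance`, `max_gap` and the 0/1 incidence layout are assumptions for illustration. It returns the number of iterations actually used, which is one way to judge whether a configured maximum is larger than needed.

```python
def list_balance(incidence, targets, weights, max_iterations=10000, max_gap=1e-6):
    """Generic IPF-style list balancing with an early convergence stop (sketch).

    incidence: one row per household, one 0/1 column per control.
    targets:   one control total per column.
    Returns (balanced_weights, iterations_used).
    """
    weights = [float(w) for w in weights]
    for iteration in range(1, max_iterations + 1):
        for c, target in enumerate(targets):
            # current weighted total for control c
            total = sum(w * row[c] for w, row in zip(weights, incidence))
            if total > 0:
                factor = target / total
                # rescale only the households that contribute to control c
                weights = [w * factor if row[c] else w
                           for w, row in zip(weights, incidence)]
        # largest remaining gap between weighted totals and targets
        gap = max(abs(sum(w * row[c] for w, row in zip(weights, incidence)) - t)
                  for c, t in enumerate(targets))
        if gap < max_gap:
            return weights, iteration
    return weights, max_iterations
```

For example, `list_balance([[1, 0], [0, 1], [1, 1]], [4.0, 4.0], [1, 1, 1])` meets both targets well within the 10,000-iteration limit used in the sketch, so the remaining iterations would simply go unused.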


@@ -693,7 +693,7 @@ This sections describes the settings that are configured differently for the *re

**Input Data Tables for repop mode**

The repop mode runs over an existing synthetic population and uses the data pipeline (HDF5 file) from the regular run as an input. User should copy the HDF5 file from the regular outputs to the *output* folder of the repop set up. The data input which needs to be specified in this setting is the control data for the subset of geographies to be modified. Input tables for the repop mode can be specified in the same manner as regular mode. However, only one geography can be controlled. In the example below, TAZ controls are specified. The controls specified in TAZ_control_data do not have to be consistent with the controls specified in the data used to control the initial population. Only those geographic units to be repopulated should be specified in the control data (for example, TAZs 314 through 317).
The repop mode runs over an existing synthetic population and uses the data pipeline (HDF5 file) from the regular run as an input. User should copy the HDF5 file from the regular outputs to the *output* folder of the repop set up. The data input which needs to be specified in this setting is the control data for the subset of geographies to be modified. Input tables for the repop mode can be specified in the same manner as regular mode. However, only one geography can be controlled and the geography must be the lowest in "geographies" setting. In the example below, TAZ controls are specified. The controls specified in TAZ_control_data do not have to be consistent with the controls specified in the data used to control the initial population. Only those geographic units to be repopulated should be specified in the control data (for example, TAZs 314 through 317).

::

@@ -713,6 +713,7 @@ The repop mode runs over an existing synthetic population and uses the data pipe
| Attribute | Description |
+===========================+=============================================================+
| repop_control_file_name | Name of the CSV control specification file for repop mode |
| | Must include total_hh_control field |
+---------------------------+-------------------------------------------------------------+
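The two repop constraints documented here (a single controlled geography, which must be the lowest level in the "geographies" setting, and a control file that includes the total_hh_control field) lend themselves to a pre-flight check. The function below is a hypothetical sketch, not part of PopulationSim; it assumes `geographies` is ordered from largest to smallest, as in the PopulationSim examples, that the control spec has been reduced to `(target, geography)` pairs, and that `total_hh_control` names one of those targets.

```python
def check_repop_controls(geographies, control_rows, total_hh_control):
    """Hypothetical check of repop-mode controls; returns a list of problems.

    geographies:      geography names ordered from largest to smallest (assumed)
    control_rows:     (target, geography) pairs from the repop control spec
    total_hh_control: name of the household-total target
    """
    problems = []
    used = {geo for _, geo in control_rows}
    if len(used) != 1:
        problems.append(f"repop mode controls one geography, found {sorted(used)}")
    elif used != {geographies[-1]}:
        problems.append(
            f"repop geography must be the lowest ({geographies[-1]}), got {used.pop()}")
    if total_hh_control not in {target for target, _ in control_rows}:
        problems.append(f"control spec is missing the '{total_hh_control}' target")
    return problems
```

An empty list means the controls satisfy both documented constraints; each string describes one violation.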


10 changes: 8 additions & 2 deletions docs/getting_started.rst
@@ -12,7 +12,13 @@ This page describes how to install and run PopulationSim with the provided examp
Installation
------------

1. Install `Anaconda 64bit Python 3 <https://www.anaconda.com/distribution/>`__. Anaconda Python is required for PopulationSim.
1. It is recommended that you install and use a *conda* package manager
for your system. One easy way to do so is by using `Anaconda 64bit Python 3 <https://www.anaconda.com/distribution/>`__,
although you should consult the `terms of service <https://www.anaconda.com/terms-of-service>`__
for this product and ensure you qualify (as of summer 2021, businesses and
governments with over 200 employees do not qualify for free usage). If you prefer
a completely free open source *conda* tool, you can download and install the
appropriate version of `Miniforge <https://github.com/conda-forge/miniforge#miniforge3>`__.

2. If you access the internet from behind a firewall, then you will need to configure your proxy server. To do so, create a .condarc file in your Anaconda installation folder (i.e. ``C:\ProgramData\Anaconda3``), such as:

@@ -62,7 +68,7 @@ ActivitySim
ActivitySim depends + some handy Python installation management tools.

For more information on Anaconda and ActivitySim, see ActivitySim's `getting started
<https://activitysim.github.io/activitysim/gettingstarted.html#anaconda>`__ guide.
<https://activitysim.github.io/activitysim/gettingstarted.html>`__ guide.


Run Examples
15 changes: 0 additions & 15 deletions docs/software.rst
@@ -224,18 +224,3 @@ Contribution Guidelines

PopulationSim development follows the same `development guidelines <https://activitysim.github.io/activitysim/development.html>`__ as ActivitySim.


Release Notes
-------------

* v0.3 - first release
* v0.3.1 - allow zones with zero households
* v0.3.2 - fix bug in mult-integerizer with total_hh_parent_control_index
* v0.3.3 - add disgnostic printouts on assert fail in mult_integerizer
* v0.3.4 - add survey weighting use case
* v0.3.5 - add Python 3.5+ support
* v0.4 - transfer to ActivitySim.org
* v0.4.1 - package updates
* v0.4.2 - validation script in Python
* v0.4.3 - allow non-binary incidence
* v0.5 - support for multiprocessing
3 changes: 2 additions & 1 deletion example_survey_weighting/configs/settings.yaml
@@ -18,7 +18,8 @@ USE_SIMUL_INTEGERIZER: True
USE_CVXPY: False
max_expansion_factor: 4 # Default is 30
min_expansion_factor: 0.5

absolute_upper_bounds: 20000
absolute_lower_bounds: 1

# Geographic Settings
# ------------------------------------------------------------------
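The example settings.yaml in this commit spells the new keys in the plural (`absolute_upper_bounds`, `absolute_lower_bounds`), whereas the initial, final and repop balancing steps call `settings.get('absolute_upper_bound', None)` and `settings.get('absolute_lower_bound', None)`. Going only by the lines shown in this commit, the example values would therefore not reach the balancer. The sketch below reproduces the lookup on a plain dict standing in for the parsed YAML; `unknown_bound_keys` is a hypothetical helper, not part of PopulationSim.

```python
# Stand-in for the parsed example settings.yaml (keys copied from this commit)
settings = {
    "max_expansion_factor": 4,
    "min_expansion_factor": 0.5,
    "absolute_upper_bounds": 20000,
    "absolute_lower_bounds": 1,
}

# Same lookups as the balancing steps in this commit
absolute_upper_bound = settings.get("absolute_upper_bound", None)
absolute_lower_bound = settings.get("absolute_lower_bound", None)
print(absolute_upper_bound, absolute_lower_bound)  # None None


def unknown_bound_keys(settings,
                       known=("absolute_upper_bound", "absolute_lower_bound")):
    """Hypothetical check: list absolute_* keys that the steps never read."""
    return sorted(k for k in settings
                  if k.startswith("absolute_") and k not in known)


print(unknown_bound_keys(settings))
# ['absolute_lower_bounds', 'absolute_upper_bounds']
```

Either spelling would work as long as the YAML keys and the `settings.get` calls agree.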
21 changes: 18 additions & 3 deletions populationsim/balancer.py
@@ -242,6 +242,7 @@ def np_balancer(
def do_balancing(control_spec,
total_hh_control_col,
max_expansion_factor, min_expansion_factor,
absolute_upper_bound, absolute_lower_bound,
incidence_df, control_totals, initial_weights):

# incidence table should only have control columns
@@ -262,14 +263,21 @@ def do_balancing(control_spec,

if min_expansion_factor:

# number_of_households in this seed geograpy as specified in seed_controlss
# number_of_households in this seed geograpy as specified in seed_controls
number_of_households = control_totals[total_hh_control_index]

total_weights = initial_weights.sum()
lb_ratio = min_expansion_factor * float(number_of_households) / float(total_weights)

lb_weights = initial_weights * lb_ratio
lb_weights = lb_weights.clip(lower=0)

if absolute_lower_bound:
lb_weights = lb_weights.clip(lower=absolute_lower_bound)
else:
lb_weights = lb_weights.clip(lower=0)

elif absolute_lower_bound:
lb_weights = initial_weights.clip(lower=absolute_lower_bound)

else:
lb_weights = None
@@ -283,7 +291,14 @@ def do_balancing(control_spec,
ub_ratio = max_expansion_factor * float(number_of_households) / float(total_weights)

ub_weights = initial_weights * ub_ratio
ub_weights = ub_weights.round().clip(lower=1).astype(int)

if absolute_upper_bound:
ub_weights = ub_weights.round().clip(upper=absolute_upper_bound, lower=1).astype(int)
else:
ub_weights = ub_weights.round().clip(lower=1).astype(int)

elif absolute_upper_bound:
ub_weights = ub_weights.round().clip(upper=absolute_upper_bound, lower=1).astype(int)

else:
ub_weights = None
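The bound logic added to `do_balancing` can be sketched on a NumPy array as below; this is a simplified stand-in for the pandas code in the diff, and the name `weight_bounds` is hypothetical. One deliberate difference: in the diff, the `elif absolute_upper_bound:` branch calls `ub_weights.round()`, but `ub_weights` is only assigned in the branch that computes `ub_ratio`, so that path would appear to raise an `UnboundLocalError`. The sketch starts from the initial weights there, mirroring how the lower-bound `elif` uses `initial_weights`; that substitution is an assumption about intent.

```python
import numpy as np


def weight_bounds(initial_weights, number_of_households,
                  min_expansion_factor=None, max_expansion_factor=None,
                  absolute_lower_bound=None, absolute_upper_bound=None):
    """Return (lb_weights, ub_weights) for one seed geography; None means unbounded."""
    weights = np.asarray(initial_weights, dtype=float)
    total_weights = weights.sum()

    # Lower bound: scale by the min expansion factor, then floor at the
    # absolute lower bound (or at 0 when no absolute bound is set)
    if min_expansion_factor:
        lb_ratio = min_expansion_factor * float(number_of_households) / total_weights
        lb_weights = np.clip(weights * lb_ratio, absolute_lower_bound or 0, None)
    elif absolute_lower_bound:
        lb_weights = np.clip(weights, absolute_lower_bound, None)
    else:
        lb_weights = None

    # Upper bound: scale by the max expansion factor; with only an absolute
    # cap, start from the initial weights (assumption, see lead-in)
    if max_expansion_factor:
        ub_ratio = max_expansion_factor * float(number_of_households) / total_weights
        ub_weights = np.round(weights * ub_ratio)
    elif absolute_upper_bound:
        ub_weights = np.round(weights)
    else:
        ub_weights = None

    if ub_weights is not None:
        # at least 1; capped at absolute_upper_bound (None leaves the top open)
        ub_weights = np.clip(ub_weights, 1, absolute_upper_bound).astype(int)

    return lb_weights, ub_weights
```

For weights `[10, 20, 30]` and 120 households, factors of 0.5 and 4 give ratios of 1 and 8; with an absolute lower bound of 15 and upper bound of 200, the bounds become `[15, 20, 30]` and `[80, 160, 200]`.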
4 changes: 4 additions & 0 deletions populationsim/steps/final_seed_balancing.py
@@ -68,6 +68,8 @@ def final_seed_balancing(settings, crosswalk, control_spec, incidence_table):

max_expansion_factor = settings.get('max_expansion_factor', None)
min_expansion_factor = settings.get('min_expansion_factor', None)
absolute_upper_bound = settings.get('absolute_upper_bound', None)
absolute_lower_bound = settings.get('absolute_lower_bound', None)

relaxation_factors = pd.DataFrame(index=seed_controls_df.columns.tolist())

@@ -86,6 +88,8 @@ def final_seed_balancing(settings, crosswalk, control_spec, incidence_table):
total_hh_control_col=total_hh_control_col,
max_expansion_factor=max_expansion_factor,
min_expansion_factor=min_expansion_factor,
absolute_lower_bound=absolute_lower_bound,
absolute_upper_bound=absolute_upper_bound,
incidence_df=seed_incidence_df,
control_totals=seed_controls_df.loc[seed_id],
initial_weights=seed_incidence_df['sample_weight'])
4 changes: 4 additions & 0 deletions populationsim/steps/initial_seed_balancing.py
@@ -65,6 +65,8 @@ def initial_seed_balancing(settings, crosswalk, control_spec, incidence_table):

max_expansion_factor = settings.get('max_expansion_factor', None)
min_expansion_factor = settings.get('min_expansion_factor', None)
absolute_upper_bound = settings.get('absolute_upper_bound', None)
absolute_lower_bound = settings.get('absolute_lower_bound', None)

# run balancer for each seed geography
weight_list = []
@@ -82,6 +84,8 @@ def initial_seed_balancing(settings, crosswalk, control_spec, incidence_table):
total_hh_control_col=total_hh_control_col,
max_expansion_factor=max_expansion_factor,
min_expansion_factor=min_expansion_factor,
absolute_upper_bound=absolute_upper_bound,
absolute_lower_bound=absolute_lower_bound,
incidence_df=seed_incidence_df,
control_totals=seed_controls_df.loc[seed_id],
initial_weights=seed_incidence_df['sample_weight'])
4 changes: 4 additions & 0 deletions populationsim/steps/repop_balancing.py
@@ -60,6 +60,8 @@ def repop_balancing(settings, crosswalk, control_spec, incidence_table):

max_expansion_factor = settings.get('max_expansion_factor', None)
min_expansion_factor = settings.get('min_expansion_factor', None)
absolute_upper_bound = settings.get('absolute_upper_bound', None)
absolute_lower_bound = settings.get('absolute_lower_bound', None)

# run balancer for each low geography
low_weight_list = []
@@ -101,6 +103,8 @@ def repop_balancing(settings, crosswalk, control_spec, incidence_table):
total_hh_control_col=total_hh_control_col,
max_expansion_factor=max_expansion_factor,
min_expansion_factor=min_expansion_factor,
absolute_upper_bound=absolute_upper_bound,
absolute_lower_bound=absolute_lower_bound,
incidence_df=seed_incidence_df,
control_totals=low_controls_df.loc[low_id],
initial_weights=initial_weights)
10 changes: 5 additions & 5 deletions populationsim/steps/setup_data_structures.py
@@ -111,11 +111,11 @@ def add_geography_columns(incidence_table, households_df, crosswalk_df):
# add seed_geography col to incidence table
incidence_table[seed_geography] = households_df[seed_geography]

# add meta column to incidence table
seed_to_meta = \
crosswalk_df[[seed_geography, meta_geography]] \
.groupby(seed_geography, as_index=True).min()[meta_geography]
incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)
# add meta column to incidence table (unless it's already there)
if seed_geography != meta_geography:
tmp = crosswalk_df[list({seed_geography, meta_geography})]
seed_to_meta = tmp.groupby(seed_geography, as_index=True).min()[meta_geography]
incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)

return incidence_table
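The guard added to `add_geography_columns` matters because, when `seed_geography` and `meta_geography` name the same column, the old `crosswalk_df[[seed_geography, meta_geography]]` selection yields a frame with that column twice, and grouping on a duplicated label is presumably what trips pandas in the bug the commit message describes. The sketch below reproduces the fixed logic on toy data; `add_meta_column` and the TAZ/PUMA/REGION values are illustrative assumptions. Inside the `if` the two names always differ, so it uses a plain list where the commit uses `list({seed_geography, meta_geography})`.

```python
import pandas as pd


def add_meta_column(incidence_table, crosswalk_df, seed_geography, meta_geography):
    """Attach each household's meta geography via its seed zone (sketch)."""
    if seed_geography != meta_geography:
        # one meta zone per seed zone; min() picks one if the crosswalk is ambiguous
        seed_to_meta = (crosswalk_df[[seed_geography, meta_geography]]
                        .groupby(seed_geography)[meta_geography]
                        .min())
        incidence_table[meta_geography] = incidence_table[seed_geography].map(seed_to_meta)
    # when the names match, the existing seed column already serves as the meta column
    return incidence_table


crosswalk = pd.DataFrame({"TAZ": [1, 2, 3, 4],
                          "PUMA": [10, 10, 20, 20],
                          "REGION": [1, 1, 2, 2]})
incidence = pd.DataFrame({"PUMA": [10, 20, 20]})
print(add_meta_column(incidence.copy(), crosswalk, "PUMA", "REGION")["REGION"].tolist())
# [1, 2, 2]
```

Calling it with `"PUMA"` for both geographies leaves the incidence table unchanged instead of failing.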

38 changes: 20 additions & 18 deletions populationsim/tests/run_mp.py
@@ -17,56 +17,58 @@

def setup_dirs():

configs_dir = os.path.join(os.path.dirname(__file__), 'configs')
mp_configs_dir = os.path.join(os.path.dirname(__file__), 'configs_mp')
configs_dir = os.path.join(os.path.dirname(__file__), "configs")
mp_configs_dir = os.path.join(os.path.dirname(__file__), "configs_mp")
inject.add_injectable("configs_dir", [mp_configs_dir, configs_dir])

output_dir = os.path.join(os.path.dirname(__file__), 'output')
output_dir = os.path.join(os.path.dirname(__file__), "output")
inject.add_injectable("output_dir", output_dir)

data_dir = os.path.join(os.path.dirname(__file__), 'data')
data_dir = os.path.join(os.path.dirname(__file__), "data")
inject.add_injectable("data_dir", data_dir)

tracing.config_logger()

tracing.delete_output_files('csv')
tracing.delete_output_files('txt')
tracing.delete_output_files('yaml')
tracing.delete_output_files("csv")
tracing.delete_output_files("txt")
tracing.delete_output_files("yaml")


def regress():

expanded_household_ids = pipeline.get_table('expanded_household_ids')
expanded_household_ids = pipeline.get_table("expanded_household_ids")
assert isinstance(expanded_household_ids, pd.DataFrame)
taz_hh_counts = expanded_household_ids.groupby('TAZ').size()
taz_hh_counts = expanded_household_ids.groupby("TAZ").size()
assert len(taz_hh_counts) == TAZ_COUNT
assert taz_hh_counts.loc[100] == TAZ_100_HH_COUNT

# output_tables action: skip
output_dir = inject.get_injectable('output_dir')
assert not os.path.exists(os.path.join(output_dir, 'households.csv'))
assert os.path.exists(os.path.join(output_dir, 'summary_DISTRICT_1.csv'))
output_dir = inject.get_injectable("output_dir")
assert not os.path.exists(os.path.join(output_dir, "households.csv"))
assert os.path.exists(os.path.join(output_dir, "summary_DISTRICT_1.csv"))


def test_mp_run():

setup_dirs()

# Debugging ----------------------
run_list = mp_tasks.get_run_list()
mp_tasks.print_run_list(run_list)
# --------------------------------

# do this after config.handle_standard_args, as command line args may override injectables
injectables = ['data_dir', 'configs_dir', 'output_dir']
# do this after config.handle_standard_args, as command line args
# may override injectables
injectables = ["data_dir", "configs_dir", "output_dir"]
injectables = {k: inject.get_injectable(k) for k in injectables}

# pipeline.run(models=run_list['models'], resume_after=run_list['resume_after'])
mp_tasks.run_multiprocess(injectables)

mp_tasks.run_multiprocess(run_list, injectables)
pipeline.open_pipeline('_')
pipeline.open_pipeline("_")
regress()
pipeline.close_pipeline()


if __name__ == '__main__':
if __name__ == "__main__":

test_mp_run()
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setup(
name='populationsim',
version='0.5',
version='0.5.1',
description='Population Synthesis',
author='contributing authors',
author_email='ben.stabler@rsginc.com',