Develop (#132)

* deprecate py2.7
* Multiprocess (#130)
ActivitySim · Apr 23, 2021 · b664d22
1 parent e990822
commit b664d22

Showing 52 changed files with 622 additions and 289 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,7 @@
 sandbox/
 regress/
 example_test_no_integerizing/
+example_mtc/
 .idea
 .ipynb_checkpoints
 

diff --git a/.travis.yml b/.travis.yml
@@ -3,24 +3,27 @@ language: python
 sudo: false
 
 python:
-- '2.7'
-- '3.6'
 - '3.7'
+- '3.8'
 
 install:
-- wget http://repo.continuum.io/miniconda/Miniconda-3.7.0-Linux-x86_64.sh -O miniconda.sh
+- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
 - bash miniconda.sh -b -p $HOME/miniconda
-- export PATH="$HOME/miniconda/bin:$PATH"
+- source "$HOME/miniconda/etc/profile.d/conda.sh"
 - hash -r
 - conda config --set always_yes yes --set changeps1 no
 - conda update -q conda
-- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION future
-- source activate test-environment
+- conda info -a
+- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION
+- conda activate test-environment
 - conda install pytest pytest-cov coveralls pycodestyle
 - pip install .
+- pip freeze
+
 script:
 - pycodestyle populationsim
 - py.test --cov populationsim --cov-report term-missing
+
 after_success:
 - coveralls
 # Build docs

diff --git a/LICENSE.txt b/LICENSE.txt
@@ -1,3 +1,5 @@
+BSD 3-Clause License
+
 PopulationSim
 Contributions Copyright (C) by the contributing authors
 

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,5 +1,4 @@
 include ez_setup.py
-include README.rst
 graft example_calm
 graft example_calm_repop
 graft example_survey_weighting

diff --git a/docs/application_configuration.rst b/docs/application_configuration.rst
diff --git a/docs/conf.py b/docs/conf.py
@@ -20,9 +20,9 @@
 # -- Get Package Version --------------------------------------------------
 with open("../setup.py") as file:
     lines = file.readlines()
-    for l in lines:
-        if "version" in l:
-            VERSION = l.replace("version='", "").replace("',", "").replace(" ", "")
+    for line in lines:
+        if "version" in line:
+            VERSION = line.replace("version='", "").replace("',", "").replace(" ", "")
             print("package version: " + VERSION)
 
 # If extensions (or modules to document with autodoc) are in another directory,
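
Review note on the `docs/conf.py` hunk above: the rename from `l` to `line` keeps the chained `replace` parsing. If `setup.py` holds a line of the form `version='X.Y.Z',`, that parsing appears to leave the trailing newline in `VERSION`, and it matches any line that merely contains the word "version". A minimal regex-based sketch of an alternative follows. The helper name `read_version` and the sample text are hypothetical and not part of this commit.

```python
import re


def read_version(setup_text):
    """Return the first version='...' or version="..." value, or None.

    Hypothetical helper for illustration; it is not part of the commit.
    """
    match = re.search(r"""version\s*=\s*['"]([^'"]+)['"]""", setup_text)
    return match.group(1) if match else None


# Illustrative setup.py content (assumed layout, not copied from the repo).
sample = "setup(\n    name='populationsim',\n    version='0.5',\n)\n"

version = read_version(sample)
print("package version: " + version)
```

The capture group returns only the quoted value, so no whitespace or newline cleanup is needed afterwards.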

diff --git a/docs/getting_started.rst b/docs/getting_started.rst
@@ -27,34 +27,35 @@ Installation
 
 ::
 
-  conda create -n popsim python=3.7
+  conda create -n popsim python=3.8 
 
-  #Windows
+  # Windows
   activate popsim
 
-  #Mac
+  # Mac
   conda activate popsim
 
 4. Get and install the PopulationSim package on the activated conda Python environment:
 
 ::
 
+  # best to use the conda version of pytables for consistency with activitysim
+  conda install pytables
+
   pip install populationsim
 
 
-.. _anaconda_notes :
+.. _activitysim :
 
-Python 2 or 3?
-~~~~~~~~~~~~~~~
+ActivitySim
+~~~~~~~~~~~
 
 .. note::
 
-  PopulationSim is a 64bit Python 2 or 3 library that uses a number of packages from the
+  PopulationSim is a 64bit Python 3 library that uses a number of packages from the
   scientific Python ecosystem, most notably `pandas <http://pandas.pydata.org>`__
-  and `numpy <http://numpy.org>`__. It relies heavily on the
-  `ActivitySim <https://activitysim.github.io>`__ package. Both ActivitySim and PopulationSim
-  currently support Python 2, but Python 2 will be `retired <https://pythonclock.org/>`__ at the
-  end of 2019 so Python 3 is recommended.
+  and `numpy <http://numpy.org>`__. It also relies heavily on the
+  `ActivitySim <https://activitysim.github.io>`__ package.
 
   The recommended way to get your own scientific Python installation is to
   install 64 bit Anaconda, which contains many of the libraries upon which
@@ -67,7 +68,17 @@ Python 2 or 3?
 Run Examples
 ------------
 
-There are three examples for running PopulationSim, two created using data from the Corvallis-Albany-Lebanon Modeling (CALM) region in Oregon and the other using data from the Metro Vancouver region in British Columbia. The `example_calm`_ set-up runs PopulationSim in base mode, where a synthetic population is created for the entire modeling region. This takes approximately 12 minutes on a laptop with an Intel i7-4800MQ CPU @ 2.70GHz and 16 GB of RAM. The `example_calm_repop`_ set-up runs PopulationSim in the *repop* mode, which updates the synthetic population for a small part of the region. The `example_survey_weighting`_ set-up runs PopulationSim for the case of developing final weights for a household travel survey. More information on the configuration of PopulationSim can be found in the **Application & Configuration** section.
+There are four examples for running PopulationSim, three created using data from the 
+Corvallis-Albany-Lebanon Modeling (CALM) region in Oregon and the other using data from 
+the Metro Vancouver region in British Columbia. 
+
+1. The `example_calm`_ set-up runs PopulationSim single-processed, where a synthetic population is created for the entire modeling region.
+
+2. The `example_calm_mp`_ set-up runs PopulationSim `multi-processed <http://docs.python.org/3/library/multiprocessing.html>`_, where a synthetic population is created for the entire modeling region by simultaneously balancing results using multiple processors on your computer, thereby reducing runtime.
+
+3. The `example_calm_repop`_ set-up runs PopulationSim in the *repop* mode, which updates the synthetic population for a small part of the region. 
+
+4. The `example_survey_weighting`_ set-up runs PopulationSim for the case of developing final weights for a household travel survey. More information on the configuration of PopulationSim can be found in the **Application & Configuration** section.
 
 Example_calm
 ~~~~~~~~~~~~
@@ -84,6 +95,22 @@ Follow the steps below to run **example_calm** set up:
 
   * Review the outputs in the *output* folder
 
+Example_calm_mp
+~~~~~~~~~~~~~~~
+
+Follow the steps below to run **example_calm_mp** multiprocessed set up:
+
+  * Open a command prompt in the example_calm folder
+  * In ``configs_mp\setting.yaml``, set ``num_processes: 2`` to a reasonable number of processors for your machine
+  * Run the following commands:
+
+  ::
+
+   activate popsim
+   python run_populationsim.py -c configs_mp -c configs
+
+  * Review the outputs in the *output* folder
+
 Example_calm_repop
 ~~~~~~~~~~~~~~~~~~
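
Review note on the `example_calm_mp` instructions above: the command passes two config directories (`-c configs_mp -c configs`). The sketch below illustrates one plausible reading, stated here as an assumption and not confirmed by this diff: directories are searched in the order given, and the first one containing a requested file wins. Under that reading, the multiprocess settings override the base ones, and every other file falls through to `configs`. The helper `find_config_file` and the file names are hypothetical.

```python
import os
import tempfile


def find_config_file(file_name, config_dirs):
    """Return the path to file_name in the first directory that has it.

    Hypothetical helper; assumes a first-match-wins search order.
    """
    for config_dir in config_dirs:
        candidate = os.path.join(config_dir, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


def owning_dir(path):
    """Name of the directory that directly contains path."""
    return os.path.basename(os.path.dirname(path))


with tempfile.TemporaryDirectory() as root:
    configs_mp = os.path.join(root, "configs_mp")
    configs = os.path.join(root, "configs")
    os.makedirs(configs_mp)
    os.makedirs(configs)

    # Placeholder files: both directories define settings, only the base
    # directory defines controls.
    for folder, name in [(configs_mp, "settings.yaml"),
                         (configs, "settings.yaml"),
                         (configs, "controls.csv")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("# placeholder\n")

    search_order = [configs_mp, configs]
    settings_dir = owning_dir(find_config_file("settings.yaml", search_order))
    controls_dir = owning_dir(find_config_file("controls.csv", search_order))
    missing = find_config_file("not_there.yaml", search_order)

print(settings_dir, controls_dir, missing)
```

If the search order works this way, only the files that differ for multiprocessing need to live in `configs_mp`.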
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -5,64 +5,75 @@
 Introduction
 =============
 
-PopulationSim is an open platform for population synthesis and survey weighting.  It emerged from Oregon DOT's desire to 
-build a shared, open, platform that could be easily adapted for statewide, regional, and urban 
-transportation planning needs.
+PopulationSim is an open platform for population synthesis and survey weighting.  It emerged from
+`Oregon DOT <https://www.oregon.gov/odot>`_'s desire to build a shared, open, platform that could 
+be easily adapted for statewide, regional, and urban transportation planning needs.
 
 What is population synthesis?
 -----------------------------
-Activity Based Models (ABMs) operate in a micro-simulation framework , wherein the travel choices of person and household decision-making agents are predicted by applying Monte Carlo methods to behavioral models. This requires a data set of households and persons representing the entire population in the modeling region. Population synthesis refers to the process used to create this data.
-
-The required inputs to population synthesis are a population sample and marginal distributions. The population 
-sample is commonly referred to as the *seed or reference sample* and the marginal distributions are referred to 
-as *controls or targets*. **The process of expanding the seed sample to match the marginal distribution 
-is termed population synthesis.** The software tool which implements this population synthesis process 
+Activity based travel demand models such as `ActivitySim <http://www.activitysim.org>`_ operate at an individual
+level, wherein the travel choices of person and household decision-making agents are predicted by applying 
+Monte Carlo methods to behavioral models. This requires a data set of households and persons representing 
+the entire population in the modeling region. Population synthesis refers to the process used to create this data.
+
+The required inputs to population synthesis are a population sample and marginal distributions (or control totals). 
+The population sample is commonly referred to as the *seed or reference sample* and the marginal distributions are 
+commonly referred to as *controls or targets*. **The process of expanding the seed sample to match the marginal 
+distribution is termed population synthesis.** The software tool which implements this population synthesis process 
 is termed as a **Population Synthesizer**.
 
 What does a Population Synthesizer produce?
 -------------------------------------------
 The objective of a population synthesizer is to generate a synthetic population for 
-a modeling region. The main outputs from a population synthesizer include lists of persons and households 
-representing the entire population of the modeling region. These databases include household and person-level 
-attributes of interest. Examples of attributes at the household level include household income, household size, housing type, and number of vehicles. Examples of person attributes include  
+a modeling region. The main outputs from a population synthesizer include tables of persons and households 
+representing the entire population of the modeling region. These tables also include household and person-level 
+attributes of interest. Examples of attributes at the household level include household income, household size, housing 
+type, and number of vehicles. Examples of person attributes include  
 age, gender, work\school status, and occupation. Depending on the use case, a population synthesizer may also 
 produce multi-way distribution of demographic variables at different geographies to be used as an input 
-to aggregate travel models. In the case of PopulationSim specifically, an additional option is also included to 
-modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified list of persons and households.
+to aggregate (four-step) travel models. In the case of PopulationSim specifically, an additional option is also included to 
+modify an existing regional synthetic population for a smaller geographical area. In this case, the outputs are a modified 
+set of persons and households.
 
 How does a population synthesizer work?
 ---------------------------------------
 The main inputs to a population synthesizer are disaggregate population samples and marginal control
-distributions. In the United States, the disaggregate population sample is typically obtained from the Census Public Use Microdata Sample (PUMS), but other sources, such as a household travel survey, can also be used. The seed sample should 
-include demographic variables corresponding to each marginal control termed as *controlled variables* (e.g., 
-household size, household income, etc.). The seed sample could also include other variables of interest but not 
-necessarily controlled via marginal controls. These are termed as *uncontrolled variables*. The seed sample should also include an initial weight on each household record. 
-
-Base-year marginal distributions of person and household-level attributes of interest are available from Census. For future years, marginal distributions are either held constant, or forecasted.  Marginal distributions can be for both household or person level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone or County). PopulationSim allows controls to be specified at multiple geographic levels. 
-
-The objective of a population synthesizer is to 
-generate household weights which satisfies the marginal control distributions. This is achieved by use of 
-a data fitting technique. The most common fitting technique used by various population synthesizers is the 
-Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used to obtain joint distributions of demographic 
-variables. Then, random sampling from PUMS generates the baseline synthetic population. 
+distributions. In the United States, the disaggregate population sample is typically obtained from the `Census Public Use 
+Microdata Sample (PUMS) <https://www.census.gov/programs-surveys/acs/microdata.html>`_, but other sources, such as a household 
+travel survey, can also be used. The seed sample should include demographic variables corresponding to each marginal control 
+termed as *controlled variables* (e.g., household size, household income, etc.). The seed sample could also include other 
+variables of interest but not necessarily controlled via marginal controls. These are termed as *uncontrolled variables*. 
+The seed sample should also include an initial weight on each household record. 
+
+Current year marginal distributions of person and household-level attributes of interest are available from Census. For 
+future years, marginal distributions are either held constant, or forecasted.  Marginal distributions can be for both 
+household or person level variables and are specified at a specific geography (e.g., Block Groups, Traffic Analysis Zone 
+or County). PopulationSim allows controls to be specified at multiple geographic levels. 
+
+The objective of a population synthesizer is to generate household weights that satisfy the marginal control
+distributions. This is achieved by use of a data fitting technique. The most common fitting technique used by various 
+population synthesizers is the Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used 
+to obtain joint distributions of demographic  variables. Then, random sampling from PUMS generates the baseline synthetic 
+population. 
 
 One of the limitations of the simple IPF method is that it does not incorporate both household and person 
 level attributes simultaneously. Some population synthesizers use a heuristic algorithm called the 
 Iterative Proportional Updating Algorithm (IPU) to incorporate both person and household-level variables in the fitting procedure. 
 
-Besides IPF, entropy 
-maximization algorithms have been used as a fitting technique. In most of the entropy based methods, 
+Besides IPF, entropy maximization algorithms have been used as a fitting technique. In most of the entropy based methods, 
 the relative entropy is used as the objective function. The relative entropy based optimization ensures 
 that the least amount of new information is introduced in finding a feasible solution. The base entropy 
 is defined by the initial weights in the seed sample. The weights generated by the entropy maximization 
 algorithm preserves the distribution of initial weights while matching the marginal controls. This is an 
-advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization based list balancing to match controls specified at various geographic levels.
+advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization 
+based list balancing to match controls specified at various geographic levels.
 
-Once the final weights 
-have been assigned, seed sample is expanded using these weights to generate a synthetic population. Most 
+Once the final weights have been assigned, the seed sample is expanded using these weights to generate a synthetic population. Most 
 population synthesizers create distributions using final weights and employ random sampling to expand the
 seed sample. PopulationSim uses Linear Programming to convert the final weights to integer values and expands 
-the seed sample using these integer weights. For detailed description of PopulationSim algorithm, please refer to the TRB paper link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To learn more about PopulationSim application and configuration, please follow the content index below. 
+the seed sample using these integer weights. For detailed description of PopulationSim algorithm, please refer to the TRB paper 
+link in the :ref:`docs` section. For information on software implementation refer to :ref:`core_components` and :ref:`model_steps`. To 
+learn more about PopulationSim application and configuration, please follow the content index below. 
 
 How does population synthesis work for survey weighting?
 --------------------------------------------------------
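
Review note on the `docs/index.rst` text above: the Iterative Proportional Fitting (IPF) procedure it mentions can be sketched in a few lines. This is a minimal two-dimensional illustration in plain Python with made-up seed counts and targets. It is not PopulationSim's method; the text states that PopulationSim uses entropy-maximization-based list balancing instead. The function name `ipf` and all numbers are assumptions for illustration.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a 2-D seed table until its row and column sums match the targets.

    Illustrative sketch of classic IPF; assumes the targets share one total.
    """
    table = [row[:] for row in seed]  # copy so the seed is left untouched
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = target / row_sum
                table[i] = [value * factor for value in table[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns match exactly after step 2, so check the rows.
        gap = max(abs(sum(table[i]) - target)
                  for i, target in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Made-up example: rows are household size classes, columns are income classes.
seed = [[1.0, 2.0], [3.0, 4.0]]
row_targets = [40.0, 60.0]
col_targets = [30.0, 70.0]

fitted = ipf(seed, row_targets, col_targets)
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(2)]
print(row_sums, col_sums)
```

Each pass multiplies whole rows or whole columns by a constant, so the fitted table keeps the seed's cross-product ratio while matching both sets of marginals.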

diff --git a/docs/software.rst b/docs/software.rst
@@ -9,8 +9,8 @@ This page describes the PopulationSim software implementation and how to contrib
 
 The implementation starts with
 the ActivitySim framework, which serves as the foundation for the software.  The framework, as briefly described
-below, includes features for data pipeline management, expression handling, testing, etc.  Built upon the
-framework are additional core components for population synthesis such as balancers and integerizers.
+below, includes features for data pipeline management, expression handling, multiprocessing, testing, etc.  Built upon 
+the framework are additional core components for population synthesis such as balancers and integerizers.
 Built upon the population synthesis core components are the model steps that make up a PopulationSim run,
 such as the inputs pre-processor, setting up the data structures, doing the initial seed balancing, etc.
 
@@ -42,7 +42,8 @@ being implemented in the ActivitySim framework means:
 * Model Orchestrator
 
   * `ORCA <https://github.com/UDST/orca>`__ is used for running the overall model system and for defining dynamic data tables, columns, and injectables (functions). ActivitySim wraps ORCA functionality to make a Data Pipeline tool, which allows for re-starting at any model step.
-
+  * Support for `multiprocessing <http://docs.python.org/3/library/multiprocessing.html>`_ to reduce runtime
+
 * Expressions
 
   * Model expressions are in CSV files and contain Python expressions, mainly pandas/numpy expression that operate on the input data tables. This helps to avoid modifying Python code when making changes to the model calculations.
@@ -236,4 +237,5 @@ Release Notes
   * v0.4 - transfer to ActivitySim.org
   * v0.4.1 - package updates
   * v0.4.2 - validation script in Python
-  * v0.4.3 - allow non-binary incidence 
+  * v0.4.3 - allow non-binary incidence 
+  * v0.5 - support for multiprocessing