How to run PUF-based tests on PRs? #2479

Open
MattHJensen opened this issue Aug 25, 2020 · 6 comments

MattHJensen (Contributor) commented Aug 25, 2020

Currently, only CPS-based tests (and other tests that do not require external test data) are run automatically on PRs to GH.

This issue is to encourage the exploration of running PUF-based tests on PRs as well.

Running the full test suite on PRs would give contributors greater ownership of their commits, result in fewer problems/bugs found during the release process, and be a significant step towards adopting continuous delivery or continuous deployment (to be discussed further in a forthcoming issue).

Challenges:

  1. Tests will run longer and require more resources.
  2. The automated testing will introduce a new security risk for the PUF.

(1) is unlikely to present a significant problem, but it is something to be aware of.

For (2), a primary concern is that someone without access to the PUF could write a new test that prints or otherwise discloses secure data and then use the automated tests, triggered by a PR, to release those data.

One approach to resolving (2) would be to run the tests automatically only on PRs from whitelisted developers; for other developers, a reviewer would need to trigger the tests manually. We would need to find a testing/automation service that offers this. There may be other solutions. All ideas regarding automation services or other solutions are welcome!

hdoupe (Collaborator) commented Aug 25, 2020

@MattHJensen I dropped a similar comment in @Peter-Metz's state-taxdata project, but you could read the PUF file from the OSPC AWS bucket like C/S does. To do this, you'd just need to set your AWS access variables as GitHub secrets, and then you can access them from a GitHub Actions run.
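
For concreteness, a minimal sketch of what that download step might look like in Python, assuming the AWS secrets are exposed to the job as environment variables; the bucket and key names below are hypothetical placeholders, not the actual OSPC locations:

```python
# Minimal sketch: read the PUF from an S3 bucket inside a GitHub Actions run.
# Assumes AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are stored as GitHub
# secrets and exported to the job's environment; the bucket and key names
# are hypothetical placeholders.
import os
import boto3

def download_puf(dest="puf.csv"):
    """Download the PUF to a local file using credentials from the environment."""
    s3 = boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    s3.download_file("ospc-data-files", "puf.csv", dest)  # hypothetical bucket/key
    return dest
```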

At some point, I read that only users with collaborator-level access to a repository have the ability to read GH secrets, but I'm having trouble finding that link right now. There is this quote from the GitHub Actions docs (emphasis is mine):

You can use and read encrypted secrets in a workflow file if you have access to edit the file. For more information, see "Access permissions on GitHub."

However, I'd like to test this out before making the PUF available.

GitHub will redact secret values if they are printed, too. There's some more info here: https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets#naming-your-secrets

MaxGhenis (Contributor) commented

Could the synthetic PUF help?

jdebacker (Member) commented

The challenges of data privacy seem hard to overcome, given that test output is printed to the pytest log files on Travis.

What is the reason behind testing with the PUF exactly? Do these tests pick up (1) errors in the logic? (2) errors related to using variables that don't exist in the PUF? (3) errors in assuming certain values for variables that might differ in the PUF vs CPS? (4) something else?

I think that if you identify the issues motivating the PUF-based tests, you could perhaps write tests that check for those issues without using the PUF.

E.g., I don't see very good testing of the internal logic of Tax-Calculator. test_calcfunctions.py mostly checks the number of arguments, but does not test that, given certain input values, certain output values are returned. If each function were tested in that way, errors in tax logic would be caught quickly, without the need to use any particular dataset.
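
As an illustration, a minimal sketch of that style of test, using a deliberately simplified, hypothetical function rather than an actual Tax-Calculator calcfunction (whose signatures are much longer):

```python
# Minimal sketch of input/output unit testing for a calcfunction-style routine.
# `simple_std_deduction` is a hypothetical, simplified stand-in; the real
# functions in taxcalc/calcfunctions.py take many more arguments.
def simple_std_deduction(MARS, STD):
    """Return the standard deduction amount for filing status MARS (1-based)."""
    return STD[MARS - 1]

def test_simple_std_deduction():
    STD = [12000.0, 24000.0, 12000.0, 18000.0, 24000.0]
    # Given concrete input values, assert the exact expected outputs.
    assert simple_std_deduction(MARS=1, STD=STD) == 12000.0
    assert simple_std_deduction(MARS=2, STD=STD) == 24000.0
```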

rickecon (Member) commented

We have had this question in OG-USA, both with regard to the CI computing time constraint and with regard to the PUF. We also have an additional question about storing data files used only for testing outside of the GitHub repository.

  1. CI time constraint. Travis CI allows for two processors and 50 minutes of testing per pull request. The full OG-USA testing suite takes 12-13 hours using 7 processors on a fast machine, so our CI battery is a select group of tests. For major refactoring, we run the full suite of tests on either Jason's or my local machine. Those offline tests include tests that use the PUF.

  2. Proprietary data in testing. The only way we see that you can use the PUF in your testing is if the tests are all run remotely on a protected server. Compute.studio can do this, but most of our other PSLmodels projects do not currently pay for that service. For now, Jason and I have just decided to run all our big refactoring pull requests through the full testing suite manually. But as projects mature, we will need to invest in more sophisticated infrastructure. And to @MaxGhenis' comment, a synthetic PUF would solve this privacy problem.

  3. Storing testing data off repo. OG-USA has GBs of non-private data used for verification in our unit testing suite. If we had a good way to store this off of the repository and just use the requests library to load it into tests, that would shrink the repo's footprint. However, it takes time to download data, which would eat into our Travis CI compute time limit described in point 1. Furthermore, a package probably doesn't need to ship with all of its testing data.
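
A minimal sketch of what that could look like, assuming a hypothetical download URL and a pytest session-scoped fixture:

```python
# Minimal sketch: pull large, non-private verification data from off-repo
# storage at test time instead of committing it. The URL is a hypothetical
# placeholder, not an actual OG-USA data location.
import pytest
import requests

DATA_URL = "https://example.org/og-usa-test-data/reference_results.csv"  # hypothetical

@pytest.fixture(scope="session")
def reference_results(tmp_path_factory):
    """Download the verification data once per test session and return its path."""
    path = tmp_path_factory.mktemp("data") / "reference_results.csv"
    resp = requests.get(DATA_URL, timeout=60)
    resp.raise_for_status()
    path.write_bytes(resp.content)
    return path
```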

cc: @MattHJensen @jdebacker @hdoupe

MattHJensen (Contributor, Author) commented Aug 28, 2020

@jdebacker asked why the PUF is used in testing:

What is the reason behind testing with the PUF exactly? Do these tests pick up (1) errors in the logic? (2) errors related to using variables that don't exist in the PUF? (3) errors in assuming certain values for variables that might differ in the PUF vs CPS? (4) something else?

The PUF tests help with (1) and (3) in that we keep track of baselines and can see deviations from them, but those purposes are also backstopped by our validation scripts against TAXSIM, the CPS-based tests, and some other unit testing. As you note, additional unit testing could be added as a further backstop, but I don't see that as a panacea because there's nothing that would prevent logic errors introduced in calcfunctions from being duplicated in the unit tests. The TAXSIM validation scripts and comparison of PUF-based results against other modeling outfits provide third-party validation, if incomplete and imperfect.

We also use PUF for (2) to identify which parameters are "compatible" with the PUF, i.e., which parameters influence PUF-based results. See https://github.com/PSLmodels/Tax-Calculator/blob/3b9f4642ba53ed8e5d01155f543ad5d0e6f2f526/taxcalc/tests/test_compatible_data.py.
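
The general idea is roughly the following sketch, which uses a hypothetical run_puf_model helper and is not the actual logic of test_compatible_data.py:

```python
# Minimal sketch of the general idea behind a "compatible data" check:
# a parameter is compatible with the PUF if perturbing it changes PUF-based
# results. `run_puf_model` is a hypothetical callable that runs Tax-Calculator
# on the PUF under a given reform and returns aggregate tax liability.
def is_puf_compatible(param_name, alt_value, run_puf_model):
    """Return True if changing `param_name` moves aggregate PUF-based liability."""
    baseline = run_puf_model(reform={})
    reformed = run_puf_model(reform={param_name: alt_value})
    return reformed != baseline
```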

Mostly we use the PUF for (4), "something else." Specifically, we keep an extensive record of baseline and reform results for Tax-Calculator when used with the puf.csv. These results give users a historical record of the integrated Tax-Calculator + TaxData capabilities and allow users to roughly compare Tax-Calculator baseline and reform results against other sources like CBO. We rely on the reform comparisons regularly for development purposes when adding new parameters and reform capabilities.
See https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_reforms.py,
https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_puf_var_stats.py, https://github.com/PSLmodels/Tax-Calculator/blob/master/taxcalc/tests/test_pufcsv.py.

MattHJensen (Contributor, Author) commented Sep 9, 2020

Any reactions to this approach?

  1. Set up a private repository that mirrors Tax-Calculator (see the sketch after this list).
  2. Configure testing in the private mirror to only run PUF-based tests.
  3. Tax-Calculator maintainers check the private repo before merging PRs.
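
For step 1, a minimal sketch of the sync step, assuming a hypothetical private remote URL with push access configured (e.g. via a deploy key):

```python
# Minimal sketch: push the current branch to a private mirror whose CI runs
# the PUF-based tests. The remote URL is a hypothetical placeholder.
import subprocess

PRIVATE_MIRROR = "git@github.com:example-org/Tax-Calculator-private.git"  # hypothetical

def push_to_mirror(branch="master"):
    """Push HEAD to the private mirror so its test suite runs against this code."""
    subprocess.run(["git", "push", PRIVATE_MIRROR, f"HEAD:{branch}"], check=True)
```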

Downsides (and mitigating factors):

  • Would require figuring out mirroring with an additional wrinkle of different test configs. (But this will be good knowledge for PSL)
  • Expenses may be associated with testing for the private repo. (But OSPC can likely fund it)
  • Requires maintainers to check tests in two repos before merging. (But that's not much work)
