Versioning TaxData #338

Open
andersonfrailey opened this issue Jul 1, 2020 · 4 comments

Comments

@andersonfrailey
Collaborator

One of the requirements for PSL acceptance is to adopt a versioning system. Additionally, if we do end up making a standalone taxdata package/CS app, we'll need to specify versions as well. Semantic versioning is probably the default choice, but I don't think it's obvious what would fall into each release category given the nature of taxdata. I've listed my ideas for how we would categorize releases below; it would be great to get others' input as well.

Major Release

  • Significant methodology changes, e.g. adopting a new linear programming model, totally changing how we conduct statistical matching or tax unit creation, new imputation methods, etc.
  • Changes in primary PUF or CPS versions. Though if we are able to generalize the code base to work with any year of the CPS and PUF, this would not be relevant, and adding support for new years would fall into the minor change category as mentioned below.

Minor Release

  • Adding new variables
  • Minor changes to tax unit creation logic
  • Adding support for creating files from different years of the PUF and CPS, assuming this doesn't break
    If we were in a situation where we adopted new imputation methods to add new variables to the files would this count as a major or minor change?

"Patch" Release

  • Actual bug patches
  • Updating which CBO projections we use to calculate growth rates

Alternatively, we could adopt a different versioning system, such as calendar versioning, though I'm not sure how that would work with conda.
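
For illustration only, here is a minimal sketch of what a calendar-versioning string could look like. The `calver` helper and the `YYYY.M.D` layout are assumptions made for this example, not something taxdata uses, and the sketch does not address how such versions would behave in conda.

```python
from datetime import date


def calver(release_date: date) -> str:
    """Build a hypothetical YYYY.M.D calendar version from a release date.

    Components are left unpadded so the string is made of plain integers
    separated by dots.
    """
    return f"{release_date.year}.{release_date.month}.{release_date.day}"


# A release cut on the day this issue was opened:
print(calver(date(2020, 7, 1)))  # 2020.7.1
```

Under a scheme like this the version says when a release was made but nothing about whether it is compatible with the previous one, which is the main trade-off against semantic versioning.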

No matter the system we choose, I think our first "release" should come after PR #332 is merged and before we start any major refactoring.

@MattHJensen @MaxGhenis @Peter-Metz @hdoupe @chusloj

@MattHJensen
Contributor

MattHJensen commented Jul 2, 2020

Love the push to get TaxData versioned.

I just reread the semver guidelines. I wonder if TaxData versioning might fit more easily once we start thinking of it as an API to generate datasets rather than as a collection of datasets or as a routine to generate specific datasets.

If we think of TaxData as an API to generate datasets, then everything follows from semver:

A major release is anything that breaks backwards compatibility, i.e., takes away from or redefines the API.
A minor release is anything that adds an enhancement to the API or makes major changes to private code.
A patch release is anything that fixes a bug.
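
As a rough sketch, the three rules above map onto a version bump like this. The `Change` enum and `bump` function are hypothetical names invented for this example; they are not part of taxdata.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical change categories, following the three rules above."""
    BREAKING = "breaking"        # takes away from or redefines the API
    ENHANCEMENT = "enhancement"  # adds to the API or reworks private code
    BUGFIX = "bugfix"            # fixes a bug


def bump(version: str, change: Change) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.BREAKING:
        return f"{major + 1}.0.0"
    if change is Change.ENHANCEMENT:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(bump("1.4.2", Change.BREAKING))     # 2.0.0
print(bump("1.4.2", Change.ENHANCEMENT))  # 1.5.0
print(bump("1.4.2", Change.BUGFIX))       # 1.4.3
```

The only judgment call left is classifying each change, which is what the updated list is meant to settle.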

With this thinking, I'd update @andersonfrailey's table like this:

Major Release

  • Removing the capability to use PUF or CPS versions. Though if we are able to generalize the code base to work with any year of the CPS and PUF, this would not be relevant, and adding support for new years would fall into the minor change category as mentioned below.

Minor Release

  • Significant methodology enhancements, e.g. adding a new linear programming model, updating statistical matching or tax unit creation, new imputation methods, etc.
  • Adding new variables
  • Minor changes to tax unit creation logic that don't break the API
  • Adding support for creating files from different years of the PUF and CPS, assuming this doesn't break
    If we were in a situation where we adopted new imputation methods to add new variables to the files would this count as a major or minor change?
  • Adding new imputation methods for new variables w/o breaking the existing API
  • Updating which CBO projections we use to calculate growth rates (until we can parameterize the choice of CBO projections, after which removing an option would necessitate a major release)

"Patch" Release

  • Actual bug patches
    * Updating which CBO projections we use to calculate growth rates

If we were to adopt this API-centric view for TaxData, the production datasets themselves would still need to be versioned, but we already have a system of versioning puf.csv and cps.csv by date. That system of versioning could certainly be updated, but, if we adopt the API-centric view for TaxData, any such updates could be seen as a mostly separate issue from versioning TaxData.

@andersonfrailey
Collaborator Author

Thanks for your comments, @MattHJensen. I like your ideas for an API centric view of TaxData. With those in mind, I still think our first release should happen once we've updated to the latest CBO projections. Then we can start doing some of the refactoring discussed in #336 to make a more formal and flexible API.

@MattHJensen
Contributor

MattHJensen commented Jul 3, 2020

> With those in mind, I still think our first release should happen once we've updated to the latest CBO projections.

This makes sense and sounds great. If we wait until after the public API is established before incrementing to 1.0.0, then we can use 0.y.z during the period of establishing the public API and not need to worry about distinguishing between major and minor releases. Anything that is a bugfix would increment z, and anything else would increment y. That is, until we establish a public API, at which time we would increment x from 0 to 1 (and have a Zoom party).

Is this in line with your thinking @andersonfrailey? If so, then as far as PSL inclusion goes, I think TaxData can just say it is using semver.
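
A minimal sketch of the 0.y.z rule described above, for illustration. The function name `bump_pre_release` is hypothetical, and resetting z to 0 when y increments is an assumption borrowed from how semver resets the patch number on a minor bump.

```python
def bump_pre_release(version: str, is_bugfix: bool) -> str:
    """Bump a 0.y.z version: bugfixes increment z, anything else increments y."""
    x, y, z = (int(part) for part in version.split("."))
    if x != 0:
        raise ValueError("this rule only applies before 1.0.0")
    if is_bugfix:
        return f"0.{y}.{z + 1}"
    # Assumption: z resets to 0 whenever y is incremented.
    return f"0.{y + 1}.0"


print(bump_pre_release("0.3.1", is_bugfix=True))   # 0.3.2
print(bump_pre_release("0.3.1", is_bugfix=False))  # 0.4.0
```

Moving from 0.y.z to 1.0.0 is deliberately left out of the function, since it is a one-time decision made when the public API is declared stable.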

@andersonfrailey
Collaborator Author

@MattHJensen, this all sounds good to me. I'm working on a rough sketch of what the public API would look like based on the Tax-Data Generator doc you shared in #336 and some of my own ideas. I'll open an issue laying out some ideas sometime this week before I start working on it.
