Before I Sleep: How to be assertive about not testing your data science pipeline #21

Open
utterances-bot opened this issue Oct 21, 2023 · 1 comment

Before I Sleep: How to be assertive about not testing your data science pipeline

https://milesmcbain.com/posts/assertive-programming-for-pipelines/


Great post!

I've never considered using {testthat} for assertive programming. What I like most about this idea is that you're not using a new tool or package to write assertions, so there's no new API to learn. Bonus: {testthat} is very well documented.

I'm a big fan of assertive programming, particularly in {targets} pipelines. I've found this to be a timesaver, not only in avoiding wasted compute, but also (and more importantly) in debugging.

I find it useful to validate the inputs, and sometimes the outputs, of targets. With input assertions, you're making your assumptions about your inputs explicit. Without them, re-running a pipeline with new (external) data can produce explicit errors or, worse, silently incorrect results or a failure further down the pipeline that is difficult to diagnose.
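To make the input side concrete, here is a rough sketch of what I mean. It assumes, as the post describes, that a failed {testthat} expectation raises an error when called outside `test_that()`. The helper name `assert_valid_input()` and the columns `id` and `value` are made up for illustration.

```r
library(testthat)

# Hypothetical helper: states the assumptions about a raw input data frame.
# The column names `id` and `value` are invented for this example.
assert_valid_input <- function(raw) {
  expect_s3_class(raw, "data.frame")
  expect_true(all(c("id", "value") %in% names(raw)))
  expect_false(anyNA(raw$id))
  expect_type(raw$value, "double")
  invisible(raw)
}

# Data that meets the assumptions passes through silently.
good <- data.frame(id = 1:3, value = c(0.5, 1.2, 3.4))
assert_valid_input(good)

# Data with a missing `id` fails at the assertion, not somewhere downstream.
bad <- data.frame(id = c(1L, NA, 3L), value = c(0.5, 1.2, 3.4))
result <- tryCatch(
  assert_valid_input(bad),
  error = function(e) "input assertion failed"
)
```

Calling a helper like this at the top of a target's function means new external data fails at the point where the assumption is written down.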

Output assertions, at least how I've used them, are much more like a unit test: you're explicitly checking that your code does what you think it does, which allows for specific failure messages.
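For the output side, something like the following is roughly what I have in mind. It is only a sketch: `summarise_by_group()` and the `group` and `value` columns are hypothetical, and it relies on {testthat} expectations raising an error when they fail outside `test_that()`.

```r
library(testthat)

# Hypothetical pipeline step: mean of `value` per `group`.
summarise_by_group <- function(dat) {
  out <- aggregate(value ~ group, data = dat, FUN = mean)

  # Output assertions: one row per group and no missing means.
  expect_equal(nrow(out), length(unique(dat$group)))
  expect_false(anyNA(out$value))

  out
}

dat <- data.frame(group = c("a", "a", "b"), value = c(1, 3, 10))
summary_tbl <- summarise_by_group(dat)
```

Because the checks live inside the function, they run every time the target is rebuilt rather than only when a separate test suite is run.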
