Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sn/ignore bad dates #17

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Sn/ignore bad dates #17

wants to merge 4 commits into from

Conversation

snakch
Copy link

@snakch snakch commented Jun 10, 2022

#14

The purpose of this PR is to add a bad_date optional argument to the partition function which allows the user to specify date that should be dropped. This should be useful in a number of downstream applications.

@snakch snakch requested a review from oxinabox June 10, 2022 17:44
@@ -33,7 +33,7 @@ struct PeriodicSelector <: DateSelector
end


function Iterators.partition(dates::StepRange{Date, Day}, s::PeriodicSelector)
function Iterators.partition(dates::StepRange{Date, Day}, s::PeriodicSelector; bad_dates=[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming this to exclude and making default arg typed.

Here and everywhere else

Suggested change
function Iterators.partition(dates::StepRange{Date, Day}, s::PeriodicSelector; bad_dates=[])
function Iterators.partition(dates::StepRange{Date, Day}, s::PeriodicSelector; exclude=Date[])

Comment on lines 84 to 85
@test !any(is_bad.(validation))
@test !any(is_bad.(holdout))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a performance issue here, but in general best practice is:

Suggested change
@test !any(is_bad.(validation))
@test !any(is_bad.(holdout))
@test !any(is_bad, validation)
@test !any(is_bad, holdout)

as it avoids allocations

@@ -118,4 +118,16 @@
@test initial_sets.validation ∩ later_sets.holdout == Set()
end
end

@testset "1 week period, 1 day stride, remove the first two holdout dates" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@testset "1 week period, 1 day stride, remove the first two holdout dates" begin
@testset "1 week period, 1 day stride, remove the first holdout dates" begin

Copy link
Contributor

@mzgubic mzgubic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this approach better than adding another selector (as suggested in #14)?

@oxinabox
Copy link
Member

oxinabox commented Jun 10, 2022

this is equivelent to

map(d->filter(!in(bad_dates), d), partition(dates, selector))

I wonder if we might want to instead define a

filter_partitions(by, parts) = map(d->filter(by, d), parts)

which could be used for this and other things as well

@snakch
Copy link
Author

snakch commented Jun 13, 2022

Why is this approach better than adding another selector (as suggested in #14)?

I considered doing another selector, but modifying partition felt more natural, since excluding dates is something one might want regardless of date selection technique.
On the downside, it is definitely more invasive than defining a new selector, so for other services that use DateSelectors your suggestion might be best overall. Happy to rework this entirely if we decide to go this way

this is equivelent to

map(d->filter(!in(bad_dates), d), partition(dates, selector))

I wonder if we might want to instead define a

filter_partitions(by, parts) = map(d->filter(by, d), parts)

which could be used for this and other things as well

Yes this feels like possibly a good in-between? Again the issue is that client modules that use partition might need to be redefined to allow filter_partitions, so it's maybe a bit more disruptive.

@mzgubic
Copy link
Contributor

mzgubic commented Jun 13, 2022

I considered doing another selector, but modifying partition felt more natural, since excluding dates is something one might want regardless of date selection technique.

Oh, I assumed that selectors are composable, but that does not seem to be the case.

Again the issue is that client modules that use partition might need to be redefined to allow filter_partitions, so it's maybe a bit more disruptive.

Hmm, I'm not sure I understand this entirely. Would it not be possible to just do:

validation, holdout = filter_partitions(!in(excluded_dates), partition(...))

in client code?

@snakch
Copy link
Author

snakch commented Jun 13, 2022

Oh, I assumed that selectors are composable, but that does not seem to be the case. I'll open an issue.

They are, i'd actually misread the issue. I think that solution is much better after some thought, so I'll overhaul the PR in a bit. No need to open another issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants