Conversation
Will help later with docker setup.
Added requirements.txt.
Can you add the dependencies to
It would also be nice to have this for concordance.
benchmark_originality.py (Outdated)

```python
def check_original(_: int):
    submission_1, submission_2 = gen_submission(), gen_submission()
```
Can we benchmark each user against, say, 1000 other users rather than generating new pairs each time? There may also be gains to be had from pre-processing the data.
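A minimal sketch of that setup, with hypothetical names (`gen_submission` and `is_original` stand in for whatever the repo actually provides): pre-generate a fixed pool of submissions once during setup, then benchmark each user against the same 1000 pooled submissions instead of generating fresh pairs.

```python
import random

def gen_submission(rng, n_rows=10):
    # stand-in for the repo's submission generator (name is an assumption)
    return [rng.random() for _ in range(n_rows)]

def build_pool(n_users=1000, seed=0):
    # built once during setup and reused for every benchmarked user
    rng = random.Random(seed)
    return [gen_submission(rng) for _ in range(n_users)]

def check_user(submission, pool, is_original):
    # compare one user's submission against every pooled submission
    return sum(1 for other in pool if is_original(submission, other))
```

Because the pool is fixed, any pre-processing of it (sorting, hashing, indexing) is paid for once rather than per comparison.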
Can you include, say, 50 user submissions in the repo? The test could then add random noise to them during setup to create a few thousand more 'almost unique' submissions. They could also be committed as a gzipped file that gets decompressed during setup.
That's a good idea. I'll create an issue for it, but for the time being, generating them in the way you're doing it looks fine (except that we should check each user against 1000 other users).
I realized my description above may not be very clear. I mean that the benchmark should capture this sort of optimization: #12
The concordance benchmark samples in batches from the example data and adds normally distributed noise to create a larger benchmark set. If the PR is accepted for the bounty, my numerai account is
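That scheme, sketched in stdlib Python (function and parameter names are assumptions; the actual benchmark presumably does this vectorized with numpy):

```python
import random

def make_benchmark_data(example_rows, n_rows, sigma=0.05, seed=0):
    # sample rows (with replacement) from the example data and add
    # normally distributed noise to build a larger benchmark set
    rng = random.Random(seed)
    data = []
    for _ in range(n_rows):
        base = rng.choice(example_rows)
        data.append([x + rng.gauss(0.0, sigma) for x in base])
    return data
```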
Looks good, except for the one issue above.
benchmark_concordance.py (Outdated)

```python
from concordance import has_concordance
from concordance import get_sorted_split

N_SAMPLES = 100_000
```
As much as I like the new number format, we use Python 3.5 internally, so this should be 100000 (underscores in numeric literals were only added in Python 3.6).
Ah, runtime.txt said 3.6.1, so that's why. I'll change the number format.
Thanks! I've sent 30 NMR to
Simple benchmark for `original()`. More benchmarking could be done if `originality.py` abstracts away the managers, or by implementing in-memory versions of them. Requires two new dependencies, `multiprocessing` and `randomstate` (a drop-in replacement for numpy's `RandomState` that plays well with `multiprocessing`).