This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

Benchmark #9

Merged
merged 14 commits into numerai:master on Sep 15, 2017

Conversation


@sovaa sovaa commented Sep 13, 2017

Simple benchmark for original(). More benchmarking could be done if originality.py abstracted away the managers, or by implementing in-memory versions of them.

Requires two new dependencies, multiprocessing and randomstate (a drop-in replacement for numpy's RandomState that plays well with multiprocessing).
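A minimal sketch of the benchmark shape this PR describes, assuming hypothetical gen_submission()/check_original() helpers and sizes. The PR itself uses the randomstate package; plain numpy.random.RandomState stands in here to keep the sketch self-contained, and a cheap correlation replaces the real original() call.

import time
from multiprocessing import Pool

import numpy as np

N_ROWS = 10000   # assumed length of a generated submission
N_RUNS = 1000    # assumed number of benchmark iterations


def gen_submission(seed=None):
    """Generate a fake submission of random probabilities (assumption)."""
    rng = np.random.RandomState(seed)
    return rng.uniform(size=N_ROWS)


def check_original(_):
    """One iteration: generate two submissions and compare them."""
    submission_1, submission_2 = gen_submission(), gen_submission()
    # original(submission_1, submission_2) would be called here; a cheap
    # correlation stands in to keep the sketch runnable on its own.
    return float(np.corrcoef(submission_1, submission_2)[0, 1])


if __name__ == '__main__':
    start = time.time()
    with Pool() as workers:
        workers.map(check_original, range(N_RUNS))
    print('%d runs in %.2fs' % (N_RUNS, time.time() - start))

Pool.map spreads the iterations across cores, which is why the per-process random state matters in the real benchmark.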

@philipcmonk
Contributor

Can you add the dependencies to setup.py and requirements.txt?

@philipcmonk
Contributor

It would also be nice to have this for concordance.



def check_original(_: int):
submission_1, submission_2 = gen_submission(), gen_submission()
Contributor

Can we benchmark each user against, say, 1000 other users rather than generate new pairs each time? There may be some potential gains in pre-processing the data.
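A hedged illustration of this suggestion: pre-generate a fixed pool of submissions and compare each one against the other ~1000, so any per-submission pre-processing happens once. All names and sizes are illustrative, not the repo's API.

import numpy as np

N_USERS = 1000   # pool size suggested in the review
N_ROWS = 10000   # assumed submission length

rng = np.random.RandomState(0)
submissions = [rng.uniform(size=N_ROWS) for _ in range(N_USERS)]


def check_original_against_pool(idx):
    """Compare submission `idx` against every other submission in the pool."""
    candidate = submissions[idx]
    # original(candidate, other) would be called here; because `submissions`
    # is fixed, any pre-processing of it can be done once up front.
    return [float(np.corrcoef(candidate, other)[0, 1])
            for j, other in enumerate(submissions) if j != idx]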

Author

Can you include e.g. 50 user submissions in the repo? Then the test can add random noise to them during setup to create a few thousand more 'almost unique' submissions. It could also be a gzipped file that gets decompressed during setup.
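A rough sketch of that setup step, assuming a hypothetical benchmark_submissions.csv.gz with a probability column; the file name, column, and noise scale are all assumptions.

import gzip

import numpy as np
import pandas as pd

N_COPIES = 100   # 50 seed submissions * 100 noisy copies ~ 5000 submissions


def load_seed_submissions(path='benchmark_submissions.csv.gz'):
    """Decompress the bundled submissions during benchmark setup."""
    with gzip.open(path, 'rt') as f:
        return pd.read_csv(f)


def make_noisy_copies(df, rng, column='probability'):
    """Create 'almost unique' variants by adding small Gaussian noise."""
    copies = []
    for _ in range(N_COPIES):
        noisy = df.copy()
        noisy[column] = (noisy[column]
                         + rng.normal(0, 0.01, size=len(noisy))).clip(0, 1)
        copies.append(noisy)
    return copies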

Contributor

That's a good idea. I'll create an issue for it, but for the time being, generating them in the way you're doing it looks fine (except that we should check each user against 1000 other users).

Contributor

I realized my description above may not be very clear. I mean that the benchmark should capture this sort of optimization: #12

@sovaa
Author

sovaa commented Sep 14, 2017

The concordance benchmark samples in batches from the example data and adds normally distributed noise to create larger benchmark data.
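A hedged sketch of that batched-sampling idea; the actual signatures of get_sorted_split()/has_concordance() in concordance.py may differ, so the call site is left as a comment.

import numpy as np

BATCH_SIZE = 5000   # assumed batch size


def sample_batch(example_data, rng):
    """Draw a batch from the example data and add normally distributed noise."""
    idx = rng.choice(len(example_data), size=BATCH_SIZE, replace=True)
    batch = example_data[idx]
    return batch + rng.normal(0, 0.05, size=batch.shape)


def run_concordance_benchmark(example_data, n_batches=100, seed=0):
    rng = np.random.RandomState(seed)
    for _ in range(n_batches):
        batch = sample_batch(example_data, rng)
        # the real benchmark would feed `batch` through get_sorted_split()
        # and has_concordance(); a no-op keeps this sketch self-contained.
        _ = batch.mean()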

If the PR is accepted for the bounty my numerai account is sovaa.

Contributor

@philipcmonk philipcmonk left a comment

Looks good, except the one issue.

from concordance import has_concordance
from concordance import get_sorted_split

N_SAMPLES = 100_000
Contributor

As much as I like the new number format, we use Python 3.5 internally, so this should be 100000.

Author

Ah, runtime.txt said 3.6.1, so that's why. I'll change the number format.
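For reference, the Python 3.5-compatible spelling is simply the plain literal, since underscore digit separators (PEP 515) only arrived in 3.6:

# underscores in numeric literals require Python 3.6+ (PEP 515); for 3.5:
N_SAMPLES = 100000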

@philipcmonk philipcmonk merged commit af92b69 into numerai:master Sep 15, 2017
@philipcmonk
Contributor

Thanks! I've sent 30 NMR to sovaa. It should be in your account now.
