This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

Concordance update #26

Closed

wants to merge 5 commits
Conversation

cpurta
Contributor

@cpurta cpurta commented Oct 17, 2017

This is in reference to #4, to fix the concordance check. It follows the suggestion that test targets be used in the concordance check. We pull a sample of the test tournament data, with the sample size configurable via a method parameter, so that the issue of data leakage on the test set is avoided.

The metric is the percent difference between the validation log_loss and the test log_loss; a submission is considered concordant if that difference is under 10%. That threshold can be changed if so desired.
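Roughly, the check works like the sketch below. This is a minimal illustration only; the function name `is_concordant` and parameters such as `sample_fraction` are placeholders for the example, not the exact code in this PR.

```python
import numpy as np
from sklearn.metrics import log_loss

def is_concordant(valid_targets, valid_preds, test_targets, test_preds,
                  sample_fraction=0.25, threshold=0.10, seed=None):
    """Concordant if the percent difference between the validation log loss
    and the (sampled) test log loss is under `threshold`."""
    test_targets = np.asarray(test_targets)
    test_preds = np.asarray(test_preds)
    rng = np.random.RandomState(seed)
    # Only a configurable sample of the test rows is used, to limit leakage.
    idx = rng.choice(len(test_targets),
                     size=int(len(test_targets) * sample_fraction),
                     replace=False)
    valid_ll = log_loss(valid_targets, valid_preds)
    test_ll = log_loss(test_targets[idx], test_preds[idx])
    # Percent difference, taken relative to the validation log loss.
    return abs(valid_ll - test_ll) / valid_ll < threshold
```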

This unfortunately breaks the concordance_benchmark testing script that is in place, since there are no targets on the test set. So in order to test this, we may need to add past tournament validation data that includes targets on the test set as the sample data, in order for the tests to work.

We may also need to break up the get_competition_split function into one similar to get_competition_variables_from_df, so that we can split the sample validation data into validation and test sets without downloading the dataset; see the sketch below.
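A DataFrame-based split might look roughly like this. It is a sketch only; the function name and the `data_type` column values are assumptions based on the tournament data format, not the existing API.

```python
import pandas as pd

def get_competition_split_from_df(df: pd.DataFrame):
    """Split an already-loaded tournament DataFrame into its validation
    and test portions by the data_type column, without downloading anything."""
    validation = df[df["data_type"] == "validation"]
    test = df[df["data_type"] == "test"]
    return validation, test
```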

It seems reasonable to ask for ~50 NMR for this PR, since this is roughly half of the work laid out in the improvements PDF.

Numerai username: cpurta

cpurta and others added 5 commits September 13, 2017 17:08
Update with master branch in original fork
Update with numerai/submission-criteria master changes
Update with master from base fork
Allowing concordance to be a measure of the percent difference in log
loss between the validation and test sets. This assumes that the test
set gathered from the competition data will have test targets. We are
only pulling a random 1/4 of the test set as a comparison against the
submission data, so that the "data leakage" mentioned in the PDF
is avoided.
@zoso95
Contributor

zoso95 commented Oct 26, 2017

This seems like a pretty dangerous change for a couple of reasons:

  1. This isn't actually checking model outputs, just performance. That means a user could submit two wildly different models and have them judged concordant simply because both happen to have a low log loss.
  2. There is a reasonable amount of non-IID-ness in our data. That is why we were doing the clustering, but this approach doesn't take it into account (see the sketch below).
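For contrast, a check based on the model's outputs rather than its score could compare the distribution of the submission's predictions across splits, within clusters of the feature space. The sketch below is a minimal illustration of that idea, not the existing implementation: the function name, cluster count, and `max_ks` threshold are all assumed for the example.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.cluster import KMeans

def output_concordance(features, valid_mask, preds, n_clusters=5, max_ks=0.12):
    """Cluster the feature space to account for non-IID structure, then
    compare the distribution of the submission's validation vs. test
    predictions within each cluster using a Kolmogorov-Smirnov test."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    ks_stats = []
    for c in range(n_clusters):
        in_cluster = labels == c
        valid_preds = preds[in_cluster & valid_mask]
        test_preds = preds[in_cluster & ~valid_mask]
        if len(valid_preds) and len(test_preds):
            ks_stats.append(ks_2samp(valid_preds, test_preds).statistic)
    # Concordant if predictions look similarly distributed across splits.
    return np.mean(ks_stats) < max_ks
```

Comparing distributions within clusters, instead of over the whole set, is what keeps the non-IID structure of the data from masking a discordant submission.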

@cpurta
Contributor Author

cpurta commented Nov 3, 2017

Yeah, that does make sense. I tested that out and confirmed that one could game concordance using that technique. It seems that using test targets should be ruled out of any concordance test, and the PDF should be updated to rule that out as a solution.

@cpurta cpurta closed this Nov 3, 2017
xanderdunn added a commit that referenced this pull request Nov 11, 2017
Removed user_id and competition_id from the API.