This repository has been archived by the owner on Dec 19, 2023. It is now read-only.
This is in reference to #4 and fixes the concordance check. It follows the suggestion that test targets be used in the concordance check. We pull a sample of the test tournament data, with the sample size configurable via a method parameter, so that the issue of data leakage on the test set is solved.
The metric is the percent difference between the validation log_loss and the test log_loss. A model is considered concordant if that difference is under 10%, and the threshold can be changed if desired.
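To make the check concrete, here is a minimal sketch of the idea, not the code in this PR. All function and parameter names (`sample_rows`, `percent_difference`, `is_concordant`, `test_fraction`, `threshold`) are hypothetical, and measuring the difference relative to the validation loss is an assumption; the PR text only says "percent difference".

```python
import math
import random


def log_loss(targets, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(targets)


def sample_rows(targets, probs, fraction, seed=0):
    """Draw a random subset of (target, probability) pairs without replacement."""
    rng = random.Random(seed)
    k = max(1, int(len(targets) * fraction))
    idx = rng.sample(range(len(targets)), k)
    return [targets[i] for i in idx], [probs[i] for i in idx]


def percent_difference(val_loss, test_loss):
    # Assumption: the difference is expressed relative to the validation loss.
    return abs(val_loss - test_loss) / val_loss


def is_concordant(val_targets, val_probs, test_targets, test_probs,
                  test_fraction=0.5, threshold=0.10, seed=0):
    """Compare validation log loss against log loss on a sample of test rows."""
    sampled_targets, sampled_probs = sample_rows(
        test_targets, test_probs, test_fraction, seed
    )
    val_loss = log_loss(val_targets, val_probs)
    test_loss = log_loss(sampled_targets, sampled_probs)
    return percent_difference(val_loss, test_loss) < threshold
```

With a validation loss of 0.690 and a test loss of 0.700, the relative difference is about 1.4%, so the predictions would count as concordant under the default 10% threshold.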
This unfortunately breaks the concordance_benchmark testing script that is in place, since there are no targets on the test set. To test this, we may need to use a past tournament's validation data that includes targets on the test set as the sample data so that the tests work.
We may also need to break up the get_competition_split function into one that is similar to get_competition_variables_from_df, so that we can split the sample validation data into validation and test sets without downloading the dataset.
Seems reasonable to ask for ~50 NMR for this PR, since this is roughly half of the work laid out in the improvements pdf.
Numerai username: cpurta