Sampling architecture #81
base: updating_schema_inference
LGTM. Consider the suggested changes either now or in the next iteration.
## TODO: For future
#data_sampling:
## Possible methods: time, count, fraction
## starttime and endtime format is dd-mm-yyyy hh:mm:ss in UTC timezone
Would it be useful to be able to specify an array of ranges? That way, if you wanted a single range you could specify just that, and if you wanted a sequence of ranges you could provide those as well.
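To make the suggestion concrete, here is a minimal sketch of how a time-based sampler could accept either one range or a list of ranges. The `starttime`/`endtime` keys and the dd-mm-yyyy hh:mm:ss UTC format come from the config comment above; the function names (`parse_utc`, `parse_ranges`, `in_any_range`) are hypothetical and not part of this PR.

```python
from datetime import datetime, timezone

# dd-mm-yyyy hh:mm:ss, as described in the config comment
TIME_FORMAT = "%d-%m-%Y %H:%M:%S"


def parse_utc(value):
    """Parse a config timestamp and mark it as UTC."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def parse_ranges(ranges):
    """Accept a single range dict or a list of range dicts.

    Returns a list of (start, end) datetime pairs.
    """
    if isinstance(ranges, dict):
        ranges = [ranges]  # a single range is treated as a one-element list
    parsed = []
    for r in ranges:
        start, end = parse_utc(r["starttime"]), parse_utc(r["endtime"])
        if start > end:
            raise ValueError(f"starttime is after endtime: {r}")
        parsed.append((start, end))
    return parsed


def in_any_range(timestamp, parsed_ranges):
    """Return True if the timestamp falls inside at least one range (inclusive)."""
    return any(start <= timestamp <= end for start, end in parsed_ranges)
```

With this shape, existing single-range configs would keep working, and a sequence of ranges would only require writing a list under the same key.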
# count: 2
#method: userid
#config:
# userids:
Do we want to standardise on subjectID or userID? For historical reasons, SubjectID is the name we chose for the main ID used on the platform (UserID may be introduced later, when we have the self-enrollment portal). This may not cause much confusion, since it is more on the analysis side, but I'd point out that it is probably sensible to be consistent.
Added user and data sampling mechanisms. The user sampling mechanism offers options to choose users by fraction, by count, and by ID. The data sampling mechanisms cover choosing data between time ranges, by count, and by fraction.
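For readers of the thread, the three user sampling options can be sketched roughly as follows. This is an illustration only, not the implementation in this PR: the function name `sample_users`, the `fraction` config key, and the `seed` parameter are assumptions, while `count`, `userids`, and the `userid` method name follow the config snippet quoted in the review.

```python
import random


def sample_users(user_ids, method, config, seed=None):
    """Pick a subset of users by fraction, by count, or by explicit IDs.

    Hypothetical helper; names and config keys are assumptions for illustration.
    """
    rng = random.Random(seed)      # a seeded generator keeps the sample reproducible
    users = sorted(set(user_ids))  # de-duplicate and fix the order before sampling

    if method == "fraction":
        fraction = config["fraction"]
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        return rng.sample(users, round(len(users) * fraction))

    if method == "count":
        # never ask for more users than are available
        return rng.sample(users, min(config["count"], len(users)))

    if method == "userid":
        wanted = set(config["userids"])
        return [u for u in users if u in wanted]  # unknown IDs are ignored

    raise ValueError(f"unknown sampling method: {method}")
```

For example, `sample_users(ids, "count", {"count": 2})` would return two users chosen at random, and `sample_users(ids, "userid", {"userids": [...]})` would return only the listed users that are actually present.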