Sampling architecture #81

Open
wants to merge 11 commits into
base: updating_schema_inference

Hsankesara
Member

Added user and data sampling mechanisms. The user sampling mechanism provides options to choose users by fraction, by count, or by ID. The data sampling mechanisms support selecting data within a time range, by count, or by fraction.
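
For illustration only, the three user sampling options could be sketched roughly as follows. This is not the code in this PR: `sample_users` and its parameters are hypothetical names, and the `fraction`, `count`, and `userid` method strings are assumed from the config snippet under review.

```python
import random
from typing import Iterable, List, Optional, Sequence


def sample_users(
    user_ids: Iterable[str],
    method: str,
    *,
    fraction: Optional[float] = None,
    count: Optional[int] = None,
    ids: Optional[Sequence[str]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: pick a subset of users by fraction, count, or ID."""
    users = sorted(set(user_ids))  # de-duplicate and fix the order
    rng = random.Random(seed)      # a seed makes the sample reproducible

    if method == "fraction":
        if fraction is None or not 0 < fraction <= 1:
            raise ValueError("fraction must be in the interval (0, 1]")
        # Keep at least one user, but never more than are available.
        k = min(len(users), max(1, round(len(users) * fraction)))
        return sorted(rng.sample(users, k))

    if method == "count":
        if count is None or count < 0:
            raise ValueError("count must be a non-negative integer")
        return sorted(rng.sample(users, min(count, len(users))))

    if method == "userid":
        wanted = set(ids or [])
        # Unknown IDs are ignored here; raising an error is another option.
        return [u for u in users if u in wanted]

    raise ValueError(f"unknown user sampling method: {method!r}")
```

Whether unknown IDs should be ignored or rejected is a design choice this sketch does not settle.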

@Hsankesara Hsankesara requested a review from afolarin May 1, 2024 10:38
Member

@afolarin afolarin left a comment

LGTM. Possibly consider the changes below now or in the next iteration.

## TODO: For future
#data_sampling:
## Possible methods: time, count, fraction
## starttime and endtime format is dd-mm-yyyy hh:mm:ss in UTC timezone
Member

@afolarin afolarin May 17, 2024

Would it be useful to be able to specify an array of ranges? That way, if you wanted a single range you could specify just that, and if you wanted a sequence of ranges you could provide those as well.
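
A minimal sketch of how a config value that is either one range or a list of ranges might be handled, assuming each range is a mapping with `starttime` and `endtime` keys in the `dd-mm-yyyy hh:mm:ss` UTC format noted in the config comment. The helper names (`parse_utc`, `normalize_ranges`, `in_any_range`) are hypothetical and do not come from this PR.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple, Union

# dd-mm-yyyy hh:mm:ss, interpreted as UTC (per the config comment)
TIME_FORMAT = "%d-%m-%Y %H:%M:%S"

RangeSpec = Dict[str, str]
TimeRange = Tuple[datetime, datetime]


def parse_utc(text: str) -> datetime:
    """Parse a timestamp string and attach the UTC timezone."""
    return datetime.strptime(text, TIME_FORMAT).replace(tzinfo=timezone.utc)


def normalize_ranges(spec: Union[RangeSpec, List[RangeSpec]]) -> List[TimeRange]:
    """Accept a single {starttime, endtime} mapping or a list of them."""
    if isinstance(spec, dict):
        spec = [spec]  # a single range becomes a one-element list
    ranges = []
    for item in spec:
        start = parse_utc(item["starttime"])
        end = parse_utc(item["endtime"])
        if start > end:
            raise ValueError(f"starttime is after endtime: {item}")
        ranges.append((start, end))
    return ranges


def in_any_range(ts: datetime, ranges: List[TimeRange]) -> bool:
    """True if ts falls inside at least one range (bounds inclusive)."""
    return any(start <= ts <= end for start, end in ranges)
```

With this shape, existing single-range configs would keep working unchanged, while a list of ranges is filtered with the same `in_any_range` check.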

# count: 2
#method: userid
#config:
# userids:
Member

Do we want to standardize on subjectID or userID? For historical reasons, SubjectID is the name we chose for the main ID used on the platform (UserID may be introduced later, once we have the self-enrollment portal). It may not cause much confusion, since this is on the analysis side, but I'd point it out because it is probably sensible to be consistent.


2 participants