Code review notes #26

Open
nialldcms opened this issue Jul 21, 2020 · 0 comments

Correct data being collected

  • The only issue I've noticed is that, in many of the collected records, the text fields of the subreddit submissions and comments do not seem to contain the agreed search terms (e.g. '5g coronavirus'). Why is this? We should only be collecting data on the relevant search terms.

    For example, this submission appears in the table. Is this just data from a test run of the code?

  • On the submissions data, I think we want to collect upvotes and downvotes instead of score. Score is the net of these (upvotes minus downvotes), so it doesn't show how controversial a post is. Having the two counts separately might give better insight into how any disinformation is regarded on Reddit, and might be useful if we want to do any bespoke ML training later.

  • Further to this point, we should be careful about how we use this data if we are only collecting the "newest possible" posts, and about how we handle duplicates, since this data changes over time. Do we want to change any functionality to account for this?

  • Data in the comments table can be related back to the original submission by removing the comment ID from the URL string.
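On the first point (records that don't contain the search terms), one option is a post-collection filter. This is a minimal sketch, assuming "contains the search terms" means every word of the query appears as a whole word, ignoring case and punctuation. `contains_search_terms` and the `body` field are hypothetical names; for submissions we would probably check the title and self-text together.

```python
import re
from typing import List

def _words(text: str) -> List[str]:
    # Lower-case and split into runs of letters/digits/underscores, so punctuation is ignored.
    return re.findall(r"\w+", text.lower())

def contains_search_terms(text: str, query: str) -> bool:
    """True if every word of `query` appears as a whole word somewhere in `text`."""
    if not text:
        return False
    words = set(_words(text))
    return all(term in words for term in _words(query))

# Hypothetical rows; 'body' stands in for whichever text column we store.
rows = [
    {"id": "a1", "body": "Is 5G spreading coronavirus?"},
    {"id": "b2", "body": "Coronavirus case numbers today"},
]
relevant = [r for r in rows if contains_search_terms(r["body"], "5g coronavirus")]
print([r["id"] for r in relevant])  # ['a1']
```

Whole-word matching means '5g' does not match inside a token like '45gb'; a looser substring check would keep more rows but also more noise.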
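On the upvotes/downvotes point: if separate counts are not available directly, an approximate split can be recovered from two values PRAW's `Submission` does expose, `score` and `upvote_ratio`. This is a sketch under that assumption. From score = ups - downs and upvote_ratio = ups / (ups + downs), we get ups = score * ratio / (2 * ratio - 1). `estimate_votes` is a hypothetical helper, and the result is only an estimate because `upvote_ratio` is rounded and Reddit may fuzz scores.

```python
from typing import Optional, Tuple

def estimate_votes(score: int, upvote_ratio: float) -> Optional[Tuple[int, int]]:
    """Estimate (upvotes, downvotes) from net score and upvote ratio.

    Rearranges score = ups - downs and ratio = ups / (ups + downs).
    Returns None when the ratio is 0.5, where the split cannot be recovered.
    """
    denominator = 2 * upvote_ratio - 1
    if abs(denominator) < 1e-9:
        return None
    ups = round(score * upvote_ratio / denominator)
    downs = ups - score
    return ups, downs

print(estimate_votes(60, 0.8))  # (80, 20)
```

A ratio close to 0.5 is exactly the "controversial" case we care about, and it is also where this estimate is least reliable, so storing `upvote_ratio` itself alongside `score` may be more useful than the derived counts.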
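On relating comments back to submissions: a minimal sketch, assuming comment URLs follow the usual Reddit permalink shape `/r/<subreddit>/comments/<submission_id>/<title_slug>/<comment_id>/`. `submission_url_from_comment` is a hypothetical helper and the example URL is made up. If the comments were collected with PRAW, the comment's `link_id` (the parent submission's fullname, prefixed `t3_`) may be a simpler join key than the URL.

```python
from urllib.parse import urlparse

def submission_url_from_comment(comment_url: str) -> str:
    """Strip the trailing comment ID from a Reddit comment permalink."""
    parsed = urlparse(comment_url)
    segments = [s for s in parsed.path.split("/") if s]
    # Assumed shape: r/<subreddit>/comments/<submission_id>/<title_slug>/<comment_id>
    if len(segments) != 6 or segments[0] != "r" or segments[2] != "comments":
        raise ValueError(f"unexpected comment permalink: {comment_url!r}")
    submission_path = "/" + "/".join(segments[:5]) + "/"
    # Keep scheme and host (if present); drop any query string or fragment.
    return parsed._replace(path=submission_path, query="", fragment="").geturl()

url = "https://www.reddit.com/r/conspiracy/comments/abc123/5g_and_coronavirus/def456/"
print(submission_url_from_comment(url))
# https://www.reddit.com/r/conspiracy/comments/abc123/5g_and_coronavirus/
```

Raising on an unexpected shape, rather than blindly chopping the last segment, makes malformed rows visible instead of silently producing wrong joins.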

Duplicates code

This looks to be functioning properly, but we might want to consider whether old data should be updated with newer data. For example, if a row with id X already appears in the bq table, should that old row be replaced with the newly sourced data?
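If we do decide that newer data should replace older rows, the rule could look like the sketch below. It assumes a hypothetical `collected_at` timestamp column recorded at fetch time (not something the current table is known to have), and shows only the "latest snapshot wins" logic in plain Python. In BigQuery itself this would more likely be a `MERGE` statement that matches on `id` and updates when the incoming row is newer.

```python
from datetime import datetime, timezone
from itertools import chain
from typing import Dict, Iterable, List

def upsert_latest(existing: Iterable[dict], incoming: Iterable[dict]) -> List[dict]:
    """Combine stored and newly fetched rows, keeping one row per 'id'.

    When an id appears more than once, the row with the latest 'collected_at'
    wins, so re-fetched submissions overwrite older snapshots.
    """
    latest: Dict[str, dict] = {}
    for row in chain(existing, incoming):
        stored = latest.get(row["id"])
        if stored is None or row["collected_at"] > stored["collected_at"]:
            latest[row["id"]] = row
    return list(latest.values())

old = [{"id": "x", "score": 12, "collected_at": datetime(2020, 7, 20, tzinfo=timezone.utc)}]
new = [
    {"id": "x", "score": 40, "collected_at": datetime(2020, 7, 21, tzinfo=timezone.utc)},
    {"id": "y", "score": 3, "collected_at": datetime(2020, 7, 21, tzinfo=timezone.utc)},
]
print([(r["id"], r["score"]) for r in upsert_latest(old, new)])  # [('x', 40), ('y', 3)]
```

The alternative design is to keep every snapshot (append-only) and pick the latest per id at query time. That costs more storage but preserves how votes on a post evolve over time.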
