proposal feedback #1

Open
andrewpbray opened this issue Nov 13, 2019 · 0 comments

These are three thoughtful and interesting proposals.

The first struck me as a bit humdrum, but I think there's actually quite a bit of clever work that you could do here. One interesting angle would be to run a model at several different levels of nesting, where you fit a bunch of models, each time controlling for more and more things. It'd be interesting to track how the residuals for each school shift as you move from one model to the next (and what does a residual mean in this context?). There is also another rich data set called the College Scorecard that you could draw upon. I think this project is the best bet.
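To make the nested-models idea concrete, here is a rough sketch of what tracking each school's residual across a sequence of models could look like. Everything in it is an assumption for illustration: the data are synthetic, and the names `selectivity`, `spending`, and `outcome` are hypothetical placeholders rather than variables from your proposal or from the College Scorecard. It uses plain least squares in standard-library Python so that it runs anywhere; in practice you would likely reach for a modeling package instead.

```python
import random


def ols_residuals(y, columns):
    """Fit y on an intercept plus the given columns by least squares; return residuals."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: [X'X | X'y]
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    return [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]


# Synthetic stand-in data (NOT real schools; all names are hypothetical).
random.seed(0)
schools = [f"school_{i}" for i in range(50)]
selectivity = [random.gauss(0, 1) for _ in schools]
spending = [random.gauss(0, 1) for _ in schools]
outcome = [
    2.0 + 1.5 * s + 0.8 * e + random.gauss(0, 1)
    for s, e in zip(selectivity, spending)
]

# Each model controls for everything the previous one did, plus one more thing.
nested_specs = [
    ("intercept only", []),
    ("+ selectivity", [selectivity]),
    ("+ selectivity + spending", [selectivity, spending]),
]
residuals_by_model = {
    label: ols_residuals(outcome, cols) for label, cols in nested_specs
}

# One row per school: how its residual moves from model to model.
for i in range(3):
    path = [round(residuals_by_model[label][i], 2) for label, _ in nested_specs]
    print(schools[i], path)
```

One way to read the output: under the intercept-only model a school's residual is just its distance from the overall average, and each added control turns it into "distance from what schools with similar covariates would be expected to show." A school whose residual stays large across all the models is unusual in a way the controls do not explain.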

The other two projects get into the wide world of text-as-data, which can take quite a bit of wrangling before you can get usable features out of it. Of the two, the "toxic" one seems more promising if the data exists and if you're able to annotate it without too much trouble. I think this one could work.

The Twitter project would be difficult, I think, because you'd want to be able to make a strong causal claim about Twitter use affecting company performance, but you'd just be working with observational data, and there's likely a complicated web of causation that will muddy any clear conclusion.
