Should we mine bioguide for biographical info? #572

Open
wilson428 opened this issue Jun 14, 2018 · 4 comments

Comments

@wilson428
Member

Per #571, I found that the best way to figure out whether a member retired and didn't run for a different office was to crawl every page on bioguide and identify unique phrases that matched existing lists. I'm sure it's not perfect, but so far I haven't found any false positives or negatives.

I wonder if it's worth cautiously mining more information from these little squibs. For example, we were debating at work today whether more or fewer members today held a lower elected office before reaching Congress than in past terms. Military service might also be possible to tease out, though at a glance it's worded differently for different wars.

Could be a fun NLP project if carefully, carefully spot-checked. Happy to provide the raw text of the squibs if anyone wants to have a look.
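
A minimal sketch of the phrase-matching idea, assuming the squibs have already been crawled into a dict keyed by bioguide ID. The patterns, labels, and function names here are hypothetical placeholders, not the phrases actually used for #571. Real Bioguide wording varies by era and would need to be collected from the text and spot-checked.

```python
import re

# Hypothetical patterns for illustration only; the real phrase lists
# would come from reading the crawled squibs and checking them by hand.
PATTERNS = {
    "retired": re.compile(
        r"\bnot a candidate for (?:re-?election|renomination)\b", re.I
    ),
    "military_service": re.compile(
        r"\b(?:served in|enlisted in) the (?:United States |U\.S\. )?"
        r"(?:Army|Navy|Marine Corps|Air Force|Coast Guard)\b",
        re.I,
    ),
    "prior_state_office": re.compile(
        r"\bmember of the State "
        r"(?:house of representatives|senate|assembly|legislature)\b",
        re.I,
    ),
}


def tag_bio(text):
    """Return the sorted labels whose pattern appears in one bio text."""
    return sorted(label for label, pat in PATTERNS.items() if pat.search(text))


def tag_all(bios):
    """Map each bioguide ID to its labels; bios is {bioguide_id: raw_text}."""
    return {bid: tag_bio(text) for bid, text in bios.items()}
```

Keeping each label as its own regex makes it easy to add era-specific wordings one at a time and to diff the resulting tags against existing lists when spot-checking.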

@dwillis
Member

dwillis commented Jun 14, 2018

I think this would be worthwhile, at least as a guide for updating. FWIW, we do this manually.

@wilson428
Member Author

I'll take a hack at it if time ever permits. Now for the hard part: Are we still a Python shop or can I use Node? Be gentle.

@JoshData
Member

See #304 where I came up with an over-engineered solution. :)

@wilson428
Member Author

I saw that after I wrote this! Will play around.

3 participants