Make the TMT speak dfr-browser #51

Open
senderle opened this issue Jan 22, 2017 · 1 comment

Andrew Goldstone's dfr-browser produces lovely visualizations, and it appears to require only some .json input. It would be nice if the TMT could generate that input.

senderle commented Feb 4, 2017

After looking closely at Goldstone's prepare-data script, I think this should be doable. That script requires just three things to start: the raw MALLET state, gzipped (output-state.gz in the current TMT naming scheme); the ID field from MALLET's standard doc-topics output (doc-topic.txt in the current TMT naming scheme); and a metadata file.

There are specific requirements for the metadata, and that's the only major complication, since we can't expect any particular kind of metadata from any particular project. However, I think we can work around most of that as long as we enforce this one guarantee: the first column of the metadata file must be matchable to the list of file IDs that you get by running cut -f 2 doc-topic.txt > ids.txt. That's not so different from the system we're already using; we reinterpret "ID" as "filename," but the two approaches are, I believe, functionally indistinguishable. We need to verify that, but if that's correct, then this will be pretty easy!
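To make the guarantee concrete, here is a rough sketch of the check in Python (for brevity only; the TMT itself would need this in Java). It assumes doc-topic.txt is tab-separated with the ID in the second field, that header lines start with "#", and that the metadata is a CSV with one header row. The function names are made up for illustration and don't exist in the TMT or in dfr-browser.

```python
import csv
import io


def read_doc_ids(doc_topics_file):
    """Collect the second tab-separated field of each row (like `cut -f 2`)."""
    ids = []
    for line in doc_topics_file:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' header lines carry no document ID.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 1:
            ids.append(fields[1])
    return ids


def read_metadata_keys(metadata_file, has_header=True):
    """Collect the first column of the metadata CSV."""
    reader = csv.reader(metadata_file)
    if has_header:
        next(reader, None)  # assumption: the first row names the columns
    return [row[0] for row in reader if row]


def check_ids(doc_ids, metadata_keys):
    """Return (IDs with no metadata row, metadata rows with no document)."""
    keys = set(metadata_keys)
    ids = set(doc_ids)
    missing = [d for d in doc_ids if d not in keys]
    unused = [k for k in metadata_keys if k not in ids]
    return missing, unused


# Tiny in-memory stand-ins for doc-topic.txt and a project's metadata file.
doc_topics = io.StringIO(
    "#doc name topic proportion\n"
    "0\ta.txt\t0.7\t0.3\n"
    "1\tb.txt\t0.2\t0.8\n"
)
metadata = io.StringIO("filename,title\na.txt,First\nc.txt,Third\n")

missing, unused = check_ids(read_doc_ids(doc_topics), read_metadata_keys(metadata))
print("IDs without metadata:", missing)
print("Metadata without a document:", unused)
```

If both lists come back empty, the first metadata column lines up with the ID list and the rest of the conversion can rely on it.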

Though perhaps tedious... since it will involve translating prepare-data into one or more Java classes... wah wah.
