Andrew Goldstone's dfr-browser produces lovely visualizations, and it appears to require only some .json input. It would be nice if the TMT could generate that input.
After looking closely at Goldstone's prepare-data script, I think this should be doable. That script requires just three things to start: the raw MALLET state, gzipped (output-state.gz in the current TMT naming scheme); the ID field from MALLET's standard doc-topics output (doc-topic.txt in the current TMT naming scheme); and a metadata file.
There are specific requirements for the metadata, and that's the only major complication, since we can't expect any particular kind of metadata from any particular project. However, I think we can work around most of that as long as we enforce this one guarantee: the first column of the metadata file must be matchable to the list of file IDs that you get from cut -f 2 doc-topic.txt > ids.txt. That's not so different from the system we're already using; we reinterpret "ID" as "filename," but the two approaches are -- I believe -- functionally indistinguishable. We need to verify that, but if that's correct, then this will be pretty easy!
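To make that guarantee concrete, here is a rough Java sketch of the check, not a port of anything in prepare-data. The class and method names (`MetadataIdCheck`, `docTopicIds`, `metadataIds`, `unmatchedIds`) are made up for illustration. It assumes that doc-topic.txt is tab-separated with the ID in the second field (mirroring `cut -f 2`), that any header line starts with `#`, and that the metadata file uses a single-character delimiter with no quoted fields. A real metadata file may need a proper CSV parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: checks that every doc-topics ID appears in the
// first column of the metadata file.
class MetadataIdCheck {

    // Mirrors `cut -f 2 doc-topic.txt`: second tab-separated field per line.
    // Assumption: header/comment lines start with '#' and are skipped.
    static List<String> docTopicIds(List<String> docTopicLines) {
        List<String> ids = new ArrayList<>();
        for (String line : docTopicLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length > 1) {
                ids.add(fields[1]);
            }
        }
        return ids;
    }

    // Collects the first column of the metadata file.
    // Assumption: no quoted fields containing the delimiter.
    static Set<String> metadataIds(List<String> metadataLines, String delimiter) {
        Set<String> ids = new LinkedHashSet<>();
        String splitter = Pattern.quote(delimiter);
        for (String line : metadataLines) {
            if (line.isEmpty()) {
                continue;
            }
            ids.add(line.split(splitter, -1)[0]);
        }
        return ids;
    }

    // Returns the doc-topics IDs that have no matching metadata row.
    static List<String> unmatchedIds(List<String> docIds, Set<String> metaIds) {
        List<String> missing = new ArrayList<>();
        for (String id : docIds) {
            if (!metaIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    // Usage: java MetadataIdCheck doc-topic.txt metadata.csv
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: MetadataIdCheck <doc-topics> <metadata>");
            return;
        }
        List<String> docIds = docTopicIds(Files.readAllLines(Paths.get(args[0])));
        Set<String> metaIds = metadataIds(Files.readAllLines(Paths.get(args[1])), ",");
        for (String id : unmatchedIds(docIds, metaIds)) {
            System.out.println("No metadata row for: " + id);
        }
    }
}
```

If `unmatchedIds` comes back empty, every document has a metadata row and the "ID as filename" reinterpretation holds for that corpus. A metadata header row just ends up as one extra entry in the set, so it does not affect the result.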
Though perhaps tedious... since it will involve translating prepare-data into one or more Java classes... wah wah.