Make the TMT speak dfr-browser #51

senderle · 2017-01-22T19:01:34Z

Andrew Goldstone's dfr-browser produces lovely visualizations, and it appears to require only some .json input. It would be nice if the TMT could generate that input.

senderle · 2017-02-04T18:11:34Z

After looking closely at Goldstone's prepare-data script, I think this should be doable. That script needs just three inputs: the raw MALLET state, gzipped (output-state.gz in the current TMT naming scheme); the ID field from MALLET's standard doc-topics output (doc-topic.txt in the current TMT naming scheme); and a metadata file.
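To give a feel for the first of those inputs, here is a rough Python sketch (not a port of prepare-data, and not TMT code) that tallies topic-word counts from a gzipped state file. It assumes the usual MALLET state layout: header lines starting with `#`, then one whitespace-separated row per token whose last two fields are the word type and the topic number. The function name `topic_word_counts` is made up for illustration.

```python
import gzip
from collections import Counter


def topic_word_counts(state_lines):
    """Count (topic, word) pairs from the lines of a MALLET state file.

    Assumption: each data row ends with "<word> <topic>", and header
    rows start with '#'.
    """
    counts = Counter()
    for line in state_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment rows and blank lines
        parts = line.split()
        word, topic = parts[-2], parts[-1]
        counts[(int(topic), word)] += 1
    return counts


def counts_from_gzip(path_or_file):
    """Open a gzipped state file (path or file object) in text mode and count."""
    with gzip.open(path_or_file, "rt", encoding="utf-8") as handle:
        return topic_word_counts(handle)
```

Something like `counts_from_gzip("output-state.gz")` would then give a per-topic word tally, which is roughly the kind of aggregate a browser-style visualization needs.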

There are specific requirements for the metadata, and that's the only major complication, since we can't expect any particular kind of metadata from any particular project. However, I think we can work around most of that as long as we enforce this one guarantee: the first column of the metadata file must be matchable to the list of file IDs that you get from cut -f 2 doc-topic.txt > ids.txt. That's not so different from the system we're already using; we reinterpret "ID" as "filename," but the two approaches are -- I believe -- functionally indistinguishable. We need to verify that, but if that's correct, then this will be pretty easy!
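The guarantee above is easy to check mechanically. Here is a minimal Python sketch of that check, under a few assumptions that are mine rather than anything the TMT does today: the metadata is comma-separated with no header row, rows of doc-topic.txt starting with `#` are headers to skip, and the helper names are invented for illustration.

```python
import csv


def doc_topic_ids(lines):
    """Second tab-separated field of each row, roughly like `cut -f 2`."""
    ids = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: '#' rows are headers, not documents
        fields = line.split("\t")
        if len(fields) > 1:
            ids.append(fields[1])
    return ids


def metadata_ids(lines):
    """First column of each metadata row (assumes CSV with no header row)."""
    return [row[0] for row in csv.reader(lines) if row]


def unmatched_ids(doc_topic_lines, metadata_lines):
    """IDs present in the doc-topics output but missing from the metadata."""
    known = set(metadata_ids(metadata_lines))
    return [doc_id for doc_id in doc_topic_ids(doc_topic_lines)
            if doc_id not in known]


# Toy example with in-memory data instead of real files.
doc_topics = [
    "#doc\tname\ttopic\tproportion\n",
    "0\tfile:/a.txt\t0\t0.9\n",
    "1\tfile:/b.txt\t1\t0.8\n",
]
metadata = ["file:/a.txt,Title A,1900\n"]
print(unmatched_ids(doc_topics, metadata))  # ['file:/b.txt']
```

An empty result would mean every document ID has a metadata row, which is the one guarantee proposed here.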

Though perhaps tedious... since it will involve translating prepare-data into one or more Java classes... wah wah.

senderle added the enhancement label Jan 24, 2017

senderle mentioned this issue Feb 4, 2017

Find a filename column even when it isn't the first column and normalize the CSV #35

Closed
