DuBois v.2

@bmschmidt bmschmidt released this 03 Oct 18:13
· 12 commits to master since this release

Two changes, plus pulling master to be up-to-date with the pandas branch now that it's proved its worth in local production. Renaming Pandas to DuBois, because he's an author and I just taught him.

Also removing "alpha" and "beta" from all future pre-v1.0 releases; instability should be assumed.

1. Adding a new syntactic option to drop groups from the comparison set.

Ordinarily, a query like {"groups":["year","library"],"counttype":["TextPercent"]} will give, for each interaction of year and library, the percentage of the texts in that year and library that come from that particular library in that year. That's not interesting. (By definition, it will always be 100%.)

On the other hand,

  • {"groups":["year","*library"],"counttype":["TextPercent"]} will drop the library grouping on the superset and give the percentage of all texts for that year that come from the library, so each column will sum to 100%;
  • {"groups":["*year","library"],"counttype":["TextPercent"]} will drop the year grouping on the superset and give the percentage of all texts for that library that come from that year;
  • {"groups":["*year","*library"],"counttype":["TextPercent"]} will drop both and give the percentage of all texts for the library defined by search_limits or constrain_limits contained in each cell: the sum of all the TextPercent cells in the entire return set should be 100. (Though it may not be if year or library is undefined for some items.)
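
To make the arithmetic concrete, here is a minimal sketch of how the starred groups change the denominator. It is plain Python over a made-up list of texts, not the actual query code; the function name text_percent and the sample data are assumptions for illustration only, and it ignores the case where a field is undefined.

```python
from collections import Counter

def text_percent(texts, groups):
    """Percent of texts per cell; groups prefixed with "*" are dropped
    from the comparison (superset) grouping. Hypothetical helper."""
    names = [g.lstrip("*") for g in groups]              # every grouping field
    kept = [g for g in groups if not g.startswith("*")]  # fields kept on the superset

    # Numerator: texts in each full cell, e.g. (year, library).
    cell_counts = Counter(tuple(t[n] for n in names) for t in texts)
    # Denominator: texts in each superset cell, grouped only by unstarred fields.
    superset_counts = Counter(tuple(t[n] for n in kept) for t in texts)

    result = {}
    for cell, count in cell_counts.items():
        superset_key = tuple(v for n, v in zip(names, cell) if n in kept)
        result[cell] = 100 * count / superset_counts[superset_key]
    return result

# Made-up sample: three texts from 2000 (two in library A, one in B), one from 2001.
texts = [
    {"year": 2000, "library": "A"},
    {"year": 2000, "library": "A"},
    {"year": 2000, "library": "B"},
    {"year": 2001, "library": "A"},
]

plain = text_percent(texts, ["year", "library"])     # every cell is 100
by_year = text_percent(texts, ["year", "*library"])  # cells sum to 100 within each year
overall = text_percent(texts, ["*year", "*library"]) # all cells together sum to 100
```

With no stars the numerator and denominator are the same set, which is why the plain query always returns 100. Starring every group leaves an empty superset key, so each cell is compared against the full set of texts.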

Combining this syntax with that for defining a separate compare_limits will produce some pretty nonsensical queries, so it's generally better to do just one or the other.

2. Support for the topic-model extension.

Allows really fine-grained analysis of Mallet topic models at the token level. Blog post forthcoming, hopefully.