Full text / txt download of the reviews #20

alexhebing · 2020-06-16T05:45:24Z

When creating the test corpus (i.e. The Dinner and Harry Potter), Haidee explicitly asked for a txt version of the corpus, i.e. a file for each review that contains only the review text, with some metadata in the filename. I assume this makes it easier to work with (subsets of) the data in applications like Voyant.

I can, and probably will, share the full corpus with Haidee and Gys-Walt, including txts, once the scraping is done. However, given the number of titles, I expect to scrape over 100,000 reviews, which makes manually selecting the txts for a subset virtually impossible.

Question: is it conceivable / doable to add a full-text download to I-analyzer that would allow downloading a subset of reviews / documents? There is also a script I developed for @JosedeKruif that can do this type of thing (here), but its disadvantage is that customers would have to run Python locally (and set up a virtualenv etc.). @BeritJanssen: what do you think, is a txt download from I-analyzer feasible?

BeritJanssen · 2020-06-16T14:49:57Z

This is certainly possible and shouldn't be hard to achieve. However, how to package this kind of behaviour on the frontend is a harder question. Potentially, we could add an extra corpus setting that replaces the CSV download functionality with a download of the txts as a zip. Should we discuss this tomorrow?
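
To make the "txts as zip" idea concrete, here is a minimal sketch of how a set of search results might be packaged, assuming each result is a dict. The field names `title`, `language` and `text`, and the helpers `safe_name` and `reviews_to_zip`, are hypothetical and not taken from I-analyzer; the filename scheme is only one possible way to put metadata in the name.

```python
import io
import re
import zipfile


def safe_name(value):
    """Reduce a metadata value to characters that are safe in a filename."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", str(value)).strip("_")


def reviews_to_zip(reviews):
    """Return the bytes of a zip archive holding one txt file per review.

    `reviews` is assumed to be an iterable of dicts with the hypothetical
    keys 'title', 'language' and 'text'.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, review in enumerate(reviews, start=1):
            # Metadata goes into the filename; the file body is only the review text.
            name = "{}_{}_{:06d}.txt".format(
                safe_name(review["title"]), safe_name(review["language"]), index
            )
            # A str passed to writestr is encoded as UTF-8.
            archive.writestr(name, review["text"])
    return buffer.getvalue()
```

Because the archive is built in memory, the resulting bytes could be returned from whatever endpoint currently serves the csv download. For very large subsets, writing to a temporary file instead would probably be safer.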

alexhebing added the "question" label · Jun 16, 2020
