When creating the test corpus (i.e. The Dinner and Harry Potter), Haidee explicitly asked for a txt version of the corpus, i.e. one file per review that contains only the review text, with some metadata in the filename. I assume this makes it easier to work with (subsets of) the data in applications like Voyant.
I can, and probably will, share the full corpus with Haidee and Gys-Walt, including txts, once the scraping is done. However, given the number of titles, I expect to scrape over 100,000 reviews. At that scale, picking out the txts that belong to a subset by hand is virtually impossible.
Question: is it conceivable / doable to add a full-text download to I-analyzer that would allow downloading a subset of reviews / documents? There is also a script I developed for @JosedeKruif that can do this type of thing (here), but the disadvantage is that customers would have to run Python locally (and set up a virtualenv etc). @BeritJanssen: what do you think, is a txt download from I-analyzer feasible?
This is certainly possible and shouldn't be hard to achieve. How to package this kind of behaviour on the frontend is a harder question, though. Potentially, we could add a corpus setting that replaces the CSV download with a download of the txts as a zip. Should we discuss this tomorrow?
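To make the "txts as zip" idea a bit more concrete, here is a minimal sketch of the packaging step in Python. It is not I-analyzer code: `build_txt_zip`, `safe_filename_part`, and the field names `text`, `title`, and `id` are all hypothetical, and it assumes the selected documents are already available as a list of dicts. It writes one txt per review, with some metadata in the filename, into an in-memory zip.

```python
import io
import re
import zipfile


def safe_filename_part(value):
    """Reduce a metadata value to characters that are safe in a filename."""
    return re.sub(r"[^A-Za-z0-9-]+", "_", str(value)).strip("_")


def build_txt_zip(documents, text_field="text", filename_fields=("title", "id")):
    """Return the bytes of a zip archive holding one txt file per document.

    All field names are assumptions for illustration; a real corpus
    would define its own text field and filename metadata.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, doc in enumerate(documents):
            # A zero-padded index keeps filenames unique even when
            # two documents share the same metadata values.
            parts = [f"{index:06d}"]
            parts += [safe_filename_part(doc.get(field, "")) for field in filename_fields]
            name = "_".join(part for part in parts if part)
            # The file body is the review text only, as requested.
            archive.writestr(f"{name}.txt", doc.get(text_field, ""))
    return buffer.getvalue()


if __name__ == "__main__":
    reviews = [
        {"id": 1, "title": "The Dinner", "text": "A gripping read."},
        {"id": 2, "title": "Harry Potter", "text": "Magical."},
    ]
    with open("reviews.zip", "wb") as handle:
        handle.write(build_txt_zip(reviews))
```

Building the archive in memory is fine for a small subset; for a very large selection it would probably be better to write to a temporary file or stream the response instead.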