Full text / txt download of the reviews #20

Open
alexhebing opened this issue Jun 16, 2020 · 1 comment
Labels
question Further information is requested

Comments

@alexhebing
Contributor

When creating the test corpus (i.e. The Dinner and Harry Potter), Haidee explicitly asked for a txt version of the corpus, i.e. one file per review that contains only the review text, with some metadata in the filename. I assume this makes it easier to work with (subsets of) the data in applications like Voyant.

I can, and probably will, share the full corpus with Haidee and Gys-Walt, including the txts, once the scraping is done. However, given the number of titles, I expect to scrape over 100,000 reviews. At that scale, picking out the txts for a subset by hand is virtually impossible.

Question: is it conceivable / doable to add a full-text download to I-analyzer that would allow downloading a subset of reviews / documents? There is also a script I developed for @JosedeKruif that can do this type of thing (here), but its disadvantage is that customers would have to run Python locally (and set up a virtualenv etc.). @BeritJanssen: what do you think, is a txt download from I-analyzer feasible?

@alexhebing added the "question" label Jun 16, 2020
@BeritJanssen
Member

This is certainly possible and shouldn't be hard to achieve. However, how to package this kind of behaviour on the frontend is a harder question. Potentially, we could add an extra corpus setting that replaces the csv download functionality with a download of the txts as a zip. Should we discuss this tomorrow?
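
For discussion, the "txts as zip" idea could look roughly like the sketch below. It is not I-analyzer code: the function names (`safe_filename`, `reviews_to_zip`) and the review fields (`id`, `title`, `language`, `text`) are hypothetical, and it assumes the selected reviews are already available as a list of dicts with a unique `id`. Each review becomes one txt file holding only the review text, with the metadata encoded in the filename.

```python
import io
import re
import zipfile


def safe_filename(*parts, extension="txt"):
    """Join metadata parts into a filesystem-safe filename (hypothetical helper)."""
    # Replace every run of characters other than letters, digits and hyphens
    # with a single underscore, then trim underscores at the edges.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "_", str(part)).strip("_") for part in parts]
    return "_".join(c for c in cleaned if c) + "." + extension


def reviews_to_zip(reviews):
    """Return the bytes of a zip archive containing one txt file per review.

    Assumes each review is a dict with the (hypothetical) keys
    'id', 'title', 'language' and 'text'.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for review in reviews:
            name = safe_filename(review["title"], review["language"], review["id"])
            # The file body is the review text only; metadata lives in the name.
            archive.writestr(name, review["text"])
    return buffer.getvalue()
```

The returned bytes could then be served as the download for whatever subset the user has selected. Building the archive in memory keeps the sketch short; for subsets approaching the full 100,000 reviews, writing to a temporary file would probably be the safer choice.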


2 participants