"Divide Input" option needs better documentation. #65

Open
shawngraham opened this issue Nov 23, 2017 · 9 comments
@shawngraham

Just a small question regarding the 'divide input into n-word chunks' option in the advanced settings. When I run that on, say, a 4-gram, I understand what's going on from the point of view of the input, but in the output, the topic keywords are individual words again? A student was asking me this, expecting that the keywords would also be 4-grams, and so I figured, good question...

Thanks! Really appreciate all the work you've done with this tool.

@senderle
Owner

senderle commented Nov 27, 2017 via email

@senderle changed the title from "concerning n-grams" to "'Divide Input' option needs better documentation." on Nov 28, 2017
@senderle self-assigned this on Nov 28, 2017
@shawngraham
Author

Ah! That actually solves another problem I'd been having! Bigrams - yes, that's what my student and I were expecting. Thank you for explaining that.

@shawngraham
Author

Whoops, maybe you didn't want this closed.

@shawngraham reopened this on Nov 29, 2017
@senderle
Owner

Ha! I didn't realize that openers had the power to close -- should have though.

Yeah I'll leave this open for now. If you believe there's such a thing as a documentation bug -- and I do -- then this is definitely one. Of course, you could argue that this issue could serve as the documentation, making this a self-resolving issue. But once it's closed, it no longer shows up in the default search, and nobody can find it.

So the moment you close it, the bug is back, and it has to be reopened. :)

@shawngraham
Author

Sorry about that!

I went and tried it again, armed with my new knowledge of how it works. In the results, when I opened the metadata.csv, a number of my documents were no longer present; that is to say, no results were recorded for them. I had n set to 1000, so I thought perhaps the missing ones were smaller and somehow got folded into the previous 1000-word chunk, but no, the missing ones should have been split into three or four chunks at least. So I'm not sure what's going on there... I can't seem to see the commonality between the documents that get dropped.
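
To make that expectation concrete, here is a minimal sketch of n-word chunking as I understand it. It is not the tool's actual code: the function name `chunk_words`, the whitespace tokenization, and the choice to keep a short final chunk rather than merge or drop it are all assumptions for illustration.

```python
def chunk_words(text, n):
    """Split text into consecutive chunks of at most n words.

    Hypothetical helper: tokenizes on whitespace and keeps a short
    final chunk. The real tool may tokenize or handle the tail differently.
    """
    words = text.split()
    return [words[i:i + n] for i in range(0, len(words), n)]


# A made-up 3500-word document, chunked with n = 1000.
doc = " ".join(f"w{i}" for i in range(3500))
chunks = chunk_words(doc, 1000)

print(len(chunks))               # 4
print([len(c) for c in chunks])  # [1000, 1000, 1000, 500]
```

Under these assumptions every non-empty document yields at least one chunk, and each chunk is still a bag of individual words, which is why the topic keywords come out as single words rather than n-grams.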

@senderle
Owner

Hm, that is more worrying. It's always easier to debug these things with the data that causes the problem -- is it something you can share with me privately or is it too sensitive?

I have also run into bugs in which the output from one model interferes with the input to a new model. I thought I had caught all of them but it's possible I didn't. Could you try re-running the model with a new blank output directory and an input directory that you are 100% sure contains just the original, correct files?

If the same results appear, I'll need some kind of minimal example to reproduce and correct the bug.

@senderle
Owner

I'm also going to open this as a new issue, since it's distinct from the documentation problem.

@shawngraham
Author

I'll give these things a try; the data is not mine to share, unfortunately, but I'll see what I can do. Thanks!

@senderle
Owner

senderle commented Jan 3, 2018

@shawngraham any developments with this? I'd love to fix the problem if I possibly can.
