Don't require a period at the end of a line for Scansion().scan_text() #1247
Hi @sjhuskey, I hope my radio silence doesn't come across as me not appreciating your bug reports. I truly do, but as I explained in another issue, I simply lack the time to work on these properly. Would you consider submitting a pull request? You could give it a shot and I would make light edits if necessary.
No worries, @kylepjohnson. I understand about having to manage several projects, and I'm grateful for the resource that you and the CLTK team have created. Yes, I can probably come up with a patch for this, but not for a few weeks.
The (quick and dirty) solution for this would be to just put "\n" in the list of stops recognised by the _tokenizer function. For me this gives […], and then, with a \n manually added at the end, it would give: […]. But this would make it a bit of a pain to run the module on other blocks of text with odd formatting, e.g. those copied from the TLG or Diogenes with Teubner line divisions. Maybe a class argument to set the scansion module to poetry mode?
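A minimal sketch of the poetry-mode idea, in Python. The names split_units, SENTENCE_STOPS and poetry_mode are hypothetical and are not CLTK's actual _tokenizer or Scansion signature; the stop set is likewise an assumption. Newlines end a unit only when poetry_mode is on; otherwise they are treated as spaces, so prose pasted with Teubner-style line divisions is not chopped up.

```python
from typing import List

# Assumed stop characters, for illustration only; CLTK's real list may differ.
SENTENCE_STOPS = {".", ";", "·"}


def split_units(text: str, poetry_mode: bool = False) -> List[str]:
    """Split text into units for scansion (hypothetical helper).

    If poetry_mode is True, every newline ends a unit, so each verse line is
    scanned on its own. Otherwise newlines are treated as plain spaces.
    A trailing fragment is kept even if the text has no final stop.
    """
    stops = set(SENTENCE_STOPS)
    if poetry_mode:
        stops.add("\n")

    units: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in stops:
            units.append("".join(current))  # close the current unit
            current = []
        elif ch == "\n":
            current.append(" ")  # prose mode: a line break is just whitespace
        else:
            current.append(ch)
    units.append("".join(current))  # keep any unterminated final unit

    # Collapse runs of whitespace and drop empty units.
    return [" ".join(u.split()) for u in units if u.strip()]


text = "first line\nsecond line"
print(split_units(text))                    # ['first line second line']
print(split_units(text, poetry_mode=True))  # ['first line', 'second line']
```

Keeping the flag off by default means existing prose input behaves as before, which addresses the concern about TLG or Diogenes text with odd formatting.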
Yes, I had been thinking about that quick and dirty solution, and I had the same qualms about it. I think your long-term solution is best. I'll give that a try.
I finally had a chance to try the long-term solution suggested by @SDCLA. I'm using Aeschylus Ag. 55–59 as a
I did this:
The output is […]. I inserted a period at the end of each line of […]
At least it treated each line individually, but the scansion is still incorrect. I'm going to think about this some more.
Is your feature request related to a problem? Please describe.
The Scansion().scan_text() method won't produce a result unless the text string to be scanned ends in a period ('.').
Describe the solution you'd like
It would be helpful to allow the user to specify the delimiter. In my case, I'd like to use the newline character (i.e., \n) so that I can scan multiple lines of poetry.
Describe alternatives you've considered
I have resorted to inserting a period at the end of every line, but that's tedious. It's also problematic if you're trying to study the relationship between line breaks and sentence termination.
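Until a delimiter option exists, the period-insertion workaround can at least be automated. This sketch assumes nothing about CLTK beyond the period requirement reported here: add_line_stops is a hypothetical helper that splits on a user-chosen delimiter (newline by default) and appends a period to any line that lacks one, so its output could be passed to Scansion().scan_text() in place of hand-edited text.

```python
def add_line_stops(text: str, delimiter: str = "\n") -> str:
    """Ensure every line ends in a period (hypothetical helper).

    The text is split on `delimiter`, surrounding whitespace is stripped,
    blank lines are dropped, and a period is appended only to lines that
    do not already end in one. Lines are rejoined with the same delimiter.
    """
    lines = [line.strip() for line in text.split(delimiter)]
    fixed = [line if line.endswith(".") else line + "." for line in lines if line]
    return delimiter.join(fixed)


print(add_line_stops("first line\nsecond line."))
# first line.
# second line.
```

It inherits the caveat above: once every line ends in a period, line breaks and genuine sentence ends are no longer distinguishable in the input.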
Additional context
None.