
don't remove existing hyphens when use.hyphening #311

Open
salim-b opened this issue Oct 14, 2017 · 2 comments

Comments

@salim-b

salim-b commented Oct 14, 2017

Consider the following reprex:

library(dplyr)
library(pander)

data_frame(a = "This a hopefully _self-explanatory_ example of unduly removed hyphens.") %>% 
    pandoc.table(split.cells = 20, use.hyphening = TRUE)
#> 
#> -------------------
#>          a         
#> -------------------
#>  This a hopefully  
#>  _selfexplanatory_ 
#>  example of unduly 
#>  removed hyphens.  
#> -------------------

In the output table, the hyphen in the word self-explanatory is removed (because the rm.hyph parameter of koRpus::hyphen() is left at its default value of TRUE).

I'm not familiar with the code and therefore didn't submit a pull request (yet). But I guess it would be enough to add the argument rm.hyph = FALSE to the following line of helpers.R:

koRpus::hyphen(s, hyph.pattern = 'en.us', quiet = TRUE)@hyphen[1, 2]

What do you think? Alternatively, if you see any benefit/use case in having the hyphenator remove existing hyphens beforehand (I don't), an additional parameter could be introduced that passes the option on to koRpus::hyphen().

@daroczig
Member

Good catch, thanks! I'd be open to this, but I'm not sure it works:

> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = TRUE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"
> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = FALSE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"

Any ideas?

@salim-b
Author

salim-b commented Oct 14, 2017

> Any ideas?

Well, I'm not familiar with the koRpus package at all, and maybe I was wrong to assume that the rm.hyph parameter was responsible for the unduly removed hyphens.

Anyway, as far as I understand it, the hyphen() function expects a character vector of words, not sentences.

Consider this modification of your first example:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})


"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = TRUE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-ex-plana-to-ry" "ok"

Now interestingly, if rm.hyph is set to FALSE, the hyphenation isn't correct anymore:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})


"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = FALSE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-e-xplan-at-ory" "ok"

So there might be a good reason that the default value is TRUE... 😜

Do I understand correctly that you're currently feeding whole sentences to the hyphen() function in helpers.R? If so, splitting the sentences into words beforehand (and leaving rm.hyph at its default value) might solve the issue.
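
A minimal sketch of that word-splitting idea, using base R only. The names hyphenate_sentence() and korpus_words() are hypothetical and are not taken from pander's helpers.R. The koRpus call assumes the hyphen() interface used in the examples above, where the "hyphen" slot has a word column.

```r
# Hypothetical helper (not from pander): split a sentence into words,
# hand the word vector to a word-level hyphenator, and glue the result
# back together with single spaces.
hyphenate_sentence <- function(s, hyphenate_words) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  paste(hyphenate_words(words), collapse = " ")
}

# Assumed wrapper around koRpus::hyphen() as used earlier in this thread;
# rm.hyph is deliberately left at its default. koRpus is only needed
# when this function is actually called.
korpus_words <- function(words) {
  koRpus::hyphen(words, hyph.pattern = "en.us", quiet = TRUE)@hyphen$word
}

# Usage (requires koRpus to be installed):
# hyphenate_sentence("foobar and foo self-explanatory ok", korpus_words)
```

Passing the hyphenator in as an argument keeps the splitting logic separate from koRpus, so it can be tried with any function that maps a character vector of words to a vector of the same length.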
