don't remove existing hyphens when `use.hyphening` #311

salim-b · 2017-10-14T01:17:55Z

Consider the following reprex:

library(dplyr)
library(pander)

data_frame(a = "This a hopefully _self-explanatory_ example of unduly removed hyphens.") %>% 
    pandoc.table(split.cells = 20, use.hyphening = TRUE)
#> 
#> -------------------
#>          a         
#> -------------------
#>  This a hopefully  
#>  _selfexplanatory_ 
#>  example of unduly 
#>  removed hyphens.  
#> -------------------

In the output table, the hyphen in the word self-explanatory gets removed (because the parameter `rm.hyph` of `koRpus::hyphen()` is left at its default value of `TRUE`).

I'm not familiar with the code and therefore didn't submit a pull request (yet). But I guess it would be enough to add the argument rm.hyph = FALSE to the following line of helpers.R:

pander/R/helpers.R, line 404 in 32e0f75:

koRpus::hyphen(s, hyph.pattern = 'en.us', quiet = TRUE)@hyphen[1, 2]

What do you think? Alternatively, if you see any benefit or use case in having the hyphenator remove existing hyphens beforehand (I don't), an additional parameter could be introduced which passes the option on to koRpus::hyphen.


daroczig · 2017-10-14T22:22:53Z

Good catch, thanks! I'd be open to this, but I'm not sure it works:

> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = TRUE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"
> koRpus::hyphen('foobar and foo self-explanatory ok', hyph.pattern = 'en.us', rm.hyph = FALSE)@hyphen[1, 2] 
Hyphenation (language: en.us)
[1] "foo-bar an-d f-oo s-el-fexpl-an-atory ok"

Any ideas?

salim-b · 2017-10-14T23:08:22Z

Any ideas?

Well, I'm not familiar at all with the koRpus package, but maybe I was wrong in assuming that the parameter rm.hyph was responsible for the unduly removed hyphens.

Anyway, as far as I understand it, the hyphen() function expects a character vector of words, not sentences.

Consider this modification of your first example:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})


"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = TRUE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-ex-plana-to-ry" "ok"

Now interestingly, if rm.hyph is set to FALSE, the hyphenation isn't correct anymore:

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(stringr)
})


"foobar and foo self-explanatory ok" %>% str_split(pattern = " ") %>% unlist() %>% 
  koRpus::hyphen(hyph.pattern = "en.us", rm.hyph = FALSE, quiet = TRUE) %>% 
  slot("hyphen") %$% word
#> [1] "foo-bar"             "and"                 "foo"                
#> [4] "self-e-xplan-at-ory" "ok"

So there might be a reason the default value is TRUE... 😜

Am I right that you're currently feeding whole sentences to the hyphen() function in helpers.R? If so, splitting the sentences into words beforehand (and leaving rm.hyph at its default value) might solve the issue.
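Splitting a cell into words before hyphenating could look roughly like the base-R sketch below. It is not pander's code: `hyphenate_by_word()` and `toy_hyphenator()` are hypothetical names, and the toy hyphenator only stands in for koRpus so that the example runs without it. With koRpus installed, the hyphenator argument could wrap the call from the examples above, e.g. `function(w) koRpus::hyphen(w, hyph.pattern = "en.us", quiet = TRUE)@hyphen$word` (untested assumption).

```r
# A minimal sketch, not pander's actual code: hyphenate a cell word by word
# instead of handing the whole sentence to the hyphenator.
# `hyphenate_by_word()` is a hypothetical helper; `hyphenate_words` must take a
# character vector of words and return a hyphenated vector of the same length.
hyphenate_by_word <- function(text, hyphenate_words) {
  # split on runs of whitespace (like str_split(pattern = " ") above)
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  # hyphenate all words in one call, then glue them back together
  paste(hyphenate_words(words), collapse = " ")
}

# Toy stand-in for koRpus (demonstration only): breaks words longer than
# five characters after their third character.
toy_hyphenator <- function(words) {
  ifelse(nchar(words) > 5,
         paste0(substr(words, 1, 3), "-", substring(words, 4)),
         words)
}

hyphenate_by_word("foobar and foo ok", toy_hyphenator)
#> [1] "foo-bar and foo ok"
```

Since koRpus::hyphen() accepts a whole character vector of words, the sketch calls the hyphenator once per cell rather than once per word.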
