This repository has been archived by the owner on May 14, 2018. It is now read-only.

Add function to deal with UTF-8 characters #2

Open
karthik opened this issue Nov 6, 2013 · 3 comments

@karthik
Owner

karthik commented Nov 6, 2013

No description provided.

@hadley

hadley commented Mar 28, 2014

Here's something that I wrote a couple of years ago:

non_ascii <- function(x) {
  any(charToRaw(x) > 0x7F)
}

0x7F = 127; any byte value higher than that implies non-ASCII.
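A quick way to see the byte check in action is to try it on a couple of strings. The sketch below repeats the helper so it runs on its own; `non_ascii_vec` is a hypothetical wrapper (not from this thread), added because, as far as I know, `charToRaw()` only looks at the first element of a longer character vector.

```r
# Helper from the comment above, repeated so this sketch is self-contained.
non_ascii <- function(x) {
  any(charToRaw(x) > 0x7F)
}

# "\u00e9" is e-acute; in UTF-8 it is stored as the two bytes c3 a9,
# both of which are above 0x7F.
non_ascii("cafe")        # FALSE
non_ascii("caf\u00e9")   # TRUE

# Hypothetical wrapper (an assumption, not part of the package):
# apply the check to each element of a character vector.
non_ascii_vec <- function(x) {
  vapply(x, non_ascii, logical(1), USE.NAMES = FALSE)
}

non_ascii_vec(c("cafe", "caf\u00e9", "na\u00efve"))  # FALSE TRUE TRUE
```

Writing the test strings with `\u` escapes keeps the example file itself pure ASCII.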

@karthik karthik self-assigned this Mar 31, 2014
@karthik
Owner Author

karthik commented Mar 31, 2014

Working on this now.

@harrysouthworth

According to its roxygen:
#' This test will check every column in a data.frame for possible unicode characters.
But doesn't it just test the column names, not the contents of the columns?
ut8 <- simplify2array(lapply(colnames(dat),non_ascii))

I just wrote the following, which I think checks the contents of the columns. (Presumably, you do want to test the column names as well, though.) It's just a wrapper for Hadley's function.

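non_ascii_cols <- function(x) {
  # Keep only character and factor columns; drop = FALSE keeps a data.frame
  # even when a single column is selected.
  keep <- vapply(x, function(col) is.character(col) || is.factor(col), logical(1))
  x <- x[, keep, drop = FALSE]

  # Flag a column if any non-missing value contains a byte above 0x7F.
  vapply(x, function(col) {
    vals <- as.character(col)
    any(vapply(vals[!is.na(vals)], non_ascii, logical(1)))
  }, logical(1))
}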
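Since the point above is that both the column names and the column contents probably need checking, here is a minimal sketch that does both in one pass. The name `check_non_ascii` and the shape of its return value are assumptions for illustration, not the package's actual function; it only relies on the `non_ascii` helper posted earlier in the thread.

```r
# Helper from earlier in the thread, repeated so this sketch is self-contained.
non_ascii <- function(x) any(charToRaw(x) > 0x7F)

# Hypothetical helper (an assumption, not the package's API): reports
# non-ASCII characters in column names and in character/factor columns.
check_non_ascii <- function(dat) {
  # TRUE if any non-missing value in the vector has a byte above 0x7F.
  has_non_ascii <- function(v) {
    v <- as.character(v)
    v <- v[!is.na(v)]
    any(vapply(v, non_ascii, logical(1)))
  }
  is_text <- vapply(dat, function(col) is.character(col) || is.factor(col),
                    logical(1))
  list(
    names    = vapply(colnames(dat), non_ascii, logical(1)),
    contents = vapply(dat[is_text], has_non_ascii, logical(1))
  )
}

# Made-up example data; "\u00fc" is u-umlaut.
dat <- data.frame(
  id      = 1:2,
  city    = c("Zurich", "Z\u00fcrich"),
  country = factor(c("CH", "CH")),
  stringsAsFactors = FALSE
)

res <- check_non_ascii(dat)
res$names     # all FALSE: the column names are plain ASCII
res$contents  # city = TRUE, country = FALSE
```

Returning the two results separately makes it easy to tell whether a problem sits in a header or in the data itself.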
