Add a parallel option #47

Open
trinker opened this issue Jul 26, 2017 · 5 comments
Comments

@trinker
Owner

trinker commented Jul 26, 2017

A parallel option that runs sentiment and sentiment_by on multiple cores

@trinker
Owner Author

trinker commented Dec 18, 2017

Dump everything out to temporary .rds files and read them back on the cluster nodes. Also add a library arg.
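A minimal sketch of that idea, using only base R and parallel. Everything named here is an assumption for illustration: `score_chunk` is a hypothetical stand-in for the real per-chunk call (e.g. `sentiment_by()`), and `libs` is one possible reading of the "library arg", i.e. a vector of packages to attach on each worker.

```r
library(parallel)

# Hypothetical stand-in for the real per-chunk work (e.g. sentiment_by());
# it only counts characters so the sketch runs without sentimentr installed.
score_chunk <- function(x) data.frame(n_char = nchar(x))

texts  <- c("I like it.", "I hate it.", "It is fine.", "Not bad at all.")
chunks <- split(texts, rep(1:2, each = 2))

# Write each chunk to its own temporary .rds file
tmp_dir <- tempfile("chunks_")
dir.create(tmp_dir)
in_files <- file.path(tmp_dir, sprintf("chunk_%03d.rds", seq_along(chunks)))
invisible(Map(saveRDS, chunks, in_files))

cl <- makeCluster(2)

# Assumed "library arg": packages to attach on every worker ("stats" is a placeholder)
libs <- c("stats")
invisible(clusterCall(cl, function(pkgs) {
    for (p in pkgs) library(p, character.only = TRUE)
    NULL
}, libs))

# Workers receive only a file path, read their chunk, and return the result
results <- parLapply(
    cl, in_files,
    function(f, scorer) scorer(readRDS(f)),
    scorer = score_chunk
)

stopCluster(cl)
unlink(tmp_dir, recursive = TRUE)

# Results come back in the same order as in_files
out <- do.call(rbind, results)
```

Passing file paths rather than the data itself keeps the objects serialized to each worker small, which may matter more as the chunks grow.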

@trinker
Owner Author

trinker commented Feb 10, 2018

Initial attempts lead to an error on Windows. parallel seems to be using an old version of R and throws an error about Rcpp being the wrong version. I fixed this by putting a newer version of R on the path, but now there is an error related to sentimentr that suggests an old version is still being used (???). Maybe I need to remove all other R installations from the path?

if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, parallel, textshape, dplyr)


chunk_size <- 1e5
dir.create('data')

dat <- combine_data() %>%
    {.[rep(seq_len(nrow(.)), 100),]} %>%
    sample_n(nrow(.)) %>%
    split_index({inds <- chunk_size * 1:round(nrow(.)/chunk_size, 0); inds[inds < nrow(.)]})

tic <- Sys.time()

cl <- makeCluster(mc <- getOption("cl.cores", detectCores() - 2))

clusterEvalQ(cl, {
    library(sentimentr)
    library(lexicon)
})


parLapply(cl, dat, function(x){

    gc()

    senti_dat <- sentimentr::get_sentences(x)
    senti_dat <- sentimentr::sentiment_by(senti_dat)

    # draw a single random id; sample(1:100000) alone returns all 100000 values
    outfile <- sprintf('data/file_%s.rds', sample(1:100000, 1))
    saveRDS(senti_dat, outfile)

}) %>%
    invisible()

stopCluster(cl)

Sys.time() - tic

Results in:

Error in checkForRemoteErrors(val) : 
  6 nodes produced errors; first error: 'get_sentences' is not an exported object from 'namespace:sentimentr'
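
That message is consistent with the workers loading an older sentimentr install than the main session, as suspected above. A rough way to check, using only base R and parallel: ask each worker which R build and library paths it sees, then point the workers at the main session's library paths. The `"stats"` package name is a placeholder so the sketch runs anywhere; substitute `"sentimentr"` when diagnosing the real problem.

```r
library(parallel)

cl <- makeCluster(2)

# Ask every worker which R build and library paths it is actually using
worker_info <- clusterEvalQ(cl, list(
    r_version = R.version.string,
    r_home    = R.home(),
    lib_paths = .libPaths()
))

# Compare each worker's R build against the main session
master_r <- R.version.string
same_r <- vapply(
    worker_info,
    function(w) identical(w$r_version, master_r),
    logical(1)
)

# Point the workers at the main session's library paths so that both
# sides resolve the same installed package versions
invisible(clusterCall(cl, function(paths) .libPaths(paths), .libPaths()))

# Package version as seen by each worker (placeholder package name)
pkg_versions <- clusterCall(
    cl,
    function(pkg) as.character(utils::packageVersion(pkg)),
    "stats"
)

stopCluster(cl)
```

If `same_r` contains `FALSE`, or `pkg_versions` differs from `packageVersion()` in the main session, the workers are picking up a different installation.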

@trinker
Owner Author

trinker commented Sep 24, 2018

http://appliedpredictivemodeling.com/blog/2018/1/17/parallel-processing

Is either of the following a better way to run parallel code:

https://github.com/r-lib/callr
https://github.com/r-lib/processx

An OS-independent solution is needed. Reinvestigate the available solutions and reach out to the R community for current best practices.

@trinker
Owner Author

trinker commented Sep 24, 2018

Here's where I ask the R community: https://twitter.com/tylerrinker/status/1044364197797265408

@bkmgit

bkmgit commented Oct 29, 2020

Some other packages:
