removing custom stopwords #276

Open
kjvow opened this issue Oct 17, 2022 · 1 comment

Comments


kjvow commented Oct 17, 2022

I am trying to remove some common words from my Swedish corpus in addition to the Snowball stopwords, but textProcessor does not remove them. I've tried storing the stopwords in a character vector (customstopwords="stopwords") and in a data frame (customstopwords="stopwords$V1"), but in neither case are they removed from the corpus. I've also tried both with and without quotation marks.

Does anyone know what the problem is?

textProcessor(data$ARTICLE, metadata=data, language = "swe", customstopwords="stopwords")
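A likely cause is the quoting. `customstopwords="stopwords"` passes a one-element character vector that holds the literal word "stopwords", not the contents of the object, so textProcessor only removes that single word. The argument expects the words themselves, e.g. `customstopwords = stopwords$V1` without quotes. If the unquoted form also seemed to fail, two other things are worth checking (assumptions, not confirmed from this thread): a data frame column read with `read.table()` before R 4.0 is a factor by default, and, as far as I can tell from the stm source, textProcessor lowercases the text (`lowercase = TRUE` by default) before removing custom stopwords, so capitalised entries will never match. The base-R sketch below uses a hypothetical `stopwords` data frame to show the difference between the quoted string and the actual word vector.

```r
# Hypothetical custom stopword list, stored the way the question describes:
# a data frame whose first column is V1 (factor, as in older read.table()).
stopwords <- data.frame(V1 = c("Och", "att", "Enligt"), stringsAsFactors = TRUE)

# Quoted: a one-element character vector holding the literal word "stopwords".
quoted <- "stopwords"

# Unquoted: the actual words. as.character() turns a factor column into its
# labels; tolower() matches the lowercasing textProcessor applies by default.
custom <- tolower(as.character(stopwords$V1))

print(quoted)
print(custom)

# The corrected call would then pass the vector itself (requires the stm
# package and the asker's `data`, so it is left as a comment here):
# stm::textProcessor(data$ARTICLE, metadata = data, language = "swe",
#                    customstopwords = custom)
```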


oguzozbay commented Feb 23, 2023

I import my stopwords from an Excel file like this:

library(readxl)
# Stopwords imported from an Excel file: first column read as text, second column skipped
stopwords_oguz_for_STM <- read_excel("stopwords_oguz_for_STM.xlsx",
                                     sheet = "stopwords_final",
                                     col_types = c("text", "skip"))

Then I create a new column named "replacements_step_1_no_stop", which holds the text with my stopwords removed:

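library(dplyr)  # provides %>% and mutate()

# Remove the imported stopwords from the text column into a new column
M <- M %>%
  mutate(replacements_step_1_no_stop = text_for_STM %>%
           tm::removeWords(words = stopwords_oguz_for_STM$stop_word))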

Note: text_for_STM is the name of the column to be analysed with STM.
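One caveat with this approach: `tm::removeWords` matches whole words and is case-sensitive, and here it runs on the raw column, before textProcessor has lowercased anything. A capitalised "Och" at the start of a sentence therefore survives if the list only contains "och", so it may help to lowercase the column first. The base-R sketch below approximates what `tm::removeWords` does (a word-boundary regex over the stopword list). It is an illustration, not tm's exact source, and the sample text and stopwords are hypothetical.

```r
# Base-R approximation of tm::removeWords(): delete each listed word where
# it appears as a whole word. Matching is case-sensitive.
remove_words_sketch <- function(x, words) {
  # (*UCP) makes \b treat letters such as å, ä and ö as word characters.
  pattern <- sprintf("(*UCP)\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Hypothetical column contents and stopword list
text_for_STM <- c("och det var enligt planen", "Och sedan kom det")
stop_word <- c("och", "det")

cleaned <- remove_words_sketch(text_for_STM, stop_word)
# Collapse the leftover runs of spaces and trim the ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```

The capitalised "Och" in the second string is kept, which is the same behaviour to expect from `tm::removeWords` on un-lowercased text.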
