removing custom stopwords #276

kjvow · 2022-10-17T18:09:59Z

I am trying to remove some common words from my Swedish corpus, in addition to the Snowball stopwords, but textProcessor keeps missing them. I've tried passing the stopwords as a character vector (customstopwords="stopwords") and as a data frame column (customstopwords="stopwords$V1"), but in both cases they are not removed from the corpus. I've also tried both with and without quotation marks.

Does anyone know what the problem is?

textProcessor(data$ARTICLE, metadata=data, language = "swe", customstopwords="stopwords")
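
One thing that may be worth checking in the call above: a quoted value such as "stopwords" is a one-element character vector holding the literal word, not the object named stopwords. The stm documentation for customstopwords also notes that the words are removed after the text is lowercased, so list entries should be lower case and otherwise complete; if the unquoted form failed as well, casing is another possible cause. A minimal base-R sketch of both points, where custom_words and its contents are made-up placeholders rather than anything from this thread:

```r
# Hypothetical custom stopword list (placeholder values, not from the thread).
custom_words <- c("Enligt", "dock", "samt")

# Quoted: a length-one character vector containing the literal word "custom_words".
quoted <- "custom_words"

# Unquoted: the actual words, lowercased so they can match the lowercased text.
unquoted <- tolower(custom_words)

print(quoted)
print(unquoted)

# With stm installed, the call from the question would then presumably become
# (left commented out because `data` is not defined in this sketch):
# processed <- stm::textProcessor(data$ARTICLE, metadata = data,
#                                 language = "swe",
#                                 customstopwords = unquoted)
```

If the words live in a data frame column, the same idea applies: pass the column itself (for example stopwords$V1, unquoted) and make sure it is a character vector.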

oguzozbay · 2023-02-23T07:55:03Z

Below, I import my stopwords from an Excel file:

library(readxl)
# Read the first column as text and skip the second one
stopwords_oguz_for_STM <- read_excel("stopwords_oguz_for_STM.xlsx",
  sheet = "stopwords_final", col_types = c("text", "skip"))

Below, I create a new column named "replacements_step_1_no_stop", from which my stopwords are removed:

library(dplyr) # provides %>% and mutate()
M <- M %>%
  mutate(replacements_step_1_no_stop = text_for_STM %>%
    tm::removeWords(words = stopwords_oguz_for_STM$stop_word))

NOTE: text_for_STM is the name of the column to be analysed with STM.
