Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
3 changed files
with
144 additions
and
40 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
--- | ||
output: github_document | ||
--- | ||
|
||
# NorSentLex <img src="man/figures/norsentlex.png" align="right" width="120"/> | ||
|
||
<!-- badges: start --> | ||
<!-- [![CRAN Version](http://www.r-pkg.org/badges/version/stortingscrape)](https://cran.r-project.org/package=stortingscrape) --> | ||
[![Github Version](https://img.shields.io/github/r-package/v/martigso/NorSentLex?color=yellowgreen)](https://github.com/martigso/NorSentLex) | ||
<!-- [![Downloads](http://cranlogs.r-pkg.org/badges/stortingscrape)](https://cran.r-project.org/package=stortingscrape) --> | ||
<!-- [![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/stortingscrape?color=orange)](https://cran.r-project.org/package=stortingscrape) --> | ||
<!-- [![R-CMD-check](https://github.com/martigso/stortingscrape/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/martigso/stortingscrape/actions/workflows/check-standard.yaml) --> | ||
<!-- badges: end --> | ||
|
||
|
||
This repository is a R-format version of the [Norwegian sentiment lexicons](https://github.com/ltgoslo/norsentlex) as shown in Barnes et.al (2019). | ||
|
||
## Installation | ||
|
||
The package can be installed by using the `install_github()` function from the `devtools` package in R: | ||
|
||
|
||
```{r, eval=FALSE} | ||
devtools::install.github("martigso/NorSentLex") | ||
library(NorSentLex) | ||
?nor_fullform_sent | ||
?nor_lemma_sent | ||
``` | ||
|
||
## Structure and usage | ||
|
||
The package mirrors the structure of the vanilla [NorSentLex](https://github.com/ltgoslo/norsentlex) repository, but in a typical R type format. There are two available datasets: `nor_fullform_sent` and `nor_lemma_sent`. These can be easily loaded in R: | ||
|
||
```{r} | ||
data("nor_fullform_sent", package = "NorSentLex") | ||
data("nor_lemma_sent", package = "NorSentLex") | ||
``` | ||
|
||
The data are structured as follows: | ||
|
||
| Token form | Sentiment | POS | | ||
|:---------- |:----------|:----| | ||
| Fullform | Positive <br> Negative |TBD| | ||
| Lemma | Positive <br><br><br><br> Negative | adjective <br> noun <br> participle adjective <br> verb <br> adjective <br> noun <br> participle adjective <br> verb | | ||
|
||
|
||
|
||
### Fullform | ||
|
||
The fullform data contains a list with one element ("positive") of 6103 positive fullform tokens and one element ("negative") of 14839 negative fullform tokens. These can be extracted by name after loading the data into R (see above): | ||
|
||
```{r} | ||
nor_fullform_sent$positive |> | ||
head() | ||
nor_fullform_sent$negative |> | ||
head() | ||
``` | ||
|
||
### Lemma | ||
|
||
The lemmatized part of the data contain a list element for positive and negative lexicons for each of the following parts-of-speech: adjective, noun, participle adjective, and verb: | ||
|
||
```{r} | ||
names(nor_lemma_sent) | ||
``` | ||
|
||
These lexicons can also be extracted by calling the names within the list: | ||
|
||
```{r} | ||
nor_lemma_sent$lemma_noun_positive |> | ||
tail() | ||
``` | ||
|
||
## References | ||
|
||
Barnes et al. (2019) Lexicon information in neural sentiment analysis: a multi-task learning approach. Proceedings of the 22nd Nordic Conference on Computational Linguistics. Turku, Finland [ACL Anthology](https://www.aclweb.org/anthology/W19-6119/) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
File renamed without changes