Commit

merged new features of dev into main
ChristophLeonhardt committed Apr 19, 2023
2 parents c71de29 + 745aebc commit 40a007b
Showing 14 changed files with 885 additions and 377 deletions.
12 changes: 8 additions & 4 deletions DESCRIPTION
@@ -1,8 +1,8 @@
 Package: LinkTools
 Type: Package
 Title: LinkTools
-Version: 0.0.1.9003
-Date: 2023-02-07
+Version: 0.0.1.9004
+Date: 2023-04-19
 Author: Christoph Leonhardt
 Maintainer: Christoph Leonhardt <someemail@someemailasdasdasdadsadasd.de>
 Description: This package facilitates the linkage of datasets via shared unique identifiers. Four steps are integrated into this package: a) the preparation of datasets which should be linked, i.e. the transformation into a comparable format and the assignment of shared unique identifiers, b) the merge of datasets based on these identifiers, c) the encoding or enrichment of the data with three output formats (data.table, XML or CWB). In addition, d), the package includes a wrapper for the Named Entity Linking of textual data based on DBPedia Spotlight.
@@ -22,12 +22,16 @@ Imports:
 rhandsontable,
 stringr
 Suggests:
+btmp,
 knitr,
 devtools,
-DT
+DT,
+testthat (>= 3.0.0),
+withr
 VignetteBuilder: knitr
 LazyData: yes
 License: GPL (>= 3)
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.2
+RoxygenNote: 7.2.3
+Config/testthat/edition: 3
1 change: 1 addition & 0 deletions NAMESPACE
@@ -10,6 +10,7 @@ importFrom(data.table,":=")
 importFrom(data.table,data.table)
 importFrom(data.table,is.data.table)
 importFrom(data.table,rleid)
+importFrom(data.table,rleidv)
 importFrom(data.table,setnames)
 importFrom(data.table,setorder)
 importFrom(fuzzyjoin,fuzzy_join)
14 changes: 14 additions & 0 deletions NEWS.md
@@ -1,3 +1,17 @@
+[2023-04-19]
+* addressed a quite comprehensive issue in `external_attribute_to_region_matrix()` that potentially obscured speakers which were not matched, making them unavailable for both the fuzzy matching and manual inspection (issue #14)
+* introduced tests
+* modified vignette, uses GERMAPARLMINI as sample data and the `btmp` package for linking
+* removed data and data-raw which were containing the external data now provided by the `btmp` package
+* made the addition more robust for subcorpora
+
+[2023-04-12]
+* starting rework to address more diverging dataset-text data-combination
+* adding the `additional_attributes` argument to `create_attribute_region_datatable()` to make manual inspection more meaningful
+* made `check_and_add_missing_values()` more flexible by passing `check_for_groups` as a list and adding the `negate` argument
+* added capability to use more than one fuzzy matched variable in `fuzzy_join_missing_values()`
+* made more explicit use of fuzzyjoin::stringdist_join()
+
 [2023-02-07]
 # v0.0.1.9003
 * Added `Depends: R (>= 3.5.0)` to DESCRIPTION to avoid warning when build the package #2.
600 changes: 351 additions & 249 deletions R/LTDataset.R

Large diffs are not rendered by default.

15 changes: 1 addition & 14 deletions R/LinkTools.R
@@ -14,23 +14,10 @@
 #' (data.table, XML or CWB).
 #'
 #' d) the package includes a wrapper for the Named Entity Linking of textual
-#' data based on DBPedia Spotlight.
+#' data based on DBpedia Spotlight.
 #' @keywords package
 #' @docType package
 #' @aliases LinkTools LinkTools-package
 #' @name LinkTools-package
 #' @rdname LinkTools-package
 NULL
-
-#' Stammdaten with WikiData-IDs
-#'
-#' A minimized version of the Stammdaten of the German Bundestag of the 13th and
-#' 14th legislative period with added WikiData IDs retrieved via the Wikidata Query
-#' Service and added party affiliations specific for the legislative period retrieved
-#' from Wikipedia. For preparation see bt_stammdaten.
-#' @source https://www.bundestag.de/services/opendata (Creation Date 2021-11-04)
-#' @source https://de.wikipedia.org/wiki/Liste_der_Mitglieder_des_Deutschen_Bundestages_(13._Wahlperiode) (Information Retrieved on 2021-11-23)
-#' @source https://de.wikipedia.org/wiki/Liste_der_Mitglieder_des_Deutschen_Bundestages_(14._Wahlperiode) (Information Retrieved on 2021-11-23)
-#' @docType data
-#' @keywords datasets
-"stammdaten_wikidatafied_2022_02_01_min"
23 changes: 0 additions & 23 deletions data-raw/stammdaten_wikidatafied_2022-02-01.R

This file was deleted.

Binary file removed data/stammdaten_wikidatafied_2022_02_01_min.rda
Binary file not shown.
163 changes: 140 additions & 23 deletions man/LTDataset.Rd

Some generated files are not rendered by default.

15 changes: 15 additions & 0 deletions man/LinkTools-Package.Rd

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion man/LinkTools-package.Rd

Some generated files are not rendered by default.
