Merge branch 'main' of https://github.com/PolMine/LinkTools
Andreas Blätte authored and Andreas Blätte committed Apr 23, 2023
2 parents a42eca9 + 31aa59b commit b0cc7ff
Showing 16 changed files with 1,457 additions and 535 deletions.
17 changes: 11 additions & 6 deletions DESCRIPTION
@@ -1,14 +1,15 @@
Package: LinkTools
Type: Package
Title: LinkTools
-Version: 0.0.1.9003
-Date: 2023-02-07
+Version: 0.0.1.9005
+Date: 2023-04-22
Author: Christoph Leonhardt
-Maintainer: Christoph Leonhardt <someemail@someemailasdasdasdadsadasd.de>
-Description: This package facilitates the linkage of datasets via shared unique identifiers. Four steps are integrated into this package: a) the preparation of datasets which should be linked, i.e. the transformation into a comparable format and the assignment of shared unique identifiers, b) the merge of datasets based on these identifiers, c) the encoding or enrichment of the data with three output formats (data.table, XML or CWB). In addition, d), the package includes a wrapper for the Named Entity Linking of textual data based on DBPedia Spotlight.
+Maintainer: Christoph Leonhardt <christoph.leonhardt@uni-due.de>
+Description: This package facilitates the linkage of datasets. Once finished, four steps should be integrated into this package: a) the preparation of datasets which should be linked, i.e. the transformation into a comparable format and the assignment of new information, in particular shared unique identifiers, b) the merge of datasets based on these identifiers, c) the encoding or enrichment of the data with different output formats (data.table, XML or CWB). In addition, d), the package includes a wrapper for the Named Entity Linking of textual data based on DBpedia Spotlight.
Depends:
R (>= 3.5.0)
Imports:
+cli,
data.table,
cwbtools,
polmineR,
@@ -22,12 +23,16 @@ Imports:
rhandsontable,
stringr
Suggests:
+btmp,
knitr,
devtools,
-DT
+DT,
+testthat (>= 3.0.0),
+withr
VignetteBuilder: knitr
LazyData: yes
License: GPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.2
+RoxygenNote: 7.2.3
+Config/testthat/edition: 3
Binary file added LinkTools_interactive_matching_gui_README.png
7 changes: 7 additions & 0 deletions NAMESPACE
@@ -4,12 +4,19 @@ export(LTDataset)
importFrom(R6,R6Class)
importFrom(RcppCWB,cl_cpos2struc)
importFrom(RcppCWB,cl_struc2str)
importFrom(cli,cli_abort)
importFrom(cli,cli_alert_info)
importFrom(cli,cli_alert_success)
importFrom(cli,cli_alert_warning)
importFrom(cli,cli_progress_message)
importFrom(cli,cli_progress_update)
importFrom(cwbtools,registry_file_parse)
importFrom(cwbtools,s_attribute_encode)
importFrom(data.table,":=")
importFrom(data.table,data.table)
importFrom(data.table,is.data.table)
importFrom(data.table,rleid)
importFrom(data.table,rleidv)
importFrom(data.table,setnames)
importFrom(data.table,setorder)
importFrom(fuzzyjoin,fuzzy_join)
20 changes: 20 additions & 0 deletions NEWS.md
@@ -1,3 +1,23 @@
V0.0.1.9005
* replaced `\code{}` tags in the documentation
* reduced verbosity of intermediate steps
* introduced a check if the external dataset contains NA values in significant columns before fuzzy matching
* introduced messages from the `cli` package

[2023-04-19]
* addressed a quite comprehensive issue in `external_attribute_to_region_matrix()` that potentially obscured speakers which were not matched, making them unavailable for both the fuzzy matching and manual inspection (issue #14)
* introduced tests
* modified vignette, uses GERMAPARLMINI as sample data and the `btmp` package for linking
* removed data and data-raw which were containing the external data now provided by the `btmp` package
* made the addition more robust for subcorpora

[2023-04-12]
* starting rework to address more diverging dataset-text data-combination
* adding the `additional_attributes` argument to `create_attribute_region_datatable()` to make manual inspection more meaningful
* made `check_and_add_missing_values()` more flexible by passing `check_for_groups` as a list and adding the `negate` argument
* added capability to use more than one fuzzy matched variable in `fuzzy_join_missing_values()`
* made more explicit use of fuzzyjoin::stringdist_join()

[2023-02-07]
# v0.0.1.9003
* Added `Depends: R (>= 3.5.0)` to DESCRIPTION to avoid warning when build the package #2.