Releases: PolMine/RcppCWB

Miniscule Step

01 Jul 19:49
  • The configure script now covers PowerPC. Platform files for PowerPC have been added to src/cwb/config/platform; darwin-64 has been renamed to darwin-x86_64 for consistency (#79).
  • The warning "variable 'nr_targets' set but not used", newly reported by Apple clang version 14.0.3 (clang-1403.0.22.14.1), is addressed (#83).
  • A misleading indentation warning issued by clang-15 is addressed (#85).
  • cwb_encode(), cwb_makeall(), cwb_huffcode() and cwb_compress_rdx() now perform tilde expansion on the path provided by the registry argument, avoiding a crash (#84).
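
As an illustration of what tilde expansion on a registry path involves, here is a minimal C++ sketch. The helper name expand_tilde() is hypothetical and this is not the code used in RcppCWB; it only assumes that a leading "~" stands for the user's home directory.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of RcppCWB): expand a leading "~" to the
// given home directory. Only "~" and "~/..." are handled; "~user" forms
// are returned unchanged.
std::string expand_tilde(const std::string& path, const std::string& home) {
    if (path.empty() || path[0] != '~') return path;   // nothing to expand
    if (path.size() == 1) return home;                 // bare "~"
    if (path[1] == '/') return home + path.substr(1);  // "~/dir" -> home + "/dir"
    return path;                                       // e.g. "~user/dir"
}

// Convenience overload that reads the home directory from the HOME
// environment variable (an assumption that holds on typical Unix systems).
std::string expand_tilde(const std::string& path) {
    const char* home = std::getenv("HOME");
    return home ? expand_tilde(path, std::string(home)) : path;
}
```

Passing the home directory explicitly keeps the core function independent of the environment, which makes it straightforward to test.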

Red Feather

15 Jun 11:07
  • New function region_to_strucs() to get the minimum and maximum struc of an s-attribute within the region provided. Also works for nested s-attributes.
  • New function region_matrix_to_struc_matrix().
  • Functions cl_cpos2lbound() and cl_cpos2rbound() return NA if the corpus position is outside a struc of the given s-attribute (#78).
  • Functions cl_cpos2lbound() and cl_cpos2rbound() are exposed directly from C++ without R wrappers, improving performance. If the registry argument is not provided, the environment variable 'CORPUS_REGISTRY' is now used implicitly.
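
The lookups described above can be sketched in self-contained C++. This is an illustration under stated assumptions, not the RcppCWB implementation: the Region struct, the function names and the NA_INT sentinel are hypothetical, regions are assumed to be sorted and non-overlapping, and "within region" is read here as "overlapping the region".

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Stand-in for R's NA_integer_ (an assumption made for this sketch).
constexpr int NA_INT = std::numeric_limits<int>::min();

// One region of an s-attribute: inclusive corpus positions [left, right].
struct Region { int left; int right; };

// Index (struc) of the region containing cpos, or NA_INT if cpos lies
// outside every region. Assumes sorted, non-overlapping regions.
int cpos_to_struc(const std::vector<Region>& regions, int cpos) {
    // Find the first region whose left boundary is greater than cpos.
    auto it = std::upper_bound(
        regions.begin(), regions.end(), cpos,
        [](int pos, const Region& r) { return pos < r.left; });
    if (it == regions.begin()) return NA_INT;  // cpos precedes all regions
    --it;                                      // candidate region
    return cpos <= it->right ? static_cast<int>(it - regions.begin()) : NA_INT;
}

// Left boundary for every corpus position; NA_INT where there is no region.
std::vector<int> cpos_to_lbound(const std::vector<Region>& regions,
                                const std::vector<int>& cpos) {
    std::vector<int> out;
    out.reserve(cpos.size());
    for (int pos : cpos) {
        const int struc = cpos_to_struc(regions, pos);
        out.push_back(struc == NA_INT ? NA_INT : regions[struc].left);
    }
    return out;
}

// Smallest and largest struc overlapping [left, right];
// {NA_INT, NA_INT} if no region overlaps.
std::pair<int, int> region_to_strucs(const std::vector<Region>& regions,
                                     int left, int right) {
    int lo = NA_INT, hi = NA_INT;
    for (std::size_t i = 0; i < regions.size(); ++i) {
        if (regions[i].right >= left && regions[i].left <= right) {
            if (lo == NA_INT) lo = static_cast<int>(i);
            hi = static_cast<int>(i);
        }
    }
    return {lo, hi};
}
```

Returning a sentinel rather than an arbitrary boundary makes gaps between regions visible to the caller, which is the point of the NA behaviour described above.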

Houseboat

01 Sep 09:45
  • Fixed a package configuration issue that prevented the intended compiler
    from being used to compile the CWB C code (#66).
  • Adding '-luuid' to PKG_FLAGS in Makevars solves linker issue FOLDERID_ #67.
  • GitHub Actions now working for Windows #47.

Seven Sisters

30 Mar 19:47
  • The example for corpus_data_dir() did not work as intended without
    explicitly setting the registry argument. Fixed.
  • New functions corpus_info_file(), corpus_full_name(),
    corpus_p_attributes(), corpus_s_attributes(), corpus_properties() and
    corpus_property() to retrieve registry file data.
  • New function corpus_registry_dir().
  • The path to the info file in the registry file of the REUTERS corpus was
    broken. Fixed.

Wolpertinger

01 Feb 11:20

New Features

  • The CWB code is updated to v3.4.33 / r1690 (#29). Automated patches have been developed to ensure that aligning RcppCWB with upstream CWB development will be painless in the future.
  • The C code in the files cwb-huffcode.c, cwb-compress-rdx.c and cwb-makeall.c was not in line with the CWB version of the rest of the code (v3.4.14 / SVN revision 1069) but rather v2.2.b99 or v3.0.0. All code changes up to v3.4.14 were reconstructed and implemented (#35). Note that cwb-encode.c was at CWB v3.4.14, as the encoding functionality was exposed at a later stage.
  • A new function cwb_version() will report the version of the CWB source code.
  • The cwb_encode() function now has a previously missing argument encoding to state the encoding of the corpus to be indexed.
  • Reduced the number of example *.vrt files to one to keep the package size below 5 MB.

Minor Improvements

  • Encoding a corpus using cwb_encode() now implicitly assumes that input files are XML files and removes blank lines as well as leading and trailing whitespace. This is equivalent to the option "-xsB" of the command line utility cwb-encode.
  • The C++ code of cwb_encode() is now a patch of the main() function of cwb-encode.c, so that code in the *.cpp file can be limited to a slim wrapper, limiting the risk that the code in RcppCWB loses touch with CWB upstream development.
  • Header files _eval.h, _globalvars.h and _cl.h in the ./src directory are autogenerated files now, not to be edited by hand.
  • The C++ code of the cqp_drop_subcorpus() function is temporarily disabled to ensure that the package can be built (#34).
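
The input cleaning that cwb_encode() now applies implicitly (dropping blank lines, stripping leading and trailing whitespace) can be pictured with a small C++ sketch. It is not the cwb-encode code; trim() and clean_lines() are hypothetical names, and the XML-aware part of "-xsB" is deliberately left out.

```cpp
#include <string>
#include <vector>

// Strip leading and trailing whitespace (space, tab, CR, LF) from a line.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";  // line is blank
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Trim every line and skip lines that are empty after trimming.
// Tabs inside a line (the column separator of the vrt format) are kept.
std::vector<std::string> clean_lines(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        std::string t = trim(line);
        if (!t.empty()) out.push_back(t);
    }
    return out;
}
```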

Jaberwocky

14 Dec 03:06
  • Fixed a mishandling of paths on Windows in check_corpus() that would trigger resetting the registry unintentionally and potentially falsely.
  • A compiler warning (unused variable) originating in Rcpp is resolved in Rcpp v1.0.7, so this version of Rcpp is now required (#22).
  • In use_tmp_dir(), normalizePath() is applied on the tempdir() result to avoid confusion with symbolic links on macOS.
  • New unit test for cwb_encode() (not yet run on Windows).
  • A C-level inconsistency in cqp_get_registry() that would sometimes result in a wrong return value (i.e. registry path) has been fixed (#14).
  • To avoid unintended behavior of cwb_makeall(), an internal check is performed whether the corpus has already been loaded and whether the home directory of the loaded corpus is identical to the one defined in the registry file (#31).
  • The link to the TXM project has been removed from the documentation to avoid the error 'SSL certificate problem: unable to get local issuer certificate' (#32).
  • The cl_delete_corpus() function crashed when trying to delete a corpus that has not been loaded (#33). The function now aborts gracefully returning 0 when trying to delete a corpus that has not been loaded.
  • A new function corpus_is_loaded() can be used to check whether a corpus is loaded.

Mole Paw

28 Jun 11:46

New Features

  • Encode XML (vrt file format) with new function cwb_encode() that exposes functionality of cwb-encode CWB utility.
  • Functions cl_cpos2lbound() and cl_cpos2rbound() will now accept an integer vector with length > 1 as argument cpos and return a vector with the same length. Useful to speed up iterated queries for left and right boundaries of regions (#19).
  • A new function cl_struc_values() exposes the corresponding C function of the Corpus Library (CL). The previous implicit assumption that all structural attributes have values can thus be tested. Intended to work with annotations of sentences and paragraphs, i.e. common structural attributes that usually do not have values.
  • A new function corpus_data_dir() will derive the data directory from the internal C representation of a corpus.
  • New function s_attr_regions() will derive regions defined by a structural attribute from the *.rng file. Fastest option for large corpora.
  • New functions s_attr_is_sibling() and s_attr_is_descendent() test the sibling/descendent relationship of structural attributes.
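
To make the sibling/descendent relationship concrete, here is a C++ sketch over plain region tables. The definitions are assumptions made for illustration and may differ from what RcppCWB checks: two attributes are treated as siblings if they define identical regions, and x as a descendant of y if every region of x lies inside some region of y. The Region struct and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// One region of an s-attribute: inclusive corpus positions [left, right].
struct Region { int left; int right; };

// Assumed definition: siblings define exactly the same regions.
bool is_sibling(const std::vector<Region>& x, const std::vector<Region>& y) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i].left != y[i].left || x[i].right != y[i].right) return false;
    }
    return true;
}

// Assumed definition: x is a descendant of y if every region of x is
// contained in some region of y. Both inputs must be sorted and
// non-overlapping, which allows a single linear pass.
bool is_descendant(const std::vector<Region>& x, const std::vector<Region>& y) {
    std::size_t j = 0;
    for (const Region& r : x) {
        // Skip regions of y that end before r starts.
        while (j < y.size() && y[j].right < r.left) ++j;
        if (j == y.size() || r.left < y[j].left || r.right > y[j].right) {
            return false;
        }
    }
    return true;
}
```

With sentences nested in paragraphs, is_descendant(sentences, paragraphs) holds while the reverse does not, unless each paragraph consists of exactly one sentence.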

Minor Improvements

  • Function check_corpus() now includes checks whether the registry provided (argument registry) is identical with the registry defined internally by CQP. The registry is reset if directories are not identical.
  • Minor adjustments of configure script for aarch64, adding -fPIC to CFLAGS so that this flag will be used when Linux default configuration is used as fallback.
  • The implementation of the s_attribute_decode() method was incomplete for method "Rcpp". This alternative to the "pure R" approach is now implemented (#2).
  • The unused file 'setpaths.R' has been removed from the tools directory (#10).
  • The argument method previously setting "wininet" in ./tools/winlibs.R is omitted to avoid the warning "the 'wininet' method is deprecated for http:// and https:// URLs" on Windows.
  • The configure script will print the libdirs derived using pcre-config and link against libintl on macOS by default.

Dune Ride

04 Feb 14:15
  • If RcppCWB is compiled on macOS, the package configure script checks the architecture of the machine and ensures that (if glib-2.0 is not yet present) a version of glib-2.0 compiled for Apple Silicon/the M1 chip is loaded if an arm64 architecture is detected.
  • The package configure script now uses pcre-config to locate header files of PCRE.
  • The configure script checks whether pcre has been compiled with Unicode properties support. If not, a warning is issued that also explains the recommended solution to use '--enable-unicode-properties' when calling configure.

Sunrise

09 Jul 06:59
  • To avoid warnings when running R CMD check, the URL http://pcre.org is used rather than https://pcre.org in the DESCRIPTION and the README file.
  • To replace a somewhat dirty workaround for multiple symbol definitions, the 'fcommon' flag is no longer added to the CFLAGS in the configure script. The C code has been modified such that multiple symbol definitions no longer occur.
  • The macOS image used for tests on Travis CI is now 'xcode9.4'.
  • On Solaris, the configure script would define the flag "-Wl,--allow-multiple-definition" to be passed to the linker flags. The rework of the CWB includes and the inclusion of the header file 'env.h' makes it possible to drop this flag. It was defined at a confusing place anyway.
  • The compiler chosen by the user (in the Makeconf or Makevars file) is now used on all operating systems.
  • If pkg-config is not present on macOS, a warning is issued; the user gets the advice to use the brew package manager to install pkg-config.
  • There is an explicit check in the configure script whether the dependencies ncurses, pcre and glib-2.0 are present. If not, an informative error with installation instructions is displayed.
  • When unloading the package, the dynamic library RcppCWB.so is unloaded.
  • When loading the package, CQP is initialized by default by calling cqp_initialize().

v0.2.7

15 Jan 12:05

RcppCWB 0.2.7

  • If glib-2.0 is not present on macOS, binaries of the static library and
    header files are downloaded from a GitHub repo. This is a step towards
    RcppCWB passing the macOS checks on CRAN machines.
  • A slight modification of the C code will now prevent previous crashes resulting
    from a faulty CQP syntax. The solution will not yet be effective for Windows
    systems until we have recompiled the libcqp static library that is downloaded
    during the installation process.
  • A new C++-level function 'check_corpus' checks whether a given corpus is
    available and is used by the check_corpus() function. Problems with
    the previous implementation, which relied on files in the registry directory
    to ensure the presence of a corpus, should no longer occur.
  • Calling the 'find_readline.perl' utility script is omitted on macOS, so
    previous warning messages when running the makefile do not show up any more.