Tidy up developer functions #2069

Open
kbenoit opened this issue Feb 26, 2021 · 3 comments

@kbenoit
Collaborator

kbenoit commented Feb 26, 2021

These functions have changed a lot recently, and I want to get a clear picture of them and of how we package and document them together. I'm opening this issue to flag the topic and will keep adding notes here.

Functions affected:

  • object2id()
  • object2fixed()
  • pattern2id()
  • pattern2fixed()
  • index_types()
  • index() (aka locate())
@kbenoit
Collaborator Author

kbenoit commented Mar 3, 2021

Should also address #2062

@kbenoit
Collaborator Author

kbenoit commented Mar 20, 2021

So what I meant in this issue is that the object2*() and pattern2*() functions in particular are important building blocks of our functionality that could be useful to other developers (including our future selves and other quanteda developers). They work in similar ways, but it's not clear which should be used when.

NOTE: We don't necessarily need these tidied up before the v3 release, since they are internal, but I think tidying them up could help meet the goal you expressed of promoting core functions for developers, for instance if we create a developer vignette that discusses some of our internal functions and structures.

> pattern <- list(c("^a$", "^b"), c("c"), c("d"))
> types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
> pattern2fixed(pattern, types, "regex", case_insensitive = TRUE)
[[1]]
[1] "A" "B"

[[2]]
[1] "A"  "BB"

[[3]]
[1] "A"   "BBB"

[[4]]
[1] "C"

[[5]]
[1] "CC"

> object2fixed(pattern, types, "regex", case_insensitive = TRUE)
$`^a$ ^b`
[1] "A" "B"

$`^a$ ^b`
[1] "A"  "BB"

$`^a$ ^b`
[1] "A"   "BBB"

$c
[1] "C"

$c
[1] "CC"

I wonder why we don't consolidate them into pattern2*(), since the input objects are also valid inputs listed in ?pattern.
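To make the consolidation question concrete, here is a minimal base-R sketch (not quanteda's internal code) that reproduces both outputs above from a single function. It assumes regex matching via grepl() and expands each multi-word pattern into every combination of matching types; the name sketch_pattern2fixed() and its keep_names argument are hypothetical.

```r
# Hypothetical sketch, NOT quanteda's implementation: one pattern-based
# function that can optionally add object2fixed()-style names.
sketch_pattern2fixed <- function(pattern, types, case_insensitive = TRUE,
                                 keep_names = FALSE) {
  per_pattern <- lapply(pattern, function(p) {
    # matching types for each element of a (possibly multi-word) pattern
    candidates <- lapply(p, function(e)
      types[grepl(e, types, ignore.case = case_insensitive)])
    # a pattern with any unmatched element yields nothing (like "d" above)
    if (any(lengths(candidates) == 0)) return(list())
    # every combination of candidates, one row per combination
    combos <- expand.grid(candidates, stringsAsFactors = FALSE)
    lapply(seq_len(nrow(combos)), function(i)
      unlist(combos[i, ], use.names = FALSE))
  })
  # flatten one level so each combination becomes its own element
  out <- unlist(per_pattern, recursive = FALSE)
  if (keep_names && length(out) > 0) {
    # label each combination with the pattern it came from, e.g. "^a$ ^b"
    labels <- vapply(pattern, paste, character(1), collapse = " ")
    names(out) <- rep(labels, lengths(per_pattern))
  }
  out
}

pattern <- list(c("^a$", "^b"), c("c"), c("d"))
types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
sketch_pattern2fixed(pattern, types)                     # like pattern2fixed() above
sketch_pattern2fixed(pattern, types, keep_names = TRUE)  # like object2fixed() above
```

In this sketch the only difference between the two outputs is the names, which is one argument for a single function.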

Also, the *2id functions behave like lapply(match()). From ?match, the return value of match() is:

match: An integer vector giving the position in table of the first match if there is a match, otherwise nomatch.
Would it make more sense to describe the functions this way, and potentially to name them to reflect the similarity with match()?
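Here is roughly what the lapply(match()) analogy looks like for fixed (exact) patterns, as a base-R sketch. sketch_fixed2id() is a hypothetical name, and dropping any pattern that contains an unmatched element is an assumption of this sketch rather than documented quanteda behaviour; regex or glob patterns would need grep()-style matching instead of match().

```r
# Hypothetical sketch: a "fixed2id" lookup written as lapply(match()).
# match() returns the position of the first match in `types`, or NA.
sketch_fixed2id <- function(pattern, types) {
  ids <- lapply(pattern, function(p) match(p, types))
  # assumption for this sketch: drop patterns with any unmatched element
  ids[!vapply(ids, anyNA, logical(1))]
}

types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
sketch_fixed2id(list(c("A", "B"), "CC", "D"), types)
# returns list(c(1L, 3L), 7L)
```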

@koheiw
Collaborator

koheiw commented Mar 20, 2021

object2*() takes various objects, such as dictionaries and collocations, and depends on pattern2*().

The *2id functions are the underlying ones: they return positions in the types vector, so pattern2id() is the mother of all these functions.
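To illustrate that layering, a *2fixed-style result can be recovered from a *2id-style result by indexing the types vector. In this base-R sketch the ids are written by hand to correspond to the pattern2fixed() output earlier in the thread, and ids2fixed() is a hypothetical helper, not a quanteda function.

```r
# Hypothetical helper: map per-pattern positions back to the matched types
ids2fixed <- function(ids, types) {
  lapply(ids, function(i) types[i])
}

types <- c("A", "AA", "B", "BB", "BBB", "C", "CC")
# positions chosen by hand to mirror the pattern2fixed() output in this thread
ids <- list(c(1L, 3L), c(1L, 4L), c(1L, 5L), 6L, 7L)
ids2fixed(ids, types)
```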
