Detecting precision of values in lookup list #254

Open
jhoetter opened this issue May 16, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@jhoetter
Member

Is your feature request related to a problem? Please describe.
I love lookup lists, but I typically have just one lookup list per label. When I collect all values in a single list, a few of those values can hurt performance. For example, I recently labeled the word "riot" as negative and then built a labeling function that looks for the words in that list. Because of "riot", I also hit words such as "patriot" (just an example).

In general, lookup lists don't always have 100% precision. Adding words can make them worse, especially short words that can also occur inside other words with a different meaning.
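To make the "riot" / "patriot" collision concrete, here is a minimal sketch. It assumes the labeling function does a plain case-insensitive substring check, which the issue does not actually state; `substring_hits` and `whole_word_hits` are hypothetical helper names, and the whole-word variant is shown only for contrast, not as the requested feature.

```python
import re


def substring_hits(text, values):
    """Return lookup values found anywhere in the text (naive substring match)."""
    lowered = text.lower()
    return [v for v in values if v.lower() in lowered]


def whole_word_hits(text, values):
    """Return lookup values that appear as whole words only."""
    return [
        v
        for v in values
        if re.search(rf"\b{re.escape(v)}\b", text, flags=re.IGNORECASE)
    ]


lookup_values = ["riot"]  # assumed content of a "negative" lookup list
text = "She is a true patriot."

# The substring match fires on "patriot"; the whole-word match does not.
print(substring_hits(text, lookup_values))
print(whole_word_hits(text, lookup_values))
```

Whole-word matching would avoid this particular collision, but it does not tell you which values are imprecise for other reasons, which is what the request below is about.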

Describe the solution you'd like
When I add new values to a lookup list, I'd love to see how precisely each value is associated with the given label at the item level. For instance, I want to see that "riot" has a precision of 0.5 in my lookup list.

In general, I just want some help that tells me when an item in a lookup list shouldn't be in there.

Describe alternatives you've considered
In theory, I could add a labeling function for each and every item of the lookup list and calculate the stats that way. I clearly don't want to do that, especially because of the I/O.

What I could do, however, is run an analysis of a lookup list on demand (e.g. when I actively request the calculation) that calculates the precision stats for every item in the list individually, given that the label has the same name as the lookup list (alternatively, I could enter the label whose stats I want to analyze). The computationally expensive part is not running the stats individually but gathering the data and putting it into the containerized envs. So this should be possible, afaik.
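A rough sketch of what such an on-demand analysis might compute, assuming the manually labeled records are already available in memory as `(text, label)` pairs and that matching is a case-insensitive substring check. The function name `per_value_precision`, the record format, and the sample data are all hypothetical and not part of the existing codebase.

```python
from collections import defaultdict


def per_value_precision(records, lookup_values, target_label):
    """Compute precision for each lookup value in one pass over the records.

    records: iterable of (text, label) pairs with manually assigned labels.
    Returns {value: precision}, or None for values that never matched.
    """
    hits = defaultdict(int)     # records in which the value occurs
    correct = defaultdict(int)  # of those, records carrying the target label

    for text, label in records:
        lowered = text.lower()
        for value in lookup_values:
            if value.lower() in lowered:
                hits[value] += 1
                if label == target_label:
                    correct[value] += 1

    return {
        value: (correct[value] / hits[value] if hits[value] else None)
        for value in lookup_values
    }


# Made-up sample data, for illustration only.
records = [
    ("A riot broke out downtown", "negative"),
    ("He is a proud patriot", "positive"),
    ("Terrible service", "negative"),
]
stats = per_value_precision(records, ["riot", "terrible", "awful"], "negative")
print(stats)
```

The data is gathered once and every value is scored in the same loop, which matches the point above that the per-item stats themselves are cheap compared to loading the data.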

Additional context
-

@jhoetter jhoetter added the enhancement New feature or request label May 16, 2023