Bug in extractor-enum.js with original text indexes #1331

Open
alberchou opened this issue Jul 5, 2023 · 1 comment


@alberchou

Good afternoon,
I ran into a problem with repeated tokens (I want to recognize operations in a query), and I think the function extract(srcInput) in extractor-enum.js has a small bug: originalTextIndex is advanced by the token length but not by the separators between tokens.

For example:

  1. You have the following entity to be recognized: sum
  2. You process the following sentence: I want the sum of something1, sum of something2, sum of something3... , sum of something10
  3. Because the split characters (spaces and commas) are not taken into account, the originalPositionMap dictionary ends up with repeated values.
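To make the mechanism concrete, here is a minimal JavaScript sketch of the pattern described above. It is not the actual node-nlp source: tokenize and mapPositionsBuggy are hypothetical names, and I'm assuming the extractor locates each token with a forward search (indexOf) starting at originalTextIndex. With only four "sum of X" groups, the lagging cursor already makes the fourth "sum" and "of" resolve to the positions of the third ones:

```javascript
// Hypothetical sketch: NOT the real extractor-enum.js code from node-nlp.
// It mimics the pattern described in the issue: walk the tokens and map
// each one back to its start offset in the original text.

// Split on spaces and commas, the separators used in the example sentence.
function tokenize(text) {
  return text.split(/[\s,]+/).filter(Boolean);
}

// Buggy variant: the cursor advances by the token length only, so the
// separators between tokens are never counted and the cursor falls behind.
function mapPositionsBuggy(text) {
  const positions = [];
  let originalTextIndex = 0;
  for (const token of tokenize(text)) {
    // Search forward from the (lagging) cursor.
    const pos = text.indexOf(token, originalTextIndex);
    positions.push(pos);
    originalTextIndex += token.length; // separators are ignored here
  }
  return positions;
}

const text = "sum of a, sum of b, sum of c, sum of d";
console.log(mapPositionsBuggy(text).join(", "));
// 0, 4, 7, 10, 14, 17, 20, 24, 27, 20, 24, 37
```

The cursor lags by the length of every separator it skips, so once the accumulated lag exceeds the gap between two occurrences of the same word, indexOf finds the earlier occurrence again (20 and 24 above are duplicated).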

I'm using version 4.27.0:
npm list node-nlp
`-- node-nlp@4.27.0

This happens in extractor-enum.js, lines 306 to 322 (async extract(srcInput)).

Best regards.

@alberchou

alberchou commented Jul 5, 2023

I think that changing this:
originalTextIndex += tokenizeResult.tokens[i].length;

to this:
originalTextIndex = originaltextPos + tokenizeResult.tokens[i].length;

may solve the problem.
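Here is a minimal JavaScript sketch of that change. It is hypothetical and not the node-nlp source: I'm assuming originaltextPos holds the offset where the current token was found in the original text (via a forward indexOf search from originalTextIndex), and tokenize and mapPositionsFixed are illustrative names. Setting the cursor to the end of the match skips the separators implicitly, so repeated tokens resolve to distinct positions:

```javascript
// Hypothetical sketch of the suggested fix, NOT the real node-nlp code.
// Assumption: originaltextPos is the offset where the current token was
// found in the original text, so the cursor jumps past that occurrence.

// Split on spaces and commas, the separators used in the example sentence.
function tokenize(text) {
  return text.split(/[\s,]+/).filter(Boolean);
}

function mapPositionsFixed(text) {
  const positions = [];
  let originalTextIndex = 0;
  for (const token of tokenize(text)) {
    const originaltextPos = text.indexOf(token, originalTextIndex);
    if (originaltextPos === -1) {
      // Token not present verbatim; record it and keep the cursor as is.
      positions.push(-1);
      continue;
    }
    positions.push(originaltextPos);
    // Move the cursor to the end of the match, which implicitly skips
    // any separators that preceded the token.
    originalTextIndex = originaltextPos + token.length;
  }
  return positions;
}

const text = "sum of a, sum of b, sum of c, sum of d";
console.log(mapPositionsFixed(text).join(", "));
// 0, 4, 7, 10, 14, 17, 20, 24, 27, 30, 34, 37
```

Because indexOf returns -1 when a token does not appear verbatim (for example if a tokenizer normalizes case or accents), the sketch leaves the cursor unchanged in that case instead of moving it backwards.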
