Bug in extractor-enum.js with original text indexes #1331

alberchou · 2023-07-05T15:35:30Z

Good afternoon,
I was having an issue with repeated tokens (I want to recognize operations in a query), and I think the function extract(srcInput) in extractor-enum.js has a small bug: originalTextIndex is increased by the token length but not by the length of the separators.

For example:

The entity to be recognized is: sum
The sentence to process is: I want the sum of something1, sum of something2, sum of something3... , sum of something10
Because the split characters (space or ,) are not counted, originalTextIndex falls further behind the real position with every token, and the originalPositionMap dictionary ends up with repeated values.
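A minimal sketch of how the repeated values can arise. This is not the code from extractor-enum.js: it assumes each token is located with `indexOf` starting from a cursor, and that tokenization splits on spaces and commas. The names `text`, `tokens`, `cursor` and `found` are hypothetical.

```javascript
// Hypothetical reproduction, not the library's implementation.
const text = 'sum of a, sum of b, sum of c, sum of d';
const tokens = text.split(/[\s,]+/); // assumed tokenization: spaces and commas

const found = [];
let cursor = 0;
for (const token of tokens) {
  // Look for the token starting at the current cursor.
  found.push({ token, pos: text.indexOf(token, cursor) });
  // Only the token length is added, so the cursor lags behind the
  // real position by the number of separator characters skipped so far.
  cursor += token.length;
}

const sumPositions = found.filter((f) => f.token === 'sum').map((f) => f.pos);
console.log(sumPositions); // [ 0, 10, 20, 20 ]
```

The fourth `sum` really starts at index 30, but the lagging cursor makes the search land on the third occurrence again, so two tokens share position 20.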

I'm using version 4.27.0:
npm list node-nlp
`-- node-nlp@4.27.0

It's happening in extractor-enum.js, lines 306 to 322 (async extract(srcInput)).

Best regards.

alberchou · 2023-07-05T15:54:51Z

I think that changing this:
originalTextIndex += tokenizeResult.tokens[i].length;

to this:
originalTextIndex = originaltextPos + tokenizeResult.tokens[i].length;

may solve the problem.
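To compare the two update rules side by side, here is a hedged sketch. It assumes originaltextPos is the index at which the current token was found in the original text; `locateTokens` and its `advanceFromMatch` flag are hypothetical helpers written for illustration, not part of node-nlp.

```javascript
// Hypothetical helper: returns the index found for each token.
// advanceFromMatch = false mimics `originalTextIndex += token.length`
// advanceFromMatch = true  mimics `originalTextIndex = originaltextPos + token.length`
function locateTokens(text, tokens, advanceFromMatch) {
  const positions = [];
  let cursor = 0;
  for (const token of tokens) {
    const pos = text.indexOf(token, cursor);
    positions.push(pos);
    // Sketch only: assumes every token occurs in the text (pos !== -1).
    cursor = advanceFromMatch ? pos + token.length : cursor + token.length;
  }
  return positions;
}

const sentence = 'sum of a, sum of b, sum of c, sum of d';
const words = sentence.split(/[\s,]+/); // assumed tokenization

const current = locateTokens(sentence, words, false);
const proposed = locateTokens(sentence, words, true);

console.log(new Set(current).size);  // 10 distinct positions for 12 tokens
console.log(new Set(proposed).size); // 12 distinct positions for 12 tokens
```

With the proposed rule the cursor restarts right after the previous match, so the skipped separators are accounted for and every token gets its own position.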
