Feature: Improve the calculation of similarity scores between answers and correct solutions #2

blmage · 2020-05-15T10:41:42Z

Currently, the similarity between answers and correct solutions is computed as-is, with only Unicode normalization being applied. Therefore, accented letters and their unaccented counterparts are considered completely different characters.

While this is desirable when the user enters a "perfect" answer with regard to accents, the results can get quite random otherwise.

A solution would be to compute two similarity scores, one with more normalization and one with less, and then average them in a consistent way.
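As a rough illustration of the two-score idea (a sketch only, not the extension's actual code): the snippet below computes one score on NFC-normalized text and one on accent-stripped text, then blends them with a fixed weight. `difflib.SequenceMatcher` stands in for whatever similarity metric is really used, and the names `strip_accents`, `blended_similarity` and `strict_weight` are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]. SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def blended_similarity(answer: str, solution: str, strict_weight: float = 0.5) -> float:
    """Weighted average of an accent-sensitive and an accent-insensitive score."""
    strict = similarity(
        unicodedata.normalize("NFC", answer),
        unicodedata.normalize("NFC", solution),
    )
    lenient = similarity(strip_accents(answer), strip_accents(solution))
    return strict_weight * strict + (1.0 - strict_weight) * lenient
```

With this blend, an answer that only misses an accent still scores below a perfect match, but well above an answer that differs by a full letter.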

blmage · 2020-06-21T08:01:58Z

It would be better to let the user choose what is significant and what is not, as an option (see #25). This could include:

- case,
- accents,
- ~~punctuation,~~
- ~~spaces,~~
- word order (using an adapted version of the diff package, or more likely the SentenceSimilarity package; this needs benchmarking on big lists of solutions to check whether it is a no-go).
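A minimal sketch of how such options could drive normalization before comparison, assuming a hypothetical `ComparisonOptions` structure (none of these names come from the extension). Sorting the words is only a crude placeholder for the diff or SentenceSimilarity approach mentioned above.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonOptions:
    """Hypothetical user settings: each flag marks a difference as insignificant."""
    ignore_case: bool = False
    ignore_accents: bool = False
    ignore_word_order: bool = False


def normalize(text: str, options: ComparisonOptions) -> str:
    """Apply only the normalization steps the user opted into."""
    result = unicodedata.normalize("NFC", text)
    if options.ignore_case:
        result = result.casefold()
    if options.ignore_accents:
        decomposed = unicodedata.normalize("NFD", result)
        result = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if options.ignore_word_order:
        # Placeholder strategy: compare sentences as sorted bags of words.
        result = " ".join(sorted(result.split()))
    return result
```

Both the answer and each solution would go through `normalize` with the same options before the similarity score is computed.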

tobiornottobi · 2020-09-01T08:24:18Z

In my experience the order is completely off. There have been absurd sentences at the top (without any noticeable similarity), whereas the alphabetical sort gave me much more similar answers.

blmage · 2020-09-05T07:41:15Z

@tobiornottobi Could you please send one or two screenshots with examples of such behavior?

I'm only aware of this happening with missing or different diacritics, but I'll increase the priority of this issue if it turns out to be more widespread.

Thanks!

tobiornottobi · 2020-09-05T13:52:53Z

@blmage Yes, I can. One thing I have to add: I wasn't sure whether the ".* sort ↓" button toggles to the other option or shows which option is currently active. The results weren't sorted alphabetically, so maybe it's actually the alphabetical sort that is broken for me.
I haven't gotten absurd suggestions this time, because the accepted answers are all reasonable and similar, but I still don't understand the order.
This is sorted neither by similarity nor alphabetically, unless only the first word is taken into account.

This makes sense similarity-wise:

I'll try to remember to take a screenshot in the future.

blmage · 2020-09-08T11:43:24Z

@tobiornottobi Thanks for the screenshots!

The UI reflects the current state, so when "Alphabetical sort ↓" is displayed, solutions are/should be sorted alphabetically and in descending order.

The order on the first screenshot seems correct, apart from the two solutions at the top, but I couldn't reproduce the same result in isolation (when testing the comparison algorithm, "ä" comes before "b", as expected).

Could you point me to a skill in the Norwegian tree that uses a lot of accented words? (I'll try to reproduce it from there instead.)

tobiornottobi · 2020-10-23T14:34:10Z

@blmage Thank you. :)
The screenshot was from the Swedish tree. I can't search at the moment unfortunately.

blmage · 2020-10-26T11:31:30Z

My bad! In the case of Swedish then, this seems to be the expected behavior:

> In addition to the basic twenty-six letters, A–Z, the Swedish alphabet includes Å, Ä, and Ö at the end. They are distinct letters in Swedish, and are sorted after Z as shown above.
>
> Wikipedia
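For illustration, here is a small sketch of the ordering rule quoted above, using an explicit alphabet table rather than a real locale-aware collator. `SWEDISH_ALPHABET` and `swedish_sort_key` are hypothetical names, and the sketch assumes plain lowercase-able words without other diacritics.

```python
import unicodedata
from typing import Tuple

# Å, Ä and Ö are separate letters placed after Z, in that order.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_sort_key(text: str) -> Tuple[int, ...]:
    """Map each character to its alphabet position; unknown characters sort first."""
    normalized = unicodedata.normalize("NFC", text).casefold()
    return tuple(RANK.get(ch, -1) for ch in normalized)


words = ["öl", "zebra", "äpple", "bil", "år"]
swedish_order = sorted(words, key=swedish_sort_key)
# swedish_order == ['bil', 'zebra', 'år', 'äpple', 'öl']
```

A generic, non-Swedish collation would typically treat "ä" as a variant of "a" and place it before "b", which may be why the behavior could not be reproduced when the comparison was tested in isolation.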

blmage added the enhancement New feature or request label May 15, 2020

blmage self-assigned this May 15, 2020

blmage changed the title ~~Improve the calculation of similarity scores between answers and correct solutions~~ Feature: Improve the calculation of similarity scores between answers and correct solutions Jun 6, 2020

blmage added this to the 2.4.0 milestone Jun 14, 2020

blmage mentioned this issue Jul 2, 2020

Feature: add some customization options #25

Open

blmage mentioned this issue Jul 14, 2020

Idea: regroup solutions by similarity score ranges and display only the most relevant solutions at first #66

Closed

blmage added the question Further information is requested label Sep 5, 2020

blmage removed this from the 3.1.0 milestone Nov 9, 2020
