Enforce order for results with same confidence #10

Open. Wants to merge 1 commit into the master base branch.

xWTF commented on Jan 30, 2023

Hi gogs developers,

The current implementation uses goroutines to speed up detection, which makes perfect sense.

However, the result is not consistent when multiple detectors return the same confidence, because the order in which concurrent results are collected is not fixed.

POC:

  1. Encode "ノエル" with Shift_JIS => "\x83m\x83G\x83\x8b"
  2. Detect it with DetectBest.
  3. The result is picked nondeterministically from Shift_JIS, GB18030, and Big5, because all three have the same confidence, 10.
  4. Detect it with DetectAll.
  5. The order of the results is not consistent between runs 😢
  6. Decoding the same byte sequence with different charsets obviously produces different content.
  7. This breaks apps that rely on detection to check whether the content has changed 💥
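The nondeterminism in steps 3 and 5 can be reproduced without the library. The following is a minimal stdlib-only sketch, assuming each detector runs in its own goroutine and sends its result over a channel; `result`, `recognizer`, `detectAllUnordered`, and `pickFirstMax` are hypothetical names, not the library's API, and the fixed score of 10 stands in for the real Shift_JIS, GB18030, and Big5 recognizers.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical, simplified stand-in for a detection result;
// it is not the library's actual Result type.
type result struct {
	Charset    string
	Confidence int
}

// recognizer is a hypothetical detector that scores one charset.
type recognizer struct {
	charset string
	score   func(b []byte) int
}

// detectAllUnordered runs every recognizer in its own goroutine and
// collects results from a channel, so the slice order follows the order
// in which goroutines send, which the Go scheduler does not guarantee.
func detectAllUnordered(input []byte, recs []recognizer) []result {
	ch := make(chan result, len(recs))
	var wg sync.WaitGroup
	for _, r := range recs {
		wg.Add(1)
		go func(r recognizer) {
			defer wg.Done()
			ch <- result{Charset: r.charset, Confidence: r.score(input)}
		}(r)
	}
	wg.Wait()
	close(ch)
	var out []result
	for res := range ch {
		out = append(out, res)
	}
	return out
}

// pickFirstMax keeps the first result with the highest confidence; on a
// tie, the winner depends entirely on the order of rs.
func pickFirstMax(rs []result) result {
	best := rs[0]
	for _, r := range rs[1:] {
		if r.Confidence > best.Confidence {
			best = r
		}
	}
	return best
}

func main() {
	// "ノエル" encoded in Shift_JIS, as in step 1 of the POC.
	input := []byte("\x83m\x83G\x83\x8b")
	tie := func([]byte) int { return 10 } // every recognizer reports 10
	recs := []recognizer{
		{"Shift_JIS", tie},
		{"GB18030", tie},
		{"Big5", tie},
	}
	for i := 0; i < 3; i++ {
		rs := detectAllUnordered(input, recs)
		fmt.Println(rs, "best:", pickFirstMax(rs).Charset)
	}
}
```

Running it repeatedly may print the three charsets in different orders and a different "best" charset, although a lightly loaded scheduler can also happen to produce the same order several times in a row.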

Fix:

  1. Introduce a Result.order field.
  2. Sort the results (or, in DetectBest, replace the current best result) by confidence; when confidences are equal, sort by order.
  3. This guarantees consistent results.
  4. Although the detected encoding MAY NOT BE CORRECT, the output is ALWAYS THE SAME for the same input.
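The fix steps above can be sketched as a single comparator that ranks by confidence and breaks ties with an order field. This is a minimal stdlib-only sketch, not the code in this pull request; `result`, `less`, `sortResults`, and `best` are hypothetical names, and it assumes `order` is each recognizer's fixed index in the registration list.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in; order is the assumed tiebreaker,
// the recognizer's fixed index in the registration list.
type result struct {
	Charset    string
	Confidence int
	order      int
}

// less ranks higher confidence first and, on ties, lower order first.
func less(a, b result) bool {
	if a.Confidence != b.Confidence {
		return a.Confidence > b.Confidence
	}
	return a.order < b.order
}

// sortResults gives DetectAll-style output a total order, so it no
// longer depends on which goroutine finished first.
func sortResults(rs []result) {
	sort.Slice(rs, func(i, j int) bool { return less(rs[i], rs[j]) })
}

// best replaces the current candidate only when the comparator prefers
// the new result, mirroring the DetectBest half of the fix.
func best(rs []result) result {
	b := rs[0]
	for _, r := range rs[1:] {
		if less(r, b) {
			b = r
		}
	}
	return b
}

func main() {
	// Simulated arrival order from concurrent detectors (arbitrary).
	rs := []result{
		{"Big5", 10, 2},
		{"Shift_JIS", 10, 0},
		{"GB18030", 10, 1},
	}
	fmt.Println("best:", best(rs).Charset) // best: Shift_JIS
	sortResults(rs)
	fmt.Println(rs)
}
```

Because every order value is unique, `less` is a strict total order, so even the unstable `sort.Slice` yields the same output for the same input. As the description notes, this makes the output reproducible rather than correct: Shift_JIS wins here only because it was assigned order 0 in this sketch.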

xWTF deleted the branch gogs:master February 8, 2023 09:50
xWTF closed this Feb 8, 2023
xWTF deleted the master branch February 8, 2023 09:50
xWTF restored the master branch February 8, 2023 09:50
xWTF deleted the master branch February 8, 2023 09:51
xWTF restored the master branch February 8, 2023 09:52
xWTF reopened this Feb 8, 2023