If the last two characters of a captcha are identical, recognition seems to always fail #60

Open
devmonkeyx opened this issue Jul 22, 2020 · 4 comments

@devmonkeyx

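Hi, in the decode function, the logic appears to skip the last character whenever it equals the one before it. Doesn't that mean a captcha whose last two characters are identical can never be recognized correctly?
For example, with 6666 the recognition goes like this:
a: -6-6-6--6
s: 666
The final output is 666.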

@ypwhs
Owner

ypwhs commented Jul 22, 2020

Yes, you found a bug. The original version was written like this:

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = ''.join([x for j, x in enumerate(a[:-1]) if x != characters[0] and x != a[j+1]])
    if len(s) == 0:
        return ''
    if a[-1] != characters[0] and s[-1] != a[-1]:
        s += a[-1]
    return s

The correct way to write this function is to collapse consecutive repeats first, then remove the blanks:

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    s2 = ''.join([x for x in s if x != characters[0]])
    return s2

For the 6666 example, the intermediate values are:

a = ['-', '6', '-', '6', '-', '6', '-', '-', '6']
s = ['-', '6', '-', '6', '-', '6', '-', '6']
s2 = '6666'
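
The difference between the two versions can be checked with a small side-by-side sketch. The names `decode_old` and `decode_new` and the short `characters` string are assumptions made for illustration (blank `'-'` at index 0, digits after it); the function bodies are the two versions quoted in this comment.

```python
# Assumed alphabet for illustration: blank '-' at index 0, then the digits.
characters = '-0123456789'

def decode_old(sequence):
    # Original version: compares each character with its successor,
    # then special-cases the final character.
    a = ''.join([characters[x] for x in sequence])
    s = ''.join([x for j, x in enumerate(a[:-1])
                 if x != characters[0] and x != a[j + 1]])
    if len(s) == 0:
        return ''
    # Bug: s[-1] is the last decoded character, not the previous frame,
    # so a trailing repeat separated by a blank is dropped.
    if a[-1] != characters[0] and s[-1] != a[-1]:
        s += a[-1]
    return s

def decode_new(sequence):
    # Fixed version: collapse consecutive repeats first, then drop blanks.
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    return ''.join([x for x in s if x != characters[0]])

# Frames corresponding to "-6-6-6--6" (index 7 is '6' in this alphabet).
sequence = [0, 7, 0, 7, 0, 7, 0, 0, 7]
print(decode_old(sequence))  # 666
print(decode_new(sequence))  # 6666
```

The old version loses the final `6` because it compares the last frame against the last decoded character instead of the preceding frame.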

@AzureSkyHuHu

tensor([9, 0, 9, 0, 9, 9], device='cuda:0')
&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
['8', '&', '8', '&', '8']
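It feels like this version still has a problem.
When I recognize 2222 here, the result is still 222. The output does not seem to always have the form ['-', '6', '-', '6', '-', '6', '-', '-', '6']. Or is the problem on my side?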

@ypwhs
Owner

ypwhs commented Sep 12, 2020

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    s2 = ''.join([x for x in s if x != characters[0]])
    return s2

characters = '&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
sequence = [9, 0, 9, 0, 9]
decode(sequence)
# output: 888
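
One likely reading of the result above: under the collapse rule, adjacent identical indices with no blank between them merge into a single character, so a prediction such as [9, 0, 9, 0, 9, 9] can only decode to three characters. The sketch below illustrates this with an equivalent formulation based on `itertools.groupby`; the helper name `greedy_collapse` is hypothetical and not part of the repository.

```python
from itertools import groupby

characters = '&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

def greedy_collapse(sequence, blank=0):
    # Hypothetical helper: merge runs of identical indices, then drop blanks.
    merged = [index for index, _ in groupby(sequence)]
    return ''.join(characters[i] for i in merged if i != blank)

# The trailing 9, 9 has no blank in between, so it counts as one character.
print(greedy_collapse([9, 0, 9, 0, 9, 9]))     # 888
# With a blank separating the last two, four characters are recovered.
print(greedy_collapse([9, 0, 9, 0, 9, 0, 9]))  # 8888
```

If that reading is right, the missing character comes from the network's per-frame output rather than from decode itself.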

@Doghole

Doghole commented Feb 7, 2021

For each label, you should insert a separator between every pair of characters, and at the beginning and end of the label, before feeding it to the model.
For example, if your label is 'ABCD' and your blank is '-' at index 0 of your characters, then after inserting the separator '-' your label becomes '-A-B-C-D-'.
Note that after inserting, your label_length is no longer 4 but 9 (4 characters and 5 blanks).
When decoding a sequence, change your code as shown below:

# Utilss.py
def decode_target(sequence, characters):
    s = [characters[x] for x in sequence if x != 0]
    return s

# main.py
def decode_target(sequence):
    return ''.join([characters[x] for x in sequence if x != 0]).replace(' ', '')

Keep decode_output as it is.
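
The label padding described in this comment could be sketched as follows. The helper name `interleave_blanks` and the uppercase-only `characters` string are assumptions for illustration; `decode_target` follows the main.py version above, without the trailing `replace` call.

```python
import string

# Assumed alphabet for illustration: blank '-' at index 0, then A-Z.
characters = '-' + string.ascii_uppercase

def interleave_blanks(label, blank='-'):
    # Hypothetical helper: put a blank before, between, and after characters.
    padded = [blank]
    for ch in label:
        padded.append(ch)
        padded.append(blank)
    return ''.join(padded)

def decode_target(sequence):
    # Drop index 0 (the blank) and map the remaining indices to characters.
    return ''.join([characters[x] for x in sequence if x != 0])

padded = interleave_blanks('ABCD')
print(padded, len(padded))     # -A-B-C-D- 9

indices = [characters.index(ch) for ch in padded]
print(decode_target(indices))  # ABCD
```

The padded label has 2n + 1 entries for an n-character label, which is where the label_length of 9 comes from.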
