If the last two characters of a captcha are the same, recognition always seems to fail #60

devmonkeyx · 2020-07-22T07:06:35Z

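Hi. In the decode function, the logic seems to skip the last character when it is the same as the one kept before it. Doesn't that mean a captcha whose last two characters are identical can never be recognized correctly?
For example, with 6666 the decoding goes like this:
a: -6-6-6--6
s: 666
So the final output is 666.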

ypwhs · 2020-07-22T12:36:33Z

Yes, you found a bug. The original implementation was:

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = ''.join([x for j, x in enumerate(a[:-1]) if x != characters[0] and x != a[j+1]])
    if len(s) == 0:
        return ''
    if a[-1] != characters[0] and s[-1] != a[-1]:
        s += a[-1]
    return s

The correct way to write this function is to collapse consecutive duplicates first, and then remove the blanks:

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    s2 = ''.join([x for x in s if x != characters[0]])
    return s2

a = ['-', '6', '-', '6', '-', '6', '-', '-', '6']
s = ['-', '6', '-', '6', '-', '6', '-', '6']
s2 = '6666'
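
To make the difference concrete, here is a minimal sketch that runs both versions on the 6666 example. The alphabet `characters = '-0123456789'` and the names `decode_old` and `decode_new` are assumptions for illustration only; the function bodies are the two versions quoted above.

```python
# Assumed alphabet for illustration: blank '-' at index 0, then the digits.
characters = '-0123456789'


def decode_old(sequence):
    """Original version: loses the last character if it equals the last kept one."""
    a = ''.join([characters[x] for x in sequence])
    s = ''.join([x for j, x in enumerate(a[:-1])
                 if x != characters[0] and x != a[j + 1]])
    if len(s) == 0:
        return ''
    # Bug: s[-1] is compared with a[-1] even when a blank separated them.
    if a[-1] != characters[0] and s[-1] != a[-1]:
        s += a[-1]
    return s


def decode_new(sequence):
    """Fixed version: collapse consecutive duplicates, then remove blanks."""
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    return ''.join([x for x in s if x != characters[0]])


sequence = [0, 7, 0, 7, 0, 7, 0, 0, 7]  # spells '-6-6-6--6'
print(decode_old(sequence))  # 666
print(decode_new(sequence))  # 6666
```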

AzureSkyHuHu · 2020-09-05T15:24:03Z

tensor([9, 0, 9, 0, 9, 9], device='cuda:0')
&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
['8', '&', '8', '&', '8']
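It feels like this version still has a problem.
When I try to recognize 2222, I still end up with 222. It seems the output is not necessarily in the form ['-', '6', '-', '6', '-', '6', '-', '-', '6'], or maybe the problem is on my side.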

ypwhs · 2020-09-12T09:56:59Z

def decode(sequence):
    a = ''.join([characters[x] for x in sequence])
    s = []
    last = None
    for x in a:
        if x != last:
            s.append(x)
            last = x
    s2 = ''.join([x for x in s if x != characters[0]])
    return s2

characters = '&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
sequence = [9, 0, 9, 0, 9]
decode(sequence)
# output: 888
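
A possible reading of the report above: the predicted sequence [9, 0, 9, 0, 9, 9] ends with two adjacent 9s and no blank between them. Under greedy CTC decoding, adjacent repeats collapse into one character, so 888 is the expected result for that sequence. A fourth 8 only appears if the model emits a blank between the last two 9s. The sketch below shows this with an equivalent decode written with `itertools.groupby`; the `blank` parameter is an assumption added for illustration.

```python
from itertools import groupby

characters = '&0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def decode(sequence, blank=0):
    # Collapse runs of identical indices, then drop the blank index.
    collapsed = [k for k, _ in groupby(sequence)]
    return ''.join(characters[k] for k in collapsed if k != blank)


print(decode([9, 0, 9, 0, 9, 9]))     # 888: the trailing 9, 9 merge into one '8'
print(decode([9, 0, 9, 0, 9, 0, 9]))  # 8888: a blank separates every repeat
```

If this reading is right, the decode function is behaving as intended and the missing character comes from the model's prediction.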

Doghole · 2021-02-07T10:17:37Z

For each label, it is better to insert a separator between adjacent characters, and at the beginning and end of the label, before feeding the label to the model.
For example, if your label is 'ABCD' and your blank is '-' at index 0 of your characters, then after inserting the separator '-' your label becomes '-A-B-C-D-'.
Note that after the insertion, your label_length is no longer 4 but 9 (4 characters and 5 blanks).
When decoding a sequence, change your code as follows:

# Utilss.py
def decode_target(sequence, characters):
    s = [characters[x] for x in sequence if x != 0]
    return s

# main.py
def decode_target(sequence):
    return ''.join([characters[x] for x in sequence if x != 0]).replace(' ', '')

And keep decode_output as it is.
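
The label transformation described in this comment can be sketched as follows. `insert_blanks` is a hypothetical helper that is not part of the repository, and the alphabet `'-ABCD'` is assumed for illustration. Before adopting this approach, check whether your loss accepts blank indices inside targets: the PyTorch documentation for `torch.nn.CTCLoss`, for instance, states that target indices cannot be the blank.

```python
# Assumed alphabet for illustration: blank '-' at index 0.
characters = '-ABCD'


def insert_blanks(label, blank=0):
    """Hypothetical helper: put a blank before, between, and after the label indices."""
    out = [blank]
    for idx in label:
        out.extend([idx, blank])
    return out


def decode_target(sequence):
    # Same idea as the main.py version above: drop every blank (index 0).
    return ''.join([characters[x] for x in sequence if x != 0])


label = [characters.index(c) for c in 'ABCD']   # [1, 2, 3, 4]
padded = insert_blanks(label)

print(''.join(characters[x] for x in padded))   # -A-B-C-D-
print(len(padded))                              # 9 (4 characters and 5 blanks)
print(decode_target(padded))                    # ABCD
```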
