
about the input shape #3

Open
LiangHao92 opened this issue Oct 22, 2018 · 11 comments

Comments

@LiangHao92

I noticed that your model takes a fixed input size. How do you recognize images of varying sizes? For example, with a 64*500 image, resizing may destroy its aspect ratio and affect the result, right?

@sbillburg
Owner

The input size is set by you before training starts, and it is fixed. Once you train a model with one input shape, all other inputs must have the same size, in both the training dataset and the test dataset.

My method is to set an aspect ratio such as width:height = 5:1. Only a few inputs exceed this ratio, and I resize those to 5:1. The neural network learns features from these resized images, and a very long image contains distinctive features that help recognition.
For images below this ratio, I add blank blocks (pure black, RGB(0, 0, 0)) on both sides of the image. In other words, I generate a pure black image with a 5:1 aspect ratio, then place the input image, whose aspect ratio is smaller than 5:1, at its center.
You can find my method in CRNN-with-STN/Batch_Generator.py, line38~line44.
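
The resize-or-pad rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from Batch_Generator.py: the helper names `fit_to_ratio` and `pad_row` are hypothetical, the image is treated as a single grayscale row for simplicity, and the target ratio is left as a parameter that defaults to the 5:1 mentioned above.

```python
def fit_to_ratio(height, width, target_ratio=5.0):
    """Return (content_width, pad_left, pad_right) for a height x width image.

    Hypothetical helper: an image at or above target_ratio is squeezed to the
    canvas width; a narrower image keeps its width and gets black padding.
    """
    canvas_width = int(round(height * target_ratio))
    if width / float(height) >= target_ratio:
        # Too wide: the whole image is resized to the canvas width.
        return canvas_width, 0, 0
    # Narrow enough: keep the original pixels and center them on the canvas.
    total_pad = canvas_width - width
    pad_left = total_pad // 2
    return width, pad_left, total_pad - pad_left


def pad_row(row, pad_left, pad_right, black=0):
    """Pad one row of grayscale pixels with black on both sides."""
    return [black] * pad_left + list(row) + [black] * pad_right


# A 64*500 image (height 64, width 500) is wider than 5:1, so it is squeezed.
print(fit_to_ratio(64, 500))   # (320, 0, 0)
# A 64*200 image is narrower than 5:1, so it is centered on a 320-wide canvas.
print(fit_to_ratio(64, 200))   # (200, 60, 60)
```

With this rule only the widest images are distorted; every other image keeps its original aspect ratio and differs only in the amount of black padding.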

My explanation may not be clear. If you still have any questions, please tell me. My English is not very good, but I'd love to help you.

@LiangHao92
Author

@sbillburg thanks a lot! I have got your point.

@sbillburg
Owner

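I just noticed that you are a Chinese speaker, so I will explain it once more in Chinese.
When inputs have different aspect ratios, resizing does affect the recognition result.

So my approach is to resize as little as possible. For example, I set a width-to-height ratio of 5:1. When generating training batches from the dataset, every image whose ratio is above 5:1 (a very wide image) is squeezed directly to 5:1. This loses some information or distorts the image, but a very high ratio means the word is long and its features are distinctive, so it is not hard for the network to recognize.

For images whose ratio is below 5:1 (narrower images), I add pure black blocks on both sides to produce a 5:1 image. The aspect ratio of the original image is unchanged; the extra padding brings the image to the required ratio. The network also learns to output nothing for the pure black blocks, so there is no need to worry about misrecognition.

The implementation is in CRNN-with-STN/Batch_Generator.py, line38~line44. If anything is still unclear, feel free to ask me here or by email.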

@LiangHao92
Author

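@sbillburg Haha, thank you. I think the reason adding the STN did not beat the version without it is that the STN is placed late in the network. If the text line itself is not rotated much, the deformation is fairly small, and the later feature maps, especially those after max pooling, already hold distilled features. Applying the STN affine transform there may work less well than applying the STN directly on the input.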

@qwzhong1988

Line 38 of CRNN-with-STN/Batch_Generator.py:
if (img_size[1]/img_size[0]*1.0) < 6.4:
needs an extra pair of parentheses:
if (img_size[1]/(img_size[0]*1.0)) < 6.4:
Line 76 is similar.

@sbillburg
Owner

Line 38 of CRNN-with-STN/Batch_Generator.py:
if (img_size[1]/img_size[0]*1.0) < 6.4:
needs an extra pair of parentheses:
if (img_size[1]/(img_size[0]*1.0)) < 6.4:
Line 76 is similar.

Can you tell me the difference? It seems to be the same in Python 3 with or without the parentheses.

@qwzhong1988

qwzhong1988 commented Nov 13, 2018

There is no problem in Python 3, but it makes a difference in Python 2. Adding the parentheses is a good habit.
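
To make the difference concrete, here is a small sketch that runs under Python 3 and emulates Python 2's integer division with `//`. The image size (width 210, height 32) is a made-up example chosen so that the two spellings land on opposite sides of the 6.4 threshold; the variable names are illustrative only.

```python
# Hypothetical image size (width 210, height 32); the true ratio is 6.5625.
width, height = 210, 32

# Python 3: `/` is true division, so both spellings give the same value.
py3_without_parens = width / height * 1.0
py3_with_parens = width / (height * 1.0)

# Python 2 floors `/` between two ints; `//` emulates that behavior here.
py2_without_parens = width // height * 1.0  # ratio is floored before the multiply
py2_with_parens = width / (height * 1.0)    # height becomes a float first

print(py3_without_parens, py3_with_parens)  # 6.5625 6.5625
print(py2_without_parens < 6.4)             # True: wrongly treated as below 6.4
print(py2_with_parens < 6.4)                # False
```

Under Python 2 semantics the unparenthesized form truncates the ratio to 6.0 before the comparison, so an image slightly above the threshold would take the wrong branch.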

@qwzhong1988

May I ask whether there is a paper or theoretical basis for placing the STN at batchnorm_7?

@sbillburg
Owner

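May I ask whether there is a paper or theoretical basis for placing the STN at batchnorm_7?

No. The whole STN is essentially one module, and I simply inserted it between the CNN and the RNN. You can place this module anywhere in the network, and it might well give better results. This project is just a Keras implementation of CRNN, plus some experiments with STN.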

@jingwanli6666

An error is raised when calling the loc_net function:
[image] How can I fix this? Thanks!

@sbillburg
Owner

sbillburg commented Nov 28, 2019 via email
