unicode-string.dat #3

Open
goldenMetteyya opened this issue Jun 8, 2019 · 3 comments

@goldenMetteyya

Hello,

Great work on this extension to bencode; it adds exactly the extras I needed. I am working on a Rust implementation, and in my testing I found that

u146:秋江에 밤이 드니 물결이 차노매라
낚시 드리치니 고기 아니 무노매라
無心한 달빛만 싣고 빈 배 저어 오노라

is not u146 but u145. Have you checked this?

Thanks

@dahlia
Contributor

dahlia commented Jun 10, 2019

$ wc -c testsuite/unicode-string.dat 
151 testsuite/unicode-string.dat

The data file itself is 151 bytes. The prefix u146: occupies 5 bytes, so the remaining payload is indeed 146 bytes. I suspect your implementation either produces different UTF-8 bytes or drops the final line feed character (U+000A LINE FEED). If you look at the corresponding JSON file (unicode-string.json), the value string ends with a \n character.
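
The arithmetic above can be checked with a minimal Rust sketch. It assumes the poem is copied from this issue with plain U+0020 spaces and precomposed (NFC) Hangul, and `encode_unicode_string` is a hypothetical helper written for illustration, not part of any existing bencode library. The length prefix is taken to count UTF-8 bytes, which is what the `wc -c` result implies.

```rust
/// The poem as quoted in this issue, INCLUDING the trailing line feed.
/// Assumption: plain spaces and precomposed (NFC) Hangul syllables.
const POEM: &str = "秋江에 밤이 드니 물결이 차노매라\n낚시 드리치니 고기 아니 무노매라\n無心한 달빛만 싣고 빈 배 저어 오노라\n";

/// Hypothetical helper: builds `u<len>:<payload>`, where `<len>` is the
/// number of UTF-8 bytes in the payload (not the number of characters).
fn encode_unicode_string(text: &str) -> Vec<u8> {
    let payload = text.as_bytes();
    // `str::len()` and `as_bytes().len()` both count bytes, not chars.
    let mut out = format!("u{}:", payload.len()).into_bytes();
    out.extend_from_slice(payload);
    out
}
```

With the trailing line feed, `POEM.len()` is 146 and the encoded form is 151 bytes long. `POEM.trim_end_matches('\n').len()` is 145, which matches the count reported at the top of this issue.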

@goldenMetteyya
Author

Hello,

I will check again but it might be the missing newline character then.

thanks

@dahlia
Contributor

dahlia commented Jan 14, 2020

@goldenMetteyya I am sorry for the late response. What does wc -c testsuite/unicode-string.dat report on your system?
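
For checking a local copy of the file from Rust rather than with `wc -c`, the following is a rough diagnostic sketch. `declared_vs_actual` is a hypothetical function, and it assumes the whole input is a single `u<len>:<payload>` value, as in this test file. It compares the declared length with the number of payload bytes actually present.

```rust
/// Hypothetical diagnostic: parses the `u<len>:` prefix and returns
/// `(declared_len, actual_payload_len)`, or `None` if the prefix is malformed.
fn declared_vs_actual(data: &[u8]) -> Option<(usize, usize)> {
    // The value must start with the `u` tag.
    let (&tag, rest) = data.split_first()?;
    if tag != b'u' {
        return None;
    }
    // Everything up to the first `:` is the decimal length.
    let colon = rest.iter().position(|&b| b == b':')?;
    let declared: usize = std::str::from_utf8(&rest[..colon]).ok()?.parse().ok()?;
    // Everything after the `:` is treated as payload.
    let actual = rest.len() - colon - 1;
    Some((declared, actual))
}
```

Feeding it the bytes returned by `std::fs::read("testsuite/unicode-string.dat")` should show whether the local copy lost its final line feed: a result where the actual length is one less than the declared length would point to a stripped trailing newline.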
