Is it really byte-level? #61

Open
LuCeHe opened this issue Sep 28, 2023 · 0 comments

LuCeHe commented Sep 28, 2023

From your paper, it seems that the byte-level classification decomposes a character, e.g. 'C', into its binary representation, something like 000101110, but your code gives back 68, which I don't think is what you intended, because that is simply a char-level representation.

Am I wrong?

Your dataset would still fulfill its purpose of using very long sequences, but I think it is char-level rather than byte-level.
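
To make the distinction concrete, here is a minimal sketch of the two encodings being compared: one integer id per byte versus eight binary digits per byte. It is not taken from the repository; the function names `to_byte_ids` and `to_bits` and the `offset` parameter are hypothetical, chosen only for illustration. The `offset` argument is an assumption about how an id such as 68 could arise for 'C' (whose plain byte value is 67), for example if an encoder shifts ids by one to reserve 0 for padding.

```python
def to_byte_ids(text: str, offset: int = 0) -> list[int]:
    """One integer per UTF-8 byte (0-255), optionally shifted by `offset`.

    `offset` is a hypothetical knob, e.g. offset=1 to reserve id 0 for padding.
    """
    return [b + offset for b in text.encode("utf-8")]


def to_bits(text: str) -> list[int]:
    """Eight binary digits per UTF-8 byte, most significant bit first."""
    bits = []
    for byte in text.encode("utf-8"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


print(to_byte_ids("C"))            # [67]
print(to_byte_ids("C", offset=1))  # [68]
print(to_bits("C"))                # [0, 1, 0, 0, 0, 0, 1, 1]
print(to_byte_ids("é"))            # two ids: one character, two bytes
```

For plain ASCII text, one id per byte and one id per character coincide, which is why the output looks char-level. The two only diverge for non-ASCII characters, which span several bytes, and the bit-level form makes every sequence eight times longer.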
