This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

[WIP] fast wordpiece tokenization #105

Open
wants to merge 11 commits into
base: master


gleb-kov

Still to do:

  • CLI
  • README
  • tests
  • benchmark table
  • clean up VectorSegment: revert it to its previous state and use polynomial hashing in the new code
  • code formatting
