# subword

Here are 14 public repositories matching this topic...

Tokenization is a way of separating a piece of text into smaller units called tokens. Tokens can be words, characters, or subwords, so tokenization is broadly classified into three types: word, character, and subword (n-gram character) tokenization.

  • Updated Jun 30, 2021
  • Jupyter Notebook
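The three tokenization levels described above can be sketched in a few lines of Python. The word and character splits are straightforward; the subword split below is a hand-rolled greedy longest-match segmenter over a tiny hypothetical vocabulary, using the WordPiece-style `##` continuation marker as an illustration. Real subword tokenizers (BPE, WordPiece, Unigram) learn their vocabularies from a corpus rather than using a fixed set like this.

```python
def word_tokenize(text):
    # Word-level tokenization: split on whitespace.
    return text.split()

def char_tokenize(text):
    # Character-level tokenization: one token per character.
    return list(text)

def subword_tokenize(word, vocab):
    # Subword tokenization sketch: greedy longest-match segmentation
    # against a known subword vocabulary. Continuation pieces are
    # prefixed with "##" (WordPiece-style convention). This is an
    # illustration, not a trained model.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j] if i == 0 else "##" + word[i:j]
            if piece in vocab:
                pieces.append(piece)
                i = j
                break
        else:
            return ["[UNK]"]  # no vocabulary piece matched
    return pieces

# Tiny hypothetical vocabulary for demonstration only.
vocab = {"token", "##ization", "##izer", "un", "##related"}

print(word_tokenize("tokenization splits text"))  # word tokens
print(char_tokenize("token"))                     # character tokens
print(subword_tokenize("tokenization", vocab))    # ['token', '##ization']
```

Greedy longest-match is only one segmentation strategy; production tokenizers pick vocabularies and merge rules that balance vocabulary size against sequence length.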
