nano-BERT


Nano-BERT: A Simplified and Understandable Implementation of BERT

Nano-BERT is a straightforward, lightweight, and comprehensible custom implementation of BERT, inspired by the foundational "Attention is All You Need" paper. The primary objective of this project is to distill the essence of transformers by stripping away complexity and unnecessary detail, making it an ideal starting point for anyone aiming to grasp the fundamental ideas behind transformers.

Key Features and Focus πŸš€:

  • Simplicity and Understandability: Nano-BERT prioritizes simplicity and clarity, making it accessible for anyone looking to understand the core concepts of transformers.

  • Multi-Headed Self Attention: The implementation of multi-headed self-attention is intentionally less efficient but more descriptive. Each attention head is treated as a separate object, emphasizing transparency over optimization techniques such as batched matrix transposition and fused multiplication (see the sketch after this list).

  • Educational Purposes: This project is designed for educational purposes, offering a learning platform for individuals interested in transformer architectures.

  • Customizability: Nano-BERT allows extensive customization, enabling users to experiment with various parameters such as the number of layers, heads, and embedding sizes. It serves as a playground for exploring the impact of different configurations on model performance.

  • Inspiration: The project draws inspiration from ongoing research on efficient LLM fine-tuning (the author's space-model work). Additionally, it is influenced by Andrej Karpathy's deep learning series on YouTube, particularly the nanoGPT project.

  • Motivation and Development: Nano-BERT originated from the author's curiosity about embedding custom datasets into a three-dimensional space using BERT. To achieve this, the goal was to construct a fully customizable version of BERT, providing complete control over the model's behavior. The motivation was to comprehend how BERT could handle datasets with words as tokens, diverging from the common sub-word approach.
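
To make the "one object per head" idea concrete, here is a minimal sketch of a single self-attention head and a multi-headed wrapper in plain PyTorch. Class names, argument names, and the masking convention are illustrative assumptions, not necessarily the ones used in nano_bert/model.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    # One self-attention head with its own query/key/value projections.
    def __init__(self, n_embed, head_size, dropout=0.1):
        super().__init__()
        self.query = nn.Linear(n_embed, head_size)
        self.key = nn.Linear(n_embed, head_size)
        self.value = nn.Linear(n_embed, head_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, attention_mask=None):
        # x: (batch, seq_len, n_embed); attention_mask: (batch, seq_len), 1 = token, 0 = padding
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)  # (batch, seq_len, seq_len)
        if attention_mask is not None:
            scores = scores.masked_fill(attention_mask[:, None, :] == 0, float('-inf'))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v  # (batch, seq_len, head_size)

class MultiHeadAttention(nn.Module):
    # A plain list of heads instead of one fused projection: slower, but each head is inspectable.
    def __init__(self, n_head, n_embed, dropout=0.1):
        super().__init__()
        head_size = n_embed // n_head
        self.heads = nn.ModuleList([AttentionHead(n_embed, head_size, dropout) for _ in range(n_head)])
        self.proj = nn.Linear(n_head * head_size, n_embed)

    def forward(self, x, attention_mask=None):
        out = torch.cat([h(x, attention_mask) for h in self.heads], dim=-1)
        return self.proj(out)

This is exactly the readability-over-speed trade-off described above: concatenating per-head outputs from a Python list is easy to step through and visualize, while production implementations fuse all heads into a few batched matrix multiplications.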

Community Engagement πŸ’¬: While Nano-BERT is not intended for production use, contributions, suggestions, and feedback from the community are highly encouraged. Users are welcome to propose improvements, simplifications, or enhanced descriptions by creating pull requests or issues.

Exploration and Experimentation 🌎: Nano-BERT's flexibility enables users to experiment freely. Parameters like the number of layers, heads, and embedding sizes can be tailored to specific needs. This customizable nature empowers users to explore diverse configurations and assess their impact on model outcomes.

Note: Nano-BERT was developed for educational exploration and understanding, and it should only be used in educational and experimental contexts!

Installation πŸ› οΈ

Prerequisites

  • Python 3.10.x
  • pip

pip install torch

Note: to run the demos you may need some additional packages, but for the base model all you need is PyTorch:

pip install tqdm scikit-learn matplotlib plotly

Package installation

⚠️: currently only available through GitHub, but a pip version is coming soon!

git clone https://github.com/StepanTita/nano-BERT.git
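
Once cloned, the package is used directly from the repository. One simple option (an assumption about your setup, not a packaged install step) is to run your scripts from the repository root, or to add the clone to the Python path, after which nano_bert can be imported as shown in the usage example below:

import sys
sys.path.append('/path/to/nano-BERT')  # adjust to wherever you cloned the repository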

Usage Example βš™οΈ

from nano_bert.model import NanoBERT
from nano_bert.tokenizer import WordTokenizer

vocab = [...]  # a list of tokens (or words) to use in tokenizer

tokenizer = WordTokenizer(vocab=vocab, max_seq_len=128)

# Usage:
input_ids = tokenizer('This is a sentence')  # or tokenizer(['This', 'is', 'a', 'sentence'])

# Instantiate the NanoBERT model
nano_bert = NanoBERT(input_ids)

# Example usage
embedded_text = nano_bert.embedding(input_ids)
print(embedded_text)
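
For intuition, word-level tokenization here simply maps each whitespace-separated word to an integer id and pads the sequence to max_seq_len, in contrast to the sub-word tokenizers used by most BERT implementations. The function below is a conceptual sketch under those assumptions, not the actual WordTokenizer implementation (special-token handling and id assignment may differ):

def word_tokenize(text, vocab, max_seq_len=128, pad_id=0, unk_id=1):
    # One id per word, unknown words mapped to unk_id, padded/truncated to a fixed length.
    word2id = {word: i + 2 for i, word in enumerate(vocab)}  # ids 0 and 1 reserved for padding/unknown
    ids = [word2id.get(word, unk_id) for word in text.split()][:max_seq_len]
    return ids + [pad_id] * (max_seq_len - len(ids))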

Results πŸ“ˆ:

Benchmarks πŸ†:

For all of the following experiments we use the following configuration:

n_layer = 1
n_head = 1
dropout = 0.1
n_embed = 3
max_seq_len = 128
epochs = 200
batch_size = 32
Dataset                       Accuracy    F-1 Score
IMDB Sentiment (2-class)      0.734       0.745
HateXplain Data (2-class)     0.693       0.597
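
For context, a classification fine-tuning loop with these hyperparameters looks roughly like the sketch below. The model construction, classification head, data loading, and learning rate are assumptions for illustration; see imdb_demo.ipynb for the actual training code.

import torch
import torch.nn as nn

# Assumptions: `model` is a NanoBERT-based classifier returning logits of shape
# (batch, n_classes), and `train_loader` yields (input_ids, labels) batches of size 32.
epochs = 200
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # learning rate is an assumed value

for epoch in range(epochs):
    model.train()
    for input_ids, labels in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()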

Result plots IMDB:

[Plots: accuracy-IMDB, f1-IMDB — accuracy and F-1 score on the IMDB dataset]

Interpretation ⁉️:

Attentions Visualized:

[Figures: Attention-IMDB-1 through Attention-IMDB-4 — attention weights visualized on IMDB examples]

Embeddings Visualized in 3D:

[Figures: Embeddings-3d-1 through Embeddings-3d-5 — token embeddings visualized in 3D]
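
Because n_embed = 3, the learned token embeddings can be plotted directly, with no dimensionality reduction. A minimal matplotlib sketch is shown below; how the embedding matrix is extracted from the model is left out, since the exact attribute names are model-specific (see demo.ipynb for the actual plotting code).

import matplotlib.pyplot as plt

# Assumption: `embeddings` is a (vocab_size, 3) NumPy array of learned token embeddings
# and `vocab` is the corresponding list of words.
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(embeddings[:, 0], embeddings[:, 1], embeddings[:, 2], s=5)
for word, (x, y, z) in zip(vocab[:50], embeddings[:50]):  # label the first few points
    ax.text(x, y, z, word, fontsize=8)
plt.show()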

Note: see demo.ipynb and imdb_demo.ipynb for more complete, end-to-end examples

License πŸ“„

This project is licensed under the MIT License. See the LICENSE.md file for details.
