lucidrains/pause-transformer
Pause Transformer (wip)

Yet another random morning idea to be quickly tried, with the architecture shared if it works: allowing the transformer to pause for any amount of time on any token.

Again, the idea relies on axial attention: one axis attends along the sequence length, as in the usual transformer, while the other attends along a thinking (pause) dimension.
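A minimal sketch of what such an axial attention block might look like, assuming each token is expanded into a set of pause slots so that the hidden states have shape `(batch, seq, pause, dim)`. The class and method names here are illustrative, not the repository's actual API:

```python
# Hypothetical sketch: axial attention with one axis along the pause
# ("thinking") dimension and one causal axis along the sequence.
import torch
from torch import nn

class PauseAxialAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def attend(self, x, causal=False):
        # x: (batch*, n, dim) - standard multi-head attention along axis n
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2) for t in (q, k, v))
        sim = (q @ k.transpose(-2, -1)) * self.scale
        if causal:
            mask = torch.ones(n, n, dtype=torch.bool).triu(1)
            sim = sim.masked_fill(mask, float('-inf'))
        out = sim.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)

    def forward(self, x):
        # x: (batch, seq, pause, dim)
        b, s, p, d = x.shape
        # axis 1: attend along the pause dimension, independently per token
        x = x + self.attend(x.reshape(b * s, p, d)).reshape(b, s, p, d)
        # axis 2: attend causally along the sequence, per pause step
        y = x.transpose(1, 2).reshape(b * p, s, d)
        y = y + self.attend(y, causal=True)
        return y.reshape(b, p, s, d).transpose(1, 2)

tokens = torch.randn(2, 8, 4, 32)  # batch 2, seq 8, 4 pause steps, dim 32
out = PauseAxialAttention(dim=32)(tokens)
print(out.shape)  # torch.Size([2, 8, 4, 32])
```

Since the pause axis is small and attended per token, the extra cost grows linearly in the number of pause steps rather than quadratically in sequence length.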

Todo

  • allow for custom pause distributions across tokens

  • see if one can do a two-pass approach, using the logit entropy as a way to decide how to shape the pause mask

  • run experiments on enwik8; if nothing is seen, move onwards to something harder, say arithmetic
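The two-pass idea above could be sketched as follows: run a first forward pass, compute per-token entropy of the output logits, and give high-entropy (uncertain) tokens a larger pause budget on the second pass. Everything here, including the function name and the mapping from entropy to budget, is a hypothetical illustration:

```python
# Hypothetical sketch: shape a per-token pause mask from first-pass logit entropy.
import torch

def entropy_pause_mask(logits, max_pause=4):
    # logits: (batch, seq, vocab) from a first forward pass
    probs = logits.softmax(dim=-1)
    ent = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)   # (batch, seq)
    # normalize entropy to [0, 1] per sequence, map to an integer pause budget
    norm = ent / ent.amax(dim=-1, keepdim=True).clamp_min(1e-9)
    budget = (norm * max_pause).round().long()                  # (batch, seq)
    # boolean mask over the pause axis: True = pause slot is used
    steps = torch.arange(max_pause)
    return steps < budget.unsqueeze(-1)                        # (batch, seq, max_pause)

logits = torch.randn(2, 8, 100)
mask = entropy_pause_mask(logits)
print(mask.shape)  # torch.Size([2, 8, 4])
```

The resulting boolean mask could then be used to restrict attention along the pause axis, so easy tokens spend few (or zero) pause steps and hard tokens spend up to `max_pause`.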

Citations

@inproceedings{Goyal2023ThinkBY,
    title   = {Think before you speak: Training Language Models With Pause Tokens},
    author  = {Sachin Goyal and Ziwei Ji and Ankit Singh Rawat and Aditya Krishna Menon and Sanjiv Kumar and Vaishnavh Nagarajan},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:263608983}
}
