Find more memory-efficient data structure for language models #18

Open
pemistahl opened this issue Nov 5, 2022 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@pemistahl
Owner

pemistahl commented Nov 5, 2022

Currently, the language models are loaded into simple maps at runtime. Map lookups are fast, but the maps consume a significant amount of memory. The goal is to investigate whether more suitable data structures exist that need less memory, comparable to what NumPy arrays offer in Python.

One promising candidate could be Gonum.
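
To make the idea of a more compact, array-like layout concrete, here is a minimal sketch that uses only the Go standard library rather than Gonum. It stores n-grams and their probabilities in two parallel slices sorted by key and looks them up by binary search. The type `compactModel`, the `map[string]float64` input shape, and the choice of `float32` values are assumptions made for illustration; they are not taken from Lingua's code, and any memory saving would have to be measured.

```go
package main

import (
	"fmt"
	"sort"
)

// compactModel is a hypothetical replacement for a map-based model.
// Keys and values live in two parallel slices sorted by key, which
// avoids the per-bucket overhead of a Go map.
type compactModel struct {
	ngrams []string
	probs  []float32 // assumption: float32 precision is sufficient
}

// newCompactModel copies a map-based model into sorted parallel slices.
func newCompactModel(m map[string]float64) *compactModel {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	probs := make([]float32, len(keys))
	for i, k := range keys {
		probs[i] = float32(m[k])
	}
	return &compactModel{ngrams: keys, probs: probs}
}

// lookup returns the stored probability and whether the n-gram exists.
// It runs in O(log n) instead of the map's expected O(1).
func (c *compactModel) lookup(ngram string) (float32, bool) {
	i := sort.SearchStrings(c.ngrams, ngram)
	if i < len(c.ngrams) && c.ngrams[i] == ngram {
		return c.probs[i], true
	}
	return 0, false
}

func main() {
	model := newCompactModel(map[string]float64{
		"th": 0.5, "he": 0.25, "the": 0.125,
	})
	p, ok := model.lookup("the")
	fmt.Println(p, ok)
}
```

The trade-off in this sketch is slower lookups in exchange for a flatter memory layout. Each string still carries its own header, so packing all keys into one byte slice with offsets would be a possible next step.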

@pemistahl pemistahl added the enhancement New feature or request label Nov 5, 2022
@pemistahl pemistahl added this to the Lingua 1.1.0 milestone Nov 5, 2022
@pemistahl pemistahl changed the title Reduce resources to load language models Find more memory-efficient data structure for language models Nov 20, 2022
@pemistahl pemistahl modified the milestones: Lingua 1.1.0, Lingua 1.2.0 Nov 20, 2022
@pemistahl pemistahl modified the milestones: Lingua 1.2.0, Lingua 1.3.0 Dec 12, 2022
@pemistahl pemistahl modified the milestones: Lingua 1.3.0, Lingua 1.4.0 Dec 31, 2022
@goldsam

goldsam commented Feb 9, 2023

I think you will benefit from using a trie (not a tree) data structure. Here is a Go implementation you may be able to use as a drop-in replacement for the map.

@pemistahl pemistahl modified the milestones: Lingua 1.4.0, Lingua 1.5.0 Sep 5, 2023