Model and working directories should be separate #51

Open
p-e-w opened this issue Mar 29, 2021 · 1 comment
Comments

p-e-w commented Mar 29, 2021

KaldiAG currently writes several files (such as file_cache.json and align_lexicon.int) into the model directory, even when a separate temp directory is specified.

I think this breaks the standard expectation that the model directory is a data repository rather than a working directory for KaldiAG. It may also cause hard-to-debug problems when multiple KaldiAG instances share the same model directory. Finally, when KaldiAG models are installed system-wide on Linux (e.g. via a package manager), they will likely live under /usr/share and be read-only for unprivileged users, so any attempt to write there will fail.

The best approach IMO would be to let the user specify a "working directory" when constructing a Compiler object (defaulting to the model directory, as now). This would enable a clean separation of immutable model data from the mutable working cache wherever the application or the installation environment requires it.
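To make the proposal concrete, here is a minimal Python sketch of the separation. The `work_dir` parameter, the `model_file`/`work_file` helpers, and this stripped-down `Compiler` class are hypothetical illustrations, not KaldiAG's actual API; the only behaviour taken from the proposal is that the working directory defaults to the model directory.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


class Compiler:
    """Hypothetical sketch only; NOT the real KaldiAG Compiler signature."""

    def __init__(self, model_dir: Union[str, Path],
                 work_dir: Optional[Union[str, Path]] = None):
        # Immutable model data: treated as read-only, so it may live under /usr/share.
        self.model_dir = Path(model_dir)
        if work_dir is None:
            # Default preserves today's behaviour: generated files go into the model dir.
            self.work_dir = self.model_dir
        else:
            # Mutable working cache, kept separate from the model data.
            self.work_dir = Path(work_dir)
            self.work_dir.mkdir(parents=True, exist_ok=True)

    def model_file(self, name: str) -> Path:
        """Path of a static file that is only read from the model directory."""
        return self.model_dir / name

    def work_file(self, name: str) -> Path:
        """Path of a generated file (e.g. file_cache.json) in the working directory."""
        return self.work_dir / name


if __name__ == "__main__":
    # e.g. a shared, read-only model plus a per-user working directory
    model = tempfile.mkdtemp()
    work = tempfile.mkdtemp()
    compiler = Compiler(model, work_dir=work)
    print(compiler.work_file("file_cache.json"))
```

With the default left in place, existing setups keep working unchanged, while a packaged install could pass a per-user directory as `work_dir` so that nothing is ever written next to the model.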

daanzu (Owner) commented Apr 10, 2021

Yes, the current implementation is not something I'm happy with. There are three categories of files: those that are completely static for a given model, those that only need rebuilding when the lexicon changes, and grammar-specific ones. On the one hand, I don't want to make things too complicated with extra directories. On the other hand, as you said, mixing the different categories of files can be problematic. I'm planning to reorganize the model structure to separate the categories more cleanly, probably along the lines of how the latest version handles the cache, but with all of the non-static lexicon files moved into the cache directory as well.
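The three categories above could map onto directories roughly as follows. This is a minimal Python sketch under assumptions: the `FileCategory` enum, the `resolve_path` function, the `lexicon/` and `grammars/` subdirectory names, and the example paths are all hypothetical. It only illustrates keeping static files in the model directory while moving everything else into the cache directory.

```python
from enum import Enum
from pathlib import Path
from typing import Union


class FileCategory(Enum):
    STATIC = "static"      # never changes for a given model
    LEXICON = "lexicon"    # rebuilt only when the lexicon changes
    GRAMMAR = "grammar"    # specific to an individual grammar


def resolve_path(model_dir: Union[str, Path], cache_dir: Union[str, Path],
                 name: str, category: FileCategory) -> Path:
    """Hypothetical layout: only static files stay in the model directory."""
    if category is FileCategory.STATIC:
        return Path(model_dir) / name
    if category is FileCategory.LEXICON:
        # Lexicon-derived files (assumed here to include align_lexicon.int) move to the cache.
        return Path(cache_dir) / "lexicon" / name
    # Everything grammar-specific also lives in the cache, in its own subdirectory.
    return Path(cache_dir) / "grammars" / name


if __name__ == "__main__":
    # Hypothetical system-wide model dir and per-user cache dir.
    print(resolve_path("/usr/share/kaldi-model", "/home/user/.cache/kaldiag",
                       "align_lexicon.int", FileCategory.LEXICON))
```

Because only `STATIC` files resolve into the model directory, the model directory can stay read-only and be shared safely, while the cache can be wiped and rebuilt per user.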
