
max_db_size throws when size reached #264

Closed
leoauri opened this issue Nov 1, 2023 · 6 comments · May be fixed by #265

Comments


leoauri commented Nov 1, 2023

Hi,
I specified --max_db_size=5, expecting it to output a small test dataset of the specified size.
Instead, the script crashed when the size was reached:

Traceback (most recent call last):
  File "/usr/local/bin/rave", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/scripts/main_cli.py", line 38, in main
    app.run(preprocess.main)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.9/dist-packages/scripts/preprocess.py", line 208, in main
    for audio_id in pbar:
  File "/usr/local/lib/python3.9/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.9/dist-packages/scripts/preprocess.py", line 115, in process_audio_array
    txn.put(
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

and no metadata.yaml was written. Training then wouldn't run because:

Traceback (most recent call last):
...
  File "/usr/local/lib/python3.9/dist-packages/rave/dataset.py", line 185, in get_dataset
    with open(os.path.join(db_path, 'metadata.yaml'), 'r') as metadata:
FileNotFoundError: [Errno 2] No such file or directory: '.../metadata.yaml'

leoauri commented Nov 1, 2023

I wrapped it in a simple exception handler; it seems to work. I will submit a PR.


leoauri commented Dec 17, 2023

You closed this issue without comment and the PR was not accepted. Has the issue been fixed?

@domkirke
Collaborator

There is no issue here; the script fails when the limit is not high enough... I do not see why an exception handler would help; the data would be corrupted and incomplete.


leoauri commented Dec 18, 2023

Huh. Yeah, if the intended behaviour is no database write, then there is no issue.

I assumed the point of the flag is to produce a usable database with size less than max_db_size. This is what happens on the branch I submitted. 🤷‍♀️

After lmdb throws MapFullError, only lmdb.Environment.close() needs to be called to write a consistent database; it works for me, anyway.

@domkirke
Collaborator

Yes, I see; but the point here is to expose the lmdb parameter for maximum size, which should raise an exception if the amount of data goes beyond the limit. To restrict the dataset size, restrict your dataset ;)


leoauri commented Dec 18, 2023

For me it was useful to be able to produce a database under a certain size. I am not sure what the usefulness is of processing a bunch of data and then not writing anything to disk. To your point about exceptions, this would just be an example of using exceptions for control flow. Just because lmdb provides an exception doesn't mean it must be used to end execution.
