
max_db_size throws when size reached #264

Closed
leoauri opened this issue Nov 1, 2023 · 6 comments · May be fixed by #265

Comments


leoauri commented Nov 1, 2023

Hi,
I specified --max_db_size=5, expecting it to output a small test dataset of the specified size.
Instead, the script crashed when the size was reached:

Traceback (most recent call last):
  File "/usr/local/bin/rave", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/scripts/main_cli.py", line 38, in main
    app.run(preprocess.main)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.9/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.9/dist-packages/scripts/preprocess.py", line 208, in main
    for audio_id in pbar:
  File "/usr/local/lib/python3.9/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.9/dist-packages/scripts/preprocess.py", line 115, in process_audio_array
    txn.put(
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

and no metadata.yaml was written. Training then wouldn't run because:

Traceback (most recent call last):
...
  File "/usr/local/lib/python3.9/dist-packages/rave/dataset.py", line 185, in get_dataset
    with open(os.path.join(db_path, 'metadata.yaml'), 'r') as metadata:
FileNotFoundError: [Errno 2] No such file or directory: '.../metadata.yaml'

leoauri commented Nov 1, 2023

I wrapped it in a simple exception handler; it seems to work. I will submit a PR.


leoauri commented Dec 17, 2023

You closed this issue without comment and the PR was not accepted. Has the issue been fixed?

@domkirke
Collaborator

There is no issue here; the script fails when the limit is not high enough... I do not see why an exception handler would help; the data would be corrupted and incomplete.


leoauri commented Dec 18, 2023

Huh. Yeah, if the intended behaviour is no database write, then there is no issue.

I assumed the point of the flag is to produce a usable database with size less than max_db_size. This is what happens on the branch I submitted. 🤷‍♀️

After lmdb throws MapFullError, only lmdb.Environment.close() needs to be called to write a consistent database; it works for me, anyway.

@domkirke
Collaborator

Yes, I see; but the point here is to expose the lmdb parameter for maximum size, which should raise an exception if the amount of data goes beyond the limit. To restrict the dataset size, restrict your dataset ;)


leoauri commented Dec 18, 2023

For me it was useful to be able to produce a database under a certain size. I am not sure what the usefulness is of processing a bunch of data and then not writing anything to disk. To your point about exceptions, this would just be an example of using exceptions for control flow. Just because lmdb provides an exception doesn't mean it must be used to end execution.
