MemoryError when loading large number of datasets #394

Open
jakeatmsft opened this issue Oct 7, 2020 · 0 comments

Describe the bug
My training file loads multiple datasets into memory using the "create_tabular_dataset_from_parquet_files" function, then calls rm(dataset); gc(); to release each dataset. The error is thrown even after the datasets are removed and garbage collection is invoked.

driver_log:
Error in py_call_impl(callable, dots$args, dots$keywords) :
MemoryError: Engine process terminated. This is most likely due to system running out of memory. Please retry with increased memory. |session_id=ba9874d8-f32a-4568-b75c-650f6747ef4e

Detailed traceback:
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/_loggerfactory.py", line 126, in wrapper
    return func(*args, **kwargs)
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 122, in from_parquet_files
    partition_format)
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 134, in _from_parquet_files
    validate or _is_inference_required(set_column_types))
  File "/azureml-envs/azureml_505e9e90fca3d01d6df21acd71c8c832/lib/python3.6/site-packages/azureml/data/dataset_factory.py", line 768, in _transform_and_validate
    'Make sure the path is accessible and contains
Calls: source ... create_tabular_dataset_from_parquet_files -> -> py_call_impl

Expected behavior
Calling rm(dataset); gc(); should release the memory associated with the in-memory dataset so that the MemoryError does not occur.
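For comparison, here is a minimal Python sketch of the same load-then-release pattern, measured with the standard library's tracemalloc. It does not reproduce the issue. `load_fake_dataset` and `retained_after_cycles` are hypothetical names, and the loader is a stand-in that only allocates interpreter-owned objects. The error message names a separate "Engine process"; whether memory held there responds to garbage collection in the calling session is an open question that this sketch does not answer.

```python
import gc
import tracemalloc


def load_fake_dataset(n_rows):
    # Hypothetical stand-in for a real dataset loader:
    # it just allocates a list of (int, float) row tuples.
    return [(i, float(i)) for i in range(n_rows)]


def retained_after_cycles(n_cycles, n_rows):
    """Load and release a dataset n_cycles times.

    Returns the number of bytes tracemalloc still sees as allocated
    after each release. Only Python-level allocations are tracked.
    """
    tracemalloc.start()
    retained = []
    for _ in range(n_cycles):
        dataset = load_fake_dataset(n_rows)
        del dataset          # rough analogue of rm(dataset) in R
        gc.collect()         # rough analogue of gc() in R
        current, _peak = tracemalloc.get_traced_memory()
        retained.append(current)
    tracemalloc.stop()
    return retained


if __name__ == "__main__":
    for cycle, size in enumerate(retained_after_cycles(3, 50_000), start=1):
        print(f"cycle {cycle}: {size} bytes still traced after release")
```

If the retained figure stays small and flat across cycles, the interpreter is returning the memory it owns. That is the behaviour this issue expects from rm(); gc(); in the R session.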

70_driver_log (1).txt
