[BUG] Log Directory creation causes error (unless it exists already) #1560

Open
jglaser opened this issue Jun 6, 2021 · 0 comments
Labels
bug Something isn't working

Comments

jglaser (Collaborator) commented Jun 6, 2021

Describe the bug

When running on 90 workers, I get the following error:

```
Could not create directory: /gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213[Errno 17] File exists: '/gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213'
distributed.worker - WARNING - Compute Failed
Function:  initialize_server_directory
args:      ('/gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213', True)
kwargs:    {}
Exception: FileExistsError(17, 'File exists')
```

The directory /gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213 did not exist prior to launching the job.

Steps/Code to reproduce bug

Launch BlazingSQL on enough workers to trigger the race condition, set LOG to the directory above (making sure it does not exist yet), and set the following environment variables:

```shell
export BLAZING_LOGGING_DIRECTORY=${LOG}
export BLAZING_LOCAL_LOGGING_DIRECTORY=${LOG}
export BSQL_BLAZING_LOGGING_DIRECTORY=${LOG}
export BSQL_BLAZING_LOCAL_LOGGING_DIRECTORY=${LOG}
export ENABLE_COMMS_LOGS=False
export BSQL_ENABLE_COMMS_LOGS=False
export BSQL_ENABLE_TASK_LOGS=True
export BSQL_ENABLE_OTHER_ENGINE_LOGS=True
export RMM_DEBUG_LOG_FILE=${LOG}/rmm_log.txt
```
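The same check-then-create race can likely be reproduced locally without BlazingSQL or a GPU cluster. The sketch below is illustrative only: it uses threads instead of Dask workers, `racy_create` and `simulate` are hypothetical names, and the `threading.Barrier` is an artificial device (not something BlazingSQL does) that holds every thread after the `os.path.exists` check until all of them have passed it, so the failure shows up on every run instead of only at scale.

```python
import os
import shutil
import tempfile
import threading


def racy_create(dir_path, barrier):
    """Check-then-create, mirroring the pattern in initialize_server_directory."""
    if not os.path.exists(dir_path):
        # Artificial pause: no thread calls mkdir until every thread has
        # already seen "does not exist" above.
        barrier.wait()
        os.mkdir(dir_path)


def simulate(num_workers=8):
    """Return how many of num_workers threads hit FileExistsError."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, "logs")  # does not exist yet
    barrier = threading.Barrier(num_workers)
    errors = []
    lock = threading.Lock()

    def worker():
        try:
            racy_create(target, barrier)
        except FileExistsError as exc:
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        shutil.rmtree(base)
    return len(errors)


print(simulate(8))  # prints 7: one thread creates the directory, the rest fail
```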

Expected behavior

The directory should be silently created if it doesn't exist yet.

  • BlazingSQL Version 0.19

Environment details
Please run and paste the output of the print_env.sh script here, to gather any other relevant environment details

Additional context

Suspected source of the issue

In `pyblazing/apiv2/context.py`:

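```python
def initialize_server_directory(dir_path, is_dask):
    if not os.path.exists(dir_path):
        # Race window: another worker can create dir_path after the check
        # above but before os.mkdir below, which then raises FileExistsError.
        try:
            os.mkdir(dir_path)
        except OSError as error:
            get_blazing_logger(is_dask).error(
                f"Could not create directory: {dir_path}" + str(error)
            )
            raise
        return True
    else:
        return True
```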

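The function should catch `FileExistsError` and return silently instead of checking `os.path.exists` first. With many workers calling it at once, the check-then-create sequence is a race condition: another worker can create the directory between the `os.path.exists` check and the `os.mkdir` call.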
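A minimal sketch of the suggested fix, following the approach described above: call `os.mkdir` directly and treat `FileExistsError` as success. This is not the upstream patch. `get_blazing_logger` here is a hypothetical stand-in built on the standard `logging` module so the snippet runs on its own; the real pyblazing helper may behave differently.

```python
import logging
import os


def get_blazing_logger(is_dask):
    # Hypothetical stand-in for pyblazing's logger helper (assumption).
    return logging.getLogger("blazingsql.dask" if is_dask else "blazingsql")


def initialize_server_directory(dir_path, is_dask):
    try:
        # Create directly instead of checking first, so there is no window
        # between "check" and "create" for another worker to slip into.
        os.mkdir(dir_path)
    except FileExistsError:
        # Another worker created it first, or it already existed: not an error.
        pass
    except OSError as error:
        # Any other failure (permissions, missing parent, ...) is still fatal.
        get_blazing_logger(is_dask).error(
            f"Could not create directory {dir_path}: {error}"
        )
        raise
    return True
```

The `FileExistsError` clause must come before `OSError`, since `FileExistsError` is a subclass of `OSError`. An alternative is `os.makedirs(dir_path, exist_ok=True)`, which also creates missing parent directories and, unlike this sketch, raises if `dir_path` exists but is a regular file rather than a directory.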

jglaser added the `bug` and `? - Needs Triage` labels on Jun 6, 2021
wmalpica removed the `? - Needs Triage` label on Jun 11, 2021