Permission Denied for DBT Directory #897

Open
mkeightleyentrata opened this issue Mar 20, 2024 · 4 comments
Labels

  • area:execution: Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc
  • dbt:run: Primarily related to dbt run command or functionality
  • epic-assigned
  • execution:docker: Related to Docker execution environment
  • parsing:custom: Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc

Comments

@mkeightleyentrata

I had trouble getting cosmos dags to run on version 1.3.2. I would always end up with PermissionError: [Errno 13] Permission denied: /path/to/dbt_project.

I tried chmod -R 777 /path/to/dbt_project in my Dockerfile with no luck.

The one fix I found was to use astronomer-cosmos==1.4.0a1. I added invocation_mode=InvocationMode.DBT_RUNNER to my ExecutionConfig and things started running as expected. Per the Invocation Modes section of these docs, there are two invocation modes, DBT_RUNNER and SUBPROCESS. While things run in DBT_RUNNER mode, SUBPROCESS mode still gives me the permission error.

I am wondering what the difference is between the DBT_RUNNER and SUBPROCESS modes, such that SUBPROCESS throws the permission error but DBT_RUNNER does not. Is there also a way to avoid the permission error in the latest official release, since invocation modes can't be used until 1.4 is released?


dosubot bot commented Mar 20, 2024

Hey @mkeightleyentrata! Fancy seeing you here again, diving into the depths of permission mysteries. Let's decode this enigma together! 🕵️‍♂️

The difference between the DBT_RUNNER and SUBPROCESS modes in Cosmos v1.3.2 that causes a permission error in SUBPROCESS mode is primarily in how dbt commands are executed and the environment requirements for each mode.

  1. DBT_RUNNER Mode: This mode uses the dbtRunner for programmatic invocations of dbt commands. It requires dbt to be installed in the same local environment as Cosmos. This mode is faster because it does not have the overhead of spawning new subprocesses or parsing the output of dbt commands. It requires dbt version 1.5.0 or higher.

  2. SUBPROCESS Mode: In this mode, Cosmos runs dbt CLI commands using the Python subprocess module. This involves spawning new subprocesses and parsing the output to capture logs and raise exceptions. This mode can lead to permission errors if the environment from which the subprocess is spawned does not have the necessary permissions to execute dbt commands or access certain files or directories.

The permission error in SUBPROCESS mode can be caused by the subprocess not inheriting the correct permissions or environment variables from the parent process, leading to issues when trying to execute dbt commands or access files.
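The contrast between the two modes can be illustrated with a minimal, standard-library-only sketch. It does not use Cosmos or dbt: `fake_dbt`, `run_in_process`, and `run_as_subprocess` are hypothetical names chosen for illustration. The point is only that an in-process call never asks the operating system to execute a file, whereas a subprocess call requires the first element of the command to be an executable file.

```python
import subprocess
import sys


def fake_dbt(command_name):
    # Hypothetical stand-in for a programmatic dbt entry point.
    return f"ran {command_name}"


def run_in_process(func, *args):
    """Call Python code directly; no executable is looked up on disk."""
    return func(*args)


def run_as_subprocess(command):
    """Spawn a child process; command[0] must be an executable file."""
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout.strip()


# In-process: analogous in spirit to a programmatic invocation.
in_process_result = run_in_process(fake_dbt, "run")

# Subprocess: the OS must be allowed to execute sys.executable.
exit_code, output = run_as_subprocess([sys.executable, "-c", "print('ran run')"])
```

If the path given as `command[0]` is not something the current user can execute, the failure happens inside `subprocess` before any dbt code runs.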

To avoid this error in the latest official release, you can:

  • Ensure that the environment from which Cosmos is running has the necessary permissions to execute dbt commands and access the required files or directories.
  • Use the DBT_RUNNER mode if possible, by ensuring dbt version 1.5.0 or higher is installed in the same environment as Cosmos. This mode avoids the overhead and potential permission issues associated with spawning subprocesses.

The specific differences in the codebase between the DBT_RUNNER and SUBPROCESS modes in Cosmos v1.3.2, based on the provided context from cosmos/operators/local.py, are as follows:

  1. Invocation Method Selection:

    • The _set_invocation_methods method sets the invoke_dbt and handle_exception methods based on the invocation_mode. For DBT_RUNNER, it sets invoke_dbt to run_dbt_runner and handle_exception to handle_exception_dbt_runner. For SUBPROCESS, it sets invoke_dbt to run_subprocess and handle_exception to handle_exception_subprocess.
  2. Running the Command:

    • For DBT_RUNNER, the run_dbt_runner method is used to invoke the dbt command programmatically using the dbtRunner class.
    • For SUBPROCESS, the run_subprocess method is used to run the dbt command using a subprocess, facilitated by the FullOutputSubprocessHook.
  3. Exception Handling:

    • The handle_exception_dbt_runner method handles exceptions for the DBT_RUNNER mode by checking the success attribute of the dbtRunnerResult. If success is False, it raises an AirflowException with details of the error.
    • The handle_exception_subprocess method handles exceptions for the SUBPROCESS mode by checking the exit_code of the FullOutputSubprocessResult. If the exit_code is not 0 (indicating an error), it raises an AirflowException with the command's output details.
  4. On Kill Behavior:

    • The on_kill method has specific behavior for the SUBPROCESS mode, where it sends a SIGINT or SIGTERM to the subprocess based on the cancel_query_on_kill attribute. This behavior is specific to the SUBPROCESS mode and is not applicable to the DBT_RUNNER mode.
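The dispatch described in the list above can be sketched as follows. This is a simplified illustration, not the code in cosmos/operators/local.py: the result classes and `select_exception_handler` are hypothetical, and `RuntimeError` stands in for `AirflowException` so that the example runs without Airflow installed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InvocationMode(Enum):
    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


@dataclass
class SubprocessResult:
    # Hypothetical stand-in for FullOutputSubprocessResult.
    exit_code: int
    output: str


@dataclass
class RunnerResult:
    # Hypothetical stand-in for dbtRunnerResult.
    success: bool
    exception: Optional[Exception] = None


def handle_exception_subprocess(result):
    # A non-zero exit code means the dbt CLI reported a failure.
    if result.exit_code != 0:
        raise RuntimeError(f"dbt exited with code {result.exit_code}: {result.output}")


def handle_exception_dbt_runner(result):
    # The programmatic runner reports failure through a success flag.
    if not result.success:
        raise RuntimeError(f"dbt invocation failed: {result.exception}")


def select_exception_handler(mode):
    """Pick the handler that matches the invocation mode."""
    handlers = {
        InvocationMode.SUBPROCESS: handle_exception_subprocess,
        InvocationMode.DBT_RUNNER: handle_exception_dbt_runner,
    }
    return handlers[mode]
```

Selecting the handler once, based on the mode, keeps the rest of the operator code identical for both modes.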

The specific permissions required by the SUBPROCESS mode in Cosmos v1.3.2 to execute dbt commands and access required files or directories, based on the provided context from cosmos/hooks/subprocess.py, include:

  1. Execute Permissions for the Command: The user or process executing the dbt commands via this subprocess hook needs to have execute permissions on the dbt executable or any scripts being called.

  2. Read/Write Permissions in the Working Directory: If a working directory (cwd) is specified, the process must have read and write permissions in that directory. This is necessary for creating temporary files or directories and for dbt to read its project files and write logs or output files. If cwd is None, the hook creates a temporary directory in the system's temporary folder, for which it would automatically have the required permissions.

  3. Environment Variable Access: The hook allows passing custom environment variables via the env parameter. If omitted, it uses os.environ, which means the process needs access to read environment variables. This is crucial for dbt commands that might rely on environment variables for configuration (e.g., database connection strings).

  4. Signal Handling Permissions: The hook uses signals (SIGTERM, SIGINT, and restores default signal disposition for SIGPIPE, SIGXFZ, SIGXFSZ) for process management. The process must have the permission to send these signals, especially if it's trying to send them to processes belonging to a different user.

  5. Session and Group ID Management: The pre_exec function calls os.setsid() to start a new session and set the session ID. This requires permissions to create a new session and might be restricted depending on the system's security policies.

  6. Access to Temporary Directory: The hook logs the root location of the temporary directory and might create temporary directories or files. The process needs permissions to access and write to the system's temporary directory.
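The first two items in the list above (execute permission on the command, read/write access to the working directory) can be checked before a task runs. The following is a minimal sketch using only the standard library; `preflight_problems` is a hypothetical helper, not part of Cosmos.

```python
import os
from pathlib import Path


def preflight_problems(executable, cwd):
    """Return human-readable problems that would stop a subprocess from starting."""
    problems = []

    exe = Path(executable)
    if exe.is_dir():
        problems.append(f"{exe} is a directory, not an executable file")
    elif not exe.is_file():
        problems.append(f"{exe} does not exist or is not a regular file")
    elif not os.access(exe, os.X_OK):
        problems.append(f"{exe} is not executable by the current user")

    work = Path(cwd)
    if not work.is_dir():
        problems.append(f"{work} is not an existing directory")
    elif not os.access(work, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{work} is not readable, writable and searchable")

    return problems
```

Running such a check inside the same container and as the same user as the Airflow worker shows whether the problem is with the executable path or with the working directory.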

Unfortunately, I wasn't able to find specific changes made in the astronomer-cosmos==1.4.0a1 version that resolved the permission error issue. It's possible that the changes are not documented in the repository or I might have missed them. If you have more information about the changes or steps to reproduce the issue, it would be helpful in providing a more accurate answer.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


@dosubot dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:run Primarily related to dbt run command or functionality execution:docker Related to Docker execution environment parsing:custom Related to custom parsing, like custom DAG parsing, custom DBT parsing, etc labels Mar 20, 2024
@tatiana
Collaborator

tatiana commented Mar 21, 2024

Hi @mkeightleyentrata, thanks for reporting the issue!

Is this error happening when you try to execute tasks? Please share the steps/code so we can reproduce the problem you're facing. It would also be helpful if you could share more lines of the log stack trace. This usually helps us reproduce and fix the issue.

By default, Cosmos attempts to run dbt commands using a Python subprocess when using ExecutionMode.LOCAL. The commands are executed within the same environment and with the same user privileges as the Python interpreter executing Airflow tasks. The main difference from a permission perspective that I can see when using invocation_mode=InvocationMode.DBT_RUNNER is that, by default, the system environment variables are used to run the command.
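The environment difference can be sketched as follows. This is an illustration only, under the assumption that an `append_env`-style flag merges task-specific variables on top of the inherited system environment, and that without it only the task-specific variables are passed. `build_subprocess_env` is a hypothetical helper, not a Cosmos function.

```python
import os


def build_subprocess_env(task_env, append_env=False, base_env=None):
    """Build the environment mapping handed to a child process.

    Assumed semantics: with append_env=True the child inherits the system
    environment and task_env overrides it; otherwise only task_env is passed.
    """
    inherited = dict(os.environ) if base_env is None else dict(base_env)
    if append_env:
        inherited.update(task_env)
        return inherited
    return dict(task_env)


# Example with an explicit base environment so the result is predictable.
system_env = {"PATH": "/usr/bin", "DBT_TARGET": "dev"}
isolated = build_subprocess_env({"DBT_TARGET": "prod"}, base_env=system_env)
merged = build_subprocess_env({"DBT_TARGET": "prod"}, append_env=True, base_env=system_env)
```

In the isolated case the child process does not see `PATH`, which can change how a bare command name is resolved.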

Could you try making the system environment variables available to the process running the dbt command by setting operator_args?

DbtDag(
    # ...
    operator_args={"append_env": True},
)

@mkeightleyentrata
Author

Hi @tatiana, thanks for your help. I get the following error on the run step of a model:

ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/cosmos/operators/base.py", line 246, in execute
    self.build_and_run_cmd(context=context, cmd_flags=self.add_cmd_flags())
  File "/usr/local/lib/python3.10/site-packages/cosmos/operators/local.py", line 370, in build_and_run_cmd
    result = self.run_command(cmd=dbt_cmd, env=env, context=context)
  File "/usr/local/lib/python3.10/site-packages/cosmos/operators/local.py", line 237, in run_command
    result = self.run_subprocess(
  File "/usr/local/lib/python3.10/site-packages/cosmos/operators/local.py", line 189, in run_subprocess
    subprocess_result: FullOutputSubprocessResult = self.subprocess_hook.run_command(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/cosmos/hooks/subprocess.py", line 75, in run_command
    self.sub_process = Popen(
  File "/usr/local/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt'

Here is the DAG that I'm attempting to run. I took out the invocation_mode argument so that I can run on what I understand to be the latest official release, 1.3.2. I am able to run this successfully with or without operator_args={"append_env": True} on version 1.4.0a1. As soon as I switch to 1.3.2, it reverts to the error above, with or without the append_env: True argument.

from datetime import datetime
from cosmos import ExecutionConfig, ProfileConfig, ProjectConfig, DbtDag, RenderConfig, LoadMode
from cosmos.profiles import RedshiftUserPasswordProfileMapping

DBT_ROOT_PATH = '/usr/local/airflow/dags/dbt'

profile_config = ProfileConfig(
    profile_name="my_profile",
    target_name="my_target",
    profile_mapping=RedshiftUserPasswordProfileMapping(
        conn_id="my_conn",
        profile_args={"schema": "my_schema"},
    )
)

execution_config = ExecutionConfig(dbt_executable_path=DBT_ROOT_PATH)
project_config = ProjectConfig(manifest_path=f"{DBT_ROOT_PATH}/target/manifest.json", project_name="my_project", dbt_project_path=DBT_ROOT_PATH)

cosmos_manifest_example = DbtDag(
    # dbt/cosmos-specific parameters
    project_config=project_config,
    profile_config=profile_config,
    render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST, select=[f"tag:cosmos-test"]),
    execution_config=execution_config,
    # normal dag parameters
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
    dag_id="cosmos-manifest-example",
    operator_args={"append_env": True}
)
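For reference, the last frames of the traceback above are inside Popen, so the exception is raised while the child process is being launched. One way to obtain the same exception class with plain Python is to ask Popen to execute a directory. The sketch below is not a confirmed diagnosis of this issue; it only shows that behavior on a POSIX system, and `try_to_execute` is a hypothetical helper. Note that the DAG above passes DBT_ROOT_PATH, the path named in the error, as dbt_executable_path.

```python
import errno
import subprocess
import tempfile


def try_to_execute(path):
    """Return the OSError raised when Popen is asked to execute path, or None."""
    try:
        process = subprocess.Popen([path])
    except OSError as error:
        return error
    process.wait()
    return None


# A freshly created directory is used as the "executable".
with tempfile.TemporaryDirectory() as directory:
    launch_error = try_to_execute(directory)

if launch_error is not None:
    print(type(launch_error).__name__, launch_error.errno == errno.EACCES)
```

If that is what happens here, file permissions on the project directory would not matter, which would be consistent with chmod -R 777 having no effect.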

@tatiana
Collaborator

tatiana commented May 17, 2024

Hi @mkeightleyentrata Cosmos 1.4 was officially released:

Please, could you also share your Dockerfile so we can reproduce the issue you described with SUBPROCESS?

@tatiana tatiana added this to the 1.5.0 milestone May 17, 2024
@tatiana tatiana added triage-needed Items need to be reviewed / assigned to milestone and removed triage-needed Items need to be reviewed / assigned to milestone labels May 17, 2024