run_as_user flag in default_args error #85

Open
kokleong9406 opened this issue Aug 17, 2022 · 1 comment

@kokleong9406

Hi, for security reasons, I have to use Apache Airflow v2.3.3 with cwl-airflow because I would like to use the "run_as_user" flag defined in "default_args". It is a feature that allows an Airflow task to be run as another Unix user. For more details, see: airflow impersonation
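
For readers unfamiliar with the flag, here is a minimal sketch of what such a "default_args" dictionary might look like. The account name `cwl_runner`, the owner, and the start date are hypothetical placeholders, not values from this setup, and the dictionary is shown on its own so the snippet runs without Airflow installed.

```python
from datetime import datetime

# Hypothetical Unix account name (an assumption, not taken from this issue).
TASK_USER = "cwl_runner"

# default_args are applied to every task of a DAG. The "run_as_user" entry
# asks Airflow to execute each task as the named Unix user.
default_args = {
    "owner": "airflow",                   # placeholder owner
    "start_date": datetime(2022, 8, 17),  # placeholder start date
    "run_as_user": TASK_USER,
}

# In a DAG file this dictionary would typically be passed on like so
# (left as a comment so the sketch does not need Airflow to run):
#   from airflow import DAG
#   dag = DAG(dag_id="example", default_args=default_args)

if __name__ == "__main__":
    print(default_args["run_as_user"])
```

How the dictionary reaches the tasks generated by cwl-airflow depends on how the DAG file is written, so treat the commented `DAG(...)` call as an illustration only.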

I get the error "PID of job runner does not match" when I try to run a workflow in a Docker container:

scheduler    | [2022-08-17 05:45:54,093] {scheduler_job.py:353} INFO - 1 tasks up for execution:
scheduler    | 	<TaskInstance: 39_test1_1-my-workflow_test.CWLJobDispatcher manual__2022-08-17T05:45:50+00:00 [scheduled]>
scheduler    | [2022-08-17 05:45:54,093] {scheduler_job.py:418} INFO - DAG 39_test1_1-my-workflow_test has 0/16 running and queued tasks
scheduler    | [2022-08-17 05:45:54,094] {scheduler_job.py:504} INFO - Setting the following tasks to queued state:
scheduler    | 	<TaskInstance: 39_test1_1-my-workflow_test.CWLJobDispatcher manual__2022-08-17T05:45:50+00:00 [scheduled]>
scheduler    | [2022-08-17 05:45:54,097] {scheduler_job.py:546} INFO - Sending TaskInstanceKey(dag_id='39_test1_1-my-workflow_test', task_id='CWLJobDispatcher', run_id='manual__2022-08-17T05:45:50+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
scheduler    | [2022-08-17 05:45:54,097] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', '39_test1_1-my-workflow_test', 'CWLJobDispatcher', 'manual__2022-08-17T05:45:50+00:00', '--local', '--subdir', 'DAGS_FOLDER/39_test1_1-my-workflow_test.py']
scheduler    | [2022-08-17 05:45:54,100] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', '39_test1_1-my-workflow_test', 'CWLJobDispatcher', 'manual__2022-08-17T05:45:50+00:00', '--local', '--subdir', 'DAGS_FOLDER/39_test1_1-my-workflow_test.py']
scheduler    | [2022-08-17 05:45:54,172] {dagbag.py:508} INFO - Filling up the DagBag from /home/kokleong/projects/root_perseus_app/cwl-airflow-dev-v3/dags/39_test1_1-my-workflow_test.py
scheduler    | /usr/local/lib/python3.8/site-packages/airflow/configuration.py:528 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
scheduler    | [2022-08-17 05:45:55,059] {task_command.py:371} INFO - Running <TaskInstance: 39_test1_1-my-workflow_test.CWLJobDispatcher manual__2022-08-17T05:45:50+00:00 [queued]> on host c318617f24d4
scheduler    | [2022-08-17 05:46:01,689] {local_executor.py:128} ERROR - Failed to execute task PID of job runner does not match.
scheduler    | Traceback (most recent call last):
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/executors/local_executor.py", line 124, in _execute_work_in_fork
scheduler    |     args.func(args)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 51, in command
scheduler    |     return func(*args, **kwargs)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/utils/cli.py", line 99, in wrapper
scheduler    |     return f(*args, **kwargs)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 377, in task_run
scheduler    |     _run_task_by_selected_method(args, dag, ti)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 183, in _run_task_by_selected_method
scheduler    |     _run_task_by_local_task_job(args, ti)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 241, in _run_task_by_local_task_job
scheduler    |     run_job.run()
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 244, in run
scheduler    |     self._execute()
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/jobs/local_task_job.py", line 136, in _execute
scheduler    |     self.handle_task_exit(return_code)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 225, in heartbeat
scheduler    |     self.heartbeat_callback(session=session)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/utils/session.py", line 68, in wrapper
scheduler    |     return func(*args, **kwargs)
scheduler    |   File "/usr/local/lib/python3.8/site-packages/airflow/jobs/local_task_job.py", line 211, in heartbeat_callback
scheduler    |     "Recorded pid %s does not match the current pid %s", recorded_pid, current_pid
scheduler    | airflow.exceptions.AirflowException: PID of job runner does not match
scheduler    | [2022-08-17 05:46:01,907] {scheduler_job.py:599} INFO - Executor reports execution of 39_test1_1-my-workflow_test.CWLJobDispatcher run_id=manual__2022-08-17T05:45:50+00:00 exited with status failed for try_number 1
scheduler    | [2022-08-17 05:46:01,912] {scheduler_job.py:642} INFO - TaskInstance Finished: dag_id=39_test1_1-my-workflow_test, task_id=CWLJobDispatcher, run_id=manual__2022-08-17T05:45:50+00:00, map_index=-1, run_start_date=2022-08-17 05:45:55.369221+00:00, run_end_date=2022-08-17 05:46:00.979434+00:00, run_duration=5.610213, state=failed, executor_state=failed, try_number=1, max_tries=0, job_id=57, pool=default_pool, queue=default, priority_weight=3, operator=CWLJobDispatcher, queued_dttm=2022-08-17 05:45:54.095112+00:00, queued_by_job_id=50, pid=618
scheduler    | [2022-08-17 05:46:02,949] {dagrun.py:549} ERROR - Marking run <DagRun 39_test1_1-my-workflow_test @ 2022-08-17 05:45:50+00:00: manual__2022-08-17T05:45:50+00:00, externally triggered: True> failed
scheduler    | [2022-08-17 05:46:02,949] {dagrun.py:609} INFO - DagRun Finished: dag_id=39_test1_1-my-workflow_test, execution_date=2022-08-17 05:45:50+00:00, run_id=manual__2022-08-17T05:45:50+00:00, run_start_date=2022-08-17 05:45:54.058559+00:00, run_end_date=2022-08-17 05:46:02.949602+00:00, run_duration=8.891043, state=failed, external_trigger=True, run_type=manual, data_interval_start=2022-08-17 05:45:50+00:00, data_interval_end=2022-08-17 05:45:50+00:00, dag_hash=8366f942b3d5bba361f6640b7d2ae180

I do not get this error when I run the same workflow on my local host, though.

Below is some information that I think might help resolve the issue.
My list of Python packages:

Package                             Version
----------------------------------- ------------------
alembic                             1.8.1
anyio                               3.6.1
apache-airflow                      2.3.3
apache-airflow-providers-common-sql 1.0.0
apache-airflow-providers-ftp        3.1.0
apache-airflow-providers-http       4.0.0
apache-airflow-providers-imap       3.0.0
apache-airflow-providers-sqlite     3.2.0
apispec                             3.3.2
argcomplete                         2.0.0
attrs                               20.3.0
Babel                               2.10.3
bagit                               1.8.1
blinker                             1.5
CacheControl                        0.12.11
cachelib                            0.9.0
cattrs                              1.10.0
certifi                             2022.6.15
cffi                                1.15.1
charset-normalizer                  2.1.0
click                               8.1.3
clickclick                          20.10.2
colorama                            0.4.5
coloredlogs                         15.0.1
colorlog                            4.8.0
commonmark                          0.9.1
connexion                           2.14.0
cron-descriptor                     1.2.31
croniter                            1.3.5
cryptography                        37.0.4
cwl-airflow                         1.2.11
cwltest                             2.1.20210626101542
cwltool                             3.1.20210816212154
defusedxml                          0.7.1
Deprecated                          1.2.13
dill                                0.3.5.1
dnspython                           2.2.1
docker                              5.0.3
docutils                            0.19
email-validator                     1.2.1
Flask                               2.2.2
Flask-AppBuilder                    4.1.2
Flask-Babel                         2.0.0
Flask-Caching                       2.0.1
Flask-JWT-Extended                  4.4.3
Flask-Login                         0.6.2
Flask-Session                       0.4.0
Flask-SQLAlchemy                    2.5.1
Flask-WTF                           0.15.1
graphviz                            0.20.1
greenlet                            1.1.2
gunicorn                            20.1.0
h11                                 0.12.0
httpcore                            0.15.0
httpx                               0.23.0
humanfriendly                       10.0
idna                                3.3
importlib-metadata                  4.12.0
importlib-resources                 5.9.0
inflection                          0.5.1
isodate                             0.6.1
itsdangerous                        2.1.2
Jinja2                              3.1.2
jsonmerge                           1.8.0
jsonschema                          4.9.1
junit-xml                           1.9
lazy-object-proxy                   1.7.1
linkify-it-py                       2.0.0
lockfile                            0.12.2
lxml                                4.9.1
Mako                                1.2.1
Markdown                            3.4.1
markdown-it-py                      2.1.0
MarkupSafe                          2.1.1
marshmallow                         3.17.0
marshmallow-enum                    1.5.1
marshmallow-oneofschema             3.0.1
marshmallow-sqlalchemy              0.26.1
mdit-py-plugins                     0.3.0
mdurl                               0.1.2
mistune                             0.8.4
msgpack                             1.0.4
mypy-extensions                     0.4.3
networkx                            2.8.5
packaging                           21.3
pathspec                            0.9.0
pendulum                            2.1.2
pip                                 22.2.2
pkgutil_resolve_name                1.3.10
pluggy                              1.0.0
prison                              0.2.1
prov                                1.5.1
psutil                              5.9.1
psycopg2                            2.9.3
pycparser                           2.21
pydot                               1.4.2
Pygments                            2.12.0
PyJWT                               2.4.0
pyparsing                           3.0.9
pyrsistent                          0.18.1
python-daemon                       2.3.1
python-dateutil                     2.8.2
python-nvd3                         0.15.0
python-slugify                      6.1.2
pytz                                2022.2.1
pytzdata                            2020.1
PyYAML                              6.0
rdflib                              6.0.2
requests                            2.28.1
requests-toolbelt                   0.9.1
rfc3986                             1.5.0
rich                                12.5.1
ruamel.yaml                         0.17.10
ruamel.yaml.clib                    0.2.6
schema-salad                        8.3.20220801194920
setproctitle                        1.3.2
setuptools                          56.0.0
shellescape                         3.8.1
six                                 1.16.0
sniffio                             1.2.0
SQLAlchemy                          1.4.40
SQLAlchemy-JSONField                1.0.0
SQLAlchemy-Utils                    0.38.3
swagger-ui-bundle                   0.0.9
tabulate                            0.8.10
tenacity                            8.0.1
termcolor                           1.1.0
text-unidecode                      1.3
tornado                             6.2
typing_extensions                   4.3.0
uc-micro-py                         1.0.1
unicodecsv                          0.14.1
urllib3                             1.26.11
websocket-client                    1.3.3
Werkzeug                            2.2.2
wrapt                               1.14.1
WTForms                             2.3.3
zipp                                3.8.1

For the Dockerfile and docker-compose file, I used your templates with slight modifications. Below is some of the important info:

ARG UBUNTU_VERSION="18.04"
ARG PYTHON_VERSION="3.8.12"
ARG CWL_AIRFLOW_VERSION="1.2.11"

I suspect that this issue is caused by running the workflow in a Docker container. So far I have not seen anyone mention this issue on the Airflow GitHub.

@michael-kotliar
Member

Hi @kokleong9406,

I think it may be related to running CWL-Airflow inside Docker. The docker run command has a --user parameter, and I believe something similar can be set in the docker-compose file.

Let me know if this information was useful.
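
To compare the container with the local host, it may help to check which Unix user the Airflow processes actually run as and whether the "run_as_user" account exists there. The sketch below is a diagnostic under assumptions, not a fix: `cwl_runner` is a hypothetical account name, and the sudo check is included because Airflow's impersonation documentation describes the feature as relying on sudo. It uses only the Python standard library and is meant to be run inside the container (for example via docker exec) and again on the host.

```python
import os
import pwd
import shutil


def describe_runtime_user(target_user):
    """Collect facts about the current process user and the target account."""
    uid = os.getuid()
    try:
        # A container started with a numeric --user may have no passwd entry.
        current_name = pwd.getpwuid(uid).pw_name
    except KeyError:
        current_name = None
    try:
        pwd.getpwnam(target_user)
        target_exists = True
    except KeyError:
        target_exists = False
    return {
        "current_uid": uid,
        "current_name": current_name,
        "target_user": target_user,
        "target_exists": target_exists,
        "sudo_path": shutil.which("sudo"),  # None if sudo is not on PATH
    }


if __name__ == "__main__":
    # "cwl_runner" is a hypothetical account name used for illustration.
    for key, value in describe_runtime_user("cwl_runner").items():
        print(f"{key}: {value}")
```

If the two environments report different users, a missing target account, or no sudo, that difference is a reasonable place to start looking. Docker Compose also accepts a service-level `user` key, which corresponds to the --user parameter mentioned above.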
