Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting standard output in airflow scheduler #78

Open
kokleong9406 opened this issue Jan 18, 2022 · 3 comments
Open

Getting standard output in airflow scheduler #78

kokleong9406 opened this issue Jan 18, 2022 · 3 comments
Labels

Comments

@kokleong9406
Copy link

Hi Michael,

Would like to ask that currently, is there anyway to get the standard output printed on airflow scheduler? Currently, I am able to get the log files via the Airflow REST API endpoint "/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/logs/{task_try_number}". You may also refer to https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_task_instance. The log files will look something familiar like below:

*** Reading local file: /home/kokleong/projects/cwl-airflow/logs/kokleong_0_compute_prs/5_compute_prs/2022-01-18T09:01:09.184198+00:00/1.log
[2022-01-18 17:01:49,252] {taskinstance.py:903} INFO - Dependencies all met for <TaskInstance: kokleong_0_compute_prs.5_compute_prs 2022-01-18T09:01:09.184198+00:00 [queued]>
[2022-01-18 17:01:49,259] {taskinstance.py:903} INFO - Dependencies all met for <TaskInstance: kokleong_0_compute_prs.5_compute_prs 2022-01-18T09:01:09.184198+00:00 [queued]>
[2022-01-18 17:01:49,259] {taskinstance.py:1095} INFO - 
--------------------------------------------------------------------------------
[2022-01-18 17:01:49,259] {taskinstance.py:1096} INFO - Starting attempt 1 of 1
[2022-01-18 17:01:49,259] {taskinstance.py:1097} INFO - 
--------------------------------------------------------------------------------
[2022-01-18 17:01:49,270] {taskinstance.py:1115} INFO - Executing <Task(CWLStepOperator): 5_compute_prs> on 2022-01-18T09:01:09.184198+00:00
[2022-01-18 17:01:49,273] {standard_task_runner.py:52} INFO - Started process 25549 to run task
[2022-01-18 17:01:49,276] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'kokleong_0_compute_prs', '5_compute_prs', '2022-01-18T09:01:09.184198+00:00', '--job-id', '35', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/kokleong_0_compute_prs.py', '--cfg-path', '/tmp/tmp4kzmqsnr', '--error-file', '/tmp/tmpp3_vy6iu']
[2022-01-18 17:01:49,277] {standard_task_runner.py:77} INFO - Job 35: Subtask 5_compute_prs
[2022-01-18 17:01:49,320] {logging_mixin.py:109} INFO - Running <TaskInstance: kokleong_0_compute_prs.5_compute_prs 2022-01-18T09:01:09.184198+00:00 [running]> on host ldap.a-star.edu.sg
[2022-01-18 17:01:49,373] {taskinstance.py:1252} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=kokleong_0_compute_prs
AIRFLOW_CTX_TASK_ID=5_compute_prs
AIRFLOW_CTX_EXECUTION_DATE=2022-01-18T09:01:09.184198+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-01-18T09:01:09.184198+00:00
[2022-01-18 17:01:50,774] {load_tool.py:102} INFO - Resolved '/home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T09_01_09.184198_00_00_d4_how0n/5_compute_prs/5_compute_prs_step_workflow.cwl' to 'file:///home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T09_01_09.184198_00_00_d4_how0n/5_compute_prs/5_compute_prs_step_workflow.cwl'
[2022-01-18 17:01:52,804] {workflow_job.py:794} INFO - [workflow ] start
[2022-01-18 17:01:52,805] {workflow_job.py:630} INFO - [workflow ] starting step 5_compute_prs
[2022-01-18 17:01:52,805] {workflow_job.py:74} INFO - [step 5_compute_prs] start
[2022-01-18 17:01:52,813] {job.py:259} INFO - [job 5_compute_prs] /home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T09_01_09.184198_00_00_d4_how0n/5_compute_prs/5_compute_prs_step_cache/mcxm1hz9$ sh \
    5_compute_prs.sh > /home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T09_01_09.184198_00_00_d4_how0n/5_compute_prs/5_compute_prs_step_cache/mcxm1hz9/5_compute_prs_stdout.log 2> /home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T09_01_09.184198_00_00_d4_how0n/5_compute_prs/5_compute_prs_step_cache/mcxm1hz9/5_compute_prs_stderr.log
[2022-01-18 17:01:53,000] {job.py:403} INFO - [job 5_compute_prs] completed success
[2022-01-18 17:01:53,001] {workflow_job.py:587} INFO - [step 5_compute_prs] completed success
[2022-01-18 17:01:53,001] {workflow_job.py:549} INFO - [workflow ] completed success
[2022-01-18 17:01:53,040] {taskinstance.py:1212} INFO - Marking task as SUCCESS. dag_id=kokleong_0_compute_prs, task_id=5_compute_prs, execution_date=20220118T090109, start_date=20220118T090149, end_date=20220118T090153
[2022-01-18 17:01:53,067] {local_task_job.py:151} INFO - Task exited with return code 0
[2022-01-18 17:01:53,147] {local_task_job.py:261} INFO - 1 downstream tasks scheduled from follow-on schedule check

I was thinking it would be great if I am also able to get the standard output that is printed on the airflow scheduler, which will look something like below:

[2022-01-18 15:55:27,593] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'kokleong_0_compute_prs', '5_compute_prs', '2022-01-18T07:54:20.748908+00:00', '--local', '--pool', 'default_pool', '--subdir', '/home/kokleong/projects/cwl-airflow/dags/kokleong_0_compute_prs.py']
[2022-01-18 15:55:27,611] {dagbag.py:496} INFO - Filling up the DagBag from /home/kokleong/projects/cwl-airflow/dags/kokleong_0_compute_prs.py
Running <TaskInstance: kokleong_0_compute_prs.5_compute_prs 2022-01-18T07:54:20.748908+00:00 [queued]> on host ldap.a-star.edu.sg
Resolved '/home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T07_54_20.748908_00_00_cc_wep2a/5_compute_prs/5_compute_prs_step_workflow.cwl' to 'file:///home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T07_54_20.748908_00_00_cc_wep2a/5_compute_prs/5_compute_prs_step_workflow.cwl'
[workflow ] start
[workflow ] starting step 5_compute_prs
[step 5_compute_prs] start
[job 5_compute_prs] /home/kokleong/projects/cwl-airflow/cwl_tmp_folder/kokleong_0_compute_prs_manual__2022-01-18T07_54_20.748908_00_00_cc_wep2a/5_compute_prs/5_compute_prs_step_cache/785meyr6$ sh \
    5_compute_prs.sh
PLINK v1.90b3.42 64-bit (20 Sep 2016)      https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to PGS000192.log.
Options in effect:
  --out PGS000192
  --score PGS000192.txt.clean.scores 1 2 3 sum
  --vcf PGS000192.txt.clean.final.vcf

64190 MB RAM detected; reserving 32095 MB for main workspace.
--vcf: PGS000192-temporary.bed + PGS000192-temporary.bim +
PGS000192-temporary.fam written.
9 variants loaded from .bim file.
3202 people (0 males, 0 females, 3202 ambiguous) loaded from .fam.
Ambiguous sex IDs written to PGS000192.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 3202 founders and 0 nonfounders present.
Calculating allele frequencies... done.
9 variants and 3202 people pass filters and QC.
Note: No phenotypes present.
--score: 9 valid predictors loaded.
--score: Results written to PGS000192.profile .
[job 5_compute_prs] completed success
[step 5_compute_prs] completed success
[workflow ] completed success

Currently is this being supported in cwl-airflow? If no, could you point me to the source code that does this and hopefully I am able to do some tricks and modifications to get the standard output logged into the Airflow log files (which is saved to "1.log") :P

@michael-kotliar
Copy link
Member

Hi @kokleong9406

Could you please provide more details on what kind of logs you would like to have? It looks like the reason why you want to have the scheduler logs is that there you can see the output from your 5_compute_prs.sh script. However, you should be able the see the logs from your 5_compute_prs.sh script in the task instance logs after the line CWL LOGS. Please, let me know if you don't have this line in your task instance logs. Also, could you please provide a version of CWL-Airflow you use? This is the link on the code, that appends cwl related logs (a.k.a from your 5_compute_prs.sh script) to the end of task instance logs

def append_cwl_log(context):

@kokleong9406
Copy link
Author

Hi @michael-kotliar

Thank you for your reply. I see that I do not have "def append_cwl_log(content)" in report.py nor any of the files. The current version of CWL-Airflow I am using is 1.2.11. Which version do I need to use to have this function?

Btw, I guess I will try out bit by bit to integrate the "append_cwl_log" function into my existing cwl version as well to see how it goes.

@michael-kotliar
Copy link
Member

Hi @kokleong9406

I think these changes are not in the release on Pypi yet, so you can install it from the latest master branch commit on GitHub. Let me know if need any help with it.

Michael

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants