Error in running "run_sequentially.py" #75

Open
kakotypallab opened this issue Jun 21, 2022 · 3 comments
@kakotypallab

After loading the Docker image "tvb-recon" along with the path of the input data folder:

Input: python run_sequentially.py "1"

Output:

Starting to process the following subjects: %s [1]
Starting to process the subject: TVB1
Configured atlas default for patient inside folder /home/submitter/data/TVB1/configs
Checking currently running job ids...
Error:

Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.
Currently running job ids are: []
Starting pegasus run for subject: TVB1with atlas: default
main_pegasus.sh: 7: main_pegasus.sh: Bad substitution
/opt/tvb-recon
Traceback (most recent call last):
  File "/opt/conda/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/opt/conda/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/tvb-recon/tvb/recon/dax/main.py", line 7, in <module>
    from tvb.recon.dax.configuration import Configuration, ConfigKey, SensorsType
  File "tvb/recon/dax/configuration.py", line 4, in <module>
    LOGGER = get_logger(__name__)
  File "tvb/recon/logger.py", line 25, in get_logger
    _ensure_log_folder_exists()
  File "tvb/recon/logger.py", line 14, in _ensure_log_folder_exists
    os.mkdir(OUTPUT_FOLDER)
OSError: [Errno 13] Permission denied: 'output'
Removing Generate 5tt MIF -> Tracts SIFT
Removing gen_mapping_details -> convert_output
Removing Convert APARC+ASEG to NIFTI with good orientation -> convert_output
Removing Recon-all for T1 -> qc_snapshot
Removing Recon-all for T1 -> qc_snapshot
Traceback (most recent call last):
  File "/usr/bin/pegasus-graphviz", line 507, in <module>
    main()
  File "/usr/bin/pegasus-graphviz", line 504, in main
    emit_dot(dag, options.label, options.outfile, options.width, options.height)
  File "/usr/bin/pegasus-graphviz", line 412, in __init__
    self.out = open(outfile, 'w')
IOError: [Errno 13] Permission denied: '/home/submitter/data/TVB1/configs/dax/main_bnm.dot'
Error: Could not open "/home/submitter/data/TVB1/configs/dax/main_bnm.png" for writing : Permission denied
2022.06.21 08:02:52.726 UTC:
2022.06.21 08:02:52.732 UTC: -----------------------------------------------------------------------
2022.06.21 08:02:52.737 UTC: File for submitting this DAG to Condor : TVB-PIPELINE-0.dag.condor.sub
2022.06.21 08:02:52.744 UTC: Log of DAGMan debugging messages : TVB-PIPELINE-0.dag.dagman.out
2022.06.21 08:02:52.750 UTC: Log of Condor library output : TVB-PIPELINE-0.dag.lib.out
2022.06.21 08:02:52.756 UTC: Log of Condor library error messages : TVB-PIPELINE-0.dag.lib.err
2022.06.21 08:02:52.762 UTC: Log of the life of condor_dagman itself : TVB-PIPELINE-0.dag.dagman.log
2022.06.21 08:02:52.768 UTC:
2022.06.21 08:02:52.785 UTC: -----------------------------------------------------------------------
2022.06.21 08:02:58.964 UTC: Created Pegasus database in: sqlite:////home/submitter/.pegasus/workflow.db
2022.06.21 08:02:58.969 UTC: Your database is compatible with Pegasus version: 4.8.2
2022.06.21 08:02:59.114 UTC: Submitting to condor TVB-PIPELINE-0.dag.condor.sub
2022.06.21 08:02:59.128 UTC: [ERROR]
2022.06.21 08:02:59.133 UTC: [ERROR] ERROR: Can't find address of local schedd
2022.06.21 08:02:59.139 UTC: [ERROR] ERROR: Running condor_submit /usr/bin/condor_submit TVB-PIPELINE-0.dag.condor.sub failed with exit code 1 at /usr/bin/pegasus-run line 327.
2022.06.21 08:02:59.145 UTC: [FATAL ERROR]
[1] java.lang.RuntimeException: Unable to submit the workflow using pegasus-run at edu.isi.pegasus.planner.client.CPlanner.executeCommand(CPlanner.java:695)
Checking currently running job ids...
Error:

Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.
Currently running job ids are: []
Traceback (most recent call last):
  File "run_sequentially.py", line 185, in <module>
    current_job_id = new_job_ids[0]
IndexError: list index out of range
submitter@5637a3c2a40e:/opt/tvb-recon/pegasus$ condor_ststus
bash: condor_ststus: command not found
submitter@5637a3c2a40e:/opt/tvb-recon/pegasus$ condor_status
Error: communication error
CEDAR:6001:Failed to connect to <172.17.0.3:9618>

** Note: I am inside a proxy network and have configured Docker to run through the proxy.
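Both "[Errno 13] Permission denied" tracebacks in this log point at the same underlying problem: the container user (`submitter`) cannot write into the mounted folders. A minimal pre-flight check, sketched here with an assumed helper name (not part of tvb-recon), would surface this before the workflow starts:

```python
import os

def ensure_writable_dir(path):
    """Create `path` if missing and verify the current user can write into it.

    Illustrative sketch only: `ensure_writable_dir` is not part of tvb-recon.
    It fails early with the same OSError(EACCES) the log above shows if the
    mounted folder is not writable by the container user.
    """
    if not os.path.isdir(path):
        os.makedirs(path)          # raises OSError [Errno 13] just like os.mkdir above
    probe = os.path.join(path, ".write_test")
    try:
        with open(probe, "w"):     # fails with Errno 13 if the directory is read-only
            pass
    finally:
        if os.path.exists(probe):
            os.remove(probe)
    return True
```

One common fix is to `chown -R` the mounted directory on the host to the UID that `submitter` maps to inside the container.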

@kakotypallab
Author

Made some progress with the Condor error, but still not getting the output. Running:

sh main_pegasus.sh /opt/tvb-recon/patients/TVB1/configs/ /opt/tvb-recon/patients/TVB1/configs/dax/

main_pegasus.sh: 7: main_pegasus.sh: Bad substitution
/opt/tvb-recon
2022-06-23 11:21:34,947 - tvb.recon.dax.configuration - INFO - Parsing patient configuration file /opt/tvb-recon/patients/TVB1/configs//patient_flow.properties
Removing Generate 5tt MIF -> Tracts SIFT
Removing gen_mapping_details -> convert_output
Removing Convert APARC+ASEG to NIFTI with good orientation -> convert_output
Removing Recon-all for T1 -> qc_snapshot
Removing Recon-all for T1 -> qc_snapshot
2022.06.23 11:21:38.340 UTC:
2022.06.23 11:21:38.346 UTC: -----------------------------------------------------------------------
2022.06.23 11:21:38.351 UTC: File for submitting this DAG to Condor : TVB-PIPELINE-0.dag.condor.sub
2022.06.23 11:21:38.357 UTC: Log of DAGMan debugging messages : TVB-PIPELINE-0.dag.dagman.out
2022.06.23 11:21:38.367 UTC: Log of Condor library output : TVB-PIPELINE-0.dag.lib.out
2022.06.23 11:21:38.372 UTC: Log of Condor library error messages : TVB-PIPELINE-0.dag.lib.err
2022.06.23 11:21:38.377 UTC: Log of the life of condor_dagman itself : TVB-PIPELINE-0.dag.dagman.log
2022.06.23 11:21:38.383 UTC:
2022.06.23 11:21:38.398 UTC: -----------------------------------------------------------------------
2022.06.23 11:21:39.236 UTC: Your database is compatible with Pegasus version: 4.8.2
2022.06.23 11:21:39.365 UTC: Submitting to condor TVB-PIPELINE-0.dag.condor.sub
2022.06.23 11:21:39.487 UTC: Submitting job(s).
2022.06.23 11:21:39.492 UTC: 1 job(s) submitted to cluster 81.
2022.06.23 11:21:39.498 UTC:
2022.06.23 11:21:39.503 UTC: Your workflow has been started and is running in the base directory:
2022.06.23 11:21:39.510 UTC:
2022.06.23 11:21:39.516 UTC: /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0006
2022.06.23 11:21:39.521 UTC:
2022.06.23 11:21:39.526 UTC: *** To monitor the workflow you can run ***
2022.06.23 11:21:39.531 UTC:
2022.06.23 11:21:39.537 UTC: pegasus-status -l /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0006
2022.06.23 11:21:39.542 UTC:
2022.06.23 11:21:39.547 UTC: *** To remove your workflow run ***
2022.06.23 11:21:39.554 UTC:
2022.06.23 11:21:39.559 UTC: pegasus-remove /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0006
2022.06.23 11:21:39.565 UTC:
2022.06.23 11:21:44.597 UTC: Time taken to execute is 2.747 seconds
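The `Bad substitution` message on line 7 of main_pegasus.sh is the classic symptom of a bash-only parameter expansion being executed by plain `sh` (dash on Debian/Ubuntu images). A tiny illustration (the variable and expansion are made up, not taken from main_pegasus.sh):

```shell
# bash-only uppercase expansion; under plain sh/dash this aborts
# with "Bad substitution" instead of printing the transformed value
x="tvb-recon"
echo "${x^^}"
```

If that is the cause here, invoking the script as `bash main_pegasus.sh ...` instead of `sh main_pegasus.sh ...` may avoid the error.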

@popaula937
Collaborator

Hi,
I suggest you retry with a clean start, keeping only the raw data in your patient folder (remove the configs and output folders if they exist).
After running the sudo condor_master command, please run condor_status to check whether Condor has started. Only once Condor is running, proceed with the python run_sequentially.py "1" command.

@kakotypallab
Author

python run_sequentially.py "1"

Hi Paula,

As suggested, I retried with a clean start by removing the configs and output folders. I also verified that Condor is running.

Here is the output:

docker run -it -v /home/administrator/recon/patients/:/home/submitter/data -v /home/administrator/recon/:/opt/tvb-recon thevirtualbrain/tvb-recon /bin/bash

submitter@c3957a20516a:/opt/tvb-recon$ cd patients/TVB1/
submitter@c3957a20516a:/opt/tvb-recon/patients/TVB1$ ls
raw
submitter@c3957a20516a:/opt/tvb-recon/patients/TVB1$ sudo condor_master

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.

[sudo] password for submitter:

submitter@c3957a20516a:/opt/tvb-recon$ ls
LICENSE README.md activate docker extern pegasus run-tests.sh tvb
NOTES.md Vagrantfile data docs patients provision setup.py
submitter@c3957a20516a:/opt/tvb-recon$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime

slot1@c3957a20516a LINUX X86_64 Unclaimed Idle 0.710 4005 0+00:00:04
slot2@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:27
slot3@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:28
slot4@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:29
slot5@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:30
slot6@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:31
slot7@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:32
slot8@c3957a20516a LINUX X86_64 Unclaimed Idle 0.000 4005 0+00:00:25
Total Owner Claimed Unclaimed Matched Preempting Backfill

    X86_64/LINUX     8     0       0         8       0          0        0

           Total     8     0       0         8       0          0        0

submitter@c3957a20516a:/opt/tvb-recon$ cd pegasus/
submitter@c3957a20516a:/opt/tvb-recon/pegasus$ python run_sequentially.py "1"
Starting to process the following subjects: %s [1]
Starting to process the subject: TVB1
Folder /home/submitter/data/TVB1/configs has been created...
Configuration files for subject TVB1 are ready!
Checking currently running job ids...
Currently running job ids are: []
Starting pegasus run for subject: TVB1with atlas: default
main_pegasus.sh: 7: main_pegasus.sh: Bad substitution
/opt/tvb-recon
2022-06-24 07:52:49,864 - tvb.recon.dax.configuration - INFO - Parsing patient configuration file /home/submitter/data/TVB1/configs/patient_flow.properties
Removing Generate 5tt MIF -> Tracts SIFT
Removing gen_mapping_details -> convert_output
Removing Convert APARC+ASEG to NIFTI with good orientation -> convert_output
Removing Recon-all for T1 -> qc_snapshot
Removing Recon-all for T1 -> qc_snapshot
2022.06.24 07:52:53.351 UTC:
2022.06.24 07:52:53.357 UTC: -----------------------------------------------------------------------
2022.06.24 07:52:53.362 UTC: File for submitting this DAG to Condor : TVB-PIPELINE-0.dag.condor.sub
2022.06.24 07:52:53.367 UTC: Log of DAGMan debugging messages : TVB-PIPELINE-0.dag.dagman.out
2022.06.24 07:52:53.378 UTC: Log of Condor library output : TVB-PIPELINE-0.dag.lib.out
2022.06.24 07:52:53.383 UTC: Log of Condor library error messages : TVB-PIPELINE-0.dag.lib.err
2022.06.24 07:52:53.388 UTC: Log of the life of condor_dagman itself : TVB-PIPELINE-0.dag.dagman.log
2022.06.24 07:52:53.393 UTC:
2022.06.24 07:52:53.409 UTC: -----------------------------------------------------------------------
2022.06.24 07:53:00.074 UTC: Created Pegasus database in: sqlite:////home/submitter/.pegasus/workflow.db
2022.06.24 07:53:00.080 UTC: Your database is compatible with Pegasus version: 4.8.2
2022.06.24 07:53:00.214 UTC: Submitting to condor TVB-PIPELINE-0.dag.condor.sub
2022.06.24 07:53:00.340 UTC: Submitting job(s).
2022.06.24 07:53:00.345 UTC: 1 job(s) submitted to cluster 1.
2022.06.24 07:53:00.350 UTC:
2022.06.24 07:53:00.356 UTC: Your workflow has been started and is running in the base directory:
2022.06.24 07:53:00.362 UTC:
2022.06.24 07:53:00.368 UTC: /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0001
2022.06.24 07:53:00.373 UTC:
2022.06.24 07:53:00.378 UTC: *** To monitor the workflow you can run ***
2022.06.24 07:53:00.383 UTC:
2022.06.24 07:53:00.389 UTC: pegasus-status -l /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0001
2022.06.24 07:53:00.394 UTC:
2022.06.24 07:53:00.406 UTC: *** To remove your workflow run ***
2022.06.24 07:53:00.411 UTC:
2022.06.24 07:53:00.416 UTC: pegasus-remove /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0001
2022.06.24 07:53:00.422 UTC:
2022.06.24 07:53:05.437 UTC: Time taken to execute is 8.555 seconds
Checking currently running job ids...
Currently running job ids are: ['0001']
The job that has been started has the id: 0001
Starting to monitor the submit folder: /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0001 ...
Checked at Fri, 24 Jun 2022 07:53:06 and monitord.done file was not generated yet!

The job fails at around 7.1%. Checking the status of the workflow gives the following output:

submitter@c3957a20516a:/opt/tvb-recon$ pegasus-status -l /home/submitter/pegasus/submit/submitter/pegasus/TVB-PIPELINE/run0001
(no matching jobs found in Condor Q)
UNRDY READY PRE IN_Q POST DONE FAIL %DONE STATE DAGNAME
  170     0   0    0    0   13    1   7.1 Failure *TVB-PIPELINE-0.dag
Summary: 1 DAG total (Failure:1)
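When scripting around pegasus-status, the counts in that DAG line can be pulled out programmatically; a small sketch (the helper name and column assumptions are mine, based only on the output shown above). For actual failure diagnosis, running pegasus-analyzer on the submit directory is the usual next step:

```python
import re

def parse_dag_summary(status_line):
    """Parse one pegasus-status DAG line with the column order shown above:
    UNRDY READY PRE IN_Q POST DONE FAIL %DONE STATE DAGNAME

    Returns a dict of counts, or None if the line does not match.
    Illustrative helper, not part of Pegasus or tvb-recon.
    """
    m = re.search(
        r"(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)",
        status_line)
    if m is None:
        return None
    counts = dict(zip(
        ("unready", "ready", "pre", "in_q", "post", "done", "fail"),
        (int(g) for g in m.groups()[:7])))
    counts["pct_done"] = float(m.group(8))
    counts["state"] = m.group(9)
    counts["dagname"] = m.group(10)
    return counts
```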
