Adds ParallelGroupAFQ #1124

Merged
merged 19 commits into from May 16, 2024

Conversation

teresamg
Contributor

Creates a ParallelGroupAFQ class inheriting from GroupAFQ to allow for easier parallelization through pydra. Does not currently work; in progress.

@pep8speaks

pep8speaks commented Apr 10, 2024

Hello @teresamg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 352:80: E501 line too long (86 > 79 characters)
Line 1012:1: W293 blank line contains whitespace

Comment last updated at 2024-05-15 23:45:53 UTC

Collaborator

@arokem arokem left a comment

This is a really neat implementation now! My main suggestion is to change ParticipantAFQ so that the names of kwargs and attributes are well-matched, so that you can make this even neater.

AFQ/api/group.py (outdated, resolved)
AFQ/api/group.py (outdated, resolved)
AFQ/api/group.py (outdated, resolved)
@arokem
Collaborator

arokem commented Apr 18, 2024

The code looks quite neat now!

@36000 : could you take a quick look when you get a chance?

Any ideas about how we can test this? I guess that we'd have to set this up as a nightly test, possibly running this parallelized with concurrent futures across two HBN subjects?

@36000
Collaborator

36000 commented Apr 18, 2024

my thoughts, I will implement these:

  1. add a similar looking export function to wrap export like the export_all implemented here
  2. add pydra to setup.cfg
  3. add a nightly test on 2 HBN subjects using concurrent futures for pydra

AFQ/api/group.py (outdated, resolved)
@arokem arokem marked this pull request as ready for review April 18, 2024 21:08
@arokem arokem changed the title WIP: Adds ParallelGroupAFQ Adds ParallelGroupAFQ Apr 18, 2024
@36000
Collaborator

36000 commented Apr 22, 2024

I made some minor changes here. I made pydra a required library for pyAFQ (it's pip-installable with few dependencies, so this will just be easier for most users). We now try to catch the error Ariel mentioned (let me know if this doesn't work). I also added an export function based on the export_all function. Next I will add the nightly test, and then I think we can merge this.

@36000
Collaborator

36000 commented Apr 22, 2024

I wrote this test:

def test_AFQ_pydra():
    _, bids_path = afd.fetch_hbn_preproc(["NDARAA948VFH", "NDARAV554TP2"])
    pga = ParallelGroupAFQ(bids_path, preproc_pipeline="qsiprep")
    pga.export_all()

and I am getting this error:

FAILED AFQ/tests/test_api.py::test_AFQ_pydra - ValueError: Split is missing values for the following fields ['pAFQ_kwargs']

Either of you two know what causes this?

@arokem
Collaborator

arokem commented Apr 24, 2024

Could you please push that test to this PR? I'm not sure what's up and would like to try to debug locally.

@36000
Collaborator

36000 commented Apr 24, 2024

Sure. I figured out this was due to me using a different pydra version (the latest, 0.23). But now I am getting pickling problems. I will push some of the changes I have made, but I think we may have to talk about this in person at some point, as I am running into a few issues.

@36000
Collaborator

36000 commented May 7, 2024

@teresamg mind trying this to see if it works?
I factored out bidslayout

@teresamg
Contributor Author

Looks like it ran successfully, both locally and on Hyak!

@arokem
Collaborator

arokem commented May 15, 2024

Did it generate the combined tract profiles file? I also ran a test on hyak:

import os.path as op

from AFQ.api.group import ParallelGroupAFQ
from AFQ.definitions.image import RoiImage
import AFQ.api.bundle_dict as abd
import AFQ.data.fetch as afd


_, bids_path = afd.fetch_hbn_preproc(
        ["NDARZT957CWG",
         "NDARZU279XR3",
         "NDARZU401RCU",
         "NDARZU822WN3",
         "NDARZV421TCZ",
         "NDARZW262ZLV",
         "NDARZX163EWC",
         ],
        path="/gscratch/scrubbed/arokem/data/")
    
my_afq = ParallelGroupAFQ(
    bids_path=bids_path,
    preproc_pipeline="qsiprep",
    parallel_params={
        "submitter_params": {
            "plugin": "slurm",
            "sbatch_args": "-J test \
                            -p ckpt \
                            --nodes=1 \
                            --cpus-per-task=8 \
                            --gpus=1 \
                            -A escience \
                            --mem=64G \
                            --time=2:00:00 \
                            -o /gscratch/scrubbed/arokem/logs/test.out \
                            -e /gscratch/scrubbed/arokem/logs/test.err \
                            --mail-user=arokem@uw.edu \
                            --mail-type=ALL"
        },
        "cache_dir": "/gscratch/scrubbed/arokem/tmp"
    }
)

my_afq.export_all()

which worked great at individual subject level, but I can't find the final combined tract profiles file, which I was expecting to find.
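
A side note on the script above: backslash continuations inside a Python string literal keep the leading indentation of each continued line, so the resulting sbatch_args value contains long runs of spaces. A minimal sketch of building the same option string from a list instead is below. The helper name `build_sbatch_args` is hypothetical (it is not part of pyAFQ or pydra); the option values are copied from the script above.

```python
# Hypothetical helper, not part of pyAFQ or pydra: joins individual sbatch
# options into one string separated by single spaces.
def build_sbatch_args(options):
    """Join a list of sbatch option strings with single spaces."""
    return " ".join(opt.strip() for opt in options)


sbatch_args = build_sbatch_args([
    "-J test",
    "-p ckpt",
    "--nodes=1",
    "--cpus-per-task=8",
    "--gpus=1",
    "-A escience",
    "--mem=64G",
    "--time=2:00:00",
    "-o /gscratch/scrubbed/arokem/logs/test.out",
    "-e /gscratch/scrubbed/arokem/logs/test.err",
    "--mail-user=arokem@uw.edu",
    "--mail-type=ALL",
])

# Same shape as the parallel_params dictionary used in the script above.
parallel_params = {
    "submitter_params": {"plugin": "slurm", "sbatch_args": sbatch_args},
    "cache_dir": "/gscratch/scrubbed/arokem/tmp",
}
```

Keeping each option on its own list entry also makes it easy to change a single value, such as the time limit, without touching the rest of the string.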

@teresamg
Contributor Author

It did for me... does each subject have the *_desc-profiles_dwi.csv?

**pAFQ_kwargs.kwargs)
pAFQ.export_all(viz, xforms, indiv)

for dir in finishing_params["output_dirs"]:
Collaborator

I am a little confused at what these lines here are trying to do

Collaborator

The idea behind this is that we want to generate a group-level csv file that contains the merged tract profiles. But we only want to do this once, so each task checks whether all of the other tasks have completed. If they have not, it returns and nothing happens. But if it is the last task running, the condition is fulfilled and the code continues to lines 1124-1126, which run one last GroupAFQ export to generate the combined tract profile. I worry that this is not super robust, though, and could potentially run into some funky race conditions. In my own test on hyak, I did not get the combined tract profile. So, maybe we need to do something a bit more direct to create the group-level derivatives.

Contributor Author

1120 is checking whether the process is the last process running by trawling for each subject's *_desc-profiles_dwi.csv. If all are present, it exports the GroupAFQ object to create the final tract_profiles.csv.
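
A minimal sketch of that check, using only the standard library, is below. It is not the code in this PR: the function names and the `merge_fn` callback are hypothetical, and it assumes each subject's profiles CSV sits directly inside its output directory.

```python
import glob
import os.path as op


def all_profiles_present(output_dirs, pattern="*_desc-profiles_dwi.csv"):
    """Return True only if every output directory has a matching file."""
    return all(
        len(glob.glob(op.join(out_dir, pattern))) > 0
        for out_dir in output_dirs)


def maybe_merge(output_dirs, merge_fn):
    """Call merge_fn() only when every subject's profiles CSV exists.

    merge_fn is a hypothetical stand-in for the final group-level export.
    Returns True if the merge was triggered, False otherwise.
    """
    if not all_profiles_present(output_dirs):
        return False
    merge_fn()
    return True
```

With a check of this kind, two tasks that finish at nearly the same time could both see every CSV and both trigger the merge, and if any task fails to write its CSV, no task triggers it.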

Collaborator

OK that makes sense

Collaborator

I wonder if pydra has some way to do this through their API

Contributor Author

Ariel, if all csvs are present, perhaps you can print "output_dirs" and double check that none of the paths are wonky or null? Is there anything unusual in test.out or test.err?

Collaborator

Oh yeah, good call:

slurmstepd: error: *** JOB 18301898 ON g3030 CANCELLED AT 2024-05-14T22:33:02 DUE TO TIME LIMIT ***

I will try this again with a longer max time limit than two hours.

Collaborator

I think it would be tricky to do this through pydra, because we'd have to leave the submitting process running for the duration of all the sub-processes, and that may become very cumbersome for very large datasets. Assuming we don't leave the program running until all tasks are completed, I think we have two options:

  1. Something like what is being done here.
  2. Don't merge, and let users do that in a separate program.

I suggest that if my current test works and we get what we expect that we go ahead and merge this PR as is, including this bit of code. I can follow up with a documentation example, based on my current experiments with the HBN data. Users need to know that the final merge into a tract profiles file will fail if any of the sub-tasks fail, so maybe we just need to clarify this in the documentation.

Or is there anything else we need to address before merging?
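
For reference, option 2 above could look roughly like the sketch below: a separate, standard-library-only script that users run after all jobs finish. This is an illustration under assumptions, not pyAFQ's merge logic. The name `merge_profiles` is hypothetical, the per-subject CSVs are assumed to share one header and to sit directly inside each output directory, and the real combined file may carry additional columns that this sketch does not add.

```python
import csv
import glob
import os.path as op


def merge_profiles(output_dirs, merged_path,
                   pattern="*_desc-profiles_dwi.csv"):
    """Concatenate per-subject profile CSVs into one file.

    Returns the list of directories with no matching CSV. If that list is
    non-empty, nothing is written (all-or-nothing, as discussed above).
    """
    missing, found = [], []
    for out_dir in output_dirs:
        matches = sorted(glob.glob(op.join(out_dir, pattern)))
        if matches:
            found.extend(matches)
        else:
            missing.append(out_dir)
    if missing:
        return missing

    header, rows = None, []
    for fname in found:
        with open(fname, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"Header mismatch in {fname}")
            rows.extend(reader)

    with open(merged_path, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return missing
```

Returning the missing directories gives users a direct list of which sub-tasks to rerun before attempting the merge again.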

Collaborator

I agree. This looks good to me!

Contributor Author

I'm not sure what you mean about the submitting process... it ends as soon as the jobs are submitted, not completed, and it's the last job that does the final merge (all of them check for it). Plus, GroupAFQ also fails to produce tract_profiles.csv if anything fails.

AFQ/api/group.py (outdated, resolved)
@arokem
Collaborator

arokem commented May 16, 2024

I don't know if this is directly related to the content of this PR, but I am now getting an error in visualizing the standard set of bundles, where the visualization code is raising:

INFO:AFQ:Generating colorful lines from tractography...
KeyError: 'Forceps Minor'

Presumably because it's looking for a tract that is now no longer part of the default set of tracts (because we're using the more granular set of CC tracts).

@36000 : any chance this is related to changes you introduced here to how data is passed between GroupAFQ and ParticipantAFQ?

If you think this is unrelated, I think we can probably merge this PR, and fix this issue elsewhere.

@36000
Collaborator

36000 commented May 16, 2024

Not sure what would cause this, but I think it is unrelated, as removing overlapping bundle definitions happens in the init method of the BundleDict, so I don't think a race condition is causing this error.

@36000 36000 merged commit 6d0ffb6 into yeatmanlab:master May 16, 2024
9 checks passed
@teresamg teresamg deleted the pydra branch May 16, 2024 18:44
4 participants