Having trouble configuring G-Node GIN with SSH #31

Open
tsalo opened this issue Oct 25, 2023 · 7 comments
Labels
bug Something isn't working


@tsalo
Member

tsalo commented Oct 25, 2023

I created an SSH key and added it to my G-Node GIN user settings, but for some reason I can't push to GIN from the UPenn HPC.

Steps to reproduce:

```shell
dataset_id="ds002156"
base_dir="/cbica/home/salot/open-multi-echo-data/datasets"
superdataset_dir="${base_dir}/${dataset_id}_test"
raw_dataset_dir="${superdataset_dir}/inputs/data"
code_dir="/cbica/home/salot/open-multi-echo-data/code/code"

# Create the YODA superdataset
datalad create -c yoda \
    -D "Create superdataset for OpenNeuro dataset ${dataset_id}" \
    "${superdataset_dir}"

cd "${superdataset_dir}"

# Clone the dataset as a subdataset of the superdataset
datalad clone -d "${superdataset_dir}" \
    -D "Clone of OpenNeuro dataset. May be modified for fMRIPrep/AFNI and pushed to G-Node GIN." \
    "https://github.com/ME-ICA/${dataset_id}.git" "${raw_dataset_dir}"

# Download the file content
cd "${raw_dataset_dir}"
datalad get "${raw_dataset_dir}"

# Create the GIN repo (this works)
datalad create-sibling-gin \
    --access-protocol ssh \
    --dataset "${raw_dataset_dir}" \
    --credential GIN \
    "ME-ICA/${dataset_id}_raw"

# Try to push the data to GIN
datalad push -d "${raw_dataset_dir}" --to gin
```

This gets me the following error:

```
Push to 'gin':  25% 1.00/4.00 [00:00<00:00, 6.29k Steps/s]
ssh_exchange_identification: Connection closed by remote host
Update availability for 'gin':  75% 3.00/4.00 [00:00<00:00, 5.12k Steps/s]
CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false fetch gin git-annex' failed with exitcode 128
ssh_exchange_identification: Connection closed by remote host
ConnectionOpenFailedError: 'ssh -fN -o ControlMaster=auto -o ControlPersist=15m -o ControlPath=/cbica/home/salot/.cache/datalad/sockets/... git@gin.g-node.org' failed with exitcode 255 [Failed to open SSH connection (could not start ControlMaster process)]
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```
tsalo added the bug label on Oct 25, 2023
tsalo pinned this issue on Feb 3, 2024
@tsalo
Member Author

tsalo commented Feb 9, 2024

I wonder if the problem is that I'm trying to push to an organization repository instead of my personal account.

However, when I try it with my personal account, I get create_sibling_gin(error): [Organization does not exist]:

```shell
dataset_id="ds002156"
base_dir="/cbica/home/salot/open-multi-echo-data/datasets"
superdataset_dir="${base_dir}/${dataset_id}_test"
raw_dataset_dir="${superdataset_dir}/inputs/data"
code_dir="/cbica/home/salot/open-multi-echo-data/code/code"

# Create the YODA superdataset
datalad create -c yoda \
    -D "Create superdataset for OpenNeuro dataset ${dataset_id}" \
    "${superdataset_dir}"

cd "${superdataset_dir}"

# Clone the dataset as a subdataset of the superdataset
datalad clone -d "${superdataset_dir}" \
    -D "Clone of OpenNeuro dataset. May be modified for fMRIPrep/AFNI and pushed to G-Node GIN." \
    "https://github.com/ME-ICA/${dataset_id}.git" "${raw_dataset_dir}"

# Download the file content
cd "${raw_dataset_dir}"
datalad get "${raw_dataset_dir}"

# Create the GIN repo (this fails!)
datalad create-sibling-gin \
    --access-protocol ssh \
    --dataset "${raw_dataset_dir}" \
    --credential GIN \
    "tsalo/${dataset_id}_raw"
```

EDIT: According to @\adswa (name escaped so she isn't subscribed to this issue), this error occurs because create-sibling-gin interprets tsalo/XX as a repository in an organization named tsalo. To create the repository under my personal account, I need to drop the tsalo/ prefix from the repository name.
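Following the EDIT above, here is a minimal sketch of the personal-account variant, assuming the variables from the script above. `create_personal_gin_sibling` is a hypothetical helper name, and stripping an `owner/` prefix with `${2##*/}` is just one defensive choice, not something `create-sibling-gin` itself requires.

```shell
# Hypothetical helper (not part of DataLad): create a GIN sibling under the
# personal account. create-sibling-gin reads "owner/name" as a repository in
# the organization "owner", so any "owner/" prefix is stripped first.
create_personal_gin_sibling() {
  local dataset_dir="$1"
  local repo_name="${2##*/}"   # "tsalo/ds002156_raw" -> "ds002156_raw"
  datalad create-sibling-gin \
    --access-protocol ssh \
    --dataset "$dataset_dir" \
    --credential GIN \
    "$repo_name"
}
```

Usage: `create_personal_gin_sibling "${raw_dataset_dir}" "${dataset_id}_raw"`. This only fixes the repository name; with `--access-protocol ssh` the later push still goes over SSH.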

@handwerkerd
Member

No great wisdom, but I've had consistent issues interacting with OpenNeuro from the NIH HPC. In my case the error is `urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)>`, because `requests.get()` isn't playing well with our system. Your error is different, but I wonder if it's a similar firewall or security issue. Instead of running the whole DataLad workflow, could you try just the command that fetches the data, to see whether that's where the problem is?
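Following this suggestion, a minimal sketch that runs the two network steps from the traceback above on their own: a plain SSH connection to `git@gin.g-node.org`, and the `git fetch gin git-annex` that failed. It assumes the `gin` sibling has already been created; `check_gin_access` is a hypothetical name, and `BatchMode`/`ConnectTimeout` are standard OpenSSH options added so the check cannot hang on a prompt.

```shell
# Hypothetical helper: run the failing network steps one at a time and report
# each exit status, so it is clear which one breaks.
check_gin_access() {
  local dataset_dir="$1"
  # 1) Plain SSH connection to GIN (no DataLad, no ControlMaster socket).
  ssh -T -o BatchMode=yes -o ConnectTimeout=10 git@gin.g-node.org
  echo "ssh exit status: $?"
  # 2) The same fetch that failed in the traceback.
  git -C "$dataset_dir" fetch gin git-annex
  echo "git fetch exit status: $?"
}
```

Usage: `check_gin_access "${raw_dataset_dir}"`. The message printed next to each status is what separates a network-level failure (such as `Connection closed by remote host`) from an authentication failure (such as `Permission denied (publickey)`).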

@tsalo
Member Author

tsalo commented Feb 9, 2024

Fetching the data works fine, as does creating the repo on G-Node GIN. The step that fails is actually pushing the data to the GIN repository.

Some more updates:

  1. I added my SSH key from the HPC to my G-Node GIN account settings, but the settings page shows "No recent activity", so the key apparently isn't being used when the repository is created.
  2. I created the necessary personal access token, and that part seems to work fine.
  3. I've tried both with and without host info in ~/.ssh/config:
     ```
     Host g-node.gin.org
       Hostname ssh.g-node.gin.org
       IdentityFile ~/.ssh/id_ed25519
       Port 443
     ```
     With that entry in the config file, the push fails more quickly, but the error messages are the same.
  4. I've tried both with and without the environment variable DATALAD_CREDENTIAL_GIN_TOKEN set.
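The `Host` line in item 3 reads `g-node.gin.org`, while the error output above shows DataLad connecting to `git@gin.g-node.org`. `Host` patterns are matched against the host name given on the command line, so a block written for one name does not apply to the other. A minimal sketch for checking which settings OpenSSH actually applies uses `ssh -G`, which prints the evaluated configuration for a host without connecting; `show_ssh_target` is a hypothetical helper name.

```shell
# Hypothetical helper: print the hostname, port, and identity file(s) OpenSSH
# would use for a host. `ssh -G` only evaluates ~/.ssh/config; it does not connect.
show_ssh_target() {
  ssh -G "$1" | awk '$1 == "hostname" || $1 == "port" || $1 == "identityfile" { print $1, $2 }'
}
```

Usage: `show_ssh_target gin.g-node.org`. If the printed values do not reflect an entry you added to `~/.ssh/config`, that entry's `Host` pattern is not matching.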

@tsalo
Member Author

tsalo commented Feb 14, 2024

I joined the DataLad office hour yesterday (where everyone was extremely helpful), and the problem appears to be that UPenn's CUBIC cluster blocks outgoing traffic on port 22, which is the only port G-Node GIN accepts for SSH. GitHub works because cloning from it over HTTPS uses port 443. Likewise, creating the sibling repository uses HTTPS, which is why that step worked fine.
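The diagnosis can be reproduced without DataLad or GIN credentials. A minimal sketch, assuming bash (it uses bash's `/dev/tcp` pseudo-device) and `timeout` from GNU coreutils; `ssh_banner_ok` and `check_ports` are hypothetical helpers. It checks for the `SSH-` identification line that SSH servers send right after a connection opens, rather than for an open TCP connection alone, because `ssh_exchange_identification: Connection closed by remote host` indicates the connection was closed before that line arrived.

```shell
# Hypothetical helper: succeed only if host:port answers with an SSH
# identification line ("SSH-2.0-...") within 5 seconds. Requires bash.
ssh_banner_ok() {
  local host="$1" port="${2:-22}"
  timeout 5 bash -c '
    exec 3<>"/dev/tcp/$0/$1" || exit 1
    IFS= read -r banner <&3 || exit 1
    [[ $banner == SSH-* ]]
  ' "$host" "$port" 2>/dev/null
}

# Print a one-line verdict for each "host:port" argument.
check_ports() {
  local target
  for target in "$@"; do
    if ssh_banner_ok "${target%:*}" "${target##*:}"; then
      echo "${target}: SSH server reachable"
    else
      echo "${target}: no SSH banner (blocked, closed, or not an SSH port)"
    fi
  done
}
```

Usage: `check_ports gin.g-node.org:22 github.com:22`. Comparing the two separates "all outgoing port-22 traffic is blocked" from a problem specific to GIN. This tests SSH only, so an HTTPS-only port such as `github.com:443` would always report no banner.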

I have four options:

  1. File a ticket with the CUBIC admins about opening up Port 22 or adding G-Node GIN to an exception list.
  2. Open an issue on G-Node GIN to request that they support SSH over Port 443.
  3. Forward data from CUBIC to another server through Port 443, then push from that other server to GIN, as long as the second server can push using Port 22.
  4. Start using UMinn's MSI cluster, which may allow outgoing traffic through Port 22 (I have to check with them).
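A related variant of option 3 (tunnelling through the second server instead of staging the data there) can be sketched with OpenSSH's `ProxyJump` directive. This assumes a machine that CUBIC can reach over SSH on port 443 and that can itself reach GIN on port 22; `myuser@relay.example.org:443` is a placeholder and `gin_proxy_stanza` a hypothetical helper. DataLad opens its connections with the system `ssh` client (see the `ssh -fN -o ControlMaster=auto` command in the error output above), so a `~/.ssh/config` entry should apply to it, but this sketch is untested.

```shell
# Hypothetical helper: print an ~/.ssh/config stanza that routes SSH
# connections for gin.g-node.org through a relay host via ProxyJump.
# The relay argument is a placeholder in the form [user@]host[:port].
gin_proxy_stanza() {
  local relay="$1"
  printf '%s\n' \
    "Host gin.g-node.org" \
    "    User git" \
    "    Port 22" \
    "    ProxyJump ${relay}"
}
```

Usage: review the output of `gin_proxy_stanza myuser@relay.example.org:443`, then append it with `>> ~/.ssh/config`. Unlike copying the data to the second server, this keeps a single copy of the dataset on CUBIC, at the cost of every push travelling through the relay.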

@handwerkerd
Member

handwerkerd commented Feb 14, 2024

Of the four, I'd recommend 1 & 2.

  1. Filing a ticket with CUBIC will either semi-efficiently let you keep working on this or get a response explaining why this is an actual security issue (which you could share with G-Node GIN).

  2. This is going to be an issue on other clusters too, and if G-Node GIN provides a solution, then the problem is broadly solved.

Any clue if this might be the same issue I'm having with OpenNeuro and the NIH cluster? (i.e., the SSL certificate is being blocked/garbled on one port, but might work if OpenNeuro transfers data over another port?)

@tsalo
Member Author

tsalo commented Feb 14, 2024

> Any clue if this might be the same issue I'm having with OpenNeuro and the NIH cluster? (i.e., the SSL certificate is being blocked/garbled on one port, but might work if OpenNeuro transfers data over another port?)

It definitely could be. If you're using DataLad to push to OpenNeuro, I'd recommend joining one of the weekly DataLad office hours like I did. They had me try out a series of commands to see what the situation was on CUBIC, and they were able to diagnose the problem.

@tsalo
Member Author

tsalo commented Feb 15, 2024

After speaking with Chris M., I think it might be inappropriate to push derivatives to GIN anyway. OpenNeuro seems to support derivatives-only datasets now, so I might switch to that instead.

EDIT: The problem with that is that the OpenNeuro credential tool crashes the UPenn cluster's login node. See OpenNeuroOrg/openneuro#3015.
