
Login and connection issues #146

Open
jthiels opened this issue Dec 21, 2023 · 1 comment


jthiels commented Dec 21, 2023

Hi, we are seeing inconsistent login issues: SIRIUS stops recognizing the login session after about 40 small-mass jobs, and the remaining jobs fail. We are submitting jobs within the same session across different nodes. We have both specified the number of cores in the SIRIUS command and added a 'sleep' line, to see whether giving the server a short break prevents possible collisions. Each job runs on 36 cores (the total number of cores on a single node).

Currently the user I'm working with cannot log in at all, even though they logged in successfully earlier today. The login fails repeatedly in both the GUI and the command line.

We were using version 5.8.5 through a conda environment and also the 5.8.6-snapshot binary.

Is the server down or having other issues? Since the login worked earlier today and some jobs ran successfully, we're not sure how to troubleshoot further.

@mfleisch
Collaborator

Hey,
this issue is likely caused by sharing the refresh_token of a login among multiple SIRIUS instances.

When you log in, SIRIUS stores a so-called refresh_token on your system and keeps a so-called access_token in memory.
Access_tokens are short-lived and are used to authorize your queries to our application server (e.g. predicting fingerprints). Refresh_tokens are long-lived and are used to request a fresh access_token from our login server. Furthermore, a refresh_token is single-use: when it is used to create a new access_token, it is also replaced by a new refresh_token. This is important to prevent misuse in case a long-lived refresh_token gets stolen. If a refresh_token is used a second time, the whole "token chain" becomes invalid and the user has to log in again with username and password.
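To make the single-use behaviour concrete, here is a toy model of the rotation described above. It is only an illustrative sketch, not the actual login server: the class `LoginServer`, its methods, and the function `demo` are hypothetical names, and the real service may track tokens differently.

```python
import secrets


class LoginServer:
    """Toy login server with single-use (rotating) refresh tokens."""

    def __init__(self):
        self.valid = {}       # currently valid refresh token -> chain id
        self.used = {}        # already consumed refresh token -> chain id
        self.revoked = set()  # chain ids that have been invalidated

    def login(self):
        """Start a new token chain and return its first refresh token."""
        chain = secrets.token_hex(4)
        token = secrets.token_hex(8)
        self.valid[token] = chain
        return token

    def refresh(self, refresh_token):
        """Return (access_token, new_refresh_token) or raise PermissionError."""
        if refresh_token in self.used:
            # A consumed token was presented again: revoke the whole chain.
            self.revoked.add(self.used[refresh_token])
            raise PermissionError("refresh token reused; chain revoked")
        chain = self.valid.pop(refresh_token, None)
        if chain is None or chain in self.revoked:
            raise PermissionError("invalid refresh token; login required")
        self.used[refresh_token] = chain
        new_refresh = secrets.token_hex(8)
        self.valid[new_refresh] = chain
        return secrets.token_hex(8), new_refresh


def demo():
    """Two instances (A and B) start from the same stored refresh token."""
    server = LoginServer()
    shared = server.login()              # token both instances read from disk
    outcome = []
    _, token_a = server.refresh(shared)  # instance A refreshes first
    outcome.append("A refreshed")
    try:
        server.refresh(shared)           # instance B still holds the old token
    except PermissionError:
        outcome.append("B rejected")
    try:
        server.refresh(token_a)          # A's newer token is now dead as well
    except PermissionError:
        outcome.append("A rejected")
    return outcome


if __name__ == "__main__":
    print(demo())
```

In this model the second use of the shared token does not just fail for the second instance. It also invalidates the token the first instance had legitimately obtained, which matches the symptom of all remaining jobs failing.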

I assume that your compute nodes share the same user home directory. By default the token is stored in the SIRIUS config directory in the user home directory (e.g. /home/USERNAME/.sirius-5.8/). If multiple SIRIUS instances use the same config directory, a refresh_token can end up being used twice, and the tokens become invalid.

You can solve this by using a separate "config directory" on each node (or more precisely for each SIRIUS instance running). This can be achieved via the command line parameter --workspace.
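A job script could derive one config directory per running instance along these lines. This is a minimal sketch under assumptions: the helper names `instance_workspace` and `sirius_command`, the base directory, and the job id are hypothetical, and placing `--workspace` directly after `sirius` is an assumption to check against `sirius --help` for your version.

```python
import os
import socket
from pathlib import Path


def instance_workspace(base_dir, job_id):
    """Return a config directory unique to this host, process, and job."""
    name = f"{socket.gethostname()}-{os.getpid()}-{job_id}"
    return Path(base_dir) / name


def sirius_command(workspace, extra_args):
    """Build a SIRIUS command line that uses its own config directory.

    --workspace is the option named in the reply above; putting it before
    the remaining arguments is an assumption, not confirmed by the thread.
    """
    return ["sirius", "--workspace", str(workspace), *extra_args]


if __name__ == "__main__":
    # Hypothetical base directory and job id, for illustration only.
    ws = instance_workspace("/tmp/sirius-workspaces", job_id=1)
    ws.mkdir(parents=True, exist_ok=True)
    print(sirius_command(ws, ["login", "--password-env", "MY_PW_VARIABLE",
                              "--user-env", "MY_USER_VARIABLE"]))
```

Because each instance then has its own refresh_token, every instance needs its own login, which is where the environment-variable login comes in.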

If you want to automate the login per instance without risking leaking your credentials into console logs, you can log in via environment variables. In that case you provide the names of the environment variables that hold the credentials instead of the credentials themselves.

E.g. sirius login --password-env MY_PW_VARIABLE --user-env MY_USER_VARIABLE
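If the jobs are launched from a script, the same login call could be wrapped roughly as follows. This is a sketch under assumptions: the functions `login_command` and `run_login` are hypothetical, the pre-check for unset variables is an addition of this example, and `run_login` requires a `sirius` executable on the PATH.

```python
import os
import subprocess


def login_command(user_var="MY_USER_VARIABLE", pw_var="MY_PW_VARIABLE"):
    """Build the login command; only variable names appear, never secrets."""
    return ["sirius", "login", "--password-env", pw_var, "--user-env", user_var]


def run_login(user_var="MY_USER_VARIABLE", pw_var="MY_PW_VARIABLE", env=None):
    """Run the login after checking that both variables are actually set."""
    env = dict(os.environ if env is None else env)
    missing = [name for name in (user_var, pw_var) if not env.get(name)]
    if missing:
        raise RuntimeError("unset environment variables: " + ", ".join(missing))
    # The credentials travel only through the child's environment,
    # so they do not show up in the logged command line.
    return subprocess.run(login_command(user_var, pw_var), env=env, check=True)


if __name__ == "__main__":
    print(" ".join(login_command()))
```

Failing early on an unset variable avoids sending a login request that is bound to fail, which matters when failed requests can count towards a temporary ban.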


Regarding the login problem, I assume that your IP or account was temporarily banned due to too many failed token requests. If the problem persists, please send me an email with the affected username (email address).
