Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert to pytorch installation with uv in CI #3826

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented May 13, 2024

Summary

Revert to pytorch installation with uv in CI, as it appears astral-sh/uv#1921 has been resolved.

However we should still keep a close eye on this as I'm not sure how reliable the fix is.

Hopefully this could also help get rid of the following random error which causes false negative CI failures:

error: Failed to download distributions
  Caused by: Failed to fetch wheel: torch==2.2.1
  Caused by: Failed to extract archive
  Caused by: error decoding response body
  Caused by: request or response body error
  Caused by: error reading a body from connection
  Caused by: end of file before message length reached

And then:

> Run micromamba activate pmg
/home/runner/work/_temp/6fa977f4-8025-4e15-8cdc-64e64c6fc7ed.sh: line 2: pytest: command not found

@DanielYang59 DanielYang59 changed the title Revert to pytorch installation with uv Revert to pytorch installation with uv in CI May 13, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review May 13, 2024 08:26
@janosh janosh added housekeeping Linting fixes, cleaning up and refactoring code ci Continuous integration labels May 13, 2024
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @DanielYang59! 👍

@janosh janosh enabled auto-merge (squash) May 13, 2024 11:51
@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

Looks like we still have the issue where uv cannot install torch randomly. I would report this to uv soon.

To fix this, we might need to install torch and explicitly skip torch for uv.

Before this PR, even after torch is already successfully installed by pip, calling uv pip install torch again would still fail (therefore the random errors we have seen).

@janosh
Copy link
Member

janosh commented May 13, 2024

we're getting

pytest: command not found

which is not a torch related error message?

strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

we're getting

pytest: command not found

which is not a torch related error message?

My bad, I didn't say it clear enough. If you look more closely into the log, the failure actually comes from the "Install pymatgen and dependencies" stage (for some reason it still passed).

image

strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...

Not sure. I thought it just fails randomly...No idea why. I don't think it has anything to do with the split itself though.

Failed at split 1

Failed at split 5

Failed at split 8

@DanielYang59 DanielYang59 marked this pull request as draft May 13, 2024 12:07
@janosh
Copy link
Member

janosh commented May 13, 2024

at least it's no longer failing due to timeouts. looks like now we're hitting this issue astral-sh/uv#2586

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

at least it's no longer failing due to timeouts. looks like now we're hitting this issue astral-sh/uv#2586

Look like this one. I guess we would need to still install torch with pip, and somehow prevent uv from trying to install torch later (not sure how to achieve this though, as torch is not a direct dependency of pymatgen, but chgnet).

Maybe we could try to install everything but [optional] with uv, and then use uv to install [optional] only? I think it's really bad, as the package itself would be installed twice, there should be a better way that I'm not aware of.

Or, just fall back to slow but reliable pip altogether?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration housekeeping Linting fixes, cleaning up and refactoring code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants