Training takes 2x longer since 1.13.0 with FastAI #1234

Open
mhtrinh opened this issue Mar 21, 2024 · 5 comments
Labels
bug Something isn't working

Comments


mhtrinh commented Mar 21, 2024

Training our model, which is based on FastAI, takes 2x longer with ClearML 1.13.0 compared to 1.12.2.

There are no errors or warnings.

I cannot share our code. Here is the requirements.txt of the virtualenv:

absl-py==2.1.0
adal==1.2.7
adlfs==2023.8.0
aiobotocore==2.5.4
aiohttp==3.9.3
aioitertools==0.11.0
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.13.1
applicationinsights==0.11.10
argcomplete==3.1.6
async-timeout==4.0.3
attrs==23.2.0
azure-appconfiguration==1.1.1
azure-batch==14.1.0
azure-cli==2.58.0
azure-cli-core==2.58.0
azure-cli-telemetry==1.1.0
azure-common==1.1.28
azure-core==1.30.1
azure-cosmos==3.2.0
azure-data-tables==12.4.0
azure-datalake-store==0.0.53
azure-graphrbac==0.60.0
azure-identity==1.15.0
azure-keyvault-administration==4.4.0b2
azure-keyvault-certificates==4.7.0
azure-keyvault-keys==4.9.0b3
azure-keyvault-secrets==4.7.0
azure-mgmt-advisor==9.0.0
azure-mgmt-apimanagement==4.0.0
azure-mgmt-appconfiguration==3.0.0
azure-mgmt-appcontainers==2.0.0
azure-mgmt-applicationinsights==1.0.0
azure-mgmt-authorization==4.0.0
azure-mgmt-batch==17.2.0
azure-mgmt-batchai==7.0.0b1
azure-mgmt-billing==6.0.0
azure-mgmt-botservice==2.0.0
azure-mgmt-cdn==12.0.0
azure-mgmt-cognitiveservices==13.5.0
azure-mgmt-compute==30.4.0
azure-mgmt-containerinstance==10.1.0
azure-mgmt-containerregistry==10.3.0
azure-mgmt-containerservice==29.1.0
azure-mgmt-core==1.4.0
azure-mgmt-cosmosdb==9.4.0
azure-mgmt-databoxedge==1.0.0
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-datamigration==10.0.0
azure-mgmt-devtestlabs==4.0.0
azure-mgmt-dns==8.0.0
azure-mgmt-eventgrid==10.2.0b2
azure-mgmt-eventhub==10.1.0
azure-mgmt-extendedlocation==1.0.0b2
azure-mgmt-hdinsight==9.0.0
azure-mgmt-imagebuilder==1.3.0
azure-mgmt-iotcentral==10.0.0b2
azure-mgmt-iothub==3.0.0
azure-mgmt-iothubprovisioningservices==1.1.0
azure-mgmt-keyvault==10.3.0
azure-mgmt-kusto==0.3.0
azure-mgmt-loganalytics==13.0.0b4
azure-mgmt-managedservices==1.0.0
azure-mgmt-managementgroups==1.0.0
azure-mgmt-maps==2.0.0
azure-mgmt-marketplaceordering==1.1.0
azure-mgmt-media==9.0.0
azure-mgmt-monitor==5.0.1
azure-mgmt-msi==7.0.0
azure-mgmt-netapp==10.1.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-policyinsights==1.1.0b4
azure-mgmt-privatedns==1.0.0
azure-mgmt-rdbms==10.2.0b15
azure-mgmt-recoveryservices==2.5.0
azure-mgmt-recoveryservicesbackup==8.0.0
azure-mgmt-redhatopenshift==1.4.0
azure-mgmt-redis==14.3.0
azure-mgmt-resource==23.1.0b2
azure-mgmt-search==9.1.0
azure-mgmt-security==5.0.0
azure-mgmt-servicebus==8.2.0
azure-mgmt-servicefabric==1.0.0
azure-mgmt-servicefabricmanagedclusters==1.0.0
azure-mgmt-servicelinker==1.2.0b1
azure-mgmt-signalr==2.0.0b1
azure-mgmt-sql==4.0.0b15
azure-mgmt-sqlvirtualmachine==1.0.0b5
azure-mgmt-storage==21.1.0
azure-mgmt-synapse==2.1.0b5
azure-mgmt-trafficmanager==1.0.0
azure-mgmt-web==7.2.0
azure-monitor-query==1.2.0
azure-multiapi-storage==1.2.0
azure-nspkg==3.0.2
azure-storage-blob==12.19.1
azure-storage-common==1.4.2
azure-synapse-accesscontrol==0.5.0
azure-synapse-artifacts==0.18.0
azure-synapse-managedprivateendpoints==0.4.0
azure-synapse-spark==0.2.0
bcrypt==4.1.2
blis==0.7.11
botocore==1.31.17
cachetools==5.3.3
catalogue==2.0.10
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
clearml==1.14.2
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
confection==0.1.4
contextlib2==21.6.0
contourpy==1.2.0
cryptography==42.0.5
cycler==0.12.1
cymem==2.0.8
decorator==5.1.1
Deprecated==1.2.14
distro==1.9.0
fabric==3.2.2
fastai==2.7.14
fastcore==1.5.29
fastdownload==0.0.7
fastprogress==1.0.3
filelock==3.13.1
fonttools==4.49.0
frozenlist==1.4.1
fsspec==2023.6.0
furl==2.1.3
gitdb==4.0.11
GitPython==3.1.42
google-auth==2.28.1
google-auth-oauthlib==1.0.0
grpcio==1.62.0
huggingface-hub==0.21.4
humanfriendly==10.0
idna==3.6
iniconfig==2.0.0
invoke==2.2.0
isodate==0.6.1
javaproperties==0.5.2
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsondiff==2.0.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
knack==0.11.0
kornia==0.7.1
lakefs-client==0.107.0
langcodes==3.3.0
Markdown==3.5.2
MarkupSafe==2.1.5
matplotlib==3.8.3
ml-collections==0.1.1
mpmath==1.3.0
msal==1.26.0
msal-extensions==1.0.0
msrest==0.7.1
msrestazure==0.6.4
multidict==6.0.5
murmurhash==1.0.10
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
onnx==1.14.0
opencv-python-headless==4.8.1.78
orderedmultidict==1.0.1
packaging==23.2
pandas==2.2.1
paramiko==3.4.0
pathlib2==2.3.7.post1
Pillow==9.5.0
pipdeptree==2.13.0
pkginfo==1.10.0
portalocker==2.8.2
preshed==3.0.9
protobuf==4.25.3
psutil==5.9.8
pyarrow==13.0.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycomposefile==0.0.30
pycparser==2.21
pydantic==2.6.3
pydantic_core==2.16.3
PyGithub==1.59.1
Pygments==2.17.2
PyJWT==2.4.0
PyNaCl==1.5.0
pyOpenSSL==24.0.0
pyparsing==3.1.2
PySocks==1.7.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
referencing==0.33.0
requests==2.31.0
requests-oauthlib==1.3.1
rpds-py==0.18.0
rsa==4.9
s3fs==2023.6.0
safetensors==0.4.2
scikit-learn==1.4.1.post1
scipy==1.12.0
scp==0.13.6
seaborn==0.13.2
self-supervised==1.0.4
semver==2.13.0
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
spacy==3.7.4
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
sshtunnel==0.1.5
sympy==1.12
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.2
thinc==8.2.3
threadpoolctl==3.3.0
timm==0.9.16
torch==2.2.1
torchvision==0.17.1
tqdm==4.66.2
triton==2.2.0
typer==0.9.0
typing_extensions==4.10.0
tzdata==2024.1
urllib3==1.26.18
wasabi==1.1.2
weasel==0.3.4
websocket-client==1.3.3
Werkzeug==3.0.1
wrapt==1.16.0
xmltodict==0.13.0
yarl==1.9.4

To reproduce: pip install clearml==1.12.2 and run the training, then pip install clearml==1.13.0 and re-run the same code.
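Since the original code cannot be shared, the comparison can be sketched with a small timing harness run once per installed clearml version. This is only an illustration: the helper names `installed_version` and `time_call` are hypothetical, and the callable passed in stands in for whatever starts the training (for example a fastai fit call).

```python
import time
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def report(fn, *args, **kwargs):
    """Time one run and print it next to the clearml version in the environment."""
    result, elapsed = time_call(fn, *args, **kwargs)
    print(f"clearml={installed_version('clearml')} elapsed={elapsed:.1f}s")
    return result
```

Running `report(train)` in the same virtualenv after each pip install, where `train` is a placeholder for the real training entry point, gives two wall-clock numbers that can be compared directly.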

OS: openSUSE Leap 15.4

OS  Linux-5.14.21-150400.24.60-default-x86_64-with-glibc2.31
cpu_cores 20
gpu_count 1
gpu_driver_cuda_version 12.4
gpu_driver_version 550.54.14
gpu_memory 48GB
gpu_type NVIDIA RTX A6000

mhtrinh added the bug label on Mar 21, 2024

@AlexandruBurlacu

Hey @mhtrinh, have you observed the same slowdown with the newer versions of ClearML? The most recent one is 1.14.4

Author

mhtrinh commented Mar 22, 2024

Yes, this also happens with the current version, 1.14.4: training is still 2x slower.

Note: this may be specific to fastai, as we have another network based on yolov5 where this does not happen.

@eugen-ajechiloae-clearml
Collaborator

Hi @mhtrinh! It looks like calculating the metrics that ClearML reports may take a long time. We will try to improve performance.
In the meantime, you could disable the fastai bindings by passing auto_connect_frameworks={"fastai": False} to Task.init.
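
A minimal sketch of that workaround, assuming the task is created once at the top of the training script. The helper names and the project/task names are placeholders, not taken from the reporter's code; only the auto_connect_frameworks argument comes from the suggestion above.

```python
def clearml_init_kwargs(project_name, task_name):
    """Build keyword arguments for Task.init with the fastai binding turned off."""
    return {
        "project_name": project_name,
        "task_name": task_name,
        # Disable only the fastai integration; other frameworks keep their defaults.
        "auto_connect_frameworks": {"fastai": False},
    }


def start_task(project_name, task_name):
    """Create the ClearML task; clearml is imported lazily so the helper above has no dependency."""
    from clearml import Task

    return Task.init(**clearml_init_kwargs(project_name, task_name))
```

Calling `start_task("my-project", "fastai-training")` before the Learner is built should keep the rest of ClearML's tracking active while skipping the automatic fastai metric reporting.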

@eugen-ajechiloae-clearml
Collaborator

Hi @mhtrinh! We will release a fix for this issue in the next ClearML release, clearml==1.16.0.

allegroai-git pushed a commit that referenced this issue May 17, 2024
Contributor

pollfly commented May 19, 2024

Hey @mhtrinh! Just letting you know that this issue has been resolved in the recently released v1.16.0. Let us know if there are any issues :)
