dcgm-exporter README docker instructions contains incorrect commands and information #17370

Open
mbacchi opened this issue Apr 9, 2024 · 0 comments

mbacchi commented Apr 9, 2024

The dcgm-exporter README.md has incorrect information about running dcgm-exporter in Docker. There are two major problems with these instructions that we would appreciate you fixing.

  1. In the Docker section, the README says to create a counters CSV file containing a specific set of fields. Using that counters file with the most recent version of the dcgm-exporter Docker image (3.3.5-3.4.1) causes a segmentation violation:

    time="2024-04-09T21:14:41Z" level=info msg="Initializing system entities of type: CPU"
    SIGSEGV: segmentation violation
    

    If I provide no counters.csv file to the docker command, it works fine (for example, by omitting the -v argument from the recommended command in step 2 of the README).

  2. The recommended docker run command also suggests -e DCGM_EXPORTER_INTERVAL=3, which tells dcgm-exporter to read GPU metrics every 3 milliseconds. This is apparently too fast and causes high CPU usage, which I found out when I opened an issue about it in the dcgm-exporter repository. The default is -e DCGM_EXPORTER_INTERVAL=30000, which does not cause high CPU usage on the system.

Following the suggested commands as written, these two problems make dcgm-exporter unusable. Please fix this documentation.
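For reference, here is a rough sketch of the invocation that avoids both problems: no -v counters mount, and the default collection interval. The image registry path and tag suffix, the `--gpus all` flag, and the 9400 port mapping are my assumptions and are not taken from the README text quoted above, so adjust them to your environment. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch only: assembles a docker run command for dcgm-exporter and prints it.
# ASSUMPTIONS (not from the issue text): the image path and "-ubuntu22.04"
# tag suffix, the --gpus flag, and the published port 9400.
IMAGE="nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04"

# Default interval reported in the issue; the README's value of 3 caused high CPU usage.
INTERVAL=30000

build_cmd() {
  # Deliberately no -v mount for a custom counters.csv, since supplying one
  # triggered the SIGSEGV described above.
  echo "docker run -d --rm --gpus all -p 9400:9400 -e DCGM_EXPORTER_INTERVAL=${INTERVAL} ${IMAGE}"
}

build_cmd
```

To actually start the container, copy the printed line into a shell on a host with Docker and the NVIDIA container runtime installed.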
