[Bug] Bladebit v3.1.0 multi-gpu error during plotcheck #16677

Open
Valeri4n opened this issue Oct 24, 2023 · 2 comments

Valeri4n commented Oct 24, 2023

What happened?

When running two Bladebit plotters on one machine with two GPUs, the plotter on device 1 intermittently reports an error during the plot check at the end. It does not happen with every plot. When it does happen, device 1's PID is also listed on device 0, with a smaller memory footprint. It seems the GPUs may not be completely isolated for all stages of the plotter when more than one device is in use.
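
To catch the duplicate PID without watching the GPU process list by hand, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-compute-apps` CSV output. The helper names (`parse_compute_apps`, `pids_on_multiple_gpus`, `query_nvidia_smi`) are made up for illustration and are not part of Chia or Bladebit.

```python
import csv
import io
import subprocess
from collections import defaultdict

# nvidia-smi query listing every compute process with the GPU it is attached to.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,gpu_uuid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Map each PID to {gpu_uuid: used_memory} from nvidia-smi CSV text."""
    usage = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pid, gpu_uuid, used_memory = (field.strip() for field in row)
        # used_memory is kept as text because the driver may report "[N/A]".
        usage[int(pid)][gpu_uuid] = used_memory
    return dict(usage)


def pids_on_multiple_gpus(csv_text):
    """Return only the PIDs that hold a compute context on more than one GPU."""
    return {
        pid: gpus
        for pid, gpus in parse_compute_apps(csv_text).items()
        if len(gpus) > 1
    }


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

Calling `pids_on_multiple_gpus(query_nvidia_smi())` in a loop while both plotters run should return the affected PID whenever the duplicate shows up, along with the memory it holds on each GPU.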

--check 50 --check-threshold 0.8

It looks like the plotter for device 1 is using device 0 when it does the check portion.
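
One way to test this hypothesis is to hide device 0 from the second plotter at the driver level, so it cannot touch that GPU in any phase. This is a rough sketch, not a confirmed fix. It relies on the standard CUDA environment variables `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER`. The helper names are hypothetical, and `command` stands in for whatever plotter command line is already in use.

```python
import os
import subprocess


def isolated_gpu_env(physical_gpu_index, base_env=None):
    """Build an environment in which only one physical GPU is visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # Make CUDA enumerate GPUs in PCI bus order, matching nvidia-smi's indices.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the requested one from the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu_index)
    return env


def run_isolated(command, physical_gpu_index):
    """Run `command` (a list of arguments) with a single GPU exposed to it."""
    return subprocess.run(
        command, env=isolated_gpu_env(physical_gpu_index), check=False
    )
```

Inside the isolated process the one visible GPU is enumerated as device 0, so any device-selection option passed to the plotter would need to be 0 or left out. If the check error disappears under this setup, that would support the idea that the check phase falls back to device 0.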

Device 0 has 8 GB of memory and device 1 has 12 GB; system memory is 512 GB. OS is Ubuntu 22.04 with kernel 6.2.0-34.

Version

Chia version 2.1.1, Bladebit CUDA version 3.1.0. I also tested a new build from the develop branch, with the same result.

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

No response

Valeri4n added the bug label on Oct 24, 2023
harold-b self-assigned this on Oct 24, 2023

Valeri4n commented Oct 29, 2023

Another issue I have noticed: when I assigned device 1 to the harvester in config.yaml, the harvester's PID showed up on both devices as well. I also don't see any activity on the device 1 GPU when using it for harvesting.

wjblanke commented Nov 1, 2023

We only run one process, and the GPU code is multithreaded (not Python multiprocessing), so there would be only one PID. Harold will look at the other issue.
