
Different result on CPU and GPU #63984

Open
bahar3474 opened this issue Apr 29, 2024 · 2 comments

bahar3474 commented Apr 29, 2024

Hello everyone,

I'm currently facing an issue and would greatly appreciate any assistance you can offer.

I have a PaddleOCR model that I'm serving through a Docker image based on version 2.5.1 of the paddlepaddle/paddle image. On one workstation, it works well with the 'use_gpu' attribute set to either True or False. However, on another workstation, the model's outputs are incorrect when it runs on GPU. I have attached the model's results in both situations.

CPU result: (screenshot attached)

GPU result: (screenshot attached)
It appears that the computation of the model is incorrect when it's running on GPU. It's important to note that I'm not encountering any errors or warnings, just inaccurate results.

Some additional context:

  • This issue did not occur with version 2.4 of the Paddle library, but I need to upgrade my Paddle version.
  • I have attempted to make the environments of the two workstations as similar as possible, but since they have different GPUs (3090 Ti and Tesla P40), complete parity is not feasible. And since it runs in Docker, I'm not sure how the host affects the result of the model inside the container.
  • The images above show the result of the text detection model, but I've had the same experience with the text recognition and layout models as well.
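To pin down whether the GPU path is producing genuinely wrong values rather than ordinary floating-point noise, it can help to compare the raw model outputs numerically. The sketch below is a minimal, hypothetical helper (the function name and tolerances are my own, not part of PaddleOCR); it assumes you can dump the CPU and GPU inference outputs as arrays:

```python
import numpy as np

def compare_outputs(cpu_out, gpu_out, rtol=1e-3, atol=1e-4):
    """Report how far two inference outputs diverge from each other."""
    cpu_out = np.asarray(cpu_out, dtype=np.float64)
    gpu_out = np.asarray(gpu_out, dtype=np.float64)
    abs_err = np.abs(cpu_out - gpu_out)
    rel_err = abs_err / (np.abs(cpu_out) + 1e-12)
    return {
        "max_abs_err": float(abs_err.max()),
        "max_rel_err": float(rel_err.max()),
        "allclose": bool(np.allclose(cpu_out, gpu_out, rtol=rtol, atol=atol)),
    }

# Tiny deviations (~1e-5) are normal CPU/GPU variation; errors that are
# orders of magnitude larger point to a broken kernel or hardware path.
report = compare_outputs([0.91, 0.12, 0.88], [0.91001, 0.12002, 0.88001])
```

If `allclose` is False by a wide margin, the problem is almost certainly in the compute kernels rather than in post-processing.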

What could be the root cause of this inconsistency?

Thank you in advance for any insights or suggestions you can provide.

Vvsmile (Contributor) commented Apr 30, 2024

It seems you are using PaddleOCR, so I would normally suggest opening an issue under the PaddleOCR repo, but I see you have already submitted one there (PaddlePaddle/PaddleOCR#12027).

First, the reply there suggests updating Paddle to 2.5.2. Did that fix the problem? If not, please try Paddle 2.6 and the Paddle develop branch; the latest updates may resolve your bug.

Second, your issue looks more like a Paddle framework bug: the version update may have introduced a regression in a kernel. For a more detailed investigation, please share which PaddleOCR components and algorithms you use, the data type and format, and so on. A minimal reproducible environment (data and script) would be even better.

bahar3474 (Author) commented

Hello, Vvsmile,
Thank you for addressing my issue and helping me.

Unfortunately, updating the Paddle version to 2.5.2 doesn't fix my problem. I still get different results on CPU and GPU. I have tried to update my Paddle version to 2.6, but it was not compatible with the version of PaddleOCR that I'm supposed to use.

I also believe that the bug is connected with the Paddle framework, not PaddleOCR, because I don't face any challenges with version 2.4 of Paddle. From PaddleOCR, I use text detection, text recognition, and the LayoutXML model. All of them produce the wrong result when I run my script on a machine with the following configuration:

  • Nvidia driver version: 550.54.15
  • CUDA version: 12.4
  • GPU: Tesla P40

I should mention that I checked many other combinations of CUDA and Nvidia driver versions (CUDA: 10.2, 11.6, 11.7, 11.8 / Nvidia: 510, 535, 545, 550), but none of them changed this behavior. The only constant was the Tesla P40 GPU, and I'm wondering if that's the cause. I checked the changelog of Paddle 2.5 and found that Paddle dropped support for Kepler-architecture GPUs after 2.5 (while adding CUDA 12 support), but the Pascal series was still on the supported list. Is it possible that Pascal GPUs should have been dropped as well?
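For anyone checking where their own card falls relative to these support cutoffs, the architecture family follows from the CUDA compute capability, which recent drivers report via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`. A small illustrative sketch (the mapping table is mine and is not exhaustive; Paddle's actual support matrix is in its release notes):

```python
# Map a CUDA compute capability ("major.minor") to its architecture family.
ARCH_BY_MAJOR = {
    3: "Kepler",   # dropped by Paddle >= 2.5 per the changelog cited above
    5: "Maxwell",
    6: "Pascal",   # Tesla P40 is compute capability 6.1
    7: "Volta/Turing",
    8: "Ampere",   # RTX 3090 Ti is compute capability 8.6
    9: "Hopper",
}

def arch_name(compute_cap: str) -> str:
    major = int(compute_cap.split(".")[0])
    return ARCH_BY_MAJOR.get(major, "unknown")

print(arch_name("6.1"))  # Pascal (Tesla P40)
print(arch_name("8.6"))  # Ampere (RTX 3090 Ti)
```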

I mentioned in my previous message that the Docker image works perfectly on both GPU and CPU on another machine. Here are the configs of this one:

  • Nvidia driver version: 545.23.08
  • CUDA version: 12.3
  • GPU: NVIDIA GeForce RTX 3090 Ti

I have provided a Docker image that I hope will help you reproduce the issue; you can find it at this link. By following these steps, you can compare the English text detection model's results on CPU and GPU.

docker build -t repr-gpu-bug .
mkdir result
docker run --runtime nvidia -v ./result:/home/result repr-gpu-bug python test.py

After that, you will find two images in the result directory. When I run this Docker image on the 3090 Ti machine, the two images match. On the P40 machine, however, they differ. I have also uploaded the results to Google Drive.
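To make "the two images are similar" checkable rather than a visual judgment, the two result images could be diffed pixel-wise. A minimal sketch, assuming the images are loaded as equally-sized NumPy arrays (e.g. via Pillow's `np.asarray(Image.open(...))`; the file names under `result/` are placeholders, not the actual output names of the repro script):

```python
import numpy as np

def diff_ratio(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Fraction of pixels that differ between two equally-sized images."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean(img_a != img_b))

# Example with synthetic data; in practice, load the two files from the
# mounted result/ directory, e.g. result/cpu.png and result/gpu.png.
a = np.zeros((4, 4), dtype=np.uint8)
b = a.copy()
b[0, 0] = 255
print(diff_ratio(a, b))  # 0.0625 -> one of 16 pixels differs
```

A near-zero ratio on the 3090 Ti versus a large ratio on the P40 would give the maintainers a concrete number to track while bisecting the regression.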
