Poor performance with default request type queue size #12

Open
mp49 opened this issue Jun 17, 2022 · 7 comments

@mp49
Contributor

mp49 commented Jun 17, 2022

Hi,

I've been playing around with pvaDriver today, transporting 0.5MB images between two areaDetector IOCs with and without LZ4 compression.

I saw poor performance with pvaDriver even at a low frame rate of 100Hz (only 50MB/s): arrays were being dropped every few seconds.

Then I made this change in pvaDriver:

diff --git a/pvaDriverApp/src/pvaDriver.cpp b/pvaDriverApp/src/pvaDriver.cpp
index 02b5d4a..278c21f 100644
--- a/pvaDriverApp/src/pvaDriver.cpp
+++ b/pvaDriverApp/src/pvaDriver.cpp
@@ -23,8 +23,8 @@
#include <epicsExport.h>
#include "pvaDriver.h"

-//#define DEFAULT_REQUEST "record[queueSize=100]field()"
-#define DEFAULT_REQUEST "field()"
+#define DEFAULT_REQUEST "record[queueSize=100]field()"
+//#define DEFAULT_REQUEST "field()"

And that worked wonders. I was able to reliably run at 100Hz, 800Hz and even 1500Hz without dropping frames.

It seems like the driver was used with queueSize=100 at some point, but that setting was later commented out.

I think this parameter could be made configurable as an argument to pvaDriverConfig(). Does that sound reasonable? If so, I can make a pull request and test it.
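
As a rough sketch of what I have in mind (illustrative only, not the actual driver code; buildPvRequest is just a made-up helper name), the queueSize argument could be folded into the request string before it is handed to CreateRequest from pvDataCPP:

// Illustrative sketch only: a hypothetical helper showing how a queueSize
// argument passed to pvaDriverConfig() could be turned into the pvRequest
// used by the monitor. CreateRequest comes from pvDataCPP.
#include <sstream>
#include <string>

#include <pv/createRequest.h>

static epics::pvData::PVStructure::shared_pointer buildPvRequest(int queueSize)
{
    std::ostringstream request;
    if (queueSize > 1)
        request << "record[queueSize=" << queueSize << "]field()";
    else
        request << "field()";  // leave the library's default queue size

    // Parse the request string into the PVStructure expected by the monitor.
    return epics::pvData::CreateRequest::create()->createRequest(request.str());
}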

Matt

@MarkRivers
Member

Hi @mp49, that is interesting and your proposal for a PR sounds good.

I am puzzled, however, by some benchmarks I did back in 2017. They are shown in this slide from the EPICS meeting. It seems like I was getting 1.2 GB/s when using 1 pvaDriver. The images were large, so it was only about 100 frames/s. Is this consistent with what you were seeing? My tests were done on a single machine, so the PVA traffic was not going across a physical wire.
[slide image: pvaDriver benchmark results from the 2017 EPICS meeting]

@mp49
Contributor Author

mp49 commented Jun 18, 2022

Thanks, I'll work on that PR.

I'll also do more testing next week, with different image sizes and frame rates. I was testing on a RHEL8 VM, which only has 2 cores, and the sim detector IOC and the pvaDriver IOC were running on the same VM.

I couldn't find any information on the 'queueSize' parameter for the request in the PVA documentation, so I'm not fully sure what the default size is, but grepping the source code leads me to think it is only 1 or 2.

@MarkRivers
Member

I did find a document here that says the default is 2. But this does not look like the location of official documentation.

https://mrkraimer.github.io/website/developerGuide/pvRequest/pvRequest.html (search for the word "queue").

@mp49
Contributor Author

mp49 commented Jun 21, 2022

I'm still running tests and I'll post some results here.

However, I think we can't rely on this code snippet in pvaDriver to tell us how many images we lost:

if (!update->overrunBitSet->isEmpty())
{
    int overrunCounter;
    getIntegerParam(PVAOverrunCounter, &overrunCounter);
    setIntegerParam(PVAOverrunCounter, overrunCounter + 1);
    callParamCallbacks();
}

It assumes that we only lost 1 image whenever the overrunBitSet is not empty, but I think a non-empty overrunBitSet only tells us that we lost 1 or more images.

Could we use the NTNDArray uniqueId field instead?

@MarkRivers
Member

I don't think you can use the uniqueId field, since there is no guarantee that the source of the NTNDArrays is sending you all of them, or that they will arrive in the correct order.

Mark

@mp49
Contributor Author

mp49 commented Jun 22, 2022

That makes sense. In the NDPluginScatter case that you pasted above, the separate pvaDrivers would each only get a subset of the images. I think I'll just compare the total sent with the total received, and not rely on the overrun counter.

There may be another way. Perhaps the NDPluginPva could make use of the userTag in the timestamp:

time_t dataTimeStamp 2022-06-21 18:19:54.828
long secondsPastEpoch 1655849994
int nanoseconds 827677249
int userTag 0

And then the pvaDriver would always expect that number to increment by 1.
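
As a rough sketch (assuming the sender bumps userTag by exactly 1 per frame; FrameGapCounter is just an illustrative name, not real pvaDriver code), the receiver could then count missed frames like this:

// Sketch only: receiver-side gap detection, assuming the sender increments
// timeStamp.userTag by exactly 1 for every NTNDArray it publishes.
#include <cstdint>

class FrameGapCounter
{
public:
    // Feed in the userTag of each received frame; returns how many frames
    // appear to have been missed since the previous one.
    int64_t update(int32_t userTag)
    {
        if (!haveLast_) {
            haveLast_ = true;
            lastTag_ = userTag;
            return 0;
        }
        int64_t gap = static_cast<int64_t>(userTag) - lastTag_ - 1;
        lastTag_ = userTag;
        return gap > 0 ? gap : 0;
    }

private:
    bool haveLast_ = false;
    int64_t lastTag_ = 0;
};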

@mp49
Contributor Author

mp49 commented Jun 27, 2022

I've made a PR here: #13

The default is the same as before. This just deals with setting a different queueSize for the PVA request, which improves performance at high frame rates or on machines that are heavily loaded.

For example, with the standard queueSize, on an underpowered VM (2 cores) I ran into problems at a 100Hz frame rate even for tiny 128x128 UInt8 images, but with queueSize=100 I can safely run at 700Hz (which is the maximum rate at which I can generate images on the same VM).

On a more powerful machine (8 cores), using the default queueSize, I only saw a few dropped images (out of several thousand) when running at 1700Hz (the maximum rate the simulation driver was able to run at). However, when using queueSize=100 I did not see any dropped images.
