I am confused about the initElapsedS & readElapsedS. #173

Open
CKopoer opened this issue Nov 8, 2023 · 4 comments

Comments

@CKopoer

CKopoer commented Nov 8, 2023

As these lines show, initElapsedS and readElapsedS are each computed from the difference of the other's timestamps. Is this a mistake, or is it something meaningful that I haven't understood?
image

Separately, I got results on an H800 using another, closed-source tool, NV-STREAM. It seems to report better bandwidth than BabelStream because of its optimized block size parameters. What's more, it also shows Read & Write results. Could I take the Init_kernel as the Write result and read_arrays as the Read result in BabelStream?
image
image

@CKopoer
Author

CKopoer commented Nov 8, 2023

I think the time consumed by the read_arrays function actually depends on the PCIe bandwidth? That function copies the data from the device to the host.
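
For a rough sense of scale, a device-to-host copy time can be estimated as bytes moved divided by link bandwidth. The sketch below is illustrative only: `estimate_readback_seconds` is a hypothetical helper, not BabelStream code, and the link bandwidth is an assumed input rather than a measured value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of BabelStream).
// Estimates the seconds needed to copy `num_arrays` arrays of
// `num_elements` values, each `bytes_per_element` bytes wide, over a link
// whose bandwidth is ASSUMED to be `link_gb_per_s` (1 GB = 1e9 bytes).
double estimate_readback_seconds(std::size_t num_elements,
                                 std::size_t bytes_per_element,
                                 std::size_t num_arrays,
                                 double link_gb_per_s) {
  assert(link_gb_per_s > 0.0);
  const double total_bytes = static_cast<double>(num_elements) *
                             static_cast<double>(bytes_per_element) *
                             static_cast<double>(num_arrays);
  return total_bytes / (link_gb_per_s * 1.0e9);
}
```

For example, `estimate_readback_seconds(33554432, 8, 3, 25.0)` models three arrays of 2^25 doubles over an assumed 25 GB/s link and gives roughly 32 ms. A figure like this reflects the host link, not the device memory bandwidth that the kernels exercise.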

@tom91136
Member

Yes, Init and Read are new timings we report; they measure the setup and read-back time of the buffers.
BabelStream does not have a direct equivalent of the Read and Write kernels in NV-STREAM.

> It seems that it provided better bandwidth performance result compared with BabelStream because of the optimized block size parameters.

What's the performance difference you're observing?

@gonzalobg
Contributor

@CKopoer the time intervals used for Init and Read are incorrect.
#186 fixes it (among other things).
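
To illustrate what correct intervals look like, here is a minimal sketch in which each phase is bracketed by its own start/stop pair. It is not the code from #186: `PhaseTimes`, `time_seconds`, and `time_init_and_read` are hypothetical names, and the two phases are passed in as plain callables.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical container for the two reported phases.
struct PhaseTimes {
  double init_seconds;
  double read_seconds;
};

// Runs `fn` once and returns its wall-clock duration in seconds.
// On a GPU backend the callable would have to block until the device
// work has finished, otherwise only the launch overhead is timed.
double time_seconds(const std::function<void()>& fn) {
  assert(static_cast<bool>(fn));
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Each phase gets its own start/stop pair, so neither interval is
// derived from the other phase's timestamps.
PhaseTimes time_init_and_read(const std::function<void()>& init_arrays,
                              const std::function<void()>& read_arrays) {
  PhaseTimes t{};
  t.init_seconds = time_seconds(init_arrays);
  t.read_seconds = time_seconds(read_arrays);
  return t;
}
```

Using `std::chrono::steady_clock` avoids jumps from wall-clock adjustments, which matters when an interval is short.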

> Could I take the Init_kernel as the Write result and read_arrays as Read result in BabelStream?

The Init and Read timings stem from a single measurement, and are not intended to be a measure of Read and Write bandwidth.

After #186 is merged, adding proper Read & Write bandwidth benchmarks is straightforward (although still quite a bit of work, since they need to be added to all languages). I agree that these two help paint a more complete picture of the hardware than Copy alone.
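
As a rough idea of what such benchmarks involve, here is a host-only sketch of a write-only kernel, a read-only kernel, and a repeated-timing helper. All names (`write_kernel`, `read_kernel`, `min_seconds`, `bandwidth_gb_per_s`) are hypothetical and this is not the design planned for BabelStream; a real version would need a device implementation per programming model.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Write-only kernel: stores one value to every element and loads nothing.
void write_kernel(std::vector<double>& a, double value) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = value;
}

// Read-only kernel: loads every element and reduces to a single sum.
// Returning the sum gives the caller a result to use, so the loads
// are not dead code.
double read_kernel(const std::vector<double>& a) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i];
  return sum;
}

// Bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved and elapsed seconds.
double bandwidth_gb_per_s(std::size_t bytes, double seconds) {
  assert(seconds > 0.0);
  return static_cast<double>(bytes) / (seconds * 1.0e9);
}

// Runs `kernel` `num_times` times and returns the fastest run in seconds,
// instead of relying on a single measurement.
template <typename Kernel>
double min_seconds(Kernel&& kernel, int num_times) {
  assert(num_times > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int k = 0; k < num_times; ++k) {
    const auto start = std::chrono::steady_clock::now();
    kernel();
    const auto stop = std::chrono::steady_clock::now();
    const double s = std::chrono::duration<double>(stop - start).count();
    if (s < best) best = s;
  }
  return best;
}
```

Each kernel moves `a.size() * sizeof(double)` bytes per run, so the reported figure would be `bandwidth_gb_per_s(a.size() * sizeof(double), min_seconds(...))`. Timing many repetitions is what separates this from the single-shot Init and Read timings.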

@gonzalobg
Contributor

Fixed by #186.

3 participants