
Performance measure using hash program #208

Open
Jeremy-Jia opened this issue Oct 13, 2023 · 8 comments

Comments

@Jeremy-Jia

Hi, I want to measure Nyuzi performance when it is configured with more than one core (4 threads per core).
When I use the hash benchmark program and configure Nyuzi with 8 cores, the reported "cycles per hash" is much higher than with a single core. Why?
And if I want to measure multicore performance, what should I do?

Thanks.

@jbush001
Owner

Are you running this in Verilog simulation? Could you attach the command line you used and perhaps the diffs of what you changed to increase the core count, just so I can be sure I understand your configuration correctly? Thanks!

@Jeremy-Jia
Author

Yes, I'm using Verilog simulation.
I just changed NUM_CORES to 8, no other changes,
and ran ./run_vcs in the benchmarks/hash/ folder.

@jbush001
Owner

Got it. It's been a while since I worked on this project, so I need to refresh my memory, but I'm afraid I can't think of a reason this should happen off the top of my head. I'll need to look into this.

@jbush001
Owner

Maybe a quick experiment: what happens with 2 cores? What are the corresponding cycle counts in each case (like is it close to a clean integer multiple)?

My first question would be whether this is some artifact of the test configuration that is not synchronizing correctly (and thus misreporting the count), or if you are actually running into some kind of memory saturation issue where the performance is decreasing because of cache thrashing.

@Jeremy-Jia
Author

  1. I tried 1, 2, 4, and 8 cores, but the performance dropped once.
  2. Could it be related to this problem? I don't know the cause: an assert failed during simulation: refesh_delay should < MAX_REFESH_INTERNAL

@jbush001
Owner

Can you clarify what you mean by "performance dropped once"? (One time? Once you were above a certain number of cores?)
The refresh_delay assertion is probably not related (but kind of interesting, as I haven't seen that one).

@Jeremy-Jia
Author

Sorry, "performance dropped once" means that the more cores are used, the greater the performance degradation (cycles per hash is higher).

@jbush001
Owner

jbush001 commented Oct 14, 2023

Oops, I see the problem :)
The total number of hashes performed is hard-coded here (256):

printf("%g cycles per hash\n", (float) endTime / 256);

Because there are 16 vector lanes, four threads per core, and each thread does four hashes, that is 16 * 4 * 4 = 256. When you increase the number of cores, the total number of hashes being done increases, but this calculation still assumes it is fixed. The latency for each thread will increase because there is more memory contention, but the calculation does not account for the fact that the throughput has also increased.
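
To make the mismatch concrete, here is a small standalone check (not from the benchmark source; the constants just restate the numbers above) showing how the fixed divisor of 256 inflates the reported cycles per hash as the core count grows:

    #include <stdio.h>

    // Standalone illustration: with N cores the work performed scales with N,
    // but the reported figure still divides by the single-core total of 256.
    int main(void)
    {
        const int kVectorLanes = 16;     // hashes per vector operation
        const int kThreadsPerCore = 4;
        const int kHashesPerThread = 4;

        for (int cores = 1; cores <= 8; cores *= 2)
        {
            int totalHashes = cores * kThreadsPerCore * kHashesPerThread * kVectorLanes;
            printf("%d core(s): %4d hashes performed, reported cycles/hash inflated %dx\n",
                   cores, totalHashes, totalHashes / 256);
        }

        return 0;
    }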

One easy fix might be to add another global variable gTotalThreads and do a __sync_fetch_and_add at the top:

    __sync_fetch_and_add(&gActiveThreadCount, 1);
+    __sync_fetch_and_add(&gTotalThreads, 1);

Then use that to compute the total number of operations done:

 printf("%g cycles per hash\n", (float) endTime / (4 * gTotalThreads * 16));

(Looking at this now, it should probably use constant variables for the number of iterations each thread takes and the number of vector lanes, for clarity, instead of hard-coding the numbers.) I hope that helps.
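
For what it's worth, a minimal sketch of what that combined change could look like; aside from gActiveThreadCount and the proposed gTotalThreads, the names here (HASHES_PER_THREAD, NUM_VECTOR_LANES, worker_prologue, report, and the placeholder endTime value) are illustrative, not the benchmark's actual layout:

    #include <stdio.h>

    #define HASHES_PER_THREAD 4    // iterations each thread runs
    #define NUM_VECTOR_LANES 16    // hashes produced per vector iteration

    volatile int gActiveThreadCount = 0;
    volatile int gTotalThreads = 0;     // new counter: threads that participated

    static void worker_prologue(void)
    {
        __sync_fetch_and_add(&gActiveThreadCount, 1);
        __sync_fetch_and_add(&gTotalThreads, 1);    // each thread registers itself once
    }

    static void report(unsigned int endTime)
    {
        // Divide by the work actually performed instead of a hard-coded 256.
        printf("%g cycles per hash\n",
               (float) endTime / (HASHES_PER_THREAD * gTotalThreads * NUM_VECTOR_LANES));
    }

    int main(void)
    {
        worker_prologue();      // in the real benchmark, every hardware thread runs this
        report(100000);         // placeholder cycle count so this compiles and runs
        return 0;
    }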
