Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend C-Bindings? #70

Open
ifadams opened this issue Jun 1, 2018 · 6 comments
Open

Extend C-Bindings? #70

ifadams opened this issue Jun 1, 2018 · 6 comments

Comments

@ifadams
Copy link

ifadams commented Jun 1, 2018

Hi,

Are there any plan to extend the C-Binding support, e.g. highywayhash256 with cat/append capabilities?

Thanks!

@jan-wassenberg
Copy link
Member

Hi, thanks for reaching out. We could extend them whenever someone needs them. Is speed a concern? If not, there's already the simple C version in the c/ directory which includes cat.
Otherwise, if you're willing to extend the bindings, I'd be happy to review and integrate.

@ifadams
Copy link
Author

ifadams commented Jun 15, 2018

Hi Jan,

Sorry for the slow response.

Speed is indeed a concern, we originally used exactly the ones you pointed out.

Since then original question we have made some hack-ey C bindings wrapped around the C++ but so far are drastically under-performing. We're definitely hitting the avx instructions (confirmed by with valgrind/callgrind), and getting correct hash values back, so I'm not sure what we're doing wrong. We've got aligned allocations in the buffers, and I've seen similarly bad performance when I just use the native CPP class, so I'm clearly doing something wrong.

Any common "gotchas" that you might know of for us to look at would be much appreciated!

Thanks again.

Ian

@jan-wassenberg
Copy link
Member

jan-wassenberg commented Jun 17, 2018

Hi Ian, no worries!

I've seen similarly bad performance when I just use the native CPP class, so I'm clearly doing something wrong.

I'm not sure I understand the problem - are you saying both the C and C++ versions are equally slow? What's the basis for comparison - the published benchmark results?

Some ideas: is the InstructionSets dispatcher involved? Especially with Cat, the app-specific code generating data and the calls to Cat really should be inlined together with only a single call to the dispatcher (instead of once per Append).

Somewhat related: compilers didn't seem able to keep the hash state in registers across calls to Append, so we're loading 128 bytes for every call. Might be even worse on an older/non-Clang compiler.

Also, depending on how the C++ code and its wrapper are compiled, we might get VZEROUPPER after every function call. Does it help to enable -mavx2 in all translation units?

@ifadams
Copy link
Author

ifadams commented Jun 18, 2018

Hi Jan,

Right now, The C++ version with the -mavx2 flag set is working significantly slower than the pure C implementation when we're streaming file-data through them. Our measurements are just a crude timer for MBPS throughput, and getting something like <100MBPS using the C++ libraries, and we're pulling from the pagecache, so filethroughput is >5GBPS, so thats not the issue.

That said included benchmark is a bit slower than the results published on the README, but not so much that it can explain that. I can provide detailed numbers if thats helpful.

We'll take a look at the stuff you suggested.

Thanks again!

Ian

@jan-wassenberg
Copy link
Member

Oh, that's a surprise. We aren't using Cat in time-critical apps yet, and I did have trouble with the state not being kept in vector registers, but I'm still surprised it's slower than C.
Would you be able to share disassembly of the relevant parts to see where we're running into trouble?

@ifadams
Copy link
Author

ifadams commented Jul 11, 2018

Hi Jan,

As usual, we get sidetracked before circling around, apologies. I'll see if we can do that. Not sure we can (heavy-weight bureaucracy on our end even in innocuous cases) but worth a shot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants