Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding mutex protection in Handler4 Handler6 to avoid concurrent access #110

Open
gadrenayan opened this issue Jul 12, 2020 · 14 comments
Open

Comments

@gadrenayan
Copy link

Hi,

I have been using the coredhcp server for some learning purpose and I was doing some performance tests using
"perfdhcp" tool. I saw there were race conditions hit in the Handler4 where there were concurrent read write access to the map of lease records. I now know that map is not good for concurrent access. So I tested by adding a minor change of locking this path with a &sync.mutex. And I have not hit the race since.

Have you gone through the concurrency consideration in any future releases ?

I saw that the server either simply fails to process any requests when the race hits, or performs smoothly for the entire duration of the tests.
[2020-07-13T01:00:13+05:30] INFO plugins/range: found IP address 192.168.245.54 for MAC 00:0c:01:02:03:04
concurrent map writes
fatal error: concurrent map writes

root@ngadre-Inspiron-3542:~/go/src/github.com/coredhcp# perfdhcp -l veth11
Running: perfdhcp -l veth11
^CRate statistics
Rate: 0 4-way exchanges/second
Statistics for: DISCOVER-OFFER
sent packets: 498055
received packets: 0
drops: 498055

min delay: inf ms
avg delay: Delay summary unavailable! No packets received.

Statistics for: REQUEST-ACK
sent packets: 0
received packets: 0
drops: 0

min delay: inf ms
avg delay: Delay summary unavailable! No packets received.

With Mutex protection:

root@ngadre-Inspiron-3542:~/vethDHCP# perfdhcp -l veth11
Running: perfdhcp -l veth11
^CRate statistics
Rate: 3360.67 4-way exchanges/second
Statistics for: DISCOVER-OFFER
sent packets: 85109
received packets: 85106
drops: 3

min delay: 0.091 ms
avg delay: 1.201 ms
max delay: 34.907 ms
std deviation: 3.124 ms
collected packets: 1

Statistics for: REQUEST-ACK
sent packets: 85106
received packets: 85036
drops: 70

min delay: 0.096 ms
avg delay: 1.008 ms
max delay: 28.306 ms
std deviation: 2.407 ms
collected packets: 70

@insomniacslk
Copy link
Member

Thanks @gadrenayan , this is very useful! We should definitely fix this issue. I'm wondering what would the performance hit of a mutex be, and I would like to explore an alternative approach with goroutine and channels to do this in a lockless manner. I'll run some experiment this week

@insomniacslk
Copy link
Member

Just to be sure - are we talking about this map in the file plugin? https://github.com/coredhcp/coredhcp/blob/master/plugins/file/plugin.go#L53

@insomniacslk
Copy link
Member

I looked better at the logs and I assume it's actually in the range plugin, https://github.com/coredhcp/coredhcp/blob/master/plugins/range/plugin.go#L45 . We should fix this in multiple plugins, and possibly wrap this in a way that other plugins can use without reimplementing the same buggy code

@insomniacslk
Copy link
Member

Uh, maybe this could be handled with the Allocator in #108? (I haven't looked at that code yet though)

@insomniacslk
Copy link
Member

no, sorry - it seems to be a different thing, for managing address space allocations

@gadrenayan
Copy link
Author

I looked better at the logs and I assume it's actually in the range plugin, https://github.com/coredhcp/coredhcp/blob/master/plugins/range/plugin.go#L45 . We should fix this in multiple plugins, and possibly wrap this in a way that other plugins can use without reimplementing the same buggy code

I have tried replacing the existing Records "map" to a "sync.Map" which is mentioned as concurrent safe in the docs, however, I could not seamlessly integrate it due to few APIs missing, but I guess the only point of contention is the Handlerv4 and Handlerv6, so adding a mutex worked for me. Also, we are saving the IP-MAC in the lease file in the same context, which needs to be done together with the sync.Map.Store() method, so both operations anyways need a lock, hence, I did not decide to use sync.Map.

Reading of the records can happen in parallel for all goroutines. We are expecting client-HWaddr to be different for different clients, even in case of a relay-agent.

@gadrenayan
Copy link
Author

@insomniacslk Also, I was checking your channel and **goroutine** suggestion, would it be better if we do the "saveIPAddress" using the goroutines, as anyways we are not handling any errors in "saveIPAddress" so that can speedup the response.

@gadrenayan
Copy link
Author

@Natolumin @insomniacslk I wanted to ask, if the storage of the IP address in lease files and in the MAC-IP RecordsV4 happens on an OFFER or on an ACK. There can be scenarios where there are multiple DHCP servers sending OFFERS to client, however client will choose only one of the servers, in that case the lease file entry would have been wasted for such a client.

@Natolumin
Copy link
Member

Honest answer here is that we haven't really thought on that question before, and the current implementation of range definitely has the issue you describe

One way I've been thinking of handling this, is to reserve the ip on OFFER, but expiring immediately (or after a short timeout of the order of a few seconds). Then on ACK, you find an expired allocation, and extend and offer it for the actual duration of the lease. If you never go to ACK, then the IP is free for reuse at some other point.

@gadrenayan
Copy link
Author

Hi @Natolumin I think this is getting added through this but in @insomniacslk library.

@Natolumin
Copy link
Member

No, it's completely unrelated; we don't use either client/server from the library anymore (and the PR there is about a client behavior)

However, as part of #111 and #108 I will probably rework range to also use the allocator and lease storage interfaces (so we can have an example of a real usecase), which should take care of this.

In the mean time if you have a simple fix for range please send it as a PR; but don't invest too much time in it, we'll have to rewrite it anyway, esp for DHCPv6 support

@gadrenayan
Copy link
Author

Hi @Natolumin I have raised request with my org for any conflict of interest.
You can close this Issue, will commit after I receive permit from my organisation.

@gadrenayan
Copy link
Author

Hi, Sorry for further bothering, do we have a way to expire the lease in the server, I see that we record the lease, but there is no timer installed.

@Natolumin
Copy link
Member

here:

if v.IP.String() == ip.String() && (v.expires.After(time.Now())) {
we check for expired leases in checkIfTaken
However, we never delete the entries from the map (which shouldn't change anything for the client, but will result in more memory usage I suppose)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants