
Investigate a less CPU-intensive way of iterating over channels (for dumps) #183

Open
gavin-norman-sociomantic opened this issue Nov 20, 2018 · 1 comment

Comments

@gavin-norman-sociomantic

The CPU intensiveness of iterations leads to timeouts in reading apps. Because of this side effect, we've always limited the frequency of dumps (to once every 6 hours, say). It'd be nice to be able to dump more often.

@gavin-norman-sociomantic

Comments from old issue:

Nemanja:

I wonder if fork and somehow "dumping" the data (like writing the content of Tokyo Cabinet pages to the disk, not on a per-record basis) would be the way to do it.

Gavin:

One possibility I thought of would be to use the "standard" (i.e. non-interruptible) TC iterator. Only one such iteration can be going at a time, but we could add a new request which is only issued by dhtdump.

Why would forking help?

Yeah, I also wondered about dumping the raw memory of the database. But then how to reload it would be a tricky question.

Nemanja:

Yes, the suggestion assumes having both dump & load methods.

It's the forking of the DHT process. It would allow you to do the dump without interrupting your running server. The fork would make a snapshot of the current memory layout, and you could dump that as a snapshot, without having to worry about monitoring and dealing with changes (the only slowdown would come from copy-on-write having to duplicate pages that are modified), nor about the dump taking too long (if it were done in the main thread, your application's message pump would not react).

Gavin:

Ah yes I see. I was getting the copy-on-write thing backwards ;) I think forking the dhtnode is potentially dangerous, though. We'd need a guarantee that the memory of the two forks wouldn't diverge too much, otherwise the memory usage could grow and overfill the RAM.

Nemanja:

Yeah, this is something I also see as dangerous.
I see there are two methods:
https://github.com/clement/tokyo-cabinet/blob/master/tcutil.h#L832-L848
which serialize to and from a byte array.
For the serializing, you of course don't have that much spare RAM, but a memory-mapped file can save you :-).
So, one hypothetical procedure could be:

  1. Fork the DHT node process
  2. See how much memory you need to serialize the map
  3. Use mmap to map a file of the given size into memory and get its address
  4. Run tcmapdump, passing the pointer obtained in step 3
  5. Call fsync on this file to force all pages to be written back to the disk
  6. Exit the forked process.

Oh: "Because the region of the return value is allocated with the `malloc` call, it should be released with the `free` call when it is no longer in use." Yeah, this method doesn't work :-)
