
Investigate a less CPU-intensive way of iterating over channels (for dumps) #183

Open
gavin-norman-sociomantic opened this issue Nov 20, 2018 · 1 comment

Comments

@gavin-norman-sociomantic

The CPU intensiveness of iterations leads to timeouts in reading apps. Because of this side effect, we've always limited the frequency of dumps (to once every 6 hours, say). It'd be nice to be able to dump more often.

@gavin-norman-sociomantic

Comments from old issue:

Nemanja:

I wonder if fork and somehow "dumping" the data (like writing the content of Tokyo Cabinet pages to the disk, not on a per-record basis) would be the way to do it.

Gavin:

One possibility I thought of would be to use the "standard" (i.e. non-interruptible) TC iterator. Only one such iteration can be going at a time, but we could add a new request which is only issued by dhtdump.

Why would forking help?

Yeah, I also wondered about dumping the raw memory of the database. But then how to reload it would be a tricky question.

Nemanja:

Yes, the suggestion assumes having both dump & load methods.

It's the forking of the DHT process. It would allow you to do the dump without interrupting your running server. The fork would make a snapshot of the current memory layout, and you could dump that as a snapshot, without having to worry about monitoring and dealing with changes (the only slowdown would come from copy-on-write having to duplicate pages that are modified), nor about the dump taking too long (if it were done in the main thread, your application's message pump would not react).

Gavin:

Ah yes I see. I was getting the copy-on-write thing backwards ;) I think forking the dhtnode is potentially dangerous, though. We'd need a guarantee that the memory of the two forks wouldn't diverge too much, otherwise the memory usage could grow and overfill the RAM.

Nemanja:

Yeah, this is something I also see as dangerous.
I see there are two methods:
https://github.com/clement/tokyo-cabinet/blob/master/tcutil.h#L832-L848
which serialize to and from a byte array.
For the serializing, you of course don't have that much spare RAM, but a memory-mapped file can save you :-).
So, one hypothetical procedure could be:

  1. Fork the DHT node process
  2. See how much memory you need to serialize the map
  3. Use mmap to map a file of the given size into memory and get its address
  4. Run tcmapdump, passing the pointer obtained in step 3
  5. Call fsync on this file to force all pages to be written back to the disk
  6. Exit the forked process.

Oh: "Because the region of the return value is allocated with the `malloc` call, it should be released with the `free` call when it is no longer in use." Yeah, this method doesn't work :-)
