Changes to enable threadsafe operation #1064

Open · wants to merge 3 commits into base: master
Conversation

@oir commented Nov 13, 2017

This PR includes changes that (optionally) enable threadsafe operation of dynet, making it possible to run multiple dynet models within a single application, or to execute a single dynet model over multiple data instances (computation graphs) concurrently.

This includes:

  • Disabling the restriction that only one CG may exist at a time, and disabling the notion of staleness
  • Switching to a dynamic memory pool for tensor allocation, to avoid race conditions when managing memory for tensors/CGs from different threads (see the sketch below)
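
A generic sketch (not dynet's actual DynamicCPUMemoryPool implementation) of the trade-off behind the second bullet: a shared bump-pointer pool races when two threads allocate at once, while a dynamic pool defers to the thread-safe system allocator and guards its own bookkeeping:

  #include <cstddef>
  #include <cstdlib>
  #include <mutex>
  #include <vector>

  // Static pool: one preallocated slab and a bump pointer. Fast, but the
  // unsynchronized update of `used` races if two threads allocate concurrently.
  struct StaticPool {
    char* base; size_t capacity; size_t used = 0;  // illustrative, uninitialized
    void* allocate(size_t n) {
      void* p = base + used;   // data race without external locking
      used += n;
      return used <= capacity ? p : nullptr;
    }
  };

  // Dynamic pool: each request goes to malloc, which is thread-safe; the
  // block list is guarded so the pool itself can be shared across threads.
  struct DynamicPool {
    std::mutex m;
    std::vector<void*> blocks;
    void* allocate(size_t n) {
      void* p = std::malloc(n);
      std::lock_guard<std::mutex> lock(m);
      blocks.push_back(p);     // remembered so free_all() can release it
      return p;
    }
    void free_all() {
      std::lock_guard<std::mutex> lock(m);
      for (void* p : blocks) std::free(p);
      blocks.clear();
    }
  };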

Multithread data parallelism, without duplicating the model in memory, works as follows (a sketch follows the list):

  • Have a single dynet::ParameterCollection object, shared between threads, that contains the (physical) model parameters
  • Have multiple copies of the model object (e.g. LSTMBuilder). Copying the builder also copies its Parameter objects, but that is okay: these are just shells containing pointers to the same physical storage.
  • Each thread runs one copy of these models.
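
A minimal sketch of that pattern, assuming this PR's support for per-thread ComputationGraphs (the dimensions, dummy input, and builder copy semantics are illustrative):

  #include <thread>
  #include <vector>
  #include <dynet/dynet.h>
  #include <dynet/expr.h>
  #include <dynet/lstm.h>

  int main(int argc, char** argv) {
    dynet::initialize(argc, argv);

    // One ParameterCollection holds the physical parameters, shared by all threads.
    dynet::ParameterCollection params;
    dynet::LSTMBuilder proto(/*layers=*/1, /*input_dim=*/8, /*hidden_dim=*/16, params);

    std::vector<std::thread> threads;
    for (size_t t = 0; t < 4; ++t) {
      threads.emplace_back([&proto]() {
        dynet::LSTMBuilder builder = proto;  // copy: still points at shared storage
        dynet::ComputationGraph cg;          // per-thread graph, allowed by this PR
        builder.new_graph(cg);
        builder.start_new_sequence();
        std::vector<float> x_val(8, 1.f);    // illustrative dummy input
        dynet::Expression x = dynet::input(cg, {8}, x_val);
        dynet::Expression h = builder.add_input(x);
        cg.forward(h);                       // per-thread inference
      });
    }
    for (auto& th : threads) th.join();
    return 0;
  }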

The main motivation was multithreaded inference, but the changes might apply at training time as well, similar to asynchronous SGD training (which I did not test).

My implementation is limited to (and tested on) only the SimpleExecutionEngine (so no autobatching) and only CPU devices.

@neubig (Contributor) commented Dec 20, 2017

Thanks! This is highly appreciated. I haven't been able to make the time to do a careful review yet, but it's upcoming.

@danielhers (Collaborator) commented Feb 4, 2018

Multithreading support would be great, but even in a single thread, maintaining multiple computation graphs in parallel would help a lot, since it would enable model ensembling without having to reset the CG between querying different models. That would be great too!
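
For concreteness, a hypothetical single-threaded ensembling loop once multiple live CGs are allowed (model1/model2, their score method, and sentence are illustrative, not dynet API):

  // Hypothetical: two graphs alive at once, one per ensemble member.
  dynet::ComputationGraph cg1, cg2;  // master currently allows only one at a time
  dynet::Expression y1 = model1.score(cg1, sentence);  // illustrative method
  dynet::Expression y2 = model2.score(cg2, sentence);
  float avg = (dynet::as_scalar(cg1.forward(y1)) +
               dynet::as_scalar(cg2.forward(y2))) / 2;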

@neubig (Contributor) commented Mar 23, 2018

Thanks again for contributing, and I'm super-sorry for taking so long to get to this! Here are a few comments/questions:

  1. DynamicCPUMemoryPool: If I understand correctly, this is a direct parallel to InternalMemoryPool, correct? If so, shall we call them DynamicMemoryPool and StaticMemoryPool?
  2. I've been wondering about this for a while: do you know how much going from static to dynamic memory allocation hurts performance?
  3. Has this been tested for compatibility with multi-device support?
  4. Typo: dynet-dynemic-mem.
  5. It seems this is missing checks for unsupported cases: dynamic-mem + autobatch, etc.
  6. Could you rebase to the current master one more time? Things have gotten out of sync. I'll make sure that this can get incorporated before things get out of sync again, so I promise this will be the last rebase!

@neubig (Contributor) commented Mar 30, 2018

@oir FYI: If you're busy and don't have time to handle this we can pick things up and do the rest on our side. Of course if you're willing to help we'll be happy to have you.

@oir (Author) commented Apr 2, 2018

@neubig Hey! Sorry for the late response; I am willing to pick this up. I will go through your comments and address them, as well as attempt a rebase, hopefully soon.

neubig mentioned this pull request Apr 20, 2018
@oir (Author) commented May 30, 2018

@neubig To keep you updated: this week I am starting to look at this again (possibly alongside NAACL). We have noticed another minor issue with the PR that needs to be fixed (guarding shared parameter pools), which will also be part of this PR after I do the rebase.

@neubig (Contributor) commented Jun 15, 2018

@oir Great, thanks!

@kwalcock (Contributor) commented Jun 8, 2020

Has any progress been made on this in the past two years?

@MihaiSurdeanu commented

Hi @neubig and @oir: can you please let us know the status of this PR? We are very interested in the same issue. Can we help?

@neubig (Contributor) commented Jun 9, 2020

If my comments above could be addressed I'd be happy to merge a PR!

@kwalcock (Contributor) commented Jul 1, 2020

Please see also oir#1. It doesn't appear that the modifications are sufficient.

  for (size_t t = 0; t < 4; ++t) { threads[t].join(); }
  for (size_t t = 0; t < 4; ++t) {
    for (size_t i = 1; i < results[t].size(); ++i) {
      BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);
    }
  }

Review comment (Contributor) on this hunk:

This code never runs because results[t].size() is always 1. Thread safety is never tested (unless the test crashes, which it does).

@kwalcock (Contributor) commented Jul 24, 2020

The line

  BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);

which is supposed to check whether the multi-threading functionality works, never seems to be executed. In the line just before it,

  for (size_t i = 1; i < results[t].size(); ++i) {

results[t].size() is always 1, so the loop is never entered.

If the code is changed to

  for (size_t t = 1; t < threadCount; ++t) {
    BOOST_CHECK_CLOSE(results[0][0], results[t][0], 0.0001);
  }

the check will pass when the threads are all processed serially, which is not much of a surprise. When they are processed in parallel, the test crashes and the check is never performed.
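
For concreteness, a sketch of the harness this fix implies, assuming each thread stores exactly one scalar in results[t] (run_model_once is a hypothetical stand-in for the per-thread inference call):

  const size_t threadCount = 4;
  std::vector<std::vector<float>> results(threadCount);
  std::vector<std::thread> threads;
  for (size_t t = 0; t < threadCount; ++t)
    threads.emplace_back([t, &results]() {
      // Same input on every thread; identical outputs expected if thread-safe.
      results[t].push_back(run_model_once());
    });
  for (auto& th : threads) th.join();
  for (size_t t = 1; t < threadCount; ++t)
    BOOST_CHECK_CLOSE(results[0][0], results[t][0], 0.0001);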

The PR contains some promising code, but it does not appear to be usable/correct. It should not be merged.
