Builders leak ParameterCollectionStorage memory upon construction #1632

Open
kwalcock opened this issue Feb 10, 2021 · 6 comments
@kwalcock
Contributor

In part because ParameterCollection holds its ParameterCollectionStorage through a raw pointer and declares neither a copy constructor nor an assignment operator, RnnBuilders such as VanillaLSTMBuilder leak memory during their construction. In particular, at a line like

local_model = model.add_subcollection("vanilla-lstm-builder");

the old storage pointer is overwritten and lost.

This can be demonstrated with a short program like this one, extracted from train_rnn-autobatch.cc:

#include "dynet/lstm.h"
#include <iostream>
using namespace dynet;

int main(int argc, char** argv) {
  initialize(argc, argv);
  {
      unsigned int HIDDEN = 200;
      unsigned int EMBED_SIZE = 200;

      std::cout << "m: ";
      ParameterCollection m;
      std::cout << "fwR: ";
      VanillaLSTMBuilder fwR = VanillaLSTMBuilder(1, EMBED_SIZE, HIDDEN, m);
  }
  cleanup();
  return 0;
}

If some debugging output is added, like these lines in model.cc,

ParameterCollection::ParameterCollection() : name("/"),
    storage(DYNET_NEW(ParameterCollectionStorage(default_weight_decay_lambda))),
    parent(nullptr) {

    std::cout
        << "Constructing1 " << this
        << " with parent " << parent
        << " with storage " << storage
        << std::endl;
}

ParameterCollection::ParameterCollection(const string & my_name, ParameterCollection* my_parent, float weight_decay_lambda) :
    name(my_name), storage(DYNET_NEW(ParameterCollectionStorage(weight_decay_lambda))), parent(my_parent) {

    std::cout
        << "Constructing2 " << this
        << " with parent " << parent
        << " with storage " << storage
        << std::endl;
}

ParameterCollection::~ParameterCollection() {
    std::cerr
        << "Destructing " << this
        << " with parent " << parent
        << " with storage " << storage
        << std::endl;

    if (parent == nullptr && storage != nullptr)
        delete storage;
}

and a marker line is added before the assignment in VanillaLSTMBuilder::VanillaLSTMBuilder in lstm.cc,

  std::cout << "local_model: ";
  local_model = model.add_subcollection("vanilla-lstm-builder");

one can get output like this:

m: Constructing1 000000F8B4EFF8C0 with parent 0000000000000000 with storage 000002411AD0FD40
fwR: Constructing1 000000F8B4EFF9F8 with parent 0000000000000000 with storage 000002411AD0FF80
local_model: Constructing2 000000F8B4EFEEF0 with parent 000000F8B4EFF8C0 with storage 000002411AD101C0
Destructing 000000F8B4EFEEF0 with parent 000000F8B4EFF8C0 with storage 000002411AD101C0
Destructing 000000F8B4EFF9F8 with parent 000000F8B4EFF8C0 with storage 000002411AD101C0
Destructing 000000F8B4EFF8C0 with parent 0000000000000000 with storage 000002411AD0FD40

The storage allocated during the default construction of local_model, shown in the fwR line (000002411AD0FF80), never appears in a Destructing line: its pointer is lost when local_model is overwritten with the value returned by model.add_subcollection("vanilla-lstm-builder"), so it leaks.

Ideally, ParameterCollection would be given a correct copy constructor and assignment operator. In this particular case, one can sidestep that and initialize local_model from the beginning by changing the constructor of VanillaLSTMBuilder to

VanillaLSTMBuilder::VanillaLSTMBuilder(unsigned layers, unsigned input_dim,
    unsigned hidden_dim, ParameterCollection& model, bool ln_lstm, float forget_bias) :
    // The initialization of local_model has been added here.
    local_model(model.add_subcollection("vanilla-lstm-builder")), layers(layers),
    input_dim(input_dim), hid(hidden_dim), ln_lstm(ln_lstm), forget_bias(forget_bias),
    dropout_masks_valid(false), _cg(nullptr) {

This results in the following output:

m: Constructing1 0000006F4795F890 with parent 0000000000000000 with storage 0000018CA3ED0150
fwR: Constructing2 0000006F4795F9C8 with parent 0000006F4795F890 with storage 0000018CA3ED0390
Destructing 0000006F4795F9C8 with parent 0000006F4795F890 with storage 0000018CA3ED0390
Destructing 0000006F4795F890 with parent 0000000000000000 with storage 0000018CA3ED0150

This will still result in a leak because of the condition in the destructor of ParameterCollection:

    if (parent == nullptr && storage != nullptr)
        delete storage;

The parent check doesn't seem to have much to do with anything here; perhaps it was meant to work around other problems. Removing the parent == nullptr condition will prevent the leak in this case. I was a little more cautious and instead changed storage to be a shared pointer, which probably also helps in cases where ParameterCollections are copied.

This problem probably exists for all or most builders, but I haven't studied whether a similar modification will be effective for each of the others. Thanks for looking into this.

@kwalcock
Contributor Author

Ping, just to see if anyone is watching/listening.

@MihaiSurdeanu, this is the PR I mentioned this morning. I was hoping to find out the repercussions before resyncing the forks, but it's not a requirement. For clulab deployment I have only considered the builders that we are using, while for clab probably all should be checked, so it's not (yet) just a merge of our changes. I'm also not sure whether clab will want to stick with a plain pointer or change to a smart pointer, etc.

@MihaiSurdeanu

@neubig: I agree with @kwalcock that this should be merged in master for dynet. This seems to be a global problem.

@neubig
Contributor

neubig commented Feb 16, 2021

Hi, thanks for checking this! I think we considered this several years ago and ended up not fixing it for some reason, but honestly I don't remember why. I don't have time to go digging through the archives at the moment, but if the change passes all tests then I'll try to review it and accept the PR.

@kwalcock
Contributor Author

Do you mean any particular "the change"? Is it the one from several years ago?

@MihaiSurdeanu

@kwalcock: my coreference resolution algorithm says that "the change" refers to this PR, no?

@kwalcock
Contributor Author

This is just an issue filed. I don't have a PR that isolates this change or code that applies to every type of builder. It could be done, of course. I wouldn't trust it completely just because the unit tests pass. It would need close review.
