
Can't load the model! #11

Open
Evraa opened this issue Oct 19, 2021 · 19 comments

Comments

@Evraa

Evraa commented Oct 19, 2021

Greetings,

Actually, I'm surprised that such an error came up. My problem lies with this line:

model = torch.load("models/DialoFlow_large/model.bin")

model.bin is placed appropriately, and the EC2 instance runs CUDA 11.2 and PyTorch 1.9.

Where would the problem come from?

Thanks in advance

@Evraa
Author

Evraa commented Oct 19, 2021

Error:

Traceback (most recent call last):
File "generate.py", line 283, in
model = torch.load("models/DialoFlow_large/model.bin")
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 787, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'transformers.modeling_gpt2'

@lizekang
Collaborator

You can try lowering the version of transformers, e.g. transformers==3.1.0.
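If downgrading transformers is not possible, one hedged workaround for this particular unpickling error (assuming the checkpoint was pickled against the pre-4.x transformers layout, where GPT-2 lived in transformers.modeling_gpt2) is to alias the old module path before calling torch.load:

import sys
import torch
import transformers.models.gpt2.modeling_gpt2 as modeling_gpt2  # transformers >= 4.x layout

# The checkpoint references the old module path "transformers.modeling_gpt2",
# so register an alias for it before unpickling (a speculative workaround,
# not something from this repo).
sys.modules["transformers.modeling_gpt2"] = modeling_gpt2

model = torch.load("models/DialoFlow_large/model.bin", map_location="cpu")

This only resolves the module lookup; the checkpoint may still depend on other classes from the older library, so the version downgrade above remains the safer route.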

@Evraa
Author

Evraa commented Oct 20, 2021

Thank you for your fast reply.

It worked, but I got stuck on another problem, concerning versions I guess.

Current versions:
torch = 1.7.0
transformers = 3.1.0
pickle = 4.0
regex = 2.5.103

Error:
Traceback (most recent call last):
File "generate.py", line 283, in
model = torch.load("model.bin")
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named '_regex'

@Evraa
Author

Evraa commented Oct 20, 2021

Please don't refer me to issue #2.
It didn't work for me, and it's in Chinese, which I'm not familiar with :'D.

thank you

@lizekang
Collaborator

You can try regex==2018.1.10. It should work.

@Evraa
Author

Evraa commented Oct 20, 2021

Thank you very much.

Could you please tell me how to structure test.refs.txt? Is it just sentences separated by '\n'?
And what is the minimum/maximum number of utterances allowed?

thank you in advance

@lizekang
Collaborator

The structure is like the following:
utterance1 EOS utterance2 EOS utterance3 \t reference1 \t reference2 \t reference3 \t\n
There is no specific constraint on the minimum/maximum number of utterances.

@Evraa
Author

Evraa commented Oct 20, 2021

I'm sorry, what are "references"?
Could you provide an example?

@Evraa
Author

Evraa commented Oct 20, 2021

Also, this error came up!

Traceback (most recent call last):
File "generate.py", line 298, in
hypstr = beam_search(history, tokenizer, model, args)
File "generate.py", line 215, in beam_search
delta = work_delta(model, conv_seq, sentence_idx, token_type_seq)
File "generate.py", line 95, in work_delta
conv_hidden_state = model.speak_model(conv_seq, token_type_ids=token_type_seq)[0]
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/modeling_gpt2.py", line 527, in forward
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/configuration_utils.py", line 219, in use_return_dict
return self.return_dict and not self.torchscript
AttributeError: 'GPT2Config' object has no attribute 'return_dict'

@lizekang
Collaborator

In a dialogue dataset, there can be many possible responses to the same context. The responses collected in advance are the references.
For example:
utterance1: What's your hobby?
reference1: I like basketball.
reference2: Reading. What about you?
reference3: Tell me yours first.
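For concreteness, a hypothetical line of test.refs.txt following that format (the utterances and references here are invented, not from the dataset) could be produced like this:

# One made-up line: context utterances joined by " EOS ", then one reference
# per tab, ending with a tab and a newline, as in the format described above.
line = (
    "Do you have any plans for the weekend? EOS Not yet. EOS What's your hobby?"
    "\tI like basketball."
    "\tReading. What about you?"
    "\tTell me yours first."
    "\t\n"
)
with open("test.refs.txt", "w", encoding="utf-8") as f:
    f.write(line)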

@lizekang
Collaborator

As for the error, I have never met it before. Maybe you can try downgrading transformers to 3.0.0 or 2.7.0. I'm not sure.

@Evraa
Author

Evraa commented Oct 25, 2021

It worked with transformers version 3.0.0, not with 2.7.0 or 3.1.0.
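As a quick sanity check, the installed versions can be compared against the ones reported in this thread (these pins come from the comments above, not from an official requirements file):

from importlib.metadata import version  # Python 3.8+

# Versions reported to work in this thread (an assumption, not an official
# requirements file): torch 1.7.0, transformers 3.0.0, regex 2018.1.10.
for pkg in ("torch", "transformers", "regex"):
    print(pkg, version(pkg))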

One last question: what exactly does the "generate.py" script produce?
Taking a dialogue and some reference responses, what exactly is the output hypstr[0]?

thanks in advance

@Evraa
Author

Evraa commented Oct 25, 2021

Follow-up question: the variable "responses" in "generate.py", line 293.
What is its purpose?

@dongqian0206

Hi Zekang,

Thanks for providing the source code.

I just followed this post. Suppose I don't want to use the older version of transformers due to a Python environment issue; then pre-training DialoFlow based on GPT-2 using DailyDialog is also possible, although the results would be worse than the reported ones, right? The effectiveness of DialoFlow is independent of whether it was pre-trained on the Reddit dataset.

Thanks in advance.

Best,

Dong

@dongqian0206

dongqian0206 commented Nov 18, 2021

Follow-up question: the variable "responses" in "generate.py", line 293. What is its purpose?

Hi Evraa,

Based on my understanding, the evaluation task is to generate a response given a context. For example,

context = [utterance1], response = [[utterance2], [utterance3], [utterance4], ....].

I have another question regarding the 'generate.py' script, if you are willing to answer it. Given speaker1's utterance, how do we know which utterances correspond to speaker2 and which to speaker1's next response? As this is multi-turn dialogue generation, I suppose the output should contain more than one utterance.

Please see the following example, where a context, ground-truth responses, and generated responses are shown. For the generated responses, the correspondence is not clear to me.

["We've managed to reduce our energy consumption in our factory by about 15 per cent in the last two years ."]
["That's excellent . How have you managed that ?", "Mainly because we've invested in a heat recovery system .", 'What does that mean exactly ?', 'Well , we use the exhaust gases from our printing presses to provide energy to heat our dryers .', 'What other sources of energy do you use ?', "We don't use any fossil fuels . Most of our power comes from hydro-electric plants . We're hoping to use even more energy from alternative sources in the future - perhaps even wind power ."]
["Does that mean that we can't afford to pay for more? We can't afford to pay for more than we can afford. Why not? We can't afford to pay for more. Why can't we? We can't afford to pay for more."]

As the input indices are [speaker1, text1, eos, empty, speaker2, text2, eos, empty], one potential way is to comment out the following code in the 'generate.py' script. But I am not sure whether that is correct.

# if o in [eos, empty, speaker1, speaker2]:
#     continue

Looking forward to hearing from you @Evraa, as well as @lizekang.

Best,

Dong

@lizekang
Collaborator

(Quoting @dongqian0206's question above about pre-training DialoFlow on DailyDialog without the Reddit pre-training.)

Hi, sorry for the late response. The effectiveness of DialoFlow is independent of the pre-training corpus; it should also be effective without pre-training on the Reddit dataset.

@dongqian0206

OK. Thanks a lot!

@lizekang
Collaborator

(Quoting @dongqian0206's question above about how the generated output corresponds to speaker1 and speaker2.)

To tell the speakers apart, there are two ways: 1) we insert special tokens like speaker1 and speaker2; 2) we use different segment embeddings for different speakers (see the function build_input_from_input in generate.py).

As for the commented-out code, during generation we don't want the model to generate special tokens inside the response:

# if o in [eos, empty, speaker1, speaker2]:
#     continue
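As a minimal illustration of that filter (the names hyp_ids, tokenizer and special_ids are assumptions, not the repo's actual variables), the special ids are simply dropped before detokenizing:

def strip_special_tokens(hyp_ids, tokenizer, special_ids):
    # special_ids would hold the ids of eos, empty, speaker1 and speaker2;
    # dropping them mirrors the commented-out "continue" above.
    return tokenizer.decode([o for o in hyp_ids if o not in special_ids])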

@dongqian0206

dongqian0206 commented Nov 18, 2021


Thank you for your prompt reply. Let me see if I can explain my confusion more clearly.

To tell the speakers apart, there are two ways: 1) we insert special tokens like speaker1 and speaker2; 2) we use different segment embeddings for different speakers (see the function build_input_from_input in generate.py).

I agree with this part. During training, the input index is defined as [speaker1, text1, eos, empty, speaker2, text2, eos, empty], which shows the correspondence.

But it might be different during generation. For example, given the first utterance from speaker1, the input index [speaker1, text1, eos, empty] is provided, as well as the segment index.

Please see the segment result from your code, where '0' corresponds to speaker1 and '1' corresponds to speaker2. Longer outputs exhibit a similar pattern (just more 1's).
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
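As a rough sketch (not the repo's actual code) of how such segment ids are built, every token in a turn simply shares that speaker's id:

def build_token_types(turn_lengths):
    # turn_lengths: number of tokens in each turn, speaker1's turn first.
    # Positions in speaker1's turns get 0, positions in speaker2's turns get 1.
    types = []
    for i, n in enumerate(turn_lengths):
        types.extend([i % 2] * n)
    return types

print(build_token_types([4, 6]))  # -> [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]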

My confusion: although this task is multi-turn dialogue generation, the output is essentially one round of dialogue. The model doesn't know how to switch the speaker identity any further; it only knows 0 --> 1, since that information is provided by the user.

# An odd number of context utterances means speaker2 replies next;
# otherwise the next turn belongs to speaker1.
if len(conv) % 2 == 1:
    current_output = [speaker2]
else:
    current_output = [speaker1]

During training, ground-truth sequences are provided, while during generation, sequences are generated in an autoregressive manner.
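For illustration, a hedged sketch of how the parity rule above could drive multi-turn generation; generate_one_response is a hypothetical stand-in for the beam search in generate.py, not a function from the repo:

def generate_dialogue(first_utterance_ids, num_turns, speaker1, speaker2, generate_one_response):
    # generate_one_response(conv, speaker) would return the token ids of the
    # next turn given the conversation so far and the next speaker token.
    conv = [first_utterance_ids]
    for _ in range(num_turns):
        speaker = speaker2 if len(conv) % 2 == 1 else speaker1  # same parity rule as above
        conv.append(generate_one_response(conv, speaker))
    return conv

Appending each generated response to the conversation flips the parity, so the speaker token (and the segment id) alternates on every turn.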
