[Text Generation] Debug a Text Generator #681

Open
SatishDeshbhratar opened this issue Mar 27, 2022 · 4 comments
SatishDeshbhratar commented Mar 27, 2022

Hi,

I want to debug a text generator. I am using two fine-tuned models: facebook/bart-large-cnn and human-centered-summarization/financial-summarization-pegasus.

I am following this tutorial: https://pair-code.github.io/lit/tutorials/generation/. Since the tutorial uses T5 models, can I use that code file for my fine-tuned models (I am not able to import any model other than T5)? If yes, is there a reference code file for this?

I was also following this guide for adding my own model: https://github.com/PAIR-code/lit/wiki/api.md#adding-models-and-data.
However, I am working on a text summarization task, so how can I replace the following parameters with my specification?
import pandas

def __init__(self, path):
    # Read the eval set from a .tsv file as distributed with the GLUE benchmark.
    df = pandas.read_csv(path, sep='\t')
    # Store as a list of dicts, conforming to self.spec()
    self._examples = [{
        'premise': row['sentence1'],     # --> in my case, the input text
        'hypothesis': row['sentence2'],  # --> I don't require a hypothesis
        'label': row['gold_label'],      # --> in my case, the output summary
        'genre': row['genre'],           # --> I don't require this
    } for _, row in df.iterrows()]
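
For reference, here is what I imagine the adapted loader could look like for my case. This is only a sketch: I am assuming a TSV with hypothetical columns 'text' and 'summary', and the class and field names are my own placeholders.

import pandas
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import types as lit_types

class SummarizationData(lit_dataset.Dataset):
    """Loads (input text, reference summary) pairs from a TSV file."""

    def __init__(self, path):
        df = pandas.read_csv(path, sep='\t')
        # One dict per example, matching the field names in spec() below.
        self._examples = [{
            'input_text': row['text'],
            'output_summary': row['summary'],
        } for _, row in df.iterrows()]

    def spec(self):
        return {
            'input_text': lit_types.TextSegment(),
            'output_summary': lit_types.TextSegment(),
        }

Does this look like the right direction?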

jameswex (Collaborator) commented:

The generation tutorial follows along with some demo code we wrote that uses a T5 model for text generation, but the concepts in the UI are the same regardless of the architecture of the text generation models you are using in LIT.

You are correct that you will want to define new LIT Dataset and Model classes for the specific dataset and models you wish to use in LIT. As per the documentation, in the LIT Dataset you specify what fields will exist in the dataset and what their names will be. If your data contains just a single field called "input text", then give your Dataset spec a single entry in its dictionary with the name "input text" and a value of type TextSegment. Then set self._examples to a list of dicts, one dict per input, each with that single key "input text" and the string from the loaded dataset as its value. For your model, its input spec can be the same as the dataset spec (just one TextSegment named "input text"), and the output should be of type GeneratedText, as our example T5 model shows in its code. You can define the predict_minibatch function to do whatever it needs to do to get predictions from your model and return the generated text, similar to our T5 example.
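
To make the shape concrete, a rough sketch of such a model class might look like the following (untested; the class name and the generate_fn hook are placeholders I'm making up for illustration, not part of the LIT API):

from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types

class SummarizerModel(lit_model.Model):
    """Wraps an arbitrary batch text-to-summary function for LIT."""

    def __init__(self, generate_fn):
        # generate_fn: takes a list of input strings, returns a list of summaries.
        self._generate_fn = generate_fn

    def input_spec(self):
        # Field name must match the dataset spec.
        return {'input_text': lit_types.TextSegment()}

    def output_spec(self):
        # parent= points at the reference-summary field in the dataset.
        return {'output_text': lit_types.GeneratedText(parent='output_summary')}

    def predict_minibatch(self, inputs):
        # Run the wrapped model on the batch; return one dict per input.
        texts = [ex['input_text'] for ex in inputs]
        return [{'output_text': s} for s in self._generate_fn(texts)]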

SatishDeshbhratar (Author) commented:


I have tried creating this, but I am running into some issues that I have been stuck on for a long time. Can you help identify the issue?
I have shared the Colab notebook below; feel free to edit it or comment if I have made any mistakes.
https://drive.google.com/file/d/1E1Iwr-vMFO11D3RRQIzTgk34ZvlGPn-_/view?usp=sharing

jameswex (Collaborator) commented:

Thanks for sharing. The first issue you are running into is that your self.tokenizer and self.model are swapped: self.model should be the BartForConditionalGeneration, not the BartTokenizer. If you fix that, then from my test you'll run into another error down in the call to batch_encode_plus, but I'm not an expert on these tokenizers/models, so I'm not sure of the root cause of that issue in your code.
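
Roughly, the assignments in your model's constructor presumably need to look like this (a sketch based on the standard transformers loading calls, not tested against your notebook; 'texts' stands in for your batch of input strings):

from transformers import BartForConditionalGeneration, BartTokenizer

# Inside your LIT model class's __init__: the model and tokenizer were swapped.
self.model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
self.tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

# batch_encode_plus is a tokenizer method, so it must be called on
# self.tokenizer, not self.model:
encoded = self.tokenizer.batch_encode_plus(
    texts, return_tensors='pt', padding=True, truncation=True)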

SatishDeshbhratar (Author) commented:

Is there anyone else on the team who can help? I am trying to work through this issue but I am not making progress. Alternatively, is there an example or reference for summarization, since the T5 demo is not working for me?
