TAPAS unable to use weak supervision labels to finetune #141
Hi,
The TAPAS model used for WTQ learns to predict an operation
(aggregation_function) and to select the cells over which to apply that
operation.
Example:
1- The model predicts the SUM aggregation_function and the cell coordinates
(0,1) (0,2). The output: ~SUM((0,1), (0,2))
2- The model predicts the NONE operation and the coordinates (0,1). The output:
the text from cell (0,1)
In the case of aggregation, the model expects the list of coordinates (row
index, column index) of all the cells to aggregate.
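The two prediction modes above can be sketched in Python. This is a hypothetical illustration; `apply_prediction` and the table layout are assumptions for the sketch, not actual TAPAS code:

```python
def apply_prediction(table, aggregation, coordinates):
    """Apply a predicted aggregation to the selected cell coordinates.

    table: list of rows; coordinates: list of (row_index, column_index).
    With NONE, the selected cell's content is returned directly."""
    values = [table[r][c] for r, c in coordinates]
    if aggregation == "SUM":
        return sum(values)
    if aggregation == "AVERAGE":
        return sum(values) / len(values)
    if aggregation == "COUNT":
        return len(values)
    # NONE: no aggregation, answer is the cell text itself
    return values[0] if values else None

table = [["a", 10, 20]]
apply_prediction(table, "SUM", [(0, 1), (0, 2)])   # aggregates the two cells
apply_prediction(table, "NONE", [(0, 1)])          # returns the cell content
```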
You need to fill the different interaction proto fields related to
aggregation:
1- repeated AnswerCoordinate answer_coordinates
message AnswerCoordinate {
optional int32 row_index = 1;
optional int32 column_index = 2;
}
(In the tf example this is represented by the feature called "label_ids".)
2- optional float float_value: contains the result of the aggregation
3- optional AggregationFunction aggregation_function: contains the
aggregation type: NONE, SUM, AVERAGE, or COUNT
- Make sure that all 3 fields are filled; otherwise the model won't learn
the aggregation loss and will instead try to find an answer text in the
table. (The default aggregation is NONE, and the default float_value is 0.)
- If no coordinates are provided, label_ids contains 0 values (the default).
If label_ids contains only 0s, that creates a mask and no aggregation loss
is computed.
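As a rough illustration of how answer coordinates turn into token-level "label_ids", and why an all-zero tensor masks the supervision: the sketch below is an assumption about the shape of the featurization, not the actual TAPAS conversion code.

```python
def build_label_ids(answer_coordinates, row_ids, column_ids):
    """Mark each token whose table cell is in answer_coordinates.

    row_ids / column_ids are per-token cell indices (1-based for table
    tokens, 0 for question/special tokens, as assumed here). Returns a
    0/1 list per token; an all-zero result means no cell is labeled,
    so no cell-selection supervision reaches the model."""
    coords = set(answer_coordinates)
    return [
        1 if r > 0 and c > 0 and (r - 1, c - 1) in coords else 0
        for r, c in zip(row_ids, column_ids)
    ]

# One question token, then two table tokens in cells (0,1) and (0,2):
row_ids = [0, 1, 1]
column_ids = [0, 2, 3]
labels = build_label_ids([(0, 1), (0, 2)], row_ids, column_ids)
```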
Question/comment: "currently, over 72% of the samples do not have any
predicted answer coordinates":
- I suspect that your model is learning to select an answer from the
table rather than to apply an aggregation. When neither an answer text
from a table cell nor a list of coordinates is provided, the model will
learn to predict no answer and an empty list of coordinates.
Question/comment: "I have tried passing the labels tensor to be all zeros
as try but that makes the model learn to not select any column":
That's the expected behavior. One method you can try: implement a
heuristic that finds candidate answer coordinates to pass to the model.
The model will then be limited by the heuristic's errors, but at least it
would learn an aggregation loss.
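One possible such heuristic (my own sketch, not part of the TAPAS codebase): search small cell subsets within a single column whose SUM or AVERAGE matches the known scalar answer, and use the first matching subset as weak answer coordinates. Note that a single-cell SUM match is really direct cell selection, and false matches are possible when several subsets produce the same value.

```python
from itertools import combinations

def find_candidate_coordinates(table, float_value, max_cells=3, tol=1e-6):
    """table: list of rows of numeric cell values.

    Returns (aggregation, coordinates) for the first column-wise cell
    subset whose SUM or AVERAGE matches float_value, else None."""
    num_rows = len(table)
    num_cols = len(table[0]) if num_rows else 0
    for col in range(num_cols):
        cells = [(row, col, table[row][col]) for row in range(num_rows)]
        for k in range(1, min(max_cells, num_rows) + 1):
            for subset in combinations(cells, k):
                coords = [(r, c) for r, c, _ in subset]
                total = sum(v for _, _, v in subset)
                if abs(total - float_value) < tol:
                    return "SUM", coords
                if abs(total / k - float_value) < tol:
                    return "AVERAGE", coords
    return None

# Example: the scalar answer 4.0 is recovered as SUM over column 0.
find_candidate_coordinates([[1.0, 2.0], [3.0, 4.0]], 4.0)
```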
Thanks,
Syrine
…On Thu, Sep 30, 2021 at 6:05 AM shabbir ***@***.***> wrote:
I am trying to fine-tune the pretrained TAPAS WTQ model on a custom
dataset. I have used both the Hugging Face PyTorch code and the
TensorFlow code present on GitHub. The majority of samples in my dataset
involve arithmetic operations, so they rely on a scalar answer as
supervision. The details of the problems faced with both codes are
described below:
1.
Tensorflow
The model trains and saves intermediate checkpoints; I used
different checkpoints to run inference on the test data.
As training progresses, more and more samples' predicted
coordinates come out as an empty list.
Currently, over 72% of the samples have no predicted answer
coordinates, while in the zero-shot setting only 7% came out empty.
So the conclusion is that the TAPAS model is not able to learn from the
weak supervision signals of the dataset in use.
2.
Pytorch
- As we do not have the answer coordinates available, they are
predicted by the provided utility, which computes a cost matrix.
- However, the utility returns 'None' when it does not find a matching
candidate in the table (which is the case whenever the answer is the
result of an aggregation operation over cell values).
- Now, if we pass 'None' as the answer coordinates to the TAPAS
tokenizer, we don't get a labels tensor in the result.
- When that tokenization output is passed to the TAPAS model, it does
not compute a loss; it just returns the predicted answer coordinates and
predicted aggregation operator (as it does during inference).
I have tried passing an all-zeros labels tensor as an experiment, but
that makes the model learn to not select any column.