
This repository contains the code used for the method paper "BERTweet’s TACO Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On Twitter".


TomatenMarc/TACO-Fiesta

🌮🔥 BERTweet's TACO Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On Twitter 🔥🌮

This repository contains the code used for the paper "BERTweet's TACO Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On Twitter".

Repository Layout

  1. notebooks
    1. classifier_cv.ipynb: For validating closed-topic and cross-topic classification on TACO.
    2. data_augmentation.ipynb: For creating A-TACO, which is an augmented copy of TACO.
    3. fine_tuning_bertweet.ipynb: For the contrastive pre-classification fine-tuning of BERTweet.
    4. target_space.ipynb: For visually analyzing the optimized embeddings of BERTweet.
    5. train_classifier.ipynb: For training Augmented BERTweet and retraining WRAP for publication.
    6. tweet_embeddings.ipynb: For generating embeddings of TACO and A-TACO for all BERTweet models.
  2. outputs
    1. cv-6-shuffled.csv: The outputs of all models for closed-topic evaluation with dynamic embeddings.
    2. cv-6-topic.csv: The outputs of all models for cross-topic evaluation with dynamic embeddings.
    3. cv-6-shuffled-frozen.csv: The outputs of all models for closed-topic evaluation with frozen embeddings.
    4. cv-6-topic-frozen.csv: The outputs of all models for cross-topic evaluation with frozen embeddings.
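
The notebook tweet_embeddings.ipynb generates tweet embeddings, and the model names in the Findings section carry -CLS and -MEAN suffixes. The following is a minimal sketch of the two pooling strategies these suffixes commonly denote, assuming (the README does not spell this out) that CLS means taking the first token's vector and MEAN means averaging over non-padding tokens. The function names and the toy vectors are hypothetical and are not taken from the notebooks.

```python
def cls_pool(token_vectors):
    """Return the vector at the first token position (the CLS-style token)."""
    return list(token_vectors[0])


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (hypothetical): four token positions with two dimensions each.
# The last position is padding and must not influence the mean.
tokens = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pool(tokens))         # first-token vector
print(mean_pool(tokens, mask))  # average over the three real tokens
```

With a real model, the token vectors would come from the encoder's last hidden state and the mask from the tokenizer output.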

All models are available on Hugging Face.

Note: To access the data, please contact the authors.

Findings

Macro Performance Pre-Classification Fine-Tuning on Holdout Data

Model                     Precision   Recall      F1     

   Evaluation on golden holdout tweets of #Abortion

Vanilla BERTweet-CLS      50.00       100.00      66.67 
Augmented BERTweet-CLS    65.69       86.66       74.73 
WRAPresentations-CLS      66.00       84.32       74.04 
WRAPresentations-MEAN     63.05       88.91       73.78 
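
As a quick arithmetic check, the F1 values above appear consistent with the harmonic mean of the listed precision and recall, F1 = 2PR / (P + R). The sketch below recomputes each row from the numbers in the table; the helper name is hypothetical, and how the authors aggregated the scores is not stated in this README.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "Vanilla BERTweet-CLS": (50.00, 100.00, 66.67),
    "Augmented BERTweet-CLS": (65.69, 86.66, 74.73),
    "WRAPresentations-CLS": (66.00, 84.32, 74.04),
    "WRAPresentations-MEAN": (63.05, 88.91, 73.78),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_from_precision_recall(precision, recall)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```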

Macro F1 Performance Binary and Multi-Class Classification Tasks

                    (1) Inference       (2) Information     (3) Multi-Class                                   
Model               Frozen  Dynamic     Frozen  Dynamic     Frozen  Dynamic                                                                    

                    Closed-Topic (6-fold) Validation                        
                                                                    
Length                  62.34               71.47               38.26        
SVM + TF-IDF            76.00               75.44               55.39        
LR + TF-IDF             76.87               74.73               54.76        
RF + TF-IDF             76.12               80.56               55.65        
Vanilla BERTweet    73.12   84.54       66.49   83.55       42.87   71.05
Augmented BERTweet  84.49   86.68       79.22   84.57       67.07   73.80
WRAPresentations    86.88   86.62       81.54   86.30       71.07   75.29

                    Cross-Topic (6-fold) Validation
                                                                    
Length                  61.99               71.55               38.17        
SVM + TF-IDF            72.24               74.79               50.55        
LR + TF-IDF             72.20               75.90               50.41        
RF + TF-IDF             73.93               80.16               53.29        
Vanilla BERTweet    70.28   83.15       66.15   82.22       39.00   68.12
Augmented BERTweet  84.20   84.25       79.38   83.31       66.41   69.99
WRAPresentations    86.83   86.27       81.54   84.90       70.93   73.54
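
The two validation settings differ in how folds are built. The sketch below illustrates the difference under the assumption, which this README does not confirm, that closed-topic validation shuffles tweets from all topics into folds, while cross-topic validation holds out one whole topic per fold. The function names and toy samples are hypothetical; the toy data uses only the three topics named in the examples further down, so it produces three folds rather than six.

```python
import random
from collections import defaultdict


def closed_topic_folds(samples, k, seed=0):
    """Shuffle all samples and deal them into k folds; topics mix freely."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cross_topic_folds(samples):
    """Build one fold per topic, so each test fold is an unseen topic."""
    by_topic = defaultdict(list)
    for sample in samples:
        by_topic[sample["topic"]].append(sample)
    return [by_topic[topic] for topic in sorted(by_topic)]


def train_test_pairs(folds):
    """Yield (train, test) where test is one fold and train is the rest."""
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


# Toy data (hypothetical): four placeholder tweets per topic.
topics = ["Abortion", "Brexit", "Twitter-Takeover"]
samples = [{"topic": t, "text": f"{t} tweet {i}"} for t in topics for i in range(4)]

for train, test in train_test_pairs(cross_topic_folds(samples)):
    held_out = {s["topic"] for s in test}
    seen = {s["topic"] for s in train}
    print(f"held out: {sorted(held_out)}, trained on: {sorted(seen)}")
```

In the cross-topic setting the model never sees the held-out topic during training, which is one plausible reason the cross-topic scores in the table are mostly lower than the closed-topic ones.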

Micro F1 Performance Multi-Class Task

                        Reason             Statement          Notification           None
Model               Frozen  Dynamic     Frozen  Dynamic     Frozen  Dynamic     Frozen  Dynamic

                            Closed-Topic (6-fold) Validation                        
                                                                    
Length                  61.68               20.19               14.47               56.72          
SVM + TF-IDF            64.79               24.57               62.36               69.85          
LR + TF-IDF             65.75               17.66               62.62               73.02          
RF + TF-IDF             69.35               17.30               63.35               72.62          
Vanilla BERTweet    66.05   74.98       00.00   53.99       43.80   77.62       61.63   77.62  
Augmented BERTweet  74.50   76.82       49.53   58.37       70.95   80.28       73.29   79.71  
WRAPresentations    77.34   78.14       58.66   60.96       72.61   79.36       75.67   82.72  

                            Cross-Topic (6-fold) Validation
                                                                    
Length                  61.78               19.32               14.49               57.09          
SVM + TF-IDF            62.35               18.68               56.11               65.05          
LR + TF-IDF             65.19               16.09               55.30               65.08          
RF + TF-IDF             68.61               13.33               62.75               68.46          
Vanilla BERTweet    63.57   73.15       00.00   47.40       35.79   74.92       56.64   77.01  
Augmented BERTweet  75.18   75.10       46.34   51.74       71.61   75.71       72.50   77.42  
WRAPresentations    77.13   77.05       57.62   58.33       73.05   78.45       75.91   80.33  
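
The table above reports one F1 value per class. As a minimal sketch of how per-class F1 and its macro average can be computed from gold and predicted labels, consider the following; the helper names and the toy label lists are hypothetical, and whether the notebooks compute the scores this way is an assumption. The toy example also shows how a class that is never predicted ends up with an F1 of 0, as in the 00.00 entries for Statement.

```python
from collections import Counter

LABELS = ["Reason", "Statement", "Notification", "None"]


def per_class_f1(gold, predicted, labels=LABELS):
    """Compute F1 = 2TP / (2TP + FP + FN) separately for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    scores = {}
    for label in labels:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


# Toy labels (hypothetical): the classifier never predicts "Statement".
gold = ["Reason", "Reason", "Statement", "Notification", "None", "None"]
predicted = ["Reason", "Reason", "Reason", "Notification", "None", "Notification"]

scores = per_class_f1(gold, predicted)
for label, value in scores.items():
    print(f"{label}: {value:.2f}")
print(f"macro: {macro_f1(scores):.2f}")
```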

Examples of Text Data and Augmentation Techniques

Examples for Reason

| Topic | Original | Augmented |
| --- | --- | --- |
| Abortion | If you eat eggs, you shouldn't say anything against abortion #AbortionIsHealthcare #AbortionIsAWomansRight #AbortionBan #abortion HTTPURL | If you eat meat, you should not say anything against it..... |
| Brexit | #OTD 1920 science fiction author Isaac Asimov was born. When stupidity is considered patriotism, it is unsafe to be intelligent. As advocated by #NotMyPM serial #LiarJohnson with the disaster called #Brexit. HTTPURL | A science fiction author @USER was born. When he is considered intelligent, it became unsafe to be intelligent. As advocated by a serial killer with the disaster called @USER. HTTPURL |
| Twitter-Takeover | It's amazing that so much stupid could come out of someone so small... #TwitterTakeover HTTPURL | It is amazing that so much good could come out of someone so small... HTTPURL HTTPURL |

Examples for Statement

| Topic | Original | Augmented |
| --- | --- | --- |
| Abortion | Do the people against requiring the #vaccine- stating the argument 'it's against our #medicalfreedom'- realize that outlawing #abortion is against the same #rights they are leaning on? #VaccineMandate #AbortionBan #prochoice #ProLife #YCHYCAEIT | Do the people against requiring a #vaccine- stating the argument 'it is against our rights'- realize that outlawing it is against the same #rights they are leaning against? #VaccineMandate HTTPURL #CYCHYCAEIT |
| Brexit | The #brexit countdown clock is like the years rolling back | The Christmas countdown advert is like the clocks rolling back |
| Twitter-Takeover | @USER @USER Not really. They see that Twitter continues to make poor decisions that devalue the stock and product. | @USER Not really. They see that Apple continues to make poor decisions that devalue the stock and product. |

Examples for Notification

| Topic | Original | Augmented |
| --- | --- | --- |
| Abortion | BREAKING: Federal judge tells #Texas to shove its 6-week #abortion ban... HTTPURL | BREAKING: Federal judge tells Trump to shove his Muslim travel ban... HTTPURL |
| Brexit | British lawmakers finally approve historic #Brexit deal HTTPURL | US lawmakers to approve historic trade deal HTTPURL |
| Twitter-Takeover | The CEO of Twitter @USER hasn't posted in 4 days. Meanwhile, @USER has posted more than 15 times in that period. #TwitterTakeover HTTPURL | The CEO of the company has not posted in months. However, he has posted several times in that period. HTTPURL |

Examples for None

| Topic | Original | Augmented |
| --- | --- | --- |
| Abortion | @USER @USER Bastards! | Yes! |
| Brexit | @USER 😆 | Lol... |
| Twitter-Takeover | @USER @USER Hi Jules! 👋 | Hello! 👋 |
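
The example tweets above contain the placeholders @USER and HTTPURL, which match BERTweet's convention of replacing user mentions and links with these special tokens. The sketch below approximates that step with two regular expressions. It is a simplified stand-in, not the normalizer used by BERTweet or by this repository; the patterns and the function name are hypothetical.

```python
import re

# Simplified patterns (assumptions): real tweet normalizers handle more cases.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def normalize_tweet(text):
    """Replace links with HTTPURL and user mentions with @USER."""
    text = URL_PATTERN.sub("HTTPURL", text)
    text = MENTION_PATTERN.sub("@USER", text)
    return text


print(normalize_tweet("@jules check this https://example.com/x #Brexit"))
# prints: @USER check this HTTPURL #Brexit
```

Hashtags are left untouched in this sketch, matching the original tweets in the tables, where hashtags such as #Brexit are preserved.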

Publication

Licensing

BERTweet’s TACO Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On Twitter by Marc Feger is licensed under CC BY-NC-SA 4.0

Contact

Please contact marc.feger@uni-duesseldorf.de or stefan.dietze@gesis.org.
