tanalpha-aditya/NLP-Sarcasm-Irony-Detection

Advanced NLP – Project Submission

About

This repository contains experiments conducted on sarcasm and irony datasets using various model architectures. Model quality is evaluated with confusion matrices, and a detailed report is available in 15-report.pdf.

Project files

Common Util files

  • The preprocessing.ipynb module provides methods to load and generate datasets, apply tokenization, etc.
  • It also provides text-cleaning methods such as the following (see the sketch after this list):
    1. replace_url to replace URLs with URL
    2. replace_hashtags to replace hashtags with HASHTAG
    3. replace_email to replace email addresses with EMAIL
    4. replace_mentions to replace mentions with MENTION
    5. replace_numbers to replace numbers with NUMBER
    6. remove_abbrevations to replace abbreviated and possessive forms with their expanded representations
    7. remove_special_patterns to remove delimiter-like tokens found in the corpus (e.g. 10334m)
    8. remove_punctuation to remove punctuation (optionally replacing it with PUNCT)
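
The actual implementations live in preprocessing.ipynb; as a rough illustration, regex-based cleaners of this kind might look as follows (the patterns below are assumptions, not the notebook's exact code):

```python
import re

# Illustrative regex cleaners mirroring the list above; the patterns in
# preprocessing.ipynb may differ.
def replace_url(text):
    return re.sub(r"https?://\S+|www\.\S+", "URL", text)

def replace_hashtags(text):
    return re.sub(r"#\w+", "HASHTAG", text)

def replace_email(text):
    return re.sub(r"\S+@\S+\.\S+", "EMAIL", text)

def replace_mentions(text):
    return re.sub(r"@\w+", "MENTION", text)

def replace_numbers(text):
    return re.sub(r"\d+", "NUMBER", text)

def remove_punctuation(text, keep_token=False):
    # Either drop punctuation or substitute a placeholder token.
    return re.sub(r"[^\w\s]", "PUNCT " if keep_token else "", text).strip()

print(replace_mentions(replace_hashtags(replace_url(
    "check https://example.com #irony @user"))))
# -> 'check URL HASHTAG MENTION'
```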
  • Transformer-specific preprocessing

  • Since transformers rely on subword tokenization (e.g. BPE), preprocessing is restricted to the following (see the sketch after this list):
    1. replace_url to replace URLs with URL
    2. Include the special token '[EMOTICON]' for sentences where emoticons and text-based smilies are present
    3. Include the special token '[ELONGATED]' for sentences containing elongated words such as "foreveeer", "yayyy", "Aweeeeesome", etc.
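
A minimal sketch of this marking step, assuming a Hugging Face tokenizer (the regexes and the bert-base-uncased checkpoint are illustrative choices, not necessarily what the notebooks use):

```python
import re
from transformers import AutoTokenizer

# Heuristic patterns (assumptions, not the project's exact rules):
# a simple emoticon shape like :) ;D :-( and a word with a character
# repeated three or more times ("yayyy", "foreveeer").
EMOTICON_RE = re.compile(r"[:;=8][\-o\*']?[\)\](\[dDpP/\\]")
ELONGATED_RE = re.compile(r"\b\w*(\w)\1{2,}\w*\b")

def mark_sentence(text):
    tags = []
    if EMOTICON_RE.search(text):
        tags.append("[EMOTICON]")
    if ELONGATED_RE.search(text):
        tags.append("[ELONGATED]")
    return " ".join(tags + [text])

# Register the markers so BPE/WordPiece does not split them; a downstream
# model would then need model.resize_token_embeddings(len(tokenizer)).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": ["[EMOTICON]", "[ELONGATED]"]})

print(mark_sentence("That went sooooo well :)"))
# -> '[EMOTICON] [ELONGATED] That went sooooo well :)'
```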
  • Model-specific files

  • experiment/Irony_bilstm.ipynb contains the training code and evaluation loop for a BiLSTM model with attention
  • experiment/irony_transformers_hf.ipynb contains the bidirectional encoder transformer implementation built on Hugging Face (see the fine-tuning sketch after this list)
  • experiment/irony_transformers_torch.ipynb contains a transformer encoder implementation in PyTorch
  • experiment/setfit_impl.ipynb contains the SetFit few-shot training implementation
  • experiment/irony_tf_exponential.ipynb contains the training implementation for a transformer with task-specific exponential positional encoding
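
As a rough illustration of the bidirectional-encoder setup, a Hugging Face fine-tuning loop might look like the following; the checkpoint, the tweet_eval irony dataset, and the hyperparameters are assumptions, not the notebook's exact configuration:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed setup: binary irony classification with a BERT-style encoder.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# The TweetEval irony subset provides 'text' and 'label' columns.
dataset = load_dataset("tweet_eval", "irony")
encoded = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="irony-encoder",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```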
  • Dependencies

    Create a virtual environment and install the dependencies transformers, nltk, gensim, and scikit-learn (imported as sklearn) to reproduce the results: https://drive.google.com/drive/folders/1wwpnXvfuH1vbCFFsTfMj_xuE1fhlSRZd?usp=drive_link
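
A quick import check inside the virtual environment confirms the dependencies resolve:

```python
# Print the installed version of each core dependency (whatever is installed,
# not pinned requirements).
import gensim
import nltk
import sklearn
import transformers

for module in (transformers, nltk, gensim, sklearn):
    print(module.__name__, module.__version__)
```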

    Troubleshooting

    Execution may fail if the Python path is not set correctly. Try loading the project into an IDE for smooth execution.

    Contact

    Contact the author with any queries about reproducing the results.