laihuiyuan/Figurative-Language-Generation

A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models (ACM Computing Surveys)

Abstract

Figurative language generation (FLG) is the task of reformulating a given text to include a desired figure of speech, such as hyperbole or simile, while remaining faithful to the original context. This is a fundamental yet challenging task in Natural Language Processing (NLP), which has recently received increased attention thanks to the promising performance of pre-trained language models. Our survey provides a systematic overview of the development of FLG, mostly in English, starting with a description of some common figures of speech, their corresponding generation tasks, and datasets. We then focus on various modelling approaches and assessment strategies, which leads us to discuss challenges in this field and to suggest potential directions for future research. To the best of our knowledge, this is the first survey that summarizes the progress of FLG, including the most recent developments in NLP. We also organize the corresponding resources, e.g., paper lists and datasets, and make them accessible in an open repository. We hope this survey helps researchers in NLP and related fields to easily track the academic frontier, providing them with a landscape and a roadmap of this area.

Survey Overview

Datasets & Benchmarks

| Figure of speech | Task | Dataset | Train | Valid | Test | Lang | Para |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Simile | Literal↔Simile | Data | 82,687 | 5,145 | 150 | en | |
| | Simile↔Context | Data | 5.4M | 2,500 | 2,500 | zh | |
| | Narrative+Simile→Text | Data | 3,100 | 376 | 1,520 | en | |
| | Concept→Analogy + Explanation | Data | - | - | 148 | en | |
| Metaphor | Literal↔Metaphor | Data | 260k | 15,833 | 250 | en | |
| | | Data | 90k | 3,498 | 150 | en | |
| | | Data | 248k | - | 150 | en | |
| | | Data | - | - | 171 | en | |
| | | CMC | 3,554/2,703 | - | - | zh | |
| Hyperbole | Literal↔Hyperbole | Paper | 709 | - | - | en | |
| | | HYPO-cn | 2,082/2,680 | - | - | zh | |
| | | HYPO-red | 2,163/1,167 | - | - | en | |
| | | HYPO-XL | -/17,862 | - | - | en | |
| Idiom | Idiom↔Literal | Paper | 88 | - | 84 | en | |
| | Idiom (en)↔Literal (de) | Data | 1,998 | - | 1,500 | en/de | |
| | Idiom (de)↔Literal (en) | | 1,848 | - | 1,500 | de/en | |
| | Literal↔Idiom | PIE | 3,784 | 876 | 876 | en | |
| | Narrative+Idiom→Text | Data | 3,204 | 355 | 1,542 | en | |
| Irony (Sarcasm) | Literal↔Irony (Sarcasm) | Data | 2,400 | 300 | 300 | en | |
| | | Data | - | - | 203 | en | |
| | | Data | 112k/262k | - | - | en | |
| | | Data | 4,762 | - | - | en | |
| Pun | Word senses→Pun | Data | 1,274 | - | - | en | |
| | Context→Pun | Data | 2,753 | - | - | en | |
| Personification | Topic→Personification | Data | 67,441 | 3,747 | 3,747 | zh | |

Modelling Approaches

We review the modelling approaches, from traditional to state-of-the-art, and divide them into two categories: knowledge-based and neural-based approaches.

Knowledge-based Approaches

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
| --- | --- | --- | --- | --- | --- |
| Rule and template | Abe et al. | - | Metaphor | CSS 2006 | Pros:<br>- Intuitive and simple<br>- Tailored to specific forms<br>Cons:<br>- Poor flexibility and diversity |
| | Terai et al. | - | Metaphor | ICANN 2010 | |
| | Joshi et al. | Code | Sarcasm | WISDOM 2015 | |
| | Veale et al. | - | Metaphor | Metaphor WS 2016 | |
| Knowledge resource | Pereira et al. | - | Metaphor | AAAI WS 2006 | Pros:<br>- Exploits knowledge resources<br>- High interpretability<br>Cons:<br>- Requires prior linguistic knowledge<br>- Desired resources must be constructed |
| | Veale et al. | - | Metaphor | COLING 2008 | |
| | Petrović et al. | - | Pun | ACL 2013 | |
| | Hong et al. | - | Pun | CALC 2009 | |
| | Shutova et al. | - | Metaphor | NAACL 2010 | |
| | Valitutti et al. | - | Pun | ACL 2013 | |
| | Liu et al. | - | Idiom | NAACL 2016 | |
| | Gero et al. | - | Metaphor | CHI 2019 | |
| | Stowe et al. | - | Metaphor | ACL 2021 | |
| | Hervas et al. | - | Metaphor | MICAI 2007 | |
| | Ovchinnikova et al. | - | Metaphor | arXiv 2014 | |
| | Harmon et al. | - | Simile | ICCC 2015 | |
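
To make the rule-and-template idea concrete, here is a minimal sketch (not any specific system from the table above): a fixed template is filled from a small hand-built property lexicon. The lexicon entries and function name are invented for illustration.

```python
# Toy illustration of rule-and-template figurative generation:
# a fixed simile template is filled with a vehicle's salient property,
# looked up in a small hand-built lexicon (entries invented for illustration).

SALIENT_PROPERTY = {
    "cheetah": "fast",
    "glacier": "slow",
    "feather": "light",
}

def simile_from_template(topic: str, vehicle: str) -> str:
    """Generate 'X is as <property> as a Y' from the vehicle's salient property."""
    prop = SALIENT_PROPERTY[vehicle]
    return f"{topic} is as {prop} as a {vehicle}"

print(simile_from_template("the new server", "cheetah"))
# -> the new server is as fast as a cheetah
```

The pros and cons in the table follow directly: the template guarantees a well-formed simile (simple, tailored), but output variety is bounded by the template inventory and the lexicon.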

Neural-based Approaches

| Subcategory | Paper | Code | Form | Venue | Pros and Cons |
| --- | --- | --- | --- | --- | --- |
| Training from scratch | Peled et al. | Code | Sarcasm | ACL 2017 | Pros:<br>- Straightforward<br>- Can combine retrieval approaches<br>Cons:<br>- Requires large-scale training data<br>- Large computational resources |
| | Fadaee et al. | Code | Idiom | LREC 2018 | |
| | Liu et al. | Code | Metaphor/Personification | ACL 2019 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Yu et al. | - | Pun | ACL 2018 | |
| | Yu et al. | Code | Metaphor | NAACL 2019 | |
| | Li et al. | Code | Metaphor | INLG 2022 | |
| | He et al. | Code | Pun | NAACL 2019 | |
| | Yu et al. | Code | Pun | EMNLP 2020 | |
| | Zhou et al. | Code | Idiom | arXiv 2021 | |
| | Zhu et al. | Code | Irony | arXiv 2019 | |
| | Luo et al. | Code | Pun | EMNLP 2019 | |
| | Mishra et al. | Code | Sarcasm | EMNLP 2019 | |
| Fine-tuning PLMs | Zhang et al. | Code | Simile | AAAI 2021 | Pros:<br>- Straightforward<br>- Pre-trained knowledge<br>- State-of-the-art results<br>Cons:<br>- Large computational resources |
| | Zhou et al. | Code | Idiom | AAAI 2022 | |
| | Zhang et al. | Code | Hyperbole | NAACL 2022 | |
| | Chakrabarty et al. | Code | Simile | EMNLP 2020 | |
| | Stowe et al. | Code | Metaphor | ACL 2021 | |
| | Chakrabarty et al. | Code | Metaphor | NAACL 2021 | |
| | Stowe et al. | Code | Metaphor | CoNLL 2021 | |
| | Tian et al. | Code | Hyperbole | EMNLP 2021 | |
| | Chakrabarty et al. | Code | Sarcasm | ACL 2020 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | |
| | Tian et al. | Code | Pun | EMNLP 2022 | |
| | Lai et al. | Code | Hyperbole/Sarcasm/Idiom/Metaphor/Simile | COLING 2022 | |
| Prompt learning | Chakrabarty et al. | Code | Idiom/Simile | TACL 2022 | Pros:<br>- Straightforward<br>- Few or no labelled samples<br>Cons:<br>- Prompt engineering<br>- Large computational resources |
| | Reif et al. | - | Metaphor | ACL 2022 | |
| | Mittal et al. | Code | Pun | NAACL 2022 | |
| | Bhavya et al. | Code | Analogy (Simile) | INLG 2022 | |
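
The prompt-learning line of work above typically conditions a large language model on a handful of demonstrations rather than updating its weights. A minimal sketch of assembling such a few-shot prompt for literal-to-simile rewriting (the demonstrations and function name are invented for illustration; the resulting string would be passed to whatever LM is being used):

```python
# Sketch of few-shot prompt construction for literal -> simile rewriting.
# The demonstration pairs below are invented examples, not from any dataset.

DEMOS = [
    ("The soup was very hot.", "The soup was as hot as lava."),
    ("He ran very quickly.", "He ran like the wind."),
]

def build_prompt(literal: str) -> str:
    """Assemble an instruction, the demonstrations, and the query sentence."""
    lines = ["Rewrite the literal sentence as a simile.", ""]
    for src, tgt in DEMOS:
        lines.append(f"Literal: {src}")
        lines.append(f"Simile: {tgt}")
        lines.append("")
    # The prompt ends mid-pattern so the model completes the missing simile.
    lines.append(f"Literal: {literal}")
    lines.append("Simile:")
    return "\n".join(lines)

print(build_prompt("The exam was very difficult."))
```

This illustrates the listed trade-off: no labelled training set or fine-tuning run is needed, but output quality hinges on prompt engineering choices such as the wording of the instruction and the selection of demonstrations.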

Evaluation Methods

We review 34 papers, cataloguing the metrics used for automatic evaluation and the criteria adopted for human evaluation in figurative language generation.
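
Many of the automatic metrics in this catalogue are based on n-gram overlap between a generated sentence and a reference. As a minimal, self-contained sketch of that family (an illustrative toy in the style of BLEU's clipped n-gram precision, not the exact implementation used in any surveyed paper):

```python
# Toy n-gram-overlap metric in the style of BLEU's clipped n-gram precision,
# for illustration only; surveyed papers use established implementations.
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int = 1) -> float:
    """Fraction of hypothesis n-grams that also appear in the reference,
    with counts clipped by the reference counts (as in BLEU)."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    total = sum(hyp.values())
    return overlap / total if total else 0.0

print(ngram_precision("as brave as a lion", "he was as brave as a lion"))
# -> 1.0 (every hypothesis unigram appears in the reference)
```

A known limitation for FLG, and one reason human evaluation remains standard, is that surface overlap cannot tell a creative, faithful rewrite from a near-copy of the input.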

Workshops

- [4th Workshop on Figurative Language Processing](https://sites.google.com/view/figlang2024), 2024.
- [3rd Workshop on Figurative Language Processing](https://aclanthology.org/events/flp-2022/), 2022.
- [2nd Workshop on Figurative Language Processing](https://aclanthology.org/volumes/2020.figlang-1/), 2020.
- [1st Workshop on Figurative Language Processing](https://aclanthology.org/volumes/W18-09/), 2018.

Citation

```bibtex
@article{lai-etal-2024-agfl,
    title     = {A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models},
    author    = {Lai, Huiyuan and Nissim, Malvina},
    journal   = {ACM Computing Surveys},
    year      = {2024},
    publisher = {Association for Computing Machinery},
    address   = {New York, NY, USA},
}
```