InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Pierre Colombo, Chloé Clavel, Pablo Piantanida

[AAAI-22] Main Track
Abstract: Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and involve non-reusable human labor. In practice, researchers rely on automatic metrics as a proxy for quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics but address the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, allowing InfoLM to be adapted to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and two-figure correlation gains in many configurations compared to existing metrics, on both summarization and data2text generation tasks.
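
The sketch below illustrates the core idea described in the abstract, not the authors' reference implementation: mask each token position in a sentence, read the masked language model's distribution over the vocabulary at that position, aggregate the per-position distributions into a single sentence-level distribution, and compare candidate against reference with an information measure. The model name ("bert-base-uncased"), the uniform averaging used for aggregation, and the choice of KL divergence are all illustrative assumptions; the paper defines a family of measures and calibration details not shown here.

```python
# Minimal sketch of an InfoLM-style metric (assumptions noted above).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def sentence_distribution(sentence: str) -> torch.Tensor:
    """Average the MLM's vocabulary distributions obtained by masking
    each token position of the sentence in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    dists = []
    # Skip [CLS] (first position) and [SEP] (last position).
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, pos]
        dists.append(torch.softmax(logits, dim=-1))
    # Uniform average over positions: an illustrative aggregation choice.
    return torch.stack(dists).mean(dim=0)

def infolm_kl(candidate: str, reference: str) -> float:
    """KL divergence between aggregated distributions; one of several
    information measures such a metric family could plug in."""
    p = sentence_distribution(reference)
    q = sentence_distribution(candidate)
    eps = 1e-12  # numerical floor to keep the logs finite
    return torch.sum(p * (torch.log(p + eps) - torch.log(q + eps))).item()

# Lower divergence suggests the candidate is closer to the reference,
# even when the wording differs (e.g., synonyms or paraphrase).
print(infolm_kl("A cat was sitting on the mat.", "The cat sat on the mat."))
```

Because the comparison happens between vocabulary distributions rather than surface strings, a paraphrase that an exact-match metric like BLEU would penalize can still score well; swapping the information measure (e.g., for Fisher-Rao or Rényi divergences) adapts the metric to different evaluation criteria.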

Sessions where this paper appears

  • Poster Session 2

    Fri, February 25 12:45 AM - 2:30 AM (+00:00)
    Red 5

  • Poster Session 7

    Sat, February 26 4:45 PM - 6:30 PM (+00:00)
    Red 5

  • Oral Session 7

    Sat, February 26 6:30 PM - 7:45 PM (+00:00)
    Red 5