SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

Chen Li, Xiaoguang Yu, Shuangyong Song, Jia Wang, Bo Zou, Xiaodong He

[AAAI-22] Student Abstract and Poster Program
Abstract: This paper presents SimCTC, a simple contrastive learning (CL) framework that greatly advances the state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three different loss function heads: Clustering head, Instance-CL head and Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC remarkably outperforms 6 competitive text clustering methods with 1%-6% improvement on Accuracy (ACC) and 1%-4% improvement on Normalized Mutual Information (NMI). Moreover, our results also show that the clustering performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.

Sessions where this paper appears

  • Poster Session 4

    Fri, February 25 5:00 PM - 6:45 PM (+00:00)
    Blue 5
    Add to Calendar

  • Poster Session 8

    Sun, February 27 12:45 AM - 2:30 AM (+00:00)
    Blue 5
    Add to Calendar