Papers

Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective.
Emmanuelle Salin, Badreddine Farah, Stéphane Ayache, Benoit Favre
[AAAI-22] Main Track
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao, Angela Yao, Liu Zhiyuan, Yicong Li, Wei Ji, Tat-Seng Chua
[AAAI-22] Main Track
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning
Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang
[AAAI-22] Main Track