Papers

A Multimodal Fusion-Based LNG Detection for Monitoring Energy Facilities (Student Abstract)
Junchi Bin, Choudhury A. Rahman, Shane Rogers, Shan Du, Zheng Liu
[AAAI-22] Student Abstract and Poster Program
Interactive Image Generation with Natural-Language Feedback
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Chris Tensmeyer, Tong Yu, Changyou Chen, Jinhui Xu, Tong Sun
[AAAI-22] Main Track
Improving Zero-Shot Phrase Grounding via Reasoning on External Knowledge and Spatial Relations
Zhan Shi, Yilin Shen, Hongxia Jin, Xiaodan Zhu
[AAAI-22] Main Track
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning
Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang
[AAAI-22] Main Track
TEACh: Task-Driven Embodied Agents that Chat
Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur
[AAAI-22] Main Track
Predicting Physical World Destinations for Commands Given to Self-Driving Cars
Dusan Grujicic, Thierry Deruyttere, Marie-Francine Moens, Matthew Blaschko
[AAAI-22] Main Track
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
[AAAI-22] Main Track
Flow-Based Unconstrained Lip to Speech Generation
Jinzheng He, Zhou Zhao, Yi Ren, Jinglin Liu, Baoxing Huai, Nicholas Yuan
[AAAI-22] Main Track
Attention-Aligned Transformer for Image Captioning
Zhengcong Fei
[AAAI-22] Main Track
Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective
Emmanuelle Salin, Badreddine Farah, Stéphane Ayache, Benoit Favre
[AAAI-22] Main Track
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents
Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song
[AAAI-22] Main Track
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
[AAAI-22] Main Track
Motion-Aware Embedding Enhancement for Image-Text Retrieval
Jiangtong Li, Li Niu, Liqing Zhang
[AAAI-22] Main Track
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
[AAAI-22] Main Track
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang, Jungang Xu, Yingfei Sun
[AAAI-22] Main Track
Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du
[AAAI-22] Main Track
Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract)
Yuqi Bu, Jiayuan Xie, Liuwu Li, Qiong Liu, Yi Cai
[AAAI-22] Student Abstract and Poster Program
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao, Angela Yao, Liu Zhiyuan, Yicong Li, Wei Ji, Tat-Seng Chua
[AAAI-22] Main Track
Playing Lottery Tickets with Vision and Language
Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
[AAAI-22] Main Track