Less is More: Pay Less Attention in Vision Transformers
Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
[AAAI-22] Main Track
Abstract:
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the observation that in recent hierarchical vision Transformers, the early self-attention layers still focus on local patterns and bring only minor benefits. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer-range dependencies in the deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks.
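To make the stage design concrete, below is a minimal PyTorch sketch of the idea the abstract describes: attention-free MLP blocks in the early stages and standard self-attention blocks in the deeper stages. All module names, depths, and dimensions here are illustrative assumptions, not the authors' released implementation, and the deformable token merging module between stages is omitted.

```python
# Minimal sketch of the stage design described in the abstract.
# Assumptions (not from the paper's code): block structure, dimensions,
# and the use of torch.nn.MultiheadAttention are illustrative only.
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Attention-free block for early stages: LayerNorm + MLP + residual."""
    def __init__(self, dim, mlp_ratio=4.0):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):  # x: (batch, tokens, dim)
        return x + self.mlp(self.norm(x))

class AttentionBlock(nn.Module):
    """Standard Transformer block for deeper stages, capturing longer-range dependencies."""
    def __init__(self, dim, num_heads=4, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        return x + self.mlp(self.norm2(x))

# Early stage uses pure MLP blocks; a later stage uses self-attention
# (depths and channel widths are illustrative).
early_stage = nn.Sequential(*[MLPBlock(dim=64) for _ in range(2)])
late_stage = nn.Sequential(*[AttentionBlock(dim=256) for _ in range(2)])

tokens = torch.randn(1, 196, 64)  # (batch, tokens, channels)
print(early_stage(tokens).shape)  # torch.Size([1, 196, 64])
```

In a full hierarchical backbone, a token merging step between stages would reduce the token count and increase the channel width before the attention stages run, which is where the paper's learned deformable merging would plug in.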
Sessions where this paper appears:
- Poster Session 2 — Fri, February 25, 12:45 AM - 2:30 AM (+00:00), Red 3
- Poster Session 9 — Sun, February 27, 8:45 AM - 10:30 AM (+00:00), Red 3