Multilingual Code Snippets Training for Program Translation

Ming Zhu; Karthik Suresh; Chandan K. Reddy

[AAAI-22] Main Track

Abstract: Program translation aims to translate source code from one programming language to another. It is particularly useful in applications such as multiple-platform adaptation and legacy code migration. Traditional rule-based program translation methods usually rely on meticulous manual rule-crafting, which is costly in both time and effort. Recently, neural-network-based methods have been developed to address this problem. However, the absence of high-quality parallel code data is one of the main bottlenecks impeding the development of program translation models. In this paper, we introduce CoST, a new multilingual Code Snippet Translation dataset that contains parallel data from 7 commonly used programming languages. The dataset is parallel at the level of code snippets, which provides much more fine-grained alignments between different languages than existing translation datasets. We also propose a new program translation model that leverages cross-lingual snippet denoising auto-encoding and Multilingual Snippet Translation (MuST) pre-training. Extensive experiments show that multilingual snippet training is effective in improving program translation performance, especially for low-resource languages. Moreover, our training method shows good generalizability and consistently improves the translation performance of a number of baseline models. The proposed model outperforms the baselines on both snippet-level and program-level translation, and achieves state-of-the-art performance on the CodeXGLUE translation task.

Sessions where this paper appears

Poster Session 4

Fri, February 25 5:00 PM - 6:45 PM (+00:00)

Red 5

Poster Session 11

Mon, February 28 12:45 AM - 2:30 AM (+00:00)

Red 5
