Abstract
Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus, existing models inherently cannot generalize to using unseen functions and libraries, because these would never appear in their training data. In contrast, when human programmers use functions and libraries for the first time, they frequently refer to textual resources such as code manuals and documentation, to explore and understand the available functionality. Inspired by this observation, we introduce DocPrompting: a natural-language-to-code generation approach that explicitly leverages code documentation by (1) retrieving the relevant documentation pieces given a natural language (NL) intent, and (2) generating code based on the NL intent and the retrieved documentation. DocPrompting is general: it can be applied to any programming language, and is agnostic to the underlying neural model. We demonstrate that DocPrompting consistently improves NL-to-code models: DocPrompting improves strong base models such as CodeT5 by 2.85% in pass@1 (52% relative gain) and 4.39% in pass@10 (30% relative gain) in execution-based evaluation on the popular Python CoNaLa benchmark; on a new Bash dataset tldr, DocPrompting improves CodeT5 and GPT-Neo-1.3B by up to absolute 6.9% exact match.
Original language | English |
---|---|
State | Published - 2023 |
Externally published | Yes |
Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda Duration: 1 May 2023 → 5 May 2023 |
Conference
Conference | 11th International Conference on Learning Representations, ICLR 2023 |
---|---|
Country/Territory | Rwanda |
City | Kigali |
Period | 1/05/23 → 5/05/23 |
Bibliographical note
Publisher Copyright:© 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language