CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models

by Changan Niu, et al.

Despite recent advances showing that models pre-trained on large-scale source code data gain appreciable generalization capability, such models still require a sizeable amount of data on the target task for fine-tuning. Moreover, the effectiveness of this generalization is largely affected by the size and quality of the fine-tuning data, which is detrimental for target tasks with limited or unavailable resources. Cross-task generalization, which aims to improve a model's ability to handle tasks it has never seen, is therefore of strong research and application value. In this paper, we propose a large-scale benchmark that includes 216 existing code-related tasks. We annotate each task with corresponding meta information, such as a task description and an instruction, which provide detailed information about the task and a solution guide. This annotation also lets us easily create a wide variety of "training/evaluation" task splits to evaluate different cross-task generalization capabilities of a model. We then perform preliminary experiments demonstrating that cross-task generalization can be largely improved by in-context learning methods such as few-shot learning and learning from task instructions, which shows the promising prospects of conducting cross-task learning research on our benchmark. We hope that the collected datasets and our benchmark will facilitate future work that is not limited to cross-task generalization.
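The abstract mentions evaluating models via few-shot learning and learning from task instructions. The sketch below shows one plausible way such an in-context prompt could be assembled from a task's meta information; it is purely illustrative, and the field names and prompt layout are assumptions, not the benchmark's actual schema or the authors' implementation.

```python
# Hypothetical sketch: build an in-context prompt from a task instruction
# plus k solved examples (few-shot), then append the unseen query.
# Field names and formatting here are illustrative assumptions.

def build_prompt(instruction, examples, query):
    """Combine a task instruction, few-shot examples, and a new input
    into a single prompt string for an in-context learning evaluation."""
    parts = [f"Task instruction: {instruction}", ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Given a code snippet, summarize what it does in one sentence.",
    examples=[("def add(a, b): return a + b", "Adds two numbers.")],
    query="def mul(a, b): return a * b",
)
print(prompt)
```

In the zero-shot "learning from task instructions" setting, the same function would be called with an empty `examples` list, leaving only the instruction and the query.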




OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Recent work has shown that fine-tuning large pre-trained language models...

ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization

Recent advances on large-scale pre-training have shown great potentials ...

Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning?

Prompt tuning (PT) which only tunes the embeddings of an additional sequ...

MetaXT: Meta Cross-Task Transfer between Disparate Label Spaces

Albeit the universal representational power of pre-trained language mode...

Impossible Triangle: What's Next for Pre-trained Language Models?

Recent development of large-scale pre-trained language models (PLM) have...

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

Few-shot learning (FSL) via customization of a deep learning network wit...

Improving Baselines in the Wild

We share our experience with the recently released WILDS benchmark, a co...
