Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

by Cheng-Yu Hsieh, et al.

Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of each tool's usage. Unfortunately, demonstrations are hard to acquire and can induce undesirably biased usage if the wrong ones are chosen. Even in the rare scenario that demonstrations are readily available, there is no principled protocol for selecting how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and quickly becomes intractable. Our work provides an alternative to demonstrations: tool documentation, i.e., descriptions of how each individual tool is used. We substantiate our claim through three main empirical findings on six tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation suffice to elicit proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations: zero-shot prompting with documentation significantly outperforms few-shot prompting without it. Third, we highlight the benefits of tool documentation by tackling image generation and video tracking using just-released, unseen state-of-the-art models as tools. Finally, we show that tool documentation can automatically enable new applications: using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.
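To make the contrast concrete, here is a minimal illustrative sketch (not the paper's code) of how a zero-shot prompt can expose tools to an LLM through documentation alone, with no demonstrations. The tool names, docstrings, and prompt wording below are hypothetical placeholders.

```python
# Hypothetical tool documentation: short usage descriptions, one per tool.
TOOL_DOCS = {
    "text_detection": "text_detection(image) -> list[str]: "
                      "returns all text strings found in the image.",
    "image_crop": "image_crop(image, box) -> image: "
                  "crops the image to the given bounding box.",
}

def zero_shot_prompt(task, docs):
    """Compose a prompt that exposes tools via documentation only,
    with no few-shot demonstrations of tool usage."""
    doc_block = "\n".join("- " + d for d in docs.values())
    return (
        "You can call the following tools. Their documentation:\n"
        + doc_block + "\n\n"
        + "Task: " + task + "\n"
        + "Write the sequence of tool calls that solves the task."
    )

prompt = zero_shot_prompt("Read the street sign in the photo.", TOOL_DOCS)
print(prompt)
```

A few-shot variant would instead append worked input/output examples of tool calls; the paper's finding is that, given documentation, such demonstrations are largely unnecessary.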




