Google’s new AI referee Cappy brings out the best in large language models



Google Research has developed a new method called Cappy to improve the performance and efficiency of Large Language Models (LLMs). The lightweight, pre-trained scorer, with only 360 million parameters, allows LLMs to be adapted to specific tasks without fine-tuning.

Cappy is a scoring system that evaluates the quality of the answers generated by LLMs with comparatively little computational effort, and can therefore improve their performance, as Google engineers Yun Zhu and Lijuan Liu explain in a blog post.

The new method makes it possible to tailor LLMs to specific tasks without having to fine-tune model parameters. According to Google, this saves memory and processing power.

Cappy works as a kind of referee: it evaluates how well an LLM’s answers match a specific question. Cappy assigns values between 0 and 1: the higher the value, the better the answer.



For example, if a user asks “What was the impact of the Industrial Revolution on society?” and the LLM generates several answers, Cappy can highlight the answer that covers the most important aspects, such as urbanization, the emergence of the working class, and social upheaval, and display it to the user.

Less relevant answers with low scores are hidden. In this way, Cappy ensures that the LLM provides the most accurate, relevant, and high-quality answers possible.

Image: Zhu & Liu, Google Research

Cappy works either standalone in classification tasks or as an auxiliary component in multi-task LLMs to improve their performance.

Lean “scorer” optimizes LLM performance

To make these evaluations, Cappy is first trained on a large number of question-answer pairs. This way, the system learns to distinguish between good and bad answers. The existing language model RoBERTa serves as the basis for this pre-training process.

Cappy also works with LLMs that can only be used via APIs. In contrast to in-context learning approaches, where the information is provided directly in the prompt, Cappy is not limited by the input length and can include any number of training examples.


Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top