LLMs are biased and don’t match human preferences when evaluating text, study finds


Large language models show cognitive biases and do not align with human preferences when evaluating text, according to a study.

Understanding bias in large language models (LLMs) is important because they are increasingly used in real-world applications, from recommending content to scoring job applications. When these models are biased, they can make decisions or predictions that are unfair or inaccurate.

Suppose an AI system is used to score job applications. The system uses a large language model to evaluate the quality of the cover letter. But if that model has an inherent bias, such as favoring longer text or certain keywords, it could unfairly favor some applicants over others, even if they are not necessarily more qualified.

Cognitive biases in LLMs

Researchers at the University of Minnesota and Grammarly have now conducted a study to measure cognitive biases in large language models (LLMs) when used to automatically evaluate text quality.



The research team assembled 15 LLMs from four different size ranges and analyzed their responses. The models were asked to evaluate the responses of other LLMs, e.g. “System Star is better than System Square”.

For this purpose, the researchers introduced the “COgnitive Bias Benchmark for LLMs as EvaluatoRs” (COBBLER), a benchmark for measuring six different cognitive biases in LLM evaluations.

They used 50 question-answer examples from the BIGBENCH and ELI5 datasets, generated responses from each LLM, and asked models to evaluate their own responses and the responses of other models.

Examples of measured biases include egocentric bias, where a model favors its results when scoring, and order bias, where a model favors an option based on its order. See the table below for a complete list of measured biases.

The biases investigated in the study. | Image: Koo et al.

The study shows that LLMs are biased when judging text quality. The researchers also examined the correlation between human and machine preferences and found that machine preferences do not closely match human preferences (rank bias overlap: 49.6%).


“Benchmarking Cognitive Biases in Large Language Models as Evaluators”.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top