Sunday May 25, 2025

Computation and Language - Veracity Bias and Beyond Uncovering LLMs’ Hidden Beliefs in Problem-Solving Reasoning

Hey PaperLedge crew, Ernis here, ready to dive into some fascinating – and frankly, a little unsettling – research about AI. Today, we're unpacking a study that looks at how Large Language Models, or LLMs – think of them as super-smart chatbots – handle demographics and solution accuracy.

Now, these LLMs are supposed to be unbiased. They're programmed to avoid stereotypes. But, as this paper reveals, things aren't quite that simple. The researchers found that LLMs exhibit some pretty concerning biases when it comes to judging whether a solution is correct based on who they think wrote it.

Think of it like this: imagine you're a teacher grading papers. You shouldn't be influenced by the student's name or background, right? You should focus solely on the quality of the work. Well, this study suggests that LLMs aren't always doing that.

The researchers identified two main types of bias:

Attribution Bias: This is where the LLM is more likely to say a correct answer came from a certain demographic group, even if it didn't. It's like assuming the math whiz in class is always going to be that kid.
Evaluation Bias: This is even trickier. Here, the LLM might actually grade the same answer differently depending on who it thinks wrote it. So, a solution attributed to one group might get a better grade than the exact same solution attributed to another.

The researchers tested this across different problem types – math, coding, commonsense reasoning, and even writing – and used several different LLMs that are specifically designed to align with human values. The results? Pretty consistent biases across the board.

For example, in math and coding problems, LLMs were less likely to attribute correct solutions to African-American groups and more likely to say their solutions were incorrect. On the flip side, when it came to evaluating writing, LLMs seemed to have a bias against solutions they thought were written by Asian authors.

"Our results show pervasive biases: LLMs consistently attribute fewer correct solutions and more incorrect ones to African-American groups in math and coding, while Asian authorships are least preferred in writing evaluation."

But it gets even weirder. In another part of the study, the researchers asked the LLMs to generate code that visualized demographic groups. Shockingly, the LLMs automatically assigned racially stereotypical colors to these groups! This suggests that these biases aren't just surface-level; they're deeply embedded in the models' internal reasoning.

So, why does this matter? Well, think about how LLMs are increasingly being used in education – for tutoring, grading, and even providing feedback. If these systems are biased, they could perpetuate existing inequalities and disadvantage certain groups of students. This also applies to other evaluation settings, like job applications that use AI to screen candidates.

This research really highlights the need for careful scrutiny and ongoing monitoring of AI systems to ensure they're fair and equitable. We can't just assume that because these models are programmed to be unbiased, they actually are.

Here are a couple of things I'm wondering about:

Could these biases be amplified if the training data used to build these LLMs reflects existing societal biases?
What are some concrete steps we can take to mitigate these biases and ensure that AI is used in a way that promotes fairness and opportunity for everyone?

Really interesting stuff, crew. I'd love to hear your thoughts. What do you make of these findings, and what do you think we should be doing about it? Let's discuss!

Credit to Paper authors: Yue Zhou, Barbara Di Eugenio

Comment (0)

No comments yet. Be the first to say something!