Large language models (LLMs) used by English local authorities to support social workers may be introducing gender bias into care decisions, according to research from the London School of Economics and Political Science (LSE).
The study, published in the journal BMC Medical Informatics and Decision Making on 11 August 2025, found that Google's widely used AI model 'Gemma' downplays women's physical and mental health issues compared with men's when used to generate and summarise case notes.
Terms associated with significant health concerns, such as "disabled," "unable," and "complex," appeared significantly more often in descriptions of men than of women.
Similar care needs among women were more likely to be omitted or described in less serious terms.
Dr Sam Rickman, lead author of the report, said: "If social workers are relying on biased AI-generated summaries that systematically downplay women's health needs, they may assess otherwise identical cases differently based on gender rather than actual need.
"Since access to social care is determined by perceived need, this could result in unequal care provision for women."
The study is the first to quantitatively measure gender bias in LLM-generated case notes from real-world care records, using both state-of-the-art and benchmark models, offering an evidence-based evaluation of the risks of AI in social care.
LLMs are increasingly being used to ease the administrative workload of social workers and the public sector, but it remains unclear which specific models are being deployed by councils and whether they may be introducing bias.
To investigate potential gender bias, Dr Rickman used large language models to generate 29,616 pairs of summaries based on real case notes from 617 adult social care users.
Each pair described the same individual, with only the gender swapped, allowing for a direct comparison of how male and female cases were treated by the AI.
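The gender-swap design can be illustrated with a short sketch. This is not the study's published code: the swap_gender helper, the term list, and the summarise placeholder (which in practice would call an LLM such as Gemma) are assumptions made for illustration. It simply shows how counterfactual male/female versions of the same note can be built, and how concern-related terms can then be counted in the resulting summaries.

```python
import re
from collections import Counter

# Hypothetical gender-swap table; the study's actual preprocessing is not
# described in this article, so these mappings are illustrative assumptions.
SWAPS = {"he": "she", "him": "her", "his": "her", "mr": "ms", "man": "woman"}
SWAP_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def swap_gender(note: str) -> str:
    """Return a copy of a case note with male terms replaced by female ones."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return SWAP_PATTERN.sub(repl, note)


def concern_term_counts(summaries: list[str], terms: tuple[str, ...]) -> Counter:
    """Count how often concern-related terms appear across a set of summaries."""
    counts: Counter = Counter()
    for summary in summaries:
        lowered = summary.lower()
        for term in terms:
            counts[term] += lowered.count(term)
    return counts


def summarise(note: str) -> str:
    """Placeholder for the LLM summarisation step (e.g. a call to a model)."""
    return note  # identity stand-in so the sketch runs without a model


if __name__ == "__main__":
    notes = ["Mr Smith is unable to manage his complex medication needs."]
    male_summaries = [summarise(n) for n in notes]
    female_summaries = [summarise(swap_gender(n)) for n in notes]

    terms = ("disabled", "unable", "complex")
    print("male:  ", concern_term_counts(male_summaries, terms))
    print("female:", concern_term_counts(female_summaries, terms))
```

Comparing term counts across the paired summaries is what allows any difference to be attributed to gender alone, since everything else about the underlying case note is held constant.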
The analysis revealed statistically significant gender differences in how physical and mental health issues were described.
Among the models tested, Google's AI model, Gemma, exhibited more pronounced gender-based disparities than benchmark models developed by either Google or Meta in 2019.
Meta's Llama 3 model, which is of the same generation as Google's Gemma, did not use different language based on gender.
Dr Rickman said: "Large language models are already being used in the public sector, but their use must not come at the expense of fairness.
"While my research highlights issues with one model, more are being deployed all the time, making it essential that all AI systems are transparent, rigorously tested for bias and subject to robust legal oversight."
The research, carried out by LSE's Care Policy and Evaluation Centre, was funded by the National Institute for Health and Care Research.
Google said that its teams will examine the findings of the report.
Meanwhile, OpenAI has announced changes to the way that ChatGPT interacts with users, following research which found that LLMs can introduce biases and failures that are harmful to mental health.