LLMs hallucinate when removing patient data from EPRs, finds study

Editorial Team
4 Min Read


AI tools sometimes produce hallucinations when asked to remove personal patient information from electronic patient records (EPRs), a study has found.

Researchers from the University of Oxford evaluated the ability of large language models (LLMs) and purpose-built software tools to detect and remove patient names, dates, medical record numbers, and other identifiers from real-world records, without altering clinical content.

The study, published in iScience on 9 December 2025, found that smaller LLMs frequently over-redacted or produced hallucinatory content, in which erroneous text not present in the original record appeared, or occasionally introduced fabricated medical details.

“Hallucinations, particularly those that fabricate clinical information, pose a non-trivial risk to the integrity of downstream research.

“We suggest future research focusing on systematic, scalable strategies to detect and suppress hallucinations, especially in zero- and few-shot scenarios,” the study says.
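As a purely illustrative sketch of one such check (not taken from the study), a de-identification pipeline could flag any run of words in a model's output that never appears in the source note, on the basis that redaction should only remove or mask text, never add it. The placeholder tags below are assumptions.

```python
import re

# Illustrative redaction tags; the study's actual tag scheme is not described here.
PLACEHOLDER = re.compile(r"\[(NAME|DATE|MRN|PHONE|ID)\]")

def flag_possible_hallucinations(original: str, redacted: str, window: int = 4) -> list[str]:
    """Return word runs in the redacted output that never occur in the original note.

    A crude screen: since de-identification should only delete or mask text,
    any window of words absent from the source is a candidate hallucination.
    """
    source = " ".join(original.lower().split())
    words = PLACEHOLDER.sub(" ", redacted).split()
    flagged = []
    for i in range(max(len(words) - window + 1, 0)):
        chunk = " ".join(words[i:i + window]).lower()
        if chunk not in source:
            flagged.append(chunk)
    return flagged
```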

First, the researchers tested the ability of a human to anonymise the data by manually redacting 3,650 medical records, comparing and correcting the results until they had a complete set to use as a benchmark.

They then compared two task-specific de-identification software tools (Microsoft Azure and AnonCAT) and five general-purpose LLMs, including GPT-4, GPT-3.5, Llama-3, Phi-3, and Gemma, for redacting identifiable information.
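For context, here is a minimal sketch of how a general-purpose chat model such as GPT-4 can be prompted to perform this kind of redaction; the prompt wording and placeholder tags are illustrative assumptions, not the prompts used in the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def redact(note_text: str, model: str = "gpt-4") -> str:
    """Ask a general-purpose chat model to mask identifiers in a clinical note."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output is preferable for redaction
        messages=[
            {"role": "system",
             "content": ("Replace patient names, dates, medical record numbers, "
                         "phone numbers and other identifiers with tags such as "
                         "[NAME] or [DATE]. Do not change any clinical content.")},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content
```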

Dr Andrew Soltan, academic clinical lecturer in oncology at the University of Oxford and engineering research fellow, said: “While some large language models perform impressively, others can generate false or misleading text.

“This behaviour poses a risk in clinical contexts, and careful validation is essential before deployment.”

The researchers concluded that automating de-identification could significantly reduce the time and cost required to prepare clinical data for research, while maintaining patient privacy in compliance with data protection regulations.

Microsoft’s Azure de-identification service achieved the best performance overall, closely matching human reviewers. GPT-4 also performed strongly, demonstrating that modern language models can accurately remove identifiers with minimal fine-tuning or task-specific training.

Dr Soltan added: “One of our most promising findings was that we don’t need to retrain complex AI models from scratch.

“We found that some models worked well out of the box, and that others saw their performance nudged upwards with simple techniques.

“For the general-purpose models, this meant showing them just a handful of examples of what a correctly anonymised record looks like.

“For the specialised software, one model learned to pick up nuances in our hospital’s records, such as the format of telephone extensions, after fine-tuning on just a small sample.

“This is exciting because it shows a practical path for hospitals to adopt these technologies without manually labelling thousands of patient notes.”
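As an illustration of the few-shot approach Dr Soltan describes, a handful of worked anonymisation examples can be prepended to the prompt before the note to be redacted; the example record and tags below are invented for illustration and do not come from the study.

```python
# Hypothetical example pair standing in for the "handful of examples" described above.
FEW_SHOT_EXAMPLES = [
    ("Seen by Dr Jane Smith on 03/02/2024. Patient John Doe, MRN 1234567, ext. 4410.",
     "Seen by Dr [NAME] on [DATE]. Patient [NAME], MRN [MRN], ext. [PHONE]."),
]

def build_messages(note_text: str) -> list[dict]:
    """Assemble a chat prompt: instructions, worked examples, then the note to redact."""
    messages = [{"role": "system",
                 "content": "Replace identifying details with placeholder tags; "
                            "leave all clinical content unchanged."}]
    for raw, anonymised in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": raw})
        messages.append({"role": "assistant", "content": anonymised})
    messages.append({"role": "user", "content": note_text})
    return messages
```

The resulting messages can then be passed to a chat-completion call like the one sketched earlier.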

Professor David Eyre, professor of infectious diseases at Oxford Population Health and the Big Data Institute, said: “This work shows that AI can be a powerful ally in protecting patient confidentiality.

“But human judgement and strong governance must remain at the centre of any system that handles patient data.”

The study was supported by the National Institute for Health and Care Research (NIHR), Microsoft Research UK, Cancer Research UK, the EPSRC, and the NIHR Oxford Biomedical Research Centre.
