Why LLM Citations Are Often Wrong: Correctness vs Faithfulness
Citations can look right and still be misleading. Learn the difference between correctness and faithfulness, why post-rationalized citations happen, and how to test for them.
People want “LLM answers with citations” because citations feel like truth.
But there’s a trap:
A citation can be correct (the source contains supporting text) and still be unfaithful (the model didn’t actually rely on it).
A 2025 paper on RAG attributions calls this out explicitly and reports substantial post-rationalization—up to 57% of citations lacking faithfulness in their experiments.
If you’re building any citation-based trust layer, you need this distinction.
The two citation problems you actually have
Problem A: Citation correctness
Does the cited source support the claim?
This is the normal check: “Is the text there? Does it entail the statement?”
Problem B: Citation faithfulness
Did the model use the cited source to produce the claim—or did it answer from memory and then attach a source after the fact?
The faithfulness problem matters because it creates a false sense of security: users trust a citation that didn’t causally contribute.
Why post-rationalized citations happen in practice
Common failure patterns:
- the model answers from parametric memory (what it “knows”)
- it retrieves sources that “seem relevant”
- it picks something that looks supportive enough
- it cites it—sometimes loosely or incorrectly
Now you have citations that look scholarly but don’t reflect how the answer was produced.
This isn’t hypothetical. In health-related Q&A, a large evaluation found that many responses were not fully supported by their cited sources, and some were even contradicted by them, highlighting how fragile “citation trust” can be.
5 tests you can run to catch unfaithful citations
You don’t need perfect research-grade evaluation to improve reliability. You need cheap tests that expose post-rationalization.
- Counterfactual swap test. If you change the cited source content (or swap to an alternative) in a way that should change the answer, does the answer change?
- Drop-the-citation test. Remove the cited document from context. If the answer remains identical, the citation may not be doing real work (see the sketch after this list).
- Adversarial distraction test. Add irrelevant-but-plausible documents. A faithfulness-aware system should ignore them and keep citations stable.
- Statement-level coverage test. Break the answer into statements and require each statement to be supported by at least one citation.
- Post-processing citation correction. Cross-check generated citations against retrieved material and reassign citations using matching/NLI-like methods.
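As a concrete starting point, here is a minimal sketch of the drop-the-citation test. It assumes a hypothetical `generate_answer(question, docs)` callable that returns the answer text plus the IDs of the documents it cited; adapt it to whatever your RAG pipeline actually exposes.

```python
# Minimal sketch of the drop-the-citation test.
# `generate_answer(question, docs)` is a hypothetical hook into your pipeline:
# it takes a question and a list of {"id": ..., "text": ...} docs and returns
# (answer_text, cited_doc_ids).

def drop_citation_test(generate_answer, question, docs):
    """Re-run the question with each cited document removed.

    If the answer is unchanged after removing a cited doc, that citation
    probably did no causal work, which is a post-rationalization signal.
    """
    baseline_answer, cited_ids = generate_answer(question, docs)
    suspicious = []
    for doc_id in cited_ids:
        reduced = [d for d in docs if d["id"] != doc_id]
        answer_without, _ = generate_answer(question, reduced)
        if answer_without.strip() == baseline_answer.strip():
            suspicious.append(doc_id)  # citation may be decorative
    return suspicious
```

The counterfactual swap test is the same harness with one change: instead of removing the cited document, you edit its content in a way that should alter the answer, and check whether the answer actually moves.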
What to do about it: three practical strategies
Strategy 1: Make citations granular and anchored
If your output is grounded in PDFs, “URL citations” are too coarse. Users need:
- page number
- precise region
- source snippet
CiteLLM returns this kind of citation object for each extracted field (page, bbox, snippet, confidence).
That doesn’t fully solve faithfulness for open-ended Q&A—but it does drastically improve verifiability for document extraction, because the citation is tied to the specific location where the value appears.
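As an illustration of what “granular and anchored” means in practice (an illustrative shape, not CiteLLM’s actual schema), an anchored citation object might look like this:

```python
from dataclasses import dataclass

@dataclass
class Citation:
    """An anchored citation for a single extracted field (illustrative shape)."""
    field: str          # which extracted value this citation supports
    page: int           # 1-based page number in the source PDF
    bbox: tuple[float, float, float, float]  # region on the page (x0, y0, x1, y1)
    snippet: str        # the exact source text the value was read from
    confidence: float   # 0.0-1.0 score for how well the snippet supports the value

# Example: the citation behind an extracted invoice total
invoice_total = Citation(
    field="total_amount",
    page=3,
    bbox=(72.0, 540.5, 210.0, 556.0),
    snippet="Total due: $4,870.00",
    confidence=0.93,
)
```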
Strategy 2: Verify with a second model (or NLI)
Several citation-verification approaches pair claim extraction with NLI-style entailment checks to test whether the cited evidence actually supports each claim.
You can implement a lightweight version:
- split answer into claims
- for each claim, verify support using retrieved text
- if unsupported → downgrade confidence, ask for review, or abstain
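Here is a minimal sketch of that loop, assuming the Hugging Face `transformers` library and an off-the-shelf NLI cross-encoder; the model name and the 0.7 threshold are illustrative choices, and you would swap in whatever claim splitter and verifier you already trust.

```python
# Claim-level verification with an off-the-shelf NLI model (illustrative).
from transformers import pipeline

nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")

def verify_claims(claims: list[str], evidence: str, threshold: float = 0.7) -> list[dict]:
    """Score each claim against the retrieved evidence and flag weak support."""
    results = []
    for claim in claims:
        # Premise = retrieved evidence, hypothesis = the claim being checked.
        scores = nli({"text": evidence, "text_pair": claim}, top_k=None)
        entail = next(s["score"] for s in scores if s["label"].lower() == "entailment")
        results.append({
            "claim": claim,
            "entailment": entail,
            # Unsupported claims get downgraded: review or abstain downstream.
            "verdict": "supported" if entail >= threshold else "needs_review",
        })
    return results
```

Run this as a post-generation gate: anything marked `needs_review` gets lower displayed confidence, a human-review flag, or an abstention instead of shipping with a confident-looking citation.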
Strategy 3: Build UI that encourages skepticism
Instead of “Sources: [1][2][3]” at the bottom, give users:
- inline citations
- a “view source” panel
- highlighted snippets
- a fast “Flag this claim” action
This turns your citations into a workflow, not decoration.
The key insight
Citations are not the end goal.
They’re a tool to make verification fast—if they are correct, comprehensive, and faithful enough to deserve trust.
If you design your product assuming citations are always reliable, your users will eventually prove you wrong.