Mismatches Are the Product: Cross-Document Reconciliation with Dual Citations
Catch contradictions early by extracting canonical fields across documents, comparing them, and showing two clickable citations for every mismatch.
The highest-value thing your system can do is not “extract more fields.”
It’s this:
Flag the mismatch, and show me the two pieces of proof that disagree.
CiteLLM’s use case examples repeatedly point to the same reality: in high-stakes workflows, it’s the discrepancies (not the misses) that cost you money and time. Citations make discrepancy review fast because every value can be traced back to its source region.
This post shows how to implement cross-document reconciliation in a way reviewers actually like.
Step 1: Normalize your document packs into canonical fields
Pick a canonical schema that works across related documents.
Example: AP 3-way match (invoice, PO, receipt)
Canonical keys:
- po_number
- vendor_name
- line_items (or a simplified representation)
- subtotal_amount
- tax_amount
- total_amount
- currency
- invoice_date / delivery_date
CiteLLM supports schemas for extraction and can run against PDFs provided as base64-encoded content, by URL, or as uploaded document IDs.
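As a concrete starting point, here is one way to express those canonical keys as a schema in Python. The shape below is illustrative only; CiteLLM's actual schema format may differ, so treat it as a sketch.

# Canonical field schema shared by the invoice, PO, and receipt extractions.
# Illustrative shape only, not CiteLLM's exact schema format.
CANONICAL_SCHEMA = {
    "po_number": {"type": "string"},
    "vendor_name": {"type": "string"},
    "line_items": {"type": "array", "items": {"type": "object"}},
    "subtotal_amount": {"type": "number"},
    "tax_amount": {"type": "number"},
    "total_amount": {"type": "number"},
    "currency": {"type": "string"},       # normalized to ISO 4217 in Step 3
    "invoice_date": {"type": "string"},   # parsed to ISO 8601 in Step 3
    "delivery_date": {"type": "string"},
}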
Step 2: Extract each doc with the same canonical keys
Don’t create an “invoice schema” and a totally different “PO schema” unless you must.
Even if a field is missing from one document, keeping the keys consistent reduces complexity.
Conceptually, the extraction requests all target the same field:
- Invoice PDF → total_amount + citations
- PO PDF → total_amount + citations
- Receipt PDF → total_amount + citations
Each extracted field includes a citation object (page, bbox, snippet, confidence).
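A minimal sketch of that fan-out, assuming a hypothetical extract_fields() wrapper around whichever CiteLLM endpoint you call. The URL, auth header, and request/response shapes below are placeholders, not the real API; only the idea matters: one schema, three documents. CANONICAL_SCHEMA is the dict from the Step 1 sketch.

import requests

API_KEY = "your-api-key"                                   # placeholder
INVOICE_URL = "https://example.com/docs/invoice_123.pdf"   # placeholder
PO_URL = "https://example.com/docs/po_456.pdf"             # placeholder
RECEIPT_URL = "https://example.com/docs/receipt_789.pdf"   # placeholder

def extract_fields(doc_url: str, schema: dict) -> dict:
    # Hypothetical wrapper: check the CiteLLM docs for the real endpoint
    # path, auth scheme, and request body.
    resp = requests.post(
        "https://api.citellm.example/v1/extract",           # placeholder URL
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"document_url": doc_url, "schema": schema},
        timeout=60,
    )
    resp.raise_for_status()
    # Expected shape: {field_name: {"value": ..., "citation": {...}}}
    return resp.json()

# Same canonical keys for every document in the pack.
pack = {
    "invoice": extract_fields(INVOICE_URL, CANONICAL_SCHEMA),
    "po": extract_fields(PO_URL, CANONICAL_SCHEMA),
    "receipt": extract_fields(RECEIPT_URL, CANONICAL_SCHEMA),
}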
Step 3: Normalize before comparing (so you don’t create fake mismatches)
Normalization steps that prevent “false exceptions”:
- currency normalization (symbols → ISO codes)
- decimal rounding policy
- date parsing into ISO
- whitespace/casing normalization for IDs
- unit normalization (“EA” vs “Each”)
Do this in your application layer after extraction.
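A few application-layer normalizers along these lines do most of the work. The symbol, format, and unit maps below are deliberately tiny; real ones grow with your document population.

from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}   # extend as needed
UNIT_ALIASES = {"ea": "EA", "each": "EA", "pc": "EA"}      # extend as needed

def normalize_currency(raw: str) -> str:
    raw = raw.strip()
    return CURRENCY_SYMBOLS.get(raw, raw.upper())

def normalize_amount(raw) -> Decimal:
    # Strip symbols and thousands separators, then apply one rounding policy everywhere.
    cleaned = str(raw).replace(",", "").strip("$€£ ")
    return Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def normalize_date(raw: str, fmts=("%m/%d/%Y", "%d-%b-%Y", "%Y-%m-%d")) -> str:
    for fmt in fmts:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

def normalize_id(raw: str) -> str:
    # Whitespace/casing normalization for IDs like PO numbers.
    return "".join(raw.split()).upper()

def normalize_unit(raw: str) -> str:
    return UNIT_ALIASES.get(raw.strip().lower(), raw.strip().upper())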
Step 4: Compare and classify outcomes explicitly
Your compare step should return one of:
- match (within tolerance)
- mismatch (requires human decision)
- insufficient evidence (low confidence / missing field)
- suspicious (fraud or policy trigger)
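One way to encode those four outcomes, with the tolerance, confidence threshold, and "suspicious" trigger all as placeholders you would tune per policy:

from decimal import Decimal

def classify(invoice_field: dict, po_field: dict,
             tolerance: Decimal = Decimal("25.00"),
             min_confidence: float = 0.80) -> str:
    # Missing value or low-confidence extraction on either side:
    # not enough evidence to call it a mismatch yet.
    for field in (invoice_field, po_field):
        if field is None or field.get("value") is None:
            return "insufficient_evidence"
        if field["citation"]["confidence"] < min_confidence:
            return "insufficient_evidence"

    delta = abs(Decimal(str(invoice_field["value"])) - Decimal(str(po_field["value"])))
    if delta <= tolerance:
        return "match"
    # Example policy trigger only: an outsized gap goes to a fraud/policy queue.
    if delta > tolerance * 20:
        return "suspicious"
    return "mismatch"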
Step 5: For every mismatch, show dual citations
This is the “aha” moment for reviewers.
Instead of:
“Mismatch detected”
Show:
- invoice value + citation (page/bbox/snippet)
- PO/receipt value + citation (page/bbox/snippet)
- the delta (what differs)
- a recommended action (accept invoice / accept PO / partial receipt / escalate)
A mismatch object you can store and display:
{
"mismatch_type": "total_amount",
"invoice": {
"value": 4250.00,
"citation": { "page": 1, "bbox": [300, 245, 420, 270], "snippet": "Total: $4,250.00", "confidence": 0.95 }
},
"po": {
"value": 4100.00,
"citation": { "page": 2, "bbox": [280, 510, 420, 535], "snippet": "Total Amount: $4,100.00", "confidence": 0.93 }
},
"delta": 150.00,
"policy": {
"tolerance": 25.00,
"decision_required": true
}
}
This is exactly what citations unlock: fast, defensible, side-by-side verification.
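If it helps, a small builder can assemble that record from the two extracted fields. The key names mirror the JSON above; the delta and decision logic are just an example.

def build_mismatch(field_name: str, invoice_field: dict, po_field: dict,
                   tolerance: float) -> dict:
    delta = round(invoice_field["value"] - po_field["value"], 2)
    return {
        "mismatch_type": field_name,
        "invoice": {"value": invoice_field["value"],
                    "citation": invoice_field["citation"]},
        "po": {"value": po_field["value"],
               "citation": po_field["citation"]},
        "delta": delta,
        "policy": {"tolerance": tolerance,
                   "decision_required": abs(delta) > tolerance},
    }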
Cross-document reconciliation patterns that deliver real ROI
You can apply the same playbook across industries:
- Underwriting: tax return vs bank deposits. Flag income inconsistencies and show both citations.
- Diligence: deck metrics vs audited financials. Catch inflated KPIs by comparing values and surfacing citations to both sources.
- Contracts: amendment language vs original agreement. Detect term drift and force a reviewer decision with both citations shown.
Use confidence to prioritize (not to hide problems)
Confidence helps you triage mismatches:
- if one side is low confidence, route to “insufficient evidence”
- if both sides are high confidence but disagree, route to “real mismatch”
CiteLLM provides confidence scores per field and documents how to interpret ranges.
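In the classify() sketch from Step 4, this triage is the confidence gate. Pulled out on its own, the routing rule is short (the 0.80 threshold is a placeholder, not CiteLLM guidance):

def route_by_confidence(invoice_field: dict, po_field: dict,
                        min_confidence: float = 0.80) -> str:
    lowest = min(invoice_field["citation"]["confidence"],
                 po_field["citation"]["confidence"])
    if lowest < min_confidence:
        return "insufficient_evidence"   # re-extract or ask a human to read the source
    return "compare"                     # both sides are trustworthy, so a disagreement is real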
What to measure
Cross-doc reconciliation success shows up as:
- % mismatches resolved within SLA
- median time-to-resolution per mismatch
- reviewer minutes saved vs baseline
- $ recovered / prevented (credits, avoided overpayments, prevented underwriting errors)
- repeat mismatch rate by vendor/template
Takeaway
The best document AI systems don’t brag about extraction coverage.
They surface contradictions early—and make resolving them faster than arguing about them.
Dual citations turn mismatches into a one-click decision instead of a 20-minute PDF scavenger hunt.