Guide · 8 min read

Checklist: Signs a Document Was Generated or Altered by AI

12 concrete signals — metadata, text patterns, visual cues, and cross-checks — to identify AI-generated or AI-altered documents. Practical guide for compliance and KYC teams.

CheckFile Team

A document generated or altered by AI carries characteristic traces across four layers: file metadata, text structure, visual consistency, and verifiable data accuracy. This checklist covers 12 of the most reliable signals, grouped into four levels from the quickest to check to the hardest to falsify, to help compliance, KYC, and credit teams filter suspicious submissions before any lending or onboarding decision is made.

According to the ACFE 2024 Report to the Nations, only 37% of document fraud cases are detected manually, with an average detection lag of 87 days. Generative AI tools have reduced the time required to produce a convincing forged document to minutes — making systematic verification non-negotiable.

Level 1 — File Metadata: 90-Second Verification

Metadata is the first layer to inspect, because most AI generators either omit it or populate it in inconsistent ways.

In a genuine document issued by an official body, metadata reflects the production chain — institutional software, print queue, digital certificate. An AI-generated document typically shows a consumer tool in the Producer or Creator field: ChatGPT PDF Export, Canva, PDFCreator, or a Python library (reportlab, fpdf). This pattern is documented in the ENISA Threat Landscape 2024.

Fields to check systematically:

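  • Creator / Producer: should match the software the issuer is expected to use (e.g., Microsoft Word for an employment contract, SAP for a large company's payslip)
  • CreationDate vs ModDate: a gap of only a few seconds is suspicious for documents that are normally drafted and revised, such as contracts; note that system-generated documents such as bank statements often carry identical dates legitimately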
  • Author: often blank or filled with a generic identifier in fabricated documents
  • XMP metadata: entirely absent in documents produced by low-end generation tools
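The field checks above are easy to script once the metadata has been extracted; with the pypdf library, for example, `PdfReader(path).metadata` exposes the document information dictionary with keys such as `/Producer` and `/CreationDate`. The Python below is a minimal sketch that works on such a dictionary. The consumer-tool markers, the expected-producer list, and the 60-second threshold are illustrative assumptions to tune for your own document mix, and XMP metadata is not covered.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# Consumer tools named in this checklist; an illustrative list, extend as needed.
CONSUMER_TOOL_MARKERS = ("chatgpt", "canva", "pdfcreator", "reportlab", "fpdf")

# PDF dates look like "D:20240315101500+01'00'"; every part after the year is optional.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF date string into a timezone-aware datetime, or return None."""
    m = _PDF_DATE.match(str(value).strip())
    if not m:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    offset = timedelta(0)
    if sign in ("+", "-"):
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        if sign == "-":
            offset = -offset
    try:
        return datetime(int(year), int(month or 1), int(day or 1),
                        int(hour or 0), int(minute or 0), int(second or 0),
                        tzinfo=timezone(offset))
    except ValueError:  # e.g. month 13: treat as unparseable
        return None


def metadata_flags(info: dict, expected_producers: Sequence[str],
                   min_edit_gap_s: float = 60.0) -> list:
    """Return heuristic flags for a PDF Info dictionary (keys such as '/Producer')."""
    flags = []
    tools = f"{info.get('/Producer', '')} {info.get('/Creator', '')}".lower()
    if any(marker in tools for marker in CONSUMER_TOOL_MARKERS):
        flags.append("consumer_tool")
    elif expected_producers and not any(p.lower() in tools for p in expected_producers):
        flags.append("unexpected_producer")
    created = parse_pdf_date(info.get("/CreationDate", ""))
    modified = parse_pdf_date(info.get("/ModDate", ""))
    if created is None or modified is None:
        flags.append("missing_dates")
    elif abs((modified - created).total_seconds()) < min_edit_gap_s:
        flags.append("no_edit_history")
    if not str(info.get("/Author", "")).strip():
        flags.append("blank_author")
    return flags
```

A result such as `['consumer_tool', 'no_edit_history']` is a reason to escalate the document, not a verdict on its own.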

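For attached photographs (KYC selfies, utility bills with embedded photos), missing EXIF data (device model, GPS, timestamp) may indicate a digitally generated image or one re-exported after editing. Many messaging apps and upload platforms also strip EXIF, so its absence is a prompt for further checks rather than proof on its own.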
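Checking whether EXIF is present does not require a full imaging stack. The sketch below walks the marker segments of a JPEG file and reports whether an APP1 segment tagged `Exif` appears before the image data; it assumes a baseline JPEG layout and does not decode the individual tags. In production, a library such as Pillow (`Image.getexif()`) reads the tags themselves. Because EXIF is so often stripped in transit, a `False` here is one weak signal among several.

```python
import struct


def has_exif_segment(jpeg: bytes) -> bool:
    """Return True if a JPEG byte string carries an APP1 'Exif' segment.

    Walks the marker segments that precede the compressed image data; it does
    not decode the EXIF tags themselves (device model, GPS, timestamp).
    """
    if jpeg[:2] != b"\xff\xd8":  # SOI (start of image) marker
        raise ValueError("not a JPEG: missing start-of-image marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return False  # unexpected layout: treat as no EXIF found
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments all come before this
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```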

Level 2 — Text Anomalies Specific to LLMs

Language models such as GPT-4o or Gemini produce statistically over-uniform text: no typos, no manual corrections, no stylistic variation between paragraphs. This signal is invisible on a first read but becomes clear when analysing multiple fields of the same document.

Signs to look for:

  • Uniform lexical density: a genuine payslip contains sector abbreviations, collective agreement codes, and non-standardised job titles. A fake presents "clean" text without authentic jargon.
  • LLM transition phrases: "It is worth noting that", "Furthermore", "In this regard" — constructions over-represented in generative text compared to genuine official documents.
  • Suspiciously regular reference numbers: contract, invoice, or VAT numbers generated at random often pass a format check but fail check-digit validation (UK VAT numbers: weighted modulo 97; IBAN: modulo 97).
  • Surface-coherent but impossible dates: a contract signed "15 March 2024" referencing a collective agreement version dated 2025.
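The IBAN control key is a good example of a check that looks trivial on paper but is impractical by eye. The sketch below implements the ISO 7064 MOD 97-10 test used for IBANs: move the first four characters to the end, replace letters with numbers (A = 10 … Z = 35), and require the result modulo 97 to equal 1. It validates structure only; country-specific lengths and whether the account exists are left out.

```python
import re


def iban_checksum_ok(iban: str) -> bool:
    """IBAN check: MOD 97-10 over the rearranged, letter-expanded string.

    A pass only means the IBAN is internally consistent; it says nothing
    about whether the account exists or belongs to the named holder.
    """
    s = re.sub(r"\s+", "", iban).upper()
    # Country code, two check digits, then 11 to 30 alphanumeric characters.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1
```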

Users on compliance forums frequently ask: "How do I tell if a payslip was made in ChatGPT?" The answer almost always involves checking whether the National Insurance number follows HMRC's format, whether the employer's PAYE reference is real, and whether the tax code and deductions are consistent with HMRC rates and thresholds for the stated tax year. These checks cannot be done reliably by eye.
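The National Insurance number format check is the simplest of these to automate. The regular expression below encodes the prefix and suffix rules as commonly documented from HMRC's National Insurance Manual (no D, F, I, Q, U or V in the prefix, no O as the second letter, prefixes BG, GB, KN, NK, NT, TN and ZZ never allocated, suffix A to D); treat that rule set as an assumption to confirm against current HMRC guidance. A well-formed number may still be fabricated or belong to someone else.

```python
import re

# Assumed HMRC format rules for a National Insurance number (see lead-in).
_NINO = re.compile(
    r"(?!BG|GB|KN|NK|NT|TN|ZZ)"  # prefixes that are never allocated
    r"[A-CEGHJ-PR-TW-Z]"         # first letter: not D, F, I, Q, U, V
    r"[A-CEGHJ-NPR-TW-Z]"        # second letter: additionally not O
    r"\d{6}"                     # six digits
    r"[A-D]"                     # suffix letter
)


def nino_format_ok(value: str) -> bool:
    """Check the shape of a UK NI number, not whether it was ever issued."""
    return _NINO.fullmatch(re.sub(r"\s+", "", value).upper()) is not None
```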

Level 3 — Visual and Graphical Signals

Image generation tools (Midjourney, DALL·E, Stable Diffusion) and automated layout engines leave characteristic traces.

Article 50 of the EU AI Act (Regulation (EU) 2024/1689) requires providers of generative AI systems to mark synthetic content as artificially generated, but the obligation applies only from 2 August 2026 and does not reach documents already in circulation.

Visual checkpoints:

  • Perfect alignment: printed and scanned documents usually show a slight rotation (0.5°–2°). A document submitted as a scan or photo that is perfectly straight, with no perspective distortion, was probably produced digitally.
  • Inconsistent resolution and compression: high-resolution logos on a form whose body text is blurred, or vice versa.
  • Stamps and signatures: a genuine official stamp shows ink irregularities and slight distortion. An AI-generated stamp is often a perfect circle with a perfectly centred typeface.
  • Absence of paper texture: photos of real documents show paper grain, reflections, and drop shadows. AI-generated documents are typically uniformly flat.
  • Identity document photos: skin too smooth, excessive facial symmetry, hair edges too sharp — hallmarks of a deepfake image. See our article on detecting deepfakes in identity documents.
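The slight rotation of a genuine scan can be measured rather than eyeballed. The sketch below uses a projection-profile method, a standard deskewing technique rather than anything specific to AI detection: for each candidate angle it projects dark pixels onto the perpendicular axis and keeps the angle that gives the sharpest line profile. It assumes the (x, y) coordinates of dark pixels have already been extracted from a binarised image; the ±3° range and 0.1° step are illustrative. A document presented as a scan that comes out at exactly 0° deserves a closer look, although genuine digital-native PDFs are also perfectly straight.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def estimate_skew_deg(points: Iterable[Tuple[float, float]],
                      max_deg: float = 3.0, step_deg: float = 0.1) -> float:
    """Projection-profile skew estimate from dark-pixel (x, y) coordinates.

    For each candidate angle, points are projected onto the axis perpendicular
    to that direction. Text lines collapse into sharp peaks when the angle
    matches the page rotation, which maximises the sum of squared bin counts.
    """
    pts = list(points)
    best_angle, best_score = 0.0, -1
    n = int(round(max_deg / step_deg))
    for i in range(-n, n + 1):
        angle = i * step_deg
        t = math.radians(angle)
        s, c = math.sin(t), math.cos(t)
        profile = Counter(round(y * c - x * s) for x, y in pts)
        score = sum(v * v for v in profile.values())
        # On a tie, prefer the angle closest to zero.
        if score > best_score or (score == best_score and abs(angle) < abs(best_angle)):
            best_angle, best_score = angle, score
    return best_angle
```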

Ready to automate your checks?

Free pilot with your own documents. Results in 48h.

Request a free pilot

Level 4 — Cross-Data Inconsistencies

A document cannot be verified in isolation. Consistency between internal fields and verifiable external reality is the hardest test for a fraudster to pass.

Essential cross-checks:

  • Company registration number: verify against Companies House — the number must exist, match the named company, and be active at the document date
  • Address: verify against the Royal Mail PAF database — a non-existent postcode or one assigned to a different locality is a strong signal
  • VAT number: verify via the EU VIES system for European counterparties, or HMRC for UK VAT numbers
  • Sort code / account number: the sort code must correspond to an active UK clearing bank (check against SortCodes.co.uk or the Vocalink database), and the account number should pass the Vocalink modulus check for that sort code
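Before querying HMRC, a UK VAT number can be screened locally for typos and random digits. The sketch below applies the weighted modulus-97 check commonly documented for 9-digit UK VAT numbers (weights 8 down to 2 on the first seven digits, plus the two-digit check number, with a +55 variant for numbers issued under the newer scheme). The weighting details are stated here as an assumption, and HMRC's VAT checker remains the authoritative source on whether a number is actually registered.

```python
import re


def uk_vat_checksum_ok(vat: str) -> bool:
    """Assumed weighted modulus-97 check on a UK VAT registration number.

    Weights 8..2 apply to the first seven digits; the last two digits are
    added as a number. The total must be divisible by 97 (older numbers) or
    by 97 after adding 55 (newer numbers). A pass does not mean it is live.
    """
    s = re.sub(r"[\s.\-]", "", vat).upper()
    if s.startswith("GB"):
        s = s[2:]
    if not re.fullmatch(r"\d{9}(\d{3})?", s):  # optional 3-digit branch suffix
        return False
    digits = [int(d) for d in s[:9]]
    total = sum(w * d for w, d in zip(range(8, 1, -1), digits[:7]))
    total += digits[7] * 10 + digits[8]
    return total % 97 == 0 or (total + 55) % 97 == 0
```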

Summary Table: Signs by Document Type

Document type | Priority signal | Recommended check
Payslip | Correct NI number format, accurate tax code for the period | Cross-reference with the employer's HMRC PAYE reference
Bank statement | Running balance consistent across months | Request statements directly from the bank
Supplier invoice | Valid company number, active VAT number | Companies House + HMRC VAT checker
Identity document | Correct typeface, coherent MRZ with valid check digits | ICAO Doc 9303 specifications and reference specimens
Utility bill | Real address, authentic supplier logo | Royal Mail PAF + visual logo check
Companies House extract | Filing history consistent with document date | Direct Companies House lookup
Degree / certificate | Verifiable certificate number, correct institution logo | Contact issuing institution
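The MRZ signal for identity documents can be tested mechanically: ICAO Doc 9303 protects the document number, dates, and other fields with check digits computed with repeating weights 7, 3, 1, where digits keep their value, letters map to 10–35, and the filler `<` counts as 0. A minimal sketch:

```python
def mrz_check_digit(field: str) -> int:
    """ICAO Doc 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":  # filler character
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10
```

On the ICAO specimen passport, document number `L898902C3` carries check digit 6 and date of birth `740812` carries 2. A fabricated MRZ that gets these wrong fails instantly, although a correct check digit says nothing about whether the document was ever issued.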

Systematic Verification Procedure

Compliance and KYC teams processing large document volumes, particularly under the UK Money Laundering Regulations 2017 (supervised by the FCA for financial firms) or, in the EU, the obligations arising from the upcoming AMLD6 transposition, cannot apply this full checklist manually to every document.

The recommended procedure follows a three-tier triage model:

  1. Automated filtering (Levels 1 + 4): metadata extraction and control-key validation via API — this step can process hundreds of documents per hour
  2. Assisted forensic analysis (Levels 2 + 3): targeted visual examination of documents flagged by the automated filter
  3. Enhanced human review: for high-risk files, direct verification with the issuing authority
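The triage logic itself can be expressed as a small, auditable routing rule. The sketch below is one hypothetical policy, not a prescribed one: any flag from the automated tier sends a document to assisted forensic analysis, and a flagged document in a high-risk file goes straight to enhanced human review. The `AutomatedResult` fields are illustrative names for the outputs of Level 1 and Level 4 checks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AutomatedResult:
    """Output of the automated tier (Levels 1 + 4); field names are hypothetical."""
    metadata_flags: List[str] = field(default_factory=list)       # e.g. ["consumer_tool"]
    failed_checksums: List[str] = field(default_factory=list)     # e.g. ["iban"]
    registry_mismatches: List[str] = field(default_factory=list)  # e.g. ["name"]
    high_risk_file: bool = False  # set by the firm's own risk assessment


def triage(result: AutomatedResult) -> int:
    """Return the highest tier a document must reach:
    1 = cleared by automated filtering, 2 = assisted forensic analysis,
    3 = enhanced human review with the issuing authority."""
    flagged = bool(result.metadata_flags or result.failed_checksums
                   or result.registry_mismatches)
    if flagged and result.high_risk_file:
        return 3
    if flagged:
        return 2
    return 1
```

A stricter policy might route every high-risk file to enhanced review regardless of flags; the point is to make the rule explicit rather than leave it to individual judgement.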

Our platform supports over 3,200 document types across 32 jurisdictions, enabling real-time structural comparison during verification. Visit the AI document detection page to see how this detection layer integrates with your existing controls.

Why Human Detection Alone Is Not Enough

AI tools now generate documents that pass visual inspection by most analysts, according to testing reported in NCSC guidance on AI threats to UK businesses. The check digits of a VAT number or IBAN, the check digits in a passport's MRZ, the consistency of tax deductions on a payslip: these verifications require algorithmic checks that a human analyst cannot perform in seconds.
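Payslip arithmetic is a good example of a check that takes a script milliseconds and an analyst several minutes. The sketch below only tests internal consistency (net pay equals gross pay minus the itemised deductions), using `Decimal` to avoid floating-point rounding. The deduction names and the one-penny tolerance are assumptions, and validating each deduction against the tax code and thresholds for the period would be a further step.

```python
from decimal import Decimal
from typing import Dict


def payslip_arithmetic_ok(gross: str, deductions: Dict[str, str], net: str,
                          tolerance: str = "0.01") -> bool:
    """Check that net pay equals gross pay minus the itemised deductions.

    Amounts are passed as strings and handled as Decimal. This tests only the
    document's internal arithmetic, not whether the deductions are correct.
    """
    total_deductions = sum((Decimal(v) for v in deductions.values()), Decimal("0"))
    expected_net = Decimal(gross) - total_deductions
    return abs(expected_net - Decimal(net)) <= Decimal(tolerance)
```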

The Europol Internet Organised Crime Threat Assessment (IOCTA) 2024 highlights a marked increase in AI-assisted document fraud in the European financial sector, targeting digital onboarding and online lending in particular.

For deeper context on generation techniques, see our analysis of how AI generates fake documents and LLM-based document fraud threats.

Automating these checks with a specialist solution adapted to your sector reduces undetected fraud while keeping the customer experience smooth. Explore our document verification guide for a full overview of available methods, and find out more about CheckFile's pricing.

Frequently Asked Questions

Can an AI detection tool identify all fake documents?

No. AI detection tools perform well on known document types but remain limited against entirely novel formats or documents generated with very recent tools. Multi-layer detection (metadata + structure + cross-checks) remains the most robust method.

Does a valid company number in a document prove its authenticity?

No. A fraudster can copy an existing company number from a real business. The verification must cross-reference the number against the company name, address, and activity in official registries — not just validate the number's format.
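The cross-reference can be sketched against a registry record. The code below compares a document's company name, postcode, and date with a company profile dictionary; the field names (`company_name`, `date_of_creation`, `date_of_cessation`, `registered_office_address.postal_code`) are assumed to follow the Companies House public data API company profile resource and should be checked against its current specification. The name normalisation is deliberately crude, and a postcode mismatch may simply mean the document shows a trading address rather than the registered office.

```python
import re
from datetime import date
from typing import List

# Common UK legal-form suffixes ignored when comparing names (illustrative list).
_SUFFIXES = re.compile(r"\b(limited|ltd|plc|llp)\b\.?")


def normalise_name(name: str) -> str:
    """Lower-case, drop common legal suffixes, and collapse punctuation."""
    n = _SUFFIXES.sub("", name.lower())
    return re.sub(r"[^a-z0-9]+", " ", n).strip()


def company_mismatches(profile: dict, doc_name: str, doc_postcode: str,
                       doc_date: date) -> List[str]:
    """Compare document fields with a company profile record (assumed shape)."""
    issues = []
    if normalise_name(profile.get("company_name", "")) != normalise_name(doc_name):
        issues.append("name")
    office = profile.get("registered_office_address", {})
    registry_pc = office.get("postal_code", "").replace(" ", "").upper()
    if registry_pc != doc_postcode.replace(" ", "").upper():
        issues.append("postcode")
    created = profile.get("date_of_creation")
    if created and date.fromisoformat(created) > doc_date:
        issues.append("document_predates_company")
    ceased = profile.get("date_of_cessation")
    if ceased and date.fromisoformat(ceased) <= doc_date:
        issues.append("company_ceased_before_document")
    return issues
```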

Are bank statement PDFs easy to fake with AI?

Yes. LLMs can generate syntactically coherent statements in seconds. Falsification signals include: closing balances that do not carry over as the next month's opening balance, transaction references that are too short or too long, and the absence of SEPA-format bank reference numbers.
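Balance carry-over is straightforward to verify once the statement has been parsed. The sketch below assumes each month has already been extracted as an opening balance, a list of signed transaction amounts, and a closing balance (a hypothetical structure), and reports breaks both within a month and between consecutive months.

```python
from decimal import Decimal
from typing import List, Tuple

# Each month: (opening balance, [signed transaction amounts], closing balance).
Month = Tuple[str, List[str], str]


def balance_breaks(months: List[Month]) -> List[str]:
    """Return human-readable descriptions of balance inconsistencies."""
    problems = []
    for i, (opening, amounts, closing) in enumerate(months):
        running = Decimal(opening) + sum((Decimal(a) for a in amounts), Decimal("0"))
        if running != Decimal(closing):
            problems.append(f"month {i + 1}: transactions give {running}, statement shows {closing}")
        if i > 0 and Decimal(opening) != Decimal(months[i - 1][2]):
            problems.append(f"month {i + 1}: opening {opening} != previous closing {months[i - 1][2]}")
    return problems
```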

What UK regulation governs document verification in the KYC context?

Document verification obligations in KYC derive from the Money Laundering, Terrorist Financing and Transfer of Funds (Information on the Payer) Regulations 2017 and FCA guidance; the forthcoming AMLD6 transposition applies to EU member states rather than the UK. Any obliged entity must retain verification evidence for five years after the end of the business relationship.

How should a suspicious AI-generated document be escalated?

In the UK, suspicion of money laundering linked to a fraudulent document should be reported internally to the firm's nominated officer (MLRO), who decides whether to submit a Suspicious Activity Report (SAR) to the National Crime Agency. Detection of an AI-generated document can constitute sufficient grounds to trigger this process.

For where this fits in the CheckFile offering, see our AI and deepfake detection approach.
