
LLM and ChatGPT Fake Documents: The New Fraud Threat

ChatGPT and other large language models can now generate fake text documents that are very hard to detect. How fraudsters exploit LLMs, which documents are targeted, and how to protect your organisation.

CheckFile Team·June 28, 2026


Large language models (LLMs) — ChatGPT, GPT-4o, Claude, Gemini — have introduced a fundamentally new document fraud vector: the generation of coherent, grammatically flawless, contextually plausible fake text documents with no image manipulation, no pixel artefacts, and no signs of manual editing. Where previous fraudsters edited images in Photoshop, today's fraudsters dictate payslips, employment contracts, and bank statements to a chatbot. This guide examines the mechanics of LLM-generated document fraud, the documents most at risk, and detection strategies that work.

This article is for informational purposes only. Regulatory requirements evolve — consult the FCA or a specialist legal adviser for your specific situation.

Why LLMs Represent a Qualitative Shift in Document Fraud

LLMs differ fundamentally from generative image models (GANs, Stable Diffusion). They produce structured text, coherent figures, and professional formatting on demand — not manipulated pixels. A fraudster can generate a convincing UK payslip for a £38,000 salary in under two minutes, with no graphic design skills required. The output correctly references PAYE codes, National Insurance contributions at the 2026 rates, and pension deduction conventions consistent with NEST auto-enrolment rules.

According to the ACFE 2024 Report to the Nations, the median number of days before a document fraud scheme is detected is 87, a figure that underlines the cost of not catching fake documents at the point of entry. The same report notes that 37% of fraud is still detected by manual review, which reflects continued reliance on human visual inspection: precisely the control that LLM-generated fakes are built to defeat.

The ENISA Threat Landscape 2024 identifies AI-assisted fraud as one of the five principal threats facing European organisations, with specific reference to the rise of synthetic textual content in identity verification and credit underwriting flows.

The FCA's Financial Crime Guide (FCG 3) explicitly requires firms to assess emerging fraud vectors in their annual financial crime risk assessments. LLM-generated documents now meet the threshold for inclusion as a distinct risk category.

How LLMs Compare to Prior Fraud Techniques

Technique	Attack vector	Traditional detection	Detection difficulty (2026)
Photoshop retouching	Modified pixels	ELA, EXIF metadata	Easy
Modified PDF templates	Replaced text fields	PDF analysis, metadata	Moderate
GAN / Stable Diffusion	Synthetic images	Visual artefacts, coherence	Hard
LLM (ChatGPT, GPT-4o)	Fully generated text	No classic artefacts	Very hard

Standard OCR checks read the text — they cannot detect that it was LLM-generated. Metadata checks identify modified PDFs — not documents created from scratch. This is the gap LLM-based fraud exploits.

Documents Most Targeted by LLM Fraud

Payslips and Proof of Income

The payslip is the document most frequently faked via LLM in mortgage, vehicle finance, and rental applications. An LLM can generate a complete payslip including employer name and address, employer PAYE reference, National Insurance number (formatted correctly), gross/net pay with deductions calculated at 2026 rates, and year-to-date cumulative figures. The resulting document passes basic visual inspection and OCR extraction.

Platforms on the darker corners of the internet openly advertise "AI payslip generators" requiring only the target salary, employer name, and pay period. Generation takes seconds; the resulting PDF is indistinguishable from a genuine payslip to a human reviewer.
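The figures on a payslip have to agree with each other, whoever produced it. A minimal sketch of that internal arithmetic check follows. The field names (`gross_pay`, `deductions`, `net_pay`, `gross_ytd`) and the sample amounts are illustrative assumptions, not real payroll calculations or any particular extraction schema.

```python
from decimal import Decimal


def check_payslip_arithmetic(slip, tolerance=Decimal("0.01")):
    """Return a list of internal inconsistencies found in one payslip."""
    issues = []
    total_deductions = sum(slip["deductions"].values(), Decimal("0"))
    expected_net = slip["gross_pay"] - total_deductions
    if abs(expected_net - slip["net_pay"]) > tolerance:
        issues.append(
            f"net pay {slip['net_pay']} != gross minus deductions {expected_net}"
        )
    # Year-to-date gross should not normally be below the current period's gross.
    if slip["gross_ytd"] < slip["gross_pay"]:
        issues.append("year-to-date gross is below this period's gross")
    return issues


# Illustrative amounts only; they are not derived from real tax rates.
sample_slip = {
    "gross_pay": Decimal("3166.67"),
    "deductions": {
        "paye": Decimal("383.33"),
        "ni": Decimal("169.50"),
        "pension": Decimal("105.00"),
    },
    "net_pay": Decimal("2508.84"),
    "gross_ytd": Decimal("9500.01"),
}

print(check_payslip_arithmetic(sample_slip))
```

A payslip that passes this check is not thereby genuine. The check only removes the fakes whose totals do not add up, which is why it sits alongside registry validation and does not replace it.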

Employment Contracts and Offer Letters

LLMs generate complete employment contracts — including probationary clauses, confidentiality provisions, and salary structures consistent with the advertised sector. Criminals use these to support payslip fraud when lenders request corroborating evidence.

Bank Statements

Bank statements produced entirely by LLM are more complex to make coherent (requiring realistic transaction histories), but the more common attack combines a legitimate PDF template with LLM-generated transaction data. The structural metadata of the template remains genuine; only the content has been replaced.
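When transaction data has been swapped into a genuine template, the running balance is one place where the substitution can show. The sketch below assumes each statement row has already been extracted as an `(amount, stated_balance)` pair; that representation and the sample figures are assumptions made for illustration.

```python
from decimal import Decimal


def first_balance_break(opening_balance, transactions):
    """Return the index of the first row whose stated balance disagrees
    with the recomputed running balance, or None if every row agrees."""
    running = opening_balance
    for index, (amount, stated_balance) in enumerate(transactions):
        running += amount
        if running != stated_balance:
            return index
    return None


# Illustrative rows: (signed amount, balance printed on the statement).
rows = [
    (Decimal("-45.20"), Decimal("954.80")),
    (Decimal("2508.84"), Decimal("3463.64")),
    (Decimal("-850.00"), Decimal("2613.64")),
]

print(first_balance_break(Decimal("1000.00"), rows))
```

Using `Decimal` keeps the comparison exact to the penny, which matters because a binary float can turn a correct balance into a false mismatch.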

Reference Letters and Professional Credentials

Employment reference letters, professional accreditation certificates, and university degree confirmations are frequently LLM-generated in recruitment fraud. These documents contain no verifiable digital signature and are structurally simple to produce.

How to Detect LLM-Generated Documents

Linguistic and Textual Coherence Analysis

LLM-generated text exhibits statistical properties absent from authentic human documents:

Uniformly low perplexity: LLM text is statistically predictable and lacks the shifts in register and the stylistic imperfections of genuine HR documents
Little typographic variation: real documents contain non-breaking spaces, smart quotes, ligatures, and inconsistent formatting, hallmarks of real word-processing software that LLMs do not replicate systematically
Suspiciously round figures: LLM-generated payslips often show round amounts, without the pence-level rounding artefacts typical of real payroll software
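Two of the properties listed above can be approximated without a language model. The sketch below is a rough proxy, not a perplexity measurement: it computes the variation in sentence length (a simple stand-in for burstiness) and reports which typographic marks are present in the text. The function names and the character set are assumptions chosen for illustration.

```python
import re
import statistics

# Characters that word processors commonly insert; an illustrative set only.
TYPOGRAPHIC_MARKS = {
    "non_breaking_space": "\u00a0",
    "smart_quotes": "\u2018\u2019\u201c\u201d",
    "ligatures": "\ufb01\ufb02",
}


def sentence_length_variation(text):
    """Coefficient of variation of sentence lengths, counted in words.

    Values near 0 mean every sentence has a similar length.
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def typographic_marks_present(text):
    """Report which families of typographic characters occur in the text."""
    return {
        name: any(ch in text for ch in chars)
        for name, chars in TYPOGRAPHIC_MARKS.items()
    }


uniform = (
    "The employee is paid monthly. The employer deducts tax monthly. "
    "The pension is paid monthly."
)
print(sentence_length_variation(uniform))
print(typographic_marks_present("Net\u00a0pay: \u201cfinal\u201d"))
```

The sentence splitter is deliberately naive and will break on decimal amounts, so it is suited to prose sections such as contract clauses or reference letters, not to tables of figures.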

Cross-Document Consistency Validation

The highest-value detection occurs through cross-document validation: an LLM-generated payslip may reference an employer whose registered address does not match Companies House records, or a PAYE reference number whose format is valid but not registered with HMRC. These signals are invisible when examining each document in isolation — they require systemic validation against third-party data.
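As a minimal sketch of this kind of cross-validation, the example below compares an employer name and postcode taken from a payslip with a registry record. The `REGISTRY` dictionary is a hypothetical in-memory stand-in for a Companies House lookup, and the company, postcodes, and function names are invented for illustration.

```python
import re

# Hypothetical stand-in for a registry; a real system would query
# Companies House instead of this in-memory dictionary.
REGISTRY = {
    "ACME WIDGETS LTD": {"postcode": "M1 2AB", "status": "active"},
}


def normalise_name(name):
    """Upper-case, strip punctuation, and unify LIMITED/LTD."""
    name = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    name = re.sub(r"\bLIMITED\b", "LTD", name)
    return re.sub(r"\s+", " ", name).strip()


def normalise_postcode(postcode):
    return re.sub(r"\s+", "", postcode.upper())


def validate_employer(name, postcode, registry=REGISTRY):
    """Return a list of flags; an empty list means no mismatch was found."""
    record = registry.get(normalise_name(name))
    if record is None:
        return ["employer not found in registry"]
    flags = []
    if normalise_postcode(record["postcode"]) != normalise_postcode(postcode):
        flags.append("postcode does not match registered office")
    if record["status"] != "active":
        flags.append(f"company status is {record['status']}")
    return flags


print(validate_employer("Acme Widgets Limited", "m1 2ab"))
print(validate_employer("Ghost Payroll Ltd", "M1 2AB"))
```

A postcode mismatch is a prompt for review, not proof of fraud, because a genuine employer may trade from an address other than its registered office.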

CheckFile deploys an additional layer of AI-generation signals as a complement to existing structural checks, calibrated to the client's sector risk level. This methodology combines forensic analysis of individual documents with cross-validation against third-party registries (Companies House, HMRC PAYE, credit reference agency data).

For broader context on AI-powered fraud detection techniques, see our guide on AI document fraud detection techniques and our article on how AI generates fake documents.

Specific Forensic Signals for LLM-Generated Documents

Signal	Description	Detection method
Semantic repetition	Near-identical phrasing across multiple submitted documents	Vector similarity analysis
Suspiciously round figures	Salaries to the nearest hundred, perfectly round deductions	Statistical decimal pattern check
Unverifiable employer references	Employer name and address look plausible but do not match any Companies House record	Companies House API
HMRC PAYE reference mismatch	Reference format valid but not verifiable	HMRC verification services
Abnormal font uniformity	Single-font documents lacking the mixed-font signatures of real payroll software	Font metadata extraction
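The semantic repetition signal in the table can be sketched with a plain bag-of-words cosine similarity. This is a simplification: word counts stand in for the learned embeddings a production vector similarity system would more likely use, and the 0.9 threshold and sample texts are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lower-cased word tokens in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def near_duplicates(documents, threshold=0.9):
    """Return pairs of document ids whose similarity reaches the threshold."""
    ids = sorted(documents)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if cosine_similarity(documents[x], documents[y]) >= threshold
    ]


submissions = {
    "ref_a": "I am pleased to confirm that Jane was a diligent and reliable employee",
    "ref_b": "I am pleased to confirm that John was a diligent and reliable employee",
    "ref_c": "Please find attached the invoice for March",
}
print(near_duplicates(submissions))
```

Two reference letters that differ only in the candidate's name score well above the threshold, while the unrelated document does not pair with either.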


Regulatory Framework: What UK Regulators Require

FCA and Anti-Money Laundering Obligations

The Money Laundering, Terrorist Financing and Transfer of Funds (Information on the Payer) Regulations 2017 (MLR 2017) require all regulated firms to verify customer identity on the basis of reliable, independent source documents. The FCA's Dear CEO letter of March 2025 on operational resilience specifically called out AI-generated content as an emerging vulnerability in financial crime controls.

As of January 2026, the FCA's Consumer Duty indirectly reinforces document verification standards by requiring firms to demonstrate that their onboarding processes do not create foreseeable harm — including harm resulting from accepting fraudulent applications that compete with legitimate customers.

HMRC and Employment Document Verification

For mortgage lenders and credit providers, HMRC's Employment Income Manual provides reference data for validating payslip formats and PAYE calculations. Lenders using HMRC's income verification service (available via the Digital Check Service) can validate self-reported income directly against tax records — bypassing the need to trust the payslip document at all.

The National Crime Agency (NCA) Suspicious Activity Reports (SARs) Annual Report 2024 noted a 34% increase in SARs related to identity fraud, with a significant proportion linked to supporting document fabrication.

ICO and Data Handling in Fraud Detection

AI-based detection tools that process personal data must comply with the UK GDPR and the Data Protection Act 2018. The ICO's guidance on AI and data protection requires firms to conduct a Data Protection Impact Assessment (DPIA) before deploying AI systems for fraud risk scoring that may affect individuals.

Building an Effective Defence

Step 1: Map All Document Submission Channels

Every channel through which documents enter your organisation is a potential LLM fraud vector: client-facing portal, email, API partner integrations, physical scan workflows. Digital portals — which allow copy-pasting of text — carry higher LLM fraud risk than physical document scans.

Step 2: Implement Systematic Cross-Validation

Validating payslip employer references against Companies House and HMRC data catches the majority of LLM fakes, because a language model inventing an employer does not check its output against live UK public registries. A company name and address combination that does not appear in Companies House records is a high-confidence fraud signal.
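A reference can be well formed and still not exist, so validation works best in two stages: a format check, then a registry check. The sketch below assumes a simplified PAYE reference shape (three digits, a slash, then one to ten alphanumeric characters). The `is_registered` callable is hypothetical and stands in for whatever verification service is actually available.

```python
import re

# Assumed, simplified shape: three-digit office number, slash,
# then 1 to 10 alphanumeric characters.
PAYE_REFERENCE = re.compile(r"\d{3}/[A-Z0-9]{1,10}")


def check_paye_reference(reference, is_registered):
    """Classify a reference as 'malformed', 'unregistered', or 'verified'.

    is_registered is a caller-supplied function (hypothetical here) that
    answers whether a well-formed reference is known to the registry.
    """
    cleaned = reference.strip().upper()
    if not PAYE_REFERENCE.fullmatch(cleaned):
        return "malformed"
    return "verified" if is_registered(cleaned) else "unregistered"


# Invented reference used as a stand-in registry for the example.
known_references = {"123/AB456"}

for candidate in ("123/ab456", "999/ZZ999", "12/AB456"):
    print(candidate, check_paye_reference(candidate, known_references.__contains__))
```

The "unregistered" outcome is the one that matters for LLM fakes: the reference looks right on the page but is not known to the registry.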

Step 3: Add an LLM Signal Detection Layer

AI-generated text detection tools (perplexity analysis, burstiness scoring, stylistic fingerprinting) applied to textual documents score the LLM risk of each submission. This layer does not replace classical controls — it complements them.
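One way to picture how such a layer complements classical controls is a single additive score across all signals. The sketch below is illustrative only: the signal names, the weights, and the thresholds are invented and would need calibrating against real case data.

```python
# Hypothetical weights in points out of 100; not calibrated values.
SIGNAL_WEIGHTS = {
    "employer_not_in_registry": 40,
    "paye_reference_unverified": 25,
    "round_figures": 15,
    "low_stylistic_variation": 10,
    "generic_pdf_producer": 10,
}


def risk_score(signals):
    """Sum the weights of every signal that fired."""
    unknown = set(signals) - set(SIGNAL_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(
        weight for name, weight in SIGNAL_WEIGHTS.items() if signals.get(name)
    )


def triage(score, review_at=25, escalate_at=60):
    """Map a score to an action; thresholds are assumptions."""
    if score >= escalate_at:
        return "escalate"
    if score >= review_at:
        return "manual_review"
    return "pass"


fired = {"round_figures": True, "low_stylistic_variation": True}
print(risk_score(fired), triage(risk_score(fired)))
```

In this sketch the text-based signals alone can only send a submission to manual review. Escalation requires a registry mismatch, which mirrors the point that the LLM layer supplements the classical controls.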

Explore how CheckFile integrates these controls into your verification workflow to identify AI-generation signals without slowing down the user experience. Our security and compliance page details the control architecture applied to documents submitted in real time.

Step 4: Train KYC and Credit Teams

Compliance analysts must be trained to recognise textual and visual indicators of LLM-generated documents. Practitioners on professional forums consistently report that typographic perfection has become a red flag — the reverse of ten years ago. Round-number salaries, flawless grammar, and absence of any formatting quirk are now suspicious, not reassuring.

What Practitioners Report

Compliance professionals discussing this issue on forums dedicated to financial crime raise two recurrent questions:

"How do I distinguish a Word document converted to PDF from an LLM-generated document?" The technical answer: PDF metadata analysis (Producer, Creator, CreationDate, ModDate) and font fingerprinting often reveal the authoring tool. A Word-to-PDF conversion retains Microsoft Office traces; an LLM-generated document formatted by Python code leaves a distinctly different metadata signature.
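As a rough sketch of the metadata side of that answer, the example below classifies a document from its `Producer` and `Creator` strings. It assumes the metadata has already been extracted into a dictionary by a PDF library of your choice. The substring hints are illustrative and far from exhaustive.

```python
# Illustrative substrings only; real Producer/Creator values vary by
# product and version, and this list is not exhaustive.
TOOL_HINTS = {
    "office_export": ("microsoft", "libreoffice"),
    "scripted_generation": ("reportlab", "fpdf", "wkhtmltopdf", "weasyprint"),
}


def classify_authoring_tool(info):
    """Guess the authoring route from PDF Producer and Creator strings."""
    haystack = " ".join(
        str(info.get(key, "")) for key in ("Producer", "Creator")
    ).lower()
    for label, hints in TOOL_HINTS.items():
        if any(hint in haystack for hint in hints):
            return label
    return "unknown"


word_export = {
    "Producer": "Microsoft\u00ae Word for Microsoft 365",
    "Creator": "Microsoft\u00ae Word for Microsoft 365",
}
scripted = {"Producer": "ReportLab PDF Library - www.reportlab.com"}

print(classify_authoring_tool(word_export))
print(classify_authoring_tool(scripted))
```

Treat the result as a hint and not a verdict: metadata can be edited, and genuine payroll systems also generate PDFs programmatically, so a scripted producer string is only meaningful when it conflicts with the claimed source of the document.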

"Do LLMs make predictable mistakes?" Yes — LLMs produce figures that are coherent in appearance but statistically improbable (excessive round-number salaries, absence of year-on-year cumulative variation). They also generate standardised HR phrasing absent from real SME payslips.
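The round-number pattern lends itself to a simple measurement: the share of monetary amounts on a document that have zero pence. The sketch below assumes the amounts have already been extracted as `Decimal` values. The sample figures are invented, and the share at which a document becomes suspicious is a calibration question the sketch leaves open.

```python
from decimal import Decimal


def round_figure_share(amounts):
    """Fraction of amounts that are whole pounds (zero pence)."""
    if not amounts:
        return 0.0
    whole = sum(1 for amount in amounts if amount == amount.to_integral_value())
    return whole / len(amounts)


# Invented figures for illustration.
suspicious = [Decimal("3200.00"), Decimal("640.00"), Decimal("256.00"), Decimal("2304.00")]
ordinary = [Decimal("3166.67"), Decimal("383.33"), Decimal("169.50"), Decimal("2508.84")]

print(round_figure_share(suspicious))
print(round_figure_share(ordinary))
```

A single round figure means little, since salaries are often agreed in round annual amounts. What stands out is a payslip on which every deduction and the net pay are also whole pounds.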

For a broader framework, see our complete document verification guide and our dedicated AI and deepfake document detection landing page.

Frequently Asked Questions

Can LLMs really generate convincing payslips?

Yes. Current LLMs (GPT-4o, Claude 3.5, Gemini 1.5 Pro) produce syntactically correct payslips with National Insurance and PAYE calculations at the correct 2026 rates. PDF formatting can then be applied via code, making the document visually indistinguishable from a genuine payslip to a human reviewer. Detection requires forensic analysis and cross-validation of employer data against Companies House and HMRC records.

What is the difference between LLM fraud and visual deepfakes?

Visual deepfakes manipulate images (GANs, Stable Diffusion) and leave artefacts detectable by Error Level Analysis (ELA) or pixel coherence checks. LLM fakes are entirely textual — no visual artefacts, no image manipulation. Detection requires linguistic analysis and semantic coherence validation rather than visual forensics.

Are traditional OCR checks sufficient?

No. OCR reads textual content but cannot detect the origin of that text. An LLM-generated payslip passes all OCR checks because its textual content is syntactically correct. Detection requires complementary analysis: linguistic perplexity scoring, figure coherence, and third-party registry validation.

What are my regulatory obligations if I detect a fake document?

Regulated firms under the MLR 2017 must submit a Suspicious Activity Report (SAR) to the National Crime Agency if they know or suspect that a customer has committed fraud or money laundering. This includes document fraud discovered during the onboarding process. Failure to report is itself a criminal offence under the Proceeds of Crime Act 2002.

Does accepting an LLM-generated fake expose my organisation to liability?

For FCA-regulated lenders, accepting a fraudulent document in a credit application may constitute a breach of MLR 2017 Know Your Customer obligations, exposing the firm to FCA enforcement action. For landlords and non-regulated entities, liability is lower but the direct financial harm — unpaid rent, costly eviction proceedings — is immediate and significant.

For where this fits in the CheckFile offering, see our AI and deepfake detection approach.


