LLM and ChatGPT Fake Documents: The New Fraud Threat
ChatGPT and large language models now generate fake text documents that are very hard to detect. How fraudsters exploit LLMs, which documents are targeted, and how to protect your organisation.

Large language models (LLMs) such as ChatGPT, GPT-4o, Claude, and Gemini have introduced a fundamentally new document fraud vector: the generation of coherent, grammatically flawless, contextually plausible fake text documents with no image manipulation, no pixel artefacts, and no signs of manual editing. Where previous fraudsters edited images in Photoshop, today's fraudsters dictate payslips, employment contracts, and bank statements to a chatbot. This guide examines the mechanics of LLM-generated document fraud, the documents most at risk, and detection strategies that work.
This article is for informational purposes only. Regulatory requirements evolve; consult the FCA or a specialist legal adviser for your specific situation.
Why LLMs Represent a Qualitative Shift in Document Fraud
LLMs differ fundamentally from generative image models (GANs, Stable Diffusion). They produce structured text, coherent figures, and professional formatting on demand, not manipulated pixels. A fraudster can generate a convincing UK payslip for a £38,000 salary in under two minutes, with no graphic design skills required. The output correctly references PAYE codes, National Insurance contributions at the 2026 rates, and pension deduction conventions consistent with NEST auto-enrolment rules.
According to the ACFE 2024 Report to the Nations, the median number of days before a document fraud scheme is detected is 87, a figure that underlines the cost of not catching fake documents at the point of entry. The same report notes that 37% of fraud is still detected by manual review, a proportion that reflects continued human reliance on visual inspection, which is precisely what LLM-generated fakes are designed to defeat.
The ENISA Threat Landscape 2024 identifies AI-assisted fraud as one of the five principal threats facing European organisations, with specific reference to the rise of synthetic textual content in identity verification and credit underwriting flows.
The FCA's Financial Crime Guide (FCG 3) explicitly requires firms to assess emerging fraud vectors in their annual financial crime risk assessments. LLM-generated documents now meet the threshold for inclusion as a distinct risk category.
How LLMs Compare to Prior Fraud Techniques
| Technique | Attack vector | Traditional detection | 2026 difficulty |
|---|---|---|---|
| Photoshop retouching | Modified pixels | ELA, EXIF metadata | Easy |
| Modified PDF templates | Replaced text fields | PDF analysis, metadata | Moderate |
| GAN / Stable Diffusion | Synthetic images | Visual artefacts, coherence | Hard |
| LLM (ChatGPT, GPT-4o) | Fully generated text | No classic artefacts | Very hard |
Standard OCR checks read the text but cannot detect that it was LLM-generated. Metadata checks identify modified PDFs, not documents created from scratch. This is the gap LLM-based fraud exploits.
Documents Most Targeted by LLM Fraud
Payslips and Proof of Income
The payslip is the document most frequently faked via LLM in mortgage, vehicle finance, and rental applications. An LLM can generate a complete payslip including employer name and address, employer PAYE reference, National Insurance number (formatted correctly), gross/net pay with deductions calculated at 2026 rates, and year-to-date cumulative figures. The resulting document passes basic visual inspection and OCR extraction.
Platforms on the darker corners of the internet openly advertise "AI payslip generators" requiring only the target salary, employer name, and pay period. Generation takes seconds; the resulting PDF is indistinguishable from a genuine payslip to a human reviewer.
Employment Contracts and Offer Letters
LLMs generate complete employment contracts, including probationary clauses, confidentiality provisions, and salary structures consistent with the advertised sector. Criminals use these to support payslip fraud when lenders request corroborating evidence.
Bank Statements
Bank statements produced entirely by LLM are more complex to make coherent (requiring realistic transaction histories), but the more common attack combines a legitimate PDF template with LLM-generated transaction data. The structural metadata of the template remains genuine; only the content has been replaced.
Reference Letters and Professional Credentials
Employment reference letters, professional accreditation certificates, and university degree confirmations are frequently LLM-generated in recruitment fraud. These documents contain no verifiable digital signature and are structurally simple to produce.
How to Detect LLM-Generated Documents
Linguistic and Textual Coherence Analysis
LLM-generated text exhibits statistical properties absent from authentic human documents:
- Uniform perplexity: LLMs produce text at low entropy, lacking the register variations and stylistic imperfections of genuine HR documents
- Absence of typographic variation: real documents contain non-breaking spaces, smart quotes, ligatures, and inconsistent formatting, all hallmarks of real word-processing software that LLMs do not replicate systematically
- Implausibly tidy figures: LLM-generated payslips often show suspiciously round figures without the rounding artefacts typical of real payroll software
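The round-figure signal in the last item above lends itself to a simple numerical check. The following is a minimal sketch, assuming the amounts have already been extracted from the payslip as decimal values; the function names and the default unit of 100 are illustrative assumptions, not a calibrated rule.

```python
from decimal import Decimal


def is_round(amount: Decimal, unit: Decimal = Decimal("100")) -> bool:
    """Return True when the amount is an exact multiple of `unit`."""
    return amount % unit == 0


def round_figure_ratio(amounts: list[Decimal], unit: Decimal = Decimal("100")) -> float:
    """Share of amounts that are exact multiples of `unit` (0.0 for an empty list)."""
    if not amounts:
        return 0.0
    round_count = sum(1 for amount in amounts if is_round(amount, unit))
    return round_count / len(amounts)


# Made-up example values, not taken from any real payslip.
suspicious = [Decimal("3200.00"), Decimal("600.00"), Decimal("2600.00")]
plausible = [Decimal("3166.67"), Decimal("412.53"), Decimal("2754.14")]
```

For context, a £38,000 salary paid monthly gives a gross figure of £3,166.67, which is not round. A high ratio is only one signal among several and would need to be weighed against the other checks described in this guide.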
Cross-Document Consistency Validation
The highest-value detection occurs through cross-document validation: an LLM-generated payslip may reference an employer whose registered address does not match Companies House records, or a PAYE reference number whose format is valid but not registered with HMRC. These signals are invisible when examining each document in isolation; they require systemic validation against third-party data.
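The employer check described above can be sketched as a lookup against registry data. This example is an illustration only: `registry` is a hypothetical in-memory mapping of company name to registered postcode standing in for a real third-party lookup, and `EmployerClaim`, `normalise`, and `check_employer` are names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployerClaim:
    """Employer details as stated on the submitted document."""
    name: str
    postcode: str


def normalise(value: str) -> str:
    """Lower-case and keep only letters and digits so trivial differences are ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def check_employer(claim: EmployerClaim, registry: dict[str, str]) -> str:
    """Compare a claimed employer against a name -> registered postcode mapping.

    `registry` is a hypothetical stand-in for data from a third-party source.
    """
    index = {normalise(name): normalise(postcode) for name, postcode in registry.items()}
    registered_postcode = index.get(normalise(claim.name))
    if registered_postcode is None:
        return "not_found"
    if registered_postcode != normalise(claim.postcode):
        return "address_mismatch"
    return "match"


# Made-up registry entry for illustration.
registry = {"Acme Widgets Ltd": "AB1 2CD"}
```

Returning a status rather than a boolean keeps "employer not found" and "address does not match" as separate signals, since a reviewer may treat them differently.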
CheckFile deploys an additional layer of AI-generation signals as a complement to existing structural checks, calibrated to the client's sector risk level. This methodology combines forensic analysis of individual documents with cross-validation against third-party registries (Companies House, HMRC PAYE, credit reference agency data).
For broader context on AI-powered fraud detection techniques, see our guide on AI document fraud detection techniques and our article on how AI generates fake documents.
Specific Forensic Signals for LLM-Generated Documents
| Signal | Description | Detection method |
|---|---|---|
| Semantic repetition | Near-identical phrasing across multiple submitted documents | Vector similarity analysis |
| Suspiciously round figures | Salaries to the nearest hundred, perfectly round deductions | Statistical decimal pattern check |
| Unverifiable employer references | Employer name and address appear on the document but not in Companies House records | Companies House API |
| HMRC PAYE reference mismatch | Reference format valid but not verifiable | HMRC verification services |
| Abnormal font uniformity | Single-font documents lacking the mixed-font signatures of real payroll software | Font metadata extraction |
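The semantic repetition signal in the table above relies on comparing documents as vectors. A minimal sketch follows, assuming plain word counts as the vector representation; a production system would more likely use learned embeddings, and the 0.9 threshold is an arbitrary illustrative value.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: lower-cased alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(first: str, second: str) -> float:
    """Cosine similarity between the term vectors of two texts (0.0 if either is empty)."""
    va, vb = term_vector(first), term_vector(second)
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicates(documents: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return every pair of document names whose similarity meets the threshold."""
    names = sorted(documents)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(documents[first], documents[second])
            if score >= threshold:
                flagged.append((first, second, score))
    return flagged
```

Near-identical phrasing across a reference letter and an offer letter submitted in the same application would surface as a flagged pair for manual review.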
Regulatory Framework: What UK Regulators Require
FCA and Anti-Money Laundering Obligations
The Money Laundering, Terrorist Financing and Transfer of Funds (Information on the Payer) Regulations 2017 (MLR 2017) require all regulated firms to verify customer identity on the basis of reliable, independent source documents. The FCA's Dear CEO letter of March 2025 on operational resilience specifically called out AI-generated content as an emerging vulnerability in financial crime controls.
As of January 2026, the FCA's Consumer Duty indirectly reinforces document verification standards by requiring firms to demonstrate that their onboarding processes do not create foreseeable harm, including harm resulting from accepting fraudulent applications that compete with legitimate customers.
HMRC and Employment Document Verification
For mortgage lenders and credit providers, HMRC's Employment Income Manual provides reference data for validating payslip formats and PAYE calculations. Lenders using HMRC's income verification service (available via the Digital Check Service) can validate self-reported income directly against tax records, bypassing the need to trust the payslip document at all.
The National Crime Agency (NCA) Suspicious Activity Reports (SARs) Annual Report 2024 noted a 34% increase in SARs related to identity fraud, with a significant proportion linked to supporting document fabrication.
ICO and Data Handling in Fraud Detection
Using AI-based detection tools that process personal data must comply with the UK GDPR and the Data Protection Act 2018. The ICO's guidance on AI and data protection requires firms to conduct a Data Protection Impact Assessment (DPIA) before deploying AI systems for fraud risk scoring that may affect individuals.
Building an Effective Defence
Step 1: Map All Document Submission Channels
Every channel through which documents enter your organisation is a potential LLM fraud vector: client-facing portal, email, API partner integrations, physical scan workflows. Digital portals, which allow copy-pasting of text, carry higher LLM fraud risk than physical document scans.
Step 2: Implement Systematic Cross-Validation
Validating payslip employer references against Companies House and HMRC data catches the majority of LLM fakes, because language models cannot access real-time UK public registries. A company name and address that does not appear in Companies House records is a high-confidence fraud signal.
Step 3: Add an LLM Signal Detection Layer
AI-generated text detection tools (perplexity analysis, burstiness scoring, stylistic fingerprinting) applied to textual documents score the LLM risk of each submission. This layer does not replace classical controls; it complements them.
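Of the techniques listed above, burstiness scoring is the simplest to illustrate. The sketch below assumes one common working definition, the coefficient of variation of sentence length; this definition and the function names are assumptions made for illustration, and commercial detectors may compute burstiness differently.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Standard deviation of sentence length divided by its mean.

    Returns 0.0 when there are fewer than two sentences to compare.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The employee works hard. The manager is pleased. The team is strong."
varied = "Yes. The employee has worked here for eleven years across three separate departments. Fine."
```

Low values indicate very even sentence lengths. The naive splitter is intended for prose such as reference letters; text containing decimal amounts would need a more careful sentence tokeniser.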
Explore how CheckFile integrates these controls into your verification workflow to identify AI-generation signals without slowing down the user experience. Our security and compliance page details the control architecture applied to documents submitted in real time.
Step 4: Train KYC and Credit Teams
Compliance analysts must be trained to recognise textual and visual indicators of LLM-generated documents. Practitioners on professional forums consistently report that typographic perfection has become a red flag, the reverse of ten years ago. Round-number salaries, flawless grammar, and absence of any formatting quirk are now suspicious, not reassuring.
What Practitioners Report
Compliance professionals discussing this issue on forums dedicated to financial crime raise two recurrent questions:
"How do I distinguish a Word document converted to PDF from an LLM-generated document?" The technical answer: PDF metadata analysis (the Producer, Creator, CreationDate, and ModDate fields) and font fingerprinting often reveal the authoring tool. A Word-to-PDF conversion retains Microsoft Office traces; an LLM-generated document formatted by Python code leaves a distinctly different metadata signature.
"Do LLMs make predictable mistakes?" Yes. LLMs produce figures that are coherent in appearance but statistically improbable (excessive round-number salaries, absence of year-on-year cumulative variation). They also generate standardised HR phrasing absent from real SME payslips.
For a broader framework, see our complete document verification guide and our dedicated AI and deepfake document detection landing page.
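The PDF metadata analysis mentioned in the first answer above can be illustrated with a rough sketch. It assumes an unencrypted PDF whose document information dictionary is stored uncompressed with simple literal strings; real files often use compressed objects, hex or UTF-16 strings, or nested parentheses, so a dedicated PDF parsing library would be needed in practice. The function name and sample values are hypothetical.

```python
import re

# Matches "/Producer (...)" or "/Creator (...)" with a literal string value.
# Escaped characters such as "\)" are tolerated inside the value and left as-is.
INFO_FIELD = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)")


def authoring_tool_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Extract the first Producer and Creator values found in raw PDF bytes."""
    fields: dict[str, str] = {}
    for key, raw_value in INFO_FIELD.findall(pdf_bytes):
        # Keep the first occurrence of each key; later duplicates are ignored.
        fields.setdefault(key.decode("ascii"), raw_value.decode("latin-1"))
    return fields


# Made-up fragment of a PDF information dictionary, for illustration only.
sample = (
    b"1 0 obj\n<< /Producer (Example PDF Library 1.0) "
    b"/Creator (Example Word Processor) "
    b"/CreationDate (D:20260101120000Z) >>\nendobj"
)
```

Comparing the extracted values against the tools an employer's payroll provider would plausibly use gives a reviewer one more signal, although these fields can be edited and should not be treated as proof on their own.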
Frequently Asked Questions
Can LLMs really generate convincing payslips?
Yes. Current LLMs (GPT-4o, Claude 3.5, Gemini 1.5 Pro) produce syntactically correct payslips with National Insurance and PAYE calculations at the correct 2026 rates. PDF formatting can then be applied via code, making the document visually indistinguishable from a genuine payslip to a human reviewer. Detection requires forensic analysis and cross-validation of employer data against Companies House and HMRC records.
What is the difference between LLM fraud and visual deepfakes?
Visual deepfakes manipulate images (GANs, Stable Diffusion) and leave artefacts detectable by Error Level Analysis (ELA) or pixel coherence checks. LLM fakes are entirely textual: no visual artefacts, no image manipulation. Detection requires linguistic analysis and semantic coherence validation rather than visual forensics.
Are traditional OCR checks sufficient?
No. OCR reads textual content but cannot detect the origin of that text. An LLM-generated payslip passes all OCR checks because its textual content is syntactically correct. Detection requires complementary analysis: linguistic perplexity scoring, figure coherence, and third-party registry validation.
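The figure coherence analysis mentioned in the answer above can be sketched as two arithmetic checks on values extracted from a payslip. This is a minimal illustration: the one-penny tolerance, the function names, and the sample figures are assumptions, and the amounts are made up rather than derived from real tax rules.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # assumed rounding tolerance of one penny


def net_pay_is_coherent(gross: Decimal, deductions: dict[str, Decimal], net: Decimal) -> bool:
    """Check that gross pay minus all deductions equals net pay, within tolerance."""
    total_deductions = sum(deductions.values(), Decimal("0"))
    return abs(gross - total_deductions - net) <= TOLERANCE


def ytd_is_coherent(previous_ytd: Decimal, period_gross: Decimal, current_ytd: Decimal) -> bool:
    """Check that year-to-date gross grew by exactly this period's gross pay."""
    return abs(previous_ytd + period_gross - current_ytd) <= TOLERANCE


# Made-up figures for illustration only.
deductions = {
    "income_tax": Decimal("380.50"),
    "national_insurance": Decimal("210.25"),
    "pension": Decimal("126.67"),
}
```

The year-to-date check only applies to consecutive payslips within the same tax year, since cumulative figures reset when a new tax year begins.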
What are my regulatory obligations if I detect a fake document?
Regulated firms under the MLR 2017 must submit a Suspicious Activity Report (SAR) to the National Crime Agency if they know or suspect that a customer has committed fraud or money laundering. This includes document fraud discovered during the onboarding process. Failure to report is itself a criminal offence under the Proceeds of Crime Act 2002.
Does accepting an LLM-generated fake expose my organisation to liability?
For FCA-regulated lenders, accepting a fraudulent document in a credit application may constitute a breach of MLR 2017 Know Your Customer obligations, exposing the firm to FCA enforcement action. For landlords and non-regulated entities, liability is lower but the direct financial harm (unpaid rent, costly eviction proceedings) is immediate and significant.
For where this fits in the CheckFile offering, see our AI and deepfake detection approach.