
Document Validation API: Developer Guide

Integrate document validation into your application: REST API, webhooks, code examples in Python and Node.js.

CheckFile Team

This guide covers everything you need to integrate automated document validation into your application -- from authentication to webhook handling. Whether you are building a client onboarding flow, a compliance pipeline, or a back-office automation tool, the CheckFile API gives you programmatic access to the same AI-powered validation engine used in the platform. You will find architecture decisions, endpoint references, code samples in Python and Node.js, webhook payloads, error handling strategies, and integration patterns that scale from prototype to production.

This article is for informational purposes only and does not constitute legal, financial, or regulatory advice. Regulatory references are accurate as of the publication date. Consult a qualified professional for guidance specific to your situation.

Architecture Overview

The CheckFile API follows a standard asynchronous processing model: documents are uploaded via REST, queued for AI analysis, and results are delivered through polling or webhook callbacks. Median processing time for a standard 8-12 document dossier is 12 seconds; P95 is 28 seconds. Our platform processes over 180,000 documents per month with 98.7% OCR accuracy and 99.97% availability. Because upload and analysis are decoupled, your application is never blocked while documents are being processed.

The EU AI Act (Regulation 2024/1689, Art. 13) requires high-risk AI systems used in document processing for financial applications to provide transparent, traceable outputs -- a requirement that the CheckFile API satisfies through its deterministic rule engine layer, which produces auditable decision traces for every validation. In Australia, APRA's CPS 234 imposes similar information security and auditability requirements on regulated entities.

                                    +-------------------+
                                    |  Results API      |
                                    |  GET /files/{id}  |
                                    +--------^----------+
                                             |
Client App                                   | Poll or fetch
    |                                        |
    |  POST /v1/files                +-------+--------+
    +------------------------------->| Upload API     |
    |                                +-------+--------+
    |                                        |
    |                                        v
    |                               +--------+--------+
    |                               | Processing Queue|
    |                               | (AI validation) |
    |                               +--------+--------+
    |                                        |
    |         Webhook callback               |
    |<---------------------------------------+
    |         POST your-endpoint

Three key design decisions shape this architecture:

  1. Asynchronous by default. Document validation involves OCR, fraud detection, cross-referencing, and rule evaluation. These operations take 2-15 seconds depending on document complexity. The API accepts uploads immediately and processes them in the background.

  2. Dual delivery. You can poll the status endpoint or register a webhook. Polling works for simple integrations; webhooks are the recommended approach for production systems handling more than a few documents per minute.

  3. Idempotent uploads. Each upload returns a unique file_id. Re-uploading the same document with the same idempotency key returns the existing result instead of reprocessing, saving both time and API credits.
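
The idempotency behaviour described above depends on the client sending the same key for the same document. A minimal sketch of one way to derive such a key, assuming the key is an opaque string chosen by the caller: hash the file bytes together with the rule set. The `Idempotency-Key` header name and both function names are assumptions for illustration; this guide does not say how the key is transmitted, so check the API reference for the actual field.

```python
import hashlib


def idempotency_key(content: bytes, rule_set: str) -> str:
    """Derive a deterministic key from the document bytes and the rule set."""
    digest = hashlib.sha256()
    digest.update(rule_set.encode("utf-8"))
    digest.update(b"\x00")  # separator so the two inputs cannot run together
    digest.update(content)
    return digest.hexdigest()


def upload_headers(api_key: str, content: bytes, rule_set: str) -> dict:
    """Build request headers. The Idempotency-Key header name is hypothetical."""
    return {
        "X-API-Key": api_key,
        "Idempotency-Key": idempotency_key(content, rule_set),
    }
```

Including the rule set in the hash means the same file validated under a different rule set gets a different key, so it would not be served a stale result.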

Authentication and Security

All API requests require authentication. The API supports two authentication methods depending on your use case.

API Key Authentication

For server-to-server integrations, pass your API key in the X-API-Key header:

curl -H "X-API-Key: ck_live_abc123..." \
     https://api.checkfile.ai/v1/files

API keys are scoped to your organisation. You can generate multiple keys in the dashboard -- one per environment (development, staging, production) is the recommended practice. Keys prefixed with ck_test_ hit the sandbox environment; keys prefixed with ck_live_ hit production.
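
Because the key prefix determines the environment, a startup check can catch a live key deployed to staging (or the reverse) before any request is sent. The following is a sketch only: the function names are illustrative, and the `CHECKFILE_API_KEY` variable name is borrowed from the Node.js sample later in this guide.

```python
import os


def environment_for_key(api_key: str) -> str:
    """Return which environment a key targets, based on its prefix."""
    if api_key.startswith("ck_test_"):
        return "sandbox"
    if api_key.startswith("ck_live_"):
        return "production"
    raise ValueError("Unrecognised API key prefix")


def load_api_key(expected_environment: str, env_var: str = "CHECKFILE_API_KEY") -> str:
    """Read the key from the environment and refuse to start on a mismatch."""
    api_key = os.environ.get(env_var, "").strip()  # drop stray whitespace
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    actual = environment_for_key(api_key)
    if actual != expected_environment:
        raise RuntimeError(f"Expected a {expected_environment} key, got a {actual} key")
    return api_key
```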

OAuth 2.0 for User-Scoped Access

If your application acts on behalf of end users (e.g., a multi-tenant SaaS), use OAuth 2.0 with the authorisation code flow. This provides user-level audit trails and granular permission scoping.

POST /oauth/token
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=AUTH_CODE
&client_id=YOUR_CLIENT_ID
&client_secret=YOUR_CLIENT_SECRET
&redirect_uri=https://yourapp.com/callback

Access tokens expire after 1 hour. Use the refresh token to obtain new access tokens without re-authenticating.
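
A minimal sketch of the bookkeeping this implies: track when the token was issued and refresh shortly before the one-hour expiry. It assumes the token endpoint accepts the standard OAuth 2.0 `refresh_token` grant (RFC 6749, section 6) with the same client credentials as the code exchange above; the one-minute safety margin and all names are illustrative choices, not documented behaviour.

```python
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

TOKEN_LIFETIME_SECONDS = 3600  # access tokens expire after 1 hour
REFRESH_MARGIN_SECONDS = 60    # assumption: refresh one minute early


@dataclass
class TokenState:
    access_token: str
    refresh_token: str
    issued_at: float  # time.time() value when the token was obtained

    def needs_refresh(self, now: Optional[float] = None) -> bool:
        """True once the token is within the safety margin of expiry."""
        current = time.time() if now is None else now
        return current >= self.issued_at + TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS


def refresh_request_body(state: TokenState, client_id: str, client_secret: str) -> str:
    """Form-encoded body for a standard refresh_token grant."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": state.refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })
```

The body would be sent with `Content-Type: application/x-www-form-urlencoded`, as in the code exchange shown above.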

Rate Limits

Rate limits are enforced per API key, measured in requests per minute:

| Plan | Rate Limit | Burst Allowance | Concurrent Uploads |
|---|---|---|---|
| Starter | 100 req/min | 150 req/min (30s window) | 5 |
| Business | 500 req/min | 750 req/min (30s window) | 25 |
| Enterprise | Unlimited | Unlimited | Unlimited |

When you exceed the rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating how many seconds to wait. See pricing for plan details.
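
Staying under the limit on the client side avoids most 429 responses in the first place. Here is a sketch of a rolling-window limiter, assuming the Starter figure of 100 requests per minute as the default; the class is illustrative and not part of any CheckFile SDK. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of recently sent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Discard timestamps that have left the window
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self.clock())
```

In use, a caller would sleep for `limiter.wait_time()` before each request and call `limiter.record()` after sending it. A limiter like this complements, rather than replaces, honouring `Retry-After` when a 429 does arrive.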

Transport Security

All traffic is encrypted with TLS 1.3. The API rejects connections using TLS 1.2 or earlier. Certificate pinning is available for Enterprise customers. All uploaded documents are encrypted at rest using AES-256 and automatically purged after the retention period configured in your account settings.
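
Since the API rejects anything older than TLS 1.3, it can help to enforce the same floor on the client so a misconfigured runtime fails fast with a clear handshake error. A sketch using only the Python standard library (the `requests` library needs a custom transport adapter to accept an `SSLContext`, which is out of scope here); the function names are illustrative.

```python
import ssl
import urllib.request


def tls13_context() -> ssl.SSLContext:
    """Default certificate verification, with TLS 1.3 as the minimum version."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def open_with_tls13(url: str, api_key: str, timeout: float = 30.0):
    """Open a URL; the handshake fails if TLS 1.3 cannot be negotiated."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    return urllib.request.urlopen(request, timeout=timeout, context=tls13_context())
```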

The Privacy Act 1988 (Cth) and Australian Privacy Principles require that document processing systems implement appropriate technical measures including encryption of personal information in transit and at rest (OAIC guidance). For APRA-regulated entities, CPS 234 imposes additional information security requirements on systems processing financial documents. The Australian Cyber Security Centre (ACSC) provides the Information Security Manual (ISM) as a complementary framework for transport and storage encryption. Read more about our security practices.

Core Endpoints

The API is organised around six primary endpoints:

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/files | Upload a single document for validation |
| POST | /api/v1/files/batch | Upload multiple files as a dossier |
| GET | /api/v1/files/{id}/status | Check processing status of an upload |
| GET | /api/v1/files/{id}/results | Retrieve validation results |
| POST | /api/v1/rules | Configure custom business rules |
| GET | /api/v1/webhooks | List registered webhook endpoints |

All endpoints accept and return JSON (except file upload endpoints, which accept multipart/form-data). The base URL is https://api.checkfile.ai. API versioning is path-based; the current stable version is v1.

Upload and Validate -- Step by Step

The most common workflow is: upload documents, wait for processing, retrieve results. Here is the complete flow in both Python and Node.js.

Python (requests)

import os
import time
from contextlib import ExitStack

import requests

# Read the key from the environment rather than hard-coding it in source
API_KEY = os.environ["CHECKFILE_API_KEY"]
BASE_URL = "https://api.checkfile.ai/v1"
HEADERS = {"X-API-Key": API_KEY}
PATHS = ["contract.pdf", "agreement.pdf", "asic_extract.pdf"]

# Step 1: Upload a batch of documents as a dossier
with ExitStack() as stack:
    # ExitStack closes every file handle, even if the upload raises
    files = [
        ("files", (path, stack.enter_context(open(path, "rb")), "application/pdf"))
        for path in PATHS
    ]
    response = requests.post(
        f"{BASE_URL}/files/batch",
        headers=HEADERS,
        files=files,
        data={"rule_set": "equipment-leasing"},
    )
response.raise_for_status()
file_id = response.json()["id"]
print(f"Dossier uploaded: {file_id}")

# Step 2: Poll for completion, giving up after 2 minutes
deadline = time.monotonic() + 120
while True:
    status_resp = requests.get(
        f"{BASE_URL}/files/{file_id}/status",
        headers=HEADERS
    )
    status_resp.raise_for_status()  # Surface HTTP errors instead of looping on them
    status = status_resp.json()["status"]
    if status in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Dossier {file_id} still processing after 120 seconds")
    time.sleep(2)  # Poll every 2 seconds

# Step 3: Retrieve results
results_resp = requests.get(
    f"{BASE_URL}/files/{file_id}/results",
    headers=HEADERS
)
results_resp.raise_for_status()
results = results_resp.json()

for doc in results["documents"]:
    print(f"{doc['filename']}: {doc['verdict']} (confidence: {doc['confidence']})")
    for alert in doc["alerts"]:
        print(f"  - {alert['severity']}: {alert['message']}")

Node.js (fetch + FormData)

import fs from 'node:fs';

const API_KEY = process.env.CHECKFILE_API_KEY;
const BASE_URL = 'https://api.checkfile.ai/v1';

// Step 1: Upload a batch of documents
const form = new FormData();
form.append('files', new Blob([fs.readFileSync('contract.pdf')]), 'contract.pdf');
form.append('files', new Blob([fs.readFileSync('agreement.pdf')]), 'agreement.pdf');
form.append('files', new Blob([fs.readFileSync('asic_extract.pdf')]), 'asic_extract.pdf');
form.append('rule_set', 'equipment-leasing');

const uploadRes = await fetch(`${BASE_URL}/files/batch`, {
  method: 'POST',
  headers: { 'X-API-Key': API_KEY },
  body: form,
});
if (!uploadRes.ok) {
  throw new Error(`Upload failed with HTTP ${uploadRes.status}`);
}
const { id: fileId } = await uploadRes.json();
console.log(`Dossier uploaded: ${fileId}`);

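// Step 2: Poll for completion, giving up after 2 minutes
const deadline = Date.now() + 120_000;
let status = 'processing';
while (status !== 'completed' && status !== 'failed') {
  if (Date.now() > deadline) {
    throw new Error(`Dossier ${fileId} still processing after 120 seconds`);
  }
  await new Promise((r) => setTimeout(r, 2000));
  const statusRes = await fetch(`${BASE_URL}/files/${fileId}/status`, {
    headers: { 'X-API-Key': API_KEY },
  });
  if (!statusRes.ok) {
    throw new Error(`Status check failed with HTTP ${statusRes.status}`);
  }
  ({ status } = await statusRes.json());
}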

// Step 3: Retrieve results
const results = await fetch(`${BASE_URL}/files/${fileId}/results`, {
  headers: { 'X-API-Key': API_KEY },
}).then((r) => r.json());

for (const doc of results.documents) {
  console.log(`${doc.filename}: ${doc.verdict} (confidence: ${doc.confidence})`);
  for (const alert of doc.alerts) {
    console.log(`  - ${alert.severity}: ${alert.message}`);
  }
}

Both examples follow the same three-step pattern: upload, poll, retrieve. For production systems, replace the polling loop with a webhook listener (covered in the next section).

Webhook Payloads

Webhooks eliminate the need for polling. Register a webhook URL in the dashboard or via the API, and CheckFile will POST a signed JSON payload to your endpoint when processing completes or an alert is detected.

Validation Complete Event

{
  "event": "validation.completed",
  "timestamp": "2026-02-09T14:32:08Z",
  "data": {
    "file_id": "dossier_8f3a2b1c",
    "rule_set": "equipment-leasing",
    "verdict": "approved",
    "processing_time_ms": 4280,
    "documents": [
      {
        "filename": "contract.pdf",
        "type": "contract",
        "verdict": "valid",
        "confidence": 0.97,
        "alerts": []
      },
      {
        "filename": "agreement.pdf",
        "type": "agreement",
        "verdict": "valid",
        "confidence": 0.95,
        "alerts": []
      },
      {
        "filename": "asic_extract.pdf",
        "type": "company_registration",
        "verdict": "valid",
        "confidence": 0.99,
        "alerts": []
      }
    ]
  }
}
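
A receiver typically needs only a few fields from this payload. The sketch below reduces a `validation.completed` event to the dossier verdict, the files that raised alerts, and the lowest confidence score. Field names follow the sample payload above; the function name and the shape of the returned dictionary are illustrative.

```python
import json


def summarise_event(raw_body: bytes) -> dict:
    """Reduce a validation.completed payload to the fields a caller usually acts on."""
    event = json.loads(raw_body)
    if event.get("event") != "validation.completed":
        raise ValueError(f"Unhandled event type: {event.get('event')!r}")
    data = event["data"]
    documents = data["documents"]
    return {
        "file_id": data["file_id"],
        "verdict": data["verdict"],
        # Files with at least one alert attached
        "flagged_documents": [doc["filename"] for doc in documents if doc["alerts"]],
        # None when the dossier contains no documents
        "lowest_confidence": min((doc["confidence"] for doc in documents), default=None),
    }
```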

Verifying Webhook Signatures

Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature. Verify it against your webhook secret to ensure the payload was not tampered with:

import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
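
To show where the check sits in a request handler, here is a framework-agnostic sketch. The important detail is that the signature is computed over the raw request bytes before any JSON parsing, because re-serialising parsed JSON can change whitespace or key order and break the comparison. The `sha256=` prefix matches the verification function above; the `sign` helper, the `handle_webhook` name, and the choice of 401 and 400 status codes are assumptions.

```python
import hashlib
import hmac
import json


def sign(payload: bytes, secret: str) -> str:
    """Produce a header value in the sha256=<hex> form used for comparison."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def handle_webhook(raw_body: bytes, signature_header: str, secret: str):
    """Return (http_status, event). Verify first, parse second."""
    expected = sign(raw_body, secret).encode()
    received = (signature_header or "").encode()
    # Constant-time comparison on bytes avoids timing leaks
    if not hmac.compare_digest(expected, received):
        return 401, None
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, None
    return 200, event
```

The `sign` helper is also convenient in tests, where it lets you generate correctly signed fixtures with a throwaway secret.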

Error Handling Best Practices

Standard HTTP status codes map to distinct error classes.

| Code | Meaning | Cause | Recommended Action |
|---|---|---|---|
| 400 | Bad Request | Malformed request body, unsupported file type, missing required field | Fix the request. Check the error.details field for specifics. |
| 401 | Unauthorised | Invalid or missing API key | Verify your API key. Check for whitespace or truncation. |
| 413 | Payload Too Large | File exceeds the 50 MB per-document limit | Compress the file or split multi-page documents. |
| 429 | Too Many Requests | Rate limit exceeded | Back off using the Retry-After header value. |
| 500 | Internal Server Error | Unexpected server-side failure | Retry with exponential backoff. If persistent, contact support. |
Retry Strategy with Exponential Backoff

For transient errors (429, 500, 502, 503), implement exponential backoff with jitter:

import time
import random
import requests

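def api_request_with_retry(method, url, max_retries=5, **kwargs):
    """Send a request, retrying transient failures (429, 500, 502, 503).

    Pass upload bodies as bytes rather than open file handles: a handle
    consumed by the first attempt is not rewound for the next one.
    """
    for attempt in range(max_retries):
        response = requests.request(method, url, **kwargs)

        if response.status_code < 400:
            return response

        if response.status_code in (429, 500, 502, 503):
            base_delay = min(2 ** attempt, 60)  # Cap at 60 seconds
            jitter = random.uniform(0, base_delay * 0.5)
            delay = base_delay + jitter

            if response.status_code == 429:
                retry_after = response.headers.get("Retry-After")
                if retry_after:
                    try:
                        delay = float(retry_after)
                    except ValueError:
                        pass  # Not a number of seconds; keep the backoff delay

            # No point sleeping after the final attempt
            if attempt < max_retries - 1:
                time.sleep(delay)
            continue

        # Non-retryable error
        response.raise_for_status()

    raise RuntimeError(f"Max retries exceeded for {url}")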

Key principles: never retry 400-level client errors (except 429), always respect the Retry-After header, and add jitter to avoid thundering herd problems when multiple clients retry simultaneously.

Performance Benchmarks

Processing times depend on document count, complexity, and the rule set applied:

| Scenario | Document Count | Rule Set | Median Processing Time | P95 Processing Time |
|---|---|---|---|---|
| Single identity document | 1 | Default | 2.1s | 4.8s |
| Single contract (multi-page) | 1 | Default | 3.4s | 6.2s |
| Standard dossier | 8-12 | Equipment leasing | 12s | 28s |
| Complex dossier | 15-20 | Full compliance | 22s | 45s |
| Batch (100 dossiers) | 800-1,200 | Equipment leasing | 8 min | 14 min |
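
For capacity planning, the batch row can be turned into a rough throughput figure. The arithmetic below uses only the numbers in that row; extrapolating to a larger backlog assumes throughput scales linearly, which this guide does not claim, so treat the estimate as a starting point rather than a guarantee.

```python
def docs_per_minute(doc_count: int, minutes: float) -> float:
    """Average throughput implied by a document count and a wall-clock time."""
    return doc_count / minutes


def estimated_minutes(doc_count: int, throughput: float) -> float:
    """Time to clear a backlog at a given throughput, assuming linear scaling."""
    return doc_count / throughput


# Batch row: 800-1,200 documents, 8 min median, 14 min P95
median_range = (docs_per_minute(800, 8), docs_per_minute(1200, 8))
p95_range = (docs_per_minute(800, 14), docs_per_minute(1200, 14))
print(f"median: {median_range[0]:.0f}-{median_range[1]:.0f} documents per minute")
print(f"P95:    {p95_range[0]:.0f}-{p95_range[1]:.0f} documents per minute")

# Conservative plan for a 5,000-document backlog: use the slowest P95 figure
print(f"backlog: {estimated_minutes(5000, p95_range[0]):.1f} minutes")
```

At the median this works out to 100 to 150 documents per minute, and roughly 57 to 86 at P95.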

Get Started

The fastest path from zero to a working integration:

  1. Create an account and generate a test API key (ck_test_ prefix) from the dashboard.
  2. Upload a test document using the curl example above or the code samples in this guide.
  3. Register a webhook to receive results asynchronously.
  4. Configure a rule set that matches your business requirements.
  5. Switch to production by replacing your test key with a live key (ck_live_ prefix).

For Australian organisations, the API supports verification of documents against the Australian Business Register (ABR) and ASIC company records, enabling automated ABN/ACN cross-referencing as part of the validation workflow.

Full endpoint documentation, SDKs (Python, Node.js, Go), and an interactive API explorer are available at docs.checkfile.ai. If you have questions about which plan fits your volume, see pricing or contact the engineering team directly.

For a comprehensive overview, see our document verification complete guide.

Frequently Asked Questions

How does the CheckFile API handle authentication for server-to-server integrations?

For server-to-server integrations, pass your API key in the X-API-Key header with every request. API keys are scoped to your organisation and should be generated separately for each environment: development keys carry a ck_test_ prefix and hit the sandbox, while production keys carry a ck_live_ prefix. Rotate keys every 90 days using the dual-key support to avoid downtime during rotation. Never store API keys in source code or in a committed .env file -- use a secrets manager such as HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault.

What is the difference between polling and webhooks for retrieving validation results?

Polling means your application repeatedly queries the status endpoint until processing completes, which introduces unnecessary latency and API request overhead. Webhooks invert the flow: CheckFile posts a signed JSON payload to your registered endpoint as soon as processing finishes, eliminating polling entirely. For production systems processing more than a few documents per minute, webhooks are the recommended approach.

How do I verify that a webhook payload has not been tampered with?

Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature computed using your webhook secret. To verify the payload, compute the expected signature by running HMAC-SHA256 on the raw request body with your secret, then use a constant-time comparison function to check it against the header value. Never compare signatures with a standard equality operator, as that is vulnerable to timing attacks.

What file size limits apply and how can I optimise upload performance?

Individual documents are accepted up to 50 MB per file. For latency-sensitive applications, reducing PDF file sizes through compression before upload significantly improves throughput without affecting validation accuracy. For large batches, the batch endpoint accepts up to 20 files per request and delivers 3 to 4 times better performance than equivalent individual uploads.
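
Both limits can be enforced before any bytes leave the client. The following sketch groups files into batches of at most 20 and sets aside anything over 50 MB; it assumes "MB" means binary megabytes (1,048,576 bytes), which this guide does not specify, and the function name is illustrative.

```python
from typing import Iterable, List, Tuple

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB per-document limit (binary MB assumed)
MAX_FILES_PER_BATCH = 20           # batch endpoint limit


def plan_batches(files: Iterable[Tuple[str, int]]) -> Tuple[List[List[str]], List[str]]:
    """Split (name, size_in_bytes) pairs into batches; return (batches, rejected)."""
    accepted: List[str] = []
    rejected: List[str] = []
    for name, size in files:
        if size > MAX_FILE_BYTES:
            rejected.append(name)  # needs compression or splitting first
        else:
            accepted.append(name)
    batches = [
        accepted[i:i + MAX_FILES_PER_BATCH]
        for i in range(0, len(accepted), MAX_FILES_PER_BATCH)
    ]
    return batches, rejected
```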

What retry strategy should I implement for transient API errors?

Implement exponential backoff with jitter for transient errors including 429 rate limit responses, 500 internal server errors, 502 bad gateway, and 503 service unavailable. Cap the base delay at 60 seconds, add a random jitter of up to 50 percent of the base delay to prevent thundering herd problems, and respect the Retry-After header value when it is present in a 429 response.


The information presented in this article is provided for informational purposes only and does not constitute legal advice. Regulatory obligations vary by state and territory and by organisation size. Consult a legal professional for analysis specific to your situation.
