Document Validation API: Developer Guide
Integrate document validation into your application: REST API, webhooks, code examples in Python and Node.js. Developer guide with authentication and error handling.

This guide covers everything you need to integrate automated document validation into your application -- from authentication to webhook handling. Whether you are building a client onboarding flow, a compliance pipeline, or a back-office automation tool, the CheckFile API gives you programmatic access to the same AI-powered validation engine used in the platform. You will find architecture decisions, endpoint references, code samples in Python and Node.js, webhook payloads, error handling strategies, and integration patterns that scale from prototype to production.
Architecture Overview
The CheckFile API follows a standard asynchronous processing model: documents are uploaded via REST and queued for AI analysis, and results are delivered through polling or webhook callbacks. Median processing time for a standard dossier of 8-12 documents is 12 seconds; P95 is 28 seconds. Because upload and processing are decoupled, documents are processed at scale without blocking your application.
Where AI systems used in document processing for financial applications qualify as high-risk, the EU AI Act (Regulation 2024/1689, Art. 13) requires them to operate transparently enough for deployers to interpret and appropriately use their outputs -- a requirement the CheckFile API addresses through its deterministic rule engine layer, which produces an auditable decision trace for every validation (EU AI Act, EUR-Lex).
```
                                    +-------------------+
                                    |    Results API    |
                                    |  GET /files/{id}  |
                                    +---------^---------+
                                              |
Client App                                    | Poll or fetch
    |                                         |
    |  POST /v1/files                 +-------+--------+
    +-------------------------------->|   Upload API   |
    |                                 +-------+--------+
    |                                         |
    |                                         v
    |                                +--------+--------+
    |                                | Processing Queue|
    |                                | (AI validation) |
    |                                +--------+--------+
    |                                         |
    |           Webhook callback              |
    |<----------------------------------------+
    |  POST your-endpoint
```
Three key design decisions shape this architecture:
- Asynchronous by default. Document validation involves OCR, fraud detection, cross-referencing, and rule evaluation. These operations take 2-15 seconds depending on document complexity. The API accepts uploads immediately and processes them in the background.
- Dual delivery. You can poll the status endpoint or register a webhook. Polling works for simple integrations; webhooks are the recommended approach for production systems handling more than a few documents per minute.
- Idempotent uploads. Each upload returns a unique file_id. Re-uploading the same document with the same idempotency key returns the existing result instead of reprocessing, saving both time and API credits.
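The guide does not say how the idempotency key is transmitted or how it should be chosen. The sketch below is one option, not the documented mechanism: it derives a stable key from a SHA-256 hash of the rule set and the file bytes, so re-sending identical content always produces the same key. The Idempotency-Key header name and both helper functions are hypothetical; check the API reference for the real field.

```python
import hashlib


def idempotency_key(content: bytes, rule_set: str) -> str:
    """Derive a stable key: identical bytes + identical rule set -> identical key."""
    digest = hashlib.sha256()
    digest.update(rule_set.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", b"c") and ("a", b"bc") hash differently
    digest.update(content)
    return digest.hexdigest()


def upload_headers(api_key: str, content: bytes, rule_set: str) -> dict:
    """Build request headers; "Idempotency-Key" is a hypothetical header name."""
    return {
        "X-API-Key": api_key,
        "Idempotency-Key": idempotency_key(content, rule_set),
    }
```

Hashing content rather than generating a random UUID per request means a retry after a network timeout reuses the same key automatically, which is the case idempotency is meant to cover.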
Authentication and Security
All API requests require authentication. The API supports two authentication methods depending on your use case.
API Key Authentication
For server-to-server integrations, pass your API key in the X-API-Key header:
```bash
curl -H "X-API-Key: ck_live_abc123..." \
  https://api.checkfile.ai/v1/files
```
API keys are scoped to your organization. You can generate multiple keys in the dashboard -- one per environment (development, staging, production) is the recommended practice. Keys prefixed with ck_test_ hit the sandbox environment; keys prefixed with ck_live_ hit production.
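Because the prefix determines which environment a key reaches, a startup guard can stop a sandbox key from reaching production code paths (or the reverse). This is a minimal sketch assuming the key lives in an environment variable named CHECKFILE_API_KEY, the name the Node.js example in this guide uses; key_environment and load_api_key are hypothetical helpers.

```python
import os

# Prefixes documented in this guide.
KEY_ENVIRONMENTS = {"ck_test_": "sandbox", "ck_live_": "production"}


def key_environment(api_key: str) -> str:
    """Return the environment an API key targets, based on its prefix."""
    key = api_key.strip()  # stray whitespace is a common cause of 401 errors
    for prefix, environment in KEY_ENVIRONMENTS.items():
        if key.startswith(prefix):
            return environment
    raise ValueError("API key must start with ck_test_ or ck_live_")


def load_api_key(expected_environment: str, var: str = "CHECKFILE_API_KEY") -> str:
    """Read the key from the environment and refuse a key for the wrong environment."""
    key = os.environ[var].strip()
    actual = key_environment(key)
    if actual != expected_environment:
        raise RuntimeError(f"{var} targets {actual}, expected {expected_environment}")
    return key
```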
OAuth 2.0 for User-Scoped Access
If your application acts on behalf of end users (e.g., a multi-tenant SaaS), use OAuth 2.0 with the authorization code flow. This provides user-level audit trails and granular permission scoping.
```http
POST /oauth/token
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=AUTH_CODE
&client_id=YOUR_CLIENT_ID
&client_secret=YOUR_CLIENT_SECRET
&redirect_uri=https://yourapp.com/callback
```
Access tokens expire after 1 hour. Use the refresh token to obtain new access tokens without re-authenticating.
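The guide does not show the refresh request itself. The sketch below assumes the token endpoint accepts the standard OAuth 2.0 refresh_token grant (RFC 6749, section 6) at the same /oauth/token path, on the host https://api.checkfile.ai (an assumption; the host is not stated above). It tracks expiry so tokens are refreshed a little before the 1-hour lifetime runs out. TokenState and build_refresh_request are hypothetical names.

```python
import urllib.parse
import urllib.request
from dataclasses import dataclass

TOKEN_URL = "https://api.checkfile.ai/oauth/token"  # assumption: host not stated in the guide


@dataclass
class TokenState:
    access_token: str
    refresh_token: str
    expires_at: float  # epoch seconds; the guide states a 1-hour lifetime

    def needs_refresh(self, now: float, margin: float = 60.0) -> bool:
        """Refresh slightly early so in-flight requests never carry an expired token."""
        return now >= self.expires_at - margin


def build_refresh_request(state: TokenState, client_id: str, client_secret: str) -> urllib.request.Request:
    """Standard OAuth 2.0 refresh_token grant with client credentials in the form body."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": state.refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(build_refresh_request(...)) should return a JSON body containing a new access token; whether a new refresh token is also issued is provider-specific.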
Rate Limits
Rate limits are enforced per API key, measured in requests per minute:
| Plan | Rate Limit | Burst Allowance | Concurrent Uploads |
|---|---|---|---|
| Starter | 100 req/min | 150 req/min (30s window) | 5 |
| Business | 500 req/min | 750 req/min (30s window) | 25 |
| Enterprise | Unlimited | Unlimited | Unlimited |
When you exceed the rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating how many seconds to wait. See pricing for plan details.
Transport Security
All traffic is encrypted with TLS 1.3. The API rejects connections using TLS 1.2 or earlier. Certificate pinning is available for Enterprise customers. All uploaded documents are encrypted at rest using AES-256 and automatically purged after the retention period configured in your account settings.
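The server already refuses TLS 1.2, so pinning the client's minimum version is optional. It does make a misconfigured TLS-intercepting proxy fail loudly on your side instead of silently downgrading. A minimal sketch using only Python's standard ssl and urllib modules; open_with_tls13 is a hypothetical helper.

```python
import ssl
import urllib.request


def tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname verification enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def open_with_tls13(url: str, api_key: str):
    """Issue an authenticated GET over a TLS 1.3-only connection."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    return urllib.request.urlopen(request, context=tls13_context(), timeout=30)
```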
GDPR (Regulation 2016/679, Art. 32) requires controllers and processors to implement appropriate technical and organisational measures to secure personal data, and names encryption as one such measure -- covered here by TLS 1.3 in transit and AES-256 at rest, with configurable automatic purging (EUR-Lex GDPR). Read more about our security practices.
Core Endpoints
The API is organized around six primary endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/files | Upload a single document for validation |
| POST | /api/v1/files/batch | Upload multiple files as a dossier |
| GET | /api/v1/files/{id}/status | Check processing status of an upload |
| GET | /api/v1/files/{id}/results | Retrieve validation results |
| POST | /api/v1/rules | Configure custom business rules |
| GET | /api/v1/webhooks | List registered webhook endpoints |
All endpoints accept and return JSON (except file upload endpoints, which accept multipart/form-data). The base URL is https://api.checkfile.ai. API versioning is path-based; the current stable version is v1.
Upload and Validate -- Step by Step
The most common workflow is: upload documents, wait for processing, retrieve results. Here is the complete flow in both Python and Node.js.
Python (requests)
```python
import os
import time
from contextlib import ExitStack

import requests

API_KEY = os.environ["CHECKFILE_API_KEY"]  # keep keys out of source code
BASE_URL = "https://api.checkfile.ai/v1"
HEADERS = {"X-API-Key": API_KEY}
FILENAMES = ["contract.pdf", "agreement.pdf", "kbis.pdf"]

# Step 1: Upload a batch of documents as a dossier (ExitStack closes every file)
with ExitStack() as stack:
    files = [
        ("files", (name, stack.enter_context(open(name, "rb")), "application/pdf"))
        for name in FILENAMES
    ]
    response = requests.post(
        f"{BASE_URL}/files/batch",
        headers=HEADERS,
        files=files,
        data={"rule_set": "equipment-leasing"},
    )
response.raise_for_status()
file_id = response.json()["id"]
print(f"Dossier uploaded: {file_id}")

# Step 2: Poll for completion
while True:
    status_resp = requests.get(f"{BASE_URL}/files/{file_id}/status", headers=HEADERS)
    status_resp.raise_for_status()
    status = status_resp.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(2)  # Poll every 2 seconds

# Step 3: Retrieve results
results_resp = requests.get(f"{BASE_URL}/files/{file_id}/results", headers=HEADERS)
results_resp.raise_for_status()
results = results_resp.json()
for doc in results["documents"]:
    print(f"{doc['filename']}: {doc['verdict']} (confidence: {doc['confidence']})")
    for alert in doc["alerts"]:
        print(f"  - {alert['severity']}: {alert['message']}")
```
Node.js (fetch + FormData)
```javascript
import fs from 'node:fs';

const API_KEY = process.env.CHECKFILE_API_KEY;
const BASE_URL = 'https://api.checkfile.ai/v1';
const headers = { 'X-API-Key': API_KEY };

// Step 1: Upload a batch of documents (Node.js 18+ provides fetch, FormData and Blob globally)
const form = new FormData();
for (const name of ['contract.pdf', 'agreement.pdf', 'kbis.pdf']) {
  form.append('files', new Blob([fs.readFileSync(name)], { type: 'application/pdf' }), name);
}
form.append('rule_set', 'equipment-leasing');

const uploadRes = await fetch(`${BASE_URL}/files/batch`, { method: 'POST', headers, body: form });
if (!uploadRes.ok) throw new Error(`Upload failed: HTTP ${uploadRes.status}`);
const { id: fileId } = await uploadRes.json();
console.log(`Dossier uploaded: ${fileId}`);

// Step 2: Poll for completion every 2 seconds
let status = 'processing';
while (status !== 'completed' && status !== 'failed') {
  await new Promise((r) => setTimeout(r, 2000));
  const statusRes = await fetch(`${BASE_URL}/files/${fileId}/status`, { headers });
  if (!statusRes.ok) throw new Error(`Status check failed: HTTP ${statusRes.status}`);
  ({ status } = await statusRes.json());
}

// Step 3: Retrieve results
const resultsRes = await fetch(`${BASE_URL}/files/${fileId}/results`, { headers });
if (!resultsRes.ok) throw new Error(`Results fetch failed: HTTP ${resultsRes.status}`);
const results = await resultsRes.json();
for (const doc of results.documents) {
  console.log(`${doc.filename}: ${doc.verdict} (confidence: ${doc.confidence})`);
  for (const alert of doc.alerts) {
    console.log(`  - ${alert.severity}: ${alert.message}`);
  }
}
```
Both examples follow the same three-step pattern: upload, poll, retrieve. For production systems, replace the polling loop with a webhook listener (covered in the next section).
Webhook Payloads
Webhooks eliminate the need for polling. Register a webhook URL in the dashboard or via the API, and CheckFile will POST a signed JSON payload to your endpoint when processing completes or an alert is detected.
Validation Complete Event
```json
{
  "event": "validation.completed",
  "timestamp": "2026-02-09T14:32:08Z",
  "data": {
    "file_id": "dossier_8f3a2b1c",
    "rule_set": "equipment-leasing",
    "verdict": "approved",
    "processing_time_ms": 4280,
    "documents": [
      {
        "filename": "contract.pdf",
        "type": "contract",
        "verdict": "valid",
        "confidence": 0.97,
        "alerts": []
      },
      {
        "filename": "agreement.pdf",
        "type": "agreement",
        "verdict": "valid",
        "confidence": 0.95,
        "alerts": []
      },
      {
        "filename": "kbis.pdf",
        "type": "company_registration",
        "verdict": "valid",
        "confidence": 0.99,
        "alerts": []
      }
    ]
  }
}
```
Alert Detected Event
When the AI detects a potential issue during validation, an alert event fires immediately -- before the full validation completes. This lets your application react to high-severity issues in real time.
```json
{
  "event": "validation.alert",
  "timestamp": "2026-02-09T14:32:06Z",
  "data": {
    "file_id": "dossier_8f3a2b1c",
    "document": {
      "filename": "kbis.pdf",
      "type": "company_registration"
    },
    "alert": {
      "code": "DOC_EXPIRED",
      "severity": "high",
      "message": "Company registration certificate expired on 2025-11-30",
      "field": "validity_date",
      "extracted_value": "2025-11-30",
      "expected": "Document must be less than 3 months old"
    }
  }
}
```
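Since the two event types arrive at the same endpoint, a receiver typically routes on the "event" field. The sketch below is one way to do that with the standard library only; only the event names and payload fields come from the examples above, while WebhookRouter, on_alert, on_completed and the in-memory stores are hypothetical. It assumes the body has already passed signature verification.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class WebhookRouter:
    """Route verified CheckFile webhook bodies to handlers by their "event" field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event type."""
        def register(func: Handler) -> Handler:
            self._handlers[event] = func
            return func
        return register

    def dispatch(self, raw_body: bytes) -> bool:
        """Parse an already signature-verified body; return False for unhandled events."""
        payload = json.loads(raw_body)
        handler = self._handlers.get(payload.get("event"))
        if handler is None:
            return False  # acknowledge but ignore event types you do not handle
        handler(payload["data"])
        return True


router = WebhookRouter()
high_alerts: List[str] = []
verdicts: Dict[str, str] = {}


@router.on("validation.alert")
def on_alert(data: dict) -> None:
    # Alerts can arrive before validation.completed for the same file_id.
    alert = data["alert"]
    if alert["severity"] == "high":
        high_alerts.append(f"{data['file_id']}: {alert['code']}")


@router.on("validation.completed")
def on_completed(data: dict) -> None:
    verdicts[data["file_id"]] = data["verdict"]
```

Returning False for unknown events lets the HTTP layer still answer with a success status, so new event types added later do not break an older receiver.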
Verifying Webhook Signatures
Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature. Verify it against your webhook secret to ensure the payload was not tampered with:
```python
import hashlib
import hmac


def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
```
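A common mistake is to verify a re-serialized JSON body instead of the exact bytes received. The sketch below shows why that fails: parsing and re-encoding can change whitespace, which changes the HMAC. The secret and body are made-up test values, and verify_signature is repeated from the example above so the block runs on its own.

```python
import hashlib
import hmac
import json


def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)


secret = "whsec_example"  # hypothetical webhook secret
# Raw bytes as received (note the double space after the first comma).
raw_body = b'{"event": "validation.completed",  "data": {"file_id": "dossier_8f3a2b1c"}}'
signature = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()

ok_raw = verify_signature(raw_body, signature, secret)  # True: same bytes that were signed
reserialized = json.dumps(json.loads(raw_body)).encode()  # whitespace normalized
ok_reserialized = verify_signature(reserialized, signature, secret)  # False: bytes differ
```

In a web framework, read the body as bytes (for example, the raw request stream) before any JSON parsing, verify it, and only then decode.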
Error Handling Best Practices
Standard HTTP status codes map to distinct error classes. The most common production errors are 429 (rate limit), 413 (file size), and 401 (authentication), each with a distinct remediation path.
The OWASP API Security Top 10 (2023) ranks broken authentication (API2:2023) and unrestricted resource consumption, which covers missing rate limiting (API4:2023), among the most critical API risks -- risks the CheckFile API mitigates through per-key rate limits with Retry-After headers and HMAC-SHA256 webhook signature verification (OWASP API Security).
| Code | Meaning | Cause | Recommended Action |
|---|---|---|---|
| 400 | Bad Request | Malformed request body, unsupported file type, missing required field | Fix the request. Check the error.details field for specifics. |
| 401 | Unauthorized | Invalid or missing API key | Verify your API key. Check for whitespace or truncation. |
| 413 | Payload Too Large | File exceeds the 50 MB per-document limit | Compress the file or split multi-page documents. |
| 429 | Too Many Requests | Rate limit exceeded | Back off using the Retry-After header value. |
| 500 | Internal Server Error | Unexpected server-side failure | Retry with exponential backoff. If persistent, contact support. |
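A 413 costs a full upload before it is reported, so checking size locally first saves bandwidth. A minimal pre-flight sketch; the guide does not say whether "50 MB" means 10^6 or 2^20 bytes, so the limit below takes the smaller, decimal reading as a conservative assumption. check_upload_size and DocumentTooLarge are hypothetical names.

```python
import os

MAX_DOCUMENT_BYTES = 50_000_000  # conservative reading of "50 MB" (decimal megabytes)


class DocumentTooLarge(ValueError):
    """Raised locally for files the API would reject with 413."""


def check_upload_size(path: str, limit: int = MAX_DOCUMENT_BYTES) -> int:
    """Raise before uploading a file that exceeds the per-document limit; return its size."""
    size = os.path.getsize(path)
    if size > limit:
        raise DocumentTooLarge(
            f"{os.path.basename(path)} is {size / 1_000_000:.1f} MB; "
            f"limit is {limit / 1_000_000:.0f} MB"
        )
    return size
```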
Retry Strategy with Exponential Backoff
For transient errors (429, 500, 502, 503), implement exponential backoff with jitter:
```python
import random
import time

import requests


def api_request_with_retry(method, url, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        response = requests.request(method, url, **kwargs)
        if response.status_code < 400:
            return response
        if response.status_code in (429, 500, 502, 503):
            base_delay = min(2 ** attempt, 60)  # Cap at 60 seconds
            jitter = random.uniform(0, base_delay * 0.5)
            delay = base_delay + jitter
            if response.status_code == 429:
                retry_after = response.headers.get("Retry-After")
                if retry_after:
                    delay = float(retry_after)
            time.sleep(delay)
            continue
        # Non-retryable error
        response.raise_for_status()
    raise Exception(f"Max retries exceeded for {url}")
```
Key principles: never retry 400-level client errors (except 429), always respect the Retry-After header, and add jitter to avoid thundering herd problems when multiple clients retry simultaneously.
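The function above calls float(retry_after), which matches this guide's statement that Retry-After carries a number of seconds. HTTP also allows the header to carry an HTTP-date, and float() would raise ValueError on that form. If you want to tolerate both, a defensive parser along these lines could replace the float() call; retry_after_seconds is a hypothetical helper, and returning None lets the caller fall back to its computed backoff.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds from a Retry-After value, or None if absent or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "30"
        return float(value)
    try:  # HTTP-date form, e.g. "Mon, 09 Feb 2026 14:32:30 GMT"
        retry_at = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if retry_at.tzinfo is None:  # treat dates without a usable zone as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((retry_at - now).total_seconds(), 0.0)
```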
Integration Patterns
Three patterns cover the majority of integration scenarios. Choose the one that matches your latency and throughput requirements.
Pattern 1: Real-Time Validation (Upload, Poll, Display)
Best for: user-facing flows where the end user waits for results (e.g., onboarding forms).
```
User submits document
        |
        v
POST /v1/files --> returns file_id
        |
        v
GET /v1/files/{id}/status (poll every 2s)
        |
        v
status == "completed"
        |
        v
GET /v1/files/{id}/results --> display to user
```
Typical latency: 3-8 seconds for a single document. Show a progress indicator to the user. If processing exceeds 15 seconds, display a message and offer to notify via email.
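The 15-second threshold can be built into the polling loop itself, so the UI knows when to switch to "we will email you". A sketch under stated assumptions: fetch_status is a hypothetical callable that wraps GET /v1/files/{id}/status and returns the status string (for example, lambda: requests.get(url, headers=HEADERS).json()["status"]); the clock and sleep parameters are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], str],
    soft_deadline: float = 15.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status or until the soft deadline passes.

    Returns "completed" or "failed", or "pending" when the caller should stop
    making the user wait and fall back to an asynchronous notification.
    """
    start = clock()
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() - start >= soft_deadline:
            return "pending"
        sleep(interval)
```

A "pending" result does not cancel processing; the file_id stays valid, so a webhook or a later poll can still deliver the result.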
Pattern 2: Batch Processing (Upload Batch, Webhook Notification)
Best for: back-office workflows processing large volumes (e.g., nightly imports, bulk onboarding).
```
System uploads N dossiers
        |
        v
POST /v1/files/batch (for each dossier)
        |
        v
Processing queue handles all dossiers
        |
        v
Webhook fires for each completed dossier
        |
        v
Your webhook handler updates internal database
```
This pattern decouples submission from processing entirely. Your system fires and forgets, then reacts to webhook events. For very large batches (1,000+ dossiers), stagger uploads at 50 per minute to stay within rate limits and avoid overwhelming your own webhook handler.
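Staggering at 50 per minute means starting one upload every 1.2 seconds. The sketch below paces submissions so that consecutive uploads start at least that far apart, even when an individual upload is slow. upload is a hypothetical callable that performs one POST /v1/files/batch and returns the file_id; clock and sleep are injectable for testing.

```python
import time
from typing import Callable, Iterable, List


def staggered_upload(
    dossiers: Iterable[str],
    upload: Callable[[str], str],
    per_minute: int = 50,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit dossiers at a fixed pace (50/min means one start every 1.2 s); return file_ids."""
    interval = 60.0 / per_minute
    file_ids = []
    next_slot = clock()
    for dossier in dossiers:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        started = clock()
        file_ids.append(upload(dossier))
        next_slot = started + interval  # space start times, not end times
    return file_ids
```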
Pattern 3: CRM Integration (Salesforce, HubSpot, Custom)
Best for: teams that manage client dossiers inside a CRM and want validation status to sync automatically.
```
CRM triggers upload (e.g., deal stage change)
        |
        v
Middleware calls POST /v1/files/batch
        |
        v
Webhook fires on completion
        |
        v
Middleware maps result to CRM fields
        |
        v
CRM record updated (custom fields: validation_status, alerts)
```
The middleware layer (a lightweight serverless function or integration platform like Zapier/Make) translates between the CheckFile API and your CRM's data model. Common CRM field mappings:
| CheckFile Field | CRM Field | Example Value |
|---|---|---|
| verdict | Validation Status | Approved / Requires Review / Rejected |
| documents[].alerts | Validation Notes | "ID expired on 2025-11-30" |
| processing_time_ms | Processing Duration | 4280 |
| confidence | Confidence Score | 0.97 |
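The middleware's mapping step can be a single pure function over the validation.completed payload, which keeps it easy to test. In this sketch only "approved" is a verdict value confirmed by the example payload; "requires_review" and "rejected" are assumed spellings, and unknown values fall back to "Requires Review". The payload carries confidence per document only, so taking the lowest value as the dossier score is a design choice, not documented behavior. to_crm_fields is a hypothetical name.

```python
VERDICT_LABELS = {
    "approved": "Approved",  # seen in the example payload
    "requires_review": "Requires Review",  # assumed value
    "rejected": "Rejected",  # assumed value
}


def to_crm_fields(event: dict) -> dict:
    """Map a validation.completed webhook payload to the CRM fields in the table above."""
    data = event["data"]
    documents = data.get("documents", [])
    notes = [
        f"{doc['filename']}: {alert['message']}"
        for doc in documents
        for alert in doc.get("alerts", [])
    ]
    confidences = [doc["confidence"] for doc in documents if "confidence" in doc]
    return {
        # Unknown verdicts go to manual review rather than being silently approved.
        "Validation Status": VERDICT_LABELS.get(data["verdict"], "Requires Review"),
        "Validation Notes": "; ".join(notes),
        "Processing Duration": data.get("processing_time_ms"),
        # Lowest per-document confidence, so one weak document is not averaged away.
        "Confidence Score": min(confidences) if confidences else None,
    }
```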
Performance Benchmarks
Processing times depend on document count, complexity, and the rule set applied. The following benchmarks were measured on production infrastructure under typical load:
| Scenario | Document Count | Rule Set | Median Processing Time | P95 Processing Time |
|---|---|---|---|---|
| Single identity document | 1 | Default | 2.1s | 4.8s |
| Single contract (multi-page) | 1 | Default | 3.4s | 6.2s |
| Standard dossier | 8-12 | Equipment leasing | 12s | 28s |
| Complex dossier | 15-20 | Full compliance | 22s | 45s |
| Batch (100 dossiers) | 800-1,200 | Equipment leasing | 8 min | 14 min |
| Batch (500 dossiers) | 4,000-6,000 | Equipment leasing | 35 min | 52 min |
These times reflect end-to-end processing including OCR, AI analysis, cross-referencing, and rule evaluation. Upload time (network transfer) is not included and depends on your connection speed and file sizes.
For latency-sensitive applications, two optimizations help:
- Pre-upload compression. Reducing PDF file size from 5 MB to 1 MB cuts upload time without affecting validation accuracy. The API accepts files up to 50 MB, but smaller files move through the pipeline faster.
- Parallel uploads. The batch endpoint accepts up to 20 files per request. For larger dossiers, split into multiple batch requests and upload in parallel (within your concurrency limit).
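Splitting into batches of at most 20 files and capping parallelism at your plan's concurrent-upload limit can be sketched with the standard library's ThreadPoolExecutor. upload_batch is a hypothetical callable that sends one POST /v1/files/batch and returns its id; whether separate batch requests are grouped back into a single dossier is not specified in this guide, so check that before relying on it. The default of 5 workers matches the Starter plan limit in the rate-limit table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

MAX_FILES_PER_BATCH = 20  # batch endpoint limit stated in this guide


def chunked(items: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> List[List[str]]:
    """Split a list of paths into consecutive chunks of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def upload_in_parallel(
    paths: Sequence[str],
    upload_batch: Callable[[List[str]], str],
    max_concurrent: int = 5,  # Starter plan concurrent-upload limit
) -> List[str]:
    """Send batches of at most 20 files concurrently; return one id per chunk, in order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(upload_batch, chunked(paths)))
```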
Get Started
The fastest path from zero to a working integration:
- Create an account and generate a test API key (ck_test_ prefix) from the dashboard.
- Upload a test document using the curl example above or the code samples in this guide.
- Register a webhook to receive results asynchronously.
- Configure a rule set that matches your business requirements.
- Switch to production by replacing your test key with a live key (ck_live_ prefix).
Full endpoint documentation, SDKs (Python, Node.js, Go), and an interactive API explorer are available at docs.checkfile.ai. If you have questions about which plan fits your volume, see pricing or contact the engineering team directly.
Frequently Asked Questions
How does the CheckFile API handle authentication for server-to-server integrations?
For server-to-server integrations, pass your API key in the X-API-Key header with every request. API keys are scoped to your organization and should be generated separately for each environment: development keys carry a ck_test_ prefix and hit the sandbox, while production keys carry a ck_live_ prefix. Rotate keys every 90 days, using dual-key support to avoid downtime during rotation. Never store API keys in source code or in a committed .env file; use a secrets manager such as HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault.
What is the difference between polling and webhooks for retrieving validation results?
Polling means your application repeatedly queries the status endpoint until processing completes, which introduces unnecessary latency and API request overhead. Webhooks invert the flow: CheckFile posts a signed JSON payload to your registered endpoint as soon as processing finishes, eliminating polling entirely. For production systems processing more than a few documents per minute, webhooks are the recommended approach. The webhook payload includes the full validation result, per-document verdicts, extracted fields, and rule evaluation details, so a single push notification contains everything needed to update your CRM or ERP.
How do I verify that a webhook payload has not been tampered with?
Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature computed using your webhook secret. To verify the payload, compute the expected signature by running HMAC-SHA256 on the raw request body with your secret, then use a constant-time comparison function to check it against the header value. Never compare signatures with a standard equality operator, as that is vulnerable to timing attacks. The Python example in the webhook signature section of this guide demonstrates the correct pattern, using hmac.compare_digest for the constant-time comparison.
What file size limits apply and how can I optimize upload performance?
Individual documents are accepted up to 50 MB per file. For latency-sensitive applications, reducing PDF file sizes through compression before upload significantly improves throughput without affecting validation accuracy. For large batches, the batch endpoint accepts up to 20 files per request and delivers 3 to 4 times better performance than equivalent individual uploads. For a 500-dossier batch, stagger uploads at 50 per minute to stay within rate limits and avoid overloading your own webhook handler.
What retry strategy should I implement for transient API errors?
Implement exponential backoff with jitter for transient errors including 429 rate limit responses, 500 internal server errors, 502 bad gateway, and 503 service unavailable. Cap the base delay at 60 seconds, add a random jitter of up to 50 percent of the base delay to prevent thundering herd problems, and respect the Retry-After header value when it is present in a 429 response. Never retry 400-level client errors other than 429, as these indicate a problem with the request itself that will not resolve through retrying.
Related reading: For ERP-specific integration patterns including Salesforce and SAP, see our API, webhooks, and ERP integration guide. If you are evaluating whether to build validation in-house or use an API like this one, our build vs buy analysis provides a detailed 3-year cost comparison. For webhook security best practices aligned with OWASP API Security guidelines, always verify HMAC signatures as shown in the examples above.