Legal, clinical, and public-records workflows

PDF Redaction SDK for true content removal

Detect PII in finished PDFs, route low-confidence findings to review, and permanently remove approved content with an audit trail.

Built for developers shipping redaction workflows in legal, clinical, and regulated document pipelines. Use the cloud API or deploy on-prem when data residency matters.

True redaction, not overlays

Approved findings are permanently removed from the PDF.

Audit trail per finding

Entity type, confidence score, page location, and review outcome.

Published benchmark results

6 common PII categories with precision, recall, and F1.

Cloud or on-prem deployment

Hosted API or documents kept inside your own environment.

Choose your SDK

Detect, review, and permanently remove sensitive content

Run finished PDFs through detection, keep humans in the loop where confidence drops, and ship clean files without sending reviewers back to manual cleanup.

Detect PII with confidence scores

The model works on document structure and context instead of pattern matching alone, so names, addresses, and mixed-format identifiers are each reported with a confidence score your reviewers can act on.

Review the edge cases

High-confidence findings can move fast. Lower-confidence findings stay in your review workflow with labeled entities, page locations, and a record of every decision.
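As an illustration, a record of one review decision could be modeled like this. It is a minimal sketch, not the SDK's schema: the class name `ReviewRecord`, its field names, and the JSON-lines log format are all assumptions chosen to mirror the audit fields listed above (entity type, confidence score, page location, review outcome).

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReviewRecord:
    """One audit-trail entry. Field names are hypothetical, not the SDK's schema."""
    entity_type: str   # label assigned by detection, e.g. "PERSON"
    confidence: float  # detection confidence between 0.0 and 1.0
    page: int          # 1-based page number
    bbox: tuple        # (x0, y0, x1, y1) location on the page
    outcome: str       # "approved", "rejected", or "pending"


def to_audit_line(record: ReviewRecord) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = ReviewRecord("PERSON", 0.71, 3, (72.0, 540.0, 210.0, 556.0), "approved")
line = to_audit_line(record)
```

Writing one line per decision keeps the log append-only, which makes it easier to show later what was approved, by which rule, and at what confidence.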

Remove approved content from the file

Redaction happens at the PDF level. The underlying content is deleted from the file instead of being visually masked with overlays.
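One way to check the difference yourself: a visual overlay leaves the original text extractable, while true removal does not. The sketch below assumes you have already extracted text per page with whatever PDF text extractor you use; the function name `leaked_terms` and the sample strings are hypothetical.

```python
def leaked_terms(page_texts, approved_terms):
    """Return (page_number, term) pairs where an approved term is still extractable.

    page_texts: list of strings, one per page, from any text extractor.
    approved_terms: strings that were approved for redaction.
    """
    leaks = []
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse whitespace and ignore case so line wraps do not hide a match.
        normalized = " ".join(text.split()).casefold()
        for term in approved_terms:
            if " ".join(term.split()).casefold() in normalized:
                leaks.append((page_number, term))
    return leaks


# Text extracted from a file that was only masked with a black rectangle.
overlay_output = ["Patient: Jane Roe, DOB 01/02/1980"]
# Text extracted from a file where the content was actually deleted.
removed_output = ["Patient: , DOB "]

overlay_leaks = leaked_terms(overlay_output, ["Jane Roe", "01/02/1980"])
removed_leaks = leaked_terms(removed_output, ["Jane Roe", "01/02/1980"])
```

A substring check like this only covers the text layer. It says nothing about images or metadata, so treat an empty result as a necessary condition rather than proof.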

From uploaded PDF to shareable output

Ingest

Feed in PDFs: scanned, digital, or mixed.

OCR

Extract text from scanned pages, images, and non-standard text layers.

Analyze

Semantic engine parses document structure, context, and entity relationships.

Classify

ML engine labels every detected entity with a confidence score.

You Decide

Your logic sets the rules: which labels, what threshold, what action.
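A rule set of this kind might look like the following sketch. Everything in it is an assumption for illustration: the `Finding` type, the label names, the threshold values, and the three actions are not taken from the SDK.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical shape of one detected entity."""
    label: str
    confidence: float
    page: int


# Hypothetical policy: auto-redact at or above the per-label threshold.
AUTO_REDACT_THRESHOLDS = {"EMAIL": 0.95, "PERSON": 0.90}
DEFAULT_THRESHOLD = 0.98           # stricter bar for labels with no explicit entry
IGNORED_LABELS = {"ORGANIZATION"}  # labels this workflow never redacts


def decide(finding: Finding) -> str:
    """Return "ignore", "redact", or "review" for a single finding."""
    if finding.label in IGNORED_LABELS:
        return "ignore"
    threshold = AUTO_REDACT_THRESHOLDS.get(finding.label, DEFAULT_THRESHOLD)
    return "redact" if finding.confidence >= threshold else "review"


findings = [
    Finding("EMAIL", 0.99, 1),
    Finding("PERSON", 0.72, 2),
    Finding("ORGANIZATION", 0.99, 2),
    Finding("SSN", 0.97, 4),
]
actions = [decide(f) for f in findings]
```

Falling back to a strict default means a label nobody configured goes to a human instead of being redacted or skipped silently.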

Redact

Approved content is removed from the file itself, not cosmetically masked.

Redaction workflows for legal, clinical, and public-records documents

Match the redaction workflow to the document set, reviewer, and compliance pressure you are working under.

Legal & eDiscovery

Automated PII detection for discovery, contract redaction, and FOIA compliance. Confidence-scored findings with audit trails for court.

See the legal solution →

Clinical Trials & Healthcare

Clinical study report (CSR) redaction for EMA Policy 0070, patient de-identification for HIPAA, and trial master file (TMF) batch processing with recall-optimized detection.

See the clinical trials solution →

Financial Services

Redact PII from loan applications, KYC files, and audit trails. Confidence scoring tuned to your risk tolerance.

Coming soon →

Government & Public Records

FOIA-ready document preparation with automated PII removal. Deployable on-prem for strict data residency requirements.

Coming soon →

Published benchmark results and deployment proof

Before you automate redaction, you need benchmark data, review logs, and deployment controls you can defend.

Detection Performance

By HIPAA entity category, measured on our English-language benchmark dataset:

Category	Recall	Precision	F1 Score
Person	96.28%	97.43%	0.969
Dates of Birth	92.57%	100%	0.961
Account Number / SSN	93.93%	85.27%	0.894
Addresses	91.22%	99.43%	0.951
Phone / Fax Numbers	96.3%	94.12%	0.952
Email Addresses	99.98%	99.58%	0.998
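The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), reported as a fraction while the other two columns are percentages. The short check below recomputes it from the published recall and precision. The variable and function names are ours, and the figures are copied from the table.

```python
# (recall %, precision %, published F1) copied from the table above.
BENCHMARK = {
    "Person": (96.28, 97.43, 0.969),
    "Dates of Birth": (92.57, 100.0, 0.961),
    "Account Number / SSN": (93.93, 85.27, 0.894),
    "Addresses": (91.22, 99.43, 0.951),
    "Phone / Fax Numbers": (96.3, 94.12, 0.952),
    "Email Addresses": (99.98, 99.58, 0.998),
}


def f1_score(recall_pct: float, precision_pct: float) -> float:
    """Harmonic mean of precision and recall, returned as a fraction."""
    recall = recall_pct / 100
    precision = precision_pct / 100
    return 2 * precision * recall / (precision + recall)


# Recompute each row and compare against the published value.
recomputed = {
    category: round(f1_score(recall, precision), 3)
    for category, (recall, precision, _published) in BENCHMARK.items()
}
```

The Account Number / SSN row shows the trade-off: recall is about 94%, but precision near 85% means roughly one in seven findings in that category is a false positive, which is the kind of category where a review threshold matters most.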

In Production

In our current pilot with a legal services provider, the SDK processes thousands of pages per month with high accuracy on first pass. Manual review time dropped significantly compared to their previous workflow.

Compliance-ready deployment options

Infrastructure and document handling

Own ML infrastructure

No third-party AI providers

Cloud or on-prem deployment

Keep documents in the hosted API or inside your own environment. The redaction workflow stays under your operational controls instead of being routed through external LLM services.

Frameworks teams map this to

HIPAA

GDPR

CCPA

FOIA

21 CFR Part 11

EMA Policy 0070

Permanent removal, audit trails, and deployment control support regulated document workflows. Your review policy, infrastructure, and retention rules determine the final compliance posture.

Current limits for image-only and cross-page content

Image redaction, handwritten signatures, and entities that span page breaks still need extra handling.

Non-text content

The engine processes text. It does not detect or redact faces in photographs, visible handwritten signatures, logos, or other graphical elements.

Languages beyond our current set

We support English, German, Spanish, French, and Italian. Additional languages are on our roadmap — talk to us if your use case requires others.

Entities that span page boundaries

The engine analyzes each page independently. If an entity (such as a name or address) starts on one page and continues on the next, the engine may detect only part of it or miss it entirely. This is a known gap for documents with dense, flowing text across page breaks.
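To see why per-page analysis produces this gap, consider a toy example. The regular expression below is a deliberately simple stand-in for detection, not how the engine works, and the sample text is invented.

```python
import re

# Toy stand-in detector: a street address such as "42 Maple Street".
ADDRESS = re.compile(r"\b\d+ [A-Z][a-z]+ Street\b")


def find_per_page(pages):
    """Search each page on its own, as a per-page analysis would."""
    return [
        (page_number, match.group())
        for page_number, text in enumerate(pages, start=1)
        for match in ADDRESS.finditer(text)
    ]


def find_across_pages(pages):
    """Search the pages joined into one stream of text."""
    joined = " ".join(pages)
    return [match.group() for match in ADDRESS.finditer(joined)]


# The address starts at the bottom of page 1 and ends at the top of page 2.
pages = [
    "The patient lives at 42 Maple",
    "Street and was admitted in March.",
]

per_page_hits = find_per_page(pages)       # neither page contains the whole address
joined_hits = find_across_pages(pages)     # the joined text does
```

In practice this means text at the bottom and top of consecutive pages in dense documents is worth a manual look before release.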

Deploy the redaction workflow that fits your environment

Get the SDK plus the deployment help, threshold tuning, and operational support needed to keep the workflow reliable in production.

Consulting

We help you scope the integration — document types, entity categories, confidence thresholds, edge cases specific to your domain.

Implementation

Hands-on support for deployment, whether you're calling our cloud API or installing on-prem in a locked-down environment.

Ongoing Support

SLAs, model updates, new language and entity support as we ship it, and a dedicated account contact for enterprise customers.

Usage-based pricing for cloud and on-prem redaction

Cloud — Pay as you go

$0.20 / page

Detection, redaction, and audit trail included. Requires Pro plan ($199/month). No minimum volume.

Example: 10,000 pages/month = $199 + $2,000 = $2,199/month
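The same arithmetic as a small estimator, for plugging in your own volume. The function and constant names are ours; the prices are the cloud figures listed above. `Decimal` is used so currency amounts stay exact.

```python
from decimal import Decimal

PRO_PLAN_MONTHLY = Decimal("199.00")  # Pro plan fee, USD per month
PRICE_PER_PAGE = Decimal("0.20")      # cloud price, USD per page


def monthly_cloud_cost(pages: int) -> Decimal:
    """Estimated monthly cloud cost in USD for a given page volume."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRO_PLAN_MONTHLY + PRICE_PER_PAGE * pages


estimate = monthly_cloud_cost(10_000)  # 199.00 + 2000.00
```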

On-Prem / Enterprise

Custom

For teams that need data residency, high-volume pricing, custom SLAs, or dedicated support.

Python, Node.js, and Java redaction SDKs

Choose the implementation guide that matches your backend stack. Each page covers detection, review thresholds, and permanent removal.

Python Redaction →
Node.js Redaction →
Java Redaction →

Python SDK →
Node.js / TypeScript SDK →
Java SDK →

How-to: Redact PDFs →
How-to: Batch Redaction →

Send us a document from your workflow

Bring a discovery file, patient packet, or public-records release. We will show where detection, human review, and permanent removal fit, and tell you plainly whether PDFDancer is the right tool.