Your documents, cleaned and structured in minutes
01
Upload
Upload your messy data — PDFs, CSVs, Word docs, text files
02
PII Scrubbing
All PII is stripped on our server. Names, TFNs, and emails are replaced with safe tokens
03
AI Cleaning
Safe, tokenised data is sent to our AI, which cleans, structures, and labels it without ever seeing real PII
04
Remapping
Results are mapped back to real values on your server. De-masking happens locally
05
Clean Dataset
A clean, labelled dataset is returned to your company, ready to use
06
You Delete Everything
Nothing stored permanently. You control what stays and what goes
See the pipeline in action
Raw Input
"Invoice from John Smith, TFN 123 456 789, email john@acme.com, phone 0412 345 678 for $45,000 dated 12/03/2024"
5 PII entities detected
PII Scrubbing
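As a rough illustration of the scrubbing step, the sketch below replaces each detected value with a numbered token and keeps the token-to-value mapping for later de-masking. It is not the product's actual implementation: the `scrub` function, the `PATTERNS` table, and the `known_names` argument are all hypothetical, and the regexes only cover the formats in the sample invoice. Real name detection would need more than a fixed list.

```python
import re
from collections import defaultdict

# Hypothetical patterns, tuned only to the sample invoice on this page.
# When a pattern has a capture group, the group (not the whole match) is stored.
PATTERNS = {
    "TFN": re.compile(r"\bTFN (\d{3} \d{3} \d{3})\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b04\d{2} \d{3} \d{3}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def scrub(text, known_names=()):
    """Return (tokenised_text, mapping) where mapping is token -> real value."""
    mapping = {}
    counters = defaultdict(int)

    def new_token(label, value):
        # Tokens are numbered per label: PERSON_001, PERSON_002, ...
        counters[label] += 1
        token = f"{label}_{counters[label]:03d}"
        mapping[token] = value
        return token

    # Assumption: names come from a caller-supplied list rather than a model.
    for name in known_names:
        if name in text:
            text = text.replace(name, new_token("PERSON", name))

    for label, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda m, label=label: new_token(label, m.group(m.lastindex or 0)),
            text,
        )
    return text, mapping


raw = ("Invoice from John Smith, TFN 123 456 789, email john@acme.com, "
       "phone 0412 345 678 for $45,000 dated 12/03/2024")
tokenised, mapping = scrub(raw, known_names=["John Smith"])
print(tokenised)
print(len(mapping), "PII entities detected")
```

Only `tokenised` would leave the server in this sketch; `mapping` stays behind so the real values can be restored later.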
Tokenised Text → AI
"Invoice from PERSON_001, TFN_001, email EMAIL_001, phone PHONE_001 for $45,000 dated DATE_001"
Structured Output
{
"customer": "PERSON_001",
"tfn": "TFN_001",
"email": "EMAIL_001",
"amount": "$45,000",
"confidence": 0.97
}
De-masked Output
{
"customer": "John Smith",
"tfn": "123 456 789",
"email": "john@acme.com",
"amount": "$45,000",
"confidence": 0.97
}
Tokens swapped back locally — never stored
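The de-masking step can be pictured as a lookup over the structured output. The sketch below is a minimal illustration, not the product's code: `demask`, `TOKEN`, and the shape of `mapping` (token to real value, held only in memory) are assumptions made for the example.

```python
import re

# Assumed token shape: an uppercase label, an underscore, three digits.
TOKEN = re.compile(r"\b[A-Z]+_\d{3}\b")


def demask(value, mapping):
    """Recursively swap tokens back to real values in strings, dicts, and lists."""
    if isinstance(value, str):
        # Unknown tokens are left untouched rather than raising an error.
        return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), value)
    if isinstance(value, dict):
        return {key: demask(item, mapping) for key, item in value.items()}
    if isinstance(value, list):
        return [demask(item, mapping) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


# Hypothetical in-memory mapping produced by the scrubbing step.
mapping = {
    "PERSON_001": "John Smith",
    "TFN_001": "123 456 789",
    "EMAIL_001": "john@acme.com",
}
structured = {
    "customer": "PERSON_001",
    "tfn": "TFN_001",
    "email": "EMAIL_001",
    "amount": "$45,000",
    "confidence": 0.97,
}
print(demask(structured, mapping))
```

Because the mapping lives only in a local variable here, discarding it after the call leaves no record linking tokens to real values.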
How verification works
Auto-verified — no human review needed
"...balance outstanding as at DATE_001 per attached statement..."
invoice_batch_03.pdf — page 2
Human corrected — saved as training label
"...as per the terms outlined in the attached schedule..."
contract_draft_v3.docx — page 4
Human corrected — saved as training label