For RCM engineering and product teams
Test data for claims extraction pipelines. Real claims can't become test corpora under HIPAA. These documents are generated, never derived from customer data, with deterministic labels for every field.
No forms. Send a doc to tim@aginor.ai, get up to 20 labeled variants back in about 48 hours.
Every hard EOB in production is PHI. You can't keep it, can't share it with a model vendor, can't check it into an eval harness. The documents you most need to test against are the ones compliance takes off the table.
An EOB where line items don't sum to the paid amount is the failure your clean test set never catches. Payers produce documents like that every day. Test data has to include them on purpose.
Every payer formats EOBs and remits differently, and new variants keep coming. The long tail outpaces any internal labeling effort, and clinician labeling time is the most expensive kind.
A new model drops and your extraction numbers move. Without a fixed labeled corpus you can re-run, you can't tell what got better and what silently broke on specific payer formats.
A hard EOB, CMS-1500, UB-04, or remit. A de-identified sample or a representative template, whatever your compliance team allows. PDF, XLSX, CSV, scans.
Up to 20 variants: same layout and format, entirely new synthetic data, difficulty dialed where you need it. No LLM anywhere in the generation path. Typical turnaround is 48 hours.
Deterministic ground truth ships with every variant. No annotation queue, no clinician labeling bottleneck. The suite re-runs unchanged on every model upgrade.
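Here is a minimal Python sketch of what re-running a fixed labeled suite can look like. The record shape (`payer`, `fields`), the exact-match scoring, and the function names are illustrative assumptions, not the format the variants actually ship in. The sketch scores two model runs against the same ground truth, broken out per payer format, and flags any format where the newer run scores lower.

```python
from collections import defaultdict


def per_payer_accuracy(truth, predictions):
    """Exact-match field accuracy per payer format for one model run.

    truth:       {variant_id: {"payer": str, "fields": {field_name: value}}}
    predictions: {variant_id: {field_name: value}}  (hypothetical shapes)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for variant_id, record in truth.items():
        payer = record["payer"]
        predicted = predictions.get(variant_id, {})  # a missing variant scores zero
        for field, expected in record["fields"].items():
            total[payer] += 1
            if predicted.get(field) == expected:
                correct[payer] += 1
    return {payer: correct[payer] / total[payer] for payer in total}


def regressions(truth, old_run, new_run, tolerance=0.0):
    """Payer formats where the new run scores below the old run by more than tolerance."""
    old = per_payer_accuracy(truth, old_run)
    new = per_payer_accuracy(truth, new_run)
    return {p: (old[p], new[p]) for p in old if new[p] < old[p] - tolerance}


truth = {
    "v1": {"payer": "payer_a", "fields": {"claim_id": "C1", "paid_amount": "120.00"}},
    "v2": {"payer": "payer_b", "fields": {"claim_id": "C2", "paid_amount": "75.50"}},
}
old_run = {"v1": {"claim_id": "C1", "paid_amount": "120.00"},
           "v2": {"claim_id": "C2", "paid_amount": "75.50"}}
new_run = {"v1": {"claim_id": "C1", "paid_amount": "120.00"},
           "v2": {"claim_id": "C2", "paid_amount": "57.50"}}

print(regressions(truth, old_run, new_run))  # {'payer_b': (1.0, 0.5)}
```

Because the ground truth never changes between runs, any drop has to come from the model. The per-payer breakdown is what surfaces a regression that a single aggregate score would hide.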
Healthcare runs on the generation engine we built for insurance: config-driven fuzzing with deterministic, correct-by-construction ground truth. On the insurance corpus, every frontier model we tested fabricated under 1% of numeric values on clean documents and more than 6% on the hardest tiers. The OpenAI flagships topped 17%. GPT-5.4 read a $42.0M revenue line and reported $21.65M. Insurance is where the prebuilt template library is deepest today. Healthcare runs as clone-and-variants: we build around the documents you send.
Why not have an LLM draft synthetic EOBs? Because hard cases have to stay internally consistent: line items that sum, values that agree across the claim, labels that are right every time. LLM generation can't guarantee any of that. Correct-by-construction generation can.
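To make the contrast concrete, here is a minimal Python sketch of correct-by-construction generation for a simplified EOB. The field names, dollar ranges, and the `make_eob_labels` function are hypothetical and are not our generator. The generator creates line items first and derives every total from them, so the labels are right by construction. A hard case is produced by shifting the printed total by a known amount and recording that shift in the label. A fixed seed makes each variant reproducible.

```python
import random
from decimal import Decimal

CENT = Decimal("0.01")


def make_eob_labels(seed, n_lines=4, inject_mismatch=False):
    """Build ground truth for one synthetic EOB (hypothetical, simplified schema).

    A renderer (not shown) would draw these same values onto the template,
    so the document and its labels cannot disagree.
    """
    rng = random.Random(seed)  # same seed -> same values -> same labels
    lines = []
    for i in range(n_lines):
        billed = Decimal(rng.randint(5_000, 50_000)) / 100  # $50.00 to $500.00
        allowed = (billed * rng.randint(40, 90) / 100).quantize(CENT)
        patient_resp = (allowed * rng.choice([0, 10, 20]) / 100).quantize(CENT)
        paid = allowed - patient_resp  # derived, never sampled independently
        lines.append({"line": i + 1, "billed": billed, "allowed": allowed,
                      "patient_resp": patient_resp, "paid": paid})

    line_total = sum((item["paid"] for item in lines), Decimal("0.00"))
    printed_total = line_total
    if inject_mismatch:
        # Hard case on purpose: the printed total is off by a known, nonzero delta.
        printed_total = line_total + Decimal(rng.randint(1, 2_000)) / 100

    return {
        "line_items": lines,
        "sum_of_line_paid": line_total,
        "printed_paid_total": printed_total,
        "totals_consistent": printed_total == line_total,
    }


clean = make_eob_labels(seed=42)
hard = make_eob_labels(seed=42, inject_mismatch=True)
print(clean["totals_consistent"], hard["totals_consistent"])  # True False
```

The mismatch is a parameter, not an accident, so the label states exactly how the hard document is wrong. An extractor can then be scored on whether it reports the printed total faithfully instead of "correcting" it.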
Email one hard document, de-identified or representative. You get up to 20 labeled variants back, same layout, new synthetic data, ground truth attached, in about 48 hours.
Prefer to write your own email? tim@aginor.ai works.