The synthetic documents that broke every frontier model we tested, with ground-truth labels attached.
HIPAA, SOC 2, DPAs. The hard cases you'd most want to regression-test against are the ones you can't legally keep, reuse, or share with a new model vendor.
Every new customer brings template variants, scan quality you haven't seen, field combinations your prompt didn't anticipate. The long tail outpaces any internal labeling effort.
A new frontier model drops and your numbers change. Without a fixed adversarial corpus you can re-run, you can't tell whether the upgrade helped, hurt, or broke specific document types.
"Premium" means three different things across forms. Field paths don't match what the model emits. You need synthetic data built around the ambiguous cases to know whether your pipeline survives them.
Every frontier model we tested hallucinated values on these documents. GPT-5.4 reported a $42M revenue figure as $21.65M.
Complete document packets with ground truth at three levels: document, field, and bounding box. Loss runs, ACORD forms, SOVs, dec pages, broker narratives, and more, each rendered through 82 carrier-specific templates with 56 visual variants sourced from real reference PDFs.
Send us one hard document. We send back up to 20 variants: the same layout and format as your original, with new underlying data and injected adversarial patterns, so your model can't pass by recalling what it saw in training. Ground-truth labels attached. Built for IDP teams that have a known edge case and want a regression suite around it.
Tell us the doc type, format variants, and edge cases that matter most for your pipeline.
Our engine builds the documents or logs with the specific layout problems, format variation, and corruption you asked for.
Same idea as synthetic data in computer vision: we placed the data, so we already know what's in it. Every file comes with ground truth. No annotation step, no SME bottleneck.
At Moveworks I was the security engineer on rollouts of 250+ AI agents serving Fortune 500 companies, and I spent years breaking them. Before that I spent a decade in security research finding bugs in Apple, Chrome, and Qualcomm. Aginor came from putting those two things together: I know what production data does to agents, and I know how to generate the inputs that break them.
I read every email. Reach me at tim@aginor.ai.
Send us a doc your pipeline struggles with. We send back up to 20 variants with new data and adversarial patterns, ground truth attached.