AI and Validibot Part 1: Your AI Is Guessing. Make Sure It Guessed Right.

AI is getting very good at generating structured data. (How good? One recent benchmark found that even the best models usually got the individual pieces of a JSON extraction right, but only got the entire document fully correct about half the time. The output shape is often fine; the values are the problem. See LLMStructBench, Feb 2026.) JSON configs, simulation inputs, compliance documents, you name it. The problem is that "very good" and "correct" are not the same thing. So let's see how we can use Validibot to put a deterministic safety net underneath your AI-generated outputs.

The Trust Gap

You've probably seen the pitch by now. Point an LLM at your spec, tell it what you need, and out comes a JSON file, an XML config, a simulation input deck, or a compliance report. What used to take an afternoon of copy-paste and cross-referencing now takes thirty seconds. Yes, that's impressive. (It actually would have blown our minds just a few years ago. Funny how we get jaded so quickly.)

But there's a catch. LLMs are probabilistic. They don't compute answers; they predict them. That means the output might be structurally valid JSON but contain a value that's physically impossible. It might have the right field names but silently drop a required one. It might produce something that looks exactly right and is subtly, dangerously wrong.

If you're generating a marketing email, that's a proofreading problem. If you're generating an energy model input file, a structural analysis config, or a regulatory submission, it's a liability.

AI is great at producing data, but it can't tell you whether the data it produced is correct. That's not what it's for. You need something else for that. Something deterministic, auditable, and repeatable. You need validation.

Why "Just Check It Manually" Doesn't Scale

The most common response to AI trust issues is to put a human in the loop. Review every output before it goes anywhere. That works, until you're generating fifty files a day across three projects and your reviewer is also trying to do their actual job.

Manual review also has its own reliability problem. Humans get tired. They skim. They develop trust in tools that have been "mostly right" for the last few weeks and stop looking as carefully. The failure mode of manual review isn't that someone catches zero errors. It's that they catch most errors and miss the one that matters.

What you really want is a system that checks every output, every time, against the same rules, with the same rigor, and produces a clear pass/fail result that you can audit later. The boring, reliable kind of checking. The kind that doesn't get tired after lunch.

Validibot to the Rescue

Validibot is a data validation platform. You define validation workflows (chains of checks that run against a submitted file) and Validibot executes them deterministically. Same input, same rules, same result. Every time.

It was originally built for engineering domains where validation really matters: building energy models, simulation files, systems engineering artifacts. But the same architecture works well as a safety net for AI-generated data of any kind.

The idea is simple: your AI generates the data, but Validibot validates it before it goes anywhere. The AI is fast and creative. The validator is methodical and pedantic. Together, they're very useful.

A Concrete Example: Validating AI-Generated JSON

Say you're using an LLM to generate equipment specification files for a facilities management system. Each file is a JSON document describing a piece of HVAC equipment: its type, capacity, a normalized efficiency score, installation date, and maintenance schedule.

Your AI assistant is pretty good at this. You give it a work order or a manufacturer's datasheet and it produces a nice, clean JSON file. But "pretty good" means that maybe one in twenty files has a problem: a missing field, a normalized efficiency score above 1.0, a maintenance interval of zero days, or a capacity value in the wrong units.

Here's how you'd set up Validibot to catch those problems automatically.

Step 1: Schema Validation

Create a JSON Schema Validator in Validibot with a schema that defines what a valid equipment spec looks like. Required fields, allowed types, value formats.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["equipmentId", "type", "capacity", "normalizedEfficiencyScore",
               "installDate", "maintenanceIntervalDays"],
  "properties": {
    "equipmentId": {
      "type": "string",
      "pattern": "^EQ-[A-Z]{2}-[0-9]{4}$"
    },
    "type": {
      "type": "string",
      "enum": ["AHU", "chiller", "boiler", "RTU", "heatPump", "VRF"]
    },
    "capacity": {
      "type": "object",
      "required": ["value", "unit"],
      "properties": {
        "value": { "type": "number", "exclusiveMinimum": 0 },
        "unit": { "type": "string", "enum": ["kW", "tons", "BTU/hr"] }
      }
    },
    "normalizedEfficiencyScore": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "installDate": {
      "type": "string",
      "format": "date"
    },
    "maintenanceIntervalDays": {
      "type": "integer",
      "minimum": 1,
      "maximum": 365
    }
  }
}

This catches the structural problems: missing fields, wrong types, IDs that don't match your naming convention. It's the first line of defense. Fast, cheap, and unambiguous.
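If you want to prototype these checks locally before wiring them into Validibot, a few of the schema's constraints translate directly into plain Python. This helper is a stdlib-only sketch for experimentation, not part of Validibot; the field names match the schema above.

```python
import re
from datetime import date

REQUIRED = ["equipmentId", "type", "capacity", "normalizedEfficiencyScore",
            "installDate", "maintenanceIntervalDays"]
EQUIPMENT_TYPES = {"AHU", "chiller", "boiler", "RTU", "heatPump", "VRF"}
ID_PATTERN = re.compile(r"^EQ-[A-Z]{2}-[0-9]{4}$")

def structural_findings(spec: dict) -> list[str]:
    """Mirror a few of the JSON Schema constraints as plain checks."""
    findings = [f"missing required field: {f}" for f in REQUIRED if f not in spec]
    if findings:
        return findings  # don't value-check fields that aren't there
    if not ID_PATTERN.match(spec["equipmentId"]):
        findings.append("equipmentId does not match EQ-XX-0000 convention")
    if spec["type"] not in EQUIPMENT_TYPES:
        findings.append(f"unknown equipment type: {spec['type']}")
    if not 0 <= spec["normalizedEfficiencyScore"] <= 1:
        findings.append("normalizedEfficiencyScore outside [0, 1]")
    try:
        date.fromisoformat(spec["installDate"])
    except ValueError:
        findings.append("installDate is not a valid ISO date")
    if not 1 <= spec["maintenanceIntervalDays"] <= 365:
        findings.append("maintenanceIntervalDays outside [1, 365]")
    return findings
```

Feed it a spec where the AI hallucinated a normalized efficiency score of 1.4 and you get back exactly one finding naming that field, which is the shape of result you want from any validation layer: not just "invalid," but what and where.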

Step 2: Business Rule Validation

Schema validation tells you the data is well-formed. It doesn't tell you the data makes sense. For that, add a Basic Validator as the second step in your workflow. Basic Validators use CEL (Common Expression Language) assertions to check cross-field logic and domain constraints.

// A chiller shouldn't be rated below 10 kW
!(type == "chiller" && capacity.unit == "kW")
  || capacity.value >= 10.0

// Heat pumps must have a reasonable normalized efficiency score
!(type == "heatPump") || normalizedEfficiencyScore >= 0.3

// Maintenance interval must be reasonable for the equipment type
!(type == "boiler") || maintenanceIntervalDays <= 90

These are the kinds of rules that a human reviewer would (hopefully) catch. The difference is that CEL assertions catch them every single time, and they produce a clear finding you can audit.
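CEL's `!A || B` pattern is just material implication ("if A, then B"). If you want to sanity-check your rules on sample data before loading them into Validibot, the same three assertions restate directly in Python; this is a local sketch, not Validibot's CEL engine:

```python
def business_rule_findings(spec: dict) -> list[str]:
    """The three CEL assertions above, restated as Python implications."""
    def implies(a: bool, b: bool) -> bool:
        return (not a) or b  # same shape as CEL's !a || b

    rules = [
        ("chiller rated below 10 kW",
         implies(spec["type"] == "chiller" and spec["capacity"]["unit"] == "kW",
                 spec["capacity"]["value"] >= 10.0)),
        ("heat pump normalized efficiency score below 0.3",
         implies(spec["type"] == "heatPump",
                 spec["normalizedEfficiencyScore"] >= 0.3)),
        ("boiler maintenance interval above 90 days",
         implies(spec["type"] == "boiler",
                 spec["maintenanceIntervalDays"] <= 90)),
    ]
    return [msg for msg, passed in rules if not passed]
```

A 5 kW "chiller" trips the first rule and nothing else; a heat pump with a 0.5 efficiency score passes cleanly. Note how each rule only fires for its own equipment type, which is what the implication pattern buys you.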

Step 3: The Part No Other Tool Does

So schema checks and business rules are useful, but you can get those from plenty of tools. The reason Validibot exists is what comes next: running actual simulations and complex domain logic as part of the validation pipeline.

Say your HVAC equipment spec includes a heat pump with a claimed COP (coefficient of performance) of 4.2 at an outdoor temperature of -15°C. Is that realistic? A CEL assertion can check that the COP is positive and below some upper bound. But it can't tell you whether 4.2 at -15°C is physically plausible for that specific compressor type and refrigerant. That requires a thermodynamic model.
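To make "run the numbers" concrete: thermodynamics puts a hard ceiling on heating COP, the Carnot limit T_hot / (T_hot − T_cold) in kelvin, and real vapor-compression heat pumps only achieve some fraction of it. A domain check inside a validator could look like the sketch below; the 35 °C supply temperature and 50% Carnot fraction are illustrative assumptions, stand-ins for the compressor- and refrigerant-specific data a real model would use:

```python
def cop_is_plausible(claimed_cop: float, outdoor_c: float,
                     supply_c: float = 35.0,
                     carnot_fraction: float = 0.5) -> bool:
    """Reject heating COP claims above a fraction of the Carnot limit.

    supply_c and carnot_fraction are illustrative defaults; tune them to
    the equipment data you actually have.
    """
    t_hot = supply_c + 273.15   # kelvin
    t_cold = outdoor_c + 273.15
    carnot_cop = t_hot / (t_hot - t_cold)
    return 0 < claimed_cop <= carnot_fraction * carnot_cop
```

At −15 °C outdoor and 35 °C supply, the Carnot limit is about 6.2, so a claimed COP of 4.2 is roughly 68% of the theoretical maximum, well above what's plausible under these assumptions, and the check rejects it. The same 4.2 at a mild 7 °C outdoor passes.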

Validibot's advanced validator types let you plug in exactly that kind of logic. EnergyPlus validators, FMU validators, and Custom Validators all run domain-specific computations as workflow steps. You package your simulation or domain model as a Docker container, register it as a Custom Validator, and Validibot feeds the submitted data into your container, runs the computation, and evaluates the output against CEL assertions you define on the results.

For building energy workflows, that might be an EnergyPlus model or an FMU simulation that takes the equipment parameters and computes whether the claimed performance numbers are consistent with known physics. For other domains, it could be a finite element solver, a pharmacokinetic model, a financial risk simulation, or any other computation that can run in a container.
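The exact container contract is defined by Validibot's Custom Validator documentation; purely as a hypothetical sketch, an entrypoint usually reduces to "read the submitted data, run the model, emit metrics for the downstream assertions." Every name and shape below (including the `ratingOutdoorC` field) is illustrative:

```python
import json
import sys

def run_validator(spec: dict) -> dict:
    """Hypothetical Custom Validator body: run the model, emit metrics.

    The input/output shapes are illustrative, not Validibot's actual
    container contract.
    """
    # Stand-in for the real computation (EnergyPlus run, FMU simulation, ...).
    # Carnot heating COP limit at an assumed 35 C supply temperature.
    carnot = (35.0 + 273.15) / (35.0 - spec["ratingOutdoorC"])
    return {
        "claimedCop": spec["claimedCop"],
        "carnotLimit": round(carnot, 2),
        "copFractionOfCarnot": round(spec["claimedCop"] / carnot, 2),
    }

if __name__ == "__main__":
    # Read the submitted spec on stdin, write computed metrics to stdout.
    print(json.dumps(run_validator(json.load(sys.stdin))))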

This is the layer where you catch the errors that static checks never will. An AI might generate a JSON file with perfectly valid structure, reasonable-looking field values, and correct types, but with a combination of parameters that produces nonsensical results when you actually run the numbers. The only way to catch that is to run the numbers. Validibot lets you do that as an automated step in the same workflow, right after the schema and business rule checks.

Step 4: Wire It Into a Workflow

Now chain all three into a single workflow: schema check first, then business rules, then simulation. Each step only runs if the previous one passed. No point running a thermodynamic model on data that failed structural validation.

Your AI generates the equipment spec. Your workflow validates it across all three layers. Your team gets either a green checkmark or a list of exactly what's wrong, and at which layer it went wrong. No guessing, no skimming, no "looks fine to me."
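Inside Validibot the workflow engine handles the chaining for you; the control flow itself is just fail-fast sequencing, which a few lines of Python illustrate:

```python
from typing import Callable

# A validator takes a spec and returns a list of findings (empty = pass).
Validator = Callable[[dict], list[str]]

def run_workflow(spec: dict,
                 steps: list[tuple[str, Validator]]) -> tuple[bool, list[str]]:
    """Run validators in order, stopping at the first step with findings."""
    for name, step in steps:
        findings = step(spec)
        if findings:
            return False, [f"{name}: {f}" for f in findings]
    return True, []
```

The point of the early return is exactly the point made above: the expensive simulation step at the end of the list never runs on data that already failed the cheap structural check at the front.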

What You Get That You Don't Get From AI Alone

This isn't just "another check." It's a different category of thing.

For one, it's deterministic. The same input always produces the same validation result. There's no temperature parameter, no random seed, no "ask it again and see if the answer changes." Your validation is a function, not a conversation.

It's also auditable. Every validation run in Validibot produces a timestamped record with a SHA-256 evidence hash, so you can verify that results haven't been tampered with after the fact. Six months from now, when someone asks "how do we know this data was correct?", you have an answer that isn't "we checked it, trust me."
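What Validibot feeds into its evidence hash is defined by the platform, but the mechanism is ordinary content addressing, which you can illustrate with the standard library. Canonical serialization (sorted keys, fixed separators) is what makes the hash stable across runs:

```python
import hashlib
import json

def evidence_hash(record: dict) -> str:
    """Content-address a validation record (illustrative of the idea;
    not Validibot's actual hashing scheme).

    Sorted keys make the hash independent of dict insertion order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The same record always produces the same 64-character digest, and changing any field, even a single character, produces a completely different one. That asymmetry is what makes after-the-fact tampering detectable.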

And it separates concerns properly. Your AI does what it's good at (generating structured data quickly from messy inputs) and your validator does what it's good at (checking that data against known rules). Neither one is trying to do the other's job. When your requirements change, you update the ruleset and every future validation uses the new rules. The AI doesn't need to "learn" anything. The validator just enforces the new version.

It's Not Just JSON

The equipment spec example uses JSON, but the same layered approach works for XML files (validated with XSD, RelaxNG, or DTD), tabular data, EnergyPlus IDF files, FMUs, and other domain-specific formats. If your AI is generating any kind of structured artifact, the pattern is the same: static checks first, domain rules second, and simulation or heavy computation third if you need it.

Getting Started

If you're already using AI to generate structured data (or thinking about it) and you're wondering how to trust the output, here's what I'd suggest:

Start by writing down the rules that a careful human reviewer would check. What fields are required? What values are physically or logically impossible? What cross-field relationships have to hold? Those rules become your Validibot validators. Then set up a workflow that chains them together: schema first, business rules second, simulation third if you need it.

The whole point is that you don't have to choose between AI speed and validation rigor. You can have both, as long as you don't expect the same tool to do both jobs.

Try Validibot, or get in touch if you want to talk through a specific use case. I'm always interested in hearing about the weird, high-stakes data problems that keep people up at night.

Next: Let the Agent Validate Its Own Work

Everything in this post assumes a human submits the AI-generated file to Validibot. But what if the AI agent could do that itself? Generate the data, submit it for validation, read the issues, fix the problems, and keep going until everything passes, all without a human in the loop.

That's what Part 2 is about. In Give Your AI Agent a Validation Step, we walk through exactly how to wire the Anthropic Agent SDK into Validibot, with working Python code for two integration approaches: the quick-and-dirty Bash tool method and a proper MCP tool for production use.
