In Part 1, we built a Validibot workflow that validates AI-generated data: schema checks, business rules, and simulation. This post shows how to give an AI agent direct access to that workflow so it can validate its own output, fix problems, rinse and repeat, and only hand results to a human once everything passes.
Why This Matters
Part 1 showed how to build a validation workflow. But if a human still has to submit every file and interpret the results, you've just moved the bottleneck. Your agent generates fifty equipment specs from fifty supplier datasheets, and someone has to submit each one, read the results, and chase the AI for fixes. That's not automation. That's babysitting.
The better pattern is to let an AI agent call Validibot itself. Generate the data, submit it for validation, read the issues, fix any problems, and repeat until it passes. By the time a human sees the output, it's already been validated against your exact rules. The agent does the grunt work. The human does the thinking.
This post walks through exactly how to set that up using the Anthropic Agent SDK and the Validibot CLI, with working Python code for two approaches.
What We're Building
Same scenario as Part 1: an agent that reads product specification documents and generates standardized JSON files for a procurement system. Not glamorous, but it's everywhere.
The difference is that now the agent handles the full loop:
- Read a product spec document
- Extract the relevant fields into a JSON file
- Submit that JSON to Validibot for validation
- If validation fails, read the issues and fix the problems
- Repeat until it passes
The Validibot workflow from Part 1 is already running. This post focuses on the agent side: how to connect the Anthropic Agent SDK to Validibot so the agent can validate its work programmatically.
Prerequisites
You'll need:
- A running Validibot instance (self-hosted or Validibot Cloud) with a validation workflow configured
- The Validibot CLI installed and configured (see below)
- The Anthropic Agent SDK (`pip install claude-agent-sdk`)
- An `ANTHROPIC_API_KEY` environment variable set
CLI setup, if you haven't already:

```
pip install validibot-cli
validibot config set-server https://validibot.your-company.com
validibot login
```
If you're using Validibot Cloud, point `set-server` at `https://app.validibot.com`. For workflows scoped to a specific org, you'll also pass `--org my-org` on the validate command.
The Agent SDK bundles the Claude Code CLI automatically, so you don't need to install that separately.
Approach 1: The Simple Way (Bash Tool)
The fastest way to give an agent access to Validibot is through the built-in Bash tool. The agent can run `validibot validate` just like you would from a terminal. No custom tool code needed.
```python
import asyncio
from pathlib import Path

from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage, TextBlock

SYSTEM_PROMPT = """You are a procurement data agent. Your job is to read product
specification documents and produce standardized JSON files.

After generating a JSON file, you MUST validate it by running:

    validibot validate -w product-spec-check --json

If validation fails, read the issues in the JSON output. Each issue tells
you which field failed and why. Fix the problems in the JSON file and validate
again. Do not return results to the user until validation passes.

The JSON schema requires these fields:
- productId (string, format: "PRD-XXXX")
- name (string)
- category (one of: "electrical", "mechanical", "chemical", "structural")
- supplier (string)
- unitPrice (object with "value" and "currency")
- specifications (object with domain-specific key-value pairs)
- certifications (array of certification strings)
- leadTimeDays (integer, 1-365)
"""

async def main():
    async for message in query(
        prompt=(
            "Read the file supplier_datasheet.txt in the current directory. "
            "Extract the product specs and produce a validated JSON file."
        ),
        options=ClaudeAgentOptions(
            system_prompt=SYSTEM_PROMPT,
            allowed_tools=["Read", "Write", "Edit", "Bash"],
            cwd=str(Path.cwd()),
        ),
    ):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(main())
```
That's it. The agent reads the document, writes a JSON file, shells out to `validibot validate`, parses the result, and iterates if needed. The `--json` flag gives the agent structured output it can reason about, not just a pass/fail.
The key is in the system prompt. You're telling the agent that validation is a required step, not optional. And you're telling it how to interpret the issues if validation fails. The agent doesn't need to "know" your business rules. It just needs to know how to call the validator and react to the output.
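For concreteness, here's the shape of output the agent is aiming for: a record that would satisfy the schema in the system prompt above. The field names come from the post; the specific values are invented for illustration.

```python
import json

# A hypothetical product-spec record matching the schema the system prompt
# describes. Values are made up; only the structure matters.
record = {
    "productId": "PRD-0042",
    "name": "Industrial relay, 24V DC coil",
    "category": "electrical",
    "supplier": "Acme Components",
    "unitPrice": {"value": 12.50, "currency": "USD"},
    "specifications": {"coilVoltage": "24V DC", "contactRating": "10A @ 250V AC"},
    "certifications": ["UL", "CE"],
    "leadTimeDays": 21,
}

REQUIRED = [
    "productId", "name", "category", "supplier",
    "unitPrice", "specifications", "certifications", "leadTimeDays",
]

# Every required field is present, and leadTimeDays is within 1-365.
assert all(key in record for key in REQUIRED)
assert 1 <= record["leadTimeDays"] <= 365

print(json.dumps(record, indent=2))
```

This is the file the agent writes to disk before shelling out to the validator; the deterministic checks in the workflow, not these local assertions, are what actually gate the output.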
What the agent sees when validation fails
When the agent runs `validibot validate` with `--json`, a failed validation returns something like:
```json
{
  "id": "run_abc123",
  "workflow_slug": "product-spec-check",
  "state": "COMPLETED",
  "result": "FAIL",
  "status": "FAILED",
  "steps": [
    {
      "name": "JSON Schema Check",
      "status": "FAILED",
      "issues": [
        {
          "severity": "ERROR",
          "message": "'leadTimeDays' is a required property",
          "path": "$"
        }
      ]
    }
  ]
}
```
The agent reads this, sees that `leadTimeDays` is missing, edits the JSON file to add it, and runs the validation again. No human intervention required. The loop continues until all checks pass or the agent runs out of turns.
Approach 2: Custom MCP Tool
The Bash approach works well, but it has a drawback: the agent has access to a full shell. If you want tighter control (especially in production), you can wrap Validibot as a custom MCP tool using the Agent SDK's built-in MCP server support. This way the agent can only call Validibot, not run arbitrary commands.
```python
import asyncio
import json
import subprocess
from pathlib import Path

from claude_agent_sdk import (
    query,
    tool,
    create_sdk_mcp_server,
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
)

@tool(
    name="validate_file",
    description=(
        "Validate a file against a Validibot workflow. "
        "Returns structured JSON with pass/fail status and issues."
    ),
    input_schema={
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Path to the file to validate"},
            "workflow": {"type": "string", "description": "Workflow slug or ID"},
        },
        "required": ["file_path", "workflow"],
    },
)
async def validate_file(args: dict) -> dict:
    """Run validibot validate and return the result."""
    file_path = args["file_path"]
    workflow = args["workflow"]

    result = subprocess.run(
        ["validibot", "validate", file_path, "-w", workflow, "--json"],
        capture_output=True,
        text=True,
        timeout=120,
    )

    # Parse the JSON output if possible
    try:
        output = json.loads(result.stdout)
    except json.JSONDecodeError:
        output = {"raw_output": result.stdout, "raw_error": result.stderr}

    return {
        "content": [
            {
                "type": "text",
                "text": json.dumps(
                    {
                        "exit_code": result.returncode,
                        "passed": result.returncode == 0,
                        "result": output,
                    },
                    indent=2,
                ),
            }
        ]
    }

# Create an in-process MCP server with the tool
validibot_server = create_sdk_mcp_server(
    name="validibot",
    version="1.0.0",
    tools=[validate_file],
)

SYSTEM_PROMPT = """You are a procurement data agent. Your job is to read product
specification documents and produce standardized JSON files.

After generating a JSON file, you MUST validate it using the validate_file tool
with workflow="product-spec-check". If validation fails, read the issues, fix
the problems, and validate again. Do not return results to the user until
validation passes."""

async def main():
    async for message in query(
        prompt=(
            "Read the file supplier_datasheet.txt in the current directory. "
            "Extract the product specs and produce a validated JSON file."
        ),
        options=ClaudeAgentOptions(
            system_prompt=SYSTEM_PROMPT,
            allowed_tools=[
                "Read", "Write", "Edit",
                "mcp__validibot__validate_file",
            ],
            mcp_servers={"validibot": validibot_server},
            cwd=str(Path.cwd()),
        ),
    ):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(main())
```
The `@tool` decorator defines a Validibot validation tool with a typed schema. The `create_sdk_mcp_server` function wraps it as an in-process MCP server, so there's no separate MCP transport process to manage. (The tool still shells out to the Validibot CLI under the hood, but the agent calls it like any other tool.)
Notice the `allowed_tools` list. The agent can read, write, and edit files, and it can call `mcp__validibot__validate_file`. That's it. No shell access, no web browsing, no other tools. In production, this kind of scoping matters.
Which Approach Should You Use?
Use the Bash approach if you're prototyping, building internal tools, or running the agent in a controlled environment where shell access isn't a concern. It's simpler, there's less code, and you get it working in five minutes.
Use the MCP tool approach if you're building something for production, running the agent on behalf of other users, or you want to limit exactly what the agent can do. The tool boundary gives you explicit control over what arguments the agent can pass and what actions it can take.
Both approaches produce the same result: the agent generates data, validates it against your Validibot workflow, and iterates until it passes.
The Validate-Fix Loop in Practice
The real power here isn't just "agent calls validator." It's the feedback loop.
When Validibot returns structured issues, the agent gets specific, actionable information about what's wrong. Not "validation failed" but "the field `leadTimeDays` is missing" or "the value of `unitPrice.value` must be greater than 0."
This is fundamentally different from asking the AI to self-review. If you ask an LLM "is this JSON correct?", it will do its best to check, but it's using the same probabilistic reasoning that generated the errors in the first place. Validibot applies deterministic rules. It doesn't guess. It doesn't get tired. It either passes or it doesn't.
And as we covered in Part 1, Validibot workflows can include steps that run actual simulations and domain-specific computations (EnergyPlus models, FMU co-simulations, custom Docker containers with whatever libraries your domain requires). The agent doesn't need those tools installed locally. It just calls the validator and gets back a pass/fail with detailed issues.
The agent reads the issue, understands the constraint, and makes the correction. Occasionally it takes three passes for trickier cross-field issues. But the point is that by the time the output reaches a human, it's already been validated against your exact rules.
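Stripped of the agent machinery, the loop itself is simple. Here's a minimal sketch with a stubbed-out validator and fixer standing in for the real CLI call and the agent's edits; the field names and values are invented for the example.

```python
MAX_ATTEMPTS = 5

def validate(data: dict) -> list[str]:
    """Stand-in for the real Validibot call: return a list of issue messages."""
    issues = []
    if "leadTimeDays" not in data:
        issues.append("'leadTimeDays' is a required property")
    return issues

def fix(data: dict, issues: list[str]) -> dict:
    """Stand-in for the agent's edit step: apply a correction per issue."""
    if any("leadTimeDays" in issue for issue in issues):
        data["leadTimeDays"] = 21  # invented value for the sketch
    return data

data = {"productId": "PRD-0042", "name": "Industrial relay"}
for attempt in range(1, MAX_ATTEMPTS + 1):
    issues = validate(data)
    if not issues:
        print(f"passed on attempt {attempt}")
        break
    data = fix(data, issues)
else:
    # Give up after a bounded number of passes rather than looping forever.
    raise RuntimeError(f"still failing after {MAX_ATTEMPTS} attempts: {issues}")
```

The attempt cap is the part worth keeping in any real setup: a bounded loop means a stubborn failure surfaces to a human instead of burning turns indefinitely.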
Going Further
A few ideas for extending this pattern.
Your agent can call different Validibot workflows depending on the type of data it's generating. One workflow for product specs, another for compliance documents, another for simulation inputs. The agent picks the right one based on context.
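One way to sketch that routing, assuming the workflow slugs and data-type names are yours to define (only `product-spec-check` appears earlier in this post; the rest are hypothetical):

```python
# Hypothetical mapping from the type of generated data to a Validibot
# workflow slug. Only "product-spec-check" is taken from this post.
WORKFLOWS = {
    "product_spec": "product-spec-check",
    "compliance_doc": "compliance-doc-check",
    "simulation_input": "sim-input-check",
}

def pick_workflow(data_type: str) -> str:
    """Return the workflow slug to validate a given type of generated data."""
    try:
        return WORKFLOWS[data_type]
    except KeyError:
        raise ValueError(f"No validation workflow configured for {data_type!r}")

print(pick_workflow("product_spec"))  # product-spec-check
```

The lookup can live in the agent's system prompt instead, but keeping it in code means an unknown data type fails loudly rather than letting the agent guess a slug.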
The Agent SDK supports hooks that run after tool calls, which are useful if you want an agent-side audit trail as well. Validibot already records the submission, workflow, result, and timestamps for each validation run. A hook gives you a lightweight log of when the agent called the validator and what came back. Something like:
```python
from datetime import datetime

from claude_agent_sdk import HookMatcher

async def log_validation(input_data, tool_use_id, context):
    """PostToolUse hook for audit logging."""
    with open("./validation_audit.log", "a") as f:
        f.write(
            f"{datetime.now().isoformat()} | "
            f"tool={input_data['tool_name']} | "
            f"input={input_data['tool_input']} | "
            f"result={input_data['tool_response']}\n"
        )
    return {"continue_": True}

# Add to your ClaudeAgentOptions:
# hooks={
#     "PostToolUse": [
#         HookMatcher(
#             matcher="mcp__validibot__validate_file",
#             hooks=[log_validation],
#         )
#     ]
# }
```
For complex workflows, you can use the Agent SDK's subagent feature to split the work. A main agent reads the input documents and delegates data extraction to a specialist subagent, which generates and validates the output before reporting back. The main agent never sees invalid data.
And you can run the whole thing as part of a CI/CD pipeline. A GitHub Action triggers the agent, the agent processes incoming files and validates them through Validibot, and the pipeline fails if any output doesn't pass. This works well for batch processing scenarios where you're onboarding a lot of supplier data at once.
So What
The Anthropic Agent SDK gives your agents the ability to use tools. Validibot gives those agents access to deterministic, auditable validation. Put the two together and you get an agent that doesn't just generate data; it generates validated data.
The agent is still probabilistic. It still makes mistakes. But it also catches and fixes those mistakes before anyone else has to deal with them.
If you want to try this with your own data and workflows, start with Validibot or get in touch. I'd be interested to hear what kinds of data your agents are generating and what validation rules you'd want to put around them.