Building Verifiable LLM Based Agents

AI agents that generate, critique, and mathematically verify their own output — so you only ever receive solutions that are provably correct.

The Problem: Trusting AI Output You Can't Verify

Large Language Models are powerful generators, but they produce plausible-looking output that can silently break hard rules. For high-stakes engineering, legal, or safety-critical work, "looks correct" is not good enough. Our goal is a class of AI agents that don't just generate an answer but verify it against every constraint and only return solutions that are mathematically proven to be correct.

The user supplies only the problem and its rules, no worked examples and no information dump. The system reasons out a solution on its own, checks itself, and guarantees the result satisfies every requirement.

This is a closed, self-correcting loop: one agent proposes, another tries to disprove it, and any proposal that fails is rejected and regenerated until it passes. The output is bounded by the constraints — not by the model's confidence.

Our Proving Ground: Engineering Drawings

To prove this approach works, we apply it to a genuinely hard problem: autonomously generating legally compliant architectural layouts. We chose it because every output is objectively right or wrong, the rules are strict and geometric, and correctness can be checked with formal methods — making it an ideal benchmark for whether an AI can truly produce a verified solution.

A two-agent loop drives it: Agent 1 (The Formulator) (Claude, Gemini, ChatGPT, etc.) generates the spatial geometry and Python CAD scripts, while Agent 2 (The Math Cop) (Kimi AI, DeepSeek, etc.) acts only as a verifier, running the coordinates through the Microsoft Z3 Theorem Prover. The engineering drawing is the demonstration — verifiable AI is the goal.

Platform Capabilities

🧠 Agent 1: Generation

  • Translates raw dimensions into Python scripts
  • Utilizes ezdxf library for CAD construction
  • Focuses on maximizing buildable outer periphery

⚖️ Agent 2: Verification (Math Cop)

  • Uses Microsoft Z3 Solver to mathematically prove rules
  • Guarantees the house doesn't intersect the tree
  • Respects the single entry point and boundary offsets

🔄 Autonomous Feedback Loop

  • Self-correcting mechanism upon Z3 violation
  • Agent 2 sends the exact failed constraints back to Agent 1
  • Iterates dynamically until a Pass is achieved

📐 DXF Output Generation

  • The system saves the final .dxf engineering drawing
  • Ready for immediate import into AutoCAD
  • Ensures zero manual CAD drafting is required

How It Works — Demonstrated on Architectural Layouts

1

Initialization

Defining the plot Dimensions, the Seattle Building Code (SBC) rules (e.g. tree penalty zone).

2

Generation

The formulation LLM uses the ezdxf library to generate a Python script that calculates the maximum buildable outer periphery.

3

Verification

The Math Cop stage. Agent 2 uses the Microsoft Z3 Solver to mathematically prove the house doesn't intersect the tree and respects the single entry point and offsets.

Fails / Feedback Loop: Agent 2 sends the exact failed constraints back to Agent 1.
4

DXF Output

Pass: The system saves the final .dxf engineering drawing.

5

LLM Benchmarking

Evaluating each LLM combination across the pipeline on constraint adherence accuracy, iteration convergence rate, and output quality to identify the optimal pairing for verifiable agentic engineering tasks.

Development Team

AADI GWALIORKAR

AAYUSH AGGARWAL

ADITYA JAIN

ANSHUL NAIK

BHAVSAR DEVAM JATIN

BHAVYA RAKESH SHAH

DEVANK KOUL

DHWANI BANDI

HARSHIT GROVER

KAVYA PARAS SHAH

MARJIT DASGUPTA

MEHTA PREM

NITESH KUMAR VARMA

OJAS CHANDRASHEKHAR MARATHE

PRASAD GUNJIKAR

ROHAN KAPOOR

SAKSHAM GARG

SHREYAS GUPTA

SIDAKBIR SINGH BAINS

SIDDH KUMAR JAIN

SWARA WANKHEDE

TRIYANSH AGRAWAL