Building Verifiable LLM Based Agents
AI agents that generate, critique, and mathematically verify their own output — so you only ever receive solutions that are provably correct.
The Problem: Trusting AI Output You Can't Verify
Large Language Models are powerful generators, but they produce plausible-looking output that can silently break hard rules. For high-stakes engineering, legal, or safety-critical work, "looks correct" is not good enough. Our goal is a class of AI agents that don't just generate an answer but verify it against every constraint and only return solutions that are mathematically proven to be correct.
The user supplies only the problem and its rules, no worked examples and no information dump. The system reasons out a solution on its own, checks itself, and guarantees the result satisfies every requirement.
This is a closed, self-correcting loop: one agent proposes, another tries to disprove it, and any proposal that fails is rejected and regenerated until it passes. The output is bounded by the constraints — not by the model's confidence.
Our Proving Ground: Engineering Drawings
To prove this approach works, we apply it to a genuinely hard problem: autonomously generating legally compliant architectural layouts. We chose it because every output is objectively right or wrong, the rules are strict and geometric, and correctness can be checked with formal methods — making it an ideal benchmark for whether an AI can truly produce a verified solution.
A two-agent loop drives it: Agent 1 (The Formulator) (Claude, Gemini, ChatGPT, etc.) generates the spatial geometry and Python CAD scripts, while Agent 2 (The Math Cop) (Kimi AI, DeepSeek, etc.) acts only as a verifier, running the coordinates through the Microsoft Z3 Theorem Prover. The engineering drawing is the demonstration — verifiable AI is the goal.
Platform Capabilities
🧠 Agent 1: Generation
- Translates raw dimensions into Python scripts
- Utilizes ezdxf library for CAD construction
- Focuses on maximizing buildable outer periphery
⚖️ Agent 2: Verification (Math Cop)
- Uses Microsoft Z3 Solver to mathematically prove rules
- Guarantees the house doesn't intersect the tree
- Respects the single entry point and boundary offsets
🔄 Autonomous Feedback Loop
- Self-correcting mechanism upon Z3 violation
- Agent 2 sends the exact failed constraints back to Agent 1
- Iterates dynamically until a Pass is achieved
📐 DXF Output Generation
- The system saves the final .dxf engineering drawing
- Ready for immediate import into AutoCAD
- Ensures zero manual CAD drafting is required
How It Works — Demonstrated on Architectural Layouts
Initialization
Defining the plot Dimensions, the Seattle Building Code (SBC) rules (e.g. tree penalty zone).
Generation
The formulation LLM uses the ezdxf library to generate a Python script that calculates the maximum buildable outer periphery.
Verification
The Math Cop stage. Agent 2 uses the Microsoft Z3 Solver to mathematically prove the house doesn't intersect the tree and respects the single entry point and offsets.
DXF Output
Pass: The system saves the final .dxf engineering drawing.
LLM Benchmarking
Evaluating each LLM combination across the pipeline on constraint adherence accuracy, iteration convergence rate, and output quality to identify the optimal pairing for verifiable agentic engineering tasks.
Development Team
AADI GWALIORKAR
AAYUSH AGGARWAL
ADITYA JAIN
ANSHUL NAIK
BHAVSAR DEVAM JATIN
BHAVYA RAKESH SHAH
DEVANK KOUL
DHWANI BANDI
HARSHIT GROVER
KAVYA PARAS SHAH
MARJIT DASGUPTA
MEHTA PREM
NITESH KUMAR VARMA
OJAS CHANDRASHEKHAR MARATHE
PRASAD GUNJIKAR
ROHAN KAPOOR
SAKSHAM GARG
SHREYAS GUPTA
SIDAKBIR SINGH BAINS
SIDDH KUMAR JAIN
SWARA WANKHEDE
TRIYANSH AGRAWAL