By Yusuf Elborey

Schema-First LLM Apps: Make 'Tool Calling' Reliable with JSON Schema + Validation + Repair Loops

llm, tool-calling, json-schema, validation, repair-loops, structured-outputs, python, production, reliability

Figure: Schema-First Pipeline

Most LLM apps break in boring ways. Missing fields. Wrong types. Partial JSON. “Almost correct” outputs that pass parsing but fail validation.

This article shows a practical pattern to make structured outputs dependable. Define a strict JSON Schema. Force the model to comply. Validate every response. Auto-repair once or twice, then fail safely. Log everything so you can fix prompts and schemas over time.

The Problem: “Structured Output” That Isn’t Structured

You ask the model to return JSON. It does. Sometimes. Other times you get trailing commas, missing keys, wrong enums, strings instead of numbers.

Here’s what happens in practice:

response = llm.generate("Extract customer info: 'John Doe, john@example.com, 555-1234'")
data = json.loads(response)

The model might return:

  • {"name": "John Doe", "email": "john@example.com", "phone": "555-1234"} ✅ Works
  • {"name": "John Doe", "email": "john@example.com",} ❌ Trailing comma
  • {"name": "John Doe", "email": "john@example.com" ❌ Missing closing brace
  • Here's the JSON: {"name": "John Doe", ...} ❌ Extra text
  • {"name": "John Doe", "email": null, "phone": 5551234} ❌ Wrong types

One broken response breaks your entire pipeline. Your API returns 500 errors. Your workflow stops. Your users wait.

Why “Just Parse JSON” Is Not a Strategy

Parsing JSON is easy. Getting valid JSON that matches your schema is hard.

try:
    data = json.loads(response)
    # Great, it's valid JSON. But is it the right shape?
    customer_id = data["customer_id"]  # KeyError if missing
    priority = data["priority"]  # Might be "high" instead of 1-5
    tags = data["tags"]  # Might be a string instead of a list
except json.JSONDecodeError:
    # What now? Retry? Fail? Log and move on?
    pass

This approach fails because:

  • Valid JSON doesn’t mean valid schema
  • Missing fields cause runtime errors
  • Wrong types cause downstream failures
  • No feedback loop to improve

Schema-First Thinking

Start from the downstream system’s needs. What does your code expect? What shape does your database need? What format does your API require?

Design the schema like a public API contract. It’s the interface between the model and your system. Make it explicit. Make it strict. Make it testable.

Start from Downstream Needs

Don’t ask “what can the model extract?” Ask “what does my system need?”

If your database has a priority column that’s an integer 1-5, your schema should enforce that. Not “high/medium/low” that you convert later. Not a string that might be “urgent” or “critical” or “HIGH”.

# Bad: Model returns string, you convert later
priority_str = data["priority"]  # "high", "medium", "low"
priority_map = {"low": 1, "medium": 3, "high": 5}
priority = priority_map.get(priority_str, 3)  # What if it's "urgent"?

# Good: Schema enforces integer 1-5
priority = data["priority"]  # Always 1, 2, 3, 4, or 5

Design Like a Public API Contract

Your schema is a contract. It defines what’s required, what’s optional, what’s allowed.

from pydantic import BaseModel, Field
from typing import Literal

class CustomerExtraction(BaseModel):
    """Extract customer information from text."""
    
    name: str = Field(description="Customer's full name")
    email: str = Field(description="Email address, must be valid format")
    phone: str | None = Field(None, description="Phone number if found")
    priority: Literal[1, 2, 3, 4, 5] = Field(
        description="Priority level: 1=lowest, 5=highest"
    )
    tags: list[str] = Field(
        default_factory=list,
        description="Relevant tags for categorization"
    )

This schema is explicit. It’s testable. It’s self-documenting. The model knows exactly what to return.
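
To see the contract in action, feed the generated JSON Schema straight into the prompt and validate whatever comes back. A minimal sketch, assuming the same placeholder llm.generate client used elsewhere in this article:

import json

# Turn the Pydantic model into a JSON Schema the model can see
schema_json = CustomerExtraction.model_json_schema()

prompt = (
    "Extract customer info from: 'John Doe, john@example.com, 555-1234'.\n"
    "Return JSON only, matching this schema:\n"
    f"{json.dumps(schema_json, indent=2)}"
)

raw = llm.generate(prompt)  # placeholder LLM client, as in earlier snippets
customer = CustomerExtraction.model_validate_json(raw)  # raises ValidationError on bad output
print(customer.priority)  # guaranteed 1-5 once validation passes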

Keep Schemas Small and Composable

One schema per task. Don’t create a mega-schema that handles everything. Create small, focused schemas that compose.

# Bad: One schema for everything
class MegaExtraction(BaseModel):
    customer: dict
    order: dict
    payment: dict
    shipping: dict
    # ... 50 more fields

# Good: Small, focused schemas
class CustomerInfo(BaseModel):
    name: str
    email: str

class OrderInfo(BaseModel):
    order_id: str
    total: float

# Compose when needed
class FullExtraction(BaseModel):
    customer: CustomerInfo
    order: OrderInfo

Small schemas are easier to test. Easier to validate. Easier to fix when they break.

A Practical Pipeline

Here’s the pattern: Prompt → model output → parse → validate → (optional) repair → accept/reject.

Step 1: Parse

Extract JSON from the response. Handle extra text, markdown code blocks, trailing commas.

import json
import re

def extract_json(text: str) -> dict | None:
    """Extract JSON from text, handling common issues."""
    # Remove markdown code blocks
    text = re.sub(r'```json\s*', '', text)
    text = re.sub(r'```\s*$', '', text)
    
    # Find JSON object
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if not match:
        return None
    
    json_str = match.group(0)
    
    # Try to parse
    try:
        return json.loads(json_str)
    except json.JSONDecodeError:
        # Try fixing trailing commas
        json_str = re.sub(r',\s*}', '}', json_str)
        json_str = re.sub(r',\s*]', ']', json_str)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            return None

Step 2: Validate

Check that the parsed JSON matches your schema. Get specific error messages.

from pydantic import ValidationError

def validate_output(data: dict, schema: type[BaseModel]) -> tuple[BaseModel | None, str | None]:
    """Validate data against schema, return model or error."""
    try:
        model = schema(**data)
        return model, None
    except ValidationError as e:
        # Format errors for repair
        errors = []
        for error in e.errors():
            path = " -> ".join(str(x) for x in error["loc"])
            errors.append(f"{path}: {error['msg']}")
        return None, "; ".join(errors)
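
As a quick check, run the broken output from the intro through this validator. The exact wording comes from Pydantic and may vary slightly between versions:

bad = {"name": "John Doe", "email": None, "phone": 5551234}
model, error = validate_output(bad, CustomerExtraction)

print(model)  # None
print(error)  # e.g. "email: Input should be a valid string; phone: Input should be a valid string; priority: Field required"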

Step 3: Repair (Optional)

If validation fails, send the errors back to the model. Ask it to fix only the invalid parts. Retry once or twice. Don’t loop forever.

def repair_loop(
    prompt: str,
    schema: type[BaseModel],
    max_retries: int = 2
) -> BaseModel | None:
    """Call model, validate, repair if needed."""
    
    current_prompt = prompt
    schema_json = schema.model_json_schema()
    
    for attempt in range(max_retries + 1):
        # Call model
        response = llm.generate(current_prompt)
        data = extract_json(response)
        
        if data is None:
            if attempt < max_retries:
                current_prompt = f"{prompt}\n\nPlease return valid JSON only."
                continue
            return None
        
        # Validate
        model, error = validate_output(data, schema)
        if model is not None:
            return model
        
        # Repair
        if attempt < max_retries:
            repair_prompt = f"""Previous response had validation errors:
{error}

Please correct only the invalid fields and return valid JSON matching this schema:
{json.dumps(schema_json, indent=2)}

Original request: {prompt}"""
            current_prompt = repair_prompt
        else:
            return None
    
    return None

Step 4: Accept or Reject

If repair succeeds, use the output. If it fails, log the error and use a safe fallback.

def safe_extract(prompt: str, schema: type[BaseModel]) -> BaseModel:
    """Extract with repair loop, fallback on failure."""
    result = repair_loop(prompt, schema, max_retries=2)
    
    if result is None:
        # Log failure
        logger.error(f"Failed to extract after retries: {prompt[:100]}")
        # Return safe default
        return schema.model_validate({})  # Only safe if every field has a default; otherwise raise
    
    return result
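
End to end, a call site looks like this. A sketch, assuming the CustomerExtraction schema from earlier; save_to_db is a stand-in for your own persistence layer:

customer = safe_extract(
    "Extract customer info: 'John Doe, john@example.com, 555-1234'",
    CustomerExtraction,
)

# Downstream code can rely on the contract: fields exist and have the right types
save_to_db(name=customer.name, email=customer.email, priority=customer.priority)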

What “Repair” Means (And What It Must Not Do)

Repair means asking the model to fix validation errors. It doesn’t mean:

  • Guessing missing fields
  • Making up data
  • Ignoring errors
  • Looping forever

Repair should:

  • Show specific validation errors
  • Ask for corrections only
  • Retry with the same prompt + error feedback
  • Stop after 1-2 retries

Here’s what repair looks like:

# Model returns: {"name": "John", "priority": "high"}
# Schema expects: {"name": str, "priority": Literal[1,2,3,4,5]}

# Validation error: "priority: Input should be 1, 2, 3, 4, or 5"

# Repair prompt:
"""
Previous response had validation errors:
priority: Input should be 1, 2, 3, 4, or 5

Please correct only the invalid fields and return valid JSON matching this schema:
{
  "name": {"type": "string"},
  "priority": {"type": "integer", "enum": [1, 2, 3, 4, 5]}
}

Original request: Extract customer info...
"""

The model sees the error. It knows what to fix. It returns corrected JSON.

Validation Rules That Matter

Not all validation is equal. Some rules matter more than others.

Required Fields

Missing required fields break downstream code. Enforce them strictly.

class OrderExtraction(BaseModel):
    order_id: str  # Required
    customer_id: str  # Required
    total: float  # Required
    notes: str | None = None  # Optional

Enums

Enums prevent invalid values. Use them for fixed sets.

class TicketClassification(BaseModel):
    status: Literal["open", "in_progress", "resolved", "closed"]
    priority: Literal["low", "medium", "high", "urgent"]

Min/Max

Numbers should be in valid ranges.

class PriorityScore(BaseModel):
    score: int = Field(ge=1, le=5)  # 1-5 only
    confidence: float = Field(ge=0.0, le=1.0)  # 0.0-1.0 only

Formats

Dates, emails, URLs should match formats.

from pydantic import EmailStr, HttpUrl
from datetime import datetime

class ContactInfo(BaseModel):
    email: EmailStr  # Validates email format
    website: HttpUrl | None = None  # Validates URL format
    created_at: datetime  # Validates ISO datetime

Nested Objects

Nested objects need validation too.

class Address(BaseModel):
    street: str
    city: str
    zip: str

class Customer(BaseModel):
    name: str
    address: Address  # Nested validation

Strict vs Tolerant Validation

Be strict where it matters. Be tolerant where it doesn’t.

Be strict for:

  • Required fields that break code
  • Enums that route to different handlers
  • Types that cause runtime errors
  • Formats that downstream systems require

Be tolerant for:

  • Optional fields that have defaults
  • Extra fields you don’t use
  • Minor formatting differences
  • Fields that are “nice to have”

# Strict: This breaks if missing
class DatabaseRecord(BaseModel):
    id: str  # Required, strict
    status: Literal["active", "inactive"]  # Required, strict enum

# Tolerant: This has defaults and ignores fields it doesn't know about
class UserPreferences(BaseModel):
    model_config = ConfigDict(extra="ignore")  # from pydantic import ConfigDict

    theme: str = "light"  # Default, tolerant
    notifications: bool = True  # Default, tolerant
    extra_data: dict = Field(default_factory=dict)  # Catch-all for optional data

Repair Loop Design

The repair loop is simple. Call model. Validate. If invalid, send errors back. Retry. Stop after max retries.

How to Ask the Model to Correct Only Invalid Parts

Show specific errors. Show the schema. Ask for corrections.

def build_repair_prompt(
    original_prompt: str,
    validation_errors: str,
    schema: type[BaseModel]
) -> str:
    """Build prompt asking model to fix validation errors."""
    
    schema_json = schema.model_json_schema()
    
    return f"""Your previous response had validation errors:
{validation_errors}

Please correct ONLY the fields mentioned in the errors above. Keep all other fields exactly as they were.

Return valid JSON matching this schema:
{json.dumps(schema_json, indent=2)}

Original request:
{original_prompt}"""

Retry Limits

Retry once or twice. Not more. If it fails after 2 retries, it’s probably not going to work.

MAX_REPAIR_RETRIES = 2  # Usually 1-2 is enough

Hard Stop with Safe Fallback

After max retries, stop. Don’t keep looping. Use a safe fallback.

def extract_with_fallback(
    prompt: str,
    schema: type[BaseModel],
    fallback: BaseModel | None = None
) -> BaseModel:
    """Extract with repair, use fallback on failure."""
    result = repair_loop(prompt, schema, max_retries=2)
    
    if result is None:
        if fallback is not None:
            logger.warning("Using fallback after repair failure")
            return fallback
        raise ValueError("Failed to extract after retries")
    
    return result

Security and Safety Basics

Treat model output as untrusted input. Always validate. Never trust.

Treat Model Output as Untrusted Input

The model might return anything. Validate everything.

# Bad: Trust the model
user_id = response["user_id"]
db.query(f"SELECT * FROM users WHERE id = {user_id}")  # SQL injection risk

# Good: Validate first
validated = UserIdSchema(**response)
user_id = validated.user_id  # Validated, safe
db.query("SELECT * FROM users WHERE id = ?", (user_id,))  # Parameterized

Allowlist Tool Names and Argument Shapes

Don’t let the model choose arbitrary tool names. Use an allowlist.

ALLOWED_TOOLS = {
    "get_user_info": GetUserInfoArgs,
    "update_ticket": UpdateTicketArgs,
    "send_notification": SendNotificationArgs,
}

def safe_tool_call(tool_name: str, args: dict) -> dict:
    """Execute tool only if name and args are allowed."""
    if tool_name not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {tool_name} not allowed")
    
    validator = ALLOWED_TOOLS[tool_name]
    validated = validator(**args)
    return execute_tool(tool_name, validated)

Never Let the Model Choose Raw SQL, Shell Commands, or URLs Without Constraints

This is dangerous. Don’t do it.

# Bad: Model chooses SQL
sql = response["query"]
db.execute(sql)  # SQL injection

# Bad: Model chooses shell command
cmd = response["command"]
os.system(cmd)  # Command injection

# Bad: Model chooses URL
url = response["url"]
requests.get(url)  # SSRF risk

# Good: Model chooses from allowed options
action = response["action"]  # "read", "write", "delete"
if action == "read":
    db.read(id=response["id"])
elif action == "write":
    db.write(id=response["id"], data=response["data"])

Testing Strategy

Test your schemas. Test your validation. Test your repair loops.

Golden Test Cases

Create test cases for valid and invalid outputs.

VALID_CASES = [
    {
        "input": "Extract: John Doe, john@example.com",
        "expected": {"name": "John Doe", "email": "john@example.com"}
    },
    {
        "input": "Extract: Jane Smith",
        "expected": {"name": "Jane Smith", "email": None}
    },
]

INVALID_CASES = [
    {
        "input": "Extract: John Doe",
        "response": '{"name": "John Doe"}',  # Missing email (if required)
        "expected_error": "email: Field required"
    },
    {
        "input": "Extract: test@example",
        "response": '{"email": "test@example"}',  # Invalid email format
        "expected_error": "email: Invalid email format"
    },
]
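
A minimal pytest-style harness for the invalid cases might look like this. It replays the canned responses through extract_json and validate_output from earlier, so no live model call is needed:

import pytest

@pytest.mark.parametrize("case", INVALID_CASES)
def test_invalid_outputs_are_rejected(case):
    """Canned bad responses must fail validation with a useful message."""
    data = extract_json(case["response"])
    model, error = validate_output(data, CustomerExtraction)
    assert model is None
    assert error is not None  # exact wording varies by Pydantic version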

Property-Based Testing

Test edge cases automatically.

from hypothesis import given, strategies as st

@given(
    name=st.text(min_size=1, max_size=100),
    email=st.emails(),
    priority=st.integers(min_value=1, max_value=5)
)
def test_customer_extraction(name, email, priority):
    """Test extraction with random valid inputs."""
    prompt = f"Extract: {name}, {email}, priority {priority}"
    result = extract_with_fallback(prompt, CustomerExtraction)
    assert result.name == name
    assert result.email == email
    assert result.priority == priority

Regression Tests When Schemas Evolve

When you change a schema, test that old outputs still work (or fail gracefully).

def test_schema_migration():
    """Test that schema changes don't break existing code."""
    old_output = {"name": "John", "email": "john@example.com"}
    
    # Old schema had "email" as optional
    # New schema has "email" as required
    # Migration should handle this
    
    try:
        result = CustomerExtractionV2(**old_output)
        assert result.email is not None
    except ValidationError:
        # If migration fails, should have fallback
        result = migrate_old_to_new(old_output)
        assert result.email is not None

Production Checklist

Before deploying, check these:

Metrics

Track validation failure rate, repair success rate, tool error rate.

metrics = {
    "validation_failures": 0,
    "repair_attempts": 0,
    "repair_successes": 0,
    "tool_errors": 0,
}

def track_validation_failure():
    metrics["validation_failures"] += 1

def track_repair_attempt():
    metrics["repair_attempts"] += 1

def track_repair_success():
    metrics["repair_successes"] += 1

Logging

Log schema version, prompt version, model version, error class.

import logging

logger = logging.getLogger(__name__)

def log_extraction_attempt(
    prompt: str,
    schema_version: str,
    prompt_version: str,
    model_version: str,
    success: bool,
    error: str | None = None
):
    """Log extraction attempt with full context."""
    logger.info(
        f"Extraction attempt: "
        f"schema={schema_version} "
        f"prompt={prompt_version} "
        f"model={model_version} "
        f"success={success} "
        f"error={error}"
    )

Rollout: Shadow Mode for Schema Changes

When you change a schema, run both old and new in parallel. Compare results.

def shadow_mode_extract(prompt: str, new_schema: type[BaseModel]):
    """Extract with new schema, but also run old schema for comparison."""
    new_result = extract_with_fallback(prompt, new_schema)
    old_result = extract_with_fallback(prompt, OLD_SCHEMA)
    
    # Log differences
    if new_result != old_result:
        logger.warning(
            f"Schema change detected difference: "
            f"old={old_result} new={new_result}"
        )
    
    return new_result

Code Samples

The code repository includes three runnable examples:

  1. Schema + Validator: JSON Schema definition and Python validation with clear error reporting
  2. Repair Loop: Function that calls LLM, validates, repairs on failure, retries max 2 times
  3. Tool Execution Wrapper: Safe dispatcher that maps tool names to functions, validates args before calling, catches exceptions

See the GitHub repository for complete, runnable code.

Summary

Schema-first design makes LLM apps reliable. Start with schemas. Validate everything. Repair when needed. Stop after retries. Log for improvement.

The pattern is simple:

  1. Define strict JSON Schema
  2. Parse model output
  3. Validate against schema
  4. Repair on failure (1-2 retries)
  5. Use safe fallback if repair fails
  6. Log everything

This approach reduces brittle glue code and runtime surprises. It makes tool calling dependable. It makes structured outputs reliable.

Start with schemas. Validate strictly. Repair carefully. Log everything. Your future self will thank you.
