Structured Output
TL;DR
CMDOP structured output lets AI return typed, validated data instead of raw text. Define response shapes using Pydantic models with field descriptions, enums, optional fields, and nested schemas. AI output is automatically validated against your constraints. Supports complex analysis patterns like security scans and capacity planning with direct attribute access on results.
AI returns typed data instead of text. Use Pydantic models to define the structure.
How do I get structured data from AI?
from cmdop import AsyncCMDOPClient
from pydantic import BaseModel, Field
# Define the expected response shape with typed fields and descriptions
class ServerHealth(BaseModel):
    hostname: str
    cpu_percent: float = Field(description="CPU usage percentage")
    memory_percent: float = Field(description="Memory usage percentage")
    disk_percent: float = Field(description="Disk usage percentage")
    is_healthy: bool
async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
    await client.terminal.set_machine("prod-server")

    # Pass output_schema so AI returns structured data matching the model
    result = await client.agent.run(
        prompt="Check server health status",
        output_schema=ServerHealth
    )

    # Access fields directly with full type safety -- no text parsing needed
    health: ServerHealth = result.output
    if not health.is_healthy:
        send_alert(f"{health.hostname} is unhealthy!")
    if health.cpu_percent > 90:
        send_alert(f"High CPU: {health.cpu_percent}%")

Why use structured output instead of text parsing?
# Without structured output: fragile text parsing required
output, _ = await client.terminal.execute("check-health.sh")
# Output: "CPU: 45%, Memory: 62%, Disk: 78%"
# Now you have to parse this text...
# With structured output: direct typed attribute access
result = await client.agent.run("Check health", output_schema=ServerHealth)
if result.output.cpu_percent > 90:  # Direct access!
    alert()

How do I design Pydantic schemas for AI output?
Simple Schema
# Flat schema for a single measurement
class DiskUsage(BaseModel):
    path: str
    total_gb: float
    used_gb: float
    free_gb: float
    percent_used: float
result = await client.agent.run(
    "Check disk usage for /var",
    output_schema=DiskUsage
)
print(f"Free: {result.output.free_gb} GB")

With Lists
# Schema for a single process entry
class Process(BaseModel):
    pid: int
    name: str
    cpu_percent: float
    memory_mb: float

# Wrapper schema containing a list of processes
class ProcessList(BaseModel):
    processes: list[Process]
    total_count: int

result = await client.agent.run(
    "List top 10 processes by CPU",
    output_schema=ProcessList
)

# Iterate over the typed list of processes
for proc in result.output.processes:
    print(f"PID {proc.pid}: {proc.name} - {proc.cpu_percent}%")

Nested Schemas
# Child schemas representing sub-components of a server
class Service(BaseModel):
    name: str
    status: str  # running, stopped, failed
    port: int | None

class Database(BaseModel):
    type: str  # postgres, mysql, etc.
    version: str
    connections: int

# Parent schema composing child schemas into a full picture
class ServerStatus(BaseModel):
    hostname: str
    services: list[Service]
    database: Database | None
    uptime_hours: float
    issues: list[str]

result = await client.agent.run(
    "Get complete server status including services and database",
    output_schema=ServerStatus
)

Enums and Literals
from enum import Enum
from typing import Literal
# Enum constrains AI to return only valid severity levels
class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# Literal restricts category to an explicit set of values
class Alert(BaseModel):
    title: str
    severity: Severity
    category: Literal["security", "performance", "availability"]
    description: str

result = await client.agent.run(
    "Check for issues and report any alerts",
    output_schema=Alert
)

# Compare against enum values for routing decisions
if result.output.severity == Severity.CRITICAL:
    page_oncall(result.output)

Optional Fields
# Use None default for fields that may not always be available
class LogAnalysis(BaseModel):
    total_lines: int
    error_count: int
    warning_count: int
    first_error: str | None = None  # Optional -- might have zero errors
    last_error: str | None = None
    error_pattern: str | None = None

result = await client.agent.run(
    "Analyze /var/log/app.log for errors",
    output_schema=LogAnalysis
)

# Safely access optional fields after checking error_count
if result.output.error_count > 0:
    print(f"First error: {result.output.first_error}")

How do field descriptions help AI produce better output?
Descriptions help AI understand what you want:
# Field descriptions act as instructions telling AI exactly what data to collect
class NetworkStatus(BaseModel):
    hostname: str = Field(description="The server hostname")
    public_ip: str = Field(description="Public-facing IP address")
    private_ip: str = Field(description="Internal/private IP address")
    open_ports: list[int] = Field(description="Ports listening for connections")
    active_connections: int = Field(description="Current number of TCP connections")
    bandwidth_mbps: float = Field(description="Current bandwidth usage in Mbps")

How does Pydantic validation work with AI output?
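Field descriptions and constraints are both captured in the JSON schema Pydantic generates from the model -- presumably what the client passes along to the AI. You can inspect that schema yourself (the DiskUsage fields below are illustrative):

```python
from pydantic import BaseModel, Field

class DiskUsage(BaseModel):
    path: str = Field(description="Filesystem path that was checked")
    percent_used: float = Field(ge=0, le=100, description="Used space as a percentage")

schema = DiskUsage.model_json_schema()
# The generated schema carries the description plus minimum/maximum bounds
print(schema["properties"]["percent_used"])
```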
Pydantic validates the output:
# Validation constraints ensure AI returns data within acceptable ranges
class Config(BaseModel):
    port: int = Field(ge=1, le=65535)  # Must be a valid port number
    timeout_seconds: float = Field(gt=0)  # Must be positive
    log_level: Literal["debug", "info", "warn", "error"]

result = await client.agent.run(
    "Get application config",
    output_schema=Config
)
# AI output is automatically validated against these constraints

Error Handling
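A schema failure is, at bottom, a Pydantic ValidationError; you can reproduce the underlying check locally with no client at all (the out-of-range values below are invented):

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class Config(BaseModel):
    port: int = Field(ge=1, le=65535)
    timeout_seconds: float = Field(gt=0)
    log_level: Literal["debug", "info", "warn", "error"]

try:
    # port is above 65535, so validation rejects this payload
    Config.model_validate({"port": 99999, "timeout_seconds": 1.5, "log_level": "info"})
except ValidationError as e:
    print(f"{e.error_count()} field failed validation")
```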
from cmdop.exceptions import SchemaValidationError
try:
    result = await client.agent.run(
        "Get server info",
        output_schema=ServerInfo
    )
except SchemaValidationError as e:
    # Catch validation failures when AI output doesn't match the schema
    print(f"AI returned invalid data: {e}")
    # Fall back to unstructured text output
    result = await client.agent.run("Get server info")
    print(result.text)

How do I collect multiple findings in a single result?
# Individual security finding with severity and fix recommendation
class SecurityFinding(BaseModel):
    severity: str
    category: str
    description: str
    file_path: str | None
    recommendation: str

# Top-level scan result containing a list of all findings
class SecurityScan(BaseModel):
    hostname: str
    scan_duration_seconds: float
    findings: list[SecurityFinding]
    overall_risk: Literal["low", "medium", "high", "critical"]

result = await client.agent.run(
    "Perform security audit: check permissions, open ports, outdated packages",
    output_schema=SecurityScan
)

# Filter findings by severity and auto-create tickets for critical ones
critical = [f for f in result.output.findings if f.severity == "critical"]
if critical:
    create_tickets(critical)

How do I model complex analysis with trends and predictions?
# Tracks a single resource metric over time with a 1-hour prediction
class ResourceTrend(BaseModel):
    metric: str
    current_value: float
    avg_24h: float
    trend: Literal["increasing", "stable", "decreasing"]
    prediction_1h: float

# Full capacity analysis aggregating multiple resource trends
class CapacityAnalysis(BaseModel):
    hostname: str
    resources: list[ResourceTrend]
    bottleneck: str | None
    recommendations: list[str]
    urgent_action_needed: bool

result = await client.agent.run(
    "Analyze resource usage trends and predict capacity issues",
    output_schema=CapacityAnalysis
)

# Trigger alerts when AI predicts imminent capacity problems
if result.output.urgent_action_needed:
    alert_team(result.output.bottleneck, result.output.recommendations)

What are the best practices for structured output schemas?
1. Be Specific in Prompts
# Good: explicitly state which metrics to check and how
result = await client.agent.run(
    "Check CPU usage for the last minute, memory usage including buffers, "
    "and disk usage for the root partition",
    output_schema=ServerHealth
)

# Vague: AI has to guess what "health" means
result = await client.agent.run(
    "Check health",
    output_schema=ServerHealth
)

2. Use Descriptive Field Names
# Good: field names clearly communicate their meaning
class LogStats(BaseModel):
    error_count_last_hour: int
    unique_error_types: int
    most_common_error: str

# Confusing: ambiguous names force AI to guess
class LogStats(BaseModel):
    count: int
    types: int
    error: str

3. Add Field Descriptions for Ambiguous Fields
# Descriptions disambiguate units and aggregation methods
class Metrics(BaseModel):
    latency: float = Field(description="P99 latency in milliseconds")
    throughput: float = Field(description="Requests per second")

4. Use Optional for Uncertain Data
# Mark fields as optional when the data might not exist
class ProcessInfo(BaseModel):
    pid: int
    name: str
    user: str
    start_time: str | None = None  # Might not be available

Next
- Task Execution → Let AI run commands
- Fleet Management → Multi-machine orchestration