Structured Output
TL;DR
CMDOP structured output lets AI return typed, validated data instead of raw text. Define response shapes using Pydantic models with field descriptions, enums, optional fields, and nested schemas. AI output is automatically validated against your constraints. Supports complex analysis patterns like security scans and capacity planning with direct attribute access on results.
AI returns typed data instead of text. Use Pydantic models to define the structure.
How do I get structured data from AI?
from cmdop import AsyncCMDOPClient
from pydantic import BaseModel, Field
# Define the expected response shape with typed fields and descriptions
class ServerHealth(BaseModel):
    hostname: str
    cpu_percent: float = Field(description="CPU usage percentage")
    memory_percent: float = Field(description="Memory usage percentage")
    disk_percent: float = Field(description="Disk usage percentage")
    is_healthy: bool
async with AsyncCMDOPClient.remote(api_key="cmd_xxx") as client:
    await client.terminal.set_machine("prod-server")

    # Pass output_schema so AI returns structured data matching the model
    result = await client.agent.run(
        prompt="Check server health status",
        output_schema=ServerHealth
    )

    # Access fields directly with full type safety -- no text parsing needed
    health: ServerHealth = result.output
    if not health.is_healthy:
        send_alert(f"{health.hostname} is unhealthy!")
    if health.cpu_percent > 90:
        send_alert(f"High CPU: {health.cpu_percent}%")

Why use structured output instead of text parsing?
# Without structured output: fragile text parsing required
output, _ = await client.terminal.execute("check-health.sh")
# Output: "CPU: 45%, Memory: 62%, Disk: 78%"
# Now you have to parse this text...
# With structured output: direct typed attribute access
result = await client.agent.run("Check health", output_schema=ServerHealth)
if result.output.cpu_percent > 90:  # Direct access!
    alert()

How do I design Pydantic schemas for AI output?
Simple Schema
# Flat schema for a single measurement
class DiskUsage(BaseModel):
    path: str
    total_gb: float
    used_gb: float
    free_gb: float
    percent_used: float
result = await client.agent.run(
    "Check disk usage for /var",
    output_schema=DiskUsage
)
print(f"Free: {result.output.free_gb} GB")

With Lists
# Schema for a single process entry
class Process(BaseModel):
    pid: int
    name: str
    cpu_percent: float
    memory_mb: float

# Wrapper schema containing a list of processes
class ProcessList(BaseModel):
    processes: list[Process]
    total_count: int

result = await client.agent.run(
    "List top 10 processes by CPU",
    output_schema=ProcessList
)

# Iterate over the typed list of processes
for proc in result.output.processes:
    print(f"PID {proc.pid}: {proc.name} - {proc.cpu_percent}%")

Nested Schemas
# Child schemas representing sub-components of a server
class Service(BaseModel):
    name: str
    status: str  # running, stopped, failed
    port: int | None

class Database(BaseModel):
    type: str  # postgres, mysql, etc.
    version: str
    connections: int

# Parent schema composing child schemas into a full picture
class ServerStatus(BaseModel):
    hostname: str
    services: list[Service]
    database: Database | None
    uptime_hours: float
    issues: list[str]

result = await client.agent.run(
    "Get complete server status including services and database",
    output_schema=ServerStatus
)

Enums and Literals
from enum import Enum
from typing import Literal
# Enum constrains AI to return only valid severity levels
class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

# Literal restricts category to an explicit set of values
class Alert(BaseModel):
    title: str
    severity: Severity
    category: Literal["security", "performance", "availability"]
    description: str

result = await client.agent.run(
    "Check for issues and report any alerts",
    output_schema=Alert
)

# Compare against enum values for routing decisions
if result.output.severity == Severity.CRITICAL:
    page_oncall(result.output)

Optional Fields
# Use None default for fields that may not always be available
class LogAnalysis(BaseModel):
    total_lines: int
    error_count: int
    warning_count: int
    first_error: str | None = None  # Optional -- might have zero errors
    last_error: str | None = None
    error_pattern: str | None = None

result = await client.agent.run(
    "Analyze /var/log/app.log for errors",
    output_schema=LogAnalysis
)

# Safely access optional fields after checking error_count
if result.output.error_count > 0:
    print(f"First error: {result.output.first_error}")

How do field descriptions help AI produce better output?
Descriptions help AI understand what you want:
# Field descriptions act as instructions telling AI exactly what data to collect
class NetworkStatus(BaseModel):
    hostname: str = Field(description="The server hostname")
    public_ip: str = Field(description="Public-facing IP address")
    private_ip: str = Field(description="Internal/private IP address")
    open_ports: list[int] = Field(description="Ports listening for connections")
    active_connections: int = Field(description="Current number of TCP connections")
    bandwidth_mbps: float = Field(description="Current bandwidth usage in Mbps")

How does Pydantic validation work with AI output?
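Field descriptions and constraints are both captured in the JSON schema Pydantic generates from the model -- presumably what the client passes along to the AI. You can inspect that schema yourself (the DiskUsage fields below are illustrative):

```python
from pydantic import BaseModel, Field

class DiskUsage(BaseModel):
    path: str = Field(description="Filesystem path that was checked")
    percent_used: float = Field(ge=0, le=100, description="Used space as a percentage")

schema = DiskUsage.model_json_schema()
# The generated schema carries the description plus minimum/maximum bounds
print(schema["properties"]["percent_used"])
```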
Pydantic validates the output:
# Validation constraints ensure AI returns data within acceptable ranges
class Config(BaseModel):
    port: int = Field(ge=1, le=65535)  # Must be a valid port number
    timeout_seconds: float = Field(gt=0)  # Must be positive
    log_level: Literal["debug", "info", "warn", "error"]

result = await client.agent.run(
    "Get application config",
    output_schema=Config
)
# AI output is automatically validated against these constraints

Error Handling
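A schema failure is, at bottom, a Pydantic ValidationError; you can reproduce the underlying check locally with no client at all (the out-of-range values below are invented):

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class Config(BaseModel):
    port: int = Field(ge=1, le=65535)
    timeout_seconds: float = Field(gt=0)
    log_level: Literal["debug", "info", "warn", "error"]

try:
    # port is above 65535, so validation rejects this payload
    Config.model_validate({"port": 99999, "timeout_seconds": 1.5, "log_level": "info"})
except ValidationError as e:
    print(f"{e.error_count()} field failed validation")
```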
from cmdop.exceptions import SchemaValidationError
try:
    result = await client.agent.run(
        "Get server info",
        output_schema=ServerInfo
    )
except SchemaValidationError as e:
    # Catch validation failures when AI output doesn't match the schema
    print(f"AI returned invalid data: {e}")
    # Fall back to unstructured text output
    result = await client.agent.run("Get server info")
    print(result.text)

How do I collect multiple findings in a single result?
# Individual security finding with severity and fix recommendation
class SecurityFinding(BaseModel):
    severity: str
    category: str
    description: str
    file_path: str | None
    recommendation: str

# Top-level scan result containing a list of all findings
class SecurityScan(BaseModel):
    hostname: str
    scan_duration_seconds: float
    findings: list[SecurityFinding]
    overall_risk: Literal["low", "medium", "high", "critical"]

result = await client.agent.run(
    "Perform security audit: check permissions, open ports, outdated packages",
    output_schema=SecurityScan
)

# Filter findings by severity and auto-create tickets for critical ones
critical = [f for f in result.output.findings if f.severity == "critical"]
if critical:
    create_tickets(critical)

How do I model complex analysis with trends and predictions?
# Tracks a single resource metric over time with a 1-hour prediction
class ResourceTrend(BaseModel):
    metric: str
    current_value: float
    avg_24h: float
    trend: Literal["increasing", "stable", "decreasing"]
    prediction_1h: float

# Full capacity analysis aggregating multiple resource trends
class CapacityAnalysis(BaseModel):
    hostname: str
    resources: list[ResourceTrend]
    bottleneck: str | None
    recommendations: list[str]
    urgent_action_needed: bool

result = await client.agent.run(
    "Analyze resource usage trends and predict capacity issues",
    output_schema=CapacityAnalysis
)

# Trigger alerts when AI predicts imminent capacity problems
if result.output.urgent_action_needed:
    alert_team(result.output.bottleneck, result.output.recommendations)

What are the best practices for structured output schemas?
1. Be Specific in Prompts
# Good: explicitly state which metrics to check and how
result = await client.agent.run(
    "Check CPU usage for the last minute, memory usage including buffers, "
    "and disk usage for the root partition",
    output_schema=ServerHealth
)

# Vague: AI has to guess what "health" means
result = await client.agent.run(
    "Check health",
    output_schema=ServerHealth
)

2. Use Descriptive Field Names
# Good: field names clearly communicate their meaning
class LogStats(BaseModel):
    error_count_last_hour: int
    unique_error_types: int
    most_common_error: str

# Confusing: ambiguous names force AI to guess
class LogStats(BaseModel):
    count: int
    types: int
    error: str

3. Add Field Descriptions for Ambiguous Fields
# Descriptions disambiguate units and aggregation methods
class Metrics(BaseModel):
    latency: float = Field(description="P99 latency in milliseconds")
    throughput: float = Field(description="Requests per second")

4. Use Optional for Uncertain Data
# Mark fields as optional when the data might not exist
class ProcessInfo(BaseModel):
    pid: int
    name: str
    user: str
    start_time: str | None = None  # Might not be available

Next
- Task Execution → Let AI run commands
- Fleet Management → Multi-machine orchestration