07 BEST PRACTICES
Recurring themes and patterns for reliable AI agent systems.
Principle 1: Don't Rely Solely on the Context Window
The most fundamental insight of context engineering: external state management is not optional for production systems.
Why This Matters
The context window is working memory, not long-term storage. Treating it as a database leads to:
- Performance degradation as context fills
- Information loss when context is truncated
- Inconsistency across sessions
- Unpredictable behavior at scale
What To Do Instead
- Externalize state: Use databases, vector stores, and file systems
- Retrieve selectively: Fetch only what's needed for current task
- Summarize aggressively: Compress old information
- Track explicitly: Use state machines for workflow progress
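The externalization step can be sketched as a minimal session store. This is a hypothetical interface, not a specific library; a `Map` stands in for what would be a database or key-value store in production.

```typescript
// Hypothetical sketch of externalized state: durable storage outside
// the context window, with selective retrieval back into it.
interface SessionStore {
  save(sessionId: string, key: string, value: unknown): void;
  load(sessionId: string, key: string): unknown | undefined;
}

// In-memory stand-in for a real database or key-value store.
class InMemorySessionStore implements SessionStore {
  private store = new Map<string, unknown>();

  save(sessionId: string, key: string, value: unknown): void {
    this.store.set(`${sessionId}:${key}`, value);
  }

  load(sessionId: string, key: string): unknown | undefined {
    return this.store.get(`${sessionId}:${key}`);
  }
}

// Persist task progress externally; fetch only what the current task needs.
const store = new InMemorySessionStore();
store.save("s1", "task_progress", { step: 3, of: 5 });
const progress = store.load("s1", "task_progress");
```

The point is the shape, not the storage engine: state survives context truncation because the context window only ever holds what was just loaded.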
Principle 2: Context Budget Discipline
Every token in the context window has a cost—not just financial, but cognitive (for the model).
Budget Allocation Strategy
Treat context like a scarce resource. Allocate budgets to each component and enforce them:
Total Budget: 100K tokens
- System Prompt: 5K (5%)
- Conversation History: 30K (30%)
- Tool Definitions: 10K (10%)
- Retrieved Context: 35K (35%)
- Working Memory: 20K (20%)
Enforcement Mechanisms
- Monitor token usage in real-time
- Trigger summarization when budgets approach limits
- Prioritize high-value information when making trade-offs
- Log budget violations for analysis
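A minimal enforcement sketch, using the component names and limits from the allocation above (the tracker class itself is illustrative, not a standard API):

```typescript
// Per-component token budgets matching the allocation above.
type Component = "system" | "history" | "tools" | "retrieved" | "working";

const budgets: Record<Component, number> = {
  system: 5_000,
  history: 30_000,
  tools: 10_000,
  retrieved: 35_000,
  working: 20_000,
};

class BudgetTracker {
  private used: Record<Component, number> = {
    system: 0, history: 0, tools: 0, retrieved: 0, working: 0,
  };

  // Returns false on a would-be violation (to log) instead of silently growing.
  tryAdd(component: Component, tokens: number): boolean {
    if (this.used[component] + tokens > budgets[component]) return false;
    this.used[component] += tokens;
    return true;
  }

  // True when a component nears its limit: the trigger for summarization.
  nearLimit(component: Component, threshold = 0.9): boolean {
    return this.used[component] >= budgets[component] * threshold;
  }
}
```

Checking `nearLimit` before each append is what turns the budget table from documentation into an enforced invariant.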
Principle 3: Memory Hierarchy
Implement a multi-tier memory system, modeled on the memory hierarchy in computer architecture.
Tier 1: Working Memory (Context Window)
Characteristics:
- Fast access
- Limited capacity
- Volatile (lost when context is cleared)
Use for:
- Current conversation
- Active task state
- Recently retrieved information
Tier 2: Session Memory (Database)
Characteristics:
- Medium access speed
- Large capacity
- Persistent within session
Use for:
- Conversation summaries
- User preferences
- Task progress
Tier 3: Long-term Memory (Vector Store + Database)
Characteristics:
- Slower access (retrieval required)
- Effectively unlimited capacity
- Persistent across sessions
Use for:
- Historical interactions
- Knowledge base
- Learned patterns
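The lookup order across the three tiers can be sketched as follows. Each tier is reduced to a `Map` for illustration; in a real system they would be the context window, a database, and a vector store respectively.

```typescript
// Three memory tiers, fastest first.
type Tier = "working" | "session" | "longterm";

const tiers: Record<Tier, Map<string, string>> = {
  working: new Map(),
  session: new Map(),
  longterm: new Map(),
};

// Check the fastest tier first; on a hit in a slower tier, promote the
// entry into working memory, the way hardware caches promote hot lines.
function recall(key: string): string | undefined {
  for (const tier of ["working", "session", "longterm"] as Tier[]) {
    const hit = tiers[tier].get(key);
    if (hit !== undefined) {
      tiers.working.set(key, hit); // promote
      return hit;
    }
  }
  return undefined;
}
```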
Principle 4: Retrieval Quality Over Quantity
More context is not always better. Irrelevant information is worse than no information.
Quality Metrics
- Relevance: Does this information help with the current task?
- Recency: Is this information still current?
- Authority: Is this information from a trusted source?
- Specificity: Is this information specific enough to be useful?
Retrieval Best Practices
- Semantic search first: Understand intent, not just keywords
- Rerank results: Order by relevance to current context
- Filter by metadata: Use recency, source, and type filters
- Limit results: Top-K retrieval (typically K=3-10)
- Provide context: Include surrounding information for retrieved chunks
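Filtering, reranking, and top-K limiting compose into a short pipeline. In this sketch relevance scores are supplied directly; a real system would compute them with embeddings and a reranker.

```typescript
// A retrieved chunk with precomputed relevance and recency metadata.
interface Chunk {
  text: string;
  score: number;   // relevance to the current context (assumed given)
  ageDays: number; // recency metadata
}

// Metadata filter -> rerank by relevance -> keep only top K.
function retrieve(chunks: Chunk[], k = 5, maxAgeDays = 365): Chunk[] {
  return chunks
    .filter((c) => c.ageDays <= maxAgeDays) // drop stale results
    .sort((a, b) => b.score - a.score)      // rerank, most relevant first
    .slice(0, k);                           // top-K limit
}
```

Note that the highest-scoring chunk can still lose to the recency filter: quality metrics combine, they don't just rank.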
Principle 5: Explicit State Management
Don't let the agent infer state—track it explicitly.
State Machine Pattern
type AgentState =
  | { stage: 'idle' }
  | { stage: 'gathering_requirements', data: Requirements }
  | { stage: 'planning', data: Plan }
  | { stage: 'executing', data: Execution }
  | { stage: 'reviewing', data: Review }
  | { stage: 'complete', data: Result };
// Requirements, Plan, Execution, Review, and Result are domain-specific payload types.

// Always know exactly where you are
function getCurrentStage(state: AgentState): string {
  return state.stage;
}

// Validate transitions
function canTransition(from: AgentState, to: AgentState['stage']): boolean {
  const validTransitions: Record<AgentState['stage'], AgentState['stage'][]> = {
    idle: ['gathering_requirements'],
    gathering_requirements: ['planning', 'idle'],
    planning: ['executing', 'gathering_requirements'],
    executing: ['reviewing', 'planning'],
    reviewing: ['complete', 'executing'],
    complete: ['idle']
  };
  return validTransitions[from.stage].includes(to);
}
Benefits
- Clarity: Agent always knows its current state
- Validation: Prevent invalid state transitions
- Recovery: Easy to resume after interruption
- Debugging: Clear audit trail
Principle 6: Tool Design Discipline
Tools are the agent's hands—design them carefully.
Single Responsibility
Each tool should do one thing well. Avoid "Swiss Army knife" tools that try to do everything.
Bad:
manipulate_data(action: 'read' | 'write' | 'delete' | 'transform', ...)
Good:
read_data(query: string)
write_data(data: any)
delete_data(id: string)
transform_data(data: any, transformation: Transform)
Clear Interfaces
Use explicit schemas. The agent should know exactly what inputs are required and what outputs to expect.
interface Tool {
  name: string;
  description: string;
  inputSchema: JSONSchema;
  outputSchema: JSONSchema;
  examples: Array<{ input: any; output: any }>;
}
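A filled-in instance makes the interface concrete. The schema contents below are illustrative, and `JSONSchema` is reduced to a loose placeholder type so the snippet stands alone:

```typescript
// Placeholder type standing in for a real JSON Schema definition.
type JSONSchema = Record<string, unknown>;

interface Tool {
  name: string;
  description: string;
  inputSchema: JSONSchema;
  outputSchema: JSONSchema;
  examples: Array<{ input: any; output: any }>;
}

// Hypothetical definition for the read_data tool from the previous section.
const readDataTool: Tool = {
  name: "read_data",
  description: "Fetch records matching a query string.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
  outputSchema: {
    type: "object",
    properties: { records: { type: "array" } },
  },
  examples: [
    { input: { query: "status:open" }, output: { records: [] } },
  ],
};
```

Concrete examples in the definition itself give the model a pattern to imitate, which is often worth more than a longer description.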
Error Handling
Tools should return structured errors, not throw exceptions.
type ToolResult<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string; retry: boolean } };
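One way to honor that contract is a wrapper that catches exceptions at the tool boundary and converts them into the structured shape. The error code below is illustrative, not a standard:

```typescript
type ToolResult<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string; retry: boolean } };

// Run a tool body; exceptions become structured errors instead of throws.
function runTool<T>(fn: () => T, retryable = true): ToolResult<T> {
  try {
    return { success: true, data: fn() };
  } catch (e) {
    return {
      success: false,
      error: {
        code: "TOOL_EXECUTION_FAILED", // illustrative code
        message: e instanceof Error ? e.message : String(e),
        retry: retryable,
      },
    };
  }
}
```

With this boundary in place, the agent loop can branch on `success` instead of wrapping every tool call in its own try/catch.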
Principle 7: Observability is Not Optional
You can't improve what you don't measure.
What to Log
- Context assembly: What went into the context and why
- Token usage: Per component, per request
- Retrieval results: What was retrieved and relevance scores
- Tool calls: Which tools were called, inputs, outputs, latency
- State transitions: When and why state changed
- Errors: Full context when errors occur
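The items above suggest a per-request log record along these lines. The field names are illustrative, not a standard schema:

```typescript
// Hypothetical structured log entry covering context assembly, token
// usage, retrieval, tool calls, and state transitions for one request.
interface ContextLogEntry {
  requestId: string;
  timestamp: string;
  tokensByComponent: Record<string, number>;
  retrieved: Array<{ id: string; relevance: number }>;
  toolCalls: Array<{ name: string; latencyMs: number; success: boolean }>;
  stateTransition?: { from: string; to: string; reason: string };
}

const entry: ContextLogEntry = {
  requestId: "req-123",
  timestamp: new Date().toISOString(),
  tokensByComponent: { system: 4200, history: 18000 },
  retrieved: [{ id: "doc-7", relevance: 0.91 }],
  toolCalls: [{ name: "read_data", latencyMs: 120, success: true }],
};
```

Keeping these fields in one record per request makes the later questions ("why was this in context?", "where did the tokens go?") answerable from a single log line.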
What to Monitor
- Performance: Latency (p50, p95, p99)
- Cost: Tokens per request, cost per request
- Quality: Task completion rate, user satisfaction
- Reliability: Error rates, uptime
- Efficiency: Context utilization, retrieval accuracy
Dashboards
Create real-time dashboards for:
- Token usage trends
- Cost projections
- Error rates and types
- Performance metrics
- User satisfaction scores
Principle 8: Test Like You Mean It
Context engineering systems are complex—test thoroughly.
Unit Tests
Test individual components in isolation:
- Summarization quality
- Retrieval relevance
- Tool execution
- State transitions
Integration Tests
Test the full context assembly pipeline:
- Does the right information make it into context?
- Are budgets respected?
- Do tools work together correctly?
End-to-End Tests
Test agent behavior across sessions:
- Does the agent maintain consistency?
- Can it resume after interruption?
- Does it remember important information?
Load Tests
Test performance under pressure:
- How does latency scale with context size?
- What happens when retrieval is slow?
- Can the system handle concurrent requests?
Principle 9: Fail Gracefully
Things will go wrong. Plan for it.
Graceful Degradation
When context budget is exceeded:
- Summarize aggressively
- Drop low-priority information
- Notify user of reduced functionality
- Continue with best effort
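The "drop low-priority information" step can be sketched as a budget-fitting pass (priorities and token counts here are illustrative):

```typescript
// A candidate context item with a priority: higher means keep longer.
interface ContextItem { text: string; tokens: number; priority: number }

// Keep the highest-priority items that fit the budget; drop the rest.
function fitToBudget(items: ContextItem[], budget: number): ContextItem[] {
  const byPriority = [...items].sort((a, b) => b.priority - a.priority);
  const result: ContextItem[] = [];
  let used = 0;
  for (const item of byPriority) {
    if (used + item.tokens <= budget) {
      result.push(item);
      used += item.tokens;
    }
    // else: dropped; a real system would log this and notify the user
  }
  return result;
}
```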
Error Recovery
When tools fail:
- Retry with exponential backoff
- Fall back to alternative tools
- Ask user for help if needed
- Log for post-mortem analysis
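A minimal sketch of the retry-then-fallback policy, with an injectable sleep so callers (and tests) control pacing; the function and its defaults are illustrative:

```typescript
// Retry with exponential backoff, then fall back to an alternative
// (another tool, or surfacing the failure to the user).
async function withRetry<T>(
  attempt: () => Promise<T>,
  fallback: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 250,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await attempt();
    } catch {
      await sleep(baseDelayMs * 2 ** i); // 250ms, 500ms, 1s, ...
    }
  }
  return fallback();
}
```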
Circuit Breakers
Protect against cascading failures:
- Stop calling failing services
- Return cached results when available
- Fail fast rather than hanging
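A minimal circuit-breaker sketch covering those three behaviors; the failure threshold is illustrative:

```typescript
// After a run of consecutive failures, stop calling the service:
// serve a cached result if one exists, otherwise fail fast.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold = 3) {}

  get open(): boolean {
    return this.failures >= this.threshold;
  }

  call<T>(fn: () => T, cached?: T): T {
    if (this.open) {
      if (cached !== undefined) return cached; // cached fallback
      throw new Error("circuit open: failing fast");
    }
    try {
      const result = fn();
      this.failures = 0; // success resets the breaker
      return result;
    } catch (e) {
      this.failures++;
      throw e;
    }
  }
}
```

Production breakers usually also add a half-open state that probes the service after a cooldown; this sketch omits that for brevity.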
Principle 10: Iterate Based on Data
Context engineering is an empirical discipline. Let data guide your decisions.
A/B Testing
Test different strategies:
- Summarization frequency
- Retrieval methods
- Context budget allocations
- Tool usage patterns
User Feedback
Collect and act on feedback:
- Task completion surveys
- Consistency ratings
- Qualitative feedback
- Usage analytics
Continuous Improvement
Regularly review and optimize:
- Analyze failure modes
- Identify optimization opportunities
- Update strategies based on learnings
- Share knowledge with team
Anti-Patterns to Avoid
❌ Context Dumping
Don't dump everything into context "just in case". Be selective.
❌ Stateless Agents
Don't rely solely on the context window for state. Use external storage.
❌ Ignoring Token Costs
Don't ignore token usage. Monitor and optimize continuously.
❌ Over-Engineering
Don't build complex systems before you need them. Start simple, iterate.
❌ Neglecting Observability
Don't deploy without monitoring. You'll be flying blind.
Conclusion
Achieving consistency in AI agent systems requires architectural discipline, not just clever prompts. The principles outlined here represent hard-won lessons from production deployments:
- Externalize state—don't rely on the context window alone
- Budget tokens—treat context as a scarce resource
- Implement memory hierarchy—working, session, and long-term
- Prioritize retrieval quality—relevance over quantity
- Track state explicitly—use state machines
- Design tools carefully—single responsibility, clear interfaces
- Measure everything—observability is not optional
- Test thoroughly—unit, integration, end-to-end, load
- Fail gracefully—plan for errors
- Iterate based on data—empirical optimization
The future of AI agents lies not in ever-larger context windows, but in intelligent context engineering that mirrors how humans manage attention and memory.