07 BEST PRACTICES

Recurring themes and patterns for reliable AI agent systems.


Principle 1: Don't Rely Solely on the Context Window

The most fundamental insight of context engineering: external state management is not optional for production systems.

Why This Matters

The context window is working memory, not long-term storage. Treating it as a database leads to:

  • Performance degradation as context fills
  • Information loss when context is truncated
  • Inconsistency across sessions
  • Unpredictable behavior at scale

What To Do Instead

  • Externalize state: Use databases, vector stores, and file systems
  • Retrieve selectively: Fetch only what's needed for current task
  • Summarize aggressively: Compress old information
  • Track explicitly: Use state machines for workflow progress
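As an illustrative sketch, externalized state can be as simple as a store interface the agent reads from before each model call. The `SessionStore` interface and its in-memory implementation below are hypothetical stand-ins for a real database or vector store:

```typescript
// Sketch of externalized state: the agent reads and writes durable
// state through a store instead of keeping it all in the prompt.
// SessionStore and InMemoryStore are illustrative placeholders.
interface SessionStore {
  save(key: string, value: string): void;
  load(key: string): string | undefined;
}

class InMemoryStore implements SessionStore {
  private data = new Map<string, string>();
  save(key: string, value: string): void { this.data.set(key, value); }
  load(key: string): string | undefined { return this.data.get(key); }
}

// Before each model call, retrieve only what the current task needs.
function buildContext(store: SessionStore, taskKeys: string[]): string {
  return taskKeys
    .map((k) => store.load(k))
    .filter((v): v is string => v !== undefined)
    .join('\n');
}
```

The key move is that `buildContext` pulls a task-specific subset of state into the window, rather than the window being the system of record.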

Principle 2: Context Budget Discipline

Every token in the context window has a cost—not just financial, but cognitive (for the model).

Budget Allocation Strategy

Treat context like a scarce resource. Allocate budgets to each component and enforce them:

Total Budget: 100K tokens

- System Prompt: 5K (5%)
- Conversation History: 30K (30%)
- Tool Definitions: 10K (10%)
- Retrieved Context: 35K (35%)
- Working Memory: 20K (20%)

Enforcement Mechanisms

  • Monitor token usage in real-time
  • Trigger summarization when budgets approach limits
  • Prioritize high-value information when making trade-offs
  • Log budget violations for analysis
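A minimal sketch of budget enforcement, using the example allocation above. The component names and the characters-per-token estimate are assumptions; a production system would use the model's actual tokenizer:

```typescript
// Per-component token budgets matching the example allocation above.
const BUDGETS: Record<string, number> = {
  systemPrompt: 5_000,
  history: 30_000,
  tools: 10_000,
  retrieved: 35_000,
  working: 20_000,
};

// Crude stand-in for a real tokenizer (~4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns true if the component fits its budget; callers can trigger
// summarization or dropping when it does not.
function withinBudget(component: string, text: string): boolean {
  const budget = BUDGETS[component];
  if (budget === undefined) return false;
  return estimateTokens(text) <= budget;
}
```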

Principle 3: Memory Hierarchy

Implement a multi-tier memory system, modeled on the memory hierarchy in computer architecture.

Tier 1: Working Memory (Context Window)

Characteristics:

  • Fast access
  • Limited capacity
  • Volatile (lost when context is cleared)

Use for:

  • Current conversation
  • Active task state
  • Recently retrieved information

Tier 2: Session Memory (Database)

Characteristics:

  • Medium access speed
  • Large capacity
  • Persistent within session

Use for:

  • Conversation summaries
  • User preferences
  • Task progress

Tier 3: Long-term Memory (Vector Store + Database)

Characteristics:

  • Slower access (retrieval required)
  • Unlimited capacity
  • Persistent across sessions

Use for:

  • Historical interactions
  • Knowledge base
  • Learned patterns
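The three tiers can be sketched as an ordered lookup: check the fastest tier first and fall back to slower, larger ones. The Map-backed tiers below are illustrative stand-ins for the context window, a session database, and a vector store:

```typescript
// Each tier is modeled as a simple key-value map for illustration.
type Tier = Map<string, string>;

// Check tiers in order of access speed; the first hit wins.
function tieredLookup(key: string, tiers: Tier[]): string | undefined {
  for (const tier of tiers) {
    const hit = tier.get(key);
    if (hit !== undefined) return hit;
  }
  return undefined;
}
```

A hit in a slower tier would typically also be promoted into working memory, just as a CPU cache fills on a miss.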

Principle 4: Retrieval Quality Over Quantity

More context is not always better. Irrelevant information is worse than no information.

Quality Metrics

  • Relevance: Does this information help with the current task?
  • Recency: Is this information still current?
  • Authority: Is this information from a trusted source?
  • Specificity: Is this information specific enough to be useful?

Retrieval Best Practices

  1. Semantic search first: Understand intent, not just keywords
  2. Rerank results: Order by relevance to current context
  3. Filter by metadata: Use recency, source, and type filters
  4. Limit results: Top-K retrieval (typically K=3-10)
  5. Provide context: Include surrounding information for retrieved chunks
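Steps 2-4 above can be sketched as a single selection pass. The `Chunk` shape, its `score` and `ageDays` fields, and the thresholds are illustrative assumptions:

```typescript
// A retrieved chunk with a reranker score and recency metadata.
interface Chunk {
  text: string;
  score: number;    // relevance score from the reranker
  ageDays: number;  // metadata used for the recency filter
}

// Filter by metadata, rerank by score, and keep the top K.
function selectChunks(chunks: Chunk[], maxAgeDays: number, k: number): Chunk[] {
  return chunks
    .filter((c) => c.ageDays <= maxAgeDays) // metadata filter
    .sort((a, b) => b.score - a.score)      // rerank
    .slice(0, k);                           // top-K limit
}
```

Note that the filter runs before the top-K cut, so a stale but high-scoring chunk cannot crowd out fresher, relevant ones.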

Principle 5: Explicit State Management

Don't let the agent infer state—track it explicitly.

State Machine Pattern

typescript
type AgentState = 
  | { stage: 'idle' }
  | { stage: 'gathering_requirements', data: Requirements }
  | { stage: 'planning', data: Plan }
  | { stage: 'executing', data: Execution }
  | { stage: 'reviewing', data: Review }
  | { stage: 'complete', data: Result };

// Always know exactly where you are
function getCurrentStage(state: AgentState): string {
  return state.stage;
}

// Validate transitions
function canTransition(from: AgentState, to: AgentState['stage']): boolean {
  const validTransitions: Record<AgentState['stage'], Array<AgentState['stage']>> = {
    idle: ['gathering_requirements'],
    gathering_requirements: ['planning', 'idle'],
    planning: ['executing', 'gathering_requirements'],
    executing: ['reviewing', 'planning'],
    reviewing: ['complete', 'executing'],
    complete: ['idle']
  };
  
  return validTransitions[from.stage].includes(to);
}

Benefits

  • Clarity: Agent always knows its current state
  • Validation: Prevent invalid state transitions
  • Recovery: Easy to resume after interruption
  • Debugging: Clear audit trail

Principle 6: Tool Design Discipline

Tools are the agent's hands—design them carefully.

Single Responsibility

Each tool should do one thing well. Avoid "Swiss Army knife" tools that try to do everything.

Bad:

typescript
manipulate_data(action: 'read' | 'write' | 'delete' | 'transform', ...)

Good:

typescript
read_data(query: string)
write_data(data: any)
delete_data(id: string)
transform_data(data: any, transformation: Transform)

Clear Interfaces

Use explicit schemas. The agent should know exactly what inputs are required and what outputs to expect.

typescript
interface Tool {
  name: string;
  description: string;
  inputSchema: JSONSchema;
  outputSchema: JSONSchema;
  examples: Array<{ input: any; output: any }>;
}

Error Handling

Tools should return structured errors, not throw exceptions.

typescript
type ToolResult<T> = 
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string; retry: boolean } };

Principle 7: Observability is Not Optional

You can't improve what you don't measure.

What to Log

  • Context assembly: What went into the context and why
  • Token usage: Per component, per request
  • Retrieval results: What was retrieved and relevance scores
  • Tool calls: Which tools were called, inputs, outputs, latency
  • State transitions: When and why state changed
  • Errors: Full context when errors occur
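A sketch of what a context-assembly log entry might look like. The field names are assumptions; the point is recording what went into the context, why, and what each component cost:

```typescript
// One log entry per context assembly: which components were included,
// why, and their token cost.
interface ContextLogEntry {
  requestId: string;
  timestamp: string;
  components: Array<{ name: string; tokens: number; reason: string }>;
  totalTokens: number;
}

function makeEntry(
  requestId: string,
  components: Array<{ name: string; tokens: number; reason: string }>
): ContextLogEntry {
  return {
    requestId,
    timestamp: new Date().toISOString(),
    components,
    totalTokens: components.reduce((sum, c) => sum + c.tokens, 0),
  };
}
```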

What to Monitor

  • Performance: Latency (p50, p95, p99)
  • Cost: Tokens per request, cost per request
  • Quality: Task completion rate, user satisfaction
  • Reliability: Error rates, uptime
  • Efficiency: Context utilization, retrieval accuracy

Dashboards

Create real-time dashboards for:

  • Token usage trends
  • Cost projections
  • Error rates and types
  • Performance metrics
  • User satisfaction scores

Principle 8: Test Like You Mean It

Context engineering systems are complex—test thoroughly.

Unit Tests

Test individual components in isolation:

  • Summarization quality
  • Retrieval relevance
  • Tool execution
  • State transitions

Integration Tests

Test the full context assembly pipeline:

  • Does the right information make it into context?
  • Are budgets respected?
  • Do tools work together correctly?

End-to-End Tests

Test agent behavior across sessions:

  • Does the agent maintain consistency?
  • Can it resume after interruption?
  • Does it remember important information?

Load Tests

Test performance under pressure:

  • How does latency scale with context size?
  • What happens when retrieval is slow?
  • Can the system handle concurrent requests?

Principle 9: Fail Gracefully

Things will go wrong. Plan for it.

Graceful Degradation

When context budget is exceeded:

  1. Summarize aggressively
  2. Drop low-priority information
  3. Notify user of reduced functionality
  4. Continue with best effort
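Step 2 can be sketched as dropping the lowest-priority items until the estimate fits the budget. The `ContextItem` shape and priority scheme are illustrative assumptions:

```typescript
// A candidate piece of context with a priority and token estimate.
interface ContextItem { text: string; priority: number; tokens: number }

// Drop lowest-priority items until the total fits the budget.
function fitToBudget(items: ContextItem[], budget: number): ContextItem[] {
  const kept = [...items].sort((a, b) => b.priority - a.priority);
  let total = kept.reduce((sum, i) => sum + i.tokens, 0);
  while (kept.length > 0 && total > budget) {
    const dropped = kept.pop()!; // lowest priority is last after the sort
    total -= dropped.tokens;
  }
  return kept;
}
```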

Error Recovery

When tools fail:

  1. Retry with exponential backoff
  2. Fall back to alternative tools
  3. Ask user for help if needed
  4. Log for post-mortem analysis
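The backoff in step 1 is typically exponential with a cap, so retries back off quickly but never wait unboundedly. A minimal sketch; the base and cap values are illustrative:

```typescript
// Exponential backoff: delay doubles each attempt, capped at a maximum.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Production implementations usually add random jitter to avoid synchronized retry storms across clients.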

Circuit Breakers

Protect against cascading failures:

  • Stop calling failing services
  • Return cached results when available
  • Fail fast rather than hanging
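A minimal circuit-breaker sketch: after a threshold of consecutive failures the breaker opens, and subsequent calls fail fast to a fallback (a cached result, for example) instead of hitting the failing service. The threshold and fallback handling are illustrative:

```typescript
// Open the circuit after N consecutive failures; while open, skip the
// call entirely and return the fallback.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold: number) {}

  call<T>(fn: () => T, fallback: T): T {
    if (this.failures >= this.threshold) return fallback; // fail fast
    try {
      const result = fn();
      this.failures = 0; // a success resets the failure count
      return result;
    } catch {
      this.failures++;
      return fallback;
    }
  }
}
```

Real breakers also reclose after a cooldown by letting a probe request through; that half-open state is omitted here for brevity.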

Principle 10: Iterate Based on Data

Context engineering is an empirical discipline. Let data guide your decisions.

A/B Testing

Test different strategies:

  • Summarization frequency
  • Retrieval methods
  • Context budget allocations
  • Tool usage patterns

User Feedback

Collect and act on feedback:

  • Task completion surveys
  • Consistency ratings
  • Qualitative feedback
  • Usage analytics

Continuous Improvement

Regularly review and optimize:

  • Analyze failure modes
  • Identify optimization opportunities
  • Update strategies based on learnings
  • Share knowledge with team

Anti-Patterns to Avoid

❌ Context Dumping

Don't dump everything into context "just in case". Be selective.

❌ Stateless Agents

Don't rely solely on the context window for state. Use external storage.

❌ Ignoring Token Costs

Don't ignore token usage. Monitor and optimize continuously.

❌ Over-Engineering

Don't build complex systems before you need them. Start simple, iterate.

❌ Neglecting Observability

Don't deploy without monitoring. You'll be flying blind.


Conclusion

Achieving consistency in AI agent systems requires architectural discipline, not just clever prompts. The principles outlined here represent hard-won lessons from production deployments:

  1. Externalize state—don't rely on context window alone
  2. Budget tokens—treat context as a scarce resource
  3. Implement memory hierarchy—working, session, and long-term
  4. Prioritize retrieval quality—relevance over quantity
  5. Track state explicitly—use state machines
  6. Design tools carefully—single responsibility, clear interfaces
  7. Measure everything—observability is not optional
  8. Test thoroughly—unit, integration, end-to-end, load
  9. Fail gracefully—plan for errors
  10. Iterate based on data—empirical optimization

The future of AI agents lies not in ever-larger context windows, but in intelligent context engineering that mirrors how humans manage attention and memory.