03 CORE STRATEGIES

Architectural patterns for reliable AI agent systems.

The Solution: External Memory & Structured State

The fundamental insight of context engineering is simple: Don't rely solely on the context window for state management. Instead, architect systems that externalize memory, manage state explicitly, and provide just-in-time information retrieval.


Strategy 1: System Prompt Architecture

The system prompt establishes foundational context: identity, capabilities, constraints, and behavioral guidelines.

Best Practices

  • Define agent role explicitly: "You are a financial analyst assistant specialized in SaaS metrics..."
  • Provide concrete examples rather than abstract rules
  • Use structured delimiters: XML tags, markdown sections for clear boundaries
  • Place critical constraints prominently: Models exhibit positional bias, so put important rules at the beginning and end

Example Structure

<role>You are a senior software architect...</role>

<constraints>
- Never modify database schema without explicit approval
- Always validate input data before processing
- Prefer composition over inheritance
</constraints>

<examples>
<!-- Concrete code examples demonstrating expected behavior -->
</examples>

Strategy 2: Memory Management

Effective context engineering requires active memory strategies to prevent context rot.

Summarization

Compress older exchanges while preserving key facts. After every N messages, generate a summary of decisions, facts, and state.

Implementation:

  • Trigger summarization at regular intervals (e.g., every 10 exchanges)
  • Store summaries in external database
  • Inject relevant summaries into new context as needed
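The trigger-and-store loop above can be sketched as follows. The `llmSummarize` stub stands in for a real (normally async) model call, and `summaries` stands in for the external database; both are assumptions for illustration:

```typescript
interface Message { role: "user" | "assistant"; content: string }

const SUMMARY_INTERVAL = 10; // summarize every 10 exchanges

// Stand-in for a real model call that compresses a window of messages
// into a short summary of decisions, facts, and state.
function llmSummarize(messages: Message[]): string {
  return `Summary of ${messages.length} messages`;
}

class SummarizingMemory {
  private recent: Message[] = [];
  readonly summaries: string[] = []; // would live in an external database

  add(message: Message): void {
    this.recent.push(message);
    if (this.recent.length >= SUMMARY_INTERVAL) {
      // Compress the oldest window, archive the summary, reset the buffer.
      this.summaries.push(llmSummarize(this.recent));
      this.recent = [];
    }
  }

  // Inject archived summaries plus recent messages into a new context.
  buildContext(): string[] {
    return [...this.summaries, ...this.recent.map(m => m.content)];
  }
}
```
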

Windowing

Keep only recent N messages in active context. Older messages are archived but can be retrieved if needed.

Trade-offs:

  • ✅ Keeps context lean and focused
  • ❌ May lose important historical context
  • 💡 Combine with selective retention for best results
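A minimal windowing helper, assuming the window size (20 here) is a tunable parameter:

```typescript
// Keep only the most recent N messages active; everything older is
// archived and can be retrieved later if needed.
function applyWindow<T>(history: T[], windowSize = 20): { active: T[]; archived: T[] } {
  return {
    active: history.slice(-windowSize),
    archived: history.slice(0, -windowSize), // empty when history fits the window
  };
}
```
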

Selective Retention

Preserve critical information and prune conversational filler. Not all messages are equally important.

Heuristics for retention:

  • User instructions and requirements
  • Agent commitments and decisions
  • Error messages and corrections
  • Explicit memory commands ("Remember that...")
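The heuristics above can be encoded as a simple keep/prune predicate. The regex patterns here are illustrative, not exhaustive:

```typescript
interface Msg { role: string; content: string }

// Heuristic retention check mirroring the list above: keep instructions,
// decisions, errors, and explicit memory commands; prune the rest.
function shouldRetain(msg: Msg): boolean {
  const c = msg.content.toLowerCase();
  if (msg.role === "user" && /\b(must|require|need|should)\b/.test(c)) return true; // instructions
  if (/\b(decided|i will|choosing)\b/.test(c)) return true; // commitments and decisions
  if (/\b(error|exception|failed|correction)\b/.test(c)) return true; // errors and fixes
  if (c.startsWith("remember that")) return true; // explicit memory commands
  return false; // conversational filler
}
```
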

Episodic Memory

Write important state to external storage, retrieve on demand. This mirrors human long-term memory.

Architecture:

User Query → Check Episodic Memory → Retrieve Relevant Episodes →
Inject into Context → Generate Response → Store New Episode
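The store-and-retrieve halves of this pipeline might look like the sketch below. Retrieval here is naive keyword overlap; a real system would use embeddings and a vector index:

```typescript
interface Episode { id: number; text: string; keywords: string[] }

// In-memory stand-in for an external episode store.
const store: Episode[] = [];
let nextId = 0;

function storeEpisode(text: string): void {
  store.push({ id: nextId++, text, keywords: text.toLowerCase().split(/\W+/) });
}

// Score episodes by keyword overlap with the query and return the top k.
function retrieveEpisodes(query: string, k = 3): Episode[] {
  const terms = query.toLowerCase().split(/\W+/).filter(t => t.length > 0);
  return [...store]
    .map(e => ({ e, score: terms.filter(t => e.keywords.includes(t)).length }))
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ e }) => e);
}
```
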

Strategy 3: RAG Implementation

Retrieval Augmented Generation provides just-in-time information without bloating the context window.

Semantic Chunking

Break documents into meaningful segments that can be independently retrieved and understood.

Best practices:

  • Chunk by logical boundaries (sections, paragraphs, code blocks)
  • Maintain context overlap between chunks
  • Include metadata (source, date, author)
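A paragraph-boundary chunker with one-paragraph overlap and source metadata might be sketched like this (the chunk size of three paragraphs is an arbitrary choice):

```typescript
interface Chunk { text: string; source: string; index: number }

// Split on blank lines (logical paragraph boundaries) and carry one
// trailing paragraph into the next chunk as overlap for continuity.
function chunkDocument(doc: string, source: string, maxParas = 3): Chunk[] {
  const paras = doc.split(/\n\s*\n/).filter(p => p.trim().length > 0);
  const chunks: Chunk[] = [];
  for (let i = 0; i < paras.length; i += maxParas) {
    const start = Math.max(0, i - 1); // one-paragraph overlap with previous chunk
    chunks.push({
      text: paras.slice(start, i + maxParas).join("\n\n"),
      source,
      index: chunks.length,
    });
  }
  return chunks;
}
```
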

Hybrid Search

Combine keyword and semantic search for optimal retrieval accuracy.

Approach:

  1. Keyword search: Fast, precise for exact matches
  2. Semantic search: Captures meaning, handles synonyms
  3. Fusion: Combine and rerank results
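The fusion step is often done with reciprocal rank fusion (RRF), which merges the two rankings without having to normalize their score scales; k = 60 is the conventional constant:

```typescript
// Reciprocal Rank Fusion: each result earns 1 / (k + rank + 1) from every
// ranking it appears in; items ranked well by both lists rise to the top.
function fuseRankings(keywordIds: string[], semanticIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of [keywordIds, semanticIds]) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```
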

Reranking

Order results by relevance to current context, not just similarity to query.

Techniques:

  • Cross-encoder models for pairwise relevance scoring
  • LLM-based reranking ("Which of these passages is most relevant?")
  • Metadata filtering (recency, source authority)
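Metadata-aware reranking can be as simple as blending similarity with recency and authority signals. The weights below are illustrative assumptions that would normally be tuned:

```typescript
interface Doc { id: string; similarity: number; ageDays: number; authority: number }

// Blend raw query similarity with an exponential recency decay and a
// source-authority score (both in [0, 1]); weights sum to 1.
function rerank(docs: Doc[]): Doc[] {
  const score = (d: Doc) =>
    0.6 * d.similarity + 0.2 * Math.exp(-d.ageDays / 365) + 0.2 * d.authority;
  return [...docs].sort((a, b) => score(b) - score(a));
}
```
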

Strategy 4: State Machines

Explicit workflow state tracking prevents agents from losing their place in multi-step processes.

Implementation Pattern

type WorkflowState = 
  | { stage: 'requirements_gathering', data: Requirements }
  | { stage: 'design', data: DesignSpec }
  | { stage: 'implementation', data: CodeArtifacts }
  | { stage: 'testing', data: TestResults }
  | { stage: 'deployment', data: DeploymentConfig };

// Agent always knows current stage and available transitions
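A transition guard on top of that union keeps the agent from jumping stages. The allowed-transition map below, including the testing-to-implementation loop for failed tests, is an assumption about this particular workflow:

```typescript
type Stage =
  | "requirements_gathering"
  | "design"
  | "implementation"
  | "testing"
  | "deployment";

// Which stage may legally follow which.
const allowed: Record<Stage, Stage[]> = {
  requirements_gathering: ["design"],
  design: ["implementation"],
  implementation: ["testing"],
  testing: ["deployment", "implementation"], // failed tests loop back
  deployment: [],
};

function transition(current: Stage, next: Stage): Stage {
  if (!allowed[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next; // a real system would also log this for the audit trail
}
```

Because invalid moves throw, recovery after an interruption is just reloading the last persisted stage and resuming from the map.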

Benefits

  • Clarity: Agent knows exactly what stage it's in
  • Validation: Prevent invalid state transitions
  • Recovery: Easy to resume after interruption
  • Debugging: Clear audit trail of state changes

Strategy 5: Context Budgets

Hard limits on token allocation per subsystem prevent any single component from monopolizing the context window.

Allocation Example

For a 128K token context window:

  • System Prompt: 4K tokens (3%)
  • Conversation History: 40K tokens (31%)
  • Tool Definitions: 8K tokens (6%)
  • Retrieved Documents: 32K tokens (25%)
  • Working Memory: 16K tokens (13%)
  • Response Generation: 28K tokens (22%)

Enforcement

  • Monitor token usage in real-time
  • Implement automatic summarization when budgets are exceeded
  • Prioritize high-value information when making trade-offs
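A budget check matching the allocation above might be sketched as follows. The chars-divided-by-four token estimate is a rough heuristic; a real system would use the model's actual tokenizer:

```typescript
// Token budgets per subsystem for a 128K window, matching the allocation above.
const budgets: Record<string, number> = {
  systemPrompt: 4_000,
  history: 40_000,
  tools: 8_000,
  retrieved: 32_000,
  workingMemory: 16_000,
  response: 28_000,
};

// Crude estimate: roughly four characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Unknown subsystems get a zero budget, so nothing slips past the caps.
function fitsBudget(subsystem: string, text: string): boolean {
  return estimateTokens(text) <= (budgets[subsystem] ?? 0);
}
```

When `fitsBudget` fails, the enforcement policy above kicks in: summarize or prune until the subsystem is back under its cap.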

Strategy 6: Tool Integration

Provide agents with specific, well-defined capabilities rather than expecting them to do everything through generation.

Tool Design Principles

  1. Single Responsibility: Each tool does one thing well
  2. Clear Interfaces: Explicit input/output schemas
  3. Error Handling: Tools return structured errors, not exceptions
  4. Idempotency: Safe to call multiple times with same inputs

Example Tools

  • search_codebase(query: string): Semantic code search
  • get_design_system(): Retrieve current design tokens
  • save_decision(decision: string): Store important choices
  • query_database(sql: string): Execute read-only queries
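These principles can be captured in a shared tool shape. The `query_database` stub below is hypothetical; the point is the structured `ToolResult`, which returns errors as data instead of throwing:

```typescript
// Structured result type: tools return errors as values, never throw.
type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

interface Tool<I, O> {
  name: string;
  description: string;
  run(input: I): ToolResult<O>;
}

// Hypothetical read-only query tool: single responsibility, explicit
// interface, structured errors, and idempotent (reads only).
const queryDatabase: Tool<string, string[]> = {
  name: "query_database",
  description: "Execute read-only SQL queries",
  run(sql) {
    if (!/^\s*select\b/i.test(sql)) {
      return { ok: false, error: "Only SELECT statements are allowed" };
    }
    return { ok: true, value: [] }; // stub: no real database attached
  },
};
```
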

Putting It All Together

Effective context engineering combines these strategies into a cohesive architecture:

  1. System Prompt establishes identity and constraints
  2. Memory Management keeps context lean and relevant
  3. RAG provides just-in-time information access
  4. State Machines track workflow progress
  5. Context Budgets prevent resource exhaustion
  6. Tools extend capabilities beyond generation

The result: AI agents that maintain consistency, recall accurately, and scale to complex, long-running tasks.