[!NOTE] Context = Shared Awareness
Humans can't coordinate unless they are looking at the same map.
- Behavioral: A pilot and a co-pilot share a cockpit to ensure they are seeing the same instruments.
- Engineering: Context Engineering provides the "Shared Awareness" for LLMs, moving beyond static prompts to dynamic, multi-plane context delivery.
There's a moment every AI engineer experiences: you've crafted the perfect prompt, tested it dozens of times, and deployed it to production. Then it fails spectacularly on real-world inputs. The problem usually isn't the prompt itself—it's everything around the prompt. This is where context engineering comes in.
The Four Planes of Contextual Awareness
To build truly universal agents, we move beyond simple lists and frame context as the Four Planes of Awareness. This framework ensures the agent understands not just the data, but the time, source, and structure of its environment.
| Plane | Specific Layer | The Contextual Question |
|---|---|---|
| The Temporal Plane | Memories | What has recently happened? Facts and information retained from conversations. |
| The Source Plane | Documents | What is statically known? Files and content indexed for retrieval (RAG). |
| The Atomic Plane | Entities | Who/What are we talking about? People, places, and concepts extracted from text. |
| The Relational Plane | Relationships | How are they connected? The semantic and logical links between entities. |
"The companies winning at AI aren't the ones with the best prompts. They're the ones with the most sophisticated context pipelines."
The Context Engineering Stack
graph TD
A[**System Context**] --> E{**LLM**}
B[**RAG Context**] --> E
C[**History Context**] --> E
D[**Tool Context**] --> E
E --> F[**Structured Output**]
style A fill:#1e293b,stroke:#334155,stroke-width:2px,color:#fff
style B fill:#1e293b,stroke:#334155,stroke-width:2px,color:#fff
style C fill:#1e293b,stroke:#334155,stroke-width:2px,color:#fff
style D fill:#1e293b,stroke:#334155,stroke-width:2px,color:#fff
style E fill:#0f172a,stroke:#a855f7,stroke-width:4px,color:#fff
style F fill:#0f172a,stroke:#06b6d4,stroke-width:2px,color:#fff
Layer 1: System Context (Static)
| Element | Purpose | Example |
|---|---|---|
| System prompt | Define behavior and constraints | "You are a financial analyst. Be precise with numbers." |
| Persona definition | Set tone and expertise level | "Respond as a senior engineer explaining to a junior." |
| Output format | Enforce structure | "Always respond in JSON with keys: answer, confidence, sources" |
| Guardrails | Safety and compliance | "Never provide medical advice. Redirect to professionals." |
Best Practice: Keep system context as minimal as possible. Every token counts.
Layer 2: Dynamic Context (RAG)
graph LR
A(Query) --> B(Embed)
B --> C(Search)
C --> D(Get)
D --> E(Add)
E --> F(Gen)
RAG challenges to solve:
- Chunking strategy — How you split documents matters enormously
- Retrieval quality — Garbage in, garbage out
- Context ordering — Place most relevant information at the beginning or end
Layer 3: Conversational Memory
| Memory Type | Use Case | Implementation |
|---|---|---|
| Buffer Memory | Last N messages | Simple array, FIFO eviction |
| Summary Memory | Compressed conversation history | LLM summarizes older turns |
| Vector Memory | Semantic retrieval of past context | Embed and search history |
| Entity Memory | Track key entities mentioned | Extract and maintain state |
The Hard Problem: Context windows are finite, and attention quality degrades across long contexts. A 128K window exists, but models (especially medium-tier) struggle with effective reasoning beyond 8k-16k tokens. Critical information gets "lost in the middle."
Layer 4: Tool & Action Context
Tool calls add to context. Key considerations:
- Tool descriptions are part of your context budget
- Result formatting affects how well the LLM can use the information
- Error handling context helps the LLM recover gracefully
Context Window Optimization
The Token Budget Problem
| Component | Token Range | Priority |
|---|---|---|
| System prompt | 200-800 | High (static) |
| RAG documents | 1,000-10,000 | Medium (dynamic) |
| Conversation history | 500-4,000 | Medium (compressible) |
| Tool outputs | 100-2,000 | High (ephemeral) |
| User query | 50-500 | Critical |
| Output space | 500-2,000 | Reserved |
Compression Strategies
| Strategy | Description |
|---|---|
| Summarization | Use an LLM to compress older context |
| Semantic Pruning | Only include context relevant to the current query |
| Hierarchical Context | Store detailed context externally, inject summaries |
| Structured Extraction | Convert verbose text to structured data |
Practical Patterns
Pattern 1: The Focused Expert
Minimize system context, maximize retrieval relevance. Ground all answers in provided context.
Pattern 2: The Guided Reasoner
Include chain-of-thought instructions. Ask the model to show its reasoning before answering.
Pattern 3: The Stateful Agent
Maintain explicit state in context for multi-step workflows. Track goals, completed steps, and pending actions.
Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Context Stuffing | Throwing everything in without curation | Semantic retrieval + compression |
| Prompt Spaghetti | System prompts that grow organically | Regular refactoring, version control |
| Memory Amnesia | No state between turns in long conversations | Implement appropriate memory strategy |
| Tool Overload | Too many tool definitions bloating context | Dynamic tool selection based on query |
| Lost in the Middle | Critical info buried in middle of context | Position key information at start or end |
Getting Started
| Phase | Focus | Deliverable |
|---|---|---|
| Foundation | Define your context requirements | System prompt, output format, basic RAG |
| Optimization | Improve retrieval and compression | Chunking strategy, memory system |
| Evaluation | Measure and iterate | Metrics dashboard, A/B testing framework |
| Production | Scale and monitor | Observability, cost tracking, caching |
The Bottom Line
Context engineering is where AI engineering matures from prompt crafting to systems thinking. It is the shift from Memory-first (storing everything) to Decision-first (reconstructing meaning at runtime). The best AI products aren't those with clever prompts—they're the ones with sophisticated context pipelines that deliver the right information, at the right time, in the right format.
Start by auditing your current context: What's in it? What's missing? What's wasting tokens?
References & Further Reading
- Anthropic: Building Effective Agents — Anthropic's guide on context management for AI systems.
- OpenAI: Prompt Engineering Guide — Official guidelines on structuring prompts.
- Lost in the Middle (arXiv) — Stanford research on how LLMs attend to long contexts.
- LangChain: Memory Documentation — Practical memory pattern implementations.
Related Playbooks
- The Engineering Manifesto — AlphaPebble's core philosophy for building high-stakes autonomous AI systems.
- LLM Coding Workflow — Apply context engineering principles to AI-assisted development.
- Agentic Engineering — Take context-aware LLMs to the next level with autonomous agents.
- Data Engineering Fundamentals — Build the data infrastructure that feeds high-quality context.
- Knowledge Graph Engineering — Structured knowledge for more precise retrieval.
- Precedent Engineering — The logical successor to context: capturing how humans use information to decide.
- Enterprise Context Layer — Platform architecture for cross-system context delivery.
This playbook is maintained by the AlphaPebble team. For implementation support, get in touch.
