[!NOTE] Activity Streams = Human Narrative
Humans don't remember the world as a database of rows; we remember it as a story.
- Behavioral: When an expert explains a failure, they say: "First this happened, then he did that, then the alarm went off."
- Engineering: Activity-Stream Engineering replicates this "narrative memory," allowing agents to see the "movie" of work rather than just a snapshot of state.
Knowledge Base RAG is table stakes. If your AI knows what your company knows (documents), but doesn't know what your company does (processes), it remains a passive researcher rather than an active agent. To bridge this gap, we move from Knowledge RAG to what we call Process RAG—an emerging pattern for retrieving the "How" of organizational work.
The foundation of Process RAG is the Activity Stream, built on the principles of Event Sourcing, Stateful Stream Processing, and Graph Topology.
Unlike observability traces, Activity Streams capture semantic intent and reasoning, not just execution telemetry. By applying a Perspective Engine to these streams, we move from static data persistence to intent-driven reconstruction.
The Perspective Engine vs. The Engineer
Context graphs are born from a computer-science instinct: model the world as nodes, edges, and provenance. That works when the system is logical and stable.
The Problem: State vs. History
Traditional databases store the current state (e.g., ticket_status: "closed"). But for an agent to learn or audit a process, it needs the history (the sequence of actions that led to "closed").
Context isn't just a snapshot; it's the movie of the process.
2. The Solution: Activity Schema
Instead of complex, join-heavy relational schemas, we adapt the Activity Schema—an open standard originally designed for customer analytics and data warehousing—to model the high-velocity, intent-rich events of AI Engineering Systems.
The Single-Table Standard
Every action is modeled as an immutable event in a unified stream. This allows you to treat your entire business process as a queryable data source.
graph LR
RAW[Raw Events] --> Map[Schema Mapper]
Map --> AS((Activity Stream))
AS --> Q[Analyst Query]
AS --> AI[Agent Context]
classDef v-large font-size:24px,font-weight:bold;
class RAW,Map,AS,Q,AI v-large;
| Timestamp | Entity | Activity | Context (JSON) | Metadata |
|---|---|---|---|---|
| 10:00:01 | User_A | Opened_Ticket | {"priority": "high"} | "Web-UI" |
| 10:05:20 | Agent_1 | Tool_Call | {"tool": "DB_Search"} | "Query: 'policy'" |
| 10:05:45 | Agent_1 | Proposed_Action | {"action": "Refund"} | "Conf: 0.92" |
3. Architecture: Coordinated Progress as a Graph
Computation is essentially a graph. In his series on Coordinated Progress, Jack Vanlightly posits that at every level of abstraction, computation reveals itself as a graph.
Nodes, Edges, and Sub-Graphs
- Nodes: Microservices, FaaS functions, and AI Engineering Systems.
- Edges: The communication medium (RPC, Kafka Streams, Event Buses).
- Workflows: Connected sub-graphs that represent a logical piece of business value (e.g., "Order-to-Cash").
Choreography vs. Orchestration
- Choreography: Reactive, decoupled coordination where agents react to events in a decentralized "dance."
- Orchestration: Centralized control flow where a lead agent or workflow engine explicitly directs the graph progress.
graph LR
A[Agent: Researcher] -->|Event: Search_Query| B[Agent: Analyst]
B -->|RPC: Send_Draft| C[Microservice: Email]
B -->|Event: Proposed_Analysis| D[Agent: Auditor]
classDef v-large font-size:24px,font-weight:bold;
class A,B,C,D v-large;
4. The High-Velocity Reasoning Fabric
To move from "Logs" to "Intelligence," we must treat the stream as a Reasoning Fabric—a high-velocity layer where agents don't just act, but react to the shifting topology of the business.
The Reasoning Fabric
By exposing the "Edges" (communication between agents) as part of the stream, you create a Reasoning Fabric. Every decision made by an agent is traceable back to the specific event that triggered it. This is the difference between an AI that "just says things" and an AI that can "explain its work."
Event-Sourced Agent State
The defining trait of a production-grade Activity Stream is Complete Replayability. By storing every agent "thought," "tool call," and "observation" as a discrete event, we enable:
- Time-Travel Debugging: Replay the exact event sequence that led to a specific agent failure.
- Parallel Reality Simulation: Branch the stream at a specific timestamp to test "What-If" decisions in a sandboxed environment.
- In-Flight Context Injection: Injecting fresh business context directly into an active stream window without restarting the agent loop.
graph LR
E1(Observation) --> P1{Router}
P1 -->|High Velocity| A1[Agent: Fast-React]
P1 -->|Complex| A2[Agent: Reasoning-o1]
A1 --> AS[(Activity Store)]
A2 --> AS
AS --> Mirror[Causal Mirror]
classDef v-large font-size:24px,font-weight:bold;
class E1,P1,A1,A2,AS,Mirror v-large;
The Actor-Model Foundation
The "Reasoning Fabric" of an Activity Stream is effectively an implementation of Carl Hewitt's Actor Theory. By treating every agent as an autonomous Actor that receives messages, updates its own state, and sends new messages, you create a system that is inherently concurrent, scalable, and—most importantly—mathematically traceable.
5. Advanced: Inference via Graph Neural Networks (GNNs)
While LLMs reason over text, Graph Neural Networks (GNNs) reason over topology. By treating the Activity Stream as an evolving graph, we can use GNNs for deep process inference—though for most 2026 production systems, this remains a powerful tool for offline pattern mining and structural retrieval rather than real-time per-decision inference.
In practice, GNNs are most effective for building Process Embeddings that allow agents to retrieve successfully executed workflows from history that share a similar "shape" or topology with the current task.
GNNs learn from graph topology; classical algorithms (BFS, k-shortest paths) traverse it—your Activity Stream feeds both.
A. Topology Embedding
GNNs can convert complex multi-agent process sub-graphs into vector embeddings. This allows for Structural RAG: retrieving not just "similar text" but "similar process architectures" that successfully solved a problem in the past.
B. Pattern Prediction & Anomaly Detection
Transformers (LLMs) are sequence-aware, but GNNs are topology-aware.
graph LR
subgraph Seq ["1D: Sequence (LLM)"]
direction LR
S1([Node A]) --> S2([Node B]) --> S3([Node C])
end
subgraph Top ["2D: Topology (GNN)"]
direction LR
T1((Node A)) --- T2((Node B))
T2 --- T3((Node C))
T1 --- T3
T2 --- T4((Node D))
end
classDef v-large font-size:24px,font-weight:bold;
class S1,S2,S3,T1,T2,T3,T4 v-large;
- Next-Step Prediction: A GNN can predict the most likely next node in a branched workflow more reliably than a linear sequence model.
- Structural Anomalies: Identifying "Coordinated Progress" that has stalled or deviated from the expected graph structure (e.g., a missing verification node).
5. Stream Processing Patterns
To move from raw events to "Process Reality," we apply patterns from Apache Flink.
A. Windowing for Task Inference
We use Tumbling or Sliding Windows to aggregate atomic signals into discrete tasks. This creates the "summarization" layer required for the Dual-Engine Architecture.
graph LR
S1[Click] --> W1[Window: 5min]
S2[Search] --> W1
S3[Copy] --> W1
W1 --> T1[Inferred Task: Research]
classDef v-large font-size:24px,font-weight:bold;
class S1,S2,S3,W1,T1 v-large;
B. Stateful Traces
Using Stateful Stream Processing, agents maintain a "cursor" on a process. This allows an agent to resume a multi-turn workflow by "playing back" the stream to rebuild its mental context.
Summary Checklist
- Immutability: Are activities stored as an append-only log (Kafka-style)?
- Activity Schema: Does your data contract follow the one-table Activity Schema standard?
- Graph Mapping: Can you visualize your multi-agent interactions as a directed graph (with temporal or windowed DAGs)?
- Windowing: Do you use temporal windowing to group atomic actions into logical "Tasks"?
- GNN Potential: Are you logging enough topological data (source -> destination relationships) to enable future GNN-based inference?
The Bottom Line
[!IMPORTANT] Logs are the source of truth; the Graph is the meaning. To balance the "Engineer" and the "Perspective Engine," do not treat context as a static asset. Store raw, immutable activity (the Engineer) and reconstruct the graph at runtime through the lens of a specific decision intent (the Perspective Engine).
References & Further Reading
- Coordinated Progress - Jack Vanlightly — Seeing the system as a graph of nodes and edges.
- Event Sourcing - Wikipedia — The foundation of history-as-state.
- Apache Flink: Concepts — Windowing and Stateful Processing.
- Graph Neural Networks (arXiv) — A comprehensive survey on graph-based deep learning.
- Activity Schema v2.0 — Modeling activity data as a single table.
Related Playbooks
- The Engineering Manifesto — AlphaPebble's core philosophy for building high-stakes autonomous AI systems.
- Context Engineering — Providing LLMs with the right information.
- Dual-Engine Architecture — Using code to enforce stream contracts.
- Knowledge Graph Engineering — Where static knowledge meets dynamic activity graphs.
- Precedent Engineering — How agents earn autonomy through recorded human judgment.
- Enterprise Context Layer — Platform architecture for cross-system context delivery.
This playbook is maintained by the AlphaPebble team. For implementation support, get in touch.
