Issue #31: Session Memory and Context Compaction for Production AI Agents

11 min read | June 6, 2026

A lot of context engineering advice still collapses into one vague instruction: give the model more relevant context. That is not enough for a production agent that runs across many turns, tool calls, and user corrections.

Keeping everything is the wrong shape for production AI engineering. If a long-running agent keeps every turn forever, it eventually carries stale notes, superseded instructions, accidental secrets, and too much low-value history into the next model call. The hard part is no longer storing context. The hard part is deciding what survives.

In this issue, we build a local C# session-memory pipeline that replays a frozen multi-turn coding-agent session, pins durable task facts, expires short-lived notes, resolves conflicting updates deterministically, compacts older turns into a rolling summary, assembles a bounded carry-forward context for the next turn, and optionally sends that packed state to a local Ollama model for the next draft response.

What You Are Building

You are building a production-shaped context-engineering workflow that keeps long-running session state explicit:

Load runtime configuration from appsettings.json and CTXMEM_ environment overrides
Replay a frozen multi-turn session dataset from JSON
Normalize fact writes before storage, including max-length validation and secret redaction
Keep pinned goals and constraints durable across turns
Expire short-lived notes before they silently leak into later turns
Resolve conflicting fact writes so newer trusted values replace older ones deterministically
Compact older turns into a rolling summary while preserving a small recent-turn window
Pack the final carry-forward context under an explicit token budget
Optionally invoke a local Ollama model using only the packed carry-forward prompt
Persist the full turn-by-turn report as JSON for later inspection

This is the control layer that becomes necessary once the conversation itself stops being a safe unit of context.

System Structure

The architecture is intentionally small. A frozen session dataset provides turn-by-turn user and tool events. A deterministic write policy validates and redacts memory candidates. The compaction engine expires stale facts, applies conflict rules, builds rolling summaries, and packs the next-turn context under budget. A separate prompt composer turns that packed state into a bounded prompt for an optional local Ollama model. The report captures both the deterministic carry-forward state and the model outcome.

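The diagram below shows the high-level control flow:

[Diagram: control flow from the frozen session dataset through the write policy, compaction engine, and prompt composer to the optional Ollama call and the saved report]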

Runtime Configuration First

The app starts by loading the compaction profile before any session replay begins:

{
  "App": {
    "DatasetPath": "data/coding_agent_session.json",
    "ReportDirectory": "data/reports",
    "ModelContextWindowTokens": 2200,
    "ReservedOutputTokens": 450,
    "FixedPromptTokens": 240,
    "MaxRecentTurns": 2,
    "MaxSummaryEntries": 4,
    "DefaultFactTtlMinutes": 180,
    "ShortLivedFactTtlMinutes": 30,
    "MaxFactValueChars": 280,
    "EnableModelCall": true,
    "OllamaBaseUrl": "http://localhost:11434",
    "OllamaModelId": "qwen3:8b",
    "OllamaTimeoutSeconds": 45,
    "OllamaTemperature": 0,
    "OllamaMaxOutputTokens": 220
  }
}

That matters because context carry-forward is an operational boundary. Dataset path, report location, recent-turn depth, summary size, and TTL values all change what the agent is allowed to remember.

The Carry-Forward Budget Is Explicit

The next-turn context is bounded before it is assembled:

public int AvailableContextTokens =>
  ModelContextWindowTokens - ReservedOutputTokens - FixedPromptTokens;

The app validates that contract at startup:

if (AvailableContextTokens <= 0)
{
  throw new InvalidOperationException("AvailableContextTokens must be greater than zero.");
}

A profile that leaves no room for carry-forward context fails fast at startup, so the app never assembles state it cannot send. That prevents silent overflow and accidental prompt growth.
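As a worked example of the budget arithmetic, the sketch below plugs the three values from the sample profile into the same formula. The `BudgetOptions` record is a hypothetical stand-in for the app's real configuration type, which the article does not show in full.

```csharp
using System;

// Values copied from the sample appsettings.json profile.
var sampleBudget = new BudgetOptions(
    ModelContextWindowTokens: 2200,
    ReservedOutputTokens: 450,
    FixedPromptTokens: 240);

sampleBudget.Validate();
Console.WriteLine(sampleBudget.AvailableContextTokens); // 2200 - 450 - 240 = 1510

// Hypothetical options type; only the formula and the guard come from the article.
public sealed record BudgetOptions(
    int ModelContextWindowTokens,
    int ReservedOutputTokens,
    int FixedPromptTokens)
{
    public int AvailableContextTokens =>
        ModelContextWindowTokens - ReservedOutputTokens - FixedPromptTokens;

    public void Validate()
    {
        if (AvailableContextTokens <= 0)
        {
            throw new InvalidOperationException(
                "AvailableContextTokens must be greater than zero.");
        }
    }
}
```

The result, 1510, is the denominator that appears in every "Context tokens used" line of the run output later in this issue.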

Memory Writes Are Filtered Before They Persist

A session turn does not get to become memory just because it happened. Each observed fact is normalized before storage:

if (string.IsNullOrWhiteSpace(normalizedValue))
{
  rejection = new RejectedFactWrite(turnId, observation.Key, "Value is empty.");
  return false;
}

if (normalizedValue.Length > config.MaxFactValueChars)
{
  rejection = new RejectedFactWrite(turnId, observation.Key, $"Value exceeds max length {config.MaxFactValueChars}.");
  return false;
}

var redactedValue = _secretRedactor.Redact(normalizedValue, out var wasRedacted);

This is where the system decides whether a candidate detail is valid memory, redacted memory, or rejected memory.
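The article does not show what sits behind `_secretRedactor.Redact`. One possible shape is a small regex-based redactor like the sketch below. The class name, the pattern, and the `[REDACTED]` marker are all assumptions made for illustration, and a real deployment would likely need a broader pattern set.

```csharp
using System;
using System.Text.RegularExpressions;

var redactor = new RegexSecretRedactor();
var cleaned = redactor.Redact("use api_key=sk-test-1234 for staging", out var wasRedacted);
Console.WriteLine($"{cleaned} (redacted: {wasRedacted})");

// Hypothetical redactor; not the article's implementation.
public sealed class RegexSecretRedactor
{
    // Assumed pattern: "key=value" or "key: value" pairs whose key name looks secret-bearing.
    private static readonly Regex SecretPattern = new(
        @"\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public string Redact(string value, out bool wasRedacted)
    {
        wasRedacted = SecretPattern.IsMatch(value);
        if (!wasRedacted)
        {
            return value;
        }

        // Keep the key name and separator so the fact stays readable; drop only the secret itself.
        return SecretPattern.Replace(
            value,
            m => $"{m.Groups[1].Value}{m.Groups[2].Value}[REDACTED]");
    }
}
```

Redacting instead of rejecting keeps the surrounding fact usable (for example, "a key was provided for staging") while the secret value never reaches stored memory.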

Pinned Facts Stay Durable

The sample coding-agent session starts by pinning the facts that must survive later compaction:

{
  "key": "active_goal",
  "value": "Fix flaky checkout tax calculation tests in Storefront.",
  "source": "user",
  "scope": "task",
  "isPinned": true,
  "isRequired": true
},
{
  "key": "forbidden_path",
  "value": "src/Storefront.Data/Migrations",
  "source": "user",
  "scope": "constraint",
  "isPinned": true,
  "isRequired": true
},
{
  "key": "verification_command",
  "value": "dotnet test Storefront.slnx --filter Tax",
  "source": "user",
  "scope": "execution",
  "isPinned": true,
  "isRequired": true
}

That is the difference between raw transcript replay and session memory engineering. The system knows which facts belong to the durable task contract and which are just recent conversation noise.
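To make the fact shape concrete, the sketch below deserializes one of the pinned observations with System.Text.Json. The `FactObservation` class is a hypothetical model inferred from the JSON keys and from the `observation.IsPinned` and `observation.IsShortLived` checks shown in this issue; the real type may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

const string json = """
[
  {
    "key": "active_goal",
    "value": "Fix flaky checkout tax calculation tests in Storefront.",
    "source": "user",
    "scope": "task",
    "isPinned": true,
    "isRequired": true
  }
]
""";

// Web defaults map camelCase JSON names onto PascalCase properties.
var webOptions = new JsonSerializerOptions(JsonSerializerDefaults.Web);
var observations = JsonSerializer.Deserialize<List<FactObservation>>(json, webOptions)
    ?? new List<FactObservation>();

foreach (var observation in observations)
{
    Console.WriteLine($"{observation.Key} pinned={observation.IsPinned} required={observation.IsRequired}");
}

// Hypothetical model; property names are assumptions based on the JSON above.
public sealed class FactObservation
{
    public string Key { get; init; } = "";
    public string Value { get; init; } = "";
    public string Source { get; init; } = "";
    public string Scope { get; init; } = "";
    public bool IsPinned { get; init; }
    public bool IsRequired { get; init; }
    public bool IsShortLived { get; init; } // absent in the JSON above, so it stays false
}
```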

Stale Facts Expire Instead of Lingering

Not every fact deserves the same lifetime. Short-lived notes receive an explicit expiry:

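// Short-lived is checked first, so a note flagged short-lived
// gets the short TTL even if it is also pinned.
if (observation.IsShortLived)
{
  return observedAtUtc.AddMinutes(config.ShortLivedFactTtlMinutes);
}

// Pinned facts have no expiry.
if (observation.IsPinned)
{
  return null;
}

// Everything else ages out on the default TTL.
return observedAtUtc.AddMinutes(config.DefaultFactTtlMinutes);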

In the sample session, a temporary canary note about a pricing-service redeploy is valid for one stage of the task and then disappears before the later carry-forward snapshot.
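Once each fact carries an optional expiry timestamp, separating active from expired facts is a single pass. The sketch below assumes a fact expires when its timestamp is at or before the current time. The `StoredFact` record, the `ExpiryPolicy` class, the `diagnostic_note` key, and the 45-minute clock offset are hypothetical; the TTL values come from the sample profile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var observedAt = new DateTimeOffset(2026, 6, 6, 7, 0, 0, TimeSpan.Zero);
var facts = new List<StoredFact>
{
    new StoredFact("active_goal", null),                                // pinned: no expiry
    new StoredFact("temporary_canary_note", observedAt.AddMinutes(30)), // short-lived TTL
    new StoredFact("diagnostic_note", observedAt.AddMinutes(180)),      // default TTL (hypothetical key)
};

// 45 minutes later, only the short-lived note has aged out.
var (active, expired) = ExpiryPolicy.Partition(facts, observedAt.AddMinutes(45));
Console.WriteLine($"active: {string.Join(", ", active.Select(f => f.Key))}");
Console.WriteLine($"expired: {string.Join(", ", expired.Select(f => f.Key))}");

public sealed record StoredFact(string Key, DateTimeOffset? ExpiresAtUtc);

public static class ExpiryPolicy
{
    // Assumption: a fact is expired when its expiry is at or before "now"; null never expires.
    public static (IReadOnlyList<StoredFact> Active, IReadOnlyList<StoredFact> Expired) Partition(
        IEnumerable<StoredFact> facts, DateTimeOffset nowUtc)
    {
        var byExpired = facts.ToLookup(f => f.ExpiresAtUtc is { } expiry && expiry <= nowUtc);
        return (byExpired[false].ToList(), byExpired[true].ToList());
    }
}
```

Running the partition before packing, rather than at read time inside the prompt composer, keeps expired keys visible in the report instead of letting them vanish silently.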

Conflicting Updates Are Resolved Deterministically

Session memory is not append-only. Facts can be corrected, narrowed, or replaced:

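// A pinned fact can only be replaced by a write whose source ranks at least as high.
if (currentFact.IsPinned && (int)incomingFact.Source < (int)currentFact.Source)
{
  return false;
}

// Otherwise the newer write wins.
return true;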

Later in the session, the user replaces the original verification command with a narrower one:

{
  "key": "verification_command",
  "value": "dotnet test Storefront.slnx --filter FullyQualifiedName~TaxCalculatorTests",
  "source": "user",
  "scope": "execution",
  "isPinned": true,
  "isRequired": true
}

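Because the incoming write also comes from the user, its source rank is not lower than that of the pinned value, so the replacement is accepted. The system keeps the new user instruction and supersedes the earlier one. That is what makes the carry-forward state trustworthy rather than merely historically complete.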
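The rule above depends on the numeric order of the source enum, which the article does not list. The sketch below assumes a hypothetical ordering in which a higher value means a more trusted source, and shows a tool write bouncing off a pinned user fact while a later user write replaces it. The `Fact` record and `ConflictPolicy` class are illustrative names.

```csharp
using System;

var original = new Fact(
    "verification_command",
    "dotnet test Storefront.slnx --filter Tax",
    FactSource.User,
    IsPinned: true);

var fromTool = original with { Value = "dotnet test", Source = FactSource.Tool };
var fromUser = original with
{
    Value = "dotnet test Storefront.slnx --filter FullyQualifiedName~TaxCalculatorTests"
};

Console.WriteLine(ConflictPolicy.ShouldReplace(original, fromTool)); // False: lower-ranked source
Console.WriteLine(ConflictPolicy.ShouldReplace(original, fromUser)); // True: same rank, newer write

// Assumed trust ordering; the real enum members and values may differ.
public enum FactSource { Tool = 0, Assistant = 1, User = 2 }

public sealed record Fact(string Key, string Value, FactSource Source, bool IsPinned);

public static class ConflictPolicy
{
    public static bool ShouldReplace(Fact currentFact, Fact incomingFact)
    {
        if (currentFact.IsPinned && (int)incomingFact.Source < (int)currentFact.Source)
        {
            return false;
        }

        return true;
    }
}
```

Note that under this rule an unpinned fact is always replaced by the newest write, whatever its source. Only pinning adds the rank check.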

Older History Becomes a Rolling Summary

Once the recent-turn window is exceeded, the older turns are compacted into a deterministic summary:

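// Everything older than the recent-turn window is a summary candidate;
// only the newest MaxSummaryEntries of those are kept.
var olderTurns = processedTurns
  .Take(Math.Max(0, processedTurns.Count - config.MaxRecentTurns))
  .TakeLast(config.MaxSummaryEntries)
  .ToArray();

return olderTurns
  .Select(static turn =>
  {
    // Prefer up to three key=value facts; fall back to the trimmed message text.
    var factSummary = turn.Facts.Count == 0
      ? Trim(turn.Message, 72)
      : string.Join("; ", turn.Facts.Take(3).Select(f => $"{f.Key}={Trim(f.Value, 40)}"));

    return $"{turn.TurnId} {turn.Speaker}: {factSummary}";
  })
  .ToArray();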

This is a deliberately boring compaction strategy. That is a strength. It keeps the summary inspectable and deterministic instead of relying on another model call just to remember what happened.
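Two details of the summary code are easy to check in isolation: how many entries the window arithmetic yields per turn, and what a `Trim` helper might do. The sketch below is an assumption on both counts: `SummaryText` is a hypothetical class, and the article does not show the real `Trim`, so the ellipsis behavior here is a guess.

```csharp
using System;
using System.Linq;

Console.WriteLine(SummaryText.Trim("Fix flaky checkout tax calculation tests in Storefront.", 40));

// Sample profile: MaxRecentTurns = 2, MaxSummaryEntries = 4.
foreach (var turnCount in Enumerable.Range(1, 6))
{
    var entries = SummaryText.SummaryEntryCount(turnCount, maxRecentTurns: 2, maxSummaryEntries: 4);
    Console.WriteLine($"turns={turnCount} summaryEntries={entries}");
}

public static class SummaryText
{
    // Assumed behavior: hard cut with a "..." suffix that counts toward maxChars.
    public static string Trim(string value, int maxChars)
    {
        var text = value.Trim();
        if (text.Length <= maxChars)
        {
            return text;
        }

        return maxChars <= 3 ? text[..maxChars] : text[..(maxChars - 3)] + "...";
    }

    // Mirrors Take(Math.Max(0, count - recent)) followed by TakeLast(maxSummaryEntries).
    public static int SummaryEntryCount(int processedTurns, int maxRecentTurns, int maxSummaryEntries) =>
        Math.Min(Math.Max(0, processedTurns - maxRecentTurns), maxSummaryEntries);
}
```

For one through six turns the entry counts come out as 0, 0, 1, 2, 3, 4, which lines up with the "Rolling summary entries" values in the run output later in this issue.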

Carry-Forward Assembly Is Ordered and Bounded

The carry-forward candidates are assembled in explicit order:

foreach (var fact in activeFacts
           .OrderByDescending(x => x.IsPinned)
           .ThenByDescending(x => x.IsRequired)
           .ThenBy(x => x.Scope)
           .ThenBy(x => x.Key, StringComparer.Ordinal))
{
  var content = $"[{fact.Scope}] {fact.Key}: {fact.Value}";
  var blockType = fact.IsPinned ? ContextBlockType.PinnedFact : ContextBlockType.ActiveFact;
}

Each block is then packed into the remaining budget or excluded with a concrete reason:

if (candidate.TokenCount <= remainingTokens)
{
  included.Add(new IncludedContextBlock(...));
  remainingTokens -= candidate.TokenCount;
  continue;
}

var reason = candidate.IsRequired
  ? ExclusionReason.RequiredBlockTooLargeForBudget
  : ExclusionReason.ExceedsRemainingBudget;

This is where the system stops treating context like an unbounded transcript and starts treating it like a controlled runtime input.
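The two excerpts above can be combined into one end-to-end packing loop. The sketch below is a minimal version under stated assumptions: `TokenCount` uses a rough four-characters-per-token heuristic (the article does not say how tokens are counted), and `Candidate`, `PackResult`, `Packer`, and the `CanProceed` rule are hypothetical names and behavior chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var candidates = new List<Candidate>
{
    new Candidate("[task] active_goal: Fix flaky checkout tax calculation tests in Storefront.", IsRequired: true),
    new Candidate("[constraint] forbidden_path: src/Storefront.Data/Migrations", IsRequired: true),
    new Candidate(new string('x', 400), IsRequired: false), // oversized optional block
};

var packed = Packer.Pack(candidates, budgetTokens: 60);
Console.WriteLine($"included={packed.Included.Count} excluded={packed.Excluded.Count} canProceed={packed.CanProceed}");

public enum ExclusionReason { RequiredBlockTooLargeForBudget, ExceedsRemainingBudget }

public sealed record Candidate(string Content, bool IsRequired)
{
    // Assumed heuristic: about four characters per token, rounded up.
    public int TokenCount => (Content.Length + 3) / 4;
}

public sealed record PackResult(
    IReadOnlyList<Candidate> Included,
    IReadOnlyList<(Candidate Block, ExclusionReason Reason)> Excluded,
    int RemainingTokens)
{
    // Assumption: the turn can proceed only if no required block was dropped.
    public bool CanProceed => !Excluded.Any(e => e.Block.IsRequired);
}

public static class Packer
{
    // Candidates are expected to arrive already sorted in priority order.
    public static PackResult Pack(IEnumerable<Candidate> orderedCandidates, int budgetTokens)
    {
        var included = new List<Candidate>();
        var excluded = new List<(Candidate Block, ExclusionReason Reason)>();
        var remainingTokens = budgetTokens;

        foreach (var candidate in orderedCandidates)
        {
            if (candidate.TokenCount <= remainingTokens)
            {
                included.Add(candidate);
                remainingTokens -= candidate.TokenCount;
                continue;
            }

            var reason = candidate.IsRequired
                ? ExclusionReason.RequiredBlockTooLargeForBudget
                : ExclusionReason.ExceedsRemainingBudget;
            excluded.Add((candidate, reason));
        }

        return new PackResult(included, excluded, remainingTokens);
    }
}
```

Because the loop continues after an exclusion, a smaller block later in the order can still fit. Whether that is desirable depends on how strictly the ordering should be honored.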

The LLM Sees Only the Packed State

After compaction, the app builds one bounded prompt for the next turn:

builder.AppendLine("You are continuing a long-running coding-agent session.");
builder.AppendLine("Use only the carry-forward context provided below.");
builder.AppendLine("Do not invent files, constraints, or verification steps.");
builder.AppendLine("If a required detail is missing, say so explicitly.");
builder.AppendLine();
builder.AppendLine($"Role: {agentRole}");
builder.AppendLine($"Next user task: {nextUserTask}");
builder.AppendLine();
builder.AppendLine(snapshot.PromptPreview);

The local model path then uses Ollama over HTTP:

var request = new OllamaGenerateRequest
{
  Model = modelId,
  Prompt = prompt,
  Stream = false,
  Options = new OllamaGenerateOptions
  {
      Temperature = temperature,
      NumPredict = maxOutputTokens
  }
};

using var response = await httpClient.PostAsync("/api/generate", content, cancellationToken);

That boundary is the important part. The model never sees the raw full transcript. It sees only the deterministic carry-forward state that survived validation, expiry, conflict resolution, and budget checks.
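For readers wiring this up themselves, the sketch below shows how the request types could map onto the JSON body that Ollama's `/api/generate` endpoint accepts (`model`, `prompt`, `stream`, and `options` with `temperature` and `num_predict`). The attribute mappings and the `OllamaGenerateResponse` type are assumptions; the article names the request classes but does not show their definitions. The example serializes a request and parses a canned reply, so it runs without a live server.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var request = new OllamaGenerateRequest
{
    Model = "qwen3:8b",
    Prompt = "You are continuing a long-running coding-agent session.",
    Stream = false,
    Options = new OllamaGenerateOptions { Temperature = 0, NumPredict = 220 }
};

Console.WriteLine(JsonSerializer.Serialize(request));

// Canned reply used in place of a real HTTP call.
var reply = JsonSerializer.Deserialize<OllamaGenerateResponse>(
    "{\"response\":\"draft text\",\"done\":true}");
Console.WriteLine(reply?.Response);

public sealed class OllamaGenerateRequest
{
    [JsonPropertyName("model")] public string Model { get; init; } = "";
    [JsonPropertyName("prompt")] public string Prompt { get; init; } = "";
    [JsonPropertyName("stream")] public bool Stream { get; init; }
    [JsonPropertyName("options")] public OllamaGenerateOptions Options { get; init; } = new();
}

public sealed class OllamaGenerateOptions
{
    [JsonPropertyName("temperature")] public double Temperature { get; init; }
    [JsonPropertyName("num_predict")] public int NumPredict { get; init; }
}

// Hypothetical response model covering only the fields used here.
public sealed class OllamaGenerateResponse
{
    [JsonPropertyName("response")] public string Response { get; init; } = "";
    [JsonPropertyName("done")] public bool Done { get; init; }
}
```

Setting `stream` to false asks Ollama for a single JSON object rather than a sequence of chunks, which keeps the response handling to one deserialize call.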

Walking Through a Live Run

A local run with the model call enabled produced the following output:

Session Memory Context Compaction
Session: CODING-SESSION-001
Agent role: Checkout Bug Fix Assistant
Dataset turns: 6

SM-001
- Active facts: 3
- Expired facts: 0
- Superseded facts: 0
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 92/1510
- Rolling summary entries: 0

SM-002
- Active facts: 5
- Expired facts: 0
- Superseded facts: 0
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 162/1510
- Rolling summary entries: 0

SM-003
- Active facts: 8
- Expired facts: 0
- Superseded facts: 0
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 240/1510
- Rolling summary entries: 1

SM-004
- Active facts: 10
- Expired facts: 0
- Superseded facts: 0
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 299/1510
- Rolling summary entries: 2

SM-005
- Active facts: 10
- Expired facts: 0
- Superseded facts: 1
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 351/1510
- Rolling summary entries: 3

SM-006
- Active facts: 10
- Expired facts: 1
- Superseded facts: 0
- Rejected writes: 0
- Can proceed: True
- Context tokens used: 379/1510
- Rolling summary entries: 4

Final carry-forward state
- Included blocks: 12
- Excluded blocks: 1
- Remaining tokens: 1131
- Final active facts: 10
- Expired fact keys: temporary_canary_note
- Verification command: dotnet test Storefront.slnx --filter FullyQualifiedName~TaxCalculatorTests
- Model call: ollama/qwen3:8b in 15872 ms

Model draft
1. Fix flaky New York checkout tax calculation tests by addressing rounding logic in `src/Storefront.Api/Services/TaxCalculator.cs`. Focus only on New York paths; exclude Promotional pricing and avoid `src/Storefront.Data/Migrations`.

2. Run `dotnet test Storefront.slnx --filter FullyQualifiedName~TaxCalculatorTests` to verify all targeted tax tests are green and stable.
Report: SessionMemoryContextCompaction\bin\Debug\net10.0\data\reports\20260606T070012Z-session-memory-compaction-report.json

How to interpret this:

The pinned task contract stayed stable even as the session accumulated more details
The updated verification command replaced the older value instead of adding conflicting state
The short-lived canary note expired before the final carry-forward snapshot
The accidentally pasted secret stayed out of the carry-forward surface even though it appeared in the session
The model draft stayed bounded to the packed carry-forward state rather than the raw transcript
The rolling summary grew from 0 to 4 entries while the recent-turn window stayed fixed at 2 turns
The final context used 379 of 1510 available tokens, well under budget even after six turns of session history

Why This Architecture Works

The system works because context storage and context carry-forward are treated as different responsibilities:

The write policy decides what is valid memory at all
The expiry policy decides what remains current
The conflict policy decides which value wins when the session changes direction
The summary layer keeps older history compact and inspectable
The budgeter decides what the next model call is allowed to see
The model layer drafts against bounded carry-forward state instead of uncontrolled history
The saved report makes the full carry-forward state auditable after the run ends

That is the real boundary in context engineering for long-running agents. The model can reason over the bounded state. Deterministic code owns what the bounded state actually is.

Potential Enhancements

To extend this project further, consider the following:

Add per-scope TTL policies so diagnostics, user preferences, and repo constraints age differently
Replace the deterministic summary generator with an optional model-generated summary that is still schema-validated
Persist longitudinal snapshot diffs so you can compare carry-forward state across multiple sessions
Extend the dataset with support operations or incident-response sessions in addition to coding-agent flows
Add policy rules that require human approval before certain fact classes are ever persisted

Final Notes

Context engineering is not just about gathering more data for the model. It is about deciding which state is durable enough, current enough, and important enough to survive into the next turn.

If the next-turn context is part of the system contract, then session memory has to be shaped, compacted, and validated like a runtime component, not treated like a raw chat transcript.

Explore the source code at the GitHub repository.

See you in the next issue.

Stay curious.
