Context Compression

Compress your context by 74%. Keep 100% of the meaning.

Intelligent context optimization that understands what matters. Four strategies, zero context loss, sub-5ms latency.

74%
Token Reduction
<5ms
Latency Overhead
4
Strategies
0%
Context Loss
Strategies

Four strategies. One API.

Choose the compression strategy that fits your use case, or combine them with hybrid mode.

Rolling Window

Keep the most recent N turns in full fidelity. Older turns are summarized or masked. Ideal for conversational agents.
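Conceptually, a rolling window looks like the sketch below. This is illustrative only, not the SDK's internals; the `Turn` shape and the one-line summary stub are assumptions.

```typescript
interface Turn {
  role: string;
  content: string;
}

// Keep the last `maxTurns` turns verbatim; collapse older turns
// into a single compact placeholder.
function rollingWindow(turns: Turn[], maxTurns: number): Turn[] {
  if (turns.length <= maxTurns) return turns;
  const older = turns.slice(0, turns.length - maxTurns);
  const recent = turns.slice(turns.length - maxTurns);
  const summary: Turn = {
    role: "system",
    content: `[${older.length} earlier turns summarized]`,
  };
  return [summary, ...recent];
}
```

With `maxTurns: 3`, a five-turn history collapses to one summary stub plus the three latest turns.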

Token Budget

Set a hard token limit. Content is intelligently compressed to fit within your budget while preserving the most important context.
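A minimal picture of a budget pass, assuming a priority score per block and a crude chars-per-token heuristic (both are illustrative, not how the SDK estimates tokens):

```typescript
interface Block {
  text: string;
  priority: number;
}

// Rough token estimate: ~4 characters per token.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Greedily keep the highest-priority blocks that fit the budget.
function fitToBudget(blocks: Block[], budget: number): Block[] {
  const kept: Block[] = [];
  let used = 0;
  for (const b of [...blocks].sort((a, b) => b.priority - a.priority)) {
    const cost = estimateTokens(b.text);
    if (used + cost <= budget) {
      kept.push(b);
      used += cost;
    }
  }
  return kept;
}
```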

Relevance Scored

Score each context block by relevance to the current task. Low-relevance blocks are compressed more aggressively.
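As a toy model of relevance scoring (purely illustrative; the SDK's scorer and thresholds are not shown here), score a block by keyword overlap with the current task and truncate low scorers harder:

```typescript
// Fraction of a block's words that also appear in the task description.
function relevance(block: string, task: string): number {
  const taskWords = new Set(task.toLowerCase().split(/\W+/));
  const words = block.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = words.filter((w) => taskWords.has(w)).length;
  return words.length ? hits / words.length : 0;
}

// Low-relevance blocks get an aggressive cut; the rest a light one.
function compressByRelevance(block: string, task: string): string {
  const keep = relevance(block, task) < 0.2 ? 40 : 200;
  return block.length > keep ? block.slice(0, keep) + "…" : block;
}
```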

Hybrid

Combine multiple strategies for maximum compression. Uses rolling windows for recency, relevance scoring for importance, and budgets for hard limits.
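One way to picture hybrid mode (a sketch, not the SDK's implementation): each strategy is a pass over the turn list, and hybrid composes them in sequence. The example passes below are stand-ins.

```typescript
type Strategy = (turns: string[]) => string[];

// Compose strategies left-to-right; each pass sees the previous output.
const hybrid = (...passes: Strategy[]): Strategy =>
  (turns) => passes.reduce((acc, pass) => pass(acc), turns);

// Example passes: keep the last three turns, then cap each at 80 chars.
const recency: Strategy = (t) => t.slice(-3);
const cap: Strategy = (t) => t.map((s) => s.slice(0, 80));
const compress = hybrid(recency, cap);
```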

Easy API

Five lines to 74% savings

The easy API handles strategy selection, budget management, and optimization automatically.

import { fold } from "@fold-run/sdk";

const ctx = fold();  // Default: 100K token budget, 10-turn window

// Add context from your agent loop
ctx.system("You are a coding assistant...");
ctx.think("I need to search for information...");
ctx.act({ tool: "search", query: "react hooks" }, "search");
ctx.observe("Found 3 results: useState, useEffect, useCallback...", "search");

// Get optimized messages for your LLM
const messages = ctx.messages();

// Check your savings
console.log(ctx.saved());
// { tokens: 5000, percent: 45, cost: 0.05 }

Advanced Features

Built for production

Every detail considered for real-world agent workloads.

Extractive Summarization

Condense long observations to their essential information.
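A classic way to do this, shown here as an illustrative sketch rather than the SDK's summarizer: score each sentence by the frequency of its words across the whole text and keep the top-k sentences in their original order.

```typescript
function extractiveSummary(text: string, k: number): string {
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  // Word frequencies over the whole text.
  const freq = new Map<string, number>();
  for (const w of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    freq.set(w, (freq.get(w) ?? 0) + 1);
  }
  const score = (s: string) =>
    s.toLowerCase().split(/\W+/).filter(Boolean)
      .reduce((sum, w) => sum + (freq.get(w) ?? 0), 0);
  // Keep the k highest-scoring sentences, restored to document order.
  return sentences
    .map((s, i) => ({ s, i, score: score(s) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .sort((a, b) => a.i - b.i)
    .map((x) => x.s)
    .join(" ");
}
```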

Deduplication

Automatically detect and remove repeated context across turns.
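In its simplest form (a sketch; the SDK may match more than exact duplicates), deduplication keys each block on its normalized text and keeps only the first occurrence:

```typescript
function dedupe(blocks: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const block of blocks) {
    const key = block.trim().toLowerCase(); // normalize before comparing
    if (!seen.has(key)) {
      seen.add(key);
      out.push(block);
    }
  }
  return out;
}
```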

Smart Truncation

Truncate at semantic boundaries, not arbitrary character limits.
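For instance (an illustrative sketch using sentence ends and newlines as boundaries; the SDK's boundary detection is not shown), cut at the last boundary that fits under the limit instead of slicing mid-word:

```typescript
function smartTruncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const slice = text.slice(0, maxChars);
  // Prefer the last sentence end or line break inside the limit.
  const lastBoundary = Math.max(
    slice.lastIndexOf(". "),
    slice.lastIndexOf("\n"),
  );
  return lastBoundary > 0 ? slice.slice(0, lastBoundary + 1) : slice;
}
```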

Degradation Awareness

Get warnings when context quality approaches critical thresholds.
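The thresholds behind these warnings can be pictured as simple fractions of the token budget (a sketch; the defaults below mirror the `warnAt` / `criticalAt` values from the configuration example later on):

```typescript
function degradationLevel(
  used: number,
  budget: number,
  warnAt = 0.8,
  criticalAt = 0.95,
): "ok" | "warn" | "critical" {
  const ratio = used / budget;
  return ratio >= criticalAt ? "critical" : ratio >= warnAt ? "warn" : "ok";
}
```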

Artifact Preservation

Code blocks, JSON, and structured data are preserved intact.
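Detecting such artifacts can be as simple as the heuristics below (illustrative only; the SDK's detection is not documented here): anything that parses as JSON, or opens like a code snippet, is flagged so the compressor leaves it untouched.

```typescript
function isArtifact(block: string): boolean {
  const trimmed = block.trim();
  try {
    JSON.parse(trimmed); // valid JSON payloads pass through untouched
    return true;
  } catch {
    // Crude code heuristic: leading keyword typical of a snippet.
    return /^(function|class|import|const|let|def)\b/.test(trimmed);
  }
}
```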

KV-Cache Ordering

Messages ordered to maximize LLM KV-cache hit rates.
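The idea: LLM providers can reuse the KV cache for a prompt prefix that is byte-identical across calls, so stable content (system prompt, long-lived context) should come before volatile content. A sketch, with the `stable` flag as an assumption of this example:

```typescript
interface Msg {
  role: string;
  content: string;
  stable: boolean;
}

// Emit stable messages first so repeated calls share the longest
// possible identical prefix; relative order within each group is kept.
function orderForCache(msgs: Msg[]): Msg[] {
  const stable = msgs.filter((m) => m.stable);
  const volatile = msgs.filter((m) => !m.stable);
  return [...stable, ...volatile];
}
```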

Full Control

Advanced configuration

Fine-tune every aspect of compression with the ContextSession API.

import { ContextSession } from "@fold-run/sdk";

const session = new ContextSession({
  budget: 8000,
  strategy: "hybrid",
  rollingWindow: { maxTurns: 15 },
  masking: {
    observations: "extractive_summary",
    thoughts: "relevance_scored",
    artifacts: "preserve",
  },
  degradation: {
    warnAt: 0.8,      // Warn at 80% of budget
    criticalAt: 0.95, // Critical at 95% of budget
  },
});

Try it in the playground.

See compression in action.