Compress 74% of your context. Keep 100% of the meaning.
Intelligent context optimization that understands what matters. Four strategies, zero context loss, sub-5ms latency.
Four strategies. One API.
Choose the compression strategy that fits your use case, or combine them with hybrid mode.
Keep the most recent N turns in full fidelity. Older turns are summarized or masked. Ideal for conversational agents.
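The rolling-window idea can be sketched in a few lines. This is an illustration only, not the SDK's implementation: the `Turn` type and `applyRollingWindow` function are hypothetical names, and older turns are masked with a simple placeholder rather than summarized.

```typescript
// Hypothetical type for illustration; not part of any SDK.
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Keep the last `maxTurns` turns verbatim and replace each older turn
// with a short placeholder that records how much text was masked.
function applyRollingWindow(turns: Turn[], maxTurns: number): Turn[] {
  const cutoff = Math.max(0, turns.length - maxTurns);
  return turns.map((turn, i) =>
    i < cutoff
      ? { role: turn.role, content: `[masked: ${turn.content.length} chars]` }
      : turn
  );
}
```

A real system would likely replace the placeholder with a summary of the masked turn; the placeholder keeps the sketch self-contained.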
Set a hard token limit. Content is intelligently compressed to fit within your budget while preserving the most important context.
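A hard token limit can be illustrated with a deliberately simple stand-in: drop the oldest blocks until the rest fit. The sketch below assumes a rough four-characters-per-token estimate, and `estimateTokens` and `fitToBudget` are hypothetical helpers, not SDK functions. The product describes compressing content to fit, which is more sophisticated than dropping it.

```typescript
// Rough heuristic: about four characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk from newest to oldest, keeping blocks while they still fit the budget.
// The returned blocks stay in their original (oldest-first) order.
function fitToBudget(blocks: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const block of [...blocks].reverse()) {
    const cost = estimateTokens(block);
    if (used + cost > budget) break;
    kept.unshift(block);
    used += cost;
  }
  return kept;
}
```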
Score each context block by relevance to the current task. Low-relevance blocks are compressed more aggressively.
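One way to picture relevance scoring is plain word overlap between a block and the current task. The sketch below is an assumption for illustration: `relevance` and `compressByRelevance` are hypothetical names, the scoring is a naive overlap ratio, and "compression" is simple truncation proportional to the score.

```typescript
// Split text into lowercase alphanumeric words.
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Fraction of distinct task words that also appear in the block
// (0 = no overlap, 1 = every task word is present).
function relevance(block: string, task: string): number {
  const taskWords = Array.from(new Set(words(task)));
  if (taskWords.length === 0) return 0;
  const blockWords = new Set(words(block));
  const hits = taskWords.filter((w) => blockWords.has(w)).length;
  return hits / taskWords.length;
}

// Keep a share of the block proportional to its relevance,
// never less than `minKeep` of the original length.
function compressByRelevance(block: string, task: string, minKeep = 0.2): string {
  const keep = Math.max(minKeep, relevance(block, task));
  return block.slice(0, Math.ceil(block.length * keep));
}
```

A production scorer would more plausibly use embeddings or a learned ranker; word overlap just makes the mechanism visible.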
Combine multiple strategies for maximum compression. Uses rolling windows for recency, relevance scoring for importance, and budgets for hard limits.
Five lines to 74% savings
The easy API handles strategy selection, budget management, and optimization automatically.
import { fold } from "@fold-run/sdk";
const ctx = fold(); // Default: 100K budget, 10 turn window
// Add context from your agent loop
ctx.system("You are a coding assistant...");
ctx.think("I need to search for information...");
ctx.act({ tool: "search", query: "react hooks" }, "search");
ctx.observe("Found 3 results: useState, useEffect, useCallback...", "search");
// Get optimized messages for your LLM
const messages = ctx.messages();
// Check your savings
console.log(ctx.saved());
// { tokens: 5000, percent: 45, cost: 0.05 }
Built for production
Every detail considered for real-world agent workloads.
Extractive Summarization
Condense long observations to their essential information.
Deduplication
Automatically detect and remove repeated context across turns.
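A minimal sketch of deduplication, assuming that "repeated" means identical after normalizing case and whitespace. The `dedupe` function is a hypothetical helper, and near-duplicate detection (for example, fuzzy matching) is out of scope here.

```typescript
// Remove blocks whose normalized text has already been seen,
// keeping the first occurrence and preserving order.
function dedupe(blocks: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const block of blocks) {
    // Normalize: trim, collapse runs of whitespace, ignore case.
    const key = block.trim().replace(/\s+/g, " ").toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    result.push(block);
  }
  return result;
}
```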
Smart Truncation
Truncate at semantic boundaries, not arbitrary character limits.
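To make the contrast with character-limit truncation concrete, here is a sketch that cuts at the last sentence end inside the limit and falls back to the last word boundary. `truncateAtSentence` is a hypothetical helper, and treating `. `, `! `, and `? ` as sentence ends is a simplifying assumption.

```typescript
// Truncate to at most `maxChars`, preferring to end on a complete sentence.
function truncateAtSentence(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  // Position of the last sentence-ending punctuation followed by a space.
  const lastEnd = Math.max(
    head.lastIndexOf(". "),
    head.lastIndexOf("! "),
    head.lastIndexOf("? ")
  );
  if (lastEnd !== -1) return head.slice(0, lastEnd + 1);
  // No sentence boundary found: fall back to the last whole word.
  const lastSpace = head.lastIndexOf(" ");
  return lastSpace === -1 ? head : head.slice(0, lastSpace);
}
```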
Degradation Awareness
Get warnings when context quality approaches critical thresholds.
Artifact Preservation
Code blocks, JSON, and structured data are preserved intact.
KV-Cache Ordering
Messages ordered to maximize LLM KV-cache hit rates.
Advanced configuration
Fine-tune every aspect of compression with the ContextSession API.
import { ContextSession } from "@fold-run/sdk";
const session = new ContextSession({
  budget: 8000,
  strategy: "hybrid",
  rollingWindow: { maxTurns: 15 },
  masking: {
    observations: "extractive_summary",
    thoughts: "relevance_scored",
    artifacts: "preserve",
  },
  degradation: {
    warnAt: 0.8,      // Warn at 80% budget
    criticalAt: 0.95, // Critical at 95%
  },
});
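The `warnAt` and `criticalAt` values read as fractions of the budget in use. How the SDK evaluates them is not shown here, so the following is only a sketch of that interpretation; `degradationLevel` and the `DegradationLevel` type are hypothetical names.

```typescript
type DegradationLevel = "ok" | "warn" | "critical";

// Classify budget usage against the two thresholds.
// Defaults mirror the configuration values 0.8 and 0.95.
function degradationLevel(
  usedTokens: number,
  budget: number,
  warnAt = 0.8,
  criticalAt = 0.95
): DegradationLevel {
  const ratio = usedTokens / budget;
  if (ratio >= criticalAt) return "critical";
  if (ratio >= warnAt) return "warn";
  return "ok";
}
```

With a budget of 8000 tokens, this interpretation would start warning at 6400 tokens and report a critical state from 7600 tokens.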