Zero configuration agent memory. Two memory calls around the LLM call you already have. Every user gets persistent context.
Get Started

commit()
Send conversation turns to commit() — one at a time, or whenever it fits your flow. Extraction runs asynchronously on our side and decides what's worth remembering. The default extraction prompt favors conservative, durable facts; override it with your own prompt when you need different tradeoffs.
import { memory } from "@withmemory/sdk";
const userMessage = "Our exports are failing again. We're still on the "
+ "legacy v2 API — our ETL pipeline can't handle the v3 response shape.";
const agentReply = "I've flagged the v2 dependency in the ticket. "
+ "We'll keep that in mind for any changes to the export pipeline.";
await memory.commit({
userId: "customer-42871",
input: userMessage,
output: agentReply,
});
// Extraction ran. One memory was kept:
//
// { key: "api-version", value: "On legacy v2 API — ETL pipeline
// can't handle v3 response shape", source: "commit" }
//
// The rest of the turn was conversational filler and got dropped.

recall()
Call recall() when you need context about a user — usually right before generating a response. You get back a memoryBlock: the most relevant memories, ranked by semantic similarity, pre-formatted as a string you drop into your system prompt. Always a string, always under 150 tokens, always safe to concatenate.
// Three weeks later — new ticket, different issue.
const { memoryBlock, memories } = await memory.recall({
userId: "customer-42871",
input: "Why are we getting 429s on /exports this morning?",
});
// memoryBlock → key: value pairs, one per line, ready for a prompt:
//
// "api-version: On legacy v2 API — ETL pipeline can't handle v3 response shape
// plan: enterprise"
//
// memories → raw array for rendering, logging, or filtering:
// [{ key: "api-version", value: "On legacy v2 API — ...", source: "commit" },
//  { key: "plan", value: "enterprise", source: "set" }]

set()
When you already have the fact, store it directly. Profile data, plan tiers, preferences — anything with a clear key and value that your app already knows. No extraction, no LLM in the loop, no guessing.
await memory.set("customer-42871", "plan", "enterprise");

Putting it together
The full loop: recall before you generate, commit after. Two calls around the LLM call you already have.
import { memory } from "@withmemory/sdk";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// 1. Recall relevant context
const { memoryBlock } = await memory.recall({
userId,
input: userMessage,
});
// 2. Generate with memory in the system prompt
const { text } = await generateText({
model: anthropic("claude-sonnet-4-6"),
system: `You are a support agent for Acme.
Known facts about this user:
${memoryBlock}`,
prompt: userMessage,
});
// 3. Commit the turn so extraction can learn from it
await memory.commit({ userId, input: userMessage, output: text });

Forget Me Not
Without memory, your AI agent isn't really an agent — it's a stateless model API call dressed up with a system prompt. Every conversation starts from scratch. Every user feels like a stranger.
With memory, every session gets smarter. Every user feels known. That's what separates an agent from a model wrapped in a system prompt.
WithMemory is the memory layer. Two API calls, zero configuration. Self-host on any Postgres when you're ready.
TypeScript-first
Works identically on Node, Bun, Deno, and Cloudflare Workers. No Python runtime in the loop.
Conservative extraction
Ships with a default extraction prompt that favors precision over recall. Bring your own prompt when you need different tradeoffs.
Self-hostable promise
The server runs against any Postgres. No vendor lock-in, no SaaS-only magic.
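A footnote on recall(): the memories array is plain data, so rendering, logging, or filtering it is ordinary array work. Below is a minimal standalone sketch using the shape from the recall() example above; the Memory type name is illustrative, not part of the SDK.

```typescript
// Shape of entries in the `memories` array returned by recall().
// The type name is ours; the fields match the recall() example.
type Memory = { key: string; value: string; source: "commit" | "set" };

const memories: Memory[] = [
  {
    key: "api-version",
    value: "On legacy v2 API — ETL pipeline can't handle v3 response shape",
    source: "commit",
  },
  { key: "plan", value: "enterprise", source: "set" },
];

// Keep only facts your app wrote directly via set(),
// e.g. to render a profile card without extracted memories.
const direct = memories.filter((m) => m.source === "set");

// Rebuild a prompt-ready block from the filtered subset,
// mirroring the key: value lines recall() pre-formats for you.
const block = direct.map((m) => `${m.key}: ${m.value}`).join("\n");

console.log(block); // prints "plan: enterprise"
```

If you only need the prompt string, memoryBlock already gives you this formatting; the raw array is for when you want to slice the data yourself first.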