withmemory

Zero configuration agent memory. Two memory calls around the LLM call you already have. Every user gets persistent context.

Get Started

commit()

Send conversation turns to commit() — one at a time or whenever it fits your flow. Extraction runs async on our side and decides what's worth remembering. The default extraction prompt favors conservative, durable facts; bring your own prompt when you need different tradeoffs.

import { memory } from "@withmemory/sdk";

const userMessage = "Our exports are failing again. We're still on the "
  + "legacy v2 API — our ETL pipeline can't handle the v3 response shape.";

const agentReply = "I've flagged the v2 dependency in the ticket. "
  + "We'll keep that in mind for any changes to the export pipeline.";

await memory.commit({
  userId: "customer-42871",
  input: userMessage,
  output: agentReply,
});

// Extraction ran. One memory was kept:
//
//   { key: "api-version", value: "On legacy v2 API — ETL pipeline
//     can't handle v3 response shape", source: "commit" }
//
// The rest of the turn was conversational filler and got dropped.

recall()

Call recall() when you need context about a user — usually right before generating a response. You get back a memoryBlock: the most relevant memories, ranked by semantic similarity, pre-formatted as a string you drop into your system prompt. Always a string, always under 150 tokens, always safe to concatenate.

// Three weeks later — new ticket, different issue.
const { memoryBlock, memories } = await memory.recall({
  userId: "customer-42871",
  input: "Why are we getting 429s on /exports this morning?",
});

// memoryBlock → key: value pairs, one per line, ready for a prompt:
//
//   "api-version: On legacy v2 API — ETL pipeline can't handle v3 response shape
//    plan: enterprise"
//
// memories → raw array for rendering, logging, or filtering:
//   [{ key: "api-version", value: "On legacy v2 API — ...", source: "commit" },
//    { key: "plan", value: "enterprise", source: "set" }]
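The raw `memories` array is plain data, so the usual array methods apply. A minimal sketch of filtering it by `source` — assuming only the `{ key, value, source }` shape shown above, with the array inlined here so the snippet stands alone:

```typescript
type Memory = { key: string; value: string; source: "commit" | "set" };

// The shape recall() returned in the example above, inlined for illustration.
const memories: Memory[] = [
  { key: "api-version", value: "On legacy v2 API", source: "commit" },
  { key: "plan", value: "enterprise", source: "set" },
];

// Keep only the facts your own app wrote via set(), e.g. for a profile panel.
const profileFacts = memories.filter((m) => m.source === "set");

console.log(profileFacts.map((m) => `${m.key}: ${m.value}`).join("\n"));
```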

set()

When you already have the fact, store it directly. Profile data, plan tiers, preferences — anything with a clear key and value that your app already knows. No extraction, no LLM in the loop, no guessing.

// Writes the value directly under the key — nothing inferred.
await memory.set("customer-42871", "plan", "enterprise");

Putting it together

The full loop: recall before you generate, commit after. Two calls around the LLM call you already have.

import { memory } from "@withmemory/sdk";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// 1. Recall relevant context
const { memoryBlock } = await memory.recall({
  userId,
  input: userMessage,
});

// 2. Generate with memory in the system prompt
const { text } = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  system: `You are a support agent for Acme.

Known facts about this user:
${memoryBlock}`,
  prompt: userMessage,
});

// 3. Commit the turn so extraction can learn from it
await memory.commit({ userId, input: userMessage, output: text });
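Memory sits on the hot path of every response, and the SDK's failure behavior isn't specified here. One defensive pattern worth considering — our own sketch, not part of the SDK — is to degrade to an empty memory block rather than fail the turn:

```typescript
// Generic fallback helper: return a default value if the call rejects.
// Wrapped around recall(), a memory outage never blocks a reply.
async function withFallback<T>(
  call: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await call();
  } catch {
    return fallback;
  }
}

// Hypothetical wiring around the recall() call shown earlier:
// const { memoryBlock } = await withFallback(
//   () => memory.recall({ userId, input: userMessage }),
//   { memoryBlock: "", memories: [] },
// );
```

An empty `memoryBlock` concatenates cleanly into the system prompt, so the degraded path needs no special casing downstream.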

Forget Me Not

Without memory, your AI agent isn't really an agent — it's a stateless model API call dressed up with a system prompt. Every conversation starts from scratch. Every user feels like a stranger.

With memory, every session gets smarter and every user feels known. That's what separates an agent from a model wrapped in a system prompt.

WithMemory is the memory layer. Two API calls, zero configuration. Self-host on any Postgres when you're ready.

TypeScript-first

Works identically on Node, Bun, Deno, and Cloudflare Workers. No Python runtime in the loop.

Conservative extraction

Ships with a default extraction prompt that favors precision over recall. Bring your own prompt when you need different tradeoffs.
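What a replacement prompt might look like. The wording below is our own illustration, and the `extractionPrompt` option name is an assumption — check the SDK reference for where a custom prompt is actually passed:

```typescript
// Illustrative only — a looser prompt that trades some precision for recall.
const extractionPrompt = [
  "Extract facts about the user worth remembering across sessions.",
  "Include preferences and in-progress tasks, not just durable facts.",
  "When unsure, keep the fact rather than drop it.",
].join("\n");

// Hypothetical wiring: the option name `extractionPrompt` is an assumption.
// await memory.commit({ userId, input, output, extractionPrompt });
```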

Self-hostable promise

The server runs against any Postgres. No vendor lock-in, no SaaS-only magic.
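A sketch of what self-hosting could look like. Both the `DATABASE_URL` convention and the server package name below are assumptions, not confirmed — the actual steps live in the self-hosting docs:

```shell
# Hypothetical: point the server at your own Postgres.
# The env var name and package name are illustrative, not confirmed.
export DATABASE_URL="postgres://user:pass@localhost:5432/withmemory"
npx @withmemory/server
```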