Key Concepts for Implementing Memory in AI Agents
An introductory explanation of the concepts to consider when implementing short-term and long-term memory in agents.
An agent without memory starts from scratch with each interaction and feels less intelligent. Memory is a key part of the context that makes an agent useful. First, we need to define what type of memory we want to implement.
Types of Memory
Memory in agents is inspired by cognitive psychology, specifically the Atkinson–Shiffrin memory model:
- Short-term: Your brain’s notepad; in humans it lasts only briefly, roughly 0–20 seconds.
- Semantic: Knowledge about the environment: facts, definitions, meanings.
- Episodic: Records of previous conversations: what was said, decisions made, important events.
- Procedural: Past instructions, behaviors learned through repetition.
State and Memory
“State” is one of the most relevant concepts in LangGraph: a LangGraph application is modeled as a State Machine. An agent without state is simply an API call.
A call to an LLM API is a stateless operation. The state layer is precisely how we introduce memory into the agent, whether temporary or persistent, short-term or long-term. One could argue that the agent’s state is itself a form of memory, but on its own it is not sufficient.
Transient or Persistent?
- Transient: lives only during the current session; is deleted if the agent or process restarts. Useful for prototypes or agents that should not store data for privacy reasons.
- Can be understood as the context window or modeled as a notepad in the application state.
- Persistent: survives restarts and multiple conversations; essential when you want long-term personalization or traceability. Recent systems and papers (e.g., Mem0) show clear improvements in coherence and cost when using structured persistent memories. (arXiv)
Three Key Questions
- What we save: facts, conversation summaries, decisions, failures/successes.
- How we save it: relational database, vector DB, graph, or a combination. Do we use a tool or an asynchronous consolidation process?
- When we save it: immediately, at intervals, at the end of the session, or through asynchronous consolidation processes.
What We Want to Save
- User data (semantic): name, preferences, roles. These are “small” records stored as CRUD entries: low cost and high precision.
- Long conversations (episodic): it’s not convenient to save every token. Instead:
- Chunking + embeddings for semantic searches when you need to retrieve specific chunks.
- Incremental summarization: at the end of N turns, generate a condensed summary that represents the essential part and store only the summary; keep expandable notes if reconstruction is required.
- Complex relationships: if you need to reason about relationships between entities (e.g., “customer X requested this in 2023 and repeated it in 2025”), consider a memory graph or relational representation that captures edges/properties. Some recent work proposes graph-based memory to capture temporal/causal relationships. (arXiv, ResearchGate)
How We Will Save It
There is no single “correct database”; the decision depends on consistency, latency, cost, and audit requirements. A common and robust pattern:
Working memory (scratchpad):
- Redis or in-memory store for the last N turns (fast, cheap, cleared on restart).
- Or we can simply store this in the application state.
Semantic and fact memory:
- Document store / relational store for facts (e.g., Mongo, Postgres).
- Vector DB (Pinecone, Weaviate, Milvus, or Atlas Vector Search) for embeddings and similarity searches.
Consolidation, summarization, and reasoning:
- Background workers (cron, jobs) that: extract, condense, version, and write long-term persistent memory.
- Store in a vector store or knowledge graph.
Entities and relationships:
- In a graph database with its own ontology, such as Neo4j.
Memory layer as a service:
- Projects like Mem0 deliver an “off-the-shelf” layer for dynamic extraction, consolidation, and retention policies, which can reduce engineering time and cost-to-serve in production. (mem0.ai, arXiv)
When to Save It
- Immediate: when a critical event occurs (e.g., preference change, purchase), the agent autonomously decides what and when to save.
- Turn counting: save only every N turns so the decision doesn’t rest with the agent. We can keep a counter in the state that triggers the memory process.
- Idle-trigger: when you detect inactivity, consolidate the session.
- Batch/asynchronous: for long conversations, send background jobs that summarize and clean before persisting.
How We Will Consume It
- Load in system prompt: useful for small and critical data (user persona, constraints).
- Conditional retrieval: use signals (intent, slot-fill, topic change) to decide what memory to fetch.
- Tool-based access: the agent calls a “tool” (API) that returns relevant memories (instead of loading everything in the prompt). This separates responsibilities and enables caching, pagination, and authorization.
- Hybrid decision: load context summary + do additional retrieval when the response requires it.
MongoDB and other providers have published patterns and SDKs to enable cross-session memory and checkpointers in agent frameworks (e.g., LangGraph). If you work with LangGraph, check the available datastore/Store integrations to avoid reinventing the wheel. (MongoDB)
Recommended Tools, Readings, and Resources
- Mem0 — memory as a layer (repository and docs). Recommended if you’re looking for a production-ready solution with control and traceability. (GitHub, docs.mem0.ai)
- Mem0 (paper/arXiv) — systematic evaluation and results on latency and token-cost. Useful for justifying architectural decisions. (arXiv)
- Hugging Face — posts on memory and how to reduce memory usage in models — good conceptual context and optimization patterns. (Hugging Face)
- Memory Architectures in Long-Term AI Agents (research) — paper on integrating episodic/semantic/procedural memory inspired by human cognition. Good theoretical support. (ResearchGate)
- MongoDB + LangGraph — examples and integrations (MongoDB Store for LangGraph) if you work with LangGraph and are looking for a scalable persistent memory solution. (MongoDB)