Retain, Recall, Reflect: The Three Operations of Agent Memory

TL;DR: Hindsight has three operations. Retain stores content and extracts facts. Recall retrieves memories using four search strategies. Reflect reasons over memories with disposition-influenced personality. Each serves a different purpose in the agent loop.


The Problem

LLM agents forget everything between sessions. You can stuff context into prompts, but that doesn't scale. RAG helps, but vector search alone misses entity relationships and temporal context, and it never forms persistent beliefs.

Hindsight solves this with three operations: Retain, Recall, and Reflect.

Retain: Store and Extract

Retain takes unstructured content and turns it into searchable memory. It's not just storage: an LLM extracts facts, identifies entities, and builds knowledge-graph connections.

from hindsight_client import Hindsight
from datetime import datetime

client = Hindsight(base_url="http://localhost:8888")

# Store a conversation
client.retain(
    bank_id="user-123",
    content="Alice mentioned she's switching from Google to a startup next month. She's excited about the smaller team.",
    context="casual conversation about career",
    timestamp=datetime(2024, 12, 15),
)

Under the hood, an LLM reads the content, extracts discrete facts ("Alice is leaving Google for a startup"), identifies the entities involved (Alice, Google), and links them into the knowledge graph.

Key parameters:

  - bank_id: the memory bank to write to (typically one per user or agent)
  - content: the raw text to remember
  - context: situational framing that helps the extractor interpret the content
  - timestamp: when the event happened, used for temporal search

Use Retain after every conversation turn, when ingesting documents, or whenever you learn something the agent should remember.
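As a rough illustration (this is not Hindsight's actual schema, just a sketch of the idea), the extraction step turns raw text into structured facts whose shared entities become graph links:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedFact:
    """One discrete fact pulled out of the retained content."""
    text: str
    entities: list[str] = field(default_factory=list)

# Hypothetical extraction result for the Alice example above
facts = [
    ExtractedFact("Alice is leaving Google for a startup", ["Alice", "Google"]),
    ExtractedFact("Alice prefers smaller teams", ["Alice"]),
]

# Entities become graph nodes; facts that mention the same entity are linked
graph_nodes = sorted({e for f in facts for e in f.entities})
print(graph_nodes)  # ['Alice', 'Google']
```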

Recall: Retrieve and Fuse

Recall retrieves memories. Unlike basic RAG, it runs four search strategies in parallel and fuses the results:

  1. Semantic: Conceptual similarity (paraphrasing, synonyms)
  2. Keyword (BM25): Exact name and term matching
  3. Graph: Entity relationships, indirect connections
  4. Temporal: Date parsing, time-range filtering
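The post doesn't say how the four rankings are merged; a common technique for fusing parallel ranked lists is reciprocal rank fusion, sketched here with made-up memory ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists into one ordering.

    Each list holds document ids, best first. A document's fused score is
    the sum of 1/(k + rank) over every list it appears in, so items ranked
    well by multiple strategies rise to the top.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m1", "m3", "m2"]
keyword  = ["m3", "m1"]
graph    = ["m2", "m3"]
temporal = ["m3"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
print(fused[0])  # m3, the only memory ranked by all four strategies
```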

results = client.recall(
    bank_id="user-123",
    query="What's happening with Alice's career?",
    budget="high",
    max_tokens=4096,
    types=["world", "experience"],
)

for r in results.results:
    print(f"[{r.type}] {r.text}")

Output:

[world] Alice is leaving Google for a startup
[world] Alice prefers smaller teams
[experience] Discussed Alice's career change on Dec 15

Key parameters:

  - bank_id: the memory bank to search
  - query: a natural-language question or topic
  - budget: how much retrieval effort to spend (the examples here use "low" and "high")
  - max_tokens: token budget for the returned memories
  - types: restrict results to specific memory types (see Memory Types below)

The token budget is important. Instead of “give me top 10 results”, you say “fill up to 4096 tokens with the best matches”. This integrates cleanly with context window management.
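Filling a token budget amounts to packing the best-scoring memories until the budget runs out. A minimal sketch, with an illustrative word-count tokenizer standing in for a real one (the names and scoring here are assumptions, not Hindsight's API):

```python
def pack_to_budget(scored_results, max_tokens,
                   count_tokens=lambda text: len(text.split())):
    """Greedily take the best-scoring memories until the token budget is full.

    scored_results: list of (score, text) pairs, in any order.
    count_tokens is a crude stand-in; a real system would use the
    model's tokenizer.
    """
    packed, used = [], 0
    for score, text in sorted(scored_results, key=lambda p: p[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # this item would overflow; a smaller one may still fit
        packed.append(text)
        used += cost
    return packed

results = [
    (0.9, "Alice is leaving Google for a startup"),
    (0.7, "Alice prefers smaller teams"),
    (0.4, "Discussed Alice's career change on Dec 15"),
]
print(pack_to_budget(results, max_tokens=12))
```

The budget-first framing means the caller never has to guess how many results fit; the memory layer adapts to whatever context space is left.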

Reflect: Reason with Personality

Reflect combines recall with disposition-influenced reasoning. It retrieves relevant memories, applies the bank’s personality traits, and generates a response grounded in evidence.

answer = client.reflect(
    bank_id="user-123",
    query="Should I ask Alice to join our startup?",
    context="we're building a dev tools company, need senior engineers",
    budget="low",
)

print(answer.text)

Output might be:

Based on what I know about Alice, she could be a good fit. She’s already planning to leave Google for a startup environment and has expressed preference for smaller teams. However, I don’t have information about her specific technical skills or salary expectations. Consider discussing the role details with her directly.

Disposition traits shape how the agent reasons. A high-skepticism bank, for example, might add: "Though I'd verify her actual start date before making plans."

Reflect also persists opinions. If the agent concludes “Alice is a strong candidate”, that belief gets stored and influences future queries.
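One way to picture opinion persistence (a sketch under my own assumptions, not Hindsight's internals): each belief carries a confidence score, and only sufficiently confident beliefs feed back into later reasoning.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """A persisted belief with a confidence score in [0, 1]."""
    text: str
    confidence: float

opinions: list[Opinion] = []

def persist_opinion(text: str, confidence: float) -> None:
    opinions.append(Opinion(text, confidence))

def recall_opinions(min_confidence: float = 0.5) -> list[Opinion]:
    """Only sufficiently confident beliefs influence future queries."""
    return [o for o in opinions if o.confidence >= min_confidence]

persist_opinion("Alice is a strong candidate", 0.8)
persist_opinion("Alice may prefer remote work", 0.3)
print([o.text for o in recall_opinions()])  # ['Alice is a strong candidate']
```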

When to Use Each

| Operation | Use When | Example |
|-----------|----------|---------|
| Retain | You have new information to store | After each conversation turn, document ingestion |
| Recall | You need facts to build a prompt | Before generating a response, fact-checking |
| Reflect | You need reasoned conclusions | Recommendations, decisions, personality-consistent responses |

Typical Agent Loop

from hindsight_client import Hindsight

def agent_turn(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # 1. Store the user's message
        client.retain(
            bank_id=bank_id,
            content=f"User said: {user_message}",
            context="conversation",
        )

        # 2. Recall relevant context
        results = client.recall(
            bank_id=bank_id,
            query=user_message,
            budget="high",
            max_tokens=2048,
        )

        # 3. Build prompt with memories
        context = "\n".join([r.text for r in results.results])

        # 4. Generate response (your LLM call)
        response = generate_response(user_message, context)

        # 5. Store the response
        client.retain(
            bank_id=bank_id,
            content=f"I responded: {response}",
            context="conversation",
        )

        return response

Or use Reflect for disposition-influenced reasoning:

def agent_turn_with_personality(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # Store input
        client.retain(bank_id=bank_id, content=f"User: {user_message}")

        # Reflect generates the response directly
        answer = client.reflect(
            bank_id=bank_id,
            query=user_message,
            budget="low",
        )

        return answer.text

Memory Types

Hindsight categorizes memories into four types:

| Type | What It Is | Example |
|------|------------|---------|
| world | Objective facts received | "Alice works at Google" |
| experience | The agent's own interactions | "I discussed Python with Alice" |
| opinion | Beliefs with confidence scores | "Python excels for ML" (0.85) |
| observation | Synthesized entity profiles | Auto-generated summaries about tracked entities |

You can filter by type in Recall to get only facts (world) or only the agent’s experiences (experience).
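The type filter itself is just category membership; a trivial sketch of what `types=["world"]` selects (the record shapes here are illustrative, not Hindsight's storage format):

```python
memories = [
    {"type": "world", "text": "Alice works at Google"},
    {"type": "experience", "text": "I discussed Python with Alice"},
    {"type": "opinion", "text": "Python excels for ML", "confidence": 0.85},
]

def filter_by_type(memories, types):
    """Keep only memories in the requested categories, preserving order."""
    wanted = set(types)
    return [m for m in memories if m["type"] in wanted]

facts = filter_by_type(memories, ["world"])
print([m["text"] for m in facts])  # ['Alice works at Google']
```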


Three operations, each with a clear purpose. Retain stores, Recall retrieves, Reflect reasons. The rest is just parameters.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm
