Retain, Recall, Reflect: The Three Operations of Agent Memory
TL;DR: Hindsight has three operations. Retain stores content and extracts facts. Recall retrieves memories using four search strategies. Reflect reasons over memories with disposition-influenced personality. Each serves a different purpose in the agent loop.
The Problem
LLM agents forget everything between sessions. You can stuff context into prompts, but that doesn’t scale. RAG helps, but vector search alone misses entity relationships and temporal context, and it doesn’t form persistent beliefs.
Hindsight solves this with three operations: Retain, Recall, and Reflect.
Retain: Store and Extract
Retain takes unstructured content and turns it into searchable memory. It’s not just storage - an LLM extracts facts, identifies entities, and builds knowledge graph connections.
```python
from hindsight_client import Hindsight
from datetime import datetime

client = Hindsight(base_url="http://localhost:8888")

# Store a conversation
client.retain(
    bank_id="user-123",
    content="Alice mentioned she's switching from Google to a startup next month. She's excited about the smaller team.",
    context="casual conversation about career",
    timestamp=datetime(2024, 12, 15),
)
```

What happens under the hood:
- Extracts facts: “Alice works at Google”, “Alice is joining a startup”, “Alice prefers smaller teams”
- Identifies entities: Alice (person), Google (company)
- Captures temporal info: event happening “next month” relative to December 2024
- Builds graph connections between Alice and Google
Key parameters:
- `content`: The raw text to store
- `context`: Guides extraction - “career conversation” vs “technical discussion” affects what gets extracted
- `timestamp`: When the event occurred (not when you’re storing it)
- `document_id`: Reusing the same ID replaces previous content (upsert behavior)
Use Retain after every conversation turn, when ingesting documents, or whenever you learn something the agent should remember.
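The `document_id` upsert is easy to demonstrate. A minimal sketch, assuming the same `retain` signature as above (the ID string `"alice-profile"` is made up for illustration):

```python
# First ingestion under a stable document ID (hypothetical ID value)
client.retain(
    bank_id="user-123",
    content="Alice works at Google as a backend engineer.",
    document_id="alice-profile",
)

# Retaining again with the same document_id replaces the earlier content
# instead of creating a duplicate (upsert behavior).
client.retain(
    bank_id="user-123",
    content="Alice now works at a dev tools startup.",
    document_id="alice-profile",
)
```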
Recall: Multi-Strategy Search
Recall retrieves memories. Unlike basic RAG, it runs four search strategies in parallel and fuses results:
- Semantic: Conceptual similarity (paraphrasing, synonyms)
- Keyword (BM25): Exact name and term matching
- Graph: Entity relationships, indirect connections
- Temporal: Date parsing, time-range filtering
```python
results = client.recall(
    bank_id="user-123",
    query="What's happening with Alice's career?",
    budget="high",
    max_tokens=4096,
    types=["world", "experience"],
)

for r in results.results:
    print(f"[{r.type}] {r.text}")
```

Output:
```
[world] Alice is leaving Google for a startup
[world] Alice prefers smaller teams
[experience] Discussed Alice's career change on Dec 15
```

Key parameters:
- `query`: Natural language question
- `budget`: Search depth - `"low"` (fast), `"mid"` (balanced), `"high"` (thorough)
- `max_tokens`: Token budget for results, not an arbitrary top-k
- `types`: Filter by memory type - `["world", "experience", "opinion"]`
The token budget is important. Instead of “give me top 10 results”, you say “fill up to 4096 tokens with the best matches”. This integrates cleanly with context window management.
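To make that integration concrete, here is a sketch of carving a memory budget out of a fixed context window; the window size and reservation are illustrative numbers, not part of the API:

```python
# Illustrative sizing: reserve part of an 8k context window for the
# system prompt, the user message, and the model's reply, then hand the
# remainder to recall as its token budget.
CONTEXT_WINDOW = 8192
RESERVED_FOR_PROMPT_AND_REPLY = 4096

memory_budget = CONTEXT_WINDOW - RESERVED_FOR_PROMPT_AND_REPLY

results = client.recall(
    bank_id="user-123",
    query="What's happening with Alice's career?",
    budget="mid",
    max_tokens=memory_budget,  # fill the remaining space, no top-k guessing
)
```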
Reflect: Reason with Personality
Reflect combines recall with disposition-influenced reasoning. It retrieves relevant memories, applies the bank’s personality traits, and generates a response grounded in evidence.
```python
answer = client.reflect(
    bank_id="user-123",
    query="Should I ask Alice to join our startup?",
    context="we're building a dev tools company, need senior engineers",
    budget="low",
)

print(answer.text)
```

Output might be:
Based on what I know about Alice, she could be a good fit. She’s already planning to leave Google for a startup environment and has expressed preference for smaller teams. However, I don’t have information about her specific technical skills or salary expectations. Consider discussing the role details with her directly.
Disposition traits shape how the agent reasons:
- Skepticism (1-5): Trusting vs questioning claims
- Literalism (1-5): Flexible vs exact interpretation
- Empathy (1-5): Detached vs emotionally attuned
A high-skepticism bank might add: “Though I’d verify her actual start date before making plans.”
Reflect also persists opinions. If the agent concludes “Alice is a strong candidate”, that belief gets stored and influences future queries.
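Since opinions are just another memory type, you can check what the agent currently believes with a type-filtered recall. A sketch using the `types` parameter shown earlier:

```python
# Surface only the agent's stored beliefs about Alice
opinions = client.recall(
    bank_id="user-123",
    query="What do I think about Alice as a candidate?",
    budget="low",
    max_tokens=1024,
    types=["opinion"],
)

for r in opinions.results:
    print(r.text)  # e.g. "Alice is a strong candidate"
```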
When to Use Each
| Operation | Use When | Example |
|---|---|---|
| Retain | You have new information to store | After each conversation turn, document ingestion |
| Recall | You need facts to build a prompt | Before generating a response, fact-checking |
| Reflect | You need reasoned conclusions | Recommendations, decisions, personality-consistent responses |
Typical Agent Loop
```python
from hindsight_client import Hindsight


def agent_turn(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # 1. Store the user's message
        client.retain(
            bank_id=bank_id,
            content=f"User said: {user_message}",
            context="conversation",
        )

        # 2. Recall relevant context
        results = client.recall(
            bank_id=bank_id,
            query=user_message,
            budget="high",
            max_tokens=2048,
        )

        # 3. Build prompt with memories
        context = "\n".join([r.text for r in results.results])

        # 4. Generate response (your LLM call)
        response = generate_response(user_message, context)

        # 5. Store the response
        client.retain(
            bank_id=bank_id,
            content=f"I responded: {response}",
            context="conversation",
        )

        return response
```

Or use Reflect for disposition-influenced reasoning:
```python
def agent_turn_with_personality(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # Store input
        client.retain(bank_id=bank_id, content=f"User: {user_message}")

        # Reflect generates the response directly
        answer = client.reflect(
            bank_id=bank_id,
            query=user_message,
            budget="low",
        )

        return answer.text
```

Memory Types
Hindsight categorizes memories into four types:
| Type | What It Is | Example |
|---|---|---|
| world | Objective facts received | “Alice works at Google” |
| experience | Agent’s own interactions | “I discussed Python with Alice” |
| opinion | Beliefs with confidence | “Python excels for ML” (0.85) |
| observation | Synthesized entity profiles | Auto-generated summaries about tracked entities |
You can filter by type in Recall to get only facts (world) or only the agent’s experiences (experience).
Three operations, each with a clear purpose. Retain stores, Recall retrieves, Reflect reasons. The rest is just parameters.