Retain, Recall, Reflect: The Three Operations of Agent Memory

TL;DR: Hindsight has three operations. Retain stores content and extracts facts. Recall retrieves memories using four search strategies. Reflect reasons over memories with disposition-influenced personality. Each serves a different purpose in the agent loop.


The Problem

LLM agents forget everything between sessions. You can stuff context into prompts, but that doesn't scale. RAG helps, but vector search alone misses entity relationships and temporal context, and it never forms persistent beliefs.

Hindsight solves this with three operations: Retain, Recall, and Reflect.

Retain: Store and Extract

Retain takes unstructured content and turns it into searchable memory. It's not just storage: an LLM extracts facts, identifies entities, and builds knowledge-graph connections.

from hindsight_client import Hindsight
from datetime import datetime

client = Hindsight(base_url="http://localhost:8888")

# Store a conversation
client.retain(
    bank_id="user-123",
    content="Alice mentioned she's switching from Google to a startup next month. She's excited about the smaller team.",
    context="casual conversation about career",
    timestamp=datetime(2024, 12, 15),
)

Under the hood, an LLM reads the content, extracts discrete facts ("Alice is leaving Google for a startup"), identifies the entities involved (Alice, Google), and links them into the knowledge graph.

Key parameters:

  - bank_id: the memory bank to write to (typically one per user or agent)
  - content: the raw text to remember
  - context: situational framing that helps the extractor interpret the content
  - timestamp: when the event happened, used for temporal search

Use Retain after every conversation turn, when ingesting documents, or whenever you learn something the agent should remember.
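As a rough illustration (this is not Hindsight's actual schema, just a sketch of the idea), the extraction step turns raw text into structured facts whose shared entities become graph links:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedFact:
    """One discrete fact pulled out of the retained content."""
    text: str
    entities: list[str] = field(default_factory=list)

# Hypothetical extraction result for the Alice example above
facts = [
    ExtractedFact("Alice is leaving Google for a startup", ["Alice", "Google"]),
    ExtractedFact("Alice prefers smaller teams", ["Alice"]),
]

# Entities become graph nodes; facts that mention the same entity are linked
graph_nodes = sorted({e for f in facts for e in f.entities})
print(graph_nodes)  # ['Alice', 'Google']
```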

Recall: Retrieve and Fuse

Recall retrieves memories. Unlike basic RAG, it runs four search strategies in parallel and fuses the results:

  1. Semantic: Conceptual similarity (paraphrasing, synonyms)
  2. Keyword (BM25): Exact name and term matching
  3. Graph: Entity relationships, indirect connections
  4. Temporal: Date parsing, time-range filtering
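The post doesn't say how the four rankings are merged; a common technique for fusing parallel ranked lists is reciprocal rank fusion, sketched here with made-up memory ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists into one ordering.

    Each list holds document ids, best first. A document's fused score is
    the sum of 1/(k + rank) over every list it appears in, so items ranked
    well by multiple strategies rise to the top.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m1", "m3", "m2"]
keyword  = ["m3", "m1"]
graph    = ["m2", "m3"]
temporal = ["m3"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
print(fused[0])  # m3, the only memory ranked by all four strategies
```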

results = client.recall(
    bank_id="user-123",
    query="What's happening with Alice's career?",
    budget="high",
    max_tokens=4096,
    types=["world", "experience"],
)

for r in results.results:
    print(f"[{r.type}] {r.text}")

Output:

[world] Alice is leaving Google for a startup
[world] Alice prefers smaller teams
[experience] Discussed Alice's career change on Dec 15

Key parameters:

  - bank_id: the memory bank to search
  - query: a natural-language question or topic
  - budget: how much retrieval effort to spend (the examples here use "low" and "high")
  - max_tokens: token budget for the returned memories
  - types: restrict results to specific memory types (see Memory Types below)

The token budget is important. Instead of “give me top 10 results”, you say “fill up to 4096 tokens with the best matches”. This integrates cleanly with context window management.
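Filling a token budget amounts to packing the best-scoring memories until the budget runs out. A minimal sketch, with an illustrative word-count tokenizer standing in for a real one (the names and scoring here are assumptions, not Hindsight's API):

```python
def pack_to_budget(scored_results, max_tokens,
                   count_tokens=lambda text: len(text.split())):
    """Greedily take the best-scoring memories until the token budget is full.

    scored_results: list of (score, text) pairs, in any order.
    count_tokens is a crude stand-in; a real system would use the
    model's tokenizer.
    """
    packed, used = [], 0
    for score, text in sorted(scored_results, key=lambda p: p[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # this item would overflow; a smaller one may still fit
        packed.append(text)
        used += cost
    return packed

results = [
    (0.9, "Alice is leaving Google for a startup"),
    (0.7, "Alice prefers smaller teams"),
    (0.4, "Discussed Alice's career change on Dec 15"),
]
print(pack_to_budget(results, max_tokens=12))
```

The budget-first framing means the caller never has to guess how many results fit; the memory layer adapts to whatever context space is left.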

Reflect: Reason with Personality

Reflect combines recall with disposition-influenced reasoning. It retrieves relevant memories, applies the bank’s personality traits, and generates a response grounded in evidence.

answer = client.reflect(
    bank_id="user-123",
    query="Should I ask Alice to join our startup?",
    context="we're building a dev tools company, need senior engineers",
    budget="low",
)

print(answer.text)

Output might be:

Based on what I know about Alice, she could be a good fit. She’s already planning to leave Google for a startup environment and has expressed preference for smaller teams. However, I don’t have information about her specific technical skills or salary expectations. Consider discussing the role details with her directly.

Disposition traits shape how the agent reasons. A high-skepticism bank, for example, might add: "Though I'd verify her actual start date before making plans."

Reflect also persists opinions. If the agent concludes “Alice is a strong candidate”, that belief gets stored and influences future queries.
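One way to picture opinion persistence (a sketch under my own assumptions, not Hindsight's internals): each belief carries a confidence score, and only sufficiently confident beliefs feed back into later reasoning.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """A persisted belief with a confidence score in [0, 1]."""
    text: str
    confidence: float

opinions: list[Opinion] = []

def persist_opinion(text: str, confidence: float) -> None:
    opinions.append(Opinion(text, confidence))

def recall_opinions(min_confidence: float = 0.5) -> list[Opinion]:
    """Only sufficiently confident beliefs influence future queries."""
    return [o for o in opinions if o.confidence >= min_confidence]

persist_opinion("Alice is a strong candidate", 0.8)
persist_opinion("Alice may prefer remote work", 0.3)
print([o.text for o in recall_opinions()])  # ['Alice is a strong candidate']
```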

When to Use Each

| Operation | Use When | Example |
|-----------|----------|---------|
| Retain | You have new information to store | After each conversation turn, document ingestion |
| Recall | You need facts to build a prompt | Before generating a response, fact-checking |
| Reflect | You need reasoned conclusions | Recommendations, decisions, personality-consistent responses |

Typical Agent Loop

from hindsight_client import Hindsight

def agent_turn(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # 1. Store the user's message
        client.retain(
            bank_id=bank_id,
            content=f"User said: {user_message}",
            context="conversation",
        )

        # 2. Recall relevant context
        results = client.recall(
            bank_id=bank_id,
            query=user_message,
            budget="high",
            max_tokens=2048,
        )

        # 3. Build prompt with memories
        context = "\n".join([r.text for r in results.results])

        # 4. Generate response (your LLM call)
        response = generate_response(user_message, context)

        # 5. Store the response
        client.retain(
            bank_id=bank_id,
            content=f"I responded: {response}",
            context="conversation",
        )

        return response

Or use Reflect for disposition-influenced reasoning:

def agent_turn_with_personality(user_message: str, bank_id: str):
    with Hindsight(base_url="http://localhost:8888") as client:
        # Store input
        client.retain(bank_id=bank_id, content=f"User: {user_message}")

        # Reflect generates the response directly
        answer = client.reflect(
            bank_id=bank_id,
            query=user_message,
            budget="low",
        )

        return answer.text

Memory Types

Hindsight categorizes memories into four types:

| Type | What It Is | Example |
|------|------------|---------|
| world | Objective facts received | "Alice works at Google" |
| experience | The agent's own interactions | "I discussed Python with Alice" |
| opinion | Beliefs with confidence scores | "Python excels for ML" (0.85) |
| observation | Synthesized entity profiles | Auto-generated summaries about tracked entities |

You can filter by type in Recall to get only facts (world) or only the agent’s experiences (experience).
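The type filter itself is just category membership; a trivial sketch of what `types=["world"]` selects (the record shapes here are illustrative, not Hindsight's storage format):

```python
memories = [
    {"type": "world", "text": "Alice works at Google"},
    {"type": "experience", "text": "I discussed Python with Alice"},
    {"type": "opinion", "text": "Python excels for ML", "confidence": 0.85},
]

def filter_by_type(memories, types):
    """Keep only memories in the requested categories, preserving order."""
    wanted = set(types)
    return [m for m in memories if m["type"] in wanted]

facts = filter_by_type(memories, ["world"])
print([m["text"] for m in facts])  # ['Alice works at Google']
```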


Three operations, each with a clear purpose. Retain stores, Recall retrieves, Reflect reasons. The rest is just parameters.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm
