
Hindsight vs Traditional RAG: What You Actually Get

TL;DR: Traditional RAG uses semantic similarity search over document chunks. Hindsight runs four retrieval strategies in parallel (semantic + keyword + graph + temporal), maintains entity relationships, handles time expressions, and forms persistent opinions. Different tools for different problems.


What RAG Actually Does

Traditional RAG is straightforward:

  1. Chunk your documents
  2. Embed chunks into vectors
  3. At query time, find the top-k most similar chunks
  4. Stuff them into the LLM prompt

This works well for static document Q&A. “What’s the refund policy?” finds the refund section. “How do I reset my password?” retrieves the password docs.
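
Here is that pipeline as a toy sketch. The bag-of-words "embedding" below is just a stand-in for a real embedding model (which would also match paraphrases); the shape of the loop is what matters.

# Rough sketch of the traditional RAG retrieval step. The bag-of-words
# "embedding" is a stand-in for a real embedding model; the point is
# top-k similarity search over chunk vectors.
import math
import re
from collections import Counter

chunks = [
    "Refunds are issued within 14 days of purchase.",
    "To reset your password, use the 'Forgot password' link.",
    "Our offices are closed on public holidays.",
]

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = [(chunk, embed(chunk)) for chunk in chunks]             # steps 1-2: chunk and embed
query_vec = embed("How do refunds work after a purchase?")      # step 3: embed the query
top_k = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:2]
prompt_context = "\n".join(chunk for chunk, _ in top_k)         # step 4: stuff into the prompt
print(prompt_context)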

But semantic similarity has limits.

Where Vector Search Falls Short

Exact Names and Terms

Query: “What did Alice Chen say about the API redesign?”

Vector search finds chunks semantically related to “API redesign.” But “Alice Chen” is a proper noun - it needs exact matching, not semantic similarity. BM25 keyword search handles this better.
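
For contrast, a keyword index scores documents by exact token overlap, so a proper noun like "Alice Chen" only wins where it literally appears. A standalone sketch using the third-party rank_bm25 package (not Hindsight's keyword component):

# Keyword search with BM25: rare exact tokens like "alice" and "chen"
# carry high weight, so the right document wins on term overlap.
from rank_bm25 import BM25Okapi

docs = [
    "Alice Chen argued the API redesign should ship behind a feature flag.",
    "The billing service exposes a REST API for refunds.",
    "Bob reviewed the frontend redesign mockups.",
]
tokenized = [d.lower().split() for d in docs]
bm25 = BM25Okapi(tokenized)

query = "what did alice chen say about the api redesign".split()
scores = bm25.get_scores(query)                      # one score per document
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])                                    # the Alice Chen document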

Time Expressions

Query: “What happened last spring?”

Vector search has no concept of time. It might retrieve chunks containing the word “spring” or semantically similar terms. It can’t parse “last spring” into a date range and filter accordingly.

Multi-Hop Connections

Your knowledge base contains:

  - Alice is the tech lead on Project Atlas
  - Project Atlas uses Kubernetes for orchestration
  - The Kubernetes cluster had a 2-hour outage on Tuesday

Query: “Was Alice affected by any infrastructure issues?”

Vector search retrieves chunks similar to “Alice” and “infrastructure issues.” It can’t traverse Alice → Project Atlas → Kubernetes → outage. That requires entity relationships and graph traversal.

What Hindsight Does Differently

Four Retrieval Strategies

Instead of semantic-only, Hindsight runs four searches in parallel:

  Strategy         Handles
  Semantic         Conceptual similarity, paraphrasing
  Keyword (BM25)   Exact names, technical terms
  Graph            Entity relationships, multi-hop reasoning
  Temporal         Date parsing, time-range filtering

Results merge via reciprocal rank fusion (RRF), which doesn’t require score calibration across systems. Then a cross-encoder reranks the combined results.
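
RRF gives each result a score of 1/(k + rank) from every list it appears in and sums them, where k is a smoothing constant (60 by convention). A minimal sketch of the merge step, not Hindsight's actual code:

# Minimal reciprocal rank fusion (RRF): merge ranked lists from several
# retrievers without calibrating their raw scores against each other.
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    """ranked_lists: lists of result IDs, best first. Returns fused ranking."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # better rank -> bigger contribution
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_alice", "doc_api", "doc_billing"]
keyword  = ["doc_api", "doc_alice"]
graph    = ["doc_outage", "doc_alice"]
temporal = ["doc_outage"]

print(rrf_merge([semantic, keyword, graph, temporal]))
# doc_alice and doc_outage bubble up because multiple strategies agree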

Entity Resolution

Hindsight tracks entities across conversations. “Alice,” “Alice Chen,” and “Alice C.” resolve to the same canonical entity.

This builds a knowledge graph where facts connect through shared entities, not just embedding proximity.
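
Illustratively, the effect is like normalizing every mention to one canonical key before facts get attached. How Hindsight actually does the matching isn't shown here; the alias table below is hypothetical.

# Toy alias resolution: map surface mentions to one canonical entity.
# The lookup table is illustrative, not Hindsight's actual mechanism.
CANONICAL = {
    "alice chen": "Alice Chen",
    "alice": "Alice Chen",
    "alice c.": "Alice Chen",
}

def resolve(mention: str) -> str:
    """Return the canonical entity for a mention, or the mention itself."""
    return CANONICAL.get(mention.strip().lower(), mention)

facts = {}  # canonical entity -> list of facts
for mention, fact in [
    ("Alice", "is tech lead on Project Atlas"),
    ("Alice Chen", "pushed back on the API redesign"),
    ("Alice C.", "was on call during the outage"),
]:
    facts.setdefault(resolve(mention), []).append(fact)

print(facts)  # all three mentions land under "Alice Chen"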

Temporal Understanding

Every fact stores two timestamps:

  - When the event occurred (occurrence time)
  - When the fact was retained (ingestion time)

A fact retained in January 2025 stating “Alice got married in June 2024” can answer both “when did Alice get married?” (June 2024, the occurrence) and “what did you learn in January?” (the retention).

Queries like “last spring” or “before the merger” get parsed into date ranges and matched against occurrence intervals.
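
As a sketch of the idea (not Hindsight's parser), “last spring” becomes a concrete date range that gets intersected with each fact's occurrence interval:

# Illustrative only: resolve "last spring" to a date range and keep facts
# whose occurrence interval overlaps it.
from datetime import date

def last_spring(today: date) -> tuple[date, date]:
    # Treat spring as March 1 - May 31 (northern hemisphere) and pick the
    # most recent spring that has already finished.
    year = today.year if today.month > 5 else today.year - 1
    return date(year, 3, 1), date(year, 5, 31)

def overlaps(a_start, a_end, b_start, b_end):
    return a_start <= b_end and b_start <= a_end

facts = [
    ("Alice got married", date(2024, 6, 14), date(2024, 6, 14)),
    ("Project Atlas launched", date(2024, 3, 1), date(2024, 3, 31)),
    ("Kubernetes outage", date(2024, 12, 17), date(2024, 12, 17)),
]

start, end = last_spring(date(2025, 1, 10))
print([f for f, s, e in facts if overlaps(s, e, start, end)])
# only "Project Atlas launched" falls inside last spring (Mar-May 2024)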

Graph Traversal

The memory graph links facts and entities through four edge types.

Queries trigger spreading activation across these links, surfacing indirectly connected facts that pure vector search misses.
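
To make the multi-hop part concrete, here's a toy spreading-activation pass over an adjacency map. The edges and decay factor are illustrative, not Hindsight's internals:

# Toy spreading activation: start from entities mentioned in the query and
# propagate a decaying activation score along graph edges.
edges = {
    "Alice": ["Project Atlas"],
    "Project Atlas": ["Alice", "Kubernetes"],
    "Kubernetes": ["Project Atlas", "Tuesday outage"],
    "Tuesday outage": ["Kubernetes"],
}

def spread(seeds, hops=3, decay=0.5):
    activation = {node: 1.0 for node in seeds}
    frontier = set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for neighbor in edges.get(node, []):
                score = activation[node] * decay
                if score > activation.get(neighbor, 0.0):
                    activation[neighbor] = score
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return activation

print(spread({"Alice"}))
# "Tuesday outage" ends up activated three hops from Alice, even though no
# single chunk mentions both.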

Concrete Example

# Store facts about a project
client.retain(
    bank_id="my-bank",
    content="Alice is the tech lead on Project Atlas. Project Atlas launched in March 2024 and uses Kubernetes for orchestration.",
    context="team documentation"
)

client.retain(
    bank_id="my-bank",
    content="The Kubernetes cluster experienced a 2-hour outage on Tuesday affecting multiple services.",
    context="incident report",
    timestamp="2024-12-17"
)

# Query that requires multi-hop reasoning
results = client.recall(
    bank_id="my-bank",
    query="Was Alice affected by any recent infrastructure issues?",
    max_tokens=4096
)

Traditional RAG: Retrieves chunks about Alice OR infrastructure issues. Misses the connection.

Hindsight: Traverses Alice → Project Atlas → Kubernetes → outage. Returns both the team structure and the incident.

Memory Type Separation

RAG treats everything as “document chunks.” Hindsight separates:

  Type          What It Is
  World         Objective facts received
  Experience    Agent’s own interactions
  Opinion       Beliefs with confidence scores
  Observation   Synthesized entity profiles

You can filter retrieval by type. Want only objective facts? types=["world"]. Want the agent’s formed beliefs? types=["opinion"].
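
For example, reusing the recall call from above (and assuming types is passed as a keyword argument, per the filter described here):

# Filter recall to opinion memories only.
opinions = client.recall(
    bank_id="my-bank",
    query="What do you think about our Kubernetes setup?",
    types=["opinion"],
    max_tokens=2048
)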

Persistent Opinions

RAG is stateless. Same query, same chunks, same response (modulo LLM variance).

Hindsight forms opinions during reasoning that persist across sessions. These carry confidence scores that evolve as new evidence supports or contradicts them.

An agent that’s been tracking a technology for months develops nuanced views that fresh retrieval can’t replicate.
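
As a toy illustration of confidence evolving with evidence (the update rule below is made up for the example, not Hindsight's):

# Toy confidence update: nudge an opinion toward 1.0 when new evidence
# supports it and toward 0.0 when evidence contradicts it.
def update_confidence(confidence: float, supports: bool, rate: float = 0.2) -> float:
    target = 1.0 if supports else 0.0
    return confidence + rate * (target - confidence)

opinion = {"text": "Kubernetes is overkill for this team", "confidence": 0.6}

for supports in [True, True, False, True]:   # evidence observed over months
    opinion["confidence"] = update_confidence(opinion["confidence"], supports)

print(round(opinion["confidence"], 2))  # drifts upward as support accumulates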

When to Use Which

Traditional RAG works for:

  - Static document Q&A over a fixed corpus (policies, FAQs, product docs)
  - One-shot questions where each query stands on its own
  - Setups where a single vector store keeps the stack simple

Hindsight adds value for:

  - Agents that remember conversations across sessions
  - Queries involving exact names, time expressions, or multi-hop connections
  - Agents that should track entities over time and develop consistent perspectives

The Complexity Trade-off

Hindsight is more complex than a vector store. You’re running four retrieval strategies, maintaining a knowledge graph, handling entity resolution, and managing opinion evolution.

In my opinion, the complexity is justified when your use case actually needs it. If you’re building a chatbot that answers questions about static docs, RAG is simpler and works fine. If you’re building an agent that remembers conversations, tracks entities over time, and should develop consistent perspectives - that’s where the extra machinery pays off.


RAG does one thing: semantic similarity search. Hindsight does four things in parallel and maintains structured memory on top. Pick based on what your use case actually requires.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm #rag
