
Rich Fact Extraction: Preserving Narrative, Not Just Statements

TL;DR: Traditional RAG fragments text into isolated statements, losing context. Hindsight extracts 2-5 narrative facts per conversation that preserve emotions, reasoning chains, and causal relationships. The context parameter guides what gets extracted.


The Fragmentation Problem

Most RAG systems chunk text and store isolated statements. A conversation like:

“Alice and Bob discussed naming their summer party playlist. Bob suggested ‘Summer Vibes’ because it’s catchy, but Alice wanted something unique. They ultimately decided on ‘Beach Beats’ for its playful tone.”

It gets fragmented into isolated statements:

  “Alice and Bob discussed naming their summer party playlist”
  “Bob suggested ‘Summer Vibes’”
  “Alice wanted something unique”
  “They decided on ‘Beach Beats’”

Query “Why did they choose Beach Beats?” and you might get “They decided on ‘Beach Beats’” - which answers nothing. The reasoning chain is gone.
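The failure mode is easy to reproduce with a toy sentence-level chunker and keyword retrieval (an illustrative sketch, not any particular RAG library):

```python
# Toy illustration of sentence-level fragmentation (not a real RAG system).
conversation = (
    "Alice and Bob discussed naming their summer party playlist. "
    "Bob suggested 'Summer Vibes' because it's catchy, but Alice wanted "
    "something unique. They ultimately decided on 'Beach Beats' for its "
    "playful tone."
)

# Chunk at sentence boundaries, as many naive pipelines do.
fragments = [s.strip().rstrip(".") + "." for s in conversation.split(". ")]

# Retrieve by crude keyword overlap with the query.
query_terms = {"beach", "beats"}
tokens = lambda f: {w.strip("'.,") for w in f.lower().split()}
hit = max(fragments, key=lambda f: len(query_terms & tokens(f)))

print(hit)  # The matching fragment names the decision but drops the "why".
```

The retrieved fragment states the outcome; the two fragments carrying the reasoning score zero on the query terms and never surface.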

Narrative Extraction

Hindsight takes a different approach. Instead of sentence-level fragments, it uses coarse-grained chunking to produce 2-5 comprehensive facts per conversation. Each fact is narrative and self-contained, preserving the pragmatic flow.

From the same conversation, Hindsight extracts something like:

“Alice and Bob chose ‘Beach Beats’ as their summer party playlist name. Bob initially suggested ‘Summer Vibes’ for its catchiness, but Alice preferred something unique. They settled on ‘Beach Beats’ because of its playful tone.”

One fact. Complete context. The reasoning chain survives.

What Gets Extracted

Hindsight captures multiple dimensions beyond surface statements:

Dimension              Example
Core facts             Alice joined Google in spring
Emotional context      She was thrilled about the opportunity
Reasoning chains       She chose it specifically for research opportunities
Causal relationships   The research focus caused her excitement

From “Alice joined Google last spring and was thrilled about the research opportunities,” a later query “Why did Alice join Google?” returns meaningful context - not just “Alice joined Google.”
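One way to picture the result is a record that carries all four dimensions together. This is a hypothetical shape for illustration - Hindsight’s actual fact schema may differ:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RichFact:
    """Hypothetical record shape; Hindsight's real schema may differ."""
    statement: str                         # core fact
    emotion: Optional[str] = None          # emotional context
    reasoning: Optional[str] = None        # explicit justification
    causes: List[str] = field(default_factory=list)  # cause-effect links

fact = RichFact(
    statement="Alice joined Google last spring",
    emotion="thrilled about the opportunity",
    reasoning="chose it specifically for the research opportunities",
    causes=["research focus -> excitement"],
)

# A "why" query can now be answered from the fact itself.
print(fact.reasoning)
```

Because the justification travels with the statement, answering “why” needs no second retrieval hop.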

The Extraction Pipeline

Under the hood, retain() runs content through six processing steps:

  1. Coreference resolution - Identifies entity mentions across turns (“she” → “Alice”)
  2. Temporal normalization - Converts “last week” into absolute timestamps
  3. Participant attribution - Determines who said what
  4. Reasoning preservation - Maintains explicit justifications and cause-effect links
  5. Fact classification - Assigns to world facts, experiences, opinions, or observations
  6. Entity extraction - Identifies people, organizations, locations, products, concepts

Each extracted fact includes temporal ranges, confidence scores (for opinions), and embeddings for multi-modal retrieval.
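Conceptually the steps compose into a pipeline. A heavily simplified sketch of the first two stages - the function names and string-replacement logic are illustrative stand-ins for what retain() does with an LLM:

```python
from datetime import date, timedelta

def resolve_coreferences(text: str, entities: list) -> str:
    # Step 1 (toy version): replace pronouns with the last-named entity.
    for pronoun in ("she", "She"):
        text = text.replace(pronoun, entities[-1])
    return text

def normalize_temporal(text: str, today: date) -> str:
    # Step 2 (toy version): turn one relative expression into an absolute date.
    return text.replace("last week", (today - timedelta(weeks=1)).isoformat())

today = date(2024, 6, 15)
raw = "Alice said she joined last week."
step1 = resolve_coreferences(raw, ["Alice"])
step2 = normalize_temporal(step1, today)
print(step2)  # "Alice said Alice joined 2024-06-08."
```

The real system resolves references and dates with model-based inference rather than string matching, but the data flow - each stage rewriting the text into a more self-contained form - is the same idea.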

Context Guides Extraction

The context parameter isn’t just metadata - it shapes what gets extracted.

from hindsight import Hindsight

client = Hindsight(api_url="http://localhost:8080")

# Same content, different contexts
content = "Alice mentioned she's leaving Google. The team dynamics changed after the reorg."

# Career-focused extraction
client.retain(
    bank_id="advisor",
    content=content,
    context="career discussion"
)
# Extracts: Alice is leaving Google, potentially due to organizational changes

# Team dynamics focus
client.retain(
    bank_id="advisor",
    content=content,
    context="team health assessment"
)
# Extracts: Team dynamics shifted after reorganization, causing departures

Context tells the memory bank what to focus on and how to interpret ambiguous content. “Career discussion” emphasizes Alice’s decision. “Team health assessment” emphasizes the organizational impact.

Causal Relationships

Hindsight explicitly tracks cause-effect relationships in its knowledge graph. This means queries like “Why did Alice leave?” can trace through reasoning chains, not just pattern-match on keywords.
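A minimal way to picture that traversal, using a toy adjacency-list graph - this is not Hindsight’s actual knowledge-graph API:

```python
# Toy cause-effect graph: edges point from cause to effect.
causes = {
    "reorg": ["team dynamics changed"],
    "team dynamics changed": ["Alice is leaving Google"],
}

def trace_causes(effect: str, graph: dict) -> list:
    """Walk backwards from an effect to its root cause (assumes no cycles)."""
    chain = [effect]
    while True:
        parents = [c for c, effects in graph.items() if chain[0] in effects]
        if not parents:
            return chain
        chain.insert(0, parents[0])

print(" -> ".join(trace_causes("Alice is leaving Google", causes)))
# reorg -> team dynamics changed -> Alice is leaving Google
```

A keyword matcher would only find statements containing “Alice” and “leave”; the graph walk recovers the chain that explains the departure.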

World Facts vs Experiences

The extraction distinguishes between two fundamental categories:

Type         What It Is                       Example
World        Objective information received   “Alice works at Google”
Experience   Agent’s own interactions         “I discussed Python with Alice”

World facts are things the agent learned. Experiences are things the agent participated in. This distinction matters for retrieval - sometimes you want objective facts, sometimes you want interaction history.

# Get only objective facts
results = client.recall(
    bank_id="advisor",
    query="Where does Alice work?",
    types=["world"]
)

# Get interaction history
results = client.recall(
    bank_id="advisor",
    query="What have I discussed with Alice?",
    types=["experience"]
)

Why This Matters

I think the difference between fragment-based and narrative-based extraction becomes obvious at query time. Fragment-based systems return statements. Narrative-based systems return understanding.

Ask “What’s the relationship between Alice and Bob’s project?” A fragment system returns a handful of disconnected statements that happen to mention both names. A narrative system returns the story: who did what, why, and how the pieces connect.

Same underlying data. Dramatically different utility.


The extraction layer is where memory quality is won or lost. Narrative preservation means queries return context, not fragments. That’s what makes downstream reasoning possible.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm
