
Beyond Vector Search: How TEMPR Combines 4 Retrieval Strategies

TL;DR: TEMPR runs four retrieval strategies in parallel - semantic, keyword, graph, and temporal - then fuses results with Reciprocal Rank Fusion and reranks with a cross-encoder. Memories matching multiple strategies rank highest. On LongMemEval, this approach improved accuracy by 44.6 points over full-context baselines.


Vector similarity works well for paraphrasing and conceptual matches. But it fails on exact terms and proper nouns, multi-hop questions that span several entities, and queries that hinge on dates.

TEMPR solves this by running four retrieval strategies in parallel and fusing results.

The Four Strategies

1. Semantic Search (Vector Similarity)

Standard embedding-based retrieval. Query and memories are encoded as vectors, scored by cosine similarity:

score = v_query · v_memory / (||v_query|| ||v_memory||)

Uses HNSW indexing via pgvector. Good for conceptual matches - “Alice’s job” finds “Alice works as a software engineer” even without keyword overlap.
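
As a quick illustration of the scoring formula (not TEMPR's actual code - the real search happens inside pgvector's HNSW index, and the vectors below are toy values):

import numpy as np

def cosine_scores(query_vec, memory_vecs):
    """Score each memory embedding against the query embedding by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    return m @ q  # one cosine score per memory

# Toy usage: three memories with 4-dimensional embeddings
memories = np.array([[0.1, 0.9, 0.2, 0.0],
                     [0.8, 0.1, 0.1, 0.3],
                     [0.2, 0.7, 0.3, 0.1]])
query = np.array([0.1, 0.8, 0.3, 0.0])
print(cosine_scores(query, memories))  # highest score = closest in meaning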

2. Keyword Search (BM25)

Full-text search using BM25 ranking over a GIN index. No embeddings involved - pure term frequency and inverse document frequency.

Best for proper nouns, product names, identifiers, and other exact terms - queries like "Find mentions of TensorFlow" where the precise token matters. You never miss results that mention the exact terms.
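
For intuition, here is a compact sketch of BM25 scoring itself (illustrative only - TEMPR's keyword search runs inside Postgres over the GIN index, and the tokenization and documents below are made up for the example):

import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with classic BM25."""
    N = len(documents)
    avgdl = sum(len(d) for d in documents) / N
    df = {t: sum(1 for d in documents if t in d) for t in query_terms}  # document frequency
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [["alice", "works", "at", "google"],
        ["bob", "mentioned", "tensorflow", "yesterday"],
        ["tensorflow", "release", "notes"]]
print(bm25_scores(["tensorflow"], docs))  # exact-term documents score > 0, the rest score 0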

3. Graph Traversal (Spreading Activation)

The memory graph connects entities through relationships. Graph retrieval uses breadth-first search with activation propagation - activation spreads along edges with decay.

Causal and entity edges get higher propagation weights (μ(ℓ) > 1), prioritizing explanatory connections.

Example: “What does Alice do?” → Alice (entity) → Google (employer) → Google’s products (via company edge)

This enables multi-hop reasoning that neither semantic nor keyword search can handle.
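
A rough sketch of spreading activation over a toy graph (the graph shape, decay rate, and edge weights are invented for illustration; TEMPR's real traversal and μ(ℓ) weights are internal):

from collections import deque

def spread_activation(graph, seeds, decay=0.5, edge_weight=None, max_hops=3):
    """Breadth-first spreading activation: each node passes decayed activation to neighbors."""
    edge_weight = edge_weight or {}
    activation = {node: 1.0 for node in seeds}
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for neighbor, label in graph.get(node, []):
            mu = edge_weight.get(label, 1.0)          # causal/entity edges can carry mu > 1
            passed = activation[node] * decay * mu
            if passed > activation.get(neighbor, 0.0):
                activation[neighbor] = passed
                queue.append((neighbor, hops + 1))
    return activation

graph = {
    "Alice": [("Google", "employer")],
    "Google": [("TensorFlow", "product"), ("Alice", "employer")],
}
print(spread_activation(graph, ["Alice"], edge_weight={"employer": 1.5}))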

4. Temporal Retrieval

Parses time expressions in the query with a hybrid approach and filters memories by their occurrence dates. Queries like "What happened last spring?" or "meetings from Q3 2024" are handled by resolving each expression to a concrete time interval and filtering memories against it.
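
A minimal sketch of the interval-filtering idea, assuming a hand-rolled parser for quarter expressions (TEMPR's actual time-expression parsing is more general than this):

from datetime import date

def quarter_interval(quarter, year):
    """Resolve a quarter like Q3 2024 into a [start, end) date interval."""
    start = date(year, 3 * (quarter - 1) + 1, 1)
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, 3 * quarter + 1, 1)
    return start, end

def filter_by_interval(memories, start, end):
    """Keep memories whose occurrence date falls inside the interval."""
    return [m for m in memories if start <= m["occurred_at"] < end]

memories = [
    {"text": "Planning meeting", "occurred_at": date(2024, 8, 14)},
    {"text": "Launch retro",     "occurred_at": date(2024, 11, 2)},
]
start, end = quarter_interval(3, 2024)           # "meetings from Q3 2024"
print(filter_by_interval(memories, start, end))  # only the August memory survives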

Reciprocal Rank Fusion

After parallel execution, four ranked lists need merging. TEMPR uses Reciprocal Rank Fusion (RRF):

RRF(memory) = Σ 1/(k + rank_i(memory))

Where rank_i is the memory’s position in each strategy’s list, and k is a constant (typically 60).

Why RRF over score averaging? The raw scores aren't comparable across strategies - cosine similarity, BM25 scores, and activation levels live on different scales - so RRF uses only rank positions and needs no score calibration. It also rewards agreement: a memory ranked #1 in two strategies beats a memory ranked #1 in only one.
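
A faithful RRF implementation is only a few lines; the memory ids and rankings below are invented for illustration:

def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of memory ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["m1", "m2", "m3"]
keyword  = ["m2", "m4"]
graph    = ["m2", "m1"]
temporal = ["m5"]
print(rrf_fuse([semantic, keyword, graph, temporal]))  # m2 first: it appears in three lists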

Cross-Encoder Reranking

RRF produces a fused ranking, but it’s still based on individual strategy positions. The final step applies a neural cross-encoder (ms-marco-MiniLM-L-6-v2).

Unlike embeddings that encode query and memory separately, cross-encoders jointly encode both and output a relevance score. This models rich query-memory interactions learned from supervised ranking data.

The pipeline: 4 strategies → RRF fusion → top candidates → cross-encoder rerank → final results.
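
The reranking step can be sketched with the sentence-transformers CrossEncoder wrapper around the same model (a standalone illustration, not TEMPR's internal code; the candidate memories are made up):

from sentence_transformers import CrossEncoder

# Jointly encodes each (query, memory) pair and outputs a relevance score per pair
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What does Alice do at Google?"
candidates = [  # top candidates surviving RRF fusion
    "Alice works at Google as an ML engineer.",
    "Alice's favorite food is ramen.",
    "Google announced a new TensorFlow release.",
]

scores = reranker.predict([(query, memory) for memory in candidates])
reranked = [m for _, m in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the ML-engineer memory should come out on top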

Why This Works

Each strategy catches what others miss:

Query Type                          | Best Strategy | Why Others Fail
"What's Alice's job?"               | Semantic      | Paraphrasing
"Find mentions of TensorFlow"       | BM25          | Exact term match
"What does Alice's company build?"  | Graph         | Multi-hop reasoning
"Updates from last week"            | Temporal      | Date filtering

Memories matching multiple strategies rank highest. If “Alice works at Google as an ML engineer” appears in semantic (job query), BM25 (Google), and graph (Alice entity) results, it gets boosted by RRF.
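
To make the boost concrete, here is the arithmetic with k = 60, assuming for illustration that the memory ranks 1st, 2nd, and 3rd in the three matching strategies:

k = 60
multi  = 1/(k+1) + 1/(k+2) + 1/(k+3)   # ranked 1st, 2nd, 3rd across three strategies
single = 1/(k+1)                        # ranked 1st in just one strategy
print(round(multi, 4), round(single, 4))  # ~0.0484 vs ~0.0164 - the multi-hit memory wins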

Benchmark Results

From the Hindsight paper, TEMPR is evaluated on two long-term memory benchmarks. On LongMemEval, it improves accuracy by 44.6 points over the full-context baseline; results on LoCoMo are reported as well.

The multi-strategy approach consistently outperforms single-strategy baselines.

Using TEMPR in Code

You don’t configure TEMPR directly - it runs automatically on every recall(). The budget parameter controls search depth:

from hindsight_client import Hindsight

with Hindsight(base_url="http://localhost:8888") as client:
    # Low budget: faster, fewer candidates per strategy
    results = client.recall(
        bank_id="my-bank",
        query="What does Alice do at Google?",
        budget="low",
    )

    # High budget: thorough, more candidates, better for complex queries
    results = client.recall(
        bank_id="my-bank",
        query="What projects has Alice's team shipped since joining?",
        budget="high",
    )

The four strategies run in parallel, results fuse via RRF, and the cross-encoder reranks. You get the final ranked list.


Four strategies, each catching different query patterns. RRF fuses without calibration. Cross-encoder polishes the ranking. The result is retrieval that handles proper nouns, multi-hop reasoning, and temporal queries - not just semantic similarity.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #retrieval #rag
