
Background Operations: What Happens After retain()

TL;DR: Hindsight’s retain() is fast and synchronous by default - memories are searchable immediately. For bulk ingestion scenarios, optional async mode lets you fire-and-forget while background workers handle batch processing, opinion formation, and observation regeneration.


Sync by Default, Async When You Need It

By default, retain() runs synchronously, and it's fast - the extraction pipeline is optimized for real-time use. Most conversations complete in hundreds of milliseconds, and memories are searchable immediately after the call returns.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Synchronous (default) - blocks until fully processed
client.retain(
    bank_id="my-agent",
    messages=[
        {"role": "user", "content": "Alice is moving from Google to Stripe next month."}
    ]
)
# Memories are now searchable immediately
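
Because the call blocks until extraction finishes, a recall can follow on the very next line. A minimal follow-up sketch - the query string is illustrative:

# Same session, immediately after the retain above
results = client.recall(
    bank_id="my-agent",
    query="Where is Alice moving?"
)
# The freshly extracted facts are already retrievable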

The pipeline that runs on each retain:

  1. Fact extraction: LLM parses content into discrete facts
  2. Entity identification: Finds or creates entity nodes
  3. Graph connections: Links facts to entities
  4. Opinion handling: Checks if new facts affect existing opinions
  5. Observation regeneration: Updates entity summaries if needed

For bulk ingestion scenarios where you don’t need immediate availability, async mode lets you fire-and-forget:

# Async - returns immediately, processing happens in background
client.retain(
    bank_id="my-agent",
    messages=[...],
    retain_async=True  # Optional async mode
)

retain_batch: Bulk Ingestion

For high-volume scenarios - document ingestion, conversation history import, log processing - calling retain() in a loop is inefficient. retain_batch() groups multiple items into a single request.

from hindsight_client import Hindsight

client = Hindsight(base_url="http://localhost:8888")

# Batch multiple items
items = [
    {
        "content": "Meeting notes from standup: discussed Q1 roadmap",
        "context": "team meeting",
        "timestamp": "2024-12-18T09:00:00Z"
    },
    {
        "content": "Alice mentioned she's leaving Google for Stripe",
        "context": "casual conversation",
        "timestamp": "2024-12-18T10:30:00Z"
    },
    {
        "content": "Reviewed pull request #432 - needs refactoring",
        "context": "code review",
        "timestamp": "2024-12-18T14:00:00Z"
    },
]

client.retain_batch(bank_id="my-agent", items=items, retain_async=True)

What changes with batching: the items travel in a single request instead of one call each, and the extraction model sees related items together rather than in isolation.

That shared context matters. If three items mention “Alice,” the extraction model processes them together, improving entity resolution accuracy.

In my experience, batch sizes between 10 and 50 items hit the sweet spot. Beyond 50, you start hitting context window limits on the extraction model.
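
For large imports, a simple chunking loop keeps every call inside that range. A minimal sketch built on the retain_batch() call above; the batch size is just the suggested upper bound:

BATCH_SIZE = 50  # upper end of the suggested 10-50 range

def ingest(client, bank_id, items):
    """Submit items in fixed-size chunks to stay within model limits."""
    for i in range(0, len(items), BATCH_SIZE):
        client.retain_batch(
            bank_id=bank_id,
            items=items[i:i + BATCH_SIZE],
            retain_async=True
        )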

form_opinion: When Beliefs Are Born

Opinions don’t come from retain(). They emerge during reflect(), when the agent reasons its way to conclusions.

The flow:

reflect() called
    ↓
Recall retrieves relevant memories
    ↓
LLM reasons over evidence
    ↓
Response generated
    ↓
[Background] Opinion extraction runs
    ↓
New opinions stored with confidence scores

# Facts about Redis exist in memory...

response = client.reflect(
    bank_id="tech-advisor",
    query="Should we use Redis for our caching layer?",
    context="building high-traffic API",
    budget="mid"
)

# Response returned to user immediately
print(response.text)

# Background: opinion forms
# "Redis is well-suited for high-traffic caching workloads" (confidence: 0.82)

Opinion formation happens asynchronously after the response is returned. A dedicated worker:

  1. Parses the reflect response for conclusive statements
  2. Classifies them as opinion-worthy (subjective, belief-expressing)
  3. Assigns initial confidence based on evidence strength
  4. Links opinions to relevant entities (Redis, in this case)
  5. Stores in the opinion table

Initial confidence is derived from the strength of the evidence the reasoning was grounded in.

Opinion formation is triggered by the reflect() call but runs in the background, so opinions are available in subsequent recalls.
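
For example, a later recall on the same bank can draw on the newly formed opinion. A minimal sketch - the query string is illustrative:

# Later in this session, or a future one, on the same bank
results = client.recall(
    bank_id="tech-advisor",
    query="What's our view on Redis for caching?"
)
# The stored opinion can now inform the results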

reinforce_opinion: Beliefs Evolve

When new facts arrive that relate to existing opinions, the reinforcement pipeline activates. This is the mechanism behind evolving beliefs.

retain() stores new fact
    ↓
[Background] Fact extraction
    ↓
Entity linking identifies related opinions
    ↓
Each related opinion enters reinforcement queue
    ↓
LLM classifies relationship:
    - REINFORCE: +confidence
    - WEAKEN: -confidence
    - CONTRADICT: -confidence (2x), revise text
    - NEUTRAL: no change
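
To make the arithmetic concrete, here is a conceptual sketch of the confidence update implied by the classification above. This is not Hindsight’s actual internals; the base step size is a hypothetical value chosen to match the Redis example below:

STEP = 0.10  # hypothetical base adjustment

def adjust_confidence(confidence, classification):
    """Apply the classification's delta and clamp to [0, 1]."""
    deltas = {
        "REINFORCE": +STEP,
        "WEAKEN": -STEP,
        "CONTRADICT": -2 * STEP,  # contradictions carry the 2x multiplier
        "NEUTRAL": 0.0,
    }
    return min(1.0, max(0.0, confidence + deltas[classification]))

With a 0.10 step, a CONTRADICT subtracts 0.20 - exactly the 0.85 → 0.65 drop in the example below.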

Concrete example:

from datetime import datetime

# Existing opinion: "Redis is excellent for caching" (0.85)

# New fact arrives
client.retain(
    bank_id="tech-advisor",
    content="Redis Labs changed licensing to SSPL. Many companies prohibit SSPL in production environments.",
    timestamp=datetime(2024, 3, 21)
)

# Background reinforcement process:
# 1. Extracts fact: "Redis uses SSPL license, problematic for some orgs"
# 2. Links to entity: Redis
# 3. Finds opinion: "Redis is excellent for caching" (0.85)
# 4. Classifies: CONTRADICT
# 5. Adjusts: confidence → 0.65
# 6. Revises text: "Redis is effective for caching but SSPL licensing may be problematic for some organizations"

The 2x multiplier on contradictions is deliberate. In my opinion, contradictory evidence should carry more weight than confirmatory evidence - it’s the Popperian falsification principle applied to agent beliefs.

The reinforcement worker also handles opinion merging. If multiple opinions converge on the same topic, they can be consolidated:

Opinion A: "Python is good for ML" (0.75)
Opinion B: "Python excels at data science" (0.80)
                    ↓
Merged: "Python is well-suited for ML and data science workloads" (0.82)

Observation Regeneration

Observations are synthesized entity profiles. They’re not created during retain - they’re regenerated when an entity accumulates enough facts.

The trigger conditions:

  1. Entity has ≥ 5 facts (minimum for meaningful summary)
  2. New facts added since last observation
  3. Regeneration interval passed (prevents thrashing)

The flow:
New fact stored about entity "Alice"
    ↓
Check: Alice fact count ≥ 5? Yes (now 8)
    ↓
Check: New facts since last observation? Yes (2 new)
    ↓
Check: Regeneration cooldown passed? Yes
    ↓
Queue observation regeneration for "Alice"
    ↓
[Worker] Collect all Alice facts
    ↓
[Worker] LLM synthesizes into coherent profile
    ↓
Store/update observation
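
In code, that gate might look like the sketch below. The names and the cooldown value are hypothetical; only the three conditions themselves come from the list above:

from datetime import datetime, timedelta

MIN_FACTS = 5                     # minimum for a meaningful summary
COOLDOWN = timedelta(minutes=10)  # hypothetical regeneration interval

def should_regenerate(fact_count, new_facts, last_regenerated):
    """Mirror the three trigger conditions; names are illustrative."""
    return (
        fact_count >= MIN_FACTS
        and new_facts > 0
        and datetime.now() - last_regenerated >= COOLDOWN
    )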

# After multiple retains about Alice...

results = client.recall(
    bank_id="my-agent",
    query="Tell me about Alice",
    include_entities=True,
    max_entity_tokens=500
)

# Observation returned:
# "Alice is a software engineer transitioning from Google to Stripe.
#  She specializes in infrastructure and is enthusiastic about payments
#  systems. Previously worked on distributed systems at Google."

The cooldown prevents excessive regeneration when facts arrive in bursts. If you retain_batch 20 items about Alice, you don’t want 20 regeneration jobs.

Monitoring Async Operations

When using async mode (retain_async=True), you can monitor processing status:

status = client.get_bank_status(bank_id="my-agent")

print(f"Pending extractions: {status.pending_extractions}")
print(f"Pending opinions: {status.pending_opinion_jobs}")
print(f"Pending observations: {status.pending_observations}")

This is useful during bulk ingestion to track progress or wait for completion before running queries on the newly ingested data.
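
If you want to block until ingestion drains, a small polling loop over the same status call works. A sketch that assumes the pending_* counters above fall to zero when processing completes:

import time

def wait_for_ingestion(client, bank_id, poll_interval=2.0):
    """Poll bank status until all background queues are empty."""
    while True:
        status = client.get_bank_status(bank_id=bank_id)
        pending = (
            status.pending_extractions
            + status.pending_opinion_jobs
            + status.pending_observations
        )
        if pending == 0:
            return
        print(f"Waiting on {pending} background jobs...")
        time.sleep(poll_interval)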

When to Use Async Mode

Sync mode (the default) works well for most use cases - it’s fast and gives you immediate consistency. But async mode shines in specific scenarios:

Bulk ingestion: When importing thousands of documents or conversation histories, async lets you submit everything quickly without waiting for each extraction.

Fire-and-forget logging: If you’re logging agent interactions where immediate recall isn’t needed, async reduces latency in the main request path.

High-throughput pipelines: When processing streams of data where you need maximum ingestion rate and can tolerate eventual consistency.

For typical agent loops where you retain a conversation and might recall from it in the same session, sync mode is the right choice - memories are available immediately.


Sync by default, async when you need it. For bulk ingestion use retain_batch() with retain_async=True. Opinion formation happens after reflect(), opinion reinforcement triggers when new facts relate to existing beliefs, and observation regeneration keeps entity profiles current. The pipeline is fast enough for real-time use, with optional async mode for high-throughput scenarios.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm
