Opinions with Confidence Scores: How Agents Form Beliefs

TL;DR: Hindsight agents form opinions during reflect operations, storing beliefs with confidence scores from 0 to 1. Evidence accumulates over time, reinforcing or weakening opinions. Disposition traits shape how the same facts lead to different conclusions.


The Problem with Stateless Agents

Most LLM agents have no persistent opinions. Ask them the same question twice with slightly different context, and you get contradictory answers. They don’t accumulate expertise or develop consistent perspectives.

Hindsight introduces opinions - beliefs that form during reasoning, persist across sessions, and evolve as evidence accumulates. Each opinion carries a confidence score reflecting how firmly the agent holds that view.

What Opinions Are

Opinions are a memory type distinct from facts. In Hindsight’s taxonomy:

Type        Nature                     Example
world       Objective facts received   "Redis uses BSD license"
experience  Agent's interactions       "I discussed caching with Alice"
opinion     Beliefs with confidence    "Redis is excellent for caching" (0.85)

The confidence score is the key differentiator: a float between 0 and 1 representing belief strength.

An opinion isn’t just what the agent thinks - it’s what the agent thinks and how sure it is.
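To make the scale concrete, here is a small sketch that maps a confidence score to a qualitative label. The threshold values are illustrative assumptions, not part of Hindsight's API:

```python
def describe_confidence(score: float) -> str:
    """Map a 0-1 confidence score to a rough qualitative label.

    The band boundaries below are illustrative, not Hindsight's.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if score >= 0.8:
        return "firmly held"
    if score >= 0.5:
        return "moderately held"
    return "tentative"


print(describe_confidence(0.85))  # firmly held
```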

How Opinions Form

Opinions emerge during reflect() operations. When the agent reasons about a query, it may generate conclusions that get stored as opinions.

from hindsight import Hindsight

client = Hindsight(api_url="http://localhost:8080")

# Store some facts first
client.retain(
    bank_id="tech-advisor",
    content="Redis is an in-memory data store. It's open source under BSD license. Used by Twitter, GitHub, Pinterest for caching.",
    context="technical documentation"
)

# Reflect triggers opinion formation
response = client.reflect(
    bank_id="tech-advisor",
    query="Should I use Redis for our caching layer?",
    context="building a high-traffic web application",
    budget="mid"
)

print(response.text)
# Output: Based on the evidence, Redis is a strong choice for caching.
# It's battle-tested at scale (Twitter, GitHub, Pinterest) and the BSD
# license provides flexibility. For a high-traffic web app, the in-memory
# architecture delivers the performance you need.

Behind the scenes, Hindsight forms an opinion like:

Opinion: "Redis is excellent for high-traffic caching workloads"
Confidence: 0.85
Entities: [Redis]

This happens asynchronously - opinions don’t appear in the reflect response directly. They influence future reflect calls.
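A stored opinion can be pictured as a small structured record. The dataclass below mirrors the example above, but the field names and shape are an assumption, not Hindsight's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class Opinion:
    """Illustrative shape of a stored opinion (not Hindsight's real schema)."""
    text: str
    confidence: float  # belief strength, 0.0 to 1.0
    entities: list = field(default_factory=list)


redis_opinion = Opinion(
    text="Redis is excellent for high-traffic caching workloads",
    confidence=0.85,
    entities=["Redis"],
)
```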

The Reinforcement Mechanism

Opinions aren’t static. New evidence can strengthen, weaken, or contradict them. Hindsight classifies each new piece of evidence:

Classification   Effect
Reinforce        Confidence increases (capped at 1.0)
Weaken           Confidence decreases moderately
Contradict       Confidence decreases sharply (2x normal adjustment)
Neutral          No change

For contradictions, the opinion text itself may be revised.
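The table can be sketched as a simple update rule. The base adjustment size here is an assumption (Hindsight's actual step sizes aren't documented in this post); only the direction of each adjustment and the 2x contradiction penalty come from the table:

```python
BASE_ADJUSTMENT = 0.10  # assumed step size, for illustration only


def update_confidence(confidence: float, classification: str) -> float:
    """Apply one piece of classified evidence to an opinion's confidence."""
    if classification == "reinforce":
        return min(1.0, confidence + BASE_ADJUSTMENT)
    if classification == "weaken":
        return max(0.0, confidence - BASE_ADJUSTMENT)
    if classification == "contradict":
        # Contradictions hit twice as hard as a normal weakening
        return max(0.0, confidence - 2 * BASE_ADJUSTMENT)
    return confidence  # neutral: no change
```

With these assumed numbers, a contradiction takes a 0.85 opinion down to 0.65, matching the Redis license example below.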

Example: Opinion Evolution Over Time

Consider how an opinion about Redis might evolve:

from datetime import datetime

# Day 1: Initial facts
client.retain(
    bank_id="tech-advisor",
    content="Redis is open source under BSD license. Very permissive.",
    timestamp=datetime(2024, 1, 1)
)

# Agent reflects and forms opinion with ~0.85 confidence:
# "Redis is excellent for production use"

# Day 30: License change
client.retain(
    bank_id="tech-advisor",
    content="Redis changed to SSPL license. Not OSI-approved. Some companies prohibit SSPL in production.",
    timestamp=datetime(2024, 1, 30)
)

# Opinion confidence drops to ~0.65
# The contradiction triggers revision:
# "Redis is good for caching but licensing may be a concern for some organizations"

# Day 45: Community response
client.retain(
    bank_id="tech-advisor",
    content="Valkey fork launched as BSD-licensed Redis alternative. Linux Foundation backing. Drop-in compatible.",
    timestamp=datetime(2024, 2, 15)
)

# Opinion updates again (~0.80):
# "Redis is good for caching; consider Valkey if SSPL licensing is problematic"

The agent’s recommendation evolves naturally. It doesn’t flip-flop randomly - it adjusts based on evidence weight. The confidence score tells you how settled the view is.

Disposition Shapes Opinion Formation

Two agents with identical facts can form different opinions based on their disposition. Three traits matter:

Trait       Low (1)                            High (5)
Skepticism  Trusting, accepts claims readily   Questions everything, demands evidence
Literalism  Flexible interpretation            Strict, exact interpretation
Empathy     Detached, fact-focused             Considers emotional/social context

The same remote work study might produce:

Low skepticism bank: “Remote work improves productivity based on the Stanford study”

High skepticism bank: “Some evidence suggests remote work may improve productivity, though the Stanford study has methodological limitations worth considering”

Both see the same facts. Disposition determines how those facts translate into beliefs.

Setting Disposition

# Create a skeptical, literal advisor
client.create_bank(
    bank_id="cautious-advisor",
    disposition={
        "skepticism": 4,    # Questions claims
        "literalism": 4,    # Strict interpretation
        "empathy": 2        # Fact-focused
    }
)

# Create a trusting, empathetic advisor
client.create_bank(
    bank_id="supportive-advisor",
    disposition={
        "skepticism": 2,    # More trusting
        "literalism": 2,    # Flexible
        "empathy": 4        # Considers context
    }
)

Using Opinions in Retrieval

Sometimes you want opinions, sometimes you don’t.

# Factual query - exclude opinions
facts = client.recall(
    bank_id="tech-advisor",
    query="What license does Redis use?",
    types=["world"]  # Only objective facts
)

# Advisory query - include opinions
advice = client.recall(
    bank_id="tech-advisor",
    query="What should I use for caching?",
    types=["world", "opinion"]  # Facts + beliefs
)

For factual questions (“What is X?”), opinions might inject subjectivity you don’t want. For recommendations (“Should I use X?”), opinions carry the agent’s accumulated judgment.
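If you route many queries, that distinction can be automated with a simple heuristic. The cue words and the helper below are hypothetical, not part of Hindsight; they just show one way to pick the `types` argument:

```python
# Hypothetical heuristic: advisory-sounding queries get opinions included.
ADVISORY_CUES = ("should", "recommend", "best", "advice")


def memory_types_for(query: str) -> list:
    """Choose which memory types to recall for a given query."""
    q = query.lower()
    if any(cue in q for cue in ADVISORY_CUES):
        return ["world", "opinion"]  # facts plus accumulated judgment
    return ["world"]  # stick to objective facts


print(memory_types_for("Should I use Redis for caching?"))
# ['world', 'opinion']
```

A real router might instead classify intent with the LLM itself; the point is only that the `types` filter gives you the control surface.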

Why This Matters

Traditional RAG returns documents. It doesn’t have beliefs. Every query is independent.

With opinion memory:

  1. Consistency: The agent maintains coherent views across conversations
  2. Evolution: Beliefs update naturally as evidence accumulates
  3. Transparency: Confidence scores reveal uncertainty
  4. Personality: Disposition makes agents feel less like search engines

An agent that’s been tracking a technology for months will have nuanced opinions that a fresh RAG query can’t match. It remembers the license change, the community response, the performance benchmarks - and has synthesized these into a perspective.


Opinions are trajectories, not snapshots. They start somewhere, gather evidence, and adjust. The confidence score is your window into how settled or uncertain the agent’s view is.

Hindsight documentation | GitHub

#ai #agents #memory #hindsight #llm
