All posts

30 posts
Why 10 million tokens is the only memory benchmark that matters
Hindsight is #1 on BEAM at the 10M token tier. At that scale, context-stuffing dies and only real memory architecture survives.
That's not how you do business
Supermemory gamed a memory benchmark for viral reach, then called it a social experiment. Why stunts erode trust and what real benchmarking looks like.
Open source is a trust system. AI is breaking the contract.
AI lets people contribute to open source without understanding what they change. The OSS trust model needs human attestation, not just AI disclosure.
Not all agents are the same: task agents vs interaction agents
Task agents and interaction agents need different memory stacks. Latency, retrieval quality, and error tolerance diverge in ways most frameworks ignore.
AI won't replace engineers, it will replace project managers
AI coding agents eliminate the translation layer between users and code - the exact role PMs fill. Engineers who own the full product loop will thrive.
It was never about the code
A coworker rebuilt my two weeks of UI work with Claude in one weekend. The grief was real, but it revealed that the code was never the point.
Human attention defragmentation: flow, fatigue, and AI coding
AI coding tools boost output but fragment attention. Running multiple agents in parallel erodes deep understanding and ownership of your own codebase.
RLM is half a paradigm
RLM solves within-session context rot for massive inputs but ignores cross-session memory. Production agents need both RLM and external memory systems.
Not all context is equal: hierarchical memory for AI agents
Not all context is equal. A three-tier hierarchy of mental models, observations, and raw facts solves RAG consistency by prioritizing canonical knowledge.
Cache the reasoning, not the answer
Agents pay a synthesis tax re-deriving the same answers repeatedly. Mental models pre-compute consolidated knowledge for O(1) retrieval as memory evolves.
Local, long-term memory for OpenClaw agents
Hindsight's OpenClaw integration adds local, free long-term memory to your agents using auto-recall instead of unreliable tool-based retrieval.
From facts to insights: how observations work in Hindsight
Observations consolidate scattered facts into synthesized patterns via async LLM processing, with traceable evidence chains and mission-driven consolidation.
What learning actually means for AI agents
Raw fact retrieval breaks down when agents need to learn from experience, adapt to change, and infer conclusions from scattered signals across time.
File-based agent memory: great demo, good luck in prod
File-based agent memory benchmarks well on small datasets but hits context rot, multi-hop failures, and temporal query problems in production.
Background operations: what happens after retain()
How Hindsight processes memories after retain() - from fact extraction and opinion formation to observation regeneration. Sync by default, async for bulk.
Temporal reasoning: "when it happened" vs "when you learned it"
Most memory systems track one timestamp. Hindsight tracks when events occurred and when you learned about them, enabling temporal queries RAG cannot handle.
Document upserting: keeping evolving conversations fresh
Append-only memory creates duplicates when information changes. Document upserting with document_id enables clean replacement of outdated memories.
TEI for production: embeddings and cross-encoder reranking
How to offload Hindsight embeddings and cross-encoder reranking to HuggingFace TEI for production. Setup, tuning, and Kubernetes deployment guide.
Beyond vector search: how TEMPR combines 4 retrieval strategies
TEMPR runs semantic, keyword, graph, and temporal search in parallel, fuses results with RRF, and reranks with a cross-encoder. 44.6 points over baselines.
Rich fact extraction: preserving narrative, not just statements
Why sentence-level RAG chunks lose context. Hindsight extracts 2-5 narrative facts per conversation, preserving reasoning chains and causal relationships.
Retain, recall, reflect: the three operations of agent memory
Hindsight gives AI agents persistent memory through three operations: Retain stores and extracts facts, Recall runs multi-strategy search, and Reflect reasons over what's stored.
Opinions with confidence scores: how agents form beliefs
How AI agents form persistent beliefs with confidence scores that evolve as evidence accumulates. Disposition traits shape how facts become opinions.
Memory types in Hindsight: world, experience, opinion, observation
Hindsight organizes agent memory into four cognitive types: world facts, experiences, opinions with confidence scores, and auto-synthesized observations.
Drop SQLite: zero-dependency quick starts with pg0
Stop maintaining SQLite fallbacks for local dev. pg0 gives you real PostgreSQL via pip install with zero setup, pgvector included.
Token budgets vs top-k: a better way to fill context windows
Top-k retrieval returns unpredictable context sizes. Token budgets fill your LLM context window by actual token count for predictable, maximum-density results.
Hindsight vs traditional RAG: what you actually get
Traditional RAG does semantic search over chunks. Hindsight adds keyword, graph, and temporal retrieval plus entity tracking and persistent opinions.
pg0: zero-dependency PostgreSQL for development
pg0 is a single-binary CLI that downloads and runs PostgreSQL 16 with pgvector locally. No Docker, no brew, no system dependencies.
The reasoning agent: a different architecture for AI systems (part 1)
Why AI agents should split into two layers: a read-only reasoning agent that gathers context and decides, and an execution agent that validates and acts.
LongMemEval: debugging a 300MB JSON file dataset
A browser-based visualizer for the LongMemEval benchmark that indexes and navigates 300MB of chat history, making AI memory systems faster to debug.
Code comments: humans vs agentic code
Code comments went from anti-pattern to optimization technique. AI-generated docstrings act as prompt boosters, cutting agentic coding iterations from 5-6 to 1-2.