From bad NPC context to a self-correcting vector memory

April 6, 2026

I wanted to build a game with lots of AI NPC interactions. The issue was not just retrieval speed. The issue was bad context showing up at the wrong time.

You query memory, get a result, and the model basically says: this is bad context, I need better context. Then you run a wider search. That wider search is expensive and it also compounds latency when many NPCs are talking at once.

The thought was simple: if some memory keeps producing bad context, stop letting it stay in the hottest retrieval region. Move it farther away. Keep the trusted context near the center.

The idea in one paragraph

We forked Qdrant and implemented Radial Priority Indexing (RPI). Points live in shells: k=1 is high trust, k=2..N is lower trust. Good points get promoted inward over repeated wins. Bad points get demoted outward. Over time, k=1 becomes cleaner. Search starts at k=1 and only falls through if needed.
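The promote-inward / demote-outward policy above can be sketched in a few lines. This is an illustrative toy, not the fork's actual code: `ShellIndex`, `promote_after`, and the win-counting rule are all assumptions about how such a policy could look.

```python
# Hypothetical sketch of an RPI-style shell policy (not the real fork's API).
class ShellIndex:
    def __init__(self, num_shells=4, promote_after=3):
        self.num_shells = num_shells       # k = 1 (high trust) .. num_shells (low trust)
        self.promote_after = promote_after # wins needed before moving one shell inward
        self.shell = {}                    # point_id -> current shell k
        self.wins = {}                     # point_id -> retrieval wins since last move

    def insert(self, point_id, k=2):
        # New points start in a middle shell until they earn trust.
        self.shell[point_id] = min(max(k, 1), self.num_shells)
        self.wins[point_id] = 0

    def record_win(self, point_id):
        # Repeated wins promote a point inward toward k=1.
        self.wins[point_id] += 1
        if self.wins[point_id] >= self.promote_after and self.shell[point_id] > 1:
            self.shell[point_id] -= 1
            self.wins[point_id] = 0

    def record_bad_context(self, point_id):
        # A single bad-context signal demotes the point outward immediately.
        self.shell[point_id] = min(self.shell[point_id] + 1, self.num_shells)
        self.wins[point_id] = 0
```

With `promote_after=3`, a memory needs three retrieval wins to move one shell inward, but one bad-context signal pushes it outward right away; that asymmetry is what cleans up k=1 over time.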

A key intuition is that scaling a vector's magnitude leaves its cosine similarity to any query unchanged: the semantic direction stays stable. We then layer Euclidean shell geometry on top of that for trust tiers and adaptive movement.
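That intuition is easy to verify in plain Python, with no library assumptions: scale a vector by 3 and its cosine similarity to a query is unchanged, while its Euclidean distance, which the shell geometry reads, moves.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [0.6, 0.8]                  # unit-length query
v = [0.6, 0.8]                  # a trusted memory, shell k=1
demoted = [3 * x for x in v]    # same direction, radius scaled by 3 (higher shell)

print(cosine(q, v), cosine(q, demoted))       # both 1.0: direction unchanged
print(euclidean(q, v), euclidean(q, demoted)) # 0.0 vs 2.0: shell geometry changed
```

So a demotion never corrupts what a memory means, only how eagerly it gets surfaced.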

[Figure: RPI shell visualization showing bad context demoted outward while the semantic direction stays constant.]
Same direction, larger radius: bad context moves to a higher shell k, so the next search favors cleaner k=1 candidates.

What we actually shipped

Public-data benchmark run today

I wanted at least one benchmark that is not hand-picked toy data. So I ran a retrieval simulation today on the public 20 Newsgroups dataset. This is not a perfect production benchmark, but it is a legitimate, reproducible public corpus.

Setup: 5,000 train docs, 1,200 test docs, TF-IDF + SVD embeddings, baseline nearest-neighbor retrieval vs an RPI-style adaptive shell policy over 5 epochs.

Metric | Baseline | RPI | Delta
Top-1 accuracy | 0.4550 | 0.6117 (epoch 5) | +34.44% relative
Wider-scope retry rate | 0.5400 (epoch 1) | 0.4167 (epoch 5) | -22.83%
Avg candidate set size | 5000 | 3989 | -20.22%

Translation for game NPCs: with repeated interactions, we got better first answers and fewer expensive wider-scope retries.
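The fewer-retries and smaller-candidate-set numbers both come from the same mechanism: search the trusted shell first and only widen when the best hit looks weak. A minimal sketch of that fall-through loop, where `retrieve`, the score threshold, and the shell-dict layout are assumptions for illustration rather than the fork's API:

```python
# Hypothetical fall-through retrieval over trust shells (illustrative only).
def retrieve(query, shells, score_fn, threshold=0.8):
    """Scan shell k=1 first; widen to outer shells only if the best score is weak."""
    candidates_scanned = 0
    best_id, best_score = None, float("-inf")
    for k in sorted(shells):                  # k=1 first: the trusted region
        for point_id, vec in shells[k].items():
            candidates_scanned += 1
            s = score_fn(query, vec)
            if s > best_score:
                best_id, best_score = point_id, s
        if best_score >= threshold:           # good enough: skip the wider shells
            break
    return best_id, best_score, candidates_scanned

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

shells = {
    1: {"trusted": [1.0, 0.0]},               # clean k=1 memory
    2: {"demoted": [0.95, 0.05]},             # previously flagged as bad context
}
best_id, score, scanned = retrieve([1.0, 0.0], shells, dot)
# The k=1 hit clears the threshold, so shell 2 is never scanned.
```

As k=1 gets cleaner, the break fires more often, which is exactly the shrinking retry rate and candidate-set size in the table above.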

Engine-level benchmark snapshots

From our Qdrant integration bench runs in this fork:

Honest note

We vibecoded a lot of this fork, tests, and rewrites. No pretense that this is already a universal theorem for every stack. It is a practical idea that looks strong in the workloads we care about, and we are being explicit about where it can lose.

Also lmao, yes, this blog is AI-assisted with human intervention.

Where this is going

The goal is to use this memory policy inside dense NPC interaction loops where context quality matters more than raw nearest-neighbor purity. If your agent says context is bad, the system should learn and stop surfacing that context first.

That is the whole point of RPI for us: make k=1 progressively trustworthy without retraining the whole world every week.

Repro command used for the public-data run

python3 - <<'PY'
# 20 Newsgroups retrieval simulation using
# TF-IDF + SVD embeddings and RPI-style shell adaptation.
# (Script executed during this writeup.)
PY

Find me elsewhere: