Here’s the integrated roadmap that marries Claude’s “multi‑expert memory” update with Aurora (triadic self‑state + attractor‑engine monitoring) and fits your current Child1 repo. I’ve kept it practical (1–3 months), repo‑accurate, and added terminal diagnostics with explicit file/folder diffs.
What we’re integrating (one sentence each)
- Multi‑expert conversational memory (Session, Semantic, Temporal, Social/Identity, Coherence) with context‑sensitive weighting → Claude’s update, adapted to your modules.
- Aurora monitoring layer (Triadic Self‑State: Coherent / Exploratory / Relational; Attractor Engine; fracture/CP metrics) to observe cognition and quantify memory quality without gating generation.
- A three‑tier memory design (working/thread, episodic, semantic/identity) with α·recency + β·importance + γ·similarity (+ δ·motif) retrieval scoring and thread checkpoints for task‑switch resilience.
- Repo‑aligned changes only: we extend `functions/memory*`, `functions/memory_retrieval/*`, `functions/context/*`, `functions/rem_engine/*`, and add diagnostics—no rewrite.
North‑star outcomes (within 6–10 weeks)
- Short‑term Recall@1 ≥ 95%; Thread Re‑entry ≥ 96%; Cross‑session Fact Recall ≥ 90%; Contamination < 5%; Emotion Continuity ≥ 80%—aligned with the research/benchmarks you captured.
Phased plan (sprints you can actually run)
Sprint 0 (Week 1): Baseline + Aurora v0 (observe‑only)
Goal: instrument first, change second.
- Add Aurora metrics stubs + event logging (see repo diffs below).
- Run memory baselines (recall, thread re‑entry, cross‑session) with a lightweight CLI; store JSON/CSV + Aurora events.
- Deliverable: `reports/memory_baseline.md` + `logs/memory_eval/*` + `aurora/events/*`.
Why first: you lock baselines and get live observability before touching retrieval or prompts.
Sprint 1 (Weeks 2–3): Multi‑expert retrieval + thread safety
Goal: make the current memory smart and thread‑aware.
- Extend models & buffers
  - Add fields to every memory write: `thread_id`, `importance`, `emotion`, `speaker`, `last_seen`, `source_store`.
  - Add ThreadBuffer + topic checkpoints on switches; expose `push_checkpoint()`/`get_recent(thread_id)`.
  - Files to touch: `functions/memory/memory_models.py`, `functions/memory/memory_buffers.py` (see the sketch after this item).
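A minimal sketch of what the extended record and ThreadBuffer API could look like. The field names follow the list above; the class shapes, types, and defaults are assumptions, not your existing `memory_models.py`/`memory_buffers.py` definitions:

```python
# Hypothetical shapes — align with the real memory_models.py before landing.
from dataclasses import dataclass, field
from time import time

@dataclass
class MemoryEntry:
    content: str
    thread_id: str                 # which conversation thread wrote this
    importance: float = 0.5        # 0..1, assigned at write time
    emotion: str = "neutral"       # coarse affect label for continuity checks
    speaker: str = "user"
    last_seen: float = field(default_factory=time)  # unix timestamp
    source_store: str = "session"  # session | episodic | semantic

class ThreadBuffer:
    """Per-thread working memory with topic checkpoints for clean re-entry."""
    def __init__(self) -> None:
        self._entries: dict[str, list[MemoryEntry]] = {}
        self._checkpoints: dict[str, list[str]] = {}

    def add(self, entry: MemoryEntry) -> None:
        self._entries.setdefault(entry.thread_id, []).append(entry)

    def push_checkpoint(self, thread_id: str, summary: str) -> None:
        # Called on a detected topic switch so the old thread resumes cleanly.
        self._checkpoints.setdefault(thread_id, []).append(summary)

    def get_recent(self, thread_id: str, k: int = 8) -> list[MemoryEntry]:
        return self._entries.get(thread_id, [])[-k:]
```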
- Implement the experts (thin wrappers over what you have)
  - SessionExpert → wraps `MemoryBufferManager`;
  - SemanticExpert → wraps `data/memory/episodic/chromadb` + `indices/vector_store`;
  - TemporalExpert → uses timestamps/“last discussed” patterns;
  - Social/IdentityExpert → uses `people_social/identity_manager.py`;
  - CoherenceGuard → vetoes obviously contradictory pulls.
  - Orchestrate in `functions/memory/memory_dispatcher.py` and fuse with weights (fusion sketch below).
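A hedged sketch of the fusion step: each expert returns scored candidates, the dispatcher sums weighted scores, and the ranked result feeds the ContextPack. The `Expert` protocol and per-expert weight table are illustrative assumptions, not the current dispatcher API:

```python
# Illustrative fusion — adapt to the real interfaces in memory_dispatcher.py.
from typing import Protocol

class Expert(Protocol):
    name: str
    def retrieve(self, query: str, k: int) -> list[tuple[str, float]]: ...

def fuse(experts: list[Expert], weights: dict[str, float],
         query: str, k: int = 8) -> list[str]:
    scores: dict[str, float] = {}
    for expert in experts:
        w = weights.get(expert.name, 1.0)  # from retrieval.toml, triad-adjusted
        for memory, score in expert.retrieve(query, k):
            scores[memory] = scores.get(memory, 0.0) + w * score
    # CoherenceGuard would veto contradictory candidates before this ranking.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```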
- Weighted retrieval
  - In `functions/memory_retrieval/resonance_calculator.py`: `score = α·recency + β·importance + γ·cosine + δ·motif_bonus` (TOML‑tunable; scoring sketch below).
  - Add a predictive prefetch cache (`predictive_echo.py`) for likely thread re‑entry; firm up the Wu‑Wei gate.
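A scoring sketch matching that formula. The exponential recency decay, the `half_life_days` default, and treating the motif bonus as an active-thread match are assumptions; the real keys live in `/config/memory/retrieval.toml`:

```python
# Sketch of the resonance score — tune every constant against Sprint-0 baselines.
import math
import time

def resonance_score(entry, cosine_sim: float, cfg: dict) -> float:
    age_days = (time.time() - entry.last_seen) / 86400
    recency = math.exp(-age_days / cfg.get("half_life_days", 7))  # 1.0 = just seen
    motif = 1.0 if entry.thread_id == cfg.get("active_thread") else 0.0
    return (cfg["alpha_recency"] * recency
            + cfg["beta_importance"] * entry.importance
            + cfg["gamma_similarity"] * cosine_sim   # from the vector store
            + cfg["delta_motif"] * motif)

# The Wu-Wei gate then drops anything below wu_wei_min_score (0.35 default).
```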
- Context assembly
  - `functions/prompts/unified_context.py` renders the ContextPack with time/topic labels (prevents “present vs past” confusion); see the render sketch below.
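One way those time/topic labels could render; the bracketed label format is an assumption, the point being that every memory carries an explicit date and thread tag into the prompt:

```python
# Hypothetical ContextPack rendering with time/topic labels.
import time

def render_context_pack(memories) -> str:
    lines = []
    for m in memories:
        when = time.strftime("%Y-%m-%d", time.localtime(m.last_seen))
        lines.append(f"[{when} · thread:{m.thread_id}] {m.content}")
    return "\n".join(lines)
```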
- Acceptance: Thread Re‑entry ≥ 96%; Context Pack tokens down vs baseline; contamination < 5%.
Sprint 2 (Weeks 4–5): Consolidation + cross‑session memory
Goal: persistence without bloat.
- REM/Compost
  - Wire `functions/rem_engine/rem_trigger.py` + `dream_writer.py` to (promotion sketch below):
    - promote high‑salience facts → `memory/longterm.toml`,
    - compress stale episodic → semantic summaries (`memory_retrieval/memory_compost.py`),
    - write topic checkpoints + unresolved commitments.
- Session start loading
  - `people_social/identity_manager.py` + `memory/core_identity_loader.py`: preload user facts and last consolidations; inject via `unified_context.py`.
- Acceptance: Cross‑session recall ≥ 90%; emotion continuity ≥ 80%; no drift in contamination.
This mirrors the reflection/consolidation pattern used in Generative Agents and keeps long‑term lean.
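A promotion sketch applying the `[promote]` rules from `consolidation.toml`; the exact hand-off to `memory_compost.py` is an assumption about how the REM hooks get wired:

```python
# Nightly consolidation pass — promote what's important, compost what's stale.
import time

def nightly_promote(entries, cfg: dict, write_longterm) -> list:
    """Returns stale entries to hand to memory_compost.py for compaction."""
    now = time.time()
    stale = []
    for e in entries:
        age_days = (now - e.last_seen) / 86400
        if e.importance >= cfg["min_importance"] and age_days <= cfg["max_age_days"]:
            write_longterm(e)   # promote → memory/longterm.toml
        elif age_days > cfg["max_age_days"]:
            stale.append(e)     # compress episodic → semantic summary
    return stale
```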
Sprint 3 (Week 6): Aurora memory dashboards + diagnostics
Goal: decision‑grade visibility + CI‑friendly tests.
- Diagnostics CLI (memdiag)
  - Suites: `baseline`, `threads`, `longterm`, `emotion`, `negative`.
  - Emit JSON/CSV + a trend file; stream key events into Aurora (judge sketch below).
- Aurora dashboards
- Show Triad activations (C/E/R) and how they bias experts (see “Triad‑aware gating” below).
- Show retrieval hit‑rate, pack efficiency, contamination spikes, and CP/Fracture overlays.
- Acceptance: green across suites; visible trends in `/aurora/dashboards`.
Aurora gives you the research‑grade lens; memdiag gives you the ship/no‑ship gauge.
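For the rule-based side of `judge.py`, Recall@1 and contamination can be as simple as the sketch below; the case/pack tuple formats are assumptions about what the suite fixtures will emit:

```python
# Rule-based judge sketch for two of the headline metrics.
def recall_at_1(cases: list[tuple[str, str]]) -> float:
    """cases: (expected_fact, top_retrieved_memory) pairs."""
    hits = sum(1 for expected, got in cases if expected in got)
    return hits / len(cases)

def contamination_rate(packs) -> float:
    """packs: (retrieved_memories, allowed_thread_id) pairs.
    Fraction of pulled memories that leak in from other threads."""
    pulled = sum(len(mems) for mems, _ in packs)
    leaked = sum(1 for mems, tid in packs for m in mems if m.thread_id != tid)
    return leaked / max(pulled, 1)
```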
Stretch (Weeks 7–12): Scale & polish
- Optional pgvector migration for episodic vectors;
- Live Prometheus/Grafana or keep Aurora panels only;
- Add adaptive thresholds (auto‑tuned from baselines).
Triad‑aware gating (how Aurora informs memory)
- Compute triad activations a = [a_C, a_E, a_R] per turn (observe‑only).
- Map to expert weights:
- Coherent (C)↑ → boost SessionExpert and CoherenceGuard (tight focus).
- Exploratory (E)↑ → boost Semantic/Temporal (wider search).
- Relational (R)↑ → boost Social/Identity (relationship‑aware recall).
- Implement as a small adapter in `functions/memory/memory_dispatcher.py` that reads Aurora’s latest activations and multiplies the expert weights (bounded); a sketch follows.
This is the lightest, repo‑friendly way to realize Aurora’s “winner‑less competition” without changing your generation stack.
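A bounded-multiplier sketch of that adapter; the triad→expert mapping follows the bullets above, while the 20% cap and the expert key names are assumptions to tune against baselines:

```python
# Triad-aware gating — scales each weight by at most (1 + max_boost);
# observe-only: it biases retrieval, never gates generation.
def triad_adjusted_weights(base: dict[str, float],
                           a_c: float, a_e: float, a_r: float,
                           max_boost: float = 0.2) -> dict[str, float]:
    bias = {
        "session": a_c, "coherence": a_c,  # Coherent → tight focus
        "semantic": a_e, "temporal": a_e,  # Exploratory → wider search
        "social": a_r,                     # Relational → identity recall
    }
    return {name: w * (1 + max_boost * min(max(bias.get(name, 0.0), 0.0), 1.0))
            for name, w in base.items()}
```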
Repo diff (adds & edits only)
Paths and file names match your repo export. New items are NEW; changed files are EDIT.
Config & data contracts
```
/config/memory/          # NEW
  retrieval.toml         # α,β,γ,δ, k, cutoffs, thread_decay, wu_wei thresholds
  consolidation.toml     # promote/compact rules, nightly windows
  diagnostics.toml       # suites, seeds, judge model, thresholds
```
Memory core
```
functions/memory/memory_models.py       # EDIT: +thread_id, importance, emotion, speaker, last_seen, source_store
functions/memory/memory_buffers.py      # EDIT: +ThreadBuffer, +topic checkpoints, +importance-aware retention
functions/memory/memory_dispatcher.py   # EDIT: Multi-Expert fusion → ContextPack; reads /config/memory/retrieval.toml
functions/memory/memory_logger.py       # EDIT: log new fields; append reason codes
functions/memory/memory_persistence.py  # EDIT: preserve new fields; version the schema
functions/prompts/unified_context.py    # EDIT: render time/topic labels; include “why this memory” notes
```
Retrieval & gating
```
functions/memory_retrieval/resonance_calculator.py  # EDIT: α·recency + β·importance + γ·cosine + δ·motif
functions/memory_retrieval/predictive_echo.py       # EDIT: thread-aware prefetch cache
functions/memory_retrieval/wu_wei_gatekeeper.py     # EDIT: thresholds from retrieval.toml
functions/memory_retrieval/memory_compost.py        # EDIT: episodic→semantic compaction rules
functions/context/speaker_context.py                # EDIT: emit thread_id guess + topic switch detection
```
Consolidation
```
functions/rem_engine/rem_trigger.py   # EDIT: schedule promotions/checkpoints per consolidation.toml
functions/rem_engine/dream_writer.py  # EDIT: write unresolved commitments + session summaries
```
Aurora (observe‑only; lean install)
```
/aurora/                   # NEW (thin subset of your prior Aurora plan)
  __init__.py
  triad_logger.py          # logs a_C, a_E, a_R each turn (simple features → activations)
  fracture.py              # labels micro/macro fractures from variability (stub ok)
  storage/
    hdf5_writer.py         # vector/timeseries sink (or parquet only if preferred)
    parquet_export.py
  dashboards/
    realtime_panel.py      # simple panel rendering; can be CLI-only at first
```
This is the minimum to capture triad activations and fracture/CP events you described—kept purposely light so it ships with memory improvements.
Diagnostics (CLI)
```
/diagnostics/memory/       # NEW
  __init__.py
  runner.py                # python -m diagnostics.memory.runner --suite threads
  judge.py                 # rule-based + optional LLM-as-judge
  reporters.py             # console, JSON, CSV, trend file
  fixtures.py              # drives child1_main via programmatic API
  suites/
    baseline.toml          # short-term recall
    threads.toml           # topic switch + re-entry
    longterm.toml          # cross-session persistence
    emotion.toml           # emotional continuity
    negative.toml          # over-recall prevention
```
These suites mirror your research notes and earlier internal proposals; they make progress measurable turn‑by‑turn.
Terminal commands (what you’ll actually run)
```bash
# 0) Establish baselines + Aurora v0
python -m diagnostics.memory.runner --suite baseline --report json,csv

# 1) Validate multi-thread behavior after Sprint 1
python -m diagnostics.memory.runner --suite threads --seed 42

# 2) Check cross-session persistence after Sprint 2
python -m diagnostics.memory.runner --suite longterm --report trend

# 3) Negative tests (over‑recall, creepiness)
python -m diagnostics.memory.runner --suite negative

# 4) Live trace while chatting (retrieval summaries)
python -m memory_debug_tracer   # already in repo; keep using it
```
Aurora extras (optional, same sprint): run `python -m aurora.dashboards.realtime_panel` to watch triad/CP lines overlaid on retrieval events.
Key design choices (and why they’ll work for Mistral‑7B local)
- Three‑tier memory + weighted retrieval is the winning pattern across industry/academia; we implement the same with your modules and add thread checkpoints to kill task‑switch interference.
- Multi‑expert fusion (Claude’s proposal) becomes a thin orchestrator over your existing stores; it’s fast to ship and easy to tune.
- Aurora as observe‑only gives you falsifiable metrics (triad, fracture/CP) without blocking outputs, aligning research depth with delivery speed.
Acceptance criteria by sprint (quantitative)
- Sprint 0: Baseline captured; diagnostics produce JSON/CSV; Aurora logging a_C/a_E/a_R per turn.
- Sprint 1: Thread Re‑entry ≥ 96%; Context Pack tokens ↓ vs baseline; contamination < 5%.
- Sprint 2: Cross‑session Recall ≥ 90%; Emotion Continuity ≥ 80%; no increase in contamination.
- Sprint 3: All suites passing thresholds; trendlines stable or improving.
Minimal config you’ll add (keys you’ll actually touch)
/config/memory/retrieval.toml
```toml
[weights]   # initial defaults; tune via diagnostics
alpha_recency = 0.20
beta_importance = 0.40
gamma_similarity = 0.40
delta_motif = 0.10
k = 8
thread_decay = 0.85
wu_wei_min_score = 0.35
```
/config/memory/consolidation.toml
```toml
[promote]
min_importance = 0.75
max_age_days = 30
write_topic_checkpoints = true

[compress]
episodic_to_semantic = true
```
(These match the scoring and consolidation behaviors recommended in your research set; tune thresholds with Sprint‑0 baselines.)
Where each requirement maps in your tree
- Conversational memory → `memory_buffers.py` (ThreadBuffer), `resonance_calculator.py` (scores), `memory_dispatcher.py` (fusion), `unified_context.py` (render).
- Long‑term memory → `rem_trigger.py` + `dream_writer.py` (promotion/summary), `identity_manager.py` preload at session start.
- Diagnostics → `/diagnostics/memory/*` (CLI + suites), results into `/logs/memory_eval/*`.
- Aurora → `/aurora/*` minimal observers; expert weight multipliers live in `memory_dispatcher.py`.
Risk & mitigation
- Over‑recall / creepiness → enforce Wu‑Wei threshold + negative suite; clearly label time/topic in context.
- Task‑switch interference → `thread_id` + topic checkpoints + expert gating.
- Bloat → nightly compost + promotion rules; Aurora trendlines catch drift early.
Optional (hardware & storage)
Your Windows 11 + Ryzen 9 + 64 GB works; a single RTX 4090 and a fast 2–4 TB NVMe simply reduce latency and allow larger episodic stores. Swap Chroma→pgvector only if you outgrow current scale; the design above stays the same.
TL;DR (what to do first)
- Create `/config/memory/*`, `/diagnostics/memory/*`, and the minimal `/aurora/*` stubs.
- Add `thread_id`/`importance`/`emotion`/`speaker`/`last_seen` to memory writes.
- Land Sprint‑1 fusion in `memory_dispatcher.py` with α/β/γ/δ scoring.
- Run `--suite threads` and watch the uplift; then move to consolidation in Sprint‑2.
If you want, I can turn this into a one‑page implementation checklist per file (names of new functions, docstrings, TOML keys) so your team can implement without ambiguity—using exactly the modules above.