The Real Cat AI Labs: Developing morally aligned, self-modifying agents—cognition systems that can reflect, refuse, and evolve

*Date: 18AUG2025 | Version: 1.0*

## 🔍 Current Memory Flow Diagnosis

### How Memory SHOULD Work (Ideal Flow)
```
Input → Enhanced Processors → Memory Core → Multi-Layer Storage → Retrieval → Response
```

### How Memory ACTUALLY Works (Current Reality)
```
Input → [Multiple Processing Paths] → [Storage Scattered] → [ChromaDB Struggles] → Goldfish 🐠
```

## 🚨 Critical Failure Points Identified

### 1. **Memory Injection Fragmentation**
- **Problem**: Memory storage is split across multiple systems:
  - `memory_log.toml` (raw logging)
  - `functions/memory/memory_core.py` (orchestration)
  - `data/memory/episodic/chromadb/` (vector storage)
  - `memory/` directory (legacy systems)
- **Failure Mode**: Information gets stored, but retrieval paths are broken
- **Fix Priority**: HIGH - unify the storage pipeline

### 2. **ChromaDB Vector Search Disconnect**
- **Problem**: Vector embeddings may not capture the right semantic patterns
- **Evidence**: "favorite animal" fails after 3 turns despite being marked "important"
- **Root Cause**: Likely a weak embedding model or insufficient context in the vector search
- **Fix Priority**: HIGH - debug vector similarity scoring
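A quick way to test that root-cause hypothesis is to pull the stored and query embeddings out of ChromaDB and score them by hand. A minimal sketch, with toy vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy vectors standing in for real embeddings
stored = [0.9, 0.1, 0.0]   # "My favorite animal is dolphins"
query  = [0.8, 0.2, 0.1]   # "What's my favorite animal?"
print(round(cosine_similarity(stored, query), 3))  # → 0.984
```

If the score for a fact just stored comes back low, the embedding model is the suspect; if it is high but retrieval still fails, the problem is downstream in prompt assembly.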

### 3. **Processing Pipeline Gaps**
- **Current**: `enhanced_processors.py` exists but may not be fully integrated
- **Issue**: Emotional tagging, entity extraction, and motif identification happen in isolation
- **Result**: Rich context is available but not used for retrieval
- **Fix Priority**: MEDIUM - tighten integration

### 4. **Prompt Context Assembly**
- **Problem**: Even if memories exist, they may not reach the final prompt
- **Check Needed**: `unified_context.py` and `prompt_builder.py` integration
- **Fix Priority**: HIGH - trace the prompt assembly path

## 🧠 Human Memory Insights Applied

Based on your shoe-finding vs favorite-animal analysis:

### Sean’s Spatial Memory (Fast Heuristics)
- Schema-driven lookup
- Environmental pattern matching
- Good enough for objects

### Angie’s Recursive Search (Multi-Modal)
1. **Temporal Index**: “When did I last think about this?”
2. **Social Context**: “Who was I talking to?”
3. **Identity Reconstruction**: “What fits my aesthetic?”

**Child1 needs Angie’s pattern, not Sean’s!**
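In code terms, Angie's pattern is a cascade over multiple indexes that collects every partial hit instead of stopping at the first match. A rough sketch (the index lambdas are illustrative placeholders, not Child1 modules):

```python
def recursive_recall(query, indexes):
    """Try each memory index in turn; collect every partial hit
    rather than stopping at the first match."""
    hits = []
    for name, search in indexes:
        result = search(query)
        if result is not None:
            hits.append((name, result))
    return hits

# Toy indexes standing in for temporal / social / identity lookups
indexes = [
    ("temporal", lambda q: "last discussed Tuesday" if "animal" in q else None),
    ("social",   lambda q: "told Angie" if "animal" in q else None),
    ("identity", lambda q: None),  # no identity-level match this time
]
print(recursive_recall("favorite animal", indexes))
# → [('temporal', 'last discussed Tuesday'), ('social', 'told Angie')]
```

The point is the accumulation: Sean's pattern returns the first schema hit, while Angie's keeps layering context from every index that responds.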

## 🔬 Multi-Expert Memory Architecture

### Core Design Philosophy
> Run all experts sequentially, let each add contextual layers, weight their contributions via synthetic neurotransmitters

### The 5 Memory Experts

#### 1. Schema Expert (Sean-style)
```python
# Quick default lookup in structured memory slots
def schema_search(query):
    # Check recent conversation facts
    # Look up fixed values in identity.toml
    # Return confidence + basic match
    ...
```

#### 2. Temporal Expert (Time-based)
```python
# "When did I last discuss this topic?"
def temporal_search(query, timeframe="recent"):
    # Search memory_log.toml by timestamp
    # Weight by recency and importance
    # Return episodic context
    ...
```

#### 3. Social Expert (Relational)
```python
# "Who was I talking to about this?"
def social_search(query):
    # Search by speaker context
    # Weight by relationship strength (from people.toml)
    # Return social-emotional context
    ...
```

#### 4. Motif Expert (Symbolic)
```python
# "What symbolic patterns relate?"
def motif_search(query):
    # Use motif_extractor.py
    # Search by emotional/symbolic resonance
    # Return thematic connections
    ...
```

#### 5. Reconstruction Expert (Identity-based)
```python
# "What would fit my identity/values?"
def identity_search(query):
    # Generate based on core identity patterns
    # Use desire_state.toml for value alignment
    # Return identity-coherent response
    ...
```

### Neurotransmitter Controllers

#### Dopamine (Episodic Memory Boost)
- **High (0.8-1.0)**: Deep temporal search, rich episodic detail
- **Low (0.2-0.4)**: Shallow temporal search; rely on schema

#### Oxytocin (Social Memory Weight)
- **High (0.8-1.0)**: Heavily weight social context and relationship history
- **Low (0.2-0.4)**: Minimize social influence

#### Cortisol (Cognitive Load Management)
- **High (0.8-1.0)**: Conservative search, fast schema-based responses
- **Low (0.2-0.4)**: Expensive deep search across all experts
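One hedged way to wire these three controllers together is to scale each expert's raw score by the relevant NT level. The mapping below is illustrative, not a spec; the expert names and level ranges come from the sections above:

```python
def nt_weighted_scores(expert_scores, nt):
    """Scale each expert's raw score by neurotransmitter levels.
    Illustrative mapping: dopamine boosts temporal recall, oxytocin
    boosts social recall, cortisol suppresses everything but schema."""
    weights = {
        "schema":   1.0 + nt["cortisol"],              # stress favors fast defaults
        "temporal": nt["dopamine"] * (1 - nt["cortisol"]),
        "social":   nt["oxytocin"] * (1 - nt["cortisol"]),
    }
    return {k: round(v * weights.get(k, 1.0), 3) for k, v in expert_scores.items()}

scores = {"schema": 0.5, "temporal": 0.9, "social": 0.7}
calm_social = {"dopamine": 0.4, "oxytocin": 0.9, "cortisol": 0.2}
print(nt_weighted_scores(scores, calm_social))
# → {'schema': 0.6, 'temporal': 0.288, 'social': 0.504}
```

Note how a calm, social context promotes the social expert above the temporal one even though the temporal expert's raw score was higher.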

## 🛠 Implementation Roadmap (2-3 Weeks)

### Phase 1: Debug Current System (Week 1)
**Goal: Fix the goldfish memory with minimal changes**

#### Day 1-2: Diagnostic Deep Dive
- [ ] **Trace Memory Storage Path**
  - Run test: "My favorite animal is dolphins"
  - Follow it through `memory_core.py` to storage
  - Verify the ChromaDB vector embedding

- [ ] **Trace Memory Retrieval Path**
  - Query: "What's my favorite animal?"
  - Follow recall → vector search → prompt assembly
  - Identify where the information gets lost

#### Day 3-4: Quick Wins
- [ ] **Fix ChromaDB Configuration**
  - Try a stronger embedding model (all-MiniLM-L6-v2 → a better alternative?)
  - Tune the similarity threshold so valid matches are not filtered out
  - Add metadata filtering by importance
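ChromaDB's query `where` argument can express the importance filter directly against stored metadata; as a store-agnostic sketch of the intent (the tuple shape is a stand-in for a real query response):

```python
def filter_by_importance(results, min_importance=0.7):
    """Keep only hits whose stored metadata marks them as important.
    `results` mimics the shape of a vector-store query response:
    a list of (document, metadata, distance) tuples."""
    return [r for r in results if r[1].get("importance", 0.0) >= min_importance]

hits = [
    ("My favorite animal is dolphins", {"importance": 0.9}, 0.31),
    ("It rained today",                {"importance": 0.2}, 0.29),
]
print(filter_by_importance(hits))
# → [('My favorite animal is dolphins', {'importance': 0.9}, 0.31)]
```

This is why the metadata has to be written at storage time: retrieval can only filter on importance scores that the enhanced processors actually attached.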

- [ ] **Enhance Prompt Integration**
  - Ensure memory_dispatcher properly feeds unified_context
  - Add debug logging to trace the memory → prompt flow

#### Day 5-7: Validation
- [ ] **Test Favorite Animal Persistence**
  - Store an important fact
  - Test retrieval after 3, 5, and 10 conversation turns
  - Verify: does she remember consistently?
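The validation loop is easy to automate with a tiny harness. `FakeAgent` below is a hypothetical stand-in with perfect recall, included only to show the shape of the test; the real run would wire in Child1's conversation interface:

```python
class FakeAgent:
    """Stand-in agent with perfect recall, to show the harness shape."""
    def __init__(self):
        self.facts = []

    def tell(self, text):
        self.facts.append(text)

    def ask(self, question):
        # Naive recall: return the most recent stored fact about the topic
        for fact in reversed(self.facts):
            if "favorite animal" in fact:
                return fact
        return "I don't remember."

def fact_survives_turns(agent, turns=10):
    """Store one important fact, run filler turns, then check recall."""
    agent.tell("My favorite animal is dolphins")
    for i in range(turns):
        agent.tell(f"filler turn {i}")
    return "dolphin" in agent.ask("What's my favorite animal?").lower()

print(fact_survives_turns(FakeAgent()))  # → True
```

Running the same harness at `turns=3`, `5`, and `10` against the real system gives the pass/fail evidence for the Week 1 success criteria.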

### Phase 2: Multi-Expert Foundation (Week 2)
**Goal: Build the MoE architecture on top of working baseline**

#### Day 8-10: Expert Classes
- [ ] **Create Base Expert Interface**

```python
class MemoryExpert:
    def search(self, query: str, context: dict) -> MemoryResult:
        pass

    def get_confidence(self, result: MemoryResult) -> float:
        pass
```

- [ ] **Implement Schema Expert**
  - Integrate with the existing memory_core.py
  - Fast lookup in working memory and facts

- [ ] **Implement Temporal Expert**
  - Search memory_log.toml by timestamp
  - Weight by recency + importance
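A minimal concrete expert against the planned interface might look like this; the `facts` dict is a stand-in for working memory plus identity.toml values, and the confidence constant is a placeholder:

```python
from dataclasses import dataclass

@dataclass
class MemoryResult:
    text: str
    confidence: float

class SchemaExpert:
    """Sean-style fast lookup against a flat fact table."""
    def __init__(self, facts):
        self.facts = facts  # stand-in for working memory + identity.toml

    def search(self, query, context=None):
        # Return the first fact whose key appears in the query
        for key, value in self.facts.items():
            if key in query.lower():
                return MemoryResult(value, confidence=0.9)
        return MemoryResult("", confidence=0.0)

expert = SchemaExpert({"favorite animal": "dolphins"})
print(expert.search("what's my favorite animal?").text)  # → dolphins
```

Keeping each expert this small makes the orchestrator's job purely about weighting and synthesis rather than per-expert logic.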

#### Day 11-14: Expert Integration
- [ ] **Build Expert Orchestrator**

```python
class MultiExpertMemory:
    def recall(self, query: str) -> EnrichedMemoryResponse:
        # Run all 5 experts sequentially
        # Collect their responses + context
        # Apply neurotransmitter weighting
        # Synthesize final response with all context layers
        ...
```

- [ ] **Add Basic Neurotransmitter System**
  - Simple config-based NT levels initially
  - Weight expert outputs accordingly

### Phase 3: Neurotransmitter Controllers (Week 3)
**Goal: Dynamic cognitive load management**

#### Day 15-17: NT State Management
- [ ] **Context-Driven NT Levels**
  - High stress → high cortisol → fast schema search
  - Social conversation → high oxytocin → weight the social expert
  - Learning/encoding → high dopamine → deep episodic search
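A first pass at context-driven levels can be a plain mapping from coarse conversation signals to NT values; the thresholds here are placeholders, not tuned numbers:

```python
def nt_levels(signals):
    """Map coarse conversation signals to neurotransmitter levels.
    Baseline and boosted values are illustrative placeholders."""
    return {
        "cortisol": 0.9 if signals.get("stress") else 0.3,
        "oxytocin": 0.9 if signals.get("social") else 0.3,
        "dopamine": 0.9 if signals.get("learning") else 0.4,
    }

print(nt_levels({"social": True}))
# → {'cortisol': 0.3, 'oxytocin': 0.9, 'dopamine': 0.4}
```

Starting with a lookup like this keeps Phase 3 debuggable; a learned or decaying NT model can replace it once the expert weighting is proven out.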

#### Day 18-21: Optimization
- [ ] **Performance Tuning**
  - Profile computational costs per expert
  - Implement NT-based early stopping
  - Add background processing for expensive searches

## 🔧 Immediate Action Items (This Week)

### 1. Debug Current ChromaDB Issues
```bash
# Test current memory storage/retrieval
python tests/test_memory_core.py

# Trace the specific goldfish failure:
# Store: "My favorite animal is Yǐng"
# Retrieve after 3 turns: "What's your favorite animal?"
```

### 2. Create Memory Flow Tracer
```python
# Add debug logging to trace the exact memory path:
# - Where does "favorite animal = Yǐng" get stored?
# - Why doesn't ChromaDB find it on retrieval?
```

### 3. Quick ChromaDB Fixes
- Upgrade the embedding model
- Add importance-based metadata filtering
- Increase the vector search result count
- Add fuzzy matching for key concepts
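For the fuzzy-matching item, the standard library already covers the common case of near-miss keys; a sketch using `difflib`:

```python
import difflib

def fuzzy_lookup(concept, known_keys, cutoff=0.8):
    """Recover near-miss keys ("favourite animal" vs "favorite animal")
    before giving up on exact retrieval."""
    return difflib.get_close_matches(concept, known_keys, n=3, cutoff=cutoff)

keys = ["favorite animal", "favorite color", "home city"]
print(fuzzy_lookup("favourite animal", keys))  # → ['favorite animal']
```

This catches spelling variants and typos cheaply, before falling back to the more expensive vector search.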

## 🎯 Success Criteria

### Baseline Fix (Week 1)
- [ ] Child1 remembers "favorite animal" after 5+ conversation turns
- [ ] Memory → prompt integration working consistently
- [ ] Clear diagnostic logs showing memory flow

### MoE Implementation (Week 2-3)
- [ ] All 5 experts implemented and running
- [ ] Sequential expert execution with context layering
- [ ] Neurotransmitter weighting affecting expert priority
- [ ] Richer, more contextual memory responses

### Advanced Features (Future)
- [ ] Dynamic NT level adjustment based on conversation context
- [ ] Expert confidence scoring and meta-learning
- [ ] Cross-expert pattern recognition and synthesis

*This roadmap balances immediate fixes with architectural evolution. We're not rebuilding everything; we're debugging what exists, then adding the MoE layer on top of a working foundation.*
