- OpenAI-compatible API (`chat/completions`)
- SSE streaming of tokens and tool calls
- Automatic `conv_id` (L2 persistence)
- Clients: OpenCode, Cursor, aider, ClawCode
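On the OpenAI chat-completions wire format, each SSE `data:` line carries a JSON chunk whose `choices[].delta` holds either a content fragment or tool-call fragments (keyed by `index`), and the stream ends with `data: [DONE]`. A minimal client-side sketch of folding such a stream back into the final text and tool calls; the function name `accumulate_stream` is hypothetical, and it assumes a single choice per request:

```python
import json
from typing import Dict, Iterable, List


def accumulate_stream(lines: Iterable[str]) -> dict:
    """Fold OpenAI-style SSE chunks into final text plus complete tool calls.

    Assumes one choice per request (n=1); fragments from all choices are merged.
    """
    text_parts: List[str] = []
    tool_calls: Dict[int, dict] = {}  # keyed by the tool call's "index" field
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            for call in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    call["index"], {"id": None, "name": "", "arguments": ""}
                )
                if call.get("id"):
                    slot["id"] = call["id"]
                fn = call.get("function") or {}
                # Name and JSON arguments may arrive split across many chunks.
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return {
        "content": "".join(text_parts),
        "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
    }
```

The arguments string only becomes valid JSON once the last fragment has arrived, which is why the sketch concatenates and leaves parsing to the caller.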
- P2 heuristic: regex + keywords (~1 ms)
- P3 KNN: MiniLM 384-d embeddings in pgvector (~50 ms)
- P4 LLM idle: smallest worker classifies (~1-3 s)
- P5 feedback: a re-prompt within 30 s auto-excludes
- Result: small 52% | large 12% (was 80%+)
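The P2 to P4 phases read as a cascade ordered by cost: each phase either returns a pool or defers to the next, slower one. A minimal sketch under stated assumptions: the regex rules, the `k`/`min_votes` thresholds and all function names are hypothetical, the KNN vote runs in plain Python rather than in pgvector, `embed` and `llm_classify` stand in for the MiniLM encoder and the smallest-worker classifier, and the P5 feedback rule is not modelled:

```python
import math
import re
from collections import Counter
from typing import Optional, Sequence

# Hypothetical P2 rules (pattern -> pool); the real regexes/keywords are not given.
HEURISTIC_RULES = [
    (re.compile(r"```|\bdef\b|\bclass\b|\btraceback\b", re.IGNORECASE), "code"),
    (re.compile(r"^\s*(hi|hello|thanks)\b", re.IGNORECASE), "small"),
]


def heuristic_route(prompt: str) -> Optional[str]:
    """P2: return a pool on a confident pattern match, else None."""
    for pattern, pool in HEURISTIC_RULES:
        if pattern.search(prompt):
            return pool
    return None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_route(embedding, labeled, k: int = 5, min_votes: int = 3) -> Optional[str]:
    """P3: majority vote among the k past prompts (vector, pool) most similar to this one."""
    nearest = sorted(labeled, key=lambda item: cosine(embedding, item[0]), reverse=True)[:k]
    if not nearest:
        return None
    pool, votes = Counter(p for _, p in nearest).most_common(1)[0]
    return pool if votes >= min_votes else None  # weak majority: defer to P4


def route(prompt: str, embed, labeled, llm_classify) -> str:
    """Cheapest phase first; a phase returning None hands off to the next one."""
    return (
        heuristic_route(prompt)
        or knn_route(embed(prompt), labeled)
        or llm_classify(prompt)  # P4 stand-in for the smallest-worker classifier
    )
```

With the quoted latencies (~1 ms, ~50 ms, ~1-3 s), most requests that P2 or P3 can decide never pay for an LLM call.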
- All persistent data in PostgreSQL
- `prompt_embedding vector(384)` per job
- `routing_feedback`: auto-corrective
- Versioned migrations 001→025+
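Numbered migrations are usually applied in version order, skipping those already recorded as applied. A minimal sketch assuming a hypothetical `<version>_<description>.sql` naming scheme and a known highest applied version; the file names, the `jobs` table and the `pool` column in the illustrative SQL are assumptions, while `vector(384)` and pgvector's `<=>` (cosine distance) operator are real pgvector features:

```python
import re
from pathlib import PurePath
from typing import Iterable, List

# Hypothetical naming scheme: "001_init.sql" ... "025_<description>.sql".
MIGRATION_NAME = re.compile(r"^(\d{3,})_[\w-]+\.sql$")

# Illustrative DDL only; table "jobs" is an assumption.
EXAMPLE_EMBEDDING_MIGRATION = """
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE jobs ADD COLUMN prompt_embedding vector(384);
"""

# pgvector's <=> operator is cosine distance; column "pool" is an assumption.
EXAMPLE_KNN_QUERY = "SELECT pool FROM jobs ORDER BY prompt_embedding <=> %s LIMIT 5"


def pending_migrations(filenames: Iterable[str], applied_version: int) -> List[str]:
    """Return migrations newer than applied_version, sorted by numeric version."""
    versioned = []
    for name in filenames:
        match = MIGRATION_NAME.match(PurePath(name).name)
        if not match:
            raise ValueError(f"unexpected migration file name: {name}")
        versioned.append((int(match.group(1)), name))
    versioned.sort()
    versions = [v for v, _ in versioned]
    if len(set(versions)) != len(versions):
        raise ValueError("duplicate migration version")
    return [name for v, name in versioned if v > applied_version]
```

Sorting on the parsed integer rather than the string keeps ordering correct if numbering ever grows past three digits.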
- Auto-review (1 s, from phase 1)
- Cross-pool forwarding (M7a)
- Multi-role pipeline (scales to N pools)
- SSE exposes the review to clients
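One way to picture a multi-role pipeline whose review reaches the client is to run roles in order, each possibly on a different pool, and emit every result as a named SSE frame (`event:` plus `data:` lines, terminated by a blank line). A minimal sketch; the role names, pool assignments, handler signature and the use of the role as the SSE event name are all hypothetical, and whether a given client surfaces named events is an assumption:

```python
import json
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical handler: receives the prompt and all earlier role outputs.
Handler = Callable[[str, List[dict]], str]


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame; json.dumps keeps the payload on one line."""
    lines = [f"event: {event}"] if event else []
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def run_pipeline(prompt: str, roles: List[Tuple[str, str, Handler]]) -> Iterator[str]:
    """Run (role, pool, handler) steps in order and stream each result as it finishes."""
    outputs: List[dict] = []
    for role, pool, handler in roles:
        text = handler(prompt, outputs)
        result = {"role": role, "pool": pool, "text": text}
        outputs.append(result)
        yield sse_event(result, event=role)
```

Adding a role is just another tuple in the list, which is the property the "N pools" bullet suggests.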
- L2 conversations encrypted with Fernet
- User opt-in (toggle)
- GDPR export + delete
- Pool admin cannot read content
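The privacy bullets combine an opt-in gate, encryption at rest and per-user export/delete. A minimal in-memory sketch using the real `cryptography.fernet.Fernet` API (`generate_key`, `encrypt`, `decrypt`); the `ConversationStore` class and its methods are hypothetical, and where the key is held so that a pool admin cannot read content is not specified in the source and not modelled here:

```python
from typing import Dict, List, Set

from cryptography.fernet import Fernet


class ConversationStore:
    """Hypothetical L2 store: persists only for opted-in users, Fernet-encrypted at rest."""

    def __init__(self, key: bytes) -> None:
        self._fernet = Fernet(key)
        self._opted_in: Set[str] = set()
        self._rows: Dict[str, List[bytes]] = {}  # user_id -> ciphertext tokens

    def set_opt_in(self, user_id: str, enabled: bool) -> None:
        (self._opted_in.add if enabled else self._opted_in.discard)(user_id)

    def append(self, user_id: str, message: str) -> bool:
        if user_id not in self._opted_in:
            return False  # no opt-in: nothing is persisted
        self._rows.setdefault(user_id, []).append(self._fernet.encrypt(message.encode("utf-8")))
        return True

    def raw_rows(self, user_id: str) -> List[bytes]:
        """What a direct table read would show: ciphertext only."""
        return list(self._rows.get(user_id, []))

    def export(self, user_id: str) -> List[str]:
        """GDPR export: decrypt the user's messages."""
        return [self._fernet.decrypt(t).decode("utf-8") for t in self._rows.get(user_id, [])]

    def delete(self, user_id: str) -> int:
        """GDPR delete: drop all rows for the user and return how many were removed."""
        return len(self._rows.pop(user_id, []))
```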
- small: Vertex, Cyclops (2B) + Gladiator, Thor, Jarvis (4B)
- medium: Scout (9B)
- code: Coder (30B-A3B)
- large: RED, Tank (35B-A3B)
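The pool list can be written as data, which also makes "smallest worker" (used by P4) computable. A minimal sketch: the dict layout and function names are hypothetical, a label like `30B-A3B` is read by the common convention of total/active parameters for mixture-of-experts models, and "smallest" is assumed to mean fewest total parameters with ties broken alphabetically:

```python
import re
from typing import Dict, Optional, Set, Tuple

# Pools and sizes as listed above.
POOLS: Dict[str, Dict[str, str]] = {
    "small": {"Vertex": "2B", "Cyclops": "2B", "Gladiator": "4B", "Thor": "4B", "Jarvis": "4B"},
    "medium": {"Scout": "9B"},
    "code": {"Coder": "30B-A3B"},
    "large": {"RED": "35B-A3B", "Tank": "35B-A3B"},
}

SIZE = re.compile(r"^(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$")


def parse_size(label: str) -> Tuple[float, float]:
    """Return (total, active) parameters in billions; dense models have total == active."""
    match = SIZE.match(label)
    if not match:
        raise ValueError(f"unrecognised size label: {label}")
    total = float(match.group(1))
    active = float(match.group(2)) if match.group(2) else total
    return total, active


def smallest_worker(pools: Dict[str, Dict[str, str]], idle: Optional[Set[str]] = None) -> Optional[str]:
    """Pick the worker with the fewest total parameters, optionally among idle ones only."""
    candidates = [
        (parse_size(size)[0], name)
        for workers in pools.values()
        for name, size in workers.items()
        if idle is None or name in idle
    ]
    return min(candidates)[1] if candidates else None
```

Ranking by active parameters instead would put Coder (3B active) ahead of the 4B workers; which measure fits "smallest" depends on whether memory footprint or per-token compute matters more.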