EN FR

Layer 2 — Pool

Backend · OpenAI-compatible universal API · smart routing · zero-knowledge memory

Users / Agents Browser · OpenCode · ClawCode Cursor · aider · iamine CLI Cloudflare + nginx TLS Flexible SSL (→ Full strict TODO) :443 HTTPS · :443 WSS FastAPI — OpenAI-compatible universal API /v1/chat/completions · /v1/models · /v1/federation/* · /v1/opencode-md SSE streaming · tool-calls · auto conv_id · admin_token Auth & Tokens email/Google login email verification (M16) iam_token per user separate admin_token Smart Routing deficit scoring + fit_bonus intent-based tier matching always answer (no 503) cascade 2→3→4 · max conf wins LLM Checker core/checker.py periodic bench skip tool-calls DB config pool_config Sub-agents auto-review (1s) cross-pool (M7a) multi-role pipeline SSE expose review SMART ROUTING PIPELINE — Phase 1→5 Doctrine: "the whole pool works — no dedicated worker" · max confidence wins · non-blocking fallback at each stage P2 Heuristic lexical classify code markers reasoning keywords ~1ms · local 12/12 test cases P3 KNN pgvector MiniLM-L6-v2 384d top-10 neighbors vote ivfflat cosine index ~50ms · skip if n<5 cold start safe P4 LLM idle classify smallest idle worker classify_task via WS 10 tokens · 3s timeout ~1-3s · only if conf<0.7 blacklist 5min on timeout P5 Feedback loop reprompt <30s = flag exclude from KNN vote routing_feedback table auto-corrective /v1/admin/routing_stats Distribution small 52% medium 18% code 18% large 12% was: large 80%+ exclude flagged from KNN Zero-knowledge memory (4-tier, M13) L1 messages observ. TTL raw JSON hot L2 conversations Fernet encrypted compaction opt-in L3 embeddings pgvector user facts filtered RAG L4 hybrid MCP server agent memory federation sync PostgreSQL + pgvector pool brain · DB-first for all persistent data • users · accounts · tokens · sessions • messages · conversations · memory (L1/L2/L3/L4) • federation_state · peers · gossip_log · revenue_ledger • jobs.prompt_embedding vector(384) · routing_feedback • migrations 001 → 025+ · pool_config (admin) WebSocket pool wss://cellule.ai/ws workers join/leave heartbeat · job dispatch · classify_task 9 workers · 4 tiers · 3 pools Admin dashboard admin.html + admin_pool.html DB cleanup · blacklist · PG accounts /v1/admin/routing_stats (Phase 5) admin_token auth cellule.ai Frontend trial chat (pinned 2B model) user dashboard · tokens · tools interactive molecule canvas community wording (no "free") WORKERS (atoms) — small 2B/4B · medium 9B · code 30B-A3B · large 35B-A3B · heterogeneous ← click to see layer 1 atom → Federation (layer 3) anti-entropy gossip · merkle · RAID quorum outbound to other bonded pools → Economy (layer 4) revenue_ledger · settlement · slashing · $IAMINE fed by each inference KNN + embeddings dispatch job by tier

Universal API

  • • OpenAI-compatible (chat/completions)
  • • SSE streaming tokens + tool-calls
  • • auto conv_id (L2 persistence)
  • • clients: OpenCode, Cursor, aider, ClawCode

Smart Routing P2→P5

  • • P2 heuristic: regex+keywords (~1ms)
  • • P3 KNN: MiniLM 384d pgvector (~50ms)
  • • P4 LLM idle: smallest worker classifies (~1-3s)
  • • P5 feedback: reprompt <30s auto-excludes
  • • Result: small 52% | large 12% (was 80%+)

DB-first · pgvector

  • • All persistent data in PostgreSQL
  • • prompt_embedding vector(384) per job
  • • routing_feedback auto-corrective
  • • Versioned migrations 001→025+

Sub-agents LIVE

  • • Auto-review (1s, from phase 1)
  • • Cross-pool forward (M7a)
  • • Multi-role pipeline (N-pool scale)
  • • SSE exposes review to clients

Zero-knowledge memory

  • • L2 conversations Fernet encrypted
  • • User opt-in (toggle)
  • • GDPR export + delete
  • • Pool admin cannot read content

9 workers · 4 tiers

  • • small: Vertex, Cyclops (2B) + Gladiator, Thor, Jarvis (4B)
  • • medium: Scout (9B)
  • • code: Coder (30B-A3B)
  • • large: RED, Tank (35B-A3B)