# Project State

## Project Reference

A high-performance, locally hosted AI assistant system built on two RTX 3060 12GB GPUs. It uses a "2+0" architecture: Machine A is a dedicated inference server running large language models, while Machine B handles the user interface (VS Code, Discord) and tool execution.

## Current Position

Phase: 02-api-engine (Complete) -> Ready for Phase 3
Plan: None
Status: Transitioning to Phase 3

## Progress

[████████████████████] 100% (Phase 01: LLM Tuning)
[████████████████████] 100% (Phase 02: API Engine)

## Completed Phases

- Phase 01 (LLM Tuning): optimal settings finalized for 5 models (74.65 / 61.62 / 16.0 / 16.7 / 8.95 tokens/s).
- Phase 02 (API Engine): Variet Engine v1.0, a FastAPI proxy with model hot-swap and HTTP 503 protection.

## Recent Decisions

- 2+0 GPU architecture (Machine A: API server; Machine B: tools).
- 5-tier model strategy: fast / balanced / deep-coder / deep-logic / ultra.
- GPU 0 is limited to PCIe x4, so the 122B MoE model runs on GPU 1 alone.
- Variet Engine: a single-port (8000) FastAPI reverse proxy.
- `config/engine_models.json` is the single source of truth for all settings.

## Pending Todos

0 pending.

## Blockers/Concerns

None.

## Next Phases (Suggested)

- Phase 03: VS Code Extension (agent loop, tool integration)
- Phase 04: Discord Bot (personal assistant, slash commands)
- Phase 05: MCP Tools (SearXNG, Calendar, Gmail)

## Session Continuity

Last session: 2026-04-07T18:07:00+09:00
Stopped at: Phase 02 complete, GSD sync in progress
Resume file: .planning/HANDOFF.json
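
The Variet Engine decisions under Recent Decisions (one FastAPI proxy on port 8000, model hot-swap, 503 protection, and `config/engine_models.json` as the single source of truth) can be illustrated with a minimal sketch of the routing logic. This is not the project's implementation. The JSON schema, the placeholder model names, and the `SwapGuard` class are hypothetical. The sketch also assumes that "503 protection" means the proxy rejects requests with HTTP 503 while a model swap is running. It uses only the standard library, so it leaves out the FastAPI layer.

```python
import json
import threading

# Hypothetical shape of config/engine_models.json (tier -> model entry).
# Tier names follow the 5-tier strategy; model names are placeholders.
EXAMPLE_CONFIG = """
{
  "tiers": {
    "fast":       {"model": "placeholder-fast"},
    "balanced":   {"model": "placeholder-balanced"},
    "deep-coder": {"model": "placeholder-deep-coder"},
    "deep-logic": {"model": "placeholder-deep-logic"},
    "ultra":      {"model": "placeholder-ultra"}
  }
}
"""


class SwapGuard:
    """Hypothetical helper: resolves tiers and answers 503 during a hot-swap."""

    def __init__(self, config: dict) -> None:
        self.tiers = config["tiers"]
        self.loaded = None    # model currently served by the backend
        self.target = None    # model being loaded by an in-progress swap
        self.swapping = False
        self._lock = threading.Lock()

    def needs_swap(self, tier: str) -> bool:
        """True if serving `tier` requires loading a different model."""
        with self._lock:
            return self.tiers[tier]["model"] != self.loaded

    def begin_swap(self, tier: str) -> None:
        """Mark a swap toward the model configured for `tier` as started."""
        with self._lock:
            self.swapping = True
            self.target = self.tiers[tier]["model"]

    def finish_swap(self) -> None:
        """Record the swapped-in model as loaded and reopen routing."""
        with self._lock:
            self.loaded = self.target
            self.target = None
            self.swapping = False

    def route(self, tier: str) -> tuple[int, str]:
        """Return (HTTP status, body) for a request aimed at `tier`."""
        if tier not in self.tiers:
            return 404, f"unknown tier: {tier}"
        with self._lock:
            if self.swapping:
                # Backend is reloading weights: reject instead of queueing
                # requests onto a half-loaded server.
                return 503, "model swap in progress, retry later"
            return 200, self.tiers[tier]["model"]


if __name__ == "__main__":
    guard = SwapGuard(json.loads(EXAMPLE_CONFIG))
    guard.begin_swap("fast")
    print(guard.route("fast"))   # (503, 'model swap in progress, retry later')
    guard.finish_swap()
    print(guard.route("fast"))   # (200, 'placeholder-fast')
```

In this sketch, a request handler would call `needs_swap()` first and start a swap if one is needed. While `swapping` is set, every request gets an immediate 503 instead of blocking on a backend that is still loading. Keeping the tier table in one parsed JSON object mirrors the single-source-of-truth decision. Tiers can be remapped without code changes.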