---
gsd_state_version: 1.0
milestone: v1.1
milestone_name: milestone
status: planning
last_updated: "2026-04-11T10:30:00.000Z"
last_activity: 2026-04-11
progress:
  total_phases: 3
  completed_phases: 2
  total_plans: 3
  completed_plans: 2
---

# Project State

## Project Reference

A high-performance, locally hosted AI assistant system built on two RTX 3060 12GB GPUs. It uses a "2+0" architecture in which Machine A acts as a dedicated inference server running large language models, while Machine B handles the user interface (VS Code, Discord) and tool execution.

## Current Position

Phase: 05
Plan: 05-PLAN.md (1 of 1)
Status: Ready to execute
Last activity: 2026-04-08

## Progress

[████████████████████] 100% (Phase 01: LLM Tuning)
[████████████████████] 100% (Phase 02: API Engine)

## Completed Phases

- Phase 01 (LLM Tuning): optimal settings finalized for 5 models (71.89‡ / 64.16† / 16.0 / 16.7 / 8.95 t/s). † balanced and ‡ fast were retuned on 2026-04-11.
- Phase 02 (API Engine): Variet Engine v1.0 (FastAPI proxy + hot-swap + 503 protection).

## Recent Decisions

- 2+0 GPU Architecture (Machine A API Server, Machine B tools).
- 5-tier model strategy: fast/balanced/deep-coder/deep-logic/ultra.
- GPU 0 is constrained to PCIe x4 → the 122B MoE model runs on GPU 1 alone.
- Variet Engine: single-port (8000) FastAPI reverse proxy.
- `config/engine_models.json` → Single Source of Truth for all settings.
- CLI-first verification strategy: validate the agent loop with the OpenClaude CLI before moving to the VS Code Extension.
- balanced role v3 retuning (2026-04-11): `-ub 256`, `-ts 0.48,0.52`, `--no-mmproj-offload`; auxiliary options (mlock/poll/prio) removed. Measured decode 61.62 → 64.16 t/s; prefill 649 → 1,157 t/s (+78%). Details: `docs/v3_balanced_retuning_log.md`.
- fast role v3 retuning (2026-04-11): `cache-type q8_0`, `-ts 0.43,0.57`, `--mmproj` loaded on the GPU; auxiliary options removed. Measured 74.65 → 71.89 t/s (-3.7%). Vision GPU support added (image encoding takes ~1 second). Speculative Decoding (E2B draft) was tested and not adopted. Details: `docs/v3_fast_retuning_log.md`.

## Roadmap Evolution

- Phase 6 added: Install and evaluate Hermes Agent

## Pending Todos

0 pending.

## Blockers/Concerns

None.

## Session Continuity

Last session: 2026-04-11T10:30:00+09:00
Stopped at: balanced role v3 retuning complete; evidence saved to `config/engine_models.json`, `docs/v3_balanced_retuning_log.md`, and the Phase 01 `VERIFICATION.md`. Awaiting selection of the next task.
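
For reference when resuming, the throughput deltas quoted under Recent Decisions can be re-derived from the before/after figures. The snippet below is only a sanity-check sketch: the `pct_change` helper and the dictionary labels are illustrative names, not part of the project, and the numbers are copied from the retuning entries above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after figures in tokens per second, copied from Recent Decisions.
# The labels are illustrative, not identifiers used by the project.
measurements = {
    "balanced decode": (61.62, 64.16),
    "balanced prefill": (649.0, 1157.0),
    "fast decode": (74.65, 71.89),
}

deltas = {
    name: round(pct_change(before, after), 1)
    for name, (before, after) in measurements.items()
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}%")
# balanced decode: +4.1%
# balanced prefill: +78.3%
# fast decode: -3.7%
```

The prefill and fast-role results agree with the +78% and -3.7% recorded above; the +4.1% for balanced decode is derived here from the recorded 61.62 → 64.16 t/s and is not stated in the retuning entry itself.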