---
gsd_state_version: 1.0
milestone: v1.1
milestone_name: milestone
status: planning
last_updated: "2026-04-07T12:25:10.234Z"
last_activity: 2026-04-07
progress:
total_phases: 3
completed_phases: 1
total_plans: 1
completed_plans: 1
---
# Project State
## Project Reference
A high-performance, locally hosted AI assistant system built on two RTX 3060 12GB GPUs. It uses a "2+0" architecture in which Machine A acts as a dedicated inference server running large language models, while Machine B handles the user interface (VS Code, Discord) and tool execution.
## Current Position
Phase: 04
Plan: Not started
Status: Ready to plan
Last activity: 2026-04-07
## Progress
[████████████████████] 100% (Phase 01: LLM Tuning)
[████████████████████] 100% (Phase 02: API Engine)
## Completed Phases
- Phase 01 (LLM Tuning): Optimal settings finalized for 5 models (74.65 / 61.62 / 16.0 / 16.7 / 8.95 t/s)
- Phase 02 (API Engine): Variet Engine v1.0: FastAPI proxy + hot-swap + 503 protection
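
As a rough illustration of how hot-swap and 503 protection might fit together, the sketch below models a gate that rejects requests while a model swap is in progress. It assumes "503 protection" means answering with HTTP 503 during a swap; the `SwapGate` class and its method names are hypothetical and are not taken from Variet Engine.

```python
import threading
from http import HTTPStatus
from typing import Optional, Tuple


class SwapGate:
    """Hypothetical gate: tracks the loaded model and whether a swap is running."""

    def __init__(self, initial_model: str) -> None:
        self._lock = threading.Lock()
        self._active = initial_model
        self._swapping = False

    def begin_swap(self) -> None:
        # Mark the backend as unavailable before unloading the current model.
        with self._lock:
            self._swapping = True

    def finish_swap(self, new_model: str) -> None:
        # Record the newly loaded model and reopen the gate.
        with self._lock:
            self._active = new_model
            self._swapping = False

    def admit(self) -> Tuple[HTTPStatus, Optional[str]]:
        """Return (200, active model) normally, or (503, None) during a swap."""
        with self._lock:
            if self._swapping:
                return HTTPStatus.SERVICE_UNAVAILABLE, None
            return HTTPStatus.OK, self._active
```

A proxy handler could call `admit()` before forwarding each request and return the 503 status directly when the gate is closed, so clients get a clear retry signal rather than a hung connection.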
## Recent Decisions
- 2+0 GPU Architecture (Machine A API Server, Machine B tools).
- 5-tier model strategy: fast/balanced/deep-coder/deep-logic/ultra.
- GPU 0 is limited to PCIe x4, so the 122B MoE model runs on GPU 1 alone.
- Variet Engine: single-port (8000) FastAPI reverse proxy.
- config/engine_models.json is the Single Source of Truth for all settings.
- CLI-First validation strategy: validate the agent loop with the OpenClaude CLI before building the VS Code Extension.
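
The single-source-of-truth decision can be sketched as one loader that every component uses to resolve a tier name to a model. The actual schema of config/engine_models.json is not shown here, so the JSON layout, the placeholder model names, and the `load_engine_config` and `resolve_model` helpers below are all assumptions for illustration; only the five tier names and port 8000 come from the decisions above.

```python
import json

# The five tiers named in the model strategy.
TIERS = ("fast", "balanced", "deep-coder", "deep-logic", "ultra")

# Hypothetical file contents; the real schema may differ.
EXAMPLE_CONFIG = """
{
  "port": 8000,
  "models": {
    "fast": {"model": "placeholder-fast"},
    "balanced": {"model": "placeholder-balanced"},
    "deep-coder": {"model": "placeholder-deep-coder"},
    "deep-logic": {"model": "placeholder-deep-logic"},
    "ultra": {"model": "placeholder-ultra"}
  }
}
"""


def load_engine_config(text: str) -> dict:
    """Parse the config and fail early if any tier is undefined."""
    config = json.loads(text)
    models = config.get("models", {})
    missing = [tier for tier in TIERS if tier not in models]
    if missing:
        raise ValueError(f"tiers missing from config: {missing}")
    return config


def resolve_model(config: dict, tier: str) -> str:
    """Map a tier name to its configured model name."""
    try:
        return config["models"][tier]["model"]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

Validating all tiers at load time means a missing entry surfaces at startup rather than on the first request that happens to need that tier.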
## Pending Todos
0 pending.
## Blockers/Concerns
None.
## Session Continuity
Last session: 2026-04-07T20:38:00+09:00
Milestone: v1.1 OpenClaude CLI Integration