Phase 01 (LLM Tuning):
- Gemma4 26B: 74.65 t/s (fast)
- Qwen 35B: 61.62 t/s (balanced)
- Gemma4 31B: 16.0 t/s (deep-coder)
- Qwen 27B: 16.7 t/s (deep-logic)
- Qwen 122B: 8.95 t/s (ultra, GPU 1 only)

Phase 02 (API Engine):
- FastAPI reverse proxy on port 8000
- /engine/switch hot-swap with 503 protection
- config/engine_models.json as single source of truth
- Replaced 4 individual .bat files with unified engine

File cleanup:
- scripts/ 85 files -> 9 + _archive/
- Root .bat files -> _archive/

# Project State

## Project Reference
A high-performance, locally-hosted AI assistant system built on two RTX 3060 12GB GPUs. It uses a "2+0" architecture where Machine A acts as a dedicated inference server running large language models, while Machine B handles the user interface (VS Code, Discord) and tool execution.

## Current Position
Phase: 02-api-engine (Complete) -> Ready for Phase 3
Plan: None
Status: Transitioning to Phase 3

## Progress
[████████████████████] 100% (Phase 01: LLM Tuning)
[████████████████████] 100% (Phase 02: API Engine)

## Completed Phases
- Phase 01 (LLM Tuning): Optimal settings finalized for 5 models (74.65 / 61.62 / 16.0 / 16.7 / 8.95 t/s)
- Phase 02 (API Engine): Variet Engine v1.0: FastAPI proxy + hot-swap + 503 protection

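The Variet Engine source is not part of this file, so the following is only a minimal sketch of what "hot-swap + 503 protection" could look like: while a model switch is running, proxied requests are answered with HTTP 503 instead of being forwarded to a backend that is mid-reload. The `EngineState` class, its method names, and the `load_model` callback are hypothetical, and the sketch uses only the standard library rather than FastAPI; the real engine may work differently.

```python
import threading
from typing import Callable, Tuple


class EngineState:
    """Hypothetical switch guard: tracks the active model and whether a swap is running."""

    def __init__(self, active_model: str) -> None:
        self._lock = threading.Lock()
        self.active_model = active_model
        self.switching = False

    def handle_request(self, path: str) -> Tuple[int, str]:
        """Return (status, body) for a proxied request; 503 while a swap is running."""
        with self._lock:
            if self.switching:
                return 503, "model switch in progress, retry shortly"
            model = self.active_model
        # A real proxy would forward the request to the backend here.
        return 200, f"forwarded {path} to {model}"

    def switch(self, new_model: str, load_model: Callable[[str], None]) -> Tuple[int, str]:
        """Swap the active model; reject a second switch that overlaps the first."""
        with self._lock:
            if self.switching:
                return 503, "another switch is already in progress"
            self.switching = True
        try:
            # Slow step (assumed): stop the old backend and start the new one.
            load_model(new_model)
            with self._lock:
                self.active_model = new_model
            return 200, f"switched to {new_model}"
        finally:
            # Always clear the flag, even if loading failed, so traffic resumes.
            with self._lock:
                self.switching = False
```

The lock is held only while the flag and model name are read or written, never during the slow load, so requests arriving mid-switch get an immediate 503 rather than blocking. If `load_model` raises, the previous model stays active.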
## Recent Decisions
- 2+0 GPU Architecture (Machine A API Server, Machine B tools).
- 5-tier model strategy: fast/balanced/deep-coder/deep-logic/ultra.
- GPU 0 PCIe x4 constraint → the 122B MoE runs on GPU 1 only.
- Variet Engine: single-port (8000) FastAPI reverse proxy.
- config/engine_models.json → Single Source of Truth for all settings.

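To illustrate the "single source of truth" decision, here is a hedged sketch of how one JSON file could drive the 5-tier lookup. The actual schema of `config/engine_models.json` is not shown in this file, so the field names (`model`, `tokens_per_sec`, `gpus`) and the helpers `load_models` and `resolve` are assumptions; the model names and t/s figures are the recorded Phase 01 results.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical layout for config/engine_models.json; the real schema is not shown.
# Model names and t/s figures are the Phase 01 results recorded for this project.
EXAMPLE_CONFIG = """
{
  "fast":       {"model": "Gemma4 26B", "tokens_per_sec": 74.65},
  "balanced":   {"model": "Qwen 35B",   "tokens_per_sec": 61.62},
  "deep-coder": {"model": "Gemma4 31B", "tokens_per_sec": 16.0},
  "deep-logic": {"model": "Qwen 27B",   "tokens_per_sec": 16.7},
  "ultra":      {"model": "Qwen 122B",  "tokens_per_sec": 8.95, "gpus": [1]}
}
"""


@dataclass(frozen=True)
class ModelEntry:
    tier: str
    model: str
    tokens_per_sec: float
    gpus: Optional[List[int]] = None  # None means "no pinning recorded" in this sketch


def load_models(raw: str) -> Dict[str, ModelEntry]:
    """Parse the JSON text into one ModelEntry per tier."""
    data = json.loads(raw)
    return {tier: ModelEntry(tier=tier, **fields) for tier, fields in data.items()}


def resolve(models: Dict[str, ModelEntry], tier: str) -> ModelEntry:
    """Look up a tier, failing loudly on names the config does not define."""
    try:
        return models[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(models)}") from None
```

With this shape, `resolve(load_models(EXAMPLE_CONFIG), "ultra")` returns the 122B entry pinned to GPU 1, and adding or retuning a model means editing the JSON file only.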
## Pending Todos
0 pending.

## Blockers/Concerns
None.

## Next Phases (Suggested)
- Phase 03: VS Code Extension (agent loop, tool integration)
- Phase 04: Discord Bot (personal assistant, slash commands)
- Phase 05: MCP Tools (SearXNG, Calendar, Gmail)

## Session Continuity
Last session: 2026-04-07T18:07:00+09:00
Stopped at: Phase 02 complete, GSD sync in progress
Resume file: .planning/HANDOFF.json