GPU 0 PCIe 3.0 x4 (3.94 GB/s) ← bottleneck GPU 1 PCIe 4.0 x16 (31.5 GB/s) Total VRAM: 24 GB
GPU 0의 PCIe x4 제약이 이중 GPU 추론 속도 상한을 결정합니다.
-ub 128 → 256
-ts 0.5,0.5 → 0.48,0.52
--mmproj models/mmproj-F16.gguf
--no-mmproj-offload
--mlock/--poll/--prio/-t/-tb
cache-type: f16 → q8_0
-ts 0.43,0.57
--mmproj models/gemma-4-26B-mmproj-F16.gguf
Phase 01: ✅ COMPLETE (2026-04-07 초기, 2026-04-11 재검증)
Deleting the wiki page "Phase-01-LLM-Tuning" cannot be undone. Continue?