feat: Variet Engine v1.0 + 5-model tuning complete
Phase 01 (LLM Tuning):
- Gemma4 26B: 74.65 t/s (fast)
- Qwen 35B: 61.62 t/s (balanced)
- Gemma4 31B: 16.0 t/s (deep-coder)
- Qwen 27B: 16.7 t/s (deep-logic)
- Qwen 122B: 8.95 t/s (ultra, GPU 1 only)

Phase 02 (API Engine):
- FastAPI reverse proxy on port 8000
- /engine/switch hot-swap with 503 protection
- config/engine_models.json as single source of truth
- Replaced 4 individual .bat files with unified engine

File cleanup:
- scripts/ 85 files -> 9 + _archive/
- Root .bat files -> _archive/
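The "hot-swap with 503 protection" item can be illustrated with a minimal, framework-free sketch. It is not the engine's actual code: the class `EngineState`, its method names, and the shape of the model table are hypothetical, and the real implementation sits behind FastAPI routes. The idea shown is only that requests arriving while a switch is running get a 503 instead of being forwarded to a backend that is restarting.

```python
import threading


class EngineState:
    """Tracks the active model and whether a switch is running (hypothetical)."""

    def __init__(self, models, active):
        if active not in models:
            raise KeyError(active)
        self._models = models        # name -> launch settings (assumed schema)
        self._active = active
        self._switching = False
        self._lock = threading.Lock()

    @property
    def active(self):
        with self._lock:
            return self._active

    def handle_request(self):
        """Return an (HTTP status, body) pair for a proxied request."""
        with self._lock:
            if self._switching:
                return 503, "model switch in progress"
            return 200, f"forwarded to {self._active}"

    def switch(self, name, load_model):
        """Swap to `name`; other requests see 503 until loading finishes."""
        with self._lock:
            if name not in self._models:
                return 404, f"unknown model {name}"
            if self._switching:
                return 503, "another switch is already running"
            self._switching = True
        try:
            # Slow step, done outside the lock: stop old backend, start new one.
            load_model(self._models[name])
            with self._lock:
                self._active = name
            return 200, f"switched to {name}"
        finally:
            with self._lock:
                self._switching = False
```

If `load_model` raises, the previous model stays active and the switching flag is cleared, so the proxy does not stay stuck returning 503.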
31
scripts/_archive/results/dual_gpu_summary.txt
Normal file
@@ -0,0 +1,31 @@
Dual-GPU Benchmark v2 — 2026-04-06T06:52:08.868Z
2x RTX 3060 12GB | 256K Context | 58 configs | 69.4 min

=======================================================
RANKING
=======================================================

🥇 #1: Gemma4-26B Q4_K_M
AVG: 76.4 t/s | BEST: 76.75 t/s | Boot: 9s
ngl=999 t=6 ub=512 b=2048 ctk=f16 ctv=f16

🥈 #2: Gemma4-26B MXFP4_MOE
AVG: 64.05 t/s | BEST: 64.29 t/s | Boot: 9s
ngl=999 t=2 ub=512 b=2048 ctk=q8_0 ctv=q8_0

🥉 #3: Qwen3.5-35B Q4_K_M
AVG: 52.05 t/s | BEST: 54.48 t/s | Boot: 12.1s
ngl=999 t=4 ub=256 b=1024 ctk=q4_0 ctv=q4_0
--n-cpu-moe 5

#4: Qwen3.5-35B MXFP4_MOE
AVG: 46.66 t/s | BEST: 47.09 t/s | Boot: 15s
ngl=999 t=6 ub=512 b=2048 ctk=q4_0 ctv=q4_0
--n-cpu-moe 5

=======================================================
★ CHAMPION: Gemma4-26B Q4_K_M — 76.4 t/s
=======================================================

Recommended:
llama-server --model models\gemma-4-26B-A4B-it-Q4_K_M.gguf -ngl 999 -c 262144 -t 6 -tb 6 -ub 512 -b 2048 -fa on --cache-type-k f16 --cache-type-v f16 --prio 3 --poll 50 --mlock --port 8000 --host 0.0.0.0
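As a rough illustration of how one set of tuned settings could be turned into the recommended command above, here is a small Python sketch. The function `build_llama_server_args` and the dictionary keys (`model`, `ngl`, `ctx`, `threads`, and so on) are hypothetical and do not reflect the actual schema of config/engine_models.json. The flags are only those that appear in this summary, `-tb` is assumed to mirror `-t`, and `--prio 3 --poll 50 --mlock` are hard-coded as in the recommended line.

```python
def build_llama_server_args(cfg):
    """Build an argv list for llama-server from a settings dict (assumed keys)."""
    args = ["llama-server", "--model", cfg["model"]]
    args += ["-ngl", str(cfg["ngl"]), "-c", str(cfg["ctx"])]
    # Assumption: batch threads (-tb) use the same value as -t.
    args += ["-t", str(cfg["threads"]), "-tb", str(cfg["threads"])]
    args += ["-ub", str(cfg["ubatch"]), "-b", str(cfg["batch"]), "-fa", "on"]
    args += ["--cache-type-k", cfg["ctk"], "--cache-type-v", cfg["ctv"]]
    # Only the Qwen3.5-35B entries in the ranking carry --n-cpu-moe.
    if "n_cpu_moe" in cfg:
        args += ["--n-cpu-moe", str(cfg["n_cpu_moe"])]
    args += ["--prio", "3", "--poll", "50", "--mlock"]
    args += ["--port", str(cfg["port"]), "--host", cfg["host"]]
    return args


# Settings of the #1 entry (Gemma4-26B Q4_K_M) from the ranking.
champion = {
    "model": r"models\gemma-4-26B-A4B-it-Q4_K_M.gguf",
    "ngl": 999, "ctx": 262144, "threads": 6,
    "ubatch": 512, "batch": 2048,
    "ctk": "f16", "ctv": "f16",
    "port": 8000, "host": "0.0.0.0",
}

command = " ".join(build_llama_server_args(champion))
print(command)
```

Keeping the arguments as a list rather than one string means the same value can be passed directly to `subprocess.Popen` without shell quoting issues.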