feat(engine): enable jinja thinking for the balanced role + offload checkpoints to RAM

- Add --jinja and --chat-template-kwargs '{"enable_thinking":true}'
- -cram 8192: store context checkpoints in CPU RAM instead of on the GPU
  (prevents GPU CUDA OOM crashes: cuMemSetAccess failure at device:1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Variet Worker
2026-04-12 23:44:15 +09:00
parent e02626fda8
commit f3e9e9f053
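
The flags added here are stored as a flat argument list, with the template kwargs carried as a JSON string inside a JSON config. As a rough sanity check (a sketch, not part of the commit), the snippet below copies the added arguments from the diff, confirms that the kwargs value decodes as JSON, and prints the equivalent shell command line. The helper `flag_value` and the binary name `llama-server` are assumptions made for illustration; the diff shows neither.

```python
import json
import shlex

# Arguments added by this commit, copied from the diff. The kwargs string is
# the value after the config file's \" escapes have been decoded.
added_args = [
    "--jinja",
    "--chat-template-kwargs", '{"enable_thinking":true}',
    "-cram", "8192",
]


def flag_value(args, flag):
    """Hypothetical helper: return the argument that follows `flag`."""
    return args[args.index(flag) + 1]


# The kwargs value must itself be valid JSON once the outer escaping is gone.
template_kwargs = json.loads(flag_value(added_args, "--chat-template-kwargs"))
cache_ram = int(flag_value(added_args, "-cram"))

# "llama-server" is an assumed binary name; the diff does not show it.
command_line = shlex.join(["llama-server", *added_args])

print(template_kwargs)
print(cache_ram)
print(command_line)
```

The printed command line quotes the kwargs argument the same way the commit message writes it, which makes it easy to compare the config entry against a manual invocation.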

@@ -46,6 +46,9 @@
 "--mmproj",
 "models/mmproj-F16.gguf",
 "--no-mmproj-offload",
+"--jinja",
+"--chat-template-kwargs",
+"{\"enable_thinking\":true}",
 "-ngl",
 "999",
 "-c",
@@ -63,7 +66,9 @@
 "-b",
 "512",
 "-ts",
-"0.48,0.52"
+"0.48,0.52",
+"-cram",
+"8192"
 ]
 },
 "deep-coder": {