feat(engine): jinja thinking for the balanced role + checkpoint RAM offload

- Add --jinja and --chat-template-kwargs '{"enable_thinking":true}'
- -cram 8192: store context checkpoints in CPU RAM instead of on the GPU
  (prevents the GPU CUDA OOM crash: cuMemSetAccess failure at device:1)
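
As a rough sanity check of the new arguments, the sketch below decodes them the way a launcher might. The argument values are copied from the diff; the flat-list layout, the `flag_value` helper, and the variable names are illustrative assumptions, not part of this commit.

```python
import json

# Hypothetical excerpt: only the arguments this commit touches, in a flat
# argv-style JSON list. The surrounding config structure is omitted.
args_json = r'''
["--jinja",
 "--chat-template-kwargs", "{\"enable_thinking\":true}",
 "-ts", "0.48,0.52",
 "-cram", "8192"]
'''

args = json.loads(args_json)


def flag_value(args, flag):
    """Return the argument that follows `flag` in a flat argv-style list."""
    return args[args.index(flag) + 1]


# The template kwargs are JSON nested inside a JSON string, so decode twice.
kwargs = json.loads(flag_value(args, "--chat-template-kwargs"))
split = [float(x) for x in flag_value(args, "-ts").split(",")]
cram = int(flag_value(args, "-cram"))

print(kwargs, split, cram)
```

The double decode is the point of the example: the config file stores the kwargs as an escaped string, so a typo in the escaping would only surface when the inner JSON is parsed.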

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Variet Worker
2026-04-12 23:44:15 +09:00
parent e02626fda8
commit f3e9e9f053

@@ -46,6 +46,9 @@
 "--mmproj",
 "models/mmproj-F16.gguf",
 "--no-mmproj-offload",
+"--jinja",
+"--chat-template-kwargs",
+"{\"enable_thinking\":true}",
 "-ngl",
 "999",
 "-c",
@@ -63,7 +66,9 @@
 "-b",
 "512",
 "-ts",
-"0.48,0.52"
+"0.48,0.52",
+"-cram",
+"8192"
 ]
 },
 "deep-coder": {