Session 2026-04-11 — Phase 01 Retuning & Hermes Agent v0.8.0
Type: Maintenance
Focus: LLM tuning re-verification, Hermes Agent update, primary model switch
Summary
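This session combined a re-verification of Phase 01 (LLM Tuning) with a major Hermes Agent update. The tuning parameters for both LLM roles were refreshed, Qwen 3.5 35B was promoted to primary model, and Hermes Agent was updated from v0.7.x to v0.8.0.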
Key Outcomes
LLM Engine
- balanced (Qwen 3.5 35B-A3B): 61.62 → 64.16 t/s (+4.1%)
- fast (Gemma 4 26B-A4B): 74.65 → 71.89 t/s (trade-off for adding Vision on the GPU)
- Default role: fast → balanced
- Both roles: removed --mlock/--poll/--prio/-t/-tb
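The relative changes for the two roles can be re-derived from the before/after throughput figures above. The following is a minimal sketch; the helper name `percent_change` is illustrative and is not taken from the project's benchmark scripts.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Throughput figures (tokens/s) copied from the LLM Engine list.
runs = {
    "balanced": (61.62, 64.16),
    "fast": (74.65, 71.89),
}

for role, (before, after) in runs.items():
    delta = percent_change(before, after)
    print(f"{role}: {before} -> {after} t/s ({delta:+.1f}%)")

# balanced: 61.62 -> 64.16 t/s (+4.1%)
# fast: 74.65 -> 71.89 t/s (-3.7%)
```

The balanced figure matches the +4.1% recorded above. The log does not state a percentage for the fast role; the -3.7% here follows only from the two recorded numbers.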
Qwen Primary Promotion
- Speed difference: 1.25 t/s (negligible)
- 35B has a quality advantage over 26B
- Strengths: thinking mode, Korean, coding
- Vision CPU offload accepted (6.4 s/image)
Hermes Agent
- Version: fff237e1 → e902e55b (340 commits, v0.8.0)
- 8 local patches auto-merged (0 conflicts)
- Config: custom/qwen3.5-35b-a3b + DISCORD_HOME_CHANNEL set manually
Experiments (Not Adopted)
- Speculative Decoding (E2B draft): +14% generation speed vs -31% cold start → not adopted
- llama.cpp b8757: 9% regression on Gemma 4 → staying on b8660
Files Changed
config/engine_models.json
docs/v3_balanced_retuning_log.md (new)
docs/v3_fast_retuning_log.md (new)
.planning/reports/20260411-session-report.md (new)
.planning/phases/01-llm-tuning/VERIFICATION.md (updated)
.planning/STATE.md (updated)
.planning/HANDOFF.json (updated)
agents/hermes-agent (submodule bump)
scripts/bench_short.py, bench_long.py, test_ts_ratios.py (new utilities)
Git History
Hardware Constraints Documented
- GPU 0: PCIe 3.0 x4 (3.94 GB/s) — bottleneck
- GPU 1: PCIe 4.0 x16 (31.5 GB/s)
- Total VRAM: 24 GB (2x RTX 3060 12GB)
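The bandwidth figures above appear to match the theoretical one-direction link rate for each slot rather than a measured transfer rate; that reading is an inference, not something the log states. A minimal sketch of the arithmetic, using the standard per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b encoding:

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (protocol overhead ignored)."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


gpu0 = pcie_bandwidth_gb_s(gen=3, lanes=4)    # GPU 0 slot
gpu1 = pcie_bandwidth_gb_s(gen=4, lanes=16)   # GPU 1 slot

print(f"GPU 0: {gpu0:.2f} GB/s")             # GPU 0: 3.94 GB/s
print(f"GPU 1: {gpu1:.1f} GB/s")             # GPU 1: 31.5 GB/s
print(f"Ratio: {gpu1 / gpu0:.0f}x")          # Ratio: 8x
print(f"Total VRAM: {2 * 12} GB")            # Total VRAM: 24 GB
```

Under this reading, the GPU 0 link has one eighth of the GPU 1 link's bandwidth, which is consistent with it being flagged as the bottleneck.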
Next Session
- Run /gsd-resume-work to reload state from HANDOFF.json
- Options:
  - Start Hermes Agent (run_hermes_agent.bat)
  - Resume Phase 05 (VS Code Extension Packaging)
  - New feature development