Session 2026 04 11
Variet Agent edited this page 2026-04-11 18:16:53 +09:00

Session 2026-04-11 — Phase 01 Retuning & Hermes Agent v0.8.0

Type: Maintenance
Focus: LLM tuning re-verification, Hermes Agent update, primary model switch


Summary

Ran the Phase 01 (LLM Tuning) re-verification session in parallel with a major Hermes Agent update. Updated the tuned parameters for both LLM roles, promoted Qwen 3.5 35B to primary model, and completed the Hermes Agent v0.7.x → v0.8.0 update.

Key Outcomes

LLM Engine

  • balanced (Qwen 3.5 35B-A3B): 61.62 → 64.16 t/s (+4.1%)
  • fast (Gemma 4 26B-A4B): 74.65 → 71.89 t/s (-3.7%; trade-off for adding Vision on GPU)
  • Default role: fast → balanced
  • Removed --mlock/--poll/--prio/-t/-tb from both roles
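
The relative changes in the list above can be re-derived from the before/after throughput figures. The following is a minimal sketch, not part of the session's benchmark scripts; the helper name `pct_change` and the `results` dictionary are illustrative assumptions.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Throughput in tokens/s, copied from the LLM Engine list above.
# The dictionary layout is hypothetical, used only for this illustration.
results = {
    "balanced": (61.62, 64.16),
    "fast": (74.65, 71.89),
}

for role, (before, after) in results.items():
    delta = pct_change(before, after)
    print(f"{role}: {before} -> {after} t/s ({delta:+.1f}%)")
```

Rounded to one decimal place, this gives +4.1% for balanced and -3.7% for fast.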

Qwen Primary Promotion

  • Speed difference: 1.25 t/s (negligible)
  • Quality advantage: 35B > 26B
  • Thinking mode plus strong Korean and coding performance
  • Accepted Vision CPU offload (6.4 s/image)

Hermes Agent

  • Version: fff237e1e902e55b (340 commits, v0.8.0)
  • 8 local patches auto-merged (0 conflicts)
  • Config: custom/qwen3.5-35b-a3b + DISCORD_HOME_CHANNEL set manually

Experiments (Not Adopted)

  • Speculative Decoding (E2B draft): +14% generation vs. -31% cold start → not adopted
  • llama.cpp b8757: 9% regression on Gemma 4 → staying on b8660

Files Changed

  • config/engine_models.json
  • docs/v3_balanced_retuning_log.md (new)
  • docs/v3_fast_retuning_log.md (new)
  • .planning/reports/20260411-session-report.md (new)
  • .planning/phases/01-llm-tuning/VERIFICATION.md (updated)
  • .planning/STATE.md (updated)
  • .planning/HANDOFF.json (updated)
  • agents/hermes-agent (submodule bump)
  • scripts/bench_short.py, bench_long.py, test_ts_ratios.py (new utilities)

Git History

e02626f chore(session): pause work — Qwen promoted to primary + Hermes v0.8.0
0dee779 refactor(phase-01): v3 retune fast & balanced roles

Hardware Constraints Documented

  • GPU 0: PCIe 3.0 x4 (3.94 GB/s) — bottleneck
  • GPU 1: PCIe 4.0 x16 (31.5 GB/s)
  • Total VRAM: 24 GB (2x RTX 3060 12GB)
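
The bandwidth figures above match the theoretical per-direction PCIe link rates. As a rough sketch of that arithmetic: PCIe 3.0 runs at 8 GT/s per lane and PCIe 4.0 at 16 GT/s per lane, both with 128b/130b encoding. The function and constant names below are illustrative assumptions, and real-world transfer rates will be lower because of protocol overhead.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_LANE = {"3.0": 8.0, "4.0": 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: str, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    bits_per_s = GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes  # in Gbit/s
    return bits_per_s / 8  # convert to GB/s


gpu0 = pcie_gb_per_s("3.0", 4)   # GPU 0: PCIe 3.0 x4
gpu1 = pcie_gb_per_s("4.0", 16)  # GPU 1: PCIe 4.0 x16

print(f"GPU 0: {gpu0:.2f} GB/s")
print(f"GPU 1: {gpu1:.1f} GB/s")
print(f"GPU 1 / GPU 0 link ratio: {gpu1 / gpu0:.0f}x")
print(f"Total VRAM: {2 * 12} GB")
```

The GPU 0 link has one eighth of the theoretical bandwidth of the GPU 1 link, which is consistent with it being flagged as the bottleneck.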

Next Session

  • Run /gsd-resume-work to reload state from HANDOFF.json
  • Options:
    • Start Hermes Agent (run_hermes_agent.bat)
    • Resume Phase 05 (VS Code Extension Packaging)
    • New feature development