wip: [06-install-and-evaluate-hermes-agent] paused at task 5/5
This commit is contained in:
@@ -1,43 +0,0 @@
|
||||
---
|
||||
phase: 01-llm-tuning
|
||||
task: 3
|
||||
total_tasks: 5
|
||||
status: in_progress
|
||||
last_updated: 2026-04-06T21:18:00+09:00
|
||||
---
|
||||
|
||||
<current_state>
|
||||
We are currently assessing the max context bounds and generation speed for dense/mid-sized models (Qwen 27B and Gemma 4 31B) in Q4_K_M formats. Qwen 27B booted successfully with `-c 262144`. We need to run its benchmark and then move on to testing the Gemma 4 31B context bounding limit to see if it also fits 256K.
|
||||
</current_state>
|
||||
|
||||
<completed_work>
|
||||
|
||||
- Task 1: Evaluate 122B Dual GPU vs Single GPU dynamics - Done
|
||||
- Task 2: Prove physical memory bandwidth limits of DDR4 on MoE architecture - Done
|
||||
- Task 3: Test Qwen 27B Dense max logic - In progress, booted successfully at -c 262144 inside 24GB VRAM
|
||||
</completed_work>
|
||||
|
||||
<remaining_work>
|
||||
|
||||
- Task 3: Finish speed benchmark of Qwen 27B at 256K context.
|
||||
- Task 4: Find maximum stable context for Gemma-4 31B Q4_K_M (17.0GB) and speed test.
|
||||
</remaining_work>
|
||||
|
||||
<decisions_made>
|
||||
|
||||
- Concluded that hitting 20t/s on 122B Q4_K_M is physically impossible via system DDR4 RAM. The limit is ~10-12 t/s.
|
||||
- Addressed `cudaMalloc failed` for dual GPU memory splitting. `n-cpu-moe` leaves a massive asymmetry that intrinsically fails to full-load dual 12GB VRAM cards efficiently.
|
||||
- Pivoted entirely away from 122B and 35B optimization, redirecting efforts to dense models (27B and 31B) to guarantee speed and 256K context.
|
||||
</decisions_made>
|
||||
|
||||
<blockers>
|
||||
- None. Hardware limitations acknowledged and bounded.
|
||||
</blockers>
|
||||
|
||||
<context>
|
||||
The user demanded explicit proof and answers regarding hardware utilization and VRAM filling geometry. With those physically justified, they requested a shift to new assets (Qwen 27B, Gemma 4 31B). We found that 27B at Q4_K_M (15.5GB) fits 256K into the dual RTX 3060 perfectly.
|
||||
</context>
|
||||
|
||||
<next_action>
|
||||
Start with: Re-run `node scripts/find_max_dense.mjs` but make sure `CUDA_VISIBLE_DEVICES` correctly spans all GPUs or is explicitly blank (`$env:CUDA_VISIBLE_DEVICES=""`), to get the speed test output for Qwen 27B and Gemma 31B.
|
||||
</next_action>
|
||||
@@ -1,63 +0,0 @@
|
||||
---
|
||||
phase: 05-vscode-extension-packaging
|
||||
task: 0
|
||||
total_tasks: 0
|
||||
status: not_started
|
||||
last_updated: 2026-04-07T22:40:56+09:00
|
||||
---
|
||||
|
||||
<current_state>
|
||||
Milestone v1.1 (OpenClaude CLI Integration)이 2/3 완료.
|
||||
Phase 03 (CLI Build) + Phase 04 (Model Routing & Agent Loop) 완료.
|
||||
Phase 05 (VS Code Extension Packaging)는 아직 Plan 미작성 상태.
|
||||
</current_state>
|
||||
|
||||
<completed_work>
|
||||
|
||||
- Phase 03: CLI Build & Provider Connection - Done
|
||||
- `bun install && bun run build` → OpenClaude v0.1.8 빌드 완료
|
||||
- `.env` 프로바이더 설정 (http://192.168.10.4:8000/v1)
|
||||
- `scripts/start_openclaude.bat` + `.ps1` 런처 생성
|
||||
- `--print` 모드로 E2E 연결 검증: "Hello there, friend." (76 t/s)
|
||||
|
||||
- Phase 04: Model Routing & Agent Loop - Done
|
||||
- `~/.claude/settings.json` — agentModels + agentRouting 설정
|
||||
- 핫스왑 테스트: fast(76 t/s) ↔ balanced(66 t/s) 왕복 성공
|
||||
- 스트리밍 응답 + 장문 계산 (123*456=56,088) 검증 완료
|
||||
|
||||
</completed_work>
|
||||
|
||||
<remaining_work>
|
||||
|
||||
- Phase 05: VS Code Extension Packaging (not started)
|
||||
- Plan 작성 필요
|
||||
- `npx @vscode/vsce package` → .vsix 빌드
|
||||
- Machine B VS Code에 Extension 설치
|
||||
- launchCommand + useOpenAIShim 설정 for Variet Engine
|
||||
|
||||
</remaining_work>
|
||||
|
||||
<decisions_made>
|
||||
|
||||
- CLAUDE_CODE_USE_OPENAI=1 shim 사용 (llama-server /v1 호환)
|
||||
- Machine A IP = 192.168.10.4 (이더넷)
|
||||
- OPENAI_API_KEY = "variet-local" (llama-server는 키 검증 안 함)
|
||||
- agentRouting에 단일 모델(variet-fast)만 설정 — 핫스왑으로 tier 교체
|
||||
- --print 모드로 CLI 검증 (인터랙티브 도구 호출은 Extension에서 검증)
|
||||
|
||||
</decisions_made>
|
||||
|
||||
<blockers>
|
||||
None.
|
||||
</blockers>
|
||||
|
||||
<context>
|
||||
Variet Engine이 이 머신(192.168.10.4)에서 실행 중.
|
||||
현재 로드된 모델: Gemma 4 26B (fast role).
|
||||
OpenClaude CLI 빌드: openclaude/dist/cli.mjs
|
||||
VS Code Extension 소스: openclaude/vscode-extension/openclaude-vscode/
|
||||
</context>
|
||||
|
||||
<next_action>
|
||||
Start with: /gsd-plan-phase 05 → VS Code Extension .vsix 빌드 및 설치 계획 수립
|
||||
</next_action>
|
||||
@@ -1,41 +0,0 @@
|
||||
---
|
||||
phase: 06-install-and-evaluate-hermes-agent
|
||||
task: 4
|
||||
total_tasks: 4
|
||||
status: paused
|
||||
last_updated: 2026-04-08T14:14:16.106Z
|
||||
---
|
||||
|
||||
<current_state>
|
||||
We have successfully connected Hermes Agent CLI and Discord Gateway to our local Variet Engine via the `custom` provider. All Windows specific patches (msvcrt locking, subprocess shell fixes, vLLM prefill compat) are implemented, committed, and contextualized (`06-PLAN.md` & `CONTEXT.md` completed). The phase is implicitly finished, and we are just pausing before starting the next project action.
|
||||
</current_state>
|
||||
|
||||
<completed_work>
|
||||
|
||||
- Task 1: Hermes Agent Repository Clone & Config - Done
|
||||
- Task 2: Variet Engine 로컬 연결 및 `.env` 세팅 - Done
|
||||
- Task 3: 윈도우 환경 구조적 버그 패치 (fcntl, browser, vLLM HTTP 400) - Done
|
||||
- Task 4: GSD Phase 06 마일스톤 생성 (`06-PLAN.md` 작성) 및 커밋 - Done
|
||||
</completed_work>
|
||||
|
||||
<remaining_work>
|
||||
|
||||
- Phase 06 is fully complete. No remaining implementation tasks exist.
|
||||
</remaining_work>
|
||||
|
||||
<decisions_made>
|
||||
|
||||
- Milestone v1.1 has reached its logical endpoint with the 100% completion of Phase 06.
|
||||
</decisions_made>
|
||||
|
||||
<blockers>
|
||||
- None
|
||||
</blockers>
|
||||
|
||||
<context>
|
||||
The integration phase went smoothly with local patches applied over the linux-based Hermes repo. You are pausing the workspace at the very end of Milestone 1.1 successfully. When you return, you basically have a clean slate to do what you want next.
|
||||
</context>
|
||||
|
||||
<next_action>
|
||||
Start with: `/gsd-complete-milestone` to clean up old phases, or `/gsd-add-phase` to inject a new Phase 07 to continue right away.
|
||||
</next_action>
|
||||
Reference in New Issue
Block a user