wip: [01-stabilize] paused at task 1/1 - OCR Hallucination Immune logic via Semantic delta window and fret-isolation

This commit is contained in:
2026-03-29 22:08:40 +09:00
parent aca7bf592a
commit 2507de45d3
4289 changed files with 732689 additions and 28672 deletions

View File

@@ -0,0 +1,45 @@
---
name: do
description: Execute a phased implementation plan using subagents. Use when asked to execute, run, or carry out a plan — especially one created by make-plan.
---
# Do Plan
You are an ORCHESTRATOR. Deploy subagents to execute *all* work. Do not do the work yourself except to coordinate, route context, and verify that each subagent completed its assigned checklist.
## Execution Protocol
### Rules
- Each phase uses fresh subagents where noted (or when context is large/unclear)
- Assign one clear objective per subagent and require evidence (commands run, outputs, files changed)
- Do not advance to the next step until the assigned subagent reports completion and the orchestrator confirms it matches the plan
### During Each Phase
Deploy an "Implementation" subagent to:
1. Execute the implementation as specified
2. COPY patterns from documentation, don't invent
3. Cite documentation sources in code comments when using unfamiliar APIs
4. If an API seems missing, STOP and verify — don't assume it exists
### After Each Phase
Deploy subagents for each post-phase responsibility:
1. **Run verification checklist** — Deploy a "Verification" subagent to prove the phase worked
2. **Anti-pattern check** — Deploy an "Anti-pattern" subagent to grep for known bad patterns from the plan
3. **Code quality review** — Deploy a "Code Quality" subagent to review changes
4. **Commit only if verified** — Deploy a "Commit" subagent *only after* verification passes; otherwise, do not commit
### Between Phases
Deploy a "Branch/Sync" subagent to:
- Push to working branch after each verified phase
- Prepare the next phase handoff so the next phase's subagents start fresh but have plan context
## Failure Modes to Prevent
- Don't invent APIs that "should" exist — verify against docs
- Don't add undocumented parameters — copy exact signatures
- Don't skip verification — deploy a verification subagent and run the checklist
- Don't commit before verification passes (or without explicit orchestrator approval)

View File

@@ -0,0 +1,63 @@
---
name: make-plan
description: Create a detailed, phased implementation plan with documentation discovery. Use when asked to plan a feature, task, or multi-step implementation — especially before executing with do.
---
# Make Plan
You are an ORCHESTRATOR. Create an LLM-friendly plan in phases that can be executed consecutively in new chat contexts.
## Delegation Model
Use subagents for *fact gathering and extraction* (docs, examples, signatures, grep results). Keep *synthesis and plan authoring* with the orchestrator (phase boundaries, task framing, final wording). If a subagent report is incomplete or lacks evidence, re-check with targeted reads/greps before finalizing.
### Subagent Reporting Contract (MANDATORY)
Each subagent response must include:
1. Sources consulted (files/URLs) and what was read
2. Concrete findings (exact API names/signatures; exact file paths/locations)
3. Copy-ready snippet locations (example files/sections to copy)
4. "Confidence" note + known gaps (what might still be missing)
Reject and redeploy the subagent if it reports conclusions without sources.
## Plan Structure
### Phase 0: Documentation Discovery (ALWAYS FIRST)
Before planning implementation, deploy "Documentation Discovery" subagents to:
1. Search for and read relevant documentation, examples, and existing patterns
2. Identify the actual APIs, methods, and signatures available (not assumed)
3. Create a brief "Allowed APIs" list citing specific documentation sources
4. Note any anti-patterns to avoid (methods that DON'T exist, deprecated parameters)
The orchestrator consolidates findings into a single Phase 0 output.
### Each Implementation Phase Must Include
1. **What to implement** — Frame tasks to COPY from docs, not transform existing code
- Good: "Copy the V2 session pattern from docs/examples.ts:45-60"
- Bad: "Migrate the existing code to V2"
2. **Documentation references** — Cite specific files/lines for patterns to follow
3. **Verification checklist** — How to prove this phase worked (tests, grep checks)
4. **Anti-pattern guards** — What NOT to do (invented APIs, undocumented params)
### Final Phase: Verification
1. Verify all implementations match documentation
2. Check for anti-patterns (grep for known bad patterns)
3. Run tests to confirm functionality
## Key Principles
- Documentation Availability ≠ Usage: Explicitly require reading docs
- Task Framing Matters: Direct agents to docs, not just outcomes
- Verify > Assume: Require proof, not assumptions about APIs
- Session Boundaries: Each phase should be self-contained with its own doc references
## Anti-Patterns to Prevent
- Inventing API methods that "should" exist
- Adding parameters not in documentation
- Skipping verification steps
- Assuming structure without checking examples

View File

@@ -0,0 +1,127 @@
---
name: mem-search
description: Search claude-mem's persistent cross-session memory database. Use when user asks "did we already solve this?", "how did we do X last time?", or needs work from previous sessions.
---
# Memory Search
Search past work across all sessions. Simple workflow: search -> filter -> fetch.
## When to Use
Use when users ask about PREVIOUS sessions (not current conversation):
- "Did we already fix this?"
- "How did we solve X last time?"
- "What happened last week?"
## 3-Layer Workflow (ALWAYS Follow)
**NEVER fetch full details without filtering first. 10x token savings.**
### Step 1: Search - Get Index with IDs
Use the `search` MCP tool:
```
search(query="authentication", limit=20, project="my-project")
```
**Returns:** Table with IDs, timestamps, types, titles (~50-100 tokens/result)
```
| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #11131 | 3:48 PM | 🟣 | Added JWT authentication | ~75 |
| #10942 | 2:15 PM | 🔴 | Fixed auth token expiration | ~50 |
```
**Parameters:**
- `query` (string) - Search term
- `limit` (number) - Max results, default 20, max 100
- `project` (string) - Project name filter
- `type` (string, optional) - "observations", "sessions", or "prompts"
- `obs_type` (string, optional) - Comma-separated: bugfix, feature, decision, discovery, change
- `dateStart` (string, optional) - YYYY-MM-DD or epoch ms
- `dateEnd` (string, optional) - YYYY-MM-DD or epoch ms
- `offset` (number, optional) - Skip N results
- `orderBy` (string, optional) - "date_desc" (default), "date_asc", "relevance"
### Step 2: Timeline - Get Context Around Interesting Results
Use the `timeline` MCP tool:
```
timeline(anchor=11131, depth_before=3, depth_after=3, project="my-project")
```
Or find anchor automatically from query:
```
timeline(query="authentication", depth_before=3, depth_after=3, project="my-project")
```
**Returns:** `depth_before + 1 + depth_after` items in chronological order with observations, sessions, and prompts interleaved around the anchor.
**Parameters:**
- `anchor` (number, optional) - Observation ID to center around
- `query` (string, optional) - Find anchor automatically if anchor not provided
- `depth_before` (number, optional) - Items before anchor, default 5, max 20
- `depth_after` (number, optional) - Items after anchor, default 5, max 20
- `project` (string) - Project name filter
### Step 3: Fetch - Get Full Details ONLY for Filtered IDs
Review titles from Step 1 and context from Step 2. Pick relevant IDs. Discard the rest.
Use the `get_observations` MCP tool:
```
get_observations(ids=[11131, 10942])
```
**ALWAYS use `get_observations` for 2+ observations - single request vs N requests.**
**Parameters:**
- `ids` (array of numbers, required) - Observation IDs to fetch
- `orderBy` (string, optional) - "date_desc" (default), "date_asc"
- `limit` (number, optional) - Max observations to return
- `project` (string, optional) - Project name filter
**Returns:** Complete observation objects with title, subtitle, narrative, facts, concepts, files (~500-1000 tokens each)
## Examples
**Find recent bug fixes:**
```
search(query="bug", type="observations", obs_type="bugfix", limit=20, project="my-project")
```
**Find what happened last week:**
```
search(type="observations", dateStart="2025-11-11", limit=20, project="my-project")
```
**Understand context around a discovery:**
```
timeline(anchor=11131, depth_before=5, depth_after=5, project="my-project")
```
**Batch fetch details:**
```
get_observations(ids=[11131, 10942, 10855], orderBy="date_desc")
```
## Why This Workflow?
- **Search index:** ~50-100 tokens per result
- **Full observation:** ~500-1000 tokens each
- **Batch fetch:** 1 HTTP request vs N individual requests
- **10x token savings** by filtering before fetching

View File

@@ -0,0 +1,145 @@
---
name: smart-explore
description: Token-optimized structural code search using tree-sitter AST parsing. Use instead of reading full files when you need to understand code structure, find functions, or explore a codebase efficiently.
---
# Smart Explore
Structural code exploration using AST parsing. **This skill overrides your default exploration behavior.** While this skill is active, use smart_search/smart_outline/smart_unfold as your primary tools instead of Read, Grep, and Glob.
**Core principle:** Index first, fetch on demand. Give yourself a map of the code before loading implementation details. The question before every file read should be: "do I need to see all of this, or can I get a structural overview first?" The answer is almost always: get the map.
## Your Next Tool Call
This skill only loads instructions. You must call the MCP tools yourself. Your next action should be one of:
```
smart_search(query="<topic>", path="./src") -- discover files + symbols across a directory
smart_outline(file_path="<file>") -- structural skeleton of one file
smart_unfold(file_path="<file>", symbol_name="<name>") -- full source of one symbol
```
Do NOT run Grep, Glob, Read, or find to discover files first. `smart_search` walks directories, parses all code files, and returns ranked symbols in one call. It replaces the Glob → Grep → Read discovery cycle.
## 3-Layer Workflow
### Step 1: Search -- Discover Files and Symbols
```
smart_search(query="shutdown", path="./src", max_results=15)
```
**Returns:** Ranked symbols with signatures, line numbers, match reasons, plus folded file views (~2-6k tokens)
```
-- Matching Symbols --
function performGracefulShutdown (services/infrastructure/GracefulShutdown.ts:56)
function httpShutdown (services/infrastructure/HealthMonitor.ts:92)
method WorkerService.shutdown (services/worker-service.ts:846)
-- Folded File Views --
services/infrastructure/GracefulShutdown.ts (7 symbols)
services/worker-service.ts (12 symbols)
```
This is your discovery tool. It finds relevant files AND shows their structure. No Glob/find pre-scan needed.
**Parameters:**
- `query` (string, required) -- What to search for (function name, concept, class name)
- `path` (string) -- Root directory to search (defaults to cwd)
- `max_results` (number) -- Max matching symbols, default 20, max 50
- `file_pattern` (string, optional) -- Filter to specific files/paths
### Step 2: Outline -- Get File Structure
```
smart_outline(file_path="services/worker-service.ts")
```
**Returns:** Complete structural skeleton -- all functions, classes, methods, properties, imports (~1-2k tokens per file)
**Skip this step** when Step 1's folded file views already provide enough structure. Most useful for files not covered by the search results.
**Parameters:**
- `file_path` (string, required) -- Path to the file
### Step 3: Unfold -- See Implementation
Review symbols from Steps 1-2. Pick the ones you need. Unfold only those:
```
smart_unfold(file_path="services/worker-service.ts", symbol_name="shutdown")
```
**Returns:** Full source code of the specified symbol including JSDoc, decorators, and complete implementation (~400-2,100 tokens depending on symbol size). AST node boundaries guarantee completeness regardless of symbol size — unlike Read + agent summarization, which may truncate long methods.
**Parameters:**
- `file_path` (string, required) -- Path to the file (as returned by search/outline)
- `symbol_name` (string, required) -- Name of the function/class/method to expand
## When to Use Standard Tools Instead
Use these only when smart_* tools are the wrong fit:
- **Grep:** Exact string/regex search ("find all TODO comments", "where is `ensureWorkerStarted` defined?")
- **Read:** Small files under ~100 lines, non-code files (JSON, markdown, config)
- **Glob:** File path patterns ("find all test files")
- **Explore agent:** When you need synthesized understanding across 6+ files, architecture narratives, or answers to open-ended questions like "how does this entire system work end-to-end?" Smart-explore is a scalpel — it answers "where is this?" and "show me that." It doesn't synthesize cross-file data flows, design decisions, or edge cases across an entire feature.
For code files over ~100 lines, prefer smart_outline + smart_unfold over Read.
## Workflow Examples
**Discover how a feature works (cross-cutting):**
```
1. smart_search(query="shutdown", path="./src")
-> 14 symbols across 7 files, full picture in one call
2. smart_unfold(file_path="services/infrastructure/GracefulShutdown.ts", symbol_name="performGracefulShutdown")
-> See the core implementation
```
**Navigate a large file:**
```
1. smart_outline(file_path="services/worker-service.ts")
-> 1,466 tokens: 12 functions, WorkerService class with 24 members
2. smart_unfold(file_path="services/worker-service.ts", symbol_name="startSessionProcessor")
-> 1,610 tokens: the specific method you need
Total: ~3,076 tokens vs ~12,000 to Read the full file
```
**Write documentation about code (hybrid workflow):**
```
1. smart_search(query="feature name", path="./src") -- discover all relevant files and symbols
2. smart_outline on key files -- understand structure
3. smart_unfold on important functions -- get implementation details
4. Read on small config/markdown/plan files -- get non-code context
```
Use smart_* tools for code exploration, Read for non-code files. Mix freely.
**Exploration then precision:**
```
1. smart_search(query="session", path="./src", max_results=10)
-> 10 ranked symbols: SessionMetadata, SessionQueueProcessor, SessionSummary...
2. Pick the relevant one, unfold it
```
## Token Economics
| Approach | Tokens | Use Case |
|----------|--------|----------|
| smart_outline | ~1,000-2,000 | "What's in this file?" |
| smart_unfold | ~400-2,100 | "Show me this function" |
| smart_search | ~2,000-6,000 | "Find all X across the codebase" |
| search + unfold | ~3,000-8,000 | End-to-end: find and read (the primary workflow) |
| Read (full file) | ~12,000+ | When you truly need everything |
| Explore agent | ~39,000-59,000 | Cross-file synthesis with narrative |
**4-8x savings** on file understanding (outline + unfold vs Read). **11-18x savings** on codebase exploration vs Explore agent. The narrower the query, the wider the gap — a 27-line function costs 55x less to read via unfold than via an Explore agent, because the agent still reads the entire file.

View File

@@ -0,0 +1,203 @@
---
name: timeline-report
description: Generate a "Journey Into [Project]" narrative report analyzing a project's entire development history from claude-mem's timeline. Use when asked for a timeline report, project history analysis, development journey, or full project report.
---
# Timeline Report
Generate a comprehensive narrative analysis of a project's entire development history using claude-mem's persistent memory timeline.
## When to Use
Use when users ask for:
- "Write a timeline report"
- "Journey into [project]"
- "Analyze my project history"
- "Full project report"
- "Summarize the entire development history"
- "What's the story of this project?"
## Prerequisites
The claude-mem worker must be running on localhost:37777. The project must have claude-mem observations recorded.
## Workflow
### Step 1: Determine the Project Name
Ask the user which project to analyze if not obvious from context. The project name is typically the directory name of the project (e.g., "tokyo", "my-app"). If the user says "this project", use the current working directory's basename.
**Worktree Detection:** Before using the directory basename, check if the current directory is a git worktree. In a worktree, the data source is the **parent project**, not the worktree directory itself. Run:
```bash
git_dir=$(git rev-parse --git-dir 2>/dev/null)
git_common_dir=$(git rev-parse --git-common-dir 2>/dev/null)
if [ "$git_dir" != "$git_common_dir" ]; then
# We're in a worktree — resolve the parent project name
parent_project=$(basename "$(dirname "$git_common_dir")")
echo "Worktree detected. Parent project: $parent_project"
else
parent_project=$(basename "$PWD")
fi
echo "$parent_project"
```
If a worktree is detected, use `$parent_project` (the basename of the parent repo) as the project name for all API calls. Inform the user: "Detected git worktree. Using parent project '[name]' as the data source."
### Step 2: Fetch the Full Timeline
Use Bash to fetch the complete timeline from the claude-mem worker API:
```bash
curl -s "http://localhost:37777/api/context/inject?project=PROJECT_NAME&full=true"
```
This returns the entire compressed timeline -- every observation, session boundary, and summary across the project's full history. The response is pre-formatted markdown optimized for LLM consumption.
**Token estimates:** The full timeline size depends on the project's history:
- Small project (< 1,000 observations): ~20-50K tokens
- Medium project (1,000-10,000 observations): ~50-300K tokens
- Large project (10,000-35,000 observations): ~300-750K tokens
If the response is empty or returns an error, the worker may not be running or the project name may be wrong. Try `curl -s "http://localhost:37777/api/search?query=*&limit=1"` to verify the worker is healthy.
### Step 3: Estimate Token Count
Before proceeding, estimate the token count of the fetched timeline (roughly 1 token per 4 characters). Report this to the user:
```
Timeline fetched: ~X observations, estimated ~Yk tokens.
This analysis will consume approximately Yk input tokens + ~5-10k output tokens.
Proceed? (y/n)
```
Wait for user confirmation before continuing if the timeline exceeds 100K tokens.
### Step 4: Analyze with a Subagent
Deploy an Agent (using the Task tool) with the full timeline and the following analysis prompt. Pass the ENTIRE timeline as context to the agent. The agent should also be instructed to query the SQLite database at `~/.claude-mem/claude-mem.db` for the Token Economics section.
**Agent prompt:**
```
You are a technical historian analyzing a software project's complete development timeline from claude-mem's persistent memory system. The timeline below contains every observation, session boundary, and summary recorded across the project's entire history.
You also have access to the claude-mem SQLite database at ~/.claude-mem/claude-mem.db. Use it to run queries for the Token Economics & Memory ROI section. The database has an "observations" table with columns: id, memory_session_id, project, text, type, title, subtitle, facts, narrative, concepts, files_read, files_modified, prompt_number, discovery_tokens, created_at, created_at_epoch, source_tool, source_input_summary.
Write a comprehensive narrative report titled "Journey Into [PROJECT_NAME]" that covers:
## Required Sections
1. **Project Genesis** -- When and how the project started. What were the first commits, the initial vision, the founding technical decisions? What problem was being solved?
2. **Architectural Evolution** -- How did the architecture change over time? What were the major pivots? Why did they happen? Trace the evolution from initial design through each significant restructuring.
3. **Key Breakthroughs** -- Identify the "aha" moments: when a difficult problem was finally solved, when a new approach unlocked progress, when a prototype first worked. These are the observations where the tone shifts from investigation to resolution.
4. **Work Patterns** -- Analyze the rhythm of development. Identify debugging cycles (clusters of bug fixes), feature sprints (rapid observation sequences), refactoring phases (architectural changes without new features), and exploration phases (many discoveries without changes).
5. **Technical Debt** -- Track where shortcuts were taken and when they were paid back. Identify patterns of accumulation (rapid feature work) and resolution (dedicated refactoring sessions).
6. **Challenges and Debugging Sagas** -- The hardest problems encountered. Multi-session debugging efforts, architectural dead-ends that required backtracking, platform-specific issues that took days to resolve.
7. **Memory and Continuity** -- How did persistent memory (claude-mem itself, if applicable) affect the development process? Were there moments where recalled context from prior sessions saved significant time or prevented repeated mistakes?
8. **Token Economics & Memory ROI** -- Quantitative analysis of how memory recall saved work:
- Query the database directly for these metrics using `sqlite3 ~/.claude-mem/claude-mem.db`
- Count total discovery_tokens across all observations (the original cost of all work)
- Count sessions that had context injection available (sessions after the first)
- Calculate the compression ratio: average discovery_tokens vs average read_tokens per observation
- Identify the highest-value observations (highest discovery_tokens -- these are the most expensive decisions, bugs, and discoveries that memory prevents re-doing)
- Identify explicit recall events (observations where source_tool contains "search", "smart_search", "get_observations", "timeline", or where narrative mentions "recalled", "from memory", "previous session")
- Estimate passive recall savings: each session with context injection receives ~50 observations. Use a 30% relevance factor (conservative estimate that 30% of injected context prevents re-work). Savings = sessions_with_context × avg_discovery_value_of_50_obs_window × 0.30
- Estimate explicit recall savings: ~10K tokens per explicit recall query
- Calculate net ROI: total_savings / total_read_tokens_invested
- Present as a table with monthly breakdown
- Highlight the top 5 most expensive observations by discovery_tokens -- these represent the highest-value memories in the system (architecture decisions, hard bugs, implementation plans that cost 100K+ tokens to produce originally)
Use these SQL queries as a starting point:
```sql
-- Total discovery tokens
SELECT SUM(discovery_tokens) FROM observations WHERE project = 'PROJECT_NAME';
-- Sessions with context available (not the first session)
SELECT COUNT(DISTINCT memory_session_id) FROM observations WHERE project = 'PROJECT_NAME';
-- Average tokens per observation
SELECT AVG(discovery_tokens) as avg_discovery, AVG(LENGTH(title || COALESCE(subtitle,'') || COALESCE(narrative,'') || COALESCE(facts,'')) / 4) as avg_read FROM observations WHERE project = 'PROJECT_NAME' AND discovery_tokens > 0;
-- Top 5 most expensive observations (highest-value memories)
SELECT id, title, discovery_tokens FROM observations WHERE project = 'PROJECT_NAME' ORDER BY discovery_tokens DESC LIMIT 5;
-- Monthly breakdown
SELECT strftime('%Y-%m', created_at) as month, COUNT(*) as obs, SUM(discovery_tokens) as total_discovery, COUNT(DISTINCT memory_session_id) as sessions FROM observations WHERE project = 'PROJECT_NAME' GROUP BY month ORDER BY month;
-- Explicit recall events
SELECT COUNT(*) FROM observations WHERE project = 'PROJECT_NAME' AND (source_tool LIKE '%search%' OR source_tool LIKE '%timeline%' OR source_tool LIKE '%get_observations%' OR narrative LIKE '%recalled%' OR narrative LIKE '%from memory%' OR narrative LIKE '%previous session%');
```
9. **Timeline Statistics** -- Quantitative summary:
- Date range (first observation to last)
- Total observations and sessions
- Breakdown by observation type (features, bug fixes, discoveries, decisions, changes)
- Most active days/weeks
- Longest debugging sessions
10. **Lessons and Meta-Observations** -- What patterns emerge from the full history? What would a new developer learn about this codebase from reading the timeline? What recurring themes or principles guided development?
## Writing Style
- Write as a technical narrative, not a list of bullet points
- Use specific observation IDs and timestamps when referencing events (e.g., "On Dec 14 (#26766), the root cause was finally identified...")
- Connect events across time -- show how early decisions created later consequences
- Be honest about struggles and dead ends, not just successes
- Target 3,000-6,000 words depending on project size
- Use markdown formatting with headers, emphasis, and code references where appropriate
## Important
- Analyze the ENTIRE timeline chronologically -- do not skip early history
- Look for narrative arcs: problem -> investigation -> solution
- Identify turning points where the project's direction fundamentally changed
- Note any observations about the development process itself (tooling, workflow, collaboration patterns)
Here is the complete project timeline:
[TIMELINE CONTENT GOES HERE]
```
### Step 5: Save the Report
Save the agent's output as a markdown file. Default location:
```
./journey-into-PROJECT_NAME.md
```
Or if the user specified a different output path, use that instead.
### Step 6: Report Completion
Tell the user:
- Where the report was saved
- The approximate token cost (input timeline + output report)
- The date range covered
- Number of observations analyzed
## Error Handling
- **Empty timeline:** "No observations found for project 'X'. Check the project name with: `curl -s 'http://localhost:37777/api/search?query=*&limit=1'`"
- **Worker not running:** "The claude-mem worker is not responding on port 37777. Start it with your usual method or check `ps aux | grep worker-service`."
- **Timeline too large:** For projects with 50,000+ observations, the timeline may exceed context limits. Suggest using date range filtering: `curl -s "http://localhost:37777/api/context/inject?project=X&full=true"` -- the current endpoint returns all observations; for extremely large projects, the user may want to analyze in time-windowed segments.
## Example
User: "Write a journey report for the tokyo project"
1. Fetch: `curl -s "http://localhost:37777/api/context/inject?project=tokyo&full=true"`
2. Estimate: "Timeline fetched: ~34,722 observations, estimated ~718K tokens. Proceed?"
3. User confirms
4. Deploy analysis agent with full timeline
5. Save to `./journey-into-tokyo.md`
6. Report: "Report saved. Analyzed 34,722 observations spanning Oct 2025 - Mar 2026 (~718K input tokens, ~8K output tokens)."