docs: initialize project

2026-03-28 22:17:48 +09:00
parent 7c14f3c040
commit 64ecc12d35
5 changed files with 120 additions and 0 deletions

.planning/PROJECT.md

@@ -0,0 +1,36 @@
# Guitar Score Extraction Pipeline (youtube_tab_to_pdf v2)
## What This Is
This project automates the extraction of guitar tablature from YouTube videos into clean, readable PDFs. The current objective (v2) is to rebuild the OpenCV-based "scroll" and "overlay" extraction pipelines from scratch ("zero-based") to fix the chronic problem of missing or discontinuous measure numbers.
## Target Users
- Guitarists who want to practice songs from YouTube covers without transcribing them by hand or constantly pausing the video.
## Core Value
100% reliable measure extraction without overlaps, repetitions, or jumps, resulting in a perfectly sequenced PDF score.
## Context
The previous implementation (`merge_panoramas_list` and `cv2.matchTemplate`) relied on matching horizontal scroll offsets against the panorama built so far. This failed completely whenever similar-looking choruses or repeated measures appeared: entire sections of the song were overwritten and skipped. In addition, OCR-based detection of duplicate measures was too unstable because of video compression noise and varying fonts.
## Existing Capabilities (Brownfield)
- ✓ YouTube `yt-dlp` integration and 1080p -> 720p scaling.
- ✓ Memory-efficient frame extraction loop (`DEFAULT_FPS=2`).
- ✓ Color-based isolation of the target tab strip (`_find_white_tab_strip`).
- ✓ PDF generation via `img2pdf`.
## Active Requirements
- [ ] Implement temporal tracking that measures the horizontal pixel-shift velocity ($v_x$) between consecutive frames, instead of matching each frame against past panorama bounds.
- [ ] Implement Time-Median Filter to erase moving playheads and animated cursors cleanly.
- [ ] Robustly detect the horizontal rows of the tab staff lines.
- [ ] Slice the continuous stream using the cumulative $v_x$ displacement rather than unreliable OCR text or thin measure bars.
- [ ] Create rigorous test suite asserting 0 missing frames across reference videos (`video_1`, `video_2`, `video_3`).
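
A minimal sketch of the temporal tracking idea behind `compute_pixel_shift()`, using 1D phase correlation on column profiles. It assumes both frames are already cropped to the same tab strip (for example by `_find_white_tab_strip`), collapses each frame to one mean intensity per column, and returns a positive shift when the content moved left. The profile reduction and the sign convention are assumptions of this sketch, not settled project decisions.

```python
import numpy as np


def compute_pixel_shift(prev: np.ndarray, cur: np.ndarray) -> int:
    """Estimate the horizontal scroll (in px) between two tab-strip frames.

    Returns s > 0 when the content moved LEFT by s pixels, i.e.
    cur[:, x] ~= prev[:, x + s]. Works on grayscale or colour frames.
    """
    def profile(img: np.ndarray) -> np.ndarray:
        img = img.astype(np.float64)
        if img.ndim == 3:              # collapse colour channels
            img = img.mean(axis=2)
        col = img.mean(axis=0)         # one value per column
        return col - col.mean()        # drop DC so it cannot dominate the peak

    p, q = profile(prev), profile(cur)
    if p.shape != q.shape:
        raise ValueError("frames must have the same width")
    cross = np.fft.fft(p) * np.conj(np.fft.fft(q))
    cross /= np.abs(cross) + 1e-12     # keep only the phase (whitening)
    corr = np.fft.ifft(cross).real     # sharp peak at the shift
    s = int(np.argmax(corr))
    n = p.size
    return s - n if s > n // 2 else s  # wrap-around indices are negative shifts
```

Because only consecutive frames are compared, a chorus that repeats 40 measures later is never matched against its earlier occurrence. Periodic content inside a single frame can still alias, so the true per-frame shift should stay well below half the frame width (and below half the width of any repeated pattern); whether `DEFAULT_FPS=2` satisfies this depends on each video's scroll speed.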
## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| **Zero-Based Rebuild** | Legacy horizontal stitching math was fundamentally flawed for repeating melodies. | — Pending |
| **Separation of CV tracking** | `youtube_tab_to_pdf.py` is too heavy (914 lines); move the CV logic into `video_cv_tracker.py`. | — Pending |
| **Time-Median Filter** | Necessary to remove the playhead cursor which interferes with continuous sequence matching. | — Pending |
---
*Last updated: 2026-03-28 after initialization*

.planning/REQUIREMENTS.md

@@ -0,0 +1,20 @@
# Requirements
## Objective
Rebuild the computer vision pipeline of `youtube_tab_to_pdf.py` from the ground up to achieve 100% continuous measure extraction without skips or overwrites, primarily to resolve the "discontinuous measure numbers" issue in YouTube guitar tab videos.
## Scenarios
- **SCN-1: The Playhead Problem.** Videos often contain a vertical red/blue bar tracking the current play position. This cursor moves across the screen and disrupts image matching.
- **SCN-2: The Repeating Chorus Problem.** In music, measure 50 might look identical to measure 10. The system must not confuse the current frame with a frame from 40 measures earlier and overwrite the timeline.
- **SCN-3: Sub-optimal measure bars.** Video compression degrades the thin measure bar lines, making them hard to detect accurately, so the system must rely on chronological time-shift tracking instead.
## Acceptance Criteria
- [ ] `test_pipeline.py` passes for all three sample URLs, with no missing sections between start and end.
- [ ] Output panoramas/chunks are continuously ordered from start to finish without jumping back to an earlier identical part of the song.
- [ ] The moving playhead indicator is fully removed in the final PDF chunks.
- [ ] CV logic is moved out of the main wrapper into a concise, easily testable `video_cv_tracker.py`.
## Out of Scope
- Building a UI/Frontend.
- Changing `yt-dlp` download logic.
- Supporting instruments other than 6-string guitar tabs (e.g. piano, drums).

.planning/ROADMAP.md

@@ -0,0 +1,29 @@
# Roadmap
## Phase 1: CV Core Refactor (video_cv_tracker.py)
**Goal**: Isolate and establish the core computer vision algorithms needed for temporal continuous extraction of guitar tabs without OCR.
- **Dependencies**: None.
- **Plans**:
- `01-create-tracker.md`: Build `video_cv_tracker.py` introducing exactly three core functions: `extract_roi_median()` (playhead killer), `compute_pixel_shift()` (1D phase correlation tracking), and `stitch_temporally()` (append newly shifted columns only).
- `02-unit-tests.md`: Write minimal unit tests that feed dummy arrays simulating scrolling guitar tab lines and verify that exactly the expected $v_x$ shift is returned.
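
A minimal sketch of `extract_roi_median()` as a per-pixel temporal median. The `offsets` parameter (the cumulative leftward scroll of each frame relative to the first, e.g. summed shift estimates) is an assumption of this sketch. It matters when the playhead stays at a fixed screen position while the tab scrolls underneath: a median over unaligned frames would keep the playhead and blur the notes, whereas after alignment the notes are static and the playhead becomes the moving object.

```python
import numpy as np


def extract_roi_median(frames: list, offsets: list) -> np.ndarray:
    """Per-pixel temporal median over frames aligned by their scroll offsets.

    frames:  same-shape H x W (or H x W x C) arrays, in chronological order.
    offsets: offsets[i] = whole-pixel leftward scroll of frames[i] relative
             to frames[0]; all must be non-negative and smaller than W.
    Returns the median in frames[0] coordinates, cropped to the columns that
    are visible in every frame: [max(offsets), W).
    """
    if not frames or len(frames) != len(offsets):
        raise ValueError("need one offset per frame and at least one frame")
    width = frames[0].shape[1]
    max_off = max(offsets)
    if min(offsets) < 0 or max_off >= width:
        raise ValueError("offsets must be in [0, W) so all frames overlap")
    # Column c of frames[0] shows the same content as column c - off of frame i.
    aligned = [f[:, max_off - off:width - off] for f, off in zip(frames, offsets)]
    # A pixel hidden by the cursor in fewer than half of the frames gets its
    # background value back, because the median discards those outliers.
    return np.median(np.stack(aligned), axis=0).astype(frames[0].dtype)
```

With 5-7 frames, any pixel that the cursor covers in fewer than half of them returns to its background value; a cursor that moves too slowly relative to the notes would survive the filter.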
## Phase 2: Refactoring youtube_tab_to_pdf.py
**Goal**: Tear down the old logic and integrate the new temporal tracking mechanism.
- **Dependencies**: Phase 1.
- **Plans**:
- `01-remove-legacy.md`: Delete the sprawling `cv2.matchTemplate`-heavy logic, the fragile `_merge_scroll_candidates()`, and the unpredictable `_detect_measure_bars()`.
- `02-integrate-tracker.md`: Hook `extract_unique_scroll` directly to the `video_cv_tracker` generator and loop over frames in time order, returning one continuous panoramic image.
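
One way the Phase 2 loop could look, as a hedged sketch: here `extract_unique_scroll` takes any iterable of chronologically ordered frames (such as a generator) plus a `shift_fn` tracker, and appends only the columns that scrolled in since the last accepted frame, which is the `stitch_temporally()` idea. The parameter names and the choice to ignore non-positive shifts are assumptions, not the project's current signature.

```python
from typing import Callable, Iterable

import numpy as np


def extract_unique_scroll(frames: Iterable[np.ndarray],
                          shift_fn: Callable[[np.ndarray, np.ndarray], int]) -> np.ndarray:
    """Build one panorama by appending only newly scrolled-in columns.

    frames:   frames in chronological order (e.g. a generator at DEFAULT_FPS).
    shift_fn: returns the leftward scroll in px from the first frame to the
              second, e.g. a phase-correlation tracker.
    """
    it = iter(frames)
    prev = next(it, None)
    if prev is None:
        raise ValueError("no frames")
    parts = [prev]                      # collect slices; concatenate once at the end
    for frame in it:
        shift = shift_fn(prev, frame)
        if shift <= 0:
            continue                    # paused or jitter: nothing new, keep reference
        if shift >= frame.shape[1]:
            raise ValueError("scrolled a full frame between samples; raise the FPS")
        parts.append(frame[:, -shift:])  # only the new columns on the right edge
        prev = frame                    # panorama's right edge now matches this frame
    return np.concatenate(parts, axis=1)
```

Collecting slices in a list and concatenating once avoids re-copying the growing panorama on every frame. Because the panorama only ever grows on the right, a repeated chorus can never overwrite earlier columns.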
## Phase 3: Slicing & PDF Integration
**Goal**: Reliably chop the long horizontal tab panorama into A4-width segments.
- **Dependencies**: Phase 2.
- **Plans**:
- `01-robust-measure-chop.md`: Given a complete panorama, either cut it blindly into chunks of a fixed maximum width (simulating A4 line breaks) or slice it at clean boundaries using the tab color projection. Measure jumps are already prevented upstream by Phase 1.
- `02-pdf-export.md`: Hand the chunk arrays back to the existing `img2pdf` PDF generation stack.
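
A sketch of the two chopping strategies in `01-robust-measure-chop.md`: `search=0` cuts blindly every `max_width` pixels, and `search > 0` moves each cut to the emptiest column (smallest ink projection) just before the limit. The function name, the ink measure, and the dark-ink-on-light-background assumption are illustrative, not the project's API.

```python
import numpy as np


def chop_panorama(panorama: np.ndarray, max_width: int, search: int = 0) -> list:
    """Split an H x W tab panorama into chunks no wider than max_width.

    search = 0 cuts blindly at max_width. With search > 0, each cut moves to
    the column with the least ink among the last `search` columns before the
    limit, so fret numbers are less likely to be sliced in half.
    Assumes dark ink on a light (near 255) background.
    """
    if not 0 <= search < max_width:
        raise ValueError("need 0 <= search < max_width")
    gray = panorama.astype(np.float64)
    if gray.ndim == 3:
        gray = gray.mean(axis=2)
    ink = (255.0 - gray).sum(axis=0)     # column projection: darker = more ink
    chunks, start, width = [], 0, panorama.shape[1]
    while width - start > max_width:
        end = start + max_width
        if search > 0:
            window = ink[end - search:end]
            # Rightmost least-ink column (ties keep chunks as wide as possible).
            j = search - 1 - int(np.argmin(window[::-1]))
            end = end - search + j + 1   # cut just after that column
        chunks.append(panorama[:, start:end])
        start = end
    chunks.append(panorama[:, start:])
    return chunks
```

img2pdf works on encoded image files (paths or bytes), so each chunk would still need to be encoded, for example as PNG, before the hand-off in `02-pdf-export.md`.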
## Phase 4: Final Acceptance Testing
**Goal**: Execute test suite against `video_1` (晴る), `video_2` (新宝島), `video_3` (空奏列車).
- **Dependencies**: Phase 3.
- **Plans**:
- `01-execute-end-to-end.md`: Run `python test_pipeline.py`.
- `02-verify-output.md`: Visually inspect the `output/debug_frames/` panoramas to prove zero overlaps and strict chronological transcription.
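
A possible automated complement to the visual inspection, sketched under assumptions: if the tracker records each sampled frame's cumulative offset, every frame should reappear in the final panorama at that offset, and offsets should never go backwards. The function name `check_continuity` and the default tolerance are hypothetical.

```python
import numpy as np


def check_continuity(panorama, frames, offsets, tol=8.0):
    """Return the indices of sampled frames that break chronological continuity.

    offsets[i]: cumulative leftward scroll of frames[i] relative to frames[0],
    as recorded by the tracker. A frame is flagged if its offset goes backwards
    or if it differs from panorama[:, off:off + W] by more than `tol` in mean
    absolute pixel value (the tolerance absorbs compression noise).
    """
    bad = []
    furthest = 0
    for i, (frame, off) in enumerate(zip(frames, offsets)):
        window = panorama[:, off:off + frame.shape[1]].astype(np.float64)
        if (off < furthest or window.shape != frame.shape
                or np.abs(window - frame).mean() > tol):
            bad.append(i)
        furthest = max(furthest, off)
    return bad
```

An empty result means no overlaps, jumps, or overwritten sections were detected for the sampled frames.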

.planning/STATE.md

@@ -0,0 +1,19 @@
{
"project_started": "2026-03-28",
"decisions": [
{
"date": "2026-03-28",
"decision": "Abandon `cv2.matchTemplate` panoramic horizon matching.",
"rationale": "Repeating music patterns inherently break matching systems without temporal (time-based) anchors. If measures look identical, the algorithm maps current video frames back to old identical measures causing jump-cuts and data loss.",
"context": "YouTube guitar tabs almost always follow a linear chronological left-to-right scrolling pattern. Only velocity tracking is needed."
},
{
"date": "2026-03-28",
"decision": "Median filtering for playhead removal.",
"rationale": "Animated cursors traversing horizontal staves distort matching distances and visual clarity. A per-pixel temporal median over 5-7 frames, aligned by the measured scroll shift, removes fast-moving objects such as a red cursor (the cursor covers any given pixel in fewer than half of the frames) while leaving the tab notes intact.",
"context": "The user approved the RAM overhead of buffering multiple frames."
},
],
"blockers": [],
"current_focus": "Phase 1: CV Core Refactor (video_cv_tracker.py)"
}

.planning/config.json

@@ -0,0 +1,16 @@
{
"mode": "yolo",
"granularity": "standard",
"model_profile": "inherit",
"commit_docs": true,
"sub_repos": [],
"parallelization": true,
"workflow": {
"research": false,
"plan_check": false,
"verifier": true,
"nyquist_validation": true,
"auto_advance": true,
"_auto_chain_active": true
}
}