docs: initialize project
36
.planning/PROJECT.md
Normal file
@@ -0,0 +1,36 @@
# Guitar Score Extraction Pipeline (youtube_tab_to_pdf v2)

## What This Is

This project automates the extraction of guitar tablature from YouTube videos into clean, readable PDFs. The current objective (v2) is to rebuild the OpenCV-based "scroll" and "overlay" extraction pipeline from scratch (zero-based) to fix the chronic problem of missing or discontinuous measure numbers.

## Target Users

- Guitarists who want to practice songs from YouTube covers without transcribing them by hand or repeatedly pausing the video.

## Core Value

100% reliable measure extraction with no overlaps, repetitions, or jumps, producing a correctly sequenced PDF score.

## Context

The previous implementation (`merge_panoramas_list` and `cv2.matchTemplate`) placed each new frame by matching its horizontal scroll offset against the panorama built so far. This failed whenever similar-looking choruses or repeated measures appeared: the match landed on the earlier occurrence, so entire sections of the song were overwritten or skipped. In addition, the OCR-based duplicate-measure detection was too unstable because of video compression noise and differing fonts.
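
The ambiguity described above can be reproduced with a toy one-dimensional example. This is a minimal sketch, not the legacy code: `best_match_offsets` is a hypothetical helper, and the short integer lists stand in for column profiles of the panorama and of the current frame.

```python
def best_match_offsets(panorama, template):
    """Return every offset whose sum of squared differences is minimal."""
    n, m = len(panorama), len(template)
    scores = [
        sum((panorama[i + j] - template[j]) ** 2 for j in range(m))
        for i in range(n - m + 1)
    ]
    best = min(scores)
    return [i for i, score in enumerate(scores) if score == best]


verse = [3, 1, 4, 1, 5]
chorus = [9, 2, 6, 5, 3]
panorama = verse + chorus + verse + chorus  # the chorus appears twice

# Both occurrences match perfectly, so appearance alone cannot tell
# the matcher which one the current frame belongs to.
print(best_match_offsets(panorama, chorus))  # [5, 15]
```

A matcher that always takes the first best offset would map the second chorus back onto the first, which is the overwrite behaviour this rebuild is meant to eliminate.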

## Existing Capabilities (Brownfield)

- ✓ `yt-dlp` integration for YouTube downloads, with 1080p -> 720p scaling.
- ✓ Memory-efficient frame extraction loop (`DEFAULT_FPS=2`).
- ✓ Target tab color isolation (`_find_white_tab_strip`).
- ✓ PDF generation via `img2pdf`.
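
As an illustration of what a memory-efficient sampling loop at `DEFAULT_FPS=2` could look like, here is a minimal generator sketch. The function name `sample_frames` and its signature are assumptions made for this example and are not taken from the existing code; plain integers stand in for decoded frames.

```python
from typing import Iterable, Iterator, TypeVar

DEFAULT_FPS = 2  # value taken from the capability list above

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame], source_fps: float,
                  target_fps: float = DEFAULT_FPS) -> Iterator[Frame]:
    """Lazily yield roughly `target_fps` frames per second of video."""
    step = source_fps / target_fps  # source frames per kept frame
    next_keep = 0.0
    for index, frame in enumerate(frames):
        if index >= next_keep:
            yield frame
            next_keep += step


# Frame indices stand in for decoded images in this demo.
kept = list(sample_frames(range(90), source_fps=30.0))
print(kept)  # [0, 15, 30, 45, 60, 75]
```

Because the function is a generator, only the frame currently being processed needs to be held in memory.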

## Active Requirements

- [ ] Implement temporal tracking that measures the horizontal pixel-shift velocity ($v_x$) between consecutive frames instead of only matching against the bounds of the existing panorama.
- [ ] Implement a time-median filter that cleanly erases moving playheads and animated cursors.
- [ ] Robustly detect the horizontal rows of the tab staff lines.
- [ ] Slice the continuous stream strictly by the distance accumulated from $v_x$, rather than relying on unreliable OCR text or thin measure bars.
- [ ] Create a rigorous test suite asserting zero missing frames across the reference videos (`video_1`, `video_2`, `video_3`).
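
The distance-based slicing requirement can be sketched as a running sum of per-frame shifts. This is only an illustration under assumed names: `cut_frames`, `shifts`, and `chunk_width` are hypothetical and do not appear in the plan.

```python
def cut_frames(shifts, chunk_width):
    """Return the frame indices at which accumulated scroll crosses a chunk boundary.

    `shifts[i]` is the horizontal distance (in pixels) scrolled at frame i.
    """
    cuts = []
    travelled = 0.0
    next_boundary = chunk_width
    for index, dx in enumerate(shifts):
        travelled += dx
        # A large shift may cross more than one boundary at once.
        while travelled >= next_boundary:
            cuts.append(index)
            next_boundary += chunk_width
    return cuts


# Ten frames scrolling 40 px each, cut every 100 px of travel.
print(cut_frames([40] * 10, chunk_width=100))  # [2, 4, 7, 9]
```

Because the cut positions depend only on how far the tab has scrolled, two identical-looking measures can never be confused with each other.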

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| **Zero-Based Rebuild** | Legacy horizontal stitching math was fundamentally flawed for repeating melodies. | Pending |
| **Separation of CV tracking** | `youtube_tab_to_pdf.py` is too large (914 lines); move the CV logic to `video_cv_tracker.py`. | Pending |
| **Time-Median Filter** | Needed to remove the playhead cursor, which interferes with continuous sequence matching. | Pending |

---

*Last updated: 2026-03-28 after initialization*
20
.planning/REQUIREMENTS.md
Normal file
@@ -0,0 +1,20 @@
# Requirements

## Objective

Rebuild the computer vision pipeline of `youtube_tab_to_pdf.py` from the ground up to achieve 100% continuous measure extraction without skips or overwrites, primarily to resolve the "discontinuous measure numbers" issue in YouTube guitar tabs.

## Scenarios

- **SCN-1: The Playhead Problem.** Videos often contain a vertical red or blue bar that tracks the current play position. This cursor moves across the screen and disrupts image matching.
- **SCN-2: The Repeating Chorus Problem.** Measure 50 may look identical to measure 10. The system must not mistake the current frame for one from 40 measures earlier and overwrite the timeline.
- **SCN-3: Sub-optimal measure bars.** Video compression degrades measure bar lines, making them hard to detect accurately, so the system must rely on chronological time-shift tracking instead.
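
For SCN-1, a per-pixel median over a short window of frames is one way to suppress a moving cursor. The sketch below uses NumPy and assumes the frames are already aligned (static, or shift-compensated); `time_median` is a hypothetical helper, and the tiny synthetic frames are invented for the demo.

```python
import numpy as np


def time_median(frames):
    """Per-pixel median over a stack of aligned grayscale frames."""
    stack = np.stack(frames, axis=0)  # shape: (T, H, W)
    return np.median(stack, axis=0).astype(stack.dtype)


# Static "tab" background: white page with one dark staff line.
background = np.full((6, 10), 255, dtype=np.uint8)
background[3, :] = 0

# Five frames; a 1-pixel-wide cursor sits at a different column in each.
frames = []
for t in range(5):
    frame = background.copy()
    frame[:, 2 * t] = 60  # the moving playhead
    frames.append(frame)

clean = time_median(frames)
print(np.array_equal(clean, background))  # True
```

The cursor covers any given pixel in only one of the five frames, so the median at that pixel falls back to the background value. A cursor that lingers over the same pixels for more than half the window would survive, which is why the window length matters.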

## Acceptance Criteria

- [ ] `test_pipeline.py` passes for all three sample URLs, with no missing sections between start and end.
- [ ] Output panoramas/chunks are continuously ordered from start to finish without jumping back to an earlier, identical part of the song.
- [ ] The moving playhead indicator is fully removed from the final PDF chunks.
- [ ] CV logic is moved out of the main wrapper into a concise, easily testable `video_cv_tracker.py`.

## Out of Scope

- Building a UI/Frontend.
- Changing `yt-dlp` download logic.
- Supporting instruments other than 6-string guitar tabs (e.g. piano, drums).
29
.planning/ROADMAP.md
Normal file
@@ -0,0 +1,29 @@
# Roadmap

## Phase 1: CV Core Refactor (video_cv_tracker.py)

**Goal**: Isolate and establish the core computer vision algorithms needed for temporally continuous extraction of guitar tabs without OCR.

- **Dependencies**: None.
- **Plans**:
  - `01-create-tracker.md`: Build `video_cv_tracker.py` with exactly three core functions: `extract_roi_median()` (playhead killer), `compute_pixel_shift()` (1D phase correlation tracking), and `stitch_temporally()` (append newly shifted columns only).
  - `02-unit-tests.md`: Write minimal unit tests that feed in dummy arrays simulating scrolling guitar chord lines and verify that the exact $v_x$ shift is returned.
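
One possible shape for the 1D phase correlation step, together with the kind of dummy-array check the unit-test plan describes, is sketched below with NumPy. The function name comes from the plan, but the signature, the use of column-mean profiles, and the sign convention (positive means the content moved left) are assumptions made for this example.

```python
import numpy as np


def compute_pixel_shift(prev_profile, curr_profile):
    """Estimate how many pixels the content moved left between two 1D profiles.

    A positive result means the content scrolled left, i.e. new material
    entered on the right. Assumes both profiles have the same length.
    """
    a = np.asarray(prev_profile, dtype=np.float64)
    b = np.asarray(curr_profile, dtype=np.float64)
    fa = np.fft.fft(a - a.mean())
    fb = np.fft.fft(b - b.mean())
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + 1e-12  # keep only the phase
    response = np.fft.ifft(cross).real
    peak = int(np.argmax(response))
    n = a.size
    # Peaks in the upper half of the array represent negative shifts.
    return peak - n if peak > n // 2 else peak


# Dummy "tab strip": random texture scrolled 7 px to the left.
rng = np.random.default_rng(0)
strip = rng.random((12, 256))
prev_frame = strip
curr_frame = np.roll(strip, -7, axis=1)

# Collapse each frame to a 1D column profile before correlating.
shift = compute_pixel_shift(prev_frame.mean(axis=0), curr_frame.mean(axis=0))
print(shift)  # 7
```

The demo uses a circular roll, for which phase correlation is exact. Real frames lose columns on one side and gain new ones on the other, so the peak is less sharp in practice and small shifts relative to the frame width work best.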

## Phase 2: Refactoring youtube_tab_to_pdf.py

**Goal**: Tear down the old logic and integrate the new temporal tracking mechanism.

- **Dependencies**: Phase 1.
- **Plans**:
  - `01-remove-legacy.md`: Delete the sprawling `cv2.matchTemplate`-heavy logic, the fragile `_merge_scroll_candidates()`, and the unpredictable `_detect_measure_bars()`.
  - `02-integrate-tracker.md`: Hook `extract_unique_scroll` directly into the `video_cv_tracker` generator and loop over frames in time order, returning one continuous panoramic image.
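
The "append newly shifted columns only" idea behind the continuous panorama can be sketched as follows. The function name is taken from the Phase 1 plan, but the signature and the convention that `shifts[i]` is the leftward scroll between consecutive frames are assumptions for illustration.

```python
import numpy as np


def stitch_temporally(frames, shifts):
    """Build a panorama by appending only the columns that newly scrolled in.

    `shifts[i]` is the leftward scroll in pixels between frames[i] and
    frames[i + 1]; each shift is assumed to be smaller than the frame width.
    """
    pieces = [frames[0]]
    for frame, dx in zip(frames[1:], shifts):
        if dx > 0:  # ignore pauses and backward jitter
            pieces.append(frame[:, -dx:])
    return np.hstack(pieces)


# A wide synthetic "score" viewed through a 20-column window.
source = np.arange(4 * 50).reshape(4, 50)
offsets = [0, 3, 8, 8, 15]
view = 20
frames = [source[:, o:o + view] for o in offsets]
shifts = [b - a for a, b in zip(offsets, offsets[1:])]  # [3, 5, 0, 7]

panorama = stitch_temporally(frames, shifts)
print(panorama.shape)  # (4, 35)
print(np.array_equal(panorama, source[:, :35]))  # True
```

Since columns are only ever appended in time order, the panorama cannot jump back to an earlier look-alike section, whatever the content looks like.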

## Phase 3: Slicing & PDF Integration

**Goal**: Reliably chop the massive horizontal panoramic tab into A4-width segments.

- **Dependencies**: Phase 2.
- **Plans**:
  - `01-robust-measure-chop.md`: Given a complete panorama, either cut it blindly into chunks of a fixed maximum width (simulating A4 line breaks) or slice it at boundaries found from the tab color projection. Measure jumps are already prevented by the Phase 1 tracking.
  - `02-pdf-export.md`: Hand the arrays back to the existing `img2pdf` PDF generation stack.
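
The "blind" fixed-width option amounts to computing column ranges over the panorama. A minimal sketch, assuming a hypothetical helper `chunk_bounds` and an arbitrary example width:

```python
def chunk_bounds(total_width, max_width):
    """Return (start, end) column ranges covering a panorama left to right."""
    if max_width <= 0:
        raise ValueError("max_width must be positive")
    return [
        (start, min(start + max_width, total_width))
        for start in range(0, total_width, max_width)
    ]


# A 2500 px wide panorama cut into chunks of at most 1000 px.
print(chunk_bounds(2500, 1000))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each pair can then be used as a column slice such as `panorama[:, start:end]`. The last chunk is simply narrower; padding it to full width is a layout decision left open here.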

## Phase 4: Final Acceptance Testing

**Goal**: Run the test suite against `video_1` (晴る), `video_2` (新宝島), `video_3` (空奏列車).

- **Dependencies**: Phase 3.
- **Plans**:
  - `01-execute-end-to-end.md`: Run `python test_pipeline.py`.
  - `02-verify-output.md`: Visually inspect the `output/debug_frames/` panoramas to confirm zero overlaps and strictly chronological transcription.
19
.planning/STATE.md
Normal file
@@ -0,0 +1,19 @@
{
  "project_started": "2026-03-28",
  "decisions": [
    {
      "date": "2026-03-28",
      "decision": "Abandon `cv2.matchTemplate` panoramic horizon matching.",
      "rationale": "Repeating music patterns inherently break matching systems without temporal (time-based) anchors. If measures look identical, the algorithm maps current video frames back to old identical measures causing jump-cuts and data loss.",
      "context": "YouTube guitar tabs almost always follow a linear chronological left-to-right scrolling pattern. Only velocity tracking is needed."
    },
    {
      "date": "2026-03-28",
      "decision": "Median filtering for playhead removal.",
      "rationale": "Animated cursors traversing horizontal staves distort matching distances and visual clarity. Using a Time-Median filter over 5-7 frames permanently erases any rapid-moving objects (like a red cursor) leaving only the static or slowly scrolling background tab notes untouched.",
      "context": "The user approved the RAM overhead of taking multiple frames."
    }
  ],
  "blockers": [],
  "current_focus": "Phase 1: CV Core Refactor (video_cv_tracker.py)"
}
16
.planning/config.json
Normal file
@@ -0,0 +1,16 @@
{
  "mode": "yolo",
  "granularity": "standard",
  "model_profile": "inherit",
  "commit_docs": true,
  "sub_repos": [],
  "parallelization": true,
  "workflow": {
    "research": false,
    "plan_check": false,
    "verifier": true,
    "nyquist_validation": true,
    "auto_advance": true,
    "_auto_chain_active": true
  }
}