ARCHITECTURE

System Design

The project is a sequential data-processing pipeline that turns raw video into a formatted A4 PDF. The primary entry point passes the video through five logical stages, which cover internal Steps 1 through 6 (Steps 4 and 5 are grouped into one stage):

  1. Download (Step 1): Fetches the target YouTube video.
  2. Frame Extraction (Step 2): Opens an OpenCV VideoCapture, skips frames according to DEFAULT_FPS to limit memory use, and crops out non-meaningful content.
  3. Pattern Detection (Step 3): Classifies the scrolling behavior (e.g., scroll vs. overlay) using temporal tracking and template matching.
  4. Frame Dedup & Stitching (Step 4/5): The core logic engine. Filters out duplicates caused by video pauses or by rewind / D.S. al Coda repeats. Using TemporalTracker, it tracks pixel movement and either stitches horizontally scrolling tabs or stacks overlay pages.
  5. PDF Tiling (Step 6): Splits the stitched panoramas into printable A4 chunks, bounded using layout metrics.
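
As a rough illustration of the frame-skipping and tiling arithmetic in Steps 2 and 6, here is a minimal sketch. It is not the project's code: the value of DEFAULT_FPS and the helper names `frame_stride` and `tile_spans` are assumptions made for this example, and the real pipeline may compute both differently.

```python
# Hypothetical sketch: how a target FPS could translate into a frame stride,
# and how a stitched panorama could be cut into page-sized spans.

DEFAULT_FPS = 2.0  # assumed value; the document names the constant but not its value


def frame_stride(source_fps: float, target_fps: float = DEFAULT_FPS) -> int:
    """Number of source frames to advance for each kept frame (at least 1)."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    return max(1, round(source_fps / target_fps))


def tile_spans(total_length: int, page_length: int) -> list[tuple[int, int]]:
    """Split [0, total_length) into consecutive (start, end) spans.

    Every span is at most page_length pixels long; the last one may be shorter.
    """
    if page_length <= 0:
        raise ValueError("page_length must be positive")
    return [
        (start, min(start + page_length, total_length))
        for start in range(0, total_length, page_length)
    ]


# A 30 fps video sampled at the assumed 2 fps keeps every 15th frame.
stride = frame_stride(30.0)
kept_frames = list(range(0, 90, stride))

# A 2500-pixel panorama cut into 1000-pixel pages yields three spans.
spans = tile_spans(2500, 1000)
```

Keeping the stride and the tiling as pure functions of a few integers makes them easy to check separately from OpenCV and from the PDF writer.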

Key Subsystems

  • Temporal Tracker (video_cv_tracker.py): Tracks frame-to-frame differences over time by evaluating column and row variations rather than relying entirely on brute-force image subtraction. This captures page flips cleanly.
  • Duplicate Prevention Engine: Applies tiered validation:
    • Phase 1: Difference hash (_dhash) clustering and Laplacian variance.
    • Phase 2: Template-match history memory (catches identical choruses via distance vs. time).
    • Phase 3: OCR measure verification (ensures measure numbers are monotonic).
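
To make Phases 1 and 3 more concrete, the sketch below shows a generic difference hash with a Hamming-distance comparison and a simple monotonicity check on measure numbers. It is only an illustration: the function names, the `max_distance` threshold, and the choice of strictly increasing measure numbers are assumptions, and the project's own `_dhash` may differ.

```python
# Hypothetical sketch of a difference hash (Phase 1) and a measure-number
# monotonicity check (Phase 3). Frames are plain 2D lists of grayscale values
# that are assumed to be already downscaled to rows of hash_w + 1 pixels.


def dhash(gray, hash_w=8):
    """Build an integer hash with one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in gray:
        if len(row) != hash_w + 1:
            raise ValueError("each row must contain hash_w + 1 pixels")
        for x in range(hash_w):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=1):
    """Treat two frames as duplicates when their hashes differ in few bits.

    max_distance is an assumed threshold chosen for this example.
    """
    return hamming(hash_a, hash_b) <= max_distance


def first_non_monotonic(measures):
    """Return the index of the first measure number that does not increase.

    Returns None when the sequence is strictly increasing (assumed rule).
    """
    for i in range(1, len(measures)):
        if measures[i] <= measures[i - 1]:
            return i
    return None


# Tiny 2x3 grids (hash_w=2) stand in for downscaled frames.
frame_a = [[10, 20, 5], [7, 7, 9]]
frame_b = [[11, 21, 6], [8, 8, 10]]   # same gradients as frame_a, brighter
frame_c = [[30, 20, 25], [9, 7, 1]]   # different gradients

hash_a = dhash(frame_a, hash_w=2)
hash_b = dhash(frame_b, hash_w=2)
hash_c = dhash(frame_c, hash_w=2)

same_page = is_near_duplicate(hash_a, hash_b)
new_page = not is_near_duplicate(hash_a, hash_c)

# Measure numbers read by OCR; the drop back to 2 marks a rewind or repeat.
rewind_index = first_non_monotonic([1, 2, 3, 2, 4])
```

Because the hash encodes only the sign of neighboring brightness differences, a uniform brightness shift (as in `frame_b`) leaves it unchanged, which is why this kind of hash is a common first filter for paused video frames.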