ARCHITECTURE
System Design
The project is a sequential data processing pipeline that turns raw video into a formatted A4 PDF. The primary entry point pushes the video through five logical stages, numbered Steps 1 to 6 (Steps 4 and 5 are described together):
- Download (Step 1): Fetches the target YouTube video.
- Frame Extraction (Step 2): Opens an OpenCV VideoCapture, skips frames based on `DEFAULT_FPS` to limit memory, and crops out non-meaningful content.
- Pattern Detection (Step 3): Classifies the scrolling behavior (e.g., `scroll` vs. `overlay`) using temporal tracking and template matching.
- Frame Dedup & Stitching (Step 4/5): The core logic engine. Filters out duplicates caused by video pauses or rewind/D.S. al Coda behavior. Tracks pixel movement, then stitches horizontally scrolling tabs or stacks overlay pages using `TemporalTracker`.
- PDF Tiling (Step 6): Breaks stitched panoramas into A4 printable chunks and bounds them using layout metrics.
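The arithmetic behind Steps 2 and 6 can be sketched in a few lines. This is an illustrative sketch only, not the project's code: the value given to `DEFAULT_FPS`, the helper names `frame_stride`, `kept_frame_indices`, and `tile_spans`, and the assumption that frames are skipped by a fixed stride are all hypothetical.

```python
from __future__ import annotations

# Assumed value for illustration; the project defines its own DEFAULT_FPS.
DEFAULT_FPS = 2.0


def frame_stride(source_fps: float, target_fps: float = DEFAULT_FPS) -> int:
    """Number of source frames to advance per kept frame (never below 1)."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(source_fps / target_fps))


def kept_frame_indices(total_frames: int, source_fps: float,
                       target_fps: float = DEFAULT_FPS) -> list[int]:
    """Indices of the frames that survive the skip in Step 2."""
    return list(range(0, total_frames, frame_stride(source_fps, target_fps)))


def tile_spans(panorama_width: int, page_width: int) -> list[tuple[int, int]]:
    """Split the columns [0, panorama_width) into page-wide (start, end) spans.

    The last span may be narrower than page_width.
    """
    if page_width <= 0:
        raise ValueError("page_width must be positive")
    return [(start, min(start + page_width, panorama_width))
            for start in range(0, panorama_width, page_width)]


if __name__ == "__main__":
    # A 30 fps video sampled at 2 fps keeps every 15th frame.
    print(kept_frame_indices(60, source_fps=30.0))
    # A 2500 px wide panorama cut into 1000 px wide page chunks.
    print(tile_spans(2500, 1000))
```

Keeping only every n-th frame bounds memory by the sampling rate rather than by the video length. How the project chooses page width and applies its layout metrics is not specified here.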
Key Subsystems
- Temporal Tracker (`video_cv_tracker.py`): Tracks time-series differences between frames by evaluating column/row variations rather than relying entirely on brute-force image subtraction. This captures page flips cleanly.
- Duplicate Prevention Engine: Employs tiered validation:
  - Phase 1: Difference hash (`_dhash`) clustering and Laplacian variance.
  - Phase 2: Template-match history memory (catching identical choruses via distance vs. time).
  - Phase 3: OCR measure verification (ensuring numerical monotonicity).
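Two of the duplicate-prevention ideas, the difference hash from Phase 1 and the monotonicity check from Phase 3, can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the project's `_dhash` implementation: the functions `dhash_bits`, `hamming`, and `accept_monotonic` are hypothetical, the input grid is assumed to be grayscale and already downscaled, and a frame is assumed to be acceptable when its OCR-read measure number does not fall below the last accepted one.

```python
from __future__ import annotations


def dhash_bits(gray: list[list[int]]) -> int:
    """Difference hash of a small grayscale grid.

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so a grid with N columns yields N - 1 bits per row.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def accept_monotonic(measures: list[int | None]) -> list[bool]:
    """Flag frames whose measure number does not go backwards.

    None stands for a failed OCR read; such frames are accepted
    without changing the running measure number (an assumption).
    """
    accepted = []
    last = None
    for measure in measures:
        if measure is None:
            accepted.append(True)
        elif last is None or measure >= last:
            accepted.append(True)
            last = measure
        else:
            accepted.append(False)  # rewind or repeat: measure went backwards
    return accepted


if __name__ == "__main__":
    frame_a = [[10, 20, 30], [30, 20, 10]]
    frame_b = [[10, 20, 30], [30, 20, 25]]
    print(hamming(dhash_bits(frame_a), dhash_bits(frame_b)))
    print(accept_monotonic([1, 2, 2, 1, 3, None, 2]))
```

A small Hamming distance between two hashes marks the frames as near-duplicates. The threshold, the clustering step, and the Laplacian-variance sharpness test are left out of this sketch.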