wip: [01-stabilize] paused at task 1/1 - OCR Hallucination Immune logic via Semantic delta window and fret-isolation
36
.planning/HANDOFF.json
Normal file
@@ -0,0 +1,36 @@
{
  "version": "1.0",
  "timestamp": "2026-03-29T22:05:18+09:00",
  "phase": "01",
  "phase_name": "stabilize",
  "phase_dir": ".planning/phases/01-stabilize",
  "plan": 1,
  "task": 1,
  "total_tasks": 1,
  "status": "paused",
  "completed_tasks": [
    {
      "id": 1,
      "name": "Semantic Jigsaw Assemble Implementation",
      "status": "done",
      "commit": "pending"
    }
  ],
  "remaining_tasks": [],
  "blockers": [],
  "human_actions_pending": [],
  "decisions": [
    {
      "decision": "Implement Semantic LIS CV Pipeline",
      "rationale": "OCR hallucination (e.g. 13 -> 32) caused the sequence constraint (num <= last_measure) to drop all subsequent measures. Replaced with Delta Constraint and strict bounding + CV Novelty Tracking.",
      "phase": "01"
    }
  ],
  "uncommitted_files": [
    "youtube_tab_to_pdf.py",
    "verify_pdf.py",
    "scripts/debug/dump_ocr.py"
  ],
  "next_action": "Validate complete generated PDF output on diverse user machines for structural integrity and ensure bounding box avoids fret numbers correctly.",
  "context_notes": "All bugs fixed. The OCR window was clipping the high-E string fret numbers (0, 1, 2) which created false numbers. Crop corrected to `staff_top - 5`."
}
39
.planning/phases/01-stabilize/.continue-here.md
Normal file
@@ -0,0 +1,39 @@
---
phase: 01-stabilize
task: 1
total_tasks: 1
status: in_progress
last_updated: 2026-03-29T22:05:18+09:00
---

<current_state>
Fixed the OCR sequence hallucination bug: a false measure reading (e.g. `32` instead of `13`) hijacked the max tracker, so long runs of valid measures were skipped (e.g. `14` to `31` were dropped). The `y2` boundary constraint now limits OCR to the band immediately above the staff measure line, avoiding the high-E string fret numbers.
</current_state>

<completed_work>

- Task 1: Semantic PDF Stacker with LIS-style Delta Validator (`m_num <= last + 25`) - Done
- Task 2: Fret-number OCR exclusion (`y2 = staff_top - 5`) - Done
- Task 3: CV-Fallback novelty interpolation against false readings - Done
</completed_work>
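
The fret-number exclusion in Task 2 can be pictured as a small bounds calculation. This is only a sketch: `y2 = staff_top - 5` comes from the note above, while the function name, the `band_height` default, and the clamping to zero are assumptions made for illustration, not code from `youtube_tab_to_pdf.py`.

```python
from __future__ import annotations


def measure_number_band(staff_top: int, band_height: int = 40, margin: int = 5) -> tuple[int, int]:
    """Return (y1, y2) row bounds of the OCR band above the staff.

    y2 stops `margin` pixels above the top staff line so the high-E
    string fret numbers are never inside the crop. `band_height` is a
    hypothetical value; the real band size is not stated in the notes.
    """
    y2 = max(0, staff_top - margin)   # y2 = staff_top - 5 with the default margin
    y1 = max(0, y2 - band_height)     # clamp so the band never leaves the frame
    return y1, y2


# With a NumPy-style frame the crop would then be frame[y1:y2, :].
```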

<remaining_work>
- Final validation against the user's specific 140-measure track. No technical blockers remain on this end.
</remaining_work>

<decisions_made>

- Replaced the strict chronological cutoff (`<= max_measure`) with Delta Window Tracking (+25 threshold). Because the video is sequential, a forward leap of more than 25 measures is treated as an invalid OCR reading. When OCR confidence or validation fails, OpenCV structural hashing (`_is_duplicate_cv`) replaces measure-number sorting: novel frames are injected into the PDF in chronological order, and `max_measure` is left untouched so a bad reading cannot corrupt it.
</decisions_made>
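
A minimal sketch of the delta-window decision described above, under stated assumptions: the +25 threshold and the rule that a failed reading must not update `max_measure` come from this note, but the function name, the return shape, and the `is_novel_frame` flag (standing in for the result of `_is_duplicate_cv`) are hypothetical.

```python
from __future__ import annotations

MAX_FORWARD_JUMP = 25  # the "+25 threshold" from the decision above


def classify_reading(m_num: int | None, max_measure: int, is_novel_frame: bool) -> tuple[str, int]:
    """Decide what to do with one frame; return (action, new_max_measure).

    m_num is the OCR'd measure number, or None when OCR produced nothing.
    is_novel_frame is assumed to be the negated result of a CV duplicate check.
    """
    in_window = m_num is not None and max_measure < m_num <= max_measure + MAX_FORWARD_JUMP
    if in_window:
        # Plausible forward step: keep the frame and advance the tracker.
        return "append", m_num
    if is_novel_frame:
        # OCR missing or implausible: keep the frame in chronological
        # order but leave max_measure untouched.
        return "append", max_measure
    # Not novel: a repeat or backward jump, so discard it.
    return "skip", max_measure
```

The point of returning the tracker value alongside the action is that an out-of-window reading can never raise `max_measure`, which is the state the earlier bug corrupted.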

<blockers>
- None. Testing required on the target machine.
</blockers>

<context>
The logic is designed to behave as a monotonic sequencer: repeated choruses and backward jumps are treated as pure duplicates (via `_is_duplicate_cv`) and discarded. Work can resume on another machine; the fatal drop bug is fixed, pending the validation run listed under remaining work.
</context>
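
To make the duplicate check concrete, here is a sketch of one possible structural hash. It is not the project's `_is_duplicate_cv`: it swaps in a plain average hash over a nested list of grayscale values so the example runs without OpenCV, and the names `average_hash`, `is_duplicate`, `grid`, and `max_distance` are all hypothetical.

```python
from __future__ import annotations


def average_hash(gray: list[list[int]], grid: int = 4) -> tuple[bool, ...]:
    """Hash a grayscale image (rows of pixel values) into grid*grid bits.

    Each bit says whether a cell's mean brightness is above the mean of
    all cells. The image must be at least `grid` pixels in each dimension.
    """
    h, w = len(gray), len(gray[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(c > mean for c in cells)


def is_duplicate(hash_a: tuple[bool, ...], hash_b: tuple[bool, ...], max_distance: int = 2) -> bool:
    """Treat two frames as duplicates when few hash bits differ."""
    distance = sum(a != b for a, b in zip(hash_a, hash_b))
    return distance <= max_distance
```

A frame whose hash is close to one already kept would be discarded; anything else counts as novel and is appended in chronological order.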

<next_action>
Start with: Fetch the repository. Run `youtube_tab_to_pdf.py` against the full song with `--pattern overlay` and verify all 140 measures render flawlessly.
</next_action>