TESTING
Test Suites & Scripts
The application uses diagnostic and simulation scripts rather than traditional unittest or pytest suites, because the pipeline relies heavily on computer vision and large video downloads.
- test_pipeline.py: Acts as the primary integration test, running the end-to-end extraction over known sample URLs to verify that no sections are missing and no regressions occur.
- scripts/debug/rigorous_validator.py: A strict assertion script used locally to guarantee that extracted sequences pass OCR checks and maintain strict monotonicity.
- scripts/debug/test_full_ocr.py: Isolated test bench for verifying EasyOCR accuracy and tuning bounding box coordinates before baking them into the main pipeline.
Validation Methodologies
Because judging computer vision output by eye is subjective, 'tests' in this repository focus heavily on measurable output metrics:
- Number of discrete pages extracted vs expected.
- Strict ascending sequence of OCR read measure numbers.
- Absence of specific moving artifacts (e.g., the red/blue 'Playhead cursor').
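The first two metrics can be expressed as plain assertions. The following is a minimal sketch, not the code in rigorous_validator.py: the names `ValidationReport` and `validate_extraction` are hypothetical, and it assumes the OCR stage yields one list of integer measure numbers per extracted page. The playhead check is left out because it depends on image data.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Collects human-readable failures (hypothetical helper type)."""
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_extraction(pages, expected_pages):
    """Check page count and strict ascending order of measure numbers.

    `pages` is assumed to be a list of pages, each page being the list of
    integer measure numbers that OCR read from it, in reading order.
    """
    report = ValidationReport()

    # Metric 1: number of discrete pages extracted vs expected.
    if len(pages) != expected_pages:
        report.errors.append(
            f"expected {expected_pages} pages, extracted {len(pages)}"
        )

    # Metric 2: measure numbers must strictly increase across all pages.
    flat = [number for page in pages for number in page]
    for previous, current in zip(flat, flat[1:]):
        if current <= previous:
            report.errors.append(
                f"measure {current} follows {previous} (not strictly ascending)"
            )

    return report


if __name__ == "__main__":
    # Made-up measure numbers for illustration only.
    report = validate_extraction([[1, 5], [9, 13]], expected_pages=2)
    print("ok" if report.ok else report.errors)
```

Collecting every failure in a report, rather than stopping at the first failed assertion, makes a single run over a long video more informative when several pages are wrong at once.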