# TESTING

## Test Suites & Scripts

The application uses diagnostic and simulation scripts rather than traditional `unittest` or `pytest` suites due to the heavy reliance on Computer Vision and large video downloads.

- `test_pipeline.py`: Acts as the primary integration test, running the e2e extraction over known sample URLs to verify no missing sections or regressions occur.
- `scripts/debug/rigorous_validator.py`: A rigid assertion script used locally to guarantee extracted sequences don't fail OCR checks and maintain strict monotonicity.
- `scripts/debug/test_full_ocr.py`: Isolated test bench for verifying EasyOCR accuracy and tuning bounding box coordinates before baking them into the main pipeline.

## Validation Methodologies

Because validating computer vision outputs is visually subjective, 'tests' in this repository focus heavily on output metrics:

- Number of discrete pages extracted vs expected.
- Strict ascending sequence of OCR read measure numbers.
- Absence of specific moving artifacts (e.g., the red/blue 'Playhead cursor').
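
The page-count and measure-ordering metrics can be sketched as plain checks. This is a minimal illustration, not the code in `rigorous_validator.py`: the function names `is_strictly_ascending` and `validate_extraction` and their parameters are hypothetical, and the playhead-artifact check is left out because it depends on image data.

```python
from typing import List, Sequence


def is_strictly_ascending(measure_numbers: Sequence[int]) -> bool:
    """Return True if each measure number is greater than the one before it."""
    return all(a < b for a, b in zip(measure_numbers, measure_numbers[1:]))


def validate_extraction(
    measure_numbers: Sequence[int],
    page_count: int,
    expected_pages: int,
) -> List[str]:
    """Collect every failed metric instead of stopping at the first one.

    All names here are hypothetical; they only illustrate the metrics
    described in this document.
    """
    failures: List[str] = []
    if page_count != expected_pages:
        failures.append(f"expected {expected_pages} pages, got {page_count}")
    if not is_strictly_ascending(measure_numbers):
        failures.append(
            f"measure numbers not strictly ascending: {list(measure_numbers)}"
        )
    return failures
```

An empty list means both metrics passed. Returning the failures rather than asserting immediately lets a diagnostic script report all problems for a sample in one run.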