Comprehensive testing documentation for the Skill Seeker project.
# Run all tests
python3 run_tests.py
# Run all tests with verbose output
python3 run_tests.py -v
# Run specific test suite
python3 run_tests.py --suite config
python3 run_tests.py --suite features
python3 run_tests.py --suite integration
# Stop on first failure
python3 run_tests.py --failfast
# List all available tests
python3 run_tests.py --list
tests/
├── __init__.py # Test package marker
├── test_config_validation.py # Config validation tests (30+ tests)
├── test_scraper_features.py # Core feature tests (25+ tests)
├── test_integration.py # Integration tests (15+ tests)
├── test_pdf_extractor.py # PDF extraction tests (23 tests)
├── test_pdf_scraper.py # PDF workflow tests (18 tests)
└── test_pdf_advanced_features.py # PDF advanced features (26 tests) NEW
Config validation tests (test_config_validation.py)
Tests the validate_config() function with comprehensive coverage.
Test Categories:
- Required fields (name, base_url)
Example Test:
def test_valid_complete_config(self):
    """Test valid complete configuration"""
    config = {
        'name': 'godot',
        'base_url': 'https://docs.godotengine.org/en/stable/',
        'selectors': {
            'main_content': 'div[role="main"]',
            'title': 'title',
            'code_blocks': 'pre code'
        },
        'rate_limit': 0.5,
        'max_pages': 500
    }
    errors = validate_config(config)
    self.assertEqual(len(errors), 0)
Running:
python3 run_tests.py --suite config -v
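A negative case pairs naturally with the valid-config example above. The sketch below is self-contained, so it uses a hypothetical stand-in, `validate_config_stub`, rather than the project's real `validate_config()`. It assumes (this document does not confirm it) that the real function also returns a list of error strings and treats `name` and `base_url` as required.

```python
import unittest


def validate_config_stub(config):
    """Hypothetical stand-in for validate_config(): returns a list of error strings."""
    errors = []
    # Assumption: these two fields are the required ones.
    for field in ("name", "base_url"):
        if field not in config:
            errors.append(f"Missing required field: {field}")
    return errors


class TestMissingFields(unittest.TestCase):
    def test_missing_base_url(self):
        """A config without base_url should yield exactly one error."""
        errors = validate_config_stub({"name": "godot"})
        self.assertEqual(len(errors), 1)
        self.assertIn("base_url", errors[0])

    def test_complete_config_has_no_errors(self):
        """A config with both fields should yield no errors."""
        config = {
            "name": "godot",
            "base_url": "https://docs.godotengine.org/en/stable/",
        }
        self.assertEqual(validate_config_stub(config), [])
```

Asserting on the error text as well as the error count makes a failure easier to diagnose than a bare length check.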
Core feature tests (test_scraper_features.py)
Tests core scraper functionality including URL validation, language detection, pattern extraction, and categorization.
Test Categories:
- URL Validation
- Language Detection (including language-* and lang-* class names)
- Pattern Extraction
- Categorization
- Text Cleaning
Example Test:
def test_detect_python_from_heuristics(self):
    """Test Python detection from code content"""
    html = '<code>import os\nfrom pathlib import Path</code>'
    elem = BeautifulSoup(html, 'html.parser').find('code')
    lang = self.converter.detect_language(elem, elem.get_text())
    self.assertEqual(lang, 'python')
Running:
python3 run_tests.py --suite features -v
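The `language-*` and `lang-*` prefixes listed under Language Detection can be illustrated without the scraper. The helper below, `language_from_classes`, is hypothetical and is not the project's `detect_language()`. It only shows the prefix-matching idea on a plain list of class names.

```python
def language_from_classes(class_names):
    """Return the language named by a 'language-*' or 'lang-*' class, else None.

    Hypothetical helper for illustration; not part of the project's API.
    """
    for name in class_names:
        for prefix in ("language-", "lang-"):
            # Require at least one character after the prefix.
            if name.startswith(prefix) and len(name) > len(prefix):
                return name[len(prefix):].lower()
    return None


print(language_from_classes(["highlight", "language-python"]))  # python
print(language_from_classes(["lang-JS"]))                       # js
print(language_from_classes(["highlight"]))                     # None
```

A test for the class-based path would pair with the heuristic test above, which covers the case where no such class is present.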
Integration tests (test_integration.py)
Tests complete workflows and interactions between components.
Test Categories:
- Dry-Run Mode
- Config Loading
- Real Config Validation
- URL Processing
- Content Extraction
Example Test:
def test_dry_run_no_directories_created(self):
    """Test that dry-run mode doesn't create directories"""
    converter = DocToSkillConverter(self.config, dry_run=True)
    data_dir = Path(f"output/{self.config['name']}_data")
    skill_dir = Path(f"output/{self.config['name']}")
    self.assertFalse(data_dir.exists())
    self.assertFalse(skill_dir.exists())
Running:
python3 run_tests.py --suite integration -v
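One caveat with the example above: it checks paths under a shared output/ directory, so leftovers from an earlier real run could make it fail. A hedged variant is to run the check against a temporary directory. `FakeConverter` below is a hypothetical stand-in for `DocToSkillConverter`, whose constructor may not accept an output root at all. Only the isolation pattern is the point.

```python
import tempfile
import unittest
from pathlib import Path


class FakeConverter:
    """Hypothetical stand-in: creates its directories only when dry_run is False."""

    def __init__(self, name, output_root, dry_run=False):
        self.data_dir = Path(output_root) / f"{name}_data"
        self.skill_dir = Path(output_root) / name
        if not dry_run:
            self.data_dir.mkdir(parents=True)
            self.skill_dir.mkdir(parents=True)


class TestDryRunIsolation(unittest.TestCase):
    def setUp(self):
        # Fresh directory per test; removed automatically after each test.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.root = self._tmp.name

    def test_dry_run_creates_nothing(self):
        converter = FakeConverter("test", self.root, dry_run=True)
        self.assertFalse(converter.data_dir.exists())
        self.assertFalse(converter.skill_dir.exists())

    def test_real_run_creates_directories(self):
        converter = FakeConverter("test", self.root, dry_run=False)
        self.assertTrue(converter.data_dir.is_dir())
        self.assertTrue(converter.skill_dir.is_dir())
```

Including the non-dry-run case guards against a test that passes only because nothing ever creates the directories.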
PDF extraction tests (test_pdf_extractor.py) NEW
Tests PDF content extraction functionality (B1.2-B1.5).
Note: These tests require PyMuPDF (pip install PyMuPDF). They will be skipped if not installed.
Test Categories:
- Language Detection (5 tests)
- Syntax Validation (5 tests)
- Quality Scoring (4 tests)
- Chapter Detection (4 tests)
- Code Block Merging (2 tests)
- Code Detection Methods (2 tests)
- Quality Filtering (1 test)
Example Test:
def test_detect_python_with_confidence(self):
    """Test Python detection returns language and confidence"""
    extractor = self.PDFExtractor.__new__(self.PDFExtractor)
    code = "def hello():\n print('world')\n return True"
    language, confidence = extractor.detect_language_from_code(code)
    self.assertEqual(language, "python")
    self.assertGreater(confidence, 0.7)
    self.assertLessEqual(confidence, 1.0)
Running:
python3 -m pytest tests/test_pdf_extractor.py -v
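The note above says these tests are skipped when PyMuPDF is missing. How the project does this is not shown here. One common way, sketched below, is a decorator built on `unittest.skipUnless`. The `requires_module` helper and the `TestPDFFeature` class are hypothetical names.

```python
import importlib.util
import unittest


def requires_module(module_name):
    """Skip the decorated test or class unless module_name is importable."""
    # find_spec returns None for a top-level module that is not installed.
    available = importlib.util.find_spec(module_name) is not None
    return unittest.skipUnless(available, f"{module_name} not installed")


@requires_module("fitz")  # PyMuPDF is imported under the name "fitz"
class TestPDFFeature(unittest.TestCase):
    def test_placeholder(self):
        # Runs only when PyMuPDF is importable; otherwise reported as skipped.
        self.assertTrue(True)
```

Skipped tests are reported separately from failures, so a machine without PyMuPDF can still run the rest of the suite and get a passing result.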
PDF workflow tests (test_pdf_scraper.py) NEW
Tests PDF to skill conversion workflow (B1.6).
Note: These tests require PyMuPDF (pip install PyMuPDF). They will be skipped if not installed.
Test Categories:
- PDFToSkillConverter (3 tests)
- Categorization (3 tests)
- Skill Building (3 tests)
- Code Block Handling (2 tests)
- Image Handling (2 tests)
- Error Handling (3 tests)
- JSON Workflow (2 tests)
Example Test:
def test_build_skill_creates_structure(self):
    """Test that build_skill creates required directory structure"""
    converter = self.PDFToSkillConverter(
        name="test_skill",
        pdf_path="test.pdf",
        output_dir=self.temp_dir
    )
    converter.extracted_data = {
        "pages": [{"page_number": 1, "text": "Test", "code_blocks": [], "images": []}],
        "total_pages": 1
    }
    converter.categories = {"test": [converter.extracted_data["pages"][0]]}
    converter.build_skill()
    skill_dir = Path(self.temp_dir) / "test_skill"
    self.assertTrue(skill_dir.exists())
    self.assertTrue((skill_dir / "references").exists())
    self.assertTrue((skill_dir / "scripts").exists())
    self.assertTrue((skill_dir / "assets").exists())
Running:
python3 -m pytest tests/test_pdf_scraper.py -v
PDF advanced feature tests (test_pdf_advanced_features.py) NEW
Tests advanced PDF features (Priority 2 & 3).
Note: These tests require PyMuPDF (pip install PyMuPDF). OCR tests also require pytesseract and Pillow. They will be skipped if not installed.
Test Categories:
- OCR Support (5 tests)
- Password Protection (4 tests)
- Table Extraction (5 tests)
- Caching (5 tests)
- Parallel Processing (4 tests)
- Integration (3 tests)
Example Test:
def test_table_extraction_basic(self):
    """Test basic table extraction"""
    extractor = self.PDFExtractor.__new__(self.PDFExtractor)
    extractor.extract_tables = True
    extractor.verbose = False
    # Create mock table
    mock_table = Mock()
    mock_table.extract.return_value = [
        ["Header 1", "Header 2", "Header 3"],
        ["Data 1", "Data 2", "Data 3"]
    ]
    mock_table.bbox = (0, 0, 100, 100)
    mock_tables = Mock()
    mock_tables.tables = [mock_table]
    mock_page = Mock()
    mock_page.find_tables.return_value = mock_tables
    tables = extractor.extract_tables_from_page(mock_page)
    self.assertEqual(len(tables), 1)
    self.assertEqual(tables[0]['row_count'], 2)
    self.assertEqual(tables[0]['col_count'], 3)
Running:
python3 -m pytest tests/test_pdf_advanced_features.py -v
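The mocking pattern in the example above can be tried in isolation. In the sketch below, `summarize_tables` is a hypothetical function, not the project's `extract_tables_from_page()`. It reads the same attributes the test fakes (`find_tables()`, `.tables`, `.extract()`, `.bbox`), so it shows how `Mock` stands in for a PDF page without opening a file.

```python
from unittest.mock import Mock


def summarize_tables(page):
    """Hypothetical helper: row and column counts for each table on a page."""
    summaries = []
    for table in page.find_tables().tables:
        rows = table.extract()
        summaries.append({
            "row_count": len(rows),
            "col_count": len(rows[0]) if rows else 0,
            "bbox": table.bbox,
        })
    return summaries


# Build a fake page exposing one 2x2 table.
mock_table = Mock()
mock_table.extract.return_value = [["Header 1", "Header 2"], ["Data 1", "Data 2"]]
mock_table.bbox = (0, 0, 100, 100)
mock_page = Mock()
mock_page.find_tables.return_value = Mock(tables=[mock_table])

tables = summarize_tables(mock_page)
```

Because the page is a `Mock`, the test can also assert that `find_tables` was called, which checks the interaction as well as the returned data.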
The custom test runner (run_tests.py) provides colored output and a detailed summary, for example:
======================================================================
TEST SUMMARY
======================================================================
Total Tests: 70
✓ Passed: 68
✗ Failed: 2
⊘ Skipped: 0
Success Rate: 97.1%
Test Breakdown by Category:
TestConfigValidation: 28/30 passed
TestURLValidation: 6/6 passed
TestLanguageDetection: 10/10 passed
TestPatternExtraction: 3/3 passed
TestCategorization: 5/5 passed
TestDryRunMode: 3/3 passed
TestConfigLoading: 4/4 passed
TestRealConfigFiles: 6/6 passed
TestContentExtraction: 3/3 passed
======================================================================
# Verbose output (show each test name)
python3 run_tests.py -v
# Quiet output (minimal)
python3 run_tests.py -q
# Stop on first failure
python3 run_tests.py --failfast
# Run specific suite
python3 run_tests.py --suite config
# List all tests
python3 run_tests.py --list
# Run a single test file
python3 -m unittest tests.test_config_validation
python3 -m unittest tests.test_scraper_features
python3 -m unittest tests.test_integration
# Run a single test class
python3 -m unittest tests.test_config_validation.TestConfigValidation
python3 -m unittest tests.test_scraper_features.TestLanguageDetection
# Run a single test method
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config
python3 -m unittest tests.test_scraper_features.TestLanguageDetection.test_detect_python_from_heuristics
| Component | Tests | Coverage |
|---|---|---|
| Config Validation | 30+ | 100% |
| URL Validation | 6 | 95% |
| Language Detection | 10 | 90% |
| Pattern Extraction | 3 | 85% |
| Categorization | 5 | 90% |
| Text Cleaning | 4 | 100% |
| Dry-Run Mode | 3 | 100% |
| Config Loading | 4 | 95% |
| Real Configs | 6 | 100% |
| Content Extraction | 3 | 80% |
| PDF Extraction | 23 | 90% |
| PDF Workflow | 18 | 85% |
| PDF Advanced Features | 26 | 95% |
Total: 142 tests (75 passing + 67 PDF tests)
Note: PDF tests (67 total) require PyMuPDF and will be skipped if not installed. When PyMuPDF is available, all 142 tests run.
- Enhancement scripts (enhance_skill.py, enhance_skill_local.py)
- Packaging script (package_skill.py)
#!/usr/bin/env python3
"""
Test suite for [feature name]
Tests [description of what's being tested]
"""
import sys
import os
import unittest

# Add parent directory to path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from doc_scraper import DocToSkillConverter


class TestYourFeature(unittest.TestCase):
    """Test [feature] functionality"""

    def setUp(self):
        """Set up test fixtures"""
        self.config = {
            'name': 'test',
            'base_url': 'https://example.com/',
            'selectors': {
                'main_content': 'article',
                'title': 'h1',
                'code_blocks': 'pre code'
            },
            'rate_limit': 0.1,
            'max_pages': 10
        }
        self.converter = DocToSkillConverter(self.config, dry_run=True)

    def tearDown(self):
        """Clean up after tests"""
        pass

    def test_your_feature(self):
        """Test description"""
        # Arrange
        test_input = "something"
        expected_value = "expected result"  # placeholder: replace with the real expectation
        # Act
        result = self.converter.some_method(test_input)  # placeholder method name
        # Assert
        self.assertEqual(result, expected_value)


if __name__ == '__main__':
    unittest.main()
Use descriptive test names: test_valid_name_formats, not test1
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - run: pip install requests beautifulsoup4
      - run: python3 run_tests.py
# Make sure you're in the repository root
cd /path/to/Skill_Seekers
# Run tests from root directory
python3 run_tests.py
# Clean up test artifacts
rm -rf output/test-*
# Make sure tests use dry_run=True
# Check test setUp methods
# Run only that test with verbose output
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v
# Check the error message carefully
# Verify test expectations match implementation
Test execution times: the full suite runs in about 1 second.
When adding new features:
- Run tests before committing: python3 run_tests.py
- Aim for >80% coverage for new code
✅ 142 comprehensive tests covering all major features (75 + 67 PDF)
✅ PDF support testing with 67 tests for B1 tasks + Priority 2 & 3
✅ Colored test runner with detailed summaries
✅ Fast execution (~1 second for full suite)
✅ Easy to extend with clear patterns and templates
✅ Good coverage of critical paths
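PDF Tests Status: 67 tests (23 extraction, 18 workflow, 26 advanced features); skipped when PyMuPDF is not installed.
Advanced PDF Features Tested: OCR support, password protection, table extraction, caching, and parallel processing.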
Run tests frequently to catch bugs early! 🚀