Comprehensive reference for all commands, options, and workflows.
# 1. Estimate pages (fast, 1-2 min)
python3 cli/estimate_pages.py configs/react.json
# 2. Scrape documentation (20-40 min)
python3 cli/doc_scraper.py --config configs/react.json
# 3. Enhance with Claude Code (60 sec)
python3 cli/enhance_skill_local.py output/react/
# 4. Package to .zip (instant)
python3 cli/package_skill.py output/react/
# 5. Test everything (1 sec)
python3 cli/run_tests.py
usage: doc_scraper.py [-h] [--interactive] [--config CONFIG] [--name NAME]
                      [--url URL] [--description DESCRIPTION] [--skip-scrape]
                      [--dry-run] [--enhance] [--enhance-local]
                      [--api-key API_KEY]
Convert documentation websites to Claude skills
options:
  -h, --help            Show this help message and exit
  --interactive, -i     Interactive configuration mode
  --config, -c CONFIG   Load configuration from file (e.g., configs/godot.json)
  --name NAME           Skill name
  --url URL             Base documentation URL
  --description, -d DESCRIPTION
                        Skill description
  --skip-scrape         Skip scraping, use existing data
  --dry-run             Preview what will be scraped without actually scraping
  --enhance             Enhance SKILL.md using Claude API after building
                        (requires API key)
  --enhance-local       Enhance SKILL.md using Claude Code in new terminal
                        (no API key needed)
  --api-key API_KEY     Anthropic API key for --enhance (or set ANTHROPIC_API_KEY)
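When the scraper is driven from another script, the options above can be assembled into an argument list. The following is a minimal sketch, not part of the project: `build_scraper_command` is a hypothetical helper, and it only covers flags that appear in the help text above.

```python
import sys
from typing import List, Optional


def build_scraper_command(
    config: Optional[str] = None,
    name: Optional[str] = None,
    url: Optional[str] = None,
    description: Optional[str] = None,
    skip_scrape: bool = False,
    dry_run: bool = False,
    enhance_local: bool = False,
) -> List[str]:
    """Assemble an argv list for cli/doc_scraper.py (hypothetical helper)."""
    cmd = [sys.executable, "cli/doc_scraper.py"]
    # Value-taking options are added only when a value was supplied.
    for flag, value in (
        ("--config", config),
        ("--name", name),
        ("--url", url),
        ("--description", description),
    ):
        if value is not None:
            cmd += [flag, value]
    # Boolean switches are appended as bare flags.
    for flag, enabled in (
        ("--skip-scrape", skip_scrape),
        ("--dry-run", dry_run),
        ("--enhance-local", enhance_local),
    ):
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a shell string avoids quoting problems with descriptions that contain spaces.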
1. Use Preset Config (Recommended)
python3 cli/doc_scraper.py --config configs/godot.json
python3 cli/doc_scraper.py --config configs/react.json
python3 cli/doc_scraper.py --config configs/vue.json
python3 cli/doc_scraper.py --config configs/django.json
python3 cli/doc_scraper.py --config configs/fastapi.json
2. Interactive Mode
python3 cli/doc_scraper.py --interactive
# Wizard walks you through:
# - Skill name
# - Base URL
# - Description
# - Selectors (optional)
# - URL patterns (optional)
# - Rate limit
# - Max pages
3. Quick Mode (Minimal)
python3 cli/doc_scraper.py \
  --name react \
  --url https://react.dev/ \
  --description "React framework for building UIs"
4. Dry-Run (Preview)
python3 cli/doc_scraper.py --config configs/react.json --dry-run
# Shows what will be scraped without downloading data
# No directories created
# Fast validation
5. Skip Scraping (Use Cached Data)
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
# Uses existing output/godot_data/
# Fast rebuild (1-3 minutes)
# Useful for testing changes
6. With Local Enhancement
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
# Scrapes + enhances in one command
# Opens new terminal for Claude Code
# No API key needed
7. With API Enhancement
export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/doc_scraper.py --config configs/react.json --enhance
# Or with inline API key:
python3 cli/doc_scraper.py --config configs/react.json --enhance --api-key sk-ant-...
output/
├── {name}_data/              # Scraped raw data (cached)
│   ├── pages/
│   │   ├── page_0.json
│   │   ├── page_1.json
│   │   └── ...
│   └── summary.json          # Scraping stats
│
└── {name}/                   # Built skill directory
    ├── SKILL.md              # Main skill file
    ├── SKILL.md.backup       # Backup (if enhanced)
    ├── references/           # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── api.md
    │   └── ...
    ├── scripts/              # Empty (user scripts)
    └── assets/               # Empty (user assets)
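The layout above follows a fixed naming pattern, so the paths for a given skill name can be derived rather than hard-coded. A small sketch, assuming the default `output/` root and the `output/{name}.zip` package location described in this document; `skill_paths` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def skill_paths(name: str, output_root: str = "output") -> Dict[str, Path]:
    """Return the locations this document describes for one skill name."""
    root = Path(output_root)
    data_dir = root / f"{name}_data"   # cached raw scrape
    skill_dir = root / name            # built skill directory
    return {
        "data_dir": data_dir,
        "summary": data_dir / "summary.json",
        "skill_dir": skill_dir,
        "skill_md": skill_dir / "SKILL.md",
        "references": skill_dir / "references",
        "package": root / f"{name}.zip",
    }
```

For example, `skill_paths("react")["skill_md"].exists()` is a quick way to check whether a skill has been built before enhancing or packaging it.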
usage: estimate_pages.py [-h] [--max-discovery MAX_DISCOVERY]
                         [--timeout TIMEOUT]
                         config
Estimate page count for Skill Seeker configs
positional arguments:
  config                Path to config JSON file
options:
  -h, --help            Show this help message and exit
  --max-discovery, -m MAX_DISCOVERY
                        Maximum pages to discover (default: 1000)
  --timeout, -t TIMEOUT
                        HTTP request timeout in seconds (default: 30)
1. Quick Estimate (100 pages)
python3 cli/estimate_pages.py configs/react.json --max-discovery 100
# Time: ~30-60 seconds
# Good for: Quick validation
2. Standard Estimate (1000 pages - default)
python3 cli/estimate_pages.py configs/godot.json
# Time: ~1-2 minutes
# Good for: Most use cases
3. Deep Estimate (2000 pages)
python3 cli/estimate_pages.py configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes
# Good for: Large documentation sites
4. Custom Timeout
python3 cli/estimate_pages.py configs/django.json --timeout 60
# Useful for slow servers
🔍 Estimating pages for: react
📍 Base URL: https://react.dev/
🎯 Start URLs: 6
⏱️ Rate limit: 0.5s
🔢 Max discovery: 1000
⏳ Discovered: 180 pages (1.3 pages/sec)
======================================================================
📊 ESTIMATION RESULTS
======================================================================
Config: react
Base URL: https://react.dev/
✅ Pages Discovered: 180
⏳ Pages Pending: 50
📈 Estimated Total: 230
⏱️ Time Elapsed: 140.5s
⚡ Discovery Rate: 1.28 pages/sec
======================================================================
💡 RECOMMENDATIONS
======================================================================
✅ Current max_pages (300) is sufficient
⏱️ Estimated full scrape time: 1.9 minutes
(Based on rate_limit: 0.5s)
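The figures in the sample output are consistent with simple arithmetic: the estimated total is discovered plus pending pages, the discovery rate is discovered pages over elapsed time, and the scrape time is the total multiplied by `rate_limit`. The sketch below reproduces those numbers. It is an illustration of that arithmetic, not the estimator's actual code, and the `max_pages_sufficient` check is an assumption about how the recommendation is derived.

```python
def summarize_estimate(
    discovered: int,
    pending: int,
    elapsed_s: float,
    rate_limit_s: float,
    max_pages: int,
) -> dict:
    """Recompute the headline numbers shown in the estimation report."""
    total = discovered + pending
    return {
        "estimated_total": total,
        "discovery_rate": round(discovered / elapsed_s, 2),    # pages/sec
        "scrape_minutes": round(total * rate_limit_s / 60, 1),
        "max_pages_sufficient": max_pages >= total,            # assumed rule
    }


print(summarize_estimate(180, 50, 140.5, 0.5, 300))
# {'estimated_total': 230, 'discovery_rate': 1.28, 'scrape_minutes': 1.9, 'max_pages_sufficient': True}
```

The scrape-time figure only counts the delay between requests, so it is best read as a lower bound.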
What It Shows:
- Whether the current max_pages is sufficient
- A recommended max_pages value
No API key needed - uses Claude Code Max plan
# Usage
python3 cli/enhance_skill_local.py output/react/
python3 cli/enhance_skill_local.py output/godot/
# What it does:
# 1. Reads SKILL.md and references/
# 2. Opens new terminal with Claude Code
# 3. Claude enhances SKILL.md
# 4. Backs up original to SKILL.md.backup
# 5. Saves enhanced version
# Time: ~60 seconds
# Cost: Free (uses your Claude Code Max plan)
Requires Anthropic API key
# Install dependency first
pip3 install anthropic
# Usage with environment variable
export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/enhance_skill.py output/react/
# Usage with inline API key
python3 cli/enhance_skill.py output/godot/ --api-key sk-ant-...
# What it does:
# 1. Reads SKILL.md and references/
# 2. Calls Claude API (Sonnet 4)
# 3. Enhances SKILL.md
# 4. Backs up original to SKILL.md.backup
# 5. Saves enhanced version
# Time: ~30-60 seconds
# Cost: ~$0.01-0.10 per skill (depending on size)
# Usage
python3 cli/package_skill.py output/react/
python3 cli/package_skill.py output/godot/
# What it does:
# 1. Validates SKILL.md exists
# 2. Creates .zip with all skill files
# 3. Saves to output/{name}.zip
# Output:
# output/react.zip
# output/godot.zip
# Time: Instant
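The three packaging steps listed above can be sketched with the standard library. This is an approximation under stated assumptions, not the code in `package_skill.py`: it assumes every file in the skill directory is archived and that paths inside the archive are relative to the skill directory.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: str) -> Path:
    """Zip a built skill directory into <parent>/<name>.zip (illustrative)."""
    skill_path = Path(skill_dir)
    # Step 1: validate that SKILL.md exists.
    if not (skill_path / "SKILL.md").is_file():
        raise FileNotFoundError(f"SKILL.md not found in {skill_path}")
    # Step 3: output/react/ is saved as output/react.zip.
    zip_path = skill_path.parent / f"{skill_path.name}.zip"
    # Step 2: add every file, stored relative to the skill directory (assumption).
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(skill_path.rglob("*")):
            if file.is_file():
                archive.write(file, file.relative_to(skill_path))
    return zip_path
```

Calling `package_skill("output/react/")` would write `output/react.zip`, matching the output paths listed above.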
# Run all tests (default)
python3 cli/run_tests.py
# 71 tests, ~1 second
# Verbose output
python3 cli/run_tests.py -v
python3 cli/run_tests.py --verbose
# Quiet output
python3 cli/run_tests.py -q
python3 cli/run_tests.py --quiet
# Stop on first failure
python3 cli/run_tests.py -f
python3 cli/run_tests.py --failfast
# Run specific test suite
python3 cli/run_tests.py --suite config
python3 cli/run_tests.py --suite features
python3 cli/run_tests.py --suite integration
# List all tests
python3 cli/run_tests.py --list
# Run single test file
python3 -m unittest tests.test_config_validation
python3 -m unittest tests.test_scraper_features
python3 -m unittest tests.test_integration
# Run single test class
python3 -m unittest tests.test_config_validation.TestConfigValidation
# Run single test method
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config
| Config | Framework | Pages | Description |
|---|---|---|---|
| godot.json | Godot Engine | ~500 | Game engine documentation |
| react.json | React | ~300 | React framework docs |
| vue.json | Vue.js | ~250 | Vue.js framework docs |
| django.json | Django | ~400 | Django web framework |
| fastapi.json | FastAPI | ~200 | FastAPI Python framework |
| steam-economy-complete.json | Steam | ~100 | Steam Economy API docs |
# List all configs
ls configs/
# View config content
cat configs/react.json
python3 -m json.tool configs/godot.json
{
  "name": "react",
  "base_url": "https://react.dev/",
  "description": "React - JavaScript library for building UIs",
  "start_urls": [
    "https://react.dev/learn",
    "https://react.dev/reference/react",
    "https://react.dev/reference/react-dom"
  ],
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/learn/", "/reference/"],
    "exclude": ["/blog/", "/community/"]
  },
  "categories": {
    "getting_started": ["learn", "tutorial", "intro"],
    "api": ["reference", "api", "hooks"],
    "guides": ["guide"]
  },
  "rate_limit": 0.5,
  "max_pages": 300
}
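Before running a long scrape, it can help to sanity-check a config like the one above. The sketch below is a hypothetical pre-flight check and not the project's own validation. It assumes `name` and `base_url` are the required keys, as the minimal configs in this document suggest, and it treats `rate_limit` and `max_pages` as optional.

```python
import json
from typing import List


def check_config(config: dict) -> List[str]:
    """Return a list of problems found in a config dict (empty means OK)."""
    problems = []
    for key in ("name", "base_url"):
        if not isinstance(config.get(key), str) or not config[key]:
            problems.append(f"missing or empty '{key}'")
    base_url = config.get("base_url", "")
    if isinstance(base_url, str) and base_url and not base_url.startswith(("http://", "https://")):
        problems.append("base_url should start with http:// or https://")
    rate_limit = config.get("rate_limit", 0.5)
    if not isinstance(rate_limit, (int, float)) or rate_limit < 0:
        problems.append("rate_limit should be a non-negative number")
    max_pages = config.get("max_pages", 1)
    if not isinstance(max_pages, int) or max_pages < 1:
        problems.append("max_pages should be a positive integer")
    return problems


def check_config_file(path: str) -> List[str]:
    """Load a JSON config from disk and run the same checks."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

For example, `check_config_file("configs/react.json")` returning an empty list means none of these checks failed. The scraper itself may enforce different rules.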
# 1. Estimate (optional, 1-2 min)
python3 cli/estimate_pages.py configs/react.json
# 2. Scrape with local enhancement (25 min)
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
# 3. Package (instant)
python3 cli/package_skill.py output/react/
# Result: output/react.zip
# Upload to Claude!
# 1. Create config
cat > configs/my-docs.json << 'EOF'
{
  "name": "my-docs",
  "base_url": "https://docs.example.com/",
  "description": "My documentation site",
  "rate_limit": 0.5,
  "max_pages": 200
}
EOF
# 2. Estimate
python3 cli/estimate_pages.py configs/my-docs.json
# 3. Dry-run test
python3 cli/doc_scraper.py --config configs/my-docs.json --dry-run
# 4. Full scrape
python3 cli/doc_scraper.py --config configs/my-docs.json
# 5. Enhance
python3 cli/enhance_skill_local.py output/my-docs/
# 6. Package
python3 cli/package_skill.py output/my-docs/
# 1. Start interactive wizard
python3 cli/doc_scraper.py --interactive
# 2. Answer prompts:
# - Name: my-framework
# - URL: https://framework.dev/
# - Description: My favorite framework
# - Selectors: (uses defaults)
# - Rate limit: 0.5
# - Max pages: 100
# 3. Enhance
python3 cli/enhance_skill_local.py output/my-framework/
# 4. Package
python3 cli/package_skill.py output/my-framework/
python3 cli/doc_scraper.py \
  --name vue \
  --url https://vuejs.org/ \
  --description "Vue.js framework" \
  --enhance-local
# Already scraped once?
# Skip re-scraping, just rebuild
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
# Try new enhancement
python3 cli/enhance_skill_local.py output/godot/
# Re-package
python3 cli/package_skill.py output/godot/
# 1. Create test config with low max_pages
cat > configs/test.json << 'EOF'
{
  "name": "test-site",
  "base_url": "https://docs.test.com/",
  "max_pages": 20,
  "rate_limit": 0.1
}
EOF
# 2. Estimate
python3 cli/estimate_pages.py configs/test.json --max-discovery 50
# 3. Dry-run
python3 cli/doc_scraper.py --config configs/test.json --dry-run
# 4. Small scrape
python3 cli/doc_scraper.py --config configs/test.json
# 5. Validate output
ls output/test-site/
ls output/test-site/references/
# 6. If good, increase max_pages and re-run
# Increase rate_limit in config
# Default: 0.5 seconds
# Conservative: 1.0 seconds
# Very conservative: 2.0 seconds
# Edit config:
{
  "rate_limit": 1.0
}
# Estimate first
python3 cli/estimate_pages.py configs/my-config.json
# Set max_pages based on estimate
# Add buffer: estimated + 50
# Edit config (350 for an estimate of 300; JSON does not allow inline comments):
{
  "max_pages": 350
}
# Wrong selectors
# Test selectors manually:
curl -s https://docs.example.com/ | grep -iE 'article|main|content'
# Common selectors:
"main_content": "article"
"main_content": "main"
"main_content": ".content"
"main_content": "#main-content"
"main_content": "div[role=\"main\"]"
# Update config with correct selector
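As a more structured alternative to the `grep` check, the common selectors listed above can be probed with Python's standard-library HTML parser. This is a rough sketch: `probe_selectors` is a hypothetical helper that only recognizes those five selectors, and it is not how the scraper itself evaluates selectors.

```python
from html.parser import HTMLParser
from typing import Dict


class ContainerProbe(HTMLParser):
    """Count elements matching the five common main-content selectors."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Dict[str, int] = {
            "article": 0,
            "main": 0,
            ".content": 0,
            "#main-content": 0,
            'div[role="main"]': 0,
        }

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None for valueless attributes.
        classes = (attributes.get("class") or "").split()
        if tag in ("article", "main"):
            self.counts[tag] += 1
        if "content" in classes:
            self.counts[".content"] += 1
        if attributes.get("id") == "main-content":
            self.counts["#main-content"] += 1
        if tag == "div" and attributes.get("role") == "main":
            self.counts['div[role="main"]'] += 1


def probe_selectors(html: str) -> Dict[str, int]:
    """Return how many elements match each candidate selector."""
    probe = ContainerProbe()
    probe.feed(html)
    return probe.counts
```

Feed it the page source (for example, the output of the `curl` command saved to a file). A selector that matches exactly once is usually the best candidate for `main_content`.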
# Run specific failing test
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v
# Check error message
# Verify expectations match implementation
# Local enhancement:
# Make sure Claude Code is running
# Check terminal output
# API enhancement:
# Verify API key is set:
echo $ANTHROPIC_API_KEY
# Or use inline:
python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...
# Verify SKILL.md exists
ls output/my-skill/SKILL.md
# If missing, build first:
python3 cli/doc_scraper.py --config configs/my-skill.json --skip-scrape
# Check output directory
ls output/
# Skill data (cached):
ls output/{name}_data/
# Built skill:
ls output/{name}/
# Packaged skill:
ls output/{name}.zip
{
  "selectors": {
    "main_content": "div.documentation",
    "title": "h1.page-title",
    "code_blocks": "pre.highlight code",
    "navigation": "nav.sidebar"
  }
}
{
  "url_patterns": {
    "include": [
      "/docs/",
      "/guide/",
      "/api/",
      "/tutorial/"
    ],
    "exclude": [
      "/blog/",
      "/news/",
      "/community/",
      "/showcase/"
    ]
  }
}
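One plausible reading of `url_patterns` is plain substring matching, with exclusions taking priority. The sketch below illustrates that reading only. The actual matching rules in the scraper may differ, and `should_scrape` is a hypothetical name.

```python
from typing import Iterable


def should_scrape(url: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Decide whether a URL passes include/exclude patterns (assumed semantics)."""
    include = list(include)
    # Assumption: any exclude match rejects the URL, even if an include matches.
    if any(pattern in url for pattern in exclude):
        return False
    # Assumption: an empty include list means every URL is allowed.
    return not include or any(pattern in url for pattern in include)
```

With the patterns above, `https://docs.example.com/docs/intro` would be kept and `https://docs.example.com/blog/release-notes` would be skipped. A dry run is the reliable way to confirm what the real tool selects.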
{
  "categories": {
    "getting_started": ["intro", "tutorial", "quickstart", "installation"],
    "core_concepts": ["concept", "fundamental", "architecture"],
    "api": ["reference", "api", "method", "function"],
    "guides": ["guide", "how-to", "example"],
    "advanced": ["advanced", "expert", "performance"]
  }
}
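The `categories` mapping pairs each category with keywords. A simple way to picture its effect is first-match keyword lookup over a page's URL and title, sketched below. This is an assumed model for illustration: the scraper's real scoring may differ, and both `categorize` and the `"other"` fallback are hypothetical.

```python
from typing import Dict, List


def categorize(url: str, title: str, categories: Dict[str, List[str]], default: str = "other") -> str:
    """Return the first category whose keyword appears in the URL or title."""
    haystack = f"{url} {title}".lower()
    # Dicts preserve insertion order, so earlier categories win ties.
    for category, keywords in categories.items():
        if any(keyword.lower() in haystack for keyword in keywords):
            return category
    return default
```

Substring matching is loose: a keyword such as `example` also matches a host like `docs.example.com`. Specific keywords give cleaner categories under this model.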
{
  "start_urls": [
    "https://docs.example.com/getting-started/",
    "https://docs.example.com/api/",
    "https://docs.example.com/guides/",
    "https://docs.example.com/examples/"
  ]
}
Use --skip-scrape for fast rebuilds
# Anthropic API key (for API enhancement)
export ANTHROPIC_API_KEY=sk-ant-...
# Optional: Set custom output directory
export SKILL_SEEKER_OUTPUT_DIR=/path/to/output
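A wrapper script might read the two environment variables above as follows. This is a sketch with two assumptions that the tools do not necessarily share: an inline `--api-key` value takes precedence over `ANTHROPIC_API_KEY`, and the output directory falls back to `output` when `SKILL_SEEKER_OUTPUT_DIR` is unset. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str] = None, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer an inline key, then ANTHROPIC_API_KEY (assumed precedence)."""
    return cli_value or env.get("ANTHROPIC_API_KEY") or None


def resolve_output_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Use SKILL_SEEKER_OUTPUT_DIR when set, else the default 'output' directory."""
    return Path(env.get("SKILL_SEEKER_OUTPUT_DIR") or "output")
```

Taking the environment as a parameter keeps both functions easy to test without modifying the real process environment.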
0: Success
1: Error (general)
2: Warning (estimation hit limit)
Skill_Seekers/
├── cli/                        # Command-line tools
│   ├── doc_scraper.py          # Main tool
│   ├── estimate_pages.py       # Estimator
│   ├── enhance_skill.py        # API enhancement
│   ├── enhance_skill_local.py  # Local enhancement
│   ├── package_skill.py        # Packager
│   └── run_tests.py            # Test runner
├── configs/                    # Preset configs
├── tests/                      # Test suite
├── docs/                       # Documentation
└── output/                     # Generated output
# Tool-specific help
python3 cli/doc_scraper.py --help
python3 cli/estimate_pages.py --help
python3 cli/run_tests.py --help
# Documentation
cat CLAUDE.md # Quick reference for Claude Code
cat docs/CLAUDE.md # Detailed technical docs
cat docs/TESTING.md # Testing guide
cat docs/USAGE.md # This file
cat docs/ENHANCEMENT.md # Enhancement guide
cat docs/UPLOAD_GUIDE.md # Upload instructions
cat README.md # Project overview
Essential Commands:
python3 cli/estimate_pages.py configs/react.json # Estimate
python3 cli/doc_scraper.py --config configs/react.json # Scrape
python3 cli/enhance_skill_local.py output/react/ # Enhance
python3 cli/package_skill.py output/react/ # Package
python3 cli/run_tests.py # Test
Quick Start:
pip3 install requests beautifulsoup4
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
python3 cli/package_skill.py output/react/
# Upload output/react.zip to Claude!
Happy skill creating! 🚀