USAGE.md 18 KB

Complete Usage Guide for Skill Seeker

Comprehensive reference for all commands, options, and workflows.

Table of Contents


Quick Reference

# 1. Estimate pages (fast, 1-2 min)
python3 cli/estimate_pages.py configs/react.json

# 2. Scrape documentation (20-40 min)
python3 cli/doc_scraper.py --config configs/react.json

# 3. Enhance with Claude Code (60 sec)
python3 cli/enhance_skill_local.py output/react/

# 4. Package to .zip (instant)
python3 cli/package_skill.py output/react/

# 5. Test everything (1 sec)
python3 cli/run_tests.py

Main Tool: doc_scraper.py

Full Help

usage: doc_scraper.py [-h] [--interactive] [--config CONFIG] [--name NAME]
                      [--url URL] [--description DESCRIPTION] [--skip-scrape]
                      [--dry-run] [--enhance] [--enhance-local]
                      [--api-key API_KEY]

Convert documentation websites to Claude skills

options:
  -h, --help            Show this help message and exit
  --interactive, -i     Interactive configuration mode
  --config, -c CONFIG   Load configuration from file (e.g., configs/godot.json)
  --name NAME           Skill name
  --url URL             Base documentation URL
  --description, -d DESCRIPTION
                        Skill description
  --skip-scrape         Skip scraping, use existing data
  --dry-run             Preview what will be scraped without actually scraping
  --enhance             Enhance SKILL.md using Claude API after building
                        (requires API key)
  --enhance-local       Enhance SKILL.md using Claude Code in new terminal
                        (no API key needed)
  --api-key API_KEY     Anthropic API key for --enhance (or set ANTHROPIC_API_KEY)

Usage Examples

1. Use Preset Config (Recommended)

python3 cli/doc_scraper.py --config configs/godot.json
python3 cli/doc_scraper.py --config configs/react.json
python3 cli/doc_scraper.py --config configs/vue.json
python3 cli/doc_scraper.py --config configs/django.json
python3 cli/doc_scraper.py --config configs/fastapi.json

2. Interactive Mode

python3 cli/doc_scraper.py --interactive
# Wizard walks you through:
# - Skill name
# - Base URL
# - Description
# - Selectors (optional)
# - URL patterns (optional)
# - Rate limit
# - Max pages

3. Quick Mode (Minimal)

python3 cli/doc_scraper.py \
  --name react \
  --url https://react.dev/ \
  --description "React framework for building UIs"

4. Dry-Run (Preview)

python3 cli/doc_scraper.py --config configs/react.json --dry-run
# Shows what will be scraped without downloading data
# No directories created
# Fast validation

5. Skip Scraping (Use Cached Data)

python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
# Uses existing output/godot_data/
# Fast rebuild (1-3 minutes)
# Useful for testing changes

6. With Local Enhancement

python3 cli/doc_scraper.py --config configs/react.json --enhance-local
# Scrapes + enhances in one command
# Opens new terminal for Claude Code
# No API key needed

7. With API Enhancement

export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/doc_scraper.py --config configs/react.json --enhance

# Or with inline API key:
python3 cli/doc_scraper.py --config configs/react.json --enhance --api-key sk-ant-...

Output Structure

output/
├── {name}_data/              # Scraped raw data (cached)
│   ├── pages/
│   │   ├── page_0.json
│   │   ├── page_1.json
│   │   └── ...
│   └── summary.json          # Scraping stats
│
└── {name}/                   # Built skill directory
    ├── SKILL.md              # Main skill file
    ├── SKILL.md.backup       # Backup (if enhanced)
    ├── references/           # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── api.md
    │   └── ...
    ├── scripts/              # Empty (user scripts)
    └── assets/               # Empty (user assets)

Estimator: estimate_pages.py

Full Help

usage: estimate_pages.py [-h] [--max-discovery MAX_DISCOVERY]
                         [--timeout TIMEOUT]
                         config

Estimate page count for Skill Seeker configs

positional arguments:
  config                Path to config JSON file

options:
  -h, --help            Show this help message and exit
  --max-discovery, -m MAX_DISCOVERY
                        Maximum pages to discover (default: 1000)
  --timeout, -t TIMEOUT
                        HTTP request timeout in seconds (default: 30)

Usage Examples

1. Quick Estimate (100 pages)

python3 cli/estimate_pages.py configs/react.json --max-discovery 100
# Time: ~30-60 seconds
# Good for: Quick validation

2. Standard Estimate (1000 pages - default)

python3 cli/estimate_pages.py configs/godot.json
# Time: ~1-2 minutes
# Good for: Most use cases

3. Deep Estimate (2000 pages)

python3 cli/estimate_pages.py configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes
# Good for: Large documentation sites

4. Custom Timeout

python3 cli/estimate_pages.py configs/django.json --timeout 60
# Useful for slow servers

Output Example

🔍 Estimating pages for: react
📍 Base URL: https://react.dev/
🎯 Start URLs: 6
⏱️  Rate limit: 0.5s
🔢 Max discovery: 1000

⏳ Discovered: 180 pages (1.3 pages/sec)

======================================================================
📊 ESTIMATION RESULTS
======================================================================

Config: react
Base URL: https://react.dev/

✅ Pages Discovered: 180
⏳ Pages Pending: 50
📈 Estimated Total: 230

⏱️  Time Elapsed: 140.5s
⚡ Discovery Rate: 1.28 pages/sec

======================================================================
💡 RECOMMENDATIONS
======================================================================

✅ Current max_pages (300) is sufficient

⏱️  Estimated full scrape time: 1.9 minutes
   (Based on rate_limit: 0.5s)

What It Shows:

  • Estimated total pages to scrape
  • Whether current max_pages is sufficient
  • Recommended max_pages value
  • Estimated scraping time
  • Discovery rate (pages/sec)

Enhancement Tools

enhance_skill_local.py (Recommended)

No API key needed - uses Claude Code Max plan

# Usage
python3 cli/enhance_skill_local.py output/react/
python3 cli/enhance_skill_local.py output/godot/

# What it does:
# 1. Reads SKILL.md and references/
# 2. Opens new terminal with Claude Code
# 3. Claude enhances SKILL.md
# 4. Backs up original to SKILL.md.backup
# 5. Saves enhanced version

# Time: ~60 seconds
# Cost: Free (uses your Claude Code Max plan)

enhance_skill.py (Alternative)

Requires Anthropic API key

# Install dependency first
pip3 install anthropic

# Usage with environment variable
export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/enhance_skill.py output/react/

# Usage with inline API key
python3 cli/enhance_skill.py output/godot/ --api-key sk-ant-...

# What it does:
# 1. Reads SKILL.md and references/
# 2. Calls Claude API (Sonnet 4)
# 3. Enhances SKILL.md
# 4. Backs up original to SKILL.md.backup
# 5. Saves enhanced version

# Time: ~30-60 seconds
# Cost: ~$0.01-0.10 per skill (depending on size)

Packaging Tool

package_skill.py

# Usage
python3 cli/package_skill.py output/react/
python3 cli/package_skill.py output/godot/

# What it does:
# 1. Validates SKILL.md exists
# 2. Creates .zip with all skill files
# 3. Saves to output/{name}.zip

# Output:
# output/react.zip
# output/godot.zip

# Time: Instant

Testing Tools

run_tests.py

# Run all tests (default)
python3 cli/run_tests.py
# 71 tests, ~1 second

# Verbose output
python3 cli/run_tests.py -v
python3 cli/run_tests.py --verbose

# Quiet output
python3 cli/run_tests.py -q
python3 cli/run_tests.py --quiet

# Stop on first failure
python3 cli/run_tests.py -f
python3 cli/run_tests.py --failfast

# Run specific test suite
python3 cli/run_tests.py --suite config
python3 cli/run_tests.py --suite features
python3 cli/run_tests.py --suite integration

# List all tests
python3 cli/run_tests.py --list

Individual Tests

# Run single test file
python3 -m unittest tests.test_config_validation
python3 -m unittest tests.test_scraper_features
python3 -m unittest tests.test_integration

# Run single test class
python3 -m unittest tests.test_config_validation.TestConfigValidation

# Run single test method
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config

Available Configs

Preset Configs (Ready to Use)

Config Framework Pages Description
godot.json Godot Engine ~500 Game engine documentation
react.json React ~300 React framework docs
vue.json Vue.js ~250 Vue.js framework docs
django.json Django ~400 Django web framework
fastapi.json FastAPI ~200 FastAPI Python framework
steam-economy-complete.json Steam ~100 Steam Economy API docs

View Config Details

# List all configs
ls configs/

# View config content
cat configs/react.json
python3 -m json.tool configs/godot.json

Config Structure

{
  "name": "react",
  "base_url": "https://react.dev/",
  "description": "React - JavaScript library for building UIs",
  "start_urls": [
    "https://react.dev/learn",
    "https://react.dev/reference/react",
    "https://react.dev/reference/react-dom"
  ],
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/learn/", "/reference/"],
    "exclude": ["/blog/", "/community/"]
  },
  "categories": {
    "getting_started": ["learn", "tutorial", "intro"],
    "api": ["reference", "api", "hooks"],
    "guides": ["guide"]
  },
  "rate_limit": 0.5,
  "max_pages": 300
}

Common Workflows

Workflow 1: Use Preset (Fastest)

# 1. Estimate (optional, 1-2 min)
python3 cli/estimate_pages.py configs/react.json

# 2. Scrape with local enhancement (25 min)
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# 3. Package (instant)
python3 cli/package_skill.py output/react/

# Result: output/react.zip
# Upload to Claude!

Workflow 2: Custom Documentation

# 1. Create config
cat > configs/my-docs.json << 'EOF'
{
  "name": "my-docs",
  "base_url": "https://docs.example.com/",
  "description": "My documentation site",
  "rate_limit": 0.5,
  "max_pages": 200
}
EOF

# 2. Estimate
python3 cli/estimate_pages.py configs/my-docs.json

# 3. Dry-run test
python3 cli/doc_scraper.py --config configs/my-docs.json --dry-run

# 4. Full scrape
python3 cli/doc_scraper.py --config configs/my-docs.json

# 5. Enhance
python3 cli/enhance_skill_local.py output/my-docs/

# 6. Package
python3 cli/package_skill.py output/my-docs/

Workflow 3: Interactive Mode

# 1. Start interactive wizard
python3 cli/doc_scraper.py --interactive

# 2. Answer prompts:
#    - Name: my-framework
#    - URL: https://framework.dev/
#    - Description: My favorite framework
#    - Selectors: (uses defaults)
#    - Rate limit: 0.5
#    - Max pages: 100

# 3. Enhance
python3 cli/enhance_skill_local.py output/my-framework/

# 4. Package
python3 cli/package_skill.py output/my-framework/

Workflow 4: Quick Mode

python3 cli/doc_scraper.py \
  --name vue \
  --url https://vuejs.org/ \
  --description "Vue.js framework" \
  --enhance-local

Workflow 5: Rebuild from Cache

# Already scraped once?
# Skip re-scraping, just rebuild
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape

# Try new enhancement
python3 cli/enhance_skill_local.py output/godot/

# Re-package
python3 cli/package_skill.py output/godot/

Workflow 6: Testing New Config

# 1. Create test config with low max_pages
cat > configs/test.json << 'EOF'
{
  "name": "test-site",
  "base_url": "https://docs.test.com/",
  "max_pages": 20,
  "rate_limit": 0.1
}
EOF

# 2. Estimate
python3 cli/estimate_pages.py configs/test.json --max-discovery 50

# 3. Dry-run
python3 cli/doc_scraper.py --config configs/test.json --dry-run

# 4. Small scrape
python3 cli/doc_scraper.py --config configs/test.json

# 5. Validate output
ls output/test-site/
ls output/test-site/references/

# 6. If good, increase max_pages and re-run

Troubleshooting

Issue: "Rate limit exceeded"

# Increase rate_limit in config
# Default: 0.5 seconds
# Conservative: 1.0 seconds
# Very conservative: 2.0 seconds

# Edit config:
{
  "rate_limit": 1.0
}

Issue: "Too many pages"

# Estimate first
python3 cli/estimate_pages.py configs/my-config.json

# Set max_pages based on estimate
# Add buffer: estimated + 50

# Edit config:
{
  "max_pages": 350  # for 300 estimated
}

Issue: "No content extracted"

# Wrong selectors
# Test selectors manually:
curl -s https://docs.example.com/ | grep -i 'article\|main\|content'

# Common selectors:
"main_content": "article"
"main_content": "main"
"main_content": ".content"
"main_content": "#main-content"
"main_content": "div[role=\"main\"]"

# Update config with correct selector

Issue: "Tests failing"

# Run specific failing test
python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v

# Check error message
# Verify expectations match implementation

Issue: "Enhancement fails"

# Local enhancement:
# Make sure Claude Code is running
# Check terminal output

# API enhancement:
# Verify API key is set:
echo $ANTHROPIC_API_KEY

# Or use inline:
python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...

Issue: "Package fails"

# Verify SKILL.md exists
ls output/my-skill/SKILL.md

# If missing, build first:
python3 cli/doc_scraper.py --config configs/my-skill.json --skip-scrape

Issue: "Can't find output"

# Check output directory
ls output/

# Skill data (cached):
ls output/{name}_data/

# Built skill:
ls output/{name}/

# Packaged skill:
ls output/{name}.zip

Advanced Usage

Custom Selectors

{
  "selectors": {
    "main_content": "div.documentation",
    "title": "h1.page-title",
    "code_blocks": "pre.highlight code",
    "navigation": "nav.sidebar"
  }
}

URL Pattern Filtering

{
  "url_patterns": {
    "include": [
      "/docs/",
      "/guide/",
      "/api/",
      "/tutorial/"
    ],
    "exclude": [
      "/blog/",
      "/news/",
      "/community/",
      "/showcase/"
    ]
  }
}

Custom Categories

{
  "categories": {
    "getting_started": ["intro", "tutorial", "quickstart", "installation"],
    "core_concepts": ["concept", "fundamental", "architecture"],
    "api": ["reference", "api", "method", "function"],
    "guides": ["guide", "how-to", "example"],
    "advanced": ["advanced", "expert", "performance"]
  }
}

Multiple Start URLs

{
  "start_urls": [
    "https://docs.example.com/getting-started/",
    "https://docs.example.com/api/",
    "https://docs.example.com/guides/",
    "https://docs.example.com/examples/"
  ]
}

Performance Tips

  1. Estimate first: Save 20-40 minutes by validating config
  2. Use dry-run: Test selectors before full scrape
  3. Cache data: Use --skip-scrape for fast rebuilds
  4. Adjust rate_limit: Balance speed vs politeness
  5. Set appropriate max_pages: Don't scrape more than needed
  6. Use start_urls: Target specific documentation sections
  7. Filter URLs: Use include/exclude patterns
  8. Run tests: Catch issues early

Environment Variables

# Anthropic API key (for API enhancement)
export ANTHROPIC_API_KEY=sk-ant-...

# Optional: Set custom output directory
export SKILL_SEEKER_OUTPUT_DIR=/path/to/output

Exit Codes

  • 0: Success
  • 1: Error (general)
  • 2: Warning (estimation hit limit)

File Locations

Skill_Seekers/
├── doc_scraper.py           # Main tool
├── estimate_pages.py        # Estimator
├── enhance_skill.py         # API enhancement
├── enhance_skill_local.py   # Local enhancement
├── package_skill.py         # Packager
├── run_tests.py             # Test runner
├── configs/                 # Preset configs
├── tests/                   # Test suite
├── docs/                    # Documentation
└── output/                  # Generated output

Getting Help

# Tool-specific help
python3 cli/doc_scraper.py --help
python3 cli/estimate_pages.py --help
python3 cli/run_tests.py --help

# Documentation
cat CLAUDE.md              # Quick reference for Claude Code
cat docs/CLAUDE.md         # Detailed technical docs
cat docs/TESTING.md        # Testing guide
cat docs/USAGE.md          # This file
cat docs/ENHANCEMENT.md    # Enhancement guide
cat docs/UPLOAD_GUIDE.md   # Upload instructions
cat README.md              # Project overview

Summary

Essential Commands:

python3 cli/estimate_pages.py configs/react.json              # Estimate
python3 cli/doc_scraper.py --config configs/react.json        # Scrape
python3 cli/enhance_skill_local.py output/react/              # Enhance
python3 cli/package_skill.py output/react/                    # Package
python3 cli/run_tests.py                                      # Test

Quick Start:

pip3 install requests beautifulsoup4
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
python3 cli/package_skill.py output/react/
# Upload output/react.zip to Claude!

Happy skill creating! 🚀