This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Version: v2.1.1 (Production Ready - GitHub Analysis Enhanced!)
Active Development: Flexible, incremental task-based approach
MAJOR MILESTONE: Published on PyPI! (v2.0.0)
- `pip install skill-seekers` - https://pypi.org/project/skill-seekers/
- `skill-seekers` command with Git-style subcommands

Unified Multi-Source Scraping (v2.0.0)

Community Response (H1 Group):
Configs Status:
Completed (November 29, 2025):
Next Up (Post-v2.1.0):
Roadmap Progress:
This repository includes a fully tested MCP server with 9 tools:
- `mcp__skill-seeker__list_configs` - List all available preset configurations
- `mcp__skill-seeker__generate_config` - Generate a new config file for any docs site
- `mcp__skill-seeker__validate_config` - Validate a config file structure
- `mcp__skill-seeker__estimate_pages` - Estimate page count before scraping
- `mcp__skill-seeker__scrape_docs` - Scrape and build a skill
- `mcp__skill-seeker__package_skill` - Package skill into .zip file (with auto-upload)
- `mcp__skill-seeker__upload_skill` - Upload .zip to Claude (NEW)
- `mcp__skill-seeker__split_config` - Split large documentation configs
- `mcp__skill-seeker__generate_router` - Generate router/hub skills

Setup: See docs/MCP_SETUP.md or run ./setup_mcp.sh
Status: Tested and working in production with Claude Code
Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable .zip file for Claude.
Python Version: Python 3.10 or higher (required for MCP integration)
Installation:
# Install globally or in virtual environment
pip install skill-seekers
# Use the unified CLI immediately
skill-seekers scrape --config configs/react.json
skill-seekers --help
# Clone the repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # macOS/Linux (Windows: venv\Scripts\activate)
# Install in editable mode
pip install -e .
# Or install dependencies manually
pip install -r requirements.txt
Why use a virtual environment?
Optional (for API-based enhancement):
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Single-source scraping (documentation only)
skill-seekers scrape --config configs/godot.json
skill-seekers scrape --config configs/react.json
skill-seekers scrape --config configs/vue.json
skill-seekers scrape --config configs/django.json
skill-seekers scrape --config configs/laravel.json
skill-seekers scrape --config configs/fastapi.json
# Combine documentation + GitHub + PDF in one skill
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json
skill-seekers unified --config configs/fastapi_unified.json
skill-seekers unified --config configs/godot_unified.json
# Override merge mode
skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
# Result: One comprehensive skill with conflict detection
What makes it special:
See full guide: docs/UNIFIED_SCRAPING.md
# 1. Install from PyPI (one-time, easiest!)
pip install skill-seekers
# 2. Estimate page count BEFORE scraping (fast, no data download)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes, shows estimated total pages and recommended max_pages
# 3. Scrape with local enhancement (uses Claude Code Max, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# Time: 20-40 minutes scraping + 60 seconds enhancement
# 4. Package the skill
skill-seekers package output/godot/
# Result: godot.zip ready to upload to Claude
# Step-by-step configuration wizard
skill-seekers scrape --interactive
# Create skill from any documentation URL
skill-seekers scrape --name react --url https://react.dev/ --description "React framework for UIs"
# Fast rebuild using previously scraped data
skill-seekers scrape --config configs/godot.json --skip-scrape
# Time: 1-3 minutes (instant rebuild)
# Enable async mode with 8 workers for best performance
skill-seekers scrape --config configs/react.json --async --workers 8
# Quick mode with async
skill-seekers scrape --name react --url https://react.dev/ --async --workers 8
# Dry run with async to test
skill-seekers scrape --config configs/godot.json --async --workers 4 --dry-run
Recommended Settings:
- `--async --workers 4`
- `--async --workers 8`
- `--async --workers 8 --no-rate-limit`

Performance:
See full guide: ASYNC_SUPPORT.md
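Conceptually, async mode with N workers is a semaphore-bounded fetch loop: at most N requests in flight, with an optional delay inside each slot. A minimal sketch of that idea (illustrative only, not the tool's actual implementation; `fetch_one` is a caller-supplied coroutine):

```python
import asyncio

async def fetch_all(urls, fetch_one, workers=4, rate_limit=0.0):
    """Fetch urls concurrently with at most `workers` requests in flight,
    sleeping `rate_limit` seconds inside each slot before releasing it."""
    sem = asyncio.Semaphore(workers)

    async def bounded(url):
        async with sem:
            result = await fetch_one(url)
            if rate_limit:
                await asyncio.sleep(rate_limit)
            return result

    # gather() preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))
```

With `--no-rate-limit`, the `rate_limit` delay above would simply be zero, which is why that flag is the fastest (and least polite) setting.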
LOCAL Enhancement (Recommended - No API Key Required):
# During scraping
skill-seekers scrape --config configs/react.json --enhance-local
# Standalone after scraping
skill-seekers enhance output/react/
API Enhancement (Alternative - Requires API Key):
# During scraping
skill-seekers scrape --config configs/react.json --enhance
# Standalone after scraping
skill-seekers-enhance output/react/
skill-seekers-enhance output/react/ --api-key sk-ant-...
# Package skill (opens folder, shows upload instructions)
skill-seekers package output/godot/
# Result: output/godot.zip
# Package and auto-upload (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
skill-seekers package output/godot/ --upload
# Upload existing .zip
skill-seekers upload output/godot.zip
# Package without opening folder
skill-seekers package output/godot/ --no-open
# Delete cached data and re-scrape from scratch
rm -rf output/godot_data/
skill-seekers scrape --config configs/godot.json
# Quick estimation - discover up to 100 pages
skill-seekers estimate configs/react.json --max-discovery 100
# Time: ~30-60 seconds
# Full estimation - discover up to 1000 pages (default)
skill-seekers estimate configs/godot.json
# Time: ~1-2 minutes
# Deep estimation - discover up to 2000 pages
skill-seekers estimate configs/vue.json --max-discovery 2000
# Time: ~3-5 minutes
# What it shows:
# - Estimated total pages
# - Recommended max_pages value
# - Estimated scraping time
# - Discovery rate (pages/sec)
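Under the hood, estimation amounts to breadth-first link discovery without downloading full page content. A rough sketch of that approach (hypothetical code, not the tool's actual implementation; `fetch_links` is a caller-supplied function returning the links found on a page):

```python
from urllib.parse import urljoin, urlparse

def discover_pages(start_url, fetch_links, max_discovery=1000):
    """Breadth-first discovery of same-domain URLs, up to max_discovery."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = [start_url]
    while queue and len(seen) < max_discovery:
        url = queue.pop(0)
        for link in fetch_links(url):
            absolute = urljoin(url, link)  # resolve relative links
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

def estimate_scrape_time(page_count, rate_limit):
    """Rough lower bound: one request per page plus the configured delay."""
    return page_count * rate_limit
```

Raising `--max-discovery` widens the frontier the crawler is allowed to explore, which is why deeper estimation takes longer.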
Why use estimation:
- Pick a sensible `max_pages` value before committing to a full scrape

```
Skill_Seekers/
├── pyproject.toml                  # Modern Python package configuration (PEP 621)
├── src/                            # Source code (src/ layout best practice)
│   └── skill_seekers/
│       ├── __init__.py
│       ├── cli/                    # CLI tools (entry points)
│       │   ├── doc_scraper.py        # Main scraper (~790 lines)
│       │   ├── estimate_pages.py     # Page count estimator
│       │   ├── enhance_skill.py      # AI enhancement (API-based)
│       │   ├── package_skill.py      # Skill packager
│       │   ├── github_scraper.py     # GitHub scraper
│       │   ├── pdf_scraper.py        # PDF scraper
│       │   ├── unified_scraper.py    # Unified multi-source scraper
│       │   ├── merge_sources.py      # Source merger
│       │   └── conflict_detector.py  # Conflict detection
│       └── mcp/                    # MCP server integration
│           └── server.py
├── tests/                          # Test suite (391 tests passing)
│   ├── test_scraper_features.py
│   ├── test_config_validation.py
│   ├── test_integration.py
│   ├── test_mcp_server.py
│   ├── test_unified.py                    # Unified scraping tests (18 tests)
│   ├── test_unified_mcp_integration.py    # (4 tests)
│   └── ...
├── configs/                        # Preset configurations (24 configs)
│   ├── godot.json
│   ├── react.json
│   ├── django_unified.json         # Multi-source configs
│   └── ...
├── docs/                           # Documentation
│   ├── CLAUDE.md                   # This file
│   ├── ENHANCEMENT.md              # Enhancement guide
│   ├── UPLOAD_GUIDE.md             # Upload instructions
│   └── UNIFIED_SCRAPING.md         # Unified scraping guide
├── README.md                       # User documentation
├── CHANGELOG.md                    # Release history
├── FUTURE_RELEASES.md              # Roadmap
└── output/                         # Generated output (git-ignored)
    ├── {name}_data/                # Scraped raw data (cached)
    │   ├── pages/*.json            # Individual page data
    │   └── summary.json            # Scraping summary
    └── {name}/                     # Built skill directory
        ├── SKILL.md                # Main skill file
        ├── SKILL.md.backup         # Backup (if enhanced)
        ├── references/             # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/                # Empty (user scripts)
        └── assets/                 # Empty (user assets)
```
Key Changes in v2.0.0:
- Unified `skill-seekers` CLI with subcommands
- Installable via `pip install skill-seekers`

Scrape Phase (`scrape_all()` in src/skill_seekers/cli/doc_scraper.py):
- Output: `output/{name}_data/pages/*.json` + `summary.json`

Build Phase (`build_skill()` in src/skill_seekers/cli/doc_scraper.py):
- Input: `output/{name}_data/`; output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`

Enhancement Phase (optional, via enhance_skill.py or enhance_skill_local.py):

Package Phase (via package_skill.py):
- Output: `{name}.zip`

Upload Phase (optional, via upload_skill.py):
Config files (configs/*.json) define scraping behavior:
{
"name": "godot",
"description": "When to use this skill",
"base_url": "https://docs.godotengine.org/en/stable/",
"selectors": {
"main_content": "div[role='main']",
"title": "title",
"code_blocks": "pre"
},
"url_patterns": {
"include": [],
"exclude": ["/search.html", "/_static/"]
},
"categories": {
"getting_started": ["introduction", "getting_started"],
"scripting": ["scripting", "gdscript"],
"api": ["api", "reference", "class"]
},
"rate_limit": 0.5,
"max_pages": 500
}
Config Parameters:
- `name`: Skill identifier (output directory name)
- `description`: When Claude should use this skill
- `base_url`: Starting URL for scraping
- `selectors.main_content`: CSS selector for main content (common: `article`, `main`, `div[role="main"]`)
- `selectors.title`: CSS selector for page title
- `selectors.code_blocks`: CSS selector for code samples
- `url_patterns.include`: Only scrape URLs containing these patterns
- `url_patterns.exclude`: Skip URLs containing these patterns
- `categories`: Keyword mapping for categorization
- `rate_limit`: Delay between requests (seconds)
- `max_pages`: Maximum pages to scrape
- `skip_llms_txt`: Skip llms.txt detection, force HTML scraping (default: false)
- `exclude_dirs_additional`: Add custom directories to default exclusions (for local repo analysis)
- `exclude_dirs`: Replace default directory exclusions entirely (advanced, for local repo analysis)

The tool checks for `output/{name}_data/` and prompts to reuse it, avoiding re-scraping (`check_existing_data()` in doc_scraper.py:653-660).
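The required keys above can be sanity-checked before a long scrape. A minimal sketch of such a check (a hypothetical helper, not the tool's actual `validate_config` MCP tool):

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for key in ("name", "description", "base_url", "selectors"):
        if key not in config:
            problems.append(f"missing required key: {key}")
    if "selectors" in config and "main_content" not in config["selectors"]:
        problems.append("selectors.main_content is required")
    if not config.get("base_url", "").startswith(("http://", "https://")):
        problems.append("base_url must be an http(s) URL")
    if config.get("rate_limit", 0.5) < 0:
        problems.append("rate_limit must be >= 0")
    return problems
```

Running this before `skill-seekers scrape` catches the most common config mistakes (missing selector, malformed URL) in milliseconds rather than minutes into a scrape.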
When using `local_repo_path` for unlimited local repository analysis, you can customize which directories are excluded from analysis.
Smart Defaults:
Automatically excludes common directories: venv, node_modules, __pycache__, .git, build, dist, .pytest_cache, htmlcov, .tox, .mypy_cache, etc.
Extend Mode (exclude_dirs_additional): Add custom exclusions to defaults
{
"sources": [{
"type": "github",
"local_repo_path": "/path/to/repo",
"exclude_dirs_additional": ["proprietary", "legacy", "third_party"]
}]
}
Replace Mode (exclude_dirs): Override defaults entirely (advanced)
{
"sources": [{
"type": "github",
"local_repo_path": "/path/to/repo",
"exclude_dirs": ["node_modules", ".git", "custom_vendor"]
}]
}
Use Cases:
See: should_exclude_dir() in github_scraper.py:304-306
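The two modes reduce to a simple precedence rule: replace mode wins if present, otherwise the defaults are extended. A sketch of those semantics (illustrative; see the real `should_exclude_dir()` for the authoritative behavior):

```python
DEFAULT_EXCLUDES = {"venv", "node_modules", "__pycache__", ".git", "build",
                    "dist", ".pytest_cache", "htmlcov", ".tox", ".mypy_cache"}

def build_exclude_set(source_config: dict) -> set[str]:
    """Replace mode (exclude_dirs) overrides everything;
    extend mode (exclude_dirs_additional) adds to the defaults."""
    if "exclude_dirs" in source_config:
        return set(source_config["exclude_dirs"])
    extra = source_config.get("exclude_dirs_additional", [])
    return DEFAULT_EXCLUDES | set(extra)

def should_exclude_dir(dirname: str, excludes: set[str]) -> bool:
    return dirname in excludes
```

Note the asymmetry: replace mode drops the smart defaults entirely, so configs using it must re-list `node_modules`, `.git`, and friends themselves.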
Detects code languages from:
- Class attributes on code blocks (`language-*`, `lang-*`)
- Syntax keywords in the code itself (`def`, `const`, `func`, etc.)

See: `detect_language()` in doc_scraper.py:135-165
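The two signals combine naturally into a class-first, keywords-as-fallback check. A sketch of that approach (hypothetical code; the real `detect_language()` in doc_scraper.py:135-165 may differ in detail):

```python
import re

# Keyword heuristics, used only when no explicit class name is present.
KEYWORD_HINTS = [
    (re.compile(r"\bdef \w+\(|\bimport \w+"), "python"),
    (re.compile(r"\bconst \w+\s*=|\bfunction \w+\("), "javascript"),
    (re.compile(r"\bfunc \w+\("), "go"),
]

def detect_language(css_classes: list[str], code: str) -> str:
    # 1. Explicit class names win: language-python, lang-js, ...
    for cls in css_classes:
        m = re.match(r"(?:language|lang)-(\w+)", cls)
        if m:
            return m.group(1)
    # 2. Fall back to keyword patterns on the code itself.
    for pattern, lang in KEYWORD_HINTS:
        if pattern.search(code):
            return lang
    return "text"
```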
Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
See: extract_patterns() in doc_scraper.py:167-183
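In outline, the extractor scans page content for those markers and collects the code block that directly follows each one, stopping at the per-page cap. A sketch (illustrative; the `segments` representation is an assumption, see `extract_patterns()` for the real logic):

```python
MARKERS = ("Example:", "Pattern:", "Usage:")
MAX_PATTERNS_PER_PAGE = 5

def extract_patterns(segments: list[tuple[str, str]]) -> list[str]:
    """segments: ordered (kind, text) pairs, kind is 'text' or 'code'.
    Collect each code block that directly follows a marker paragraph."""
    patterns = []
    expecting_code = False
    for kind, text in segments:
        if kind == "text":
            # Only a marker paragraph arms collection of the next code block.
            expecting_code = any(m in text for m in MARKERS)
        elif kind == "code" and expecting_code:
            patterns.append(text)
            expecting_code = False
            if len(patterns) >= MAX_PATTERNS_PER_PAGE:
                break
    return patterns
```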
See: smart_categorize() and infer_categories() in doc_scraper.py:282-351
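Categorization boils down to matching the config's `categories` keywords against each page's URL and title. A sketch of that idea (hypothetical code, not the actual `smart_categorize()` implementation):

```python
def smart_categorize(url: str, title: str,
                     categories: dict[str, list[str]]) -> str:
    """Assign the first category whose keywords appear in the URL or title;
    fall back to 'other' when nothing matches."""
    haystack = (url + " " + title).lower()
    for category, keywords in categories.items():
        if any(kw in haystack for kw in keywords):
            return category
    return "other"
```

This is why the troubleshooting advice below suggests tuning `categories` keywords against the actual scraped URLs: the match is a plain substring test, so keywords must appear literally in URLs or titles.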
Generated with:
See: create_enhanced_skill_md() in doc_scraper.py:426-542
# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
skill-seekers scrape --config configs/godot.json --enhance-local
# 2. Wait for enhancement terminal to close (~60 seconds)
# 3. Verify quality
cat output/godot/SKILL.md
# 4. Package
skill-seekers package output/godot/
# Result: godot.zip ready for Claude
# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)
# 1. Use existing data + Local Enhancement
skill-seekers scrape --config configs/godot.json --skip-scrape
skill-seekers enhance output/godot/
# 2. Package
skill-seekers package output/godot/
# Time: 1-3 minutes (build) + 60 seconds (enhancement)
# 1. Scrape + Build (no enhancement)
skill-seekers scrape --config configs/godot.json
# 2. Package
skill-seekers package output/godot/
# Note: SKILL.md will be basic template - enhancement recommended
# Time: 20-40 minutes
Option 1: Interactive
skill-seekers scrape --interactive
# Follow prompts, it creates the config for you
Option 2: Copy and Modify
# Copy a preset
cp configs/react.json configs/myframework.json
# Edit it
nano configs/myframework.json
# Test with limited pages first
# Set "max_pages": 20 in config
# Use it
skill-seekers scrape --config configs/myframework.json
Before creating a config, test selectors with BeautifulSoup:
from bs4 import BeautifulSoup
import requests
url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('div.content'))
# Test code block selector
print(soup.select('pre code'))
print(soup.select('pre'))
After building, verify the skill quality:
# Check SKILL.md has real examples
cat output/godot/SKILL.md
# Check category structure
cat output/godot/references/index.md
# List all reference files
ls output/godot/references/
# Check specific category content
cat output/godot/references/getting_started.md
# Verify code samples have language detection
grep -A 3 "```" output/godot/references/*.md | head -20
For faster testing, edit config to limit pages:
{
"max_pages": 20 // Test with just 20 pages
}
Problem: Pages scraped but content is empty
Solution: Check main_content selector in config. Try:
- `article`
- `main`
- `div[role="main"]`
- `div.content`

Use the BeautifulSoup testing approach above to find the right selector.
Problem: Pages not categorized well
Solution: Edit categories section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data:
# See what URLs were scraped
cat output/godot_data/summary.json | grep url | head -20
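The same check can be done in Python instead of grep. A small sketch (this assumes `summary.json` holds a `pages` list whose entries carry a `url` field, which is an assumption about the file format):

```python
import json

def scraped_urls(summary: dict) -> list[str]:
    """Collect URL fields from a scrape summary (assumed structure:
    a 'pages' list whose entries each have a 'url' key)."""
    return [page["url"] for page in summary.get("pages", [])]

# Usage against a real run (path from the command above):
# with open("output/godot_data/summary.json") as f:
#     for url in scraped_urls(json.load(f))[:20]:
#         print(url)
```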
Problem: Tool won't reuse existing data
Solution: Force re-scrape:
rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json
Problem: Getting rate limited or blocked by documentation server
Solution: Increase rate_limit value in config:
{
"rate_limit": 1.0 // Change from 0.5 to 1.0 seconds
}
Problem: doc_scraper.py shows wrong cli/package_skill.py path
Expected output:
skill-seekers package output/godot/
Not:
python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/godot/
The correct command uses the local cli/package_skill.py in the repository root.
Documentation Scraper (src/skill_seekers/cli/doc_scraper.py):
- `is_valid_url()`
- `extract_content()`
- `detect_language()`
- `extract_patterns()`
- `smart_categorize()`
- `infer_categories()`
- `generate_quick_reference()`
- `create_enhanced_skill_md()`
- `scrape_all()`
- `main()`

Other Key Files:
- `src/skill_seekers/cli/github_scraper.py`
- `src/skill_seekers/cli/pdf_scraper.py`
- `src/skill_seekers/cli/unified_scraper.py`
- `src/skill_seekers/cli/conflict_detector.py`
- `src/skill_seekers/cli/merge_sources.py`
- `src/skill_seekers/cli/package_skill.py`
- `src/skill_seekers/cli/upload_skill.py`
- `src/skill_seekers/mcp/server.py`
- `pyproject.toml` (`project.scripts` section)

What Enhancement Does:
| Task | Time | Notes |
|---|---|---|
| Scraping | 15-45 min | First time only |
| Building | 1-3 min | Fast! |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final zip |
Web Frameworks:
- `react.json` - React (`article` selector, 7,102 chars)
- `vue.json` - Vue.js (`main` selector, 1,029 chars)
- `astro.json` - Astro (`article` selector, 145 chars)
- `django.json` - Django (`article` selector, 6,468 chars)
- `laravel.json` - Laravel 9.x (`#main-content` selector, 16,131 chars)
- `fastapi.json` - FastAPI (`article` selector, 11,906 chars)
- `hono.json` - Hono web framework (NEW!)

DevOps & Automation:
- `ansible-core.json` - Ansible Core 2.19 (`div[role='main']` selector, ~32K chars)
- `kubernetes.json` - Kubernetes (`main` selector, 2,100 chars)

Game Engines:
- `godot.json` - Godot (`div[role='main']` selector, 1,688 chars)
- `godot-large-example.json` - Godot large docs example

CSS & Utilities:
- `tailwind.json` - Tailwind CSS (`div.prose` selector, 195 chars)

Gaming:
- `steam-economy-complete.json` - Steam Economy (`div.documentation_bbcode`, 588 chars)

Development Tools:
- `claude-code.json` - Claude Code documentation (NEW!)
- `react_unified.json` - React (docs + GitHub + code analysis)
- `django_unified.json` - Django (docs + GitHub + code analysis)
- `fastapi_unified.json` - FastAPI (docs + GitHub + code analysis)
- `fastapi_unified_test.json` - FastAPI test config
- `godot_unified.json` - Godot (docs + GitHub + code analysis)
- `godot_github.json` - GitHub-only scraping example
- `react_github.json` - GitHub-only scraping example
- `python-tutorial-test.json` - Python tutorial test
- `example_pdf.json` - PDF extraction example
- `test-manual.json` - Manual testing config

Note: All configs verified and working! Unified configs fully tested with 22 passing tests. Last verified: November 29, 2025 (Post-v2.1.0 bug fixes)
User Guides:
Technical Documentation:
Project Planning:
Project Status (v2.0.0):
- Published on PyPI: `pip install skill-seekers`
- Unified `skill-seekers` command with Git-style subcommands

Architecture:
- Main scraper: `src/skill_seekers/cli/doc_scraper.py` (~790 lines)

Development Workflow:
- `pip install -e .` (editable mode for development)
- `pytest tests/` (391 tests)
- `uv build` or `python -m build`
- `uv publish` (PyPI)

Key Points:
- All generated output lives in `output/` (git-ignored)
- Run `pip install -e .` to install the package before running tests