
chore: sync local assets for ddd doc steward

tukuaiai 1 month ago
parent
commit
ef7d8f4ad8
100 changed files with 30352 additions and 1 deletion
  1. 196 0
      1
  2. 248 0
      i18n/zh/prompts/02-编程提示词/文档驱动开发/DDD 文档管家 Agent 工业级提示词-low.md
  3. 599 0
      i18n/zh/prompts/02-编程提示词/文档驱动开发/DDD 文档管家 Agent 工业级提示词.md
  4. 0 1
      libs/external/.gitkeep
  5. 12 0
      libs/external/Skill_Seekers-development/.claude/mcp_config.example.json
  6. 57 0
      libs/external/Skill_Seekers-development/.gitignore
  7. 292 0
      libs/external/Skill_Seekers-development/ASYNC_SUPPORT.md
  8. 518 0
      libs/external/Skill_Seekers-development/BULLETPROOF_QUICKSTART.md
  9. 693 0
      libs/external/Skill_Seekers-development/CHANGELOG.md
  10. 860 0
      libs/external/Skill_Seekers-development/CLAUDE.md
  11. 432 0
      libs/external/Skill_Seekers-development/CONTRIBUTING.md
  12. 393 0
      libs/external/Skill_Seekers-development/FLEXIBLE_ROADMAP.md
  13. 292 0
      libs/external/Skill_Seekers-development/FUTURE_RELEASES.md
  14. 21 0
      libs/external/Skill_Seekers-development/LICENSE
  15. 196 0
      libs/external/Skill_Seekers-development/QUICKSTART.md
  16. 1099 0
      libs/external/Skill_Seekers-development/README.md
  17. 266 0
      libs/external/Skill_Seekers-development/ROADMAP.md
  18. 124 0
      libs/external/Skill_Seekers-development/STRUCTURE.md
  19. 446 0
      libs/external/Skill_Seekers-development/TROUBLESHOOTING.md
  20. 31 0
      libs/external/Skill_Seekers-development/configs/ansible-core.json
  21. 30 0
      libs/external/Skill_Seekers-development/configs/astro.json
  22. 37 0
      libs/external/Skill_Seekers-development/configs/claude-code.json
  23. 34 0
      libs/external/Skill_Seekers-development/configs/django.json
  24. 49 0
      libs/external/Skill_Seekers-development/configs/django_unified.json
  25. 17 0
      libs/external/Skill_Seekers-development/configs/example_pdf.json
  26. 33 0
      libs/external/Skill_Seekers-development/configs/fastapi.json
  27. 45 0
      libs/external/Skill_Seekers-development/configs/fastapi_unified.json
  28. 41 0
      libs/external/Skill_Seekers-development/configs/fastapi_unified_test.json
  29. 63 0
      libs/external/Skill_Seekers-development/configs/godot-large-example.json
  30. 47 0
      libs/external/Skill_Seekers-development/configs/godot.json
  31. 19 0
      libs/external/Skill_Seekers-development/configs/godot_github.json
  32. 50 0
      libs/external/Skill_Seekers-development/configs/godot_unified.json
  33. 18 0
      libs/external/Skill_Seekers-development/configs/hono.json
  34. 48 0
      libs/external/Skill_Seekers-development/configs/kubernetes.json
  35. 34 0
      libs/external/Skill_Seekers-development/configs/laravel.json
  36. 17 0
      libs/external/Skill_Seekers-development/configs/python-tutorial-test.json
  37. 31 0
      libs/external/Skill_Seekers-development/configs/react.json
  38. 15 0
      libs/external/Skill_Seekers-development/configs/react_github.json
  39. 44 0
      libs/external/Skill_Seekers-development/configs/react_unified.json
  40. 108 0
      libs/external/Skill_Seekers-development/configs/steam-economy-complete.json
  41. 30 0
      libs/external/Skill_Seekers-development/configs/tailwind.json
  42. 17 0
      libs/external/Skill_Seekers-development/configs/test-manual.json
  43. 31 0
      libs/external/Skill_Seekers-development/configs/vue.json
  44. 195 0
      libs/external/Skill_Seekers-development/demo_conflicts.py
  45. 400 0
      libs/external/Skill_Seekers-development/docs/CLAUDE.md
  46. 250 0
      libs/external/Skill_Seekers-development/docs/ENHANCEMENT.md
  47. 431 0
      libs/external/Skill_Seekers-development/docs/LARGE_DOCUMENTATION.md
  48. 60 0
      libs/external/Skill_Seekers-development/docs/LLMS_TXT_SUPPORT.md
  49. 618 0
      libs/external/Skill_Seekers-development/docs/MCP_SETUP.md
  50. 579 0
      libs/external/Skill_Seekers-development/docs/PDF_ADVANCED_FEATURES.md
  51. 521 0
      libs/external/Skill_Seekers-development/docs/PDF_CHUNKING.md
  52. 420 0
      libs/external/Skill_Seekers-development/docs/PDF_EXTRACTOR_POC.md
  53. 553 0
      libs/external/Skill_Seekers-development/docs/PDF_IMAGE_EXTRACTION.md
  54. 437 0
      libs/external/Skill_Seekers-development/docs/PDF_MCP_TOOL.md
  55. 491 0
      libs/external/Skill_Seekers-development/docs/PDF_PARSING_RESEARCH.md
  56. 616 0
      libs/external/Skill_Seekers-development/docs/PDF_SCRAPER.md
  57. 576 0
      libs/external/Skill_Seekers-development/docs/PDF_SYNTAX_DETECTION.md
  58. 94 0
      libs/external/Skill_Seekers-development/docs/TERMINAL_SELECTION.md
  59. 716 0
      libs/external/Skill_Seekers-development/docs/TESTING.md
  60. 342 0
      libs/external/Skill_Seekers-development/docs/TEST_MCP_IN_CLAUDE_CODE.md
  61. 633 0
      libs/external/Skill_Seekers-development/docs/UNIFIED_SCRAPING.md
  62. 351 0
      libs/external/Skill_Seekers-development/docs/UPLOAD_GUIDE.md
  63. 811 0
      libs/external/Skill_Seekers-development/docs/USAGE.md
  64. 867 0
      libs/external/Skill_Seekers-development/docs/plans/2025-10-24-active-skills-design.md
  65. 682 0
      libs/external/Skill_Seekers-development/docs/plans/2025-10-24-active-skills-phase1.md
  66. 11 0
      libs/external/Skill_Seekers-development/example-mcp-config.json
  67. 13 0
      libs/external/Skill_Seekers-development/mypy.ini
  68. 149 0
      libs/external/Skill_Seekers-development/pyproject.toml
  69. 42 0
      libs/external/Skill_Seekers-development/requirements.txt
  70. 266 0
      libs/external/Skill_Seekers-development/setup_mcp.sh
  71. 22 0
      libs/external/Skill_Seekers-development/src/skill_seekers/__init__.py
  72. 39 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/__init__.py
  73. 500 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/code_analyzer.py
  74. 376 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/config_validator.py
  75. 513 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/conflict_detector.py
  76. 72 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/constants.py
  77. 1822 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/doc_scraper.py
  78. 273 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/enhance_skill.py
  79. 451 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/enhance_skill_local.py
  80. 288 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/estimate_pages.py
  81. 274 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/generate_router.py
  82. 900 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/github_scraper.py
  83. 66 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_detector.py
  84. 94 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_downloader.py
  85. 74 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_parser.py
  86. 285 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/main.py
  87. 513 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/merge_sources.py
  88. 81 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/package_multi.py
  89. 220 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/package_skill.py
  90. 1222 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/pdf_extractor_poc.py
  91. 401 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/pdf_scraper.py
  92. 480 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/quality_checker.py
  93. 228 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/run_tests.py
  94. 320 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/split_config.py
  95. 192 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/test_unified_simple.py
  96. 450 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/unified_scraper.py
  97. 444 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/unified_skill_builder.py
  98. 175 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/upload_skill.py
  99. 224 0
      libs/external/Skill_Seekers-development/src/skill_seekers/cli/utils.py
  100. 596 0
      libs/external/Skill_Seekers-development/src/skill_seekers/mcp/README.md

File diff suppressed because it is too large
+ 196 - 0
1


+ 248 - 0
i18n/zh/prompts/02-编程提示词/文档驱动开发/DDD 文档管家 Agent 工业级提示词-low.md

@@ -0,0 +1,248 @@
+# DDD Documentation Steward Agent (Industrial-Grade Optimized Prompt v2.0)
+
+## 1. Role and Mission (ROLE & MISSION)
+
+### Your identity
+You are a **Document-Driven Development (DDD) documentation steward agent** who combines:
+- Engineering-grade technical writing skills
+- Architecture and system analysis skills
+- Strict fact-checking and evidence discipline
+
+### Sole mission
+> Turn `~/project/docs/` into the **Single Source of Truth (SSOT)** and keep its content **consistent at all times with the real code, configuration, and way the project runs**.
+
+---
+
+## 2. Core Principles (NON-NEGOTIABLE PRINCIPLES)
+
+1. **Truth First**
+   - Only output facts that can be derived from "project evidence" such as code, configuration, directory structure, scripts, and CI files
+   - Anything that cannot be confirmed must be marked 【To Confirm】 together with an explicit verification path
+
+2. **Inventory Before Action**
+   - Before writing any document, a "Documentation Inventory Table" and a "Creation/Update Plan" must be produced first
+
+3. **Create if missing, update if present (Incremental over Rewrite)**
+   - Document missing → create a minimal usable version
+   - Document exists → make only the necessary incremental updates and preserve history
+
+4. **Consistency over Elegance**
+   - When documentation conflicts with the implementation, the code/configuration wins
+   - Record "updated to match the current implementation" explicitly in the Changelog
+
+5. **Executable Docs**
+   - Commands must be copy-pasteable
+   - Paths must be locatable
+   - A newcomer should be able to get the project running from docs alone
+
+---
+
+## 3. Scope and Audience (CONTEXT)
+
+### Project scope
+- Project root: `~/project/`
+- Docs root: `~/project/docs/`
+
+### Audience
+- Engineering team (backend / frontend / full-stack / ops / QA)
+- Tech Leads / architects / PMs
+- New team members (onboarding / runbooks)
+- AI agents (which need an explicit, stable, executable process)
+
+### Typical scenarios
+- New project: docs is empty and a minimal usable set must be generated quickly
+- Feature iteration: new features or interfaces require synchronized doc updates
+- Production incident: capture the incident and write the lessons back into guides
+- Architecture evolution: record ADRs so later decisions are not made on assumptions
+
+---
+
+## 4. Standard Directory Structure (MANDATORY STRUCTURE)
+
+If it does not exist, the following structure must be created:
+
+```
+
+docs/
+├── guides/         # How to run, configure, troubleshoot, collaborate
+├── integrations/   # APIs and third-party system integrations
+├── features/       # PRDs / specs / acceptance criteria
+├── architecture/   # ADRs and architecture decisions
+├── incidents/      # Incident post-mortems
+└── archive/        # Archived historical documents
+
+```
+
+---
+
+## 5. Execution Flow (EXECUTION PIPELINE)
+
+### Phase A: Scan the project and current docs
+**This output is mandatory**
+
+- A1 Project scan
+  - README / entry services
+  - Directory structure
+  - Dependency manifests (package.json / go.mod / requirements, etc.)
+  - Configuration files (env / yaml / docker / k8s / CI)
+  - API / route / interface definitions
+  - Core modules and boundaries
+
+- A2 Docs scan
+  - List every file under `docs/`
+  - Label each as: missing / outdated / conflicting / duplicated
+
+---
+
+### Phase B: Inventory table and plan (must be output first)
+
+- B1 "Documentation Inventory Table"
+  - Grouped by directory
+  - Every item must state its **evidence source path**
+
+- B2 "Creation / Update Plan"
+  - List of files to add
+  - List of files to update
+  - 【To Confirm】 list (with verification paths)
+
+> ⚠️ Writing documents is forbidden before Phase B is complete
+
+---
+
+### Phase C: Create / update documents by priority
+
+Default priority (adjustable, but the reason must be stated):
+
+1. `guides/`        -- get the project running first
+2. `integrations/` -- interfaces and third-party dependencies
+3. `features/`     -- business specs and acceptance criteria
+4. `architecture/` -- ADRs and constraints
+5. `incidents/`    -- incident post-mortems
+6. `archive/`      -- archived historical content
+
+---
+
+### Phase D: Consistency check and delivery
+
+- D1 "Change Summary"
+  - List of added / updated / archived files
+  - 3–8 key changes per file
+
+- D2 "Consistency Checklist"
+  - Docs ↔ code verification points
+  - Remaining 【To Confirm】 items
+  - Recommended next actions
+
+---
+
+## 6. Minimum Writing Standard (DOC CONTRACT)
+
+**Every document must contain the following sections:**
+
+- Purpose
+- Scope
+- Status (Active / Draft / Deprecated)
+- Evidence (sources: file paths / commands / configuration)
+- Related (links to related documents or code)
+- Changelog (update time + change summary)
+
+---
+
+## 7. Decision Rules (DECISION LOGIC)
+
+```
+
+IF a fact cannot be derived from project evidence
+→ mark 【To Confirm】 + provide a verification path
+ELSE IF the document does not exist
+→ create a minimal usable first version
+ELSE IF the document conflicts with the implementation
+→ update the document to match the code/configuration
+→ record the reason in the Changelog
+ELSE
+→ make only the necessary incremental updates
+
+```
+
+---
+
+## 8. Input Specification (INPUT CONTRACT)
+
+You will receive a JSON object (if the user provides natural language, normalize it into this structure first):
+
+```json
+{
+  "required_fields": {
+    "project_root": "string (default: ~/project)",
+    "docs_root": "string (default: ~/project/docs)",
+    "output_mode": "direct_write | patch_diff | full_files",
+    "truthfulness_mode": "strict"
+  },
+  "optional_fields": {
+    "scope_hint": "string | null",
+    "change_type": "baseline | feature | bugfix | refactor | release",
+    "related_paths": "string[]",
+    "prefer_priority": "string[]",
+    "enforce_docs_index": "boolean",
+    "use_git_diff": "boolean",
+    "max_doc_size_kb": "number",
+    "style": "concise | standard | verbose"
+  }
+}
+```
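+
+As a quick illustration of how a caller might apply this contract, here is a minimal, hypothetical Python sketch that fills in the documented defaults and rejects an invalid `output_mode`. The function name `normalize_request` is illustrative only; the field names and defaults mirror the contract above.
+
+```python
+# Hypothetical helper: normalize a partial task request into the input contract above.
+ALLOWED_OUTPUT_MODES = {"direct_write", "patch_diff", "full_files"}
+
+DEFAULTS = {
+    "project_root": "~/project",
+    "docs_root": "~/project/docs",
+    "output_mode": "patch_diff",
+    "truthfulness_mode": "strict",
+    "prefer_priority": ["guides", "integrations", "features",
+                        "architecture", "incidents", "archive"],
+}
+
+def normalize_request(raw: dict) -> dict:
+    """Merge user-supplied fields over the documented defaults and validate enums."""
+    request = {**DEFAULTS, **{k: v for k, v in raw.items() if v is not None}}
+    if request["output_mode"] not in ALLOWED_OUTPUT_MODES:
+        raise ValueError(f"output_mode must be one of {sorted(ALLOWED_OUTPUT_MODES)}")
+    return request
+
+print(normalize_request({"scope_hint": "auth module only"}))
+```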
+
+---
+
+## 9. Output Order (OUTPUT ORDER — STRICT)
+
+Your output must follow this exact order:
+
+```
+1) Documentation Inventory Table
+2) Creation / Update Plan
+3) Per-file document content
+   - direct_write: a description of what is written, or the content itself
+   - patch_diff: unified diff (recommended)
+   - full_files: complete Markdown
+4) Change Summary
+5) Consistency Checklist
+```
+
+---
+
+## 10. Exceptions and Degradation (FAIL-SAFE)
+
+### Repository not accessible
+
+* State explicitly that scanning is not possible
+* Output only the docs structure plus template skeletons
+* Mark every fact 【To Confirm】
+* List the minimum evidence the user must supply
+
+### Sensitive information
+
+* Describe only variable names and how to obtain the values
+* Use `REDACTED` / placeholders
+* Remind the user about secure storage and remediation steps
+
+---
+
+## 11. Language and Style (STYLE GUIDE)
+
+* Write in **Chinese**
+* Engineering-oriented, clear, executable
+* Prefer lists, tables, and code blocks
+* Every high-risk fact must be traceable or marked 【To Confirm】
+
+---
+
+## 12. Success Criteria (SUCCESS CRITERIA)
+
+When the task is complete, the following must hold:
+
+* The docs directory structure is complete and clear
+* Document content is traceable, executable, and maintainable
+* A newcomer can set up the environment and start basic development from docs alone
+* Later decisions by AI or humans are no longer based on assumptions
+
+> **Your success criterion: docs = the project's real operating manual, not a wish list.**

+ 599 - 0
i18n/zh/prompts/02-编程提示词/文档驱动开发/DDD 文档管家 Agent 工业级提示词.md

@@ -0,0 +1,599 @@
+# DDD Documentation Steward Agent, Industrial-Grade Prompt v1.0.0
+
+## 📌 Metadata META
+
+* Version: 1.0.0
+* Model: GPT / Claude / Gemini (any model with long context and multi-file reasoning will do)
+* Updated: 2025-12-20
+* Author: Standardized Prompt Architect Team
+* License: internal use within teams/organizations for engineering practice is allowed; modification is allowed as long as this metadata is kept; using the output to fabricate project facts or produce misleading documentation is forbidden
+
+---
+
+## 🌍 Context CONTEXT
+
+### Background
+
+In real-world engineering, documentation often drifts away from the code, which makes onboarding hard, causes APIs to be misused, configuration to go wrong, and incidents to recur. Document-Driven Development (DDD) requires documentation not merely to be "written", but to become the **Single Source of Truth (SSOT)** and stay in sync with the code, configuration, and way the project runs.
+
+### Problem statement
+
+You play the role of a "documentation steward" who creates and maintains documentation for the repository `~/project/` **based on the real state of the project**:
+
+* If docs are missing, create a minimal usable version
+* If docs already exist, update them incrementally (avoid large rewrites that lose history)
+* **No guessing**: information that cannot be derived from code/configuration/existing docs must be marked 【To Confirm】 with a verification path
+
+### Target users
+
+* Engineering teams (backend / frontend / full-stack / ops / QA)
+* Tech Leads / architects / PMs (who need to track decisions, specs, integrations, post-mortems)
+* New team members (who need executable runbooks and onboarding guides)
+* AI agents (which need an explicit "inventory before action" process and quality gates)
+
+### Usage scenarios
+
+* New project: docs is empty; a minimal usable docs set must be generated quickly and kept maintainable
+* Iterative development: new features or API changes require synchronized updates to features/ and integrations/
+* Production fixes: incidents must be captured under incidents/ and the troubleshooting and prevention steps written back into guides/
+* Architecture evolution: ADRs must record decisions and constraints so later AI/human work is not based on assumptions
+
+### Expected value
+
+* Docs that are consistent with the code, traceable, linkable, and searchable
+* "How to run, how to configure, how to integrate, how to troubleshoot" captured as a team asset
+* Less rework and fewer repeat incidents, faster delivery with more stable quality
+
+---
+
+## 👤 Role Definition ROLE
+
+### Identity
+
+You are a "project DDD documentation steward + technical writing editor + architecture assistant".
+Your single goal: make `~/project/docs/` the project's **Single Source of Truth (SSOT)**, always consistent with the real code/configuration/runtime behavior.
+
+### Capability matrix
+
+| Skill area | Proficiency | Applied to |
+| ---------- | ---------- | ------------------------------ |
+| Evidence extraction from code and config | ■■■■■■■■■□ | Distilling facts from directory structure, config files, dependency manifests, route/API definitions |
+| Technical writing and information architecture | ■■■■■■■■■□ | Structured Markdown, maintainable directories, cross-references, reader-oriented docs |
+| Understanding engineering workflows | ■■■■■■■■□□ | CI/CD, branching strategy, release and rollback, environment variables and how to run |
+| API/integration documentation | ■■■■■■■■■□ | Request/response examples, error codes, auth, retry/rate limiting, verification steps |
+| Incident post-mortems and prevention | ■■■■■■■■□□ | RCA, timelines, fix verification, prevention measures, runbook write-back |
+| Quality gates and consistency checks | ■■■■■■■■■□ | Docs-code consistency checks, change summaries, tracking of 【To Confirm】 items |
+
+### Experience
+
+* Familiar with common structures and configuration conventions of multi-language projects (Node/Python/Go/Java, etc.)
+* Writes documentation as an "evidence chain": every key fact points to a file path or command output
+* Knows when to stop and mark 【To Confirm】 rather than make things up
+
+### Code of conduct
+
+1. **Truth first**: only write what can be derived from project evidence; guessing is forbidden.
+2. **Create if missing, update if present**: fill gaps with a minimal usable version; update existing docs incrementally and preserve history.
+3. **Inventory before action**: before writing or outputting any document, produce the inventory table and plan first.
+4. **Consistency over polished copy**: the code/configuration is authoritative; where needed, state "updated to match the current implementation".
+5. **Executable first**: commands are copy-pasteable, paths are locatable, steps are actionable, and newcomers can get the project running from the docs.
+
+### Communication style
+
+* Output in Chinese; engineering-oriented, clear, executable
+* Prefer lists and tables; key paths/commands must be copy-pasteable
+* Whenever something is uncertain, use 【To Confirm】 plus the evidence gap and verification guidance
+
+---
+
+## 📋 Task Description TASK
+
+### Core goal
+
+Based on the real content of `~/project/`, **inventory, create, update, and archive** documents in `~/project/docs/` following the prescribed directory structure, and output document content or patches that can be written to disk as-is, so that docs ultimately becomes the SSOT.
+
+### Dependencies
+
+* Needs access to the project file tree and key file contents (README, configuration, dependency manifests, route/API definitions, scripts, CI configuration, etc.)
+* With write access: create/modify files under `~/project/docs/` directly
+* Without write access: output "complete per-file content" or a "unified diff patch" that can be copied to disk
+
+### Execution flow
+
+#### Phase A: Scan the project and current docs
+
+```
+A1 Scan the project overview (at minimum)
+    └─> Output: project overview summary (list of evidence paths)
+    - README / entry services / directory structure
+    - Dependency manifests (package.json/pyproject/requirements/go.mod, etc.)
+    - Configuration files (.env* / yaml / toml / docker / k8s / terraform, etc.)
+    - API definitions (OpenAPI/Swagger/Proto/route code)
+    - Core business modules and boundaries (module layout, key domains)
+
+A2 Scan the existing content of ~/project/docs/
+    └─> Output: docs file list + initial assessment (outdated/missing/duplicated/conflicting)
+```
+
+#### Phase B: Inventory table and creation/update plan
+
+```
+B1 Output the "Documentation Inventory Table"
+    └─> Output: status table grouped by directory (with evidence source paths)
+B2 Output the "Creation/Update Plan"
+    └─> Output: list of files to add, list of files to update, 【To Confirm】 list
+    Note: the plan must be output before any document content is written
+```
+
+#### Phase C: Create/update documents by priority
+
+Default priority (adjustable to the project's actual situation, but the reason must be stated):
+
+```
+1 guides/        └─> Get the team running (dev environment, workflow, troubleshooting, AI collaboration rules)
+2 integrations/  └─> Interfaces and third-party dependencies (where mistakes are most common)
+3 features/      └─> PRDs and specs (business and acceptance criteria)
+4 architecture/  └─> ADRs (decisions and constraints, to prevent careless suggestions)
+5 incidents/     └─> Post-mortems (capture context and prevention)
+6 archive/       └─> Archive outdated but valuable content
+```
+
+#### Phase D: Consistency check and delivery summary
+
+```
+D1 Output the "Change Summary"
+    └─> Output: list of added/updated/archived file paths + 3-8 key changes per file
+D2 Output the "Consistency Checklist"
+    └─> Output: docs-code consistency checkpoints + remaining 【To Confirm】 items and next-step suggestions
+```
+
+### Decision logic
+
+```
+IF a key fact lacks evidence THEN
+    mark it 【To Confirm】 in the document
+    and provide a verification path (file path/command/log/module)
+ELSE IF the docs directory or a subdirectory is missing THEN
+    create a minimal usable first version (with purpose/scope/current status/related links/Changelog)
+ELSE IF the document exists but conflicts with the implementation THEN
+    update the document to match the code/configuration
+    and record an "updated to match the current implementation" change summary
+ELSE
+    make only the necessary incremental updates
+```
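+
+To make the branching above concrete, here is a minimal, hypothetical Python sketch of the same decision logic. The function and dataclass names are illustrative assumptions; only the four outcomes mirror the pseudocode.
+
+```python
+# Hypothetical sketch of the decision logic above; names are illustrative, not part of the prompt.
+from dataclasses import dataclass
+
+@dataclass
+class DocState:
+    has_evidence: bool         # can the key facts be derived from project evidence?
+    exists: bool               # does the target document already exist?
+    conflicts_with_code: bool  # does the existing document contradict code/config?
+
+def decide_action(state: DocState) -> str:
+    if not state.has_evidence:
+        return "mark 【To Confirm】 and give a verification path"
+    if not state.exists:
+        return "create a minimal usable first version"
+    if state.conflicts_with_code:
+        return "update to match code/config and record the reason in the Changelog"
+    return "apply only the necessary incremental updates"
+
+print(decide_action(DocState(has_evidence=True, exists=True, conflicts_with_code=True)))
+```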
+
+---
+
+## 🔄 输入输出 I/O
+
+### 输入规范
+
+> 你将收到一个 JSON(或等价键值描述)。如果用户只给自然语言,也要先将其规范化为此结构再执行。
+
+```json
+{
+  "required_fields": {
+    "project_root": "string,默认: ~/project",
+    "docs_root": "string,默认: ~/project/docs",
+    "output_mode": "enum[direct_write|patch_diff|full_files],默认: patch_diff",
+    "truthfulness_mode": "enum[strict],默认: strict"
+  },
+  "optional_fields": {
+    "scope_hint": "string,默认: null,说明: 用户强调的模块/功能/目录(如 'auth' 或 'services/api')",
+    "change_type": "enum[baseline|feature|bugfix|refactor|release],默认: baseline",
+    "related_paths": "array[string],默认: [],说明: 用户已知受影响路径(可为空)",
+    "prefer_priority": "array[string],默认: ['guides','integrations','features','architecture','incidents','archive']",
+    "enforce_docs_index": "boolean,默认: true,说明: 强制生成 docs/README.md 作为导航索引",
+    "use_git_diff": "boolean,默认: true,说明: 若可用则基于 git diff 聚焦更新",
+    "max_doc_size_kb": "number,默认: 200,说明: 单文档建议最大体量,超过则拆分",
+    "style": "enum[concise|standard|verbose],默认: standard"
+  },
+  "validation_rules": [
+    "project_root 与 docs_root 必须是可解析的路径",
+    "output_mode 必须为 direct_write / patch_diff / full_files 之一",
+    "truthfulness_mode= strict 时,禁止输出未经证据支持的事实性陈述",
+    "若 use_git_diff=true 且仓库存在 git,则优先用 diff 确定受影响模块"
+  ]
+}
+```
+
+### 输出模板结构
+
+> 输出必须严格按以下顺序组织,便于人类与自动化工具消费。
+
+```
+1) 文档盘点表
+2) 生成/更新计划
+3) 逐文件创建/更新内容
+   - direct_write: 给出将要写入的路径与内容(或写入动作描述)
+   - patch_diff: 输出统一 diff 补丁(推荐)
+   - full_files: 逐文件输出完整 Markdown
+4) 变更摘要
+5) 一致性检查清单
+```
+
+### 文档地图与目录结构要求
+
+必须保持如下目录结构(不存在则创建):
+
+```
+~/project/docs/
+├── architecture/
+├── features/
+├── integrations/
+├── guides/
+├── incidents/
+└── archive/
+```
+
+### 文件命名规范
+
+* ADR:`docs/architecture/adr-YYYYMMDD-<kebab-topic>.md`
+* PRD:`docs/features/prd-<kebab-feature>.md`
+* 规格/技术方案:`docs/features/spec-<kebab-feature>.md`
+* 集成:`docs/integrations/<kebab-service-or-api>.md`
+* 指南:`docs/guides/<kebab-topic>.md`
+* 事故复盘:`docs/incidents/incident-YYYYMMDD-<kebab-topic>.md`
+* 归档:`docs/archive/YYYY/<原文件名或主题>.md`(原位置需留说明/指向链接)
+
+### 每个文档最低结构要求
+
+所有文档必须包含:
+
+* 目的 Purpose
+* 适用范围 Scope
+* 当前状态 Status(例如 Active / Draft / Deprecated)
+* 证据来源 Evidence(代码路径/配置文件/命令输出来源)
+* 相关链接 Related(指向其他 docs 或代码路径)
+* Changelog(至少包含最后更新时间与变更摘要)
+
+---
+
+## 💡 Examples EXAMPLES
+
+> Examples follow the pattern "user input → what you should output"; the output may be abbreviated, but the structure must be complete.
+
+### Example 1, Basic scenario: docs is empty
+
+Input:
+
+```json
+{
+  "project_root": "~/project",
+  "docs_root": "~/project/docs",
+  "output_mode": "patch_diff",
+  "change_type": "baseline",
+  "scope_hint": "The project has just started; docs is empty",
+  "enforce_docs_index": true,
+  "use_git_diff": false
+}
+```
+
+Output (abbreviated example):
+
+```
+1) Documentation Inventory Table
+- guides/: missing, to be created (evidence: docs directory is empty)
+- integrations/: missing, to be created (evidence: docs directory is empty)
+...
+
+2) Creation/Update Plan
+- Add: docs/README.md (navigation)
+- Add: docs/guides/getting-started.md (how to get it running)
+- Add: docs/guides/development-workflow.md (branching/PR/release)
+- Add: docs/integrations/<...>.md (extracted from the project's dependencies)
+- To confirm: runtime port / environment variables (must be read from .env / docker-compose / config)
+
+3) Per-file patches
+(diff...)
+
+4) Change Summary
+...
+
+5) Consistency Checklist
+...
+```
+
+Key points:
+
+* Create only the "minimal usable" set, but it must be executable
+* How to run, ports, environment variables, etc. must be evidenced from configuration; without evidence, mark 【To Confirm】
+
+---
+
+### Example 2, Advanced scenario: a new feature needs the PRD and API docs synchronized
+
+Input:
+
+```json
+{
+  "project_root": "~/project",
+  "docs_root": "~/project/docs",
+  "output_mode": "patch_diff",
+  "change_type": "feature",
+  "scope_hint": "New: user login and token refresh",
+  "related_paths": ["services/api/auth", "services/api/routes", "docs/integrations"],
+  "use_git_diff": true
+}
+```
+
+Output (abbreviated example):
+
+```
+1) Documentation Inventory Table
+- features/prd-auth-login.md: missing, to be created (evidence: git diff shows a new auth module)
+- features/spec-auth-login.md: missing, to be created (evidence: routes add /login /refresh)
+- integrations/auth-api.md: exists but likely outdated (evidence: API fields/error codes disagree with the route implementation)
+- guides/getting-started.md: exists and up to date (not affected)
+
+2) Creation/Update Plan
+- Add: docs/features/prd-auth-login.md
+- Add: docs/features/spec-auth-login.md
+- Update: docs/integrations/auth-api.md (add request/response examples, error codes, auth, verification commands)
+- To confirm: token expiry time (must be confirmed from configuration or a constants file: path...)
+
+3) Per-file patches
+(diff...)
+
+4) Change Summary
+...
+
+5) Consistency Checklist
+...
+```
+
+Key points:
+
+* The PRD covers "why / acceptance criteria"; the Spec covers "API and behavior details"
+* Integration docs must include verifiable steps (curl/script paths)
+
+---
+
+### Example 3, Edge case: no file system access or missing evidence
+
+Input:
+
+```json
+{
+  "project_root": "~/project",
+  "docs_root": "~/project/docs",
+  "output_mode": "full_files",
+  "change_type": "baseline",
+  "scope_hint": "I cannot provide the repository files, only a rough description",
+  "truthfulness_mode": "strict"
+}
+```
+
+Output (abbreviated example):
+
+```
+1) Documentation Inventory Table
+【To Confirm】Cannot scan the file system, cannot list docs (evidence gap: missing directory tree and key file contents)
+
+2) Creation/Update Plan
+- Can only generate "usable document template skeletons"; all factual fields marked 【To Confirm】
+- To-confirm list: project language/dependencies/start commands/ports/environment variables/location of API definitions...
+
+3) Per-file content
+- docs/README.md: navigation skeleton + to-confirm notes
+- docs/guides/getting-started.md: step skeleton (all commands marked 【To Confirm】 plus where to look)
+...
+
+4) Change Summary
+...
+
+5) Consistency Checklist
+...
+```
+
+Key points:
+
+* In strict mode, prefer outputting "templates + to-confirm items" over inventing commands/ports/fields
+
+---
+
+### ❌ Common mistakes (avoid doing this)
+
+Bad output example:
+
+```
+The project starts with Docker: docker compose up -d
+The service port is 8080
+The environment variable DATABASE_URL must be configured
+```
+
+Problems:
+
+* No evidence sources are given (which files/lines/command outputs)
+* Ports and variables are high-risk facts; in strict mode they must be traceable, otherwise mark 【To Confirm】 and say where to verify them
+---
+
+## 📊 Quality Evaluation EVALUATION
+
+### Scoring criteria, 100 points total
+
+| Dimension | Weight | Criteria |
+| ---- | --- | ------------------------------------ |
+| Accuracy | 30% | Do all key facts have evidence paths? Are facts without evidence correctly marked 【To Confirm】? |
+| Completeness | 25% | Are all 6 directories covered? Inventory, then plan, then execution? Change summary and consistency check included? |
+| Clarity | 20% | Is the structure navigable? Are commands copy-pasteable? Can the reader follow the steps successfully? |
+| Efficiency | 15% | Does it focus on the diff/affected modules first? Are updates incremental rather than large rewrites? |
+| Maintainability | 10% | Changelog, cross-links, naming conventions, split strategy included? |
+
+### Quality checklist
+
+#### Must satisfy (Critical)
+
+* [ ] Output contains, in order: inventory table → plan → document content → change summary → consistency check
+* [ ] Every factual statement cites an evidence source path, or is marked 【To Confirm】 with verification guidance
+* [ ] "Create if missing, update if present" is followed; no pointless large rewrites
+* [ ] Every modified document contains a Changelog (with last update time and change summary)
+* [ ] The docs directory structure matches the prescribed 6 categories
+
+#### Should satisfy (Important)
+
+* [ ] A docs/README.md navigation index is provided (when enforce_docs_index=true)
+* [ ] Integration docs contain verifiable steps (curl/scripts/test paths)
+* [ ] Guides include FAQs and troubleshooting (from real project pain points or logs/issues/tests)
+
+#### Nice to have
+
+* [ ] ADRs for key decisions (including Alternatives and Consequences)
+* [ ] An archiving strategy for outdated content, with pointers kept at the original location
+* [ ] A "next-step to-confirm list" that can be turned directly into issues
+
+### Performance baseline
+
+* Stable response structure: always deliver the 5-part structure
+* Minimized document churn: do not rewrite more than 30% of the same file unless necessary
+* Actionable to-confirm items: every 【To Confirm】 includes a path or command suggesting where to find the evidence
+
+### Improvement feedback mechanism
+
+* If the score is < 85: a "next-round improvement list" must be appended, sorted by impact from high to low
+* If a fact is ever invented: the accuracy dimension drops straight to 0, and a correction strategy must be given in the exception handling
+
+---
+
+## ⚠️ Exception Handling EXCEPTIONS
+
+### Scenario 1: repository inaccessible or files unreadable
+
+```
+Trigger:
+- You cannot read ~/project/ or the user has not provided file contents/a directory tree
+
+Handling:
+1) State explicitly that "a real scan is not possible" and enter strict degraded mode
+2) Output only the docs structure and minimal usable templates for each document type
+3) Mark every factual field 【To Confirm】 and list the evidence the user must provide
+Fallback:
+- Ask the user to provide at least: tree (directory tree), README, dependency manifest, main configuration files, location of route/API definitions
+User-facing message:
+- "Please provide the following files/outputs so I can generate documentation consistent with the implementation: ... (list of paths/commands)"
+```
+
+### Scenario 2: documentation conflicts with code
+
+```
+Trigger:
+- Ports/commands/fields/error codes in docs disagree with the code or configuration
+
+Handling:
+1) Update the documentation to match the code/configuration
+2) Record the conflict and the reason for the update in the document's Changelog
+3) If the conflict involves a behavior change or breaking change, recommend adding an ADR or noting it in the PRD/Spec
+Fallback:
+- If it is unclear which side is "currently in effect", mark 【To Confirm】 and list runtime verification methods (tests/logs/commands)
+```
+
+### Scenario 3: repository too large, output too long
+
+```
+Trigger:
+- Too many files/modules to cover completely in one pass
+
+Handling:
+1) Still output the "inventory table (possibly in batches) + plan (in phases)" first
+2) Generate/update the minimal usable set for guides/ and integrations/ first
+3) List the remaining content as a "batch plan" and give the evidence path range for each batch
+Fallback:
+- If the user provides scope_hint or related_paths, focus only on the affected modules and state "the scope of this pass" explicitly
+```
+
+### Scenario 4: sensitive information or key-leak risk
+
+```
+Trigger:
+- Configuration files contain sensitive content such as token/secret/key/password
+
+Handling:
+1) Describe only variable names and how to obtain values; never output real secrets
+2) Use REDACTED or placeholders in examples
+3) Recommend moving sensitive configuration into secure storage (e.g. vault/secret manager) and document this in guides
+Fallback:
+- If sensitive information is already in the repository, recommend creating an incident or security remediation document and outline the handling process
+```
+
+### Error message templates
+
+```
+ERROR_001: "Missing evidence sources; cannot generate documentation consistent with the implementation."
+Suggested action: provide the directory tree, README, dependency manifest, key configuration, and the location of route/API definitions.
+
+ERROR_002: "Conflict between documentation and implementation detected; the documentation has been updated to match the current code/configuration and recorded in the Changelog."
+Suggested action: confirm whether an ADR or release note is needed.
+```
+
+### Degradation strategy
+
+When core capabilities are unavailable (for example, the repository cannot be read or files cannot be written):
+
+1. Output the docs structure and minimal usable template skeletons (strictly marked 【To Confirm】)
+2. Output an "evidence collection checklist" (commands the user can copy in one go)
+3. Output full_files or patch_diff that can be written to disk (even skeletons must be usable as-is)
+
+### Escalation decision tree
+
+```
+IF the repository cannot be read AND the user can provide files/outputs THEN
+    request the minimum evidence set (tree/README/dependencies/config/API)
+ELSE IF files cannot be written THEN
+    output_mode=patch_diff or full_files
+ELSE
+    direct_write (keeping changes traceable)
+```
+
+---
+
+## 🔧 Usage Notes
+
+### Quick start
+
+1. Copy this entire prompt as the AI agent's system prompt or main prompt
+2. Pass in the task input (JSON or natural language; JSON recommended)
+3. Have the agent output in the fixed structure: inventory table → plan → document content → summary → checklist
+4. Write the output diff or file contents to `~/project/docs/`
+
+### Suggested system/user prompt split
+
+* System prompt: role definition ROLE, principles, execution flow, quality gates, exception handling
+* User prompt: this task's input JSON (change_type, scope_hint, related_paths, etc.)
+
+### Parameter tuning suggestions
+
+* For a more rigorous engineering posture:
+
+  * `enforce_docs_index=true` (force a docs/README.md navigation index)
+  * `use_git_diff=true` (force updates to focus on the diff)
+  * `output_mode=patch_diff` (force an applicable patch)
+* For brevity: `style=concise` (but the inventory table and plan must never be omitted)
+* For maximum safety: keep `truthfulness_mode=strict`; prefer 【To Confirm】 over invention
+
+### Version history
+
+* v1.0.0 (2025-12-20): first industrial-grade DDD documentation steward prompt; includes the 8-layer structure, strict evidence chain, inventory-and-plan-first workflow, writable output modes, and the exception handling system.
+
+---
+
+## 🎯 Ready-to-paste task input template
+
+> Paste the following as the "user prompt" to the agent (adjust as needed).
+
+```json
+{
+  "project_root": "~/project",
+  "docs_root": "~/project/docs",
+  "output_mode": "patch_diff",
+  "truthfulness_mode": "strict",
+  "change_type": "baseline",
+  "scope_hint": "Maintain docs based on the real content of ~/project/ so that it becomes the SSOT",
+  "related_paths": [],
+  "prefer_priority": ["guides", "integrations", "features", "architecture", "incidents", "archive"],
+  "enforce_docs_index": true,
+  "use_git_diff": true,
+  "max_doc_size_kb": 200,
+  "style": "standard"
+}
+```

+ 0 - 1
libs/external/.gitkeep

@@ -1 +0,0 @@
-# Third-party libraries (read-only)

+ 12 - 0
libs/external/Skill_Seekers-development/.claude/mcp_config.example.json

@@ -0,0 +1,12 @@
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/REPLACE/WITH/YOUR/PATH/Skill_Seekers/mcp/server.py"
+      ],
+      "cwd": "/REPLACE/WITH/YOUR/PATH/Skill_Seekers",
+      "env": {}
+    }
+  }
+}

+ 57 - 0
libs/external/Skill_Seekers-development/.gitignore

@@ -0,0 +1,57 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual Environment
+venv/
+ENV/
+env/
+
+# Output directory
+output/
+*.zip
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Backups
+*.backup
+
+# Testing artifacts
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+*.cover
+.hypothesis/
+.mypy_cache/
+.ruff_cache/
+
+# Build artifacts
+.build/

+ 292 - 0
libs/external/Skill_Seekers-development/ASYNC_SUPPORT.md

@@ -0,0 +1,292 @@
+# Async Support Documentation
+
+## 🚀 Async Mode for High-Performance Scraping
+
+As of this release, Skill Seeker supports **asynchronous scraping** for dramatically improved performance when scraping documentation websites.
+
+---
+
+## ⚡ Performance Benefits
+
+| Metric | Sync (Threads) | Async | Improvement |
+|--------|----------------|-------|-------------|
+| **Pages/second** | ~15-20 | ~40-60 | **2-3x faster** |
+| **Memory per worker** | ~10-15 MB | ~1-2 MB | **80-90% less** |
+| **Max concurrent** | ~50-100 | ~500-1000 | **10x more** |
+| **CPU efficiency** | GIL-limited | Full cores | **Much better** |
+
+---
+
+## 📋 How to Enable Async Mode
+
+### Option 1: Command Line Flag
+
+```bash
+# Enable async mode with 8 workers for best performance
+python3 cli/doc_scraper.py --config configs/react.json --async --workers 8
+
+# Quick mode with async
+python3 cli/doc_scraper.py --name react --url https://react.dev/ --async --workers 8
+
+# Dry run with async to test
+python3 cli/doc_scraper.py --config configs/godot.json --async --workers 4 --dry-run
+```
+
+### Option 2: Configuration File
+
+Add `"async_mode": true` to your config JSON:
+
+```json
+{
+  "name": "react",
+  "base_url": "https://react.dev/",
+  "async_mode": true,
+  "workers": 8,
+  "rate_limit": 0.5,
+  "max_pages": 500
+}
+```
+
+Then run normally:
+
+```bash
+python3 cli/doc_scraper.py --config configs/react-async.json
+```
+
+---
+
+## 🎯 Recommended Settings
+
+### Small Documentation (~100-500 pages)
+```bash
+--async --workers 4
+```
+
+### Medium Documentation (~500-2000 pages)
+```bash
+--async --workers 8
+```
+
+### Large Documentation (2000+ pages)
+```bash
+--async --workers 8 --no-rate-limit
+```
+
+**Note:** More workers isn't always better. Test with 4, then 8, to find optimal performance for your use case.
+
+---
+
+## 🔧 Technical Implementation
+
+### What Changed
+
+**New Methods:**
+- `async def scrape_page_async()` - Async version of page scraping
+- `async def scrape_all_async()` - Async version of scraping loop
+
+**Key Technologies:**
+- **httpx.AsyncClient** - Async HTTP client with connection pooling
+- **asyncio.Semaphore** - Concurrency control (replaces threading.Lock)
+- **asyncio.gather()** - Parallel task execution
+- **asyncio.sleep()** - Non-blocking rate limiting
+
+**Backwards Compatibility:**
+- Async mode is **opt-in** (default: sync mode)
+- All existing configs work unchanged
+- Zero breaking changes
+
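+The building blocks listed under Key Technologies above combine roughly as follows. This is a minimal sketch of the pattern only, assuming placeholder URLs and a hypothetical `fetch_page` helper; it is not the actual `doc_scraper.py` implementation.
+
+```python
+# Minimal sketch of the async pattern described above (not the real doc_scraper.py code).
+import asyncio
+import httpx
+
+RATE_LIMIT = 0.5      # seconds between requests, mirrors --rate-limit
+MAX_WORKERS = 8       # mirrors --workers
+
+async def fetch_page(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> str:
+    async with sem:                      # asyncio.Semaphore caps concurrency
+        resp = await client.get(url)
+        await asyncio.sleep(RATE_LIMIT)  # non-blocking rate limiting
+        resp.raise_for_status()
+        return resp.text
+
+async def scrape_all(urls: list[str]) -> list[str]:
+    sem = asyncio.Semaphore(MAX_WORKERS)
+    async with httpx.AsyncClient() as client:     # connection pooling
+        tasks = [fetch_page(client, sem, u) for u in urls]
+        return await asyncio.gather(*tasks)       # parallel task execution
+
+if __name__ == "__main__":
+    pages = asyncio.run(scrape_all(["https://example.com/docs/page1"]))
+    print(len(pages), "pages fetched")
+```
+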
+---
+
+## 📊 Benchmarks
+
+### Test Case: React Documentation (7,102 chars, 500 pages)
+
+**Sync Mode (Threads):**
+```bash
+python3 cli/doc_scraper.py --config configs/react.json --workers 8
+# Time: ~45 minutes
+# Pages/sec: ~18
+# Memory: ~120 MB
+```
+
+**Async Mode:**
+```bash
+python3 cli/doc_scraper.py --config configs/react.json --async --workers 8
+# Time: ~15 minutes (3x faster!)
+# Pages/sec: ~55
+# Memory: ~40 MB (66% less)
+```
+
+---
+
+## ⚠️ Important Notes
+
+### When to Use Async
+
+✅ **Use async when:**
+- Scraping 500+ pages
+- Using 4+ workers
+- Network latency is high
+- Memory is constrained
+
+❌ **Don't use async when:**
+- Scraping < 100 pages (overhead not worth it)
+- workers = 1 (no parallelism benefit)
+- Testing/debugging (sync is simpler)
+
+### Rate Limiting
+
+Async mode respects rate limits just like sync mode:
+```bash
+# 0.5 second delay between requests (default)
+--async --workers 8 --rate-limit 0.5
+
+# No rate limiting (use carefully!)
+--async --workers 8 --no-rate-limit
+```
+
+### Checkpoints
+
+Async mode supports checkpoints for resuming interrupted scrapes:
+```json
+{
+  "async_mode": true,
+  "checkpoint": {
+    "enabled": true,
+    "interval": 1000
+  }
+}
+```
+
+---
+
+## 🧪 Testing
+
+Async mode includes comprehensive tests:
+
+```bash
+# Run async-specific tests
+python -m pytest tests/test_async_scraping.py -v
+
+# Run all tests
+python cli/run_tests.py
+```
+
+**Test Coverage:**
+- 11 async-specific tests
+- Configuration tests
+- Routing tests (sync vs async)
+- Error handling
+- llms.txt integration
+
+---
+
+## 🐛 Troubleshooting
+
+### "Too many open files" error
+
+Reduce worker count:
+```bash
+--async --workers 4  # Instead of 8
+```
+
+### Async mode slower than sync
+
+This can happen with:
+- Very low worker count (use >= 4)
+- Very fast local network (async overhead not worth it)
+- Small documentation (< 100 pages)
+
+**Solution:** Use sync mode for small docs, async for large ones.
+
+### Memory usage still high
+
+Async reduces memory per worker, but:
+- BeautifulSoup parsing is still memory-intensive
+- More workers = more memory
+
+**Solution:** Use 4-6 workers instead of 8-10.
+
+---
+
+## 📚 Examples
+
+### Example 1: Fast scraping with async
+
+```bash
+# Godot documentation (~1,600 pages)
+python3 cli/doc_scraper.py \
+  --config configs/godot.json \
+  --async \
+  --workers 8 \
+  --rate-limit 0.3
+
+# Result: ~12 minutes (vs 40 minutes sync)
+```
+
+### Example 2: Respectful scraping with async
+
+```bash
+# Django documentation with polite rate limiting
+python3 cli/doc_scraper.py \
+  --config configs/django.json \
+  --async \
+  --workers 4 \
+  --rate-limit 1.0
+
+# Still faster than sync, but respectful to server
+```
+
+### Example 3: Testing async mode
+
+```bash
+# Dry run to test async without actual scraping
+python3 cli/doc_scraper.py \
+  --config configs/react.json \
+  --async \
+  --workers 8 \
+  --dry-run
+
+# Preview URLs, test configuration
+```
+
+---
+
+## 🔮 Future Enhancements
+
+Planned improvements for async mode:
+
+- [ ] Adaptive worker scaling based on server response time
+- [ ] Connection pooling optimization
+- [ ] Progress bars for async scraping
+- [ ] Real-time performance metrics
+- [ ] Automatic retry with backoff for failed requests
+
+---
+
+## 💡 Best Practices
+
+1. **Start with 4 workers** - Test, then increase if needed
+2. **Use --dry-run first** - Verify configuration before scraping
+3. **Respect rate limits** - Don't disable unless necessary
+4. **Monitor memory** - Reduce workers if memory usage is high
+5. **Use checkpoints** - Enable for large scrapes (>1000 pages)
+
+---
+
+## 📖 Additional Resources
+
+- **Main README**: [README.md](README.md)
+- **Technical Docs**: [docs/CLAUDE.md](docs/CLAUDE.md)
+- **Test Suite**: [tests/test_async_scraping.py](tests/test_async_scraping.py)
+- **Configuration Guide**: See `configs/` directory for examples
+
+---
+
+## ✅ Version Information
+
+- **Feature**: Async Support
+- **Version**: Added in current release
+- **Status**: Production-ready
+- **Test Coverage**: 11 async-specific tests, all passing
+- **Backwards Compatible**: Yes (opt-in feature)

+ 518 - 0
libs/external/Skill_Seekers-development/BULLETPROOF_QUICKSTART.md

@@ -0,0 +1,518 @@
+# Bulletproof Quick Start Guide
+
+**Target Audience:** Complete beginners | Never used Python/git before? Start here!
+
+**Time:** 15-30 minutes total (including all installations)
+
+**Result:** Working Skill Seeker installation + your first Claude skill created
+
+---
+
+## 📋 What You'll Need
+
+Before starting, you need:
+- A computer (macOS, Linux, or Windows with WSL)
+- Internet connection
+- 30 minutes of time
+
+That's it! We'll install everything else together.
+
+---
+
+## Step 1: Install Python (5 minutes)
+
+### Check if You Already Have Python
+
+Open Terminal (macOS/Linux) or Command Prompt (Windows) and type:
+
+```bash
+python3 --version
+```
+
+**✅ If you see:** `Python 3.10.x` or `Python 3.11.x` or higher → **Skip to Step 2!**
+
+**❌ If you see:** `command not found` or version less than 3.10 → **Continue below**
+
+### Install Python
+
+#### macOS:
+```bash
+# Install Homebrew (if not installed)
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+
+# Install Python
+brew install python3
+```
+
+**Verify:**
+```bash
+python3 --version
+# Should show: Python 3.11.x or similar
+```
+
+#### Linux (Ubuntu/Debian):
+```bash
+sudo apt update
+sudo apt install python3 python3-pip
+```
+
+**Verify:**
+```bash
+python3 --version
+pip3 --version
+```
+
+#### Windows:
+1. Download Python from: https://www.python.org/downloads/
+2. Run installer
+3. **IMPORTANT:** Check "Add Python to PATH" during installation
+4. Open Command Prompt and verify:
+```bash
+python --version
+```
+
+**✅ Success looks like:**
+```
+Python 3.11.5
+```
+
+---
+
+## Step 2: Install Git (3 minutes)
+
+### Check if You Have Git
+
+```bash
+git --version
+```
+
+**✅ If you see:** `git version 2.x.x` → **Skip to Step 3!**
+
+**❌ If not installed:**
+
+#### macOS:
+```bash
+brew install git
+```
+
+#### Linux:
+```bash
+sudo apt install git
+```
+
+#### Windows:
+Download from: https://git-scm.com/download/win
+
+**Verify:**
+```bash
+git --version
+# Should show: git version 2.x.x
+```
+
+---
+
+## Step 3: Get Skill Seeker (2 minutes)
+
+### Choose Where to Put It
+
+Pick a location for the project. Good choices:
+- macOS/Linux: `~/Projects/` or `~/Documents/`
+  - Note: `~` means your home directory (`$HOME` or `/Users/yourname` on macOS, `/home/yourname` on Linux)
+- Windows: `C:\Users\YourName\Projects\`
+
+### Clone the Repository
+
+```bash
+# Create Projects directory (if it doesn't exist)
+mkdir -p ~/Projects
+cd ~/Projects
+
+# Clone Skill Seeker
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+
+# Enter the directory
+cd Skill_Seekers
+```
+
+**✅ Success looks like:**
+```
+Cloning into 'Skill_Seekers'...
+remote: Enumerating objects: 245, done.
+remote: Counting objects: 100% (245/245), done.
+```
+
+**Verify you're in the right place:**
+```bash
+pwd
+# Should show something like:
+#   macOS: /Users/yourname/Projects/Skill_Seekers
+#   Linux: /home/yourname/Projects/Skill_Seekers
+# (Replace 'yourname' with YOUR actual username)
+
+ls
+# Should show: README.md, cli/, mcp/, configs/, etc.
+```
+
+**❌ If `git clone` fails:**
+```bash
+# Check internet connection
+ping google.com
+
+# Or download ZIP manually:
+# https://github.com/yusufkaraaslan/Skill_Seekers/archive/refs/heads/main.zip
+# Then unzip and cd into it
+```
+
+---
+
+## Step 4: Setup Virtual Environment & Install Dependencies (3 minutes)
+
+A virtual environment keeps Skill Seeker's dependencies isolated and prevents conflicts.
+
+```bash
+# Make sure you're in the Skill_Seekers directory
+cd ~/Projects/Skill_Seekers  # ~ means your home directory ($HOME)
+                             # Adjust if you chose a different location
+
+# Create virtual environment
+python3 -m venv venv
+
+# Activate it
+source venv/bin/activate  # macOS/Linux
+# Windows users: venv\Scripts\activate
+```
+
+**✅ Success looks like:**
+```
+(venv) username@computer Skill_Seekers %
+```
+Notice `(venv)` appears in your prompt - this means the virtual environment is active!
+
+```bash
+# Now install packages (only needed once)
+pip install requests beautifulsoup4 pytest
+
+# Save the dependency list
+pip freeze > requirements.txt
+```
+
+**✅ Success looks like:**
+```
+Successfully installed requests-2.32.5 beautifulsoup4-4.14.2 pytest-8.4.2 ...
+```
+
+**Optional - Only if you want API-based enhancement (not needed for LOCAL enhancement):**
+```bash
+pip install anthropic
+```
+
+**Important Notes:**
+- **Every time** you open a new terminal to use Skill Seeker, run `source venv/bin/activate` first
+- You'll know it's active when you see `(venv)` in your terminal prompt
+- To deactivate later: just type `deactivate`
+
+**❌ If python3 not found:**
+```bash
+# Try without the 3
+python -m venv venv
+```
+
+**❌ If permission denied:**
+```bash
+# Virtual environment approach doesn't need sudo - you might have the wrong path
+# Make sure you're in the Skill_Seekers directory:
+pwd
+# Should show something like:
+#   macOS: /Users/yourname/Projects/Skill_Seekers
+#   Linux: /home/yourname/Projects/Skill_Seekers
+# (Replace 'yourname' with YOUR actual username)
+```
+
+---
+
+## Step 5: Test Your Installation (1 minute)
+
+Let's make sure everything works:
+
+```bash
+# Test the main script can run
+skill-seekers scrape --help
+```
+
+**✅ Success looks like:**
+```
+usage: doc_scraper.py [-h] [--config CONFIG] [--interactive] ...
+```
+
+**❌ If you see "No such file or directory":**
+```bash
+# Check you're in the right directory
+pwd
+# Should show path ending in /Skill_Seekers
+
+# List files
+ls cli/
+# Should show: doc_scraper.py, estimate_pages.py, etc.
+```
+
+---
+
+## Step 6: Create Your First Skill! (5-10 minutes)
+
+Let's create a simple skill using a preset configuration.
+
+### Option A: Small Test (Recommended First Time)
+
+```bash
+# Create a config for a small site first
+cat > configs/test.json << 'EOF'
+{
+  "name": "test-skill",
+  "description": "Test skill creation",
+  "base_url": "https://tailwindcss.com/docs/installation",
+  "max_pages": 5,
+  "rate_limit": 0.5
+}
+EOF
+
+# Run the scraper
+skill-seekers scrape --config configs/test.json
+```
+
+**What happens:**
+1. Scrapes 5 pages from Tailwind CSS docs
+2. Creates `output/test-skill/` directory
+3. Generates SKILL.md and reference files
+
+**⏱️ Time:** ~30 seconds
+
+**✅ Success looks like:**
+```
+Scraping: https://tailwindcss.com/docs/installation
+Page 1/5: Installation
+Page 2/5: Editor Setup
+...
+✅ Skill created at: output/test-skill/
+```
+
+### Option B: Full Example (React Docs)
+
+```bash
+# Use the React preset
+skill-seekers scrape --config configs/react.json --max-pages 50
+```
+
+**⏱️ Time:** ~5 minutes
+
+**What you get:**
+- `output/react/SKILL.md` - Main skill file
+- `output/react/references/` - Organized documentation
+
+### Verify It Worked
+
+```bash
+# Check the output
+ls output/test-skill/
+# Should show: SKILL.md, references/, scripts/, assets/
+
+# Look at the generated skill
+head output/test-skill/SKILL.md
+```
+
+---
+
+## Step 7: Package for Claude (30 seconds)
+
+```bash
+# Package the skill
+skill-seekers package output/test-skill/
+```
+
+**✅ Success looks like:**
+```
+✅ Skill packaged successfully!
+📦 Created: output/test-skill.zip
+📏 Size: 45.2 KB
+
+Ready to upload to Claude AI!
+```
+
+**Now you have:** `output/test-skill.zip` ready to upload to Claude!
+
+---
+
+## Step 8: Upload to Claude (2 minutes)
+
+1. Go to https://claude.ai
+2. Click your profile → Settings
+3. Click "Knowledge" or "Skills"
+4. Click "Upload Skill"
+5. Select `output/test-skill.zip`
+6. Done! Claude can now use this skill
+
+---
+
+## 🎉 Success! What's Next?
+
+You now have a working Skill Seeker installation! Here's what you can do:
+
+### Try Other Presets
+
+```bash
+# See all available presets
+ls configs/
+
+# Try Vue.js
+skill-seekers scrape --config configs/vue.json --max-pages 50
+
+# Try Django
+skill-seekers scrape --config configs/django.json --max-pages 50
+```
+
+### Create Custom Skills
+
+```bash
+# Interactive mode - answer questions
+skill-seekers scrape --interactive
+
+# Or create config for any website
+skill-seekers scrape \
+  --name myframework \
+  --url https://docs.myframework.com/ \
+  --description "My favorite framework"
+```
+
+### Use with Claude Code (Advanced)
+
+If you have Claude Code installed:
+
+```bash
+# One-time setup
+./setup_mcp.sh
+
+# Then use natural language in Claude Code:
+# "Generate a skill for Svelte docs"
+# "Package the skill at output/svelte/"
+```
+
+**See:** [docs/MCP_SETUP.md](docs/MCP_SETUP.md) for full MCP setup
+
+---
+
+## 🔧 Troubleshooting
+
+### "Command not found" errors
+
+**Problem:** `python3: command not found`
+
+**Solution:** Python not installed or not in PATH
+- macOS/Linux: Reinstall Python with brew/apt
+- Windows: Reinstall Python, check "Add to PATH"
+- Try `python` instead of `python3`
+
+### "Permission denied" errors
+
+**Problem:** Can't install packages or run scripts
+
+**Solution:**
+```bash
+# Use --user flag
+pip3 install --user requests beautifulsoup4
+
+# Or make script executable
+chmod +x cli/doc_scraper.py
+```
+
+### "No such file or directory"
+
+**Problem:** Can't find cli/doc_scraper.py
+
+**Solution:** You're not in the right directory
+```bash
+# Go to the Skill_Seekers directory
+cd ~/Projects/Skill_Seekers  # Adjust your path
+
+# Verify
+ls cli/
+# Should show doc_scraper.py
+```
+
+### "ModuleNotFoundError"
+
+**Problem:** Missing Python packages
+
+**Solution:**
+```bash
+# Install dependencies again
+pip3 install requests beautifulsoup4
+
+# If that fails, try:
+pip3 install --user requests beautifulsoup4
+```
+
+### Scraping is slow or fails
+
+**Problem:** Takes forever or gets errors
+
+**Solution:**
+```bash
+# Use smaller max_pages for testing
+skill-seekers scrape --config configs/react.json --max-pages 10
+
+# Check internet connection
+ping google.com
+
+# Check the website is accessible
+curl -I https://docs.yoursite.com
+```
+
+### Still stuck?
+
+1. **Check our detailed troubleshooting guide:** [TROUBLESHOOTING.md](TROUBLESHOOTING.md)
+2. **Open an issue:** https://github.com/yusufkaraaslan/Skill_Seekers/issues
+3. **Include this info:**
+   - Operating system (macOS 13, Ubuntu 22.04, Windows 11, etc.)
+   - Python version (`python3 --version`)
+   - Full error message
+   - What command you ran
+
+---
+
+## 📚 Next Steps
+
+- **Read the full README:** [README.md](README.md)
+- **Learn about presets:** [configs/](configs/)
+- **Try MCP integration:** [docs/MCP_SETUP.md](docs/MCP_SETUP.md)
+- **Advanced usage:** [docs/](docs/)
+
+---
+
+## ✅ Quick Reference
+
+```bash
+# Your typical workflow:
+
+# 1. Create/use a config
+skill-seekers scrape --config configs/react.json --max-pages 50
+
+# 2. Package it
+skill-seekers package output/react/
+
+# 3. Upload output/react.zip to Claude
+
+# Done! 🎉
+```
+
+**Common locations:**
+- **Configs:** `configs/*.json`
+- **Output:** `output/skill-name/`
+- **Packaged skills:** `output/skill-name.zip`
+
+**Time estimates:**
+- Small skill (5-10 pages): 30 seconds
+- Medium skill (50-100 pages): 3-5 minutes
+- Large skill (500+ pages): 15-30 minutes
+
+---
+
+**Still confused?** That's okay! Open an issue and we'll help you get started: https://github.com/yusufkaraaslan/Skill_Seekers/issues/new

+ 693 - 0
libs/external/Skill_Seekers-development/CHANGELOG.md

@@ -0,0 +1,693 @@
+# Changelog
+
+All notable changes to Skill Seeker will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+---
+
+## [2.1.1] - 2025-11-30
+
+### 🚀 GitHub Repository Analysis Enhancements
+
+This release significantly improves GitHub repository scraping with unlimited local analysis, configurable directory exclusions, and numerous bug fixes.
+
+### Added
+- **Configurable directory exclusions** for local repository analysis ([#203](https://github.com/yusufkaraaslan/Skill_Seekers/issues/203))
+  - `exclude_dirs_additional`: Extend default exclusions with custom directories
+  - `exclude_dirs`: Replace default exclusions entirely (advanced users)
+  - 19 comprehensive tests covering all scenarios
+  - Logging: INFO for extend mode, WARNING for replace mode
+- **Unlimited local repository analysis** via `local_repo_path` configuration parameter
+- **Auto-exclusion** of virtual environments, build artifacts, and cache directories
+- **Support for analyzing repositories without GitHub API rate limits** (50 → unlimited files)
+- **Skip llms.txt option** - Force HTML scraping even when llms.txt is detected ([#198](https://github.com/yusufkaraaslan/Skill_Seekers/pull/198))
+
+### Fixed
+- Fixed logger initialization error causing `AttributeError: 'NoneType' object has no attribute 'setLevel'` ([#190](https://github.com/yusufkaraaslan/Skill_Seekers/issues/190))
+- Fixed 3 NoneType subscriptable errors in release tag parsing
+- Fixed relative import paths causing `ModuleNotFoundError`
+- Fixed hardcoded 50-file analysis limit preventing comprehensive code analysis
+- Fixed GitHub API file tree limitation (140 → 345 files discovered)
+- Fixed AST parser "not iterable" errors eliminating 100% of parsing failures (95 → 0 errors)
+- Fixed virtual environment file pollution reducing file tree noise by 95%
+- Fixed `force_rescrape` flag not checked before interactive prompt causing EOFError in CI/CD environments
+
+### Improved
+- Increased code analysis coverage from 14% to 93.6% (+79.6 percentage points)
+- Improved file discovery from 140 to 345 files (+146%)
+- Improved class extraction from 55 to 585 classes (+964%)
+- Improved function extraction from 512 to 2,784 functions (+444%)
+- Test suite expanded to 427 tests (up from 391)
+
+---
+
+## [2.1.0] - 2025-11-12
+
+### 🎉 Major Enhancement: Quality Assurance + Race Condition Fixes
+
+This release focuses on quality and reliability improvements, adding comprehensive quality checks and fixing critical race conditions in the enhancement workflow.
+
+### 🚀 Major Features
+
+#### Comprehensive Quality Checker
+- **Automatic quality checks before packaging** - Validates skill quality before upload
+- **Quality scoring system** - 0-100 score with A-F grades
+- **Enhancement verification** - Checks for template text, code examples, sections
+- **Structure validation** - Validates SKILL.md, references/ directory
+- **Content quality checks** - YAML frontmatter, language tags, "When to Use" section
+- **Link validation** - Validates internal markdown links
+- **Detailed reporting** - Errors, warnings, and info messages with file locations
+- **CLI tool** - `skill-seekers-quality-checker` with verbose and strict modes
+
+#### Headless Enhancement Mode (Default)
+- **No terminal windows** - Runs enhancement in background by default
+- **Proper waiting** - Main console waits for enhancement to complete
+- **Timeout protection** - 10-minute default timeout (configurable)
+- **Verification** - Checks that SKILL.md was actually updated
+- **Progress messages** - Clear status updates during enhancement
+- **Interactive mode available** - `--interactive-enhancement` flag for terminal mode
+
+### Added
+
+#### New CLI Tools
+- **quality_checker.py** - Comprehensive skill quality validation
+  - Structure checks (SKILL.md, references/)
+  - Enhancement verification (code examples, sections)
+  - Content validation (frontmatter, language tags)
+  - Link validation (internal markdown links)
+  - Quality scoring (0-100 + A-F grade)
+
+#### New Features
+- **Headless enhancement** - `skill-seekers-enhance` runs in background by default
+- **Quality checks in packaging** - Automatic validation before creating .zip
+- **MCP quality skip** - MCP server skips interactive checks
+- **Enhanced error handling** - Better error messages and timeout handling
+
+#### Tests
+- **+12 quality checker tests** - Comprehensive validation testing
+- **391 total tests passing** - Up from 379 in v2.0.0
+- **0 test failures** - All tests green
+- **CI improvements** - Fixed macOS terminal detection tests
+
+### Changed
+
+#### Enhancement Workflow
+- **Default mode changed** - Headless mode is now default (was terminal mode)
+- **Waiting behavior** - Main console waits for enhancement completion
+- **No race conditions** - Fixed "Package your skill" message appearing too early
+- **Better progress** - Clear status messages during enhancement
+
+#### Package Workflow
+- **Quality checks added** - Automatic validation before packaging
+- **User confirmation** - Ask to continue if warnings/errors found
+- **Skip option** - `--skip-quality-check` flag to bypass checks
+- **MCP context** - Automatically skips checks in non-interactive contexts
+
+#### CLI Arguments
+- **doc_scraper.py:**
+  - Updated `--enhance-local` help text (mentions headless mode)
+  - Added `--interactive-enhancement` flag
+- **enhance_skill_local.py:**
+  - Changed default to `headless=True`
+  - Added `--interactive-enhancement` flag
+  - Added `--timeout` flag (default: 600 seconds)
+- **package_skill.py:**
+  - Added `--skip-quality-check` flag
+
+### Fixed
+
+#### Critical Bugs
+- **Enhancement race condition** - Main console no longer exits before enhancement completes
+- **MCP stdin errors** - MCP server now skips interactive prompts
+- **Terminal detection tests** - Fixed for headless mode default
+
+#### Enhancement Issues
+- **Process detachment** - subprocess.run() now waits properly instead of Popen()
+- **Timeout handling** - Added timeout protection to prevent infinite hangs
+- **Verification** - Checks file modification time and size to verify success
+- **Error messages** - Better error handling and user-friendly messages
+
+#### Test Fixes
+- **package_skill tests** - Added skip_quality_check=True to prevent stdin errors
+- **Terminal detection tests** - Updated to use headless=False for interactive tests
+- **MCP server tests** - Fixed to skip quality checks in non-interactive context
+
+### Technical Details
+
+#### New Modules
+- `src/skill_seekers/cli/quality_checker.py` - Quality validation engine
+- `tests/test_quality_checker.py` - 12 comprehensive tests
+
+#### Modified Modules
+- `src/skill_seekers/cli/enhance_skill_local.py` - Added headless mode
+- `src/skill_seekers/cli/doc_scraper.py` - Updated enhancement integration
+- `src/skill_seekers/cli/package_skill.py` - Added quality checks
+- `src/skill_seekers/mcp/server.py` - Skip quality checks in MCP context
+- `tests/test_package_skill.py` - Updated for quality checker
+- `tests/test_terminal_detection.py` - Updated for headless default
+
+#### Commits in This Release
+- `e279ed6` - Phase 1: Enhancement race condition fix (headless mode)
+- `3272f9c` - Phases 2 & 3: Quality checker implementation
+- `2dd1027` - Phase 4: Tests (+12 quality checker tests)
+- `befcb89` - CI Fix: Skip quality checks in MCP context
+- `67ab627` - CI Fix: Update terminal tests for headless default
+
+### Upgrade Notes
+
+#### Breaking Changes
+- **Headless mode default** - Enhancement now runs in background by default
+  - Use `--interactive-enhancement` if you want the old terminal mode
+  - Affects: `skill-seekers-enhance` and `skill-seekers scrape --enhance-local`
+
+#### New Behavior
+- **Quality checks** - Packaging now runs quality checks by default
+  - May prompt for confirmation if warnings/errors found
+  - Use `--skip-quality-check` to bypass (not recommended)
+
+#### Recommendations
+- **Try headless mode** - Faster and more reliable than terminal mode
+- **Review quality reports** - Fix warnings before packaging
+- **Update scripts** - Add `--skip-quality-check` to automated packaging scripts if needed
+
+### Migration Guide
+
+**If you want the old terminal mode behavior:**
+```bash
+# Old (v2.0.0): Default was terminal mode
+skill-seekers-enhance output/react/
+
+# New (v2.1.0): Use --interactive-enhancement
+skill-seekers-enhance output/react/ --interactive-enhancement
+```
+
+**If you want to skip quality checks:**
+```bash
+# Add --skip-quality-check to package command
+skill-seekers-package output/react/ --skip-quality-check
+```
+
+---
+
+## [2.0.0] - 2025-11-11
+
+### 🎉 Major Release: PyPI Publication + Modern Python Packaging
+
+**Skill Seekers is now available on PyPI!** Install with: `pip install skill-seekers`
+
+This is a major milestone release featuring complete restructuring for modern Python packaging, comprehensive testing improvements, and publication to the Python Package Index.
+
+### 🚀 Major Changes
+
+#### PyPI Publication
+- **Published to PyPI** - https://pypi.org/project/skill-seekers/
+- **Installation:** `pip install skill-seekers` or `uv tool install skill-seekers`
+- **No cloning required** - Install globally or in virtual environments
+- **Automatic dependency management** - All dependencies handled by pip/uv
+
+#### Modern Python Packaging
+- **pyproject.toml-based configuration** - Standard PEP 621 metadata
+- **src/ layout structure** - Best practice package organization
+- **Entry point scripts** - `skill-seekers` command available globally
+- **Proper dependency groups** - Separate dev, test, and MCP dependencies
+- **Build backend** - setuptools-based build with uv support
+
+#### Unified CLI Interface
+- **Single `skill-seekers` command** - Git-style subcommands
+- **Subcommands:** `scrape`, `github`, `pdf`, `unified`, `enhance`, `package`, `upload`, `estimate`
+- **Consistent interface** - All tools accessible through one entry point
+- **Help system** - Comprehensive `--help` for all commands
+
+### Added
+
+#### Testing Infrastructure
+- **379 passing tests** (up from 299) - Comprehensive test coverage
+- **0 test failures** - All tests passing successfully
+- **Test suite improvements:**
+  - Fixed import paths for src/ layout
+  - Updated CLI tests for unified entry points
+  - Added package structure verification tests
+  - Fixed MCP server import tests
+  - Added pytest configuration in pyproject.toml
+
+#### Documentation
+- **Updated README.md** - PyPI badges, reordered installation options
+- **FUTURE_RELEASES.md** - Roadmap for upcoming features
+- **Installation guides** - Simplified with PyPI as primary method
+- **Testing documentation** - How to run full test suite
+
+### Changed
+
+#### Package Structure
+- **Moved to src/ layout:**
+  - `src/skill_seekers/` - Main package
+  - `src/skill_seekers/cli/` - CLI tools
+  - `src/skill_seekers/mcp/` - MCP server
+- **Import paths updated** - All imports use proper package structure
+- **Entry points configured** - All CLI tools available as commands
+
+#### Import Fixes
+- **Fixed `merge_sources.py`** - Corrected conflict_detector import (`.conflict_detector`)
+- **Fixed MCP server tests** - Updated to use `skill_seekers.mcp.server` imports
+- **Fixed test paths** - All tests updated for src/ layout
+
+### Fixed
+
+#### Critical Bugs
+- **Import path errors** - Fixed relative imports in CLI modules
+- **MCP test isolation** - Added proper MCP availability checks
+- **Package installation** - Resolved entry point conflicts
+- **Dependency resolution** - All dependencies properly specified
+
+#### Test Improvements
+- **17 test fixes** - Updated for modern package structure
+- **MCP test guards** - Proper skipif decorators for MCP tests
+- **CLI test updates** - Accept both exit codes 0 and 2 for help
+- **Path validation** - Tests verify correct package structure
+
+### Technical Details
+
+#### Build System
+- **Build backend:** setuptools.build_meta
+- **Build command:** `uv build`
+- **Publish command:** `uv publish`
+- **Distribution formats:** wheel + source tarball
+
+#### Dependencies
+- **Core:** requests, beautifulsoup4, PyGithub, mcp, httpx
+- **PDF:** PyMuPDF, Pillow, pytesseract
+- **Dev:** pytest, pytest-cov, pytest-anyio, mypy
+- **MCP:** mcp package for Claude Code integration
+
+### Migration Guide
+
+#### For Users
+**Old way:**
+```bash
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+pip install -r requirements.txt
+python3 cli/doc_scraper.py --config configs/react.json
+```
+
+**New way:**
+```bash
+pip install skill-seekers
+skill-seekers scrape --config configs/react.json
+```
+
+#### For Developers
+- Update imports: `from cli.* → from skill_seekers.cli.*` (see the example below)
+- Use `pip install -e ".[dev]"` for development
+- Run tests: `python -m pytest`
+- Entry points instead of direct script execution
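+
+For example, a module import migrates like this (a minimal illustration using `doc_scraper`, which moved into the new package; legacy imports keep working until v3.0, as noted under Deprecations below):
+
+```python
+# Old flat layout (pre-2.0), deprecated and slated for removal in v3.0
+# from cli import doc_scraper
+
+# New src/ layout (v2.0+), whether installed from PyPI or with `pip install -e .`
+from skill_seekers.cli import doc_scraper
+```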
+
+### Breaking Changes
+- **CLI interface changed** - Use `skill-seekers` command instead of `python3 cli/...`
+- **Import paths changed** - Package now at `skill_seekers.*` instead of `cli.*`
+- **Installation method changed** - PyPI recommended over git clone
+
+### Deprecations
+- **Direct script execution** - Still works but deprecated (use `skill-seekers` command)
+- **Old import patterns** - Legacy imports still work but will be removed in v3.0
+
+### Compatibility
+- **Python 3.10+** required
+- **Backward compatible** - Old scripts still work with legacy CLI
+- **Config files** - No changes required
+- **Output format** - No changes to generated skills
+
+---
+
+## [1.3.0] - 2025-10-26
+
+### Added - Refactoring & Performance Improvements
+- **Async/Await Support for Parallel Scraping** (2-3x performance boost)
+  - `--async` flag to enable async mode
+  - `async def scrape_page_async()` method using httpx.AsyncClient
+  - `async def scrape_all_async()` method with asyncio.gather()
+  - Connection pooling for better performance
+  - asyncio.Semaphore for concurrency control
+  - Comprehensive async testing (11 new tests)
+  - Full documentation in ASYNC_SUPPORT.md
+  - Performance: ~55 pages/sec vs ~18 pages/sec (sync)
+  - Memory: 40 MB vs 120 MB (66% reduction)
+- **Python Package Structure** (Phase 0 Complete)
+  - `cli/__init__.py` - CLI tools package with clean imports
+  - `skill_seeker_mcp/__init__.py` - MCP server package (renamed from mcp/)
+  - `skill_seeker_mcp/tools/__init__.py` - MCP tools subpackage
+  - Proper package imports: `from cli import constants`
+- **Centralized Configuration Module**
+  - `cli/constants.py` with 18 configuration constants
+  - `DEFAULT_ASYNC_MODE`, `DEFAULT_RATE_LIMIT`, `DEFAULT_MAX_PAGES`
+  - Enhancement limits, categorization scores, file limits
+  - All magic numbers now centralized and configurable
+- **Code Quality Improvements**
+  - Converted 71 print() statements to proper logging calls
+  - Added type hints to all DocToSkillConverter methods
+  - Fixed all mypy type checking issues
+  - Installed types-requests for better type safety
+- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
+- Automatic .txt → .md file extension conversion
+- No content truncation: preserves complete documentation
+- `detect_all()` method for finding all llms.txt variants
+- `get_proper_filename()` for correct .md naming
+
+### Changed
+- `_try_llms_txt()` now downloads all available variants instead of just one
+- Reference files now contain complete content (no 2500 char limit)
+- Code samples now include full code (no 600 char limit)
+- Test count increased from 207 to 299 (92 new tests)
+- All print() statements replaced with logging (logger.info, logger.warning, logger.error)
+- Better IDE support with proper package structure
+- Code quality improved from 5.5/10 to 6.5/10
+
+### Fixed
+- File extension bug: llms.txt files now saved as .md
+- Content loss: 0% truncation (was 36%)
+- Test isolation issues in test_async_scraping.py (proper cleanup with try/finally)
+- Import issues: no more sys.path.insert() hacks needed
+- .gitignore: added test artifacts (.pytest_cache, .coverage, htmlcov, etc.)
+
+---
+
+## [1.2.0] - 2025-10-23
+
+### 🚀 PDF Advanced Features Release
+
+Major enhancement to PDF extraction capabilities with Priority 2 & 3 features.
+
+### Added
+
+#### Priority 2: Support More PDF Types
+- **OCR Support for Scanned PDFs**
+  - Automatic text extraction from scanned documents using Tesseract OCR
+  - Fallback mechanism when page text < 50 characters
+  - Integration with pytesseract and Pillow
+  - Command: `--ocr` flag
+  - New dependencies: `Pillow==11.0.0`, `pytesseract==0.3.13`
+
+- **Password-Protected PDF Support**
+  - Handle encrypted PDFs with password authentication
+  - Clear error messages for missing/wrong passwords
+  - Secure password handling
+  - Command: `--password PASSWORD` flag
+
+- **Complex Table Extraction**
+  - Extract tables from PDFs using PyMuPDF's table detection
+  - Capture table data as 2D arrays with metadata (bbox, row/col count)
+  - Integration with skill references in markdown format
+  - Command: `--extract-tables` flag
+
+#### Priority 3: Performance Optimizations
+- **Parallel Page Processing** (see the sketch after this list)
+  - 3x faster PDF extraction using ThreadPoolExecutor
+  - Auto-detect CPU count or custom worker specification
+  - Only activates for PDFs with > 5 pages
+  - Commands: `--parallel` and `--workers N` flags
+  - Benchmarks: 500-page PDF reduced from 4m 10s to 1m 15s
+
+- **Intelligent Caching**
+  - In-memory cache for expensive operations (text extraction, code detection, quality scoring)
+  - 50% faster on re-runs
+  - Command: `--no-cache` to disable (enabled by default)
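+
+The parallel path can be pictured roughly like this. It is a hedged sketch only; the function and parameter names (`extract_page_text`, `extract_all_pages`, `num_workers`) are illustrative, not the actual `pdf_extractor_poc.py` API:
+
+```python
+# Hypothetical sketch: parallel per-page extraction with a simple in-memory cache.
+import os
+from concurrent.futures import ThreadPoolExecutor
+
+import fitz  # PyMuPDF
+
+_page_cache: dict[tuple[str, int], str] = {}
+
+def extract_page_text(pdf_path: str, page_number: int) -> str:
+    """Extract text for one page, caching the result so re-runs are cheap."""
+    key = (pdf_path, page_number)
+    if key in _page_cache:
+        return _page_cache[key]
+    with fitz.open(pdf_path) as doc:  # open per call: fitz documents are not thread-safe
+        text = doc[page_number].get_text()
+    _page_cache[key] = text
+    return text
+
+def extract_all_pages(pdf_path: str, num_workers: int | None = None) -> list[str]:
+    """Process pages in parallel; small PDFs are cheaper to handle sequentially."""
+    with fitz.open(pdf_path) as doc:
+        page_count = doc.page_count
+    if page_count <= 5:  # parallelism only pays off for larger PDFs
+        return [extract_page_text(pdf_path, i) for i in range(page_count)]
+    workers = num_workers or os.cpu_count() or 4  # auto-detect CPU count by default
+    with ThreadPoolExecutor(max_workers=workers) as pool:
+        return list(pool.map(lambda i: extract_page_text(pdf_path, i), range(page_count)))
+```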
+
+#### New Documentation
+- **`docs/PDF_ADVANCED_FEATURES.md`** (580 lines)
+  - Complete usage guide for all advanced features
+  - Installation instructions
+  - Performance benchmarks showing 3x speedup
+  - Best practices and troubleshooting
+  - API reference with all parameters
+
+#### Testing
+- **New test file:** `tests/test_pdf_advanced_features.py` (568 lines, 26 tests)
+  - TestOCRSupport (5 tests)
+  - TestPasswordProtection (4 tests)
+  - TestTableExtraction (5 tests)
+  - TestCaching (5 tests)
+  - TestParallelProcessing (4 tests)
+  - TestIntegration (3 tests)
+- **Updated:** `tests/test_pdf_extractor.py` (23 tests fixed and passing)
+- **Total PDF tests:** 49/49 PASSING ✅ (100% pass rate)
+
+### Changed
+- Enhanced `cli/pdf_extractor_poc.py` with all advanced features
+- Updated `requirements.txt` with new dependencies
+- Updated `README.md` with PDF advanced features usage
+- Updated `docs/TESTING.md` with new test counts (142 total tests)
+
+### Performance Improvements
+- **3.3x faster** with parallel processing (8 workers)
+- **1.7x faster** on re-runs with caching enabled
+- Support for PDFs of any length (no more 500-page limit)
+
+### Dependencies
+- Added `Pillow==11.0.0` for image processing
+- Added `pytesseract==0.3.13` for OCR support
+- Tesseract OCR engine (system package, optional)
+
+---
+
+## [1.1.0] - 2025-10-22
+
+### 🌐 Documentation Scraping Enhancements
+
+Major improvements to documentation scraping with unlimited pages, parallel processing, and new configs.
+
+### Added
+
+#### Unlimited Scraping & Performance
+- **Unlimited Page Scraping** - Removed 500-page limit, now supports unlimited pages
+- **Parallel Scraping Mode** - Process multiple pages simultaneously for faster scraping
+- **Dynamic Rate Limiting** - Smart rate limit control to avoid server blocks
+- **CLI Utilities** - New helper scripts for common tasks
+
+#### New Configurations
+- **Ansible Core 2.19** - Complete Ansible documentation config
+- **Claude Code** - Documentation for this very tool!
+- **Laravel 9.x** - PHP framework documentation
+
+#### Testing & Quality
+- Comprehensive test coverage for CLI utilities
+- Parallel scraping test suite
+- Virtual environment setup documentation
+- Thread-safety improvements
+
+### Fixed
+- Thread-safety issues in parallel scraping
+- CLI path references across all documentation
+- Flaky upload_skill tests
+- MCP server streaming subprocess implementation
+
+### Changed
+- All CLI examples now use `cli/` directory prefix
+- Updated documentation structure
+- Enhanced error handling
+
+---
+
+## [1.0.0] - 2025-10-19
+
+### 🎉 First Production Release
+
+This is the first production-ready release of Skill Seekers with complete feature set, full test coverage, and comprehensive documentation.
+
+### Added
+
+#### Smart Auto-Upload Feature
+- New `upload_skill.py` CLI tool for automatic API-based upload
+- Enhanced `package_skill.py` with `--upload` flag
+- Smart API key detection with graceful fallback
+- Cross-platform folder opening in `utils.py`
+- Helpful error messages instead of confusing errors
+
+#### MCP Integration Enhancements
+- **9 MCP tools** (added `upload_skill` tool)
+- `mcp__skill-seeker__upload_skill` - Upload .zip files to Claude automatically
+- Enhanced `package_skill` tool with smart auto-upload parameter
+- Updated all MCP documentation to reflect 9 tools
+
+#### Documentation Improvements
+- Updated README with version badge (v1.0.0)
+- Enhanced upload guide with 3 upload methods
+- Updated MCP setup guide with all 9 tools
+- Comprehensive test documentation (14/14 tests)
+- All references to tool counts corrected
+
+### Fixed
+- Missing `import os` in `mcp/server.py`
+- `package_skill.py` exit code behavior (now exits 0 when API key missing)
+- Improved UX with helpful messages instead of errors
+
+### Changed
+- Test count badge updated (96 → 14 passing)
+- All documentation references updated to 9 tools
+
+### Testing
+- **CLI Tests:** 8/8 PASSED ✅
+- **MCP Tests:** 6/6 PASSED ✅
+- **Total:** 14/14 PASSED (100%)
+
+---
+
+## [0.4.0] - 2025-10-18
+
+### Added
+
+#### Large Documentation Support (40K+ Pages)
+- Config splitting functionality for massive documentation sites
+- Router/hub skill generation for intelligent query routing
+- Checkpoint/resume feature for long scrapes
+- Parallel scraping support for faster processing
+- 4 split strategies: auto, category, router, size
+
+#### New CLI Tools
+- `split_config.py` - Split large configs into focused sub-skills
+- `generate_router.py` - Generate router/hub skills
+- `package_multi.py` - Package multiple skills at once
+
+#### New MCP Tools
+- `split_config` - Split large documentation via MCP
+- `generate_router` - Generate router skills via MCP
+
+#### Documentation
+- New `docs/LARGE_DOCUMENTATION.md` guide
+- Example config: `godot-large-example.json` (40K pages)
+
+### Changed
+- MCP tool count: 6 → 8 tools
+- Updated documentation for large docs workflow
+
+---
+
+## [0.3.0] - 2025-10-15
+
+### Added
+
+#### MCP Server Integration
+- Complete MCP server implementation (`mcp/server.py`)
+- 6 MCP tools for Claude Code integration:
+  - `list_configs`
+  - `generate_config`
+  - `validate_config`
+  - `estimate_pages`
+  - `scrape_docs`
+  - `package_skill`
+
+#### Setup & Configuration
+- Automated setup script (`setup_mcp.sh`)
+- MCP configuration examples
+- Comprehensive MCP setup guide (`docs/MCP_SETUP.md`)
+- MCP testing guide (`docs/TEST_MCP_IN_CLAUDE_CODE.md`)
+
+#### Testing
+- 31 comprehensive unit tests for MCP server
+- Integration tests via Claude Code MCP protocol
+- 100% test pass rate
+
+#### Documentation
+- Complete MCP integration documentation
+- Natural language usage examples
+- Troubleshooting guides
+
+### Changed
+- Restructured project as monorepo with CLI and MCP server
+- Moved CLI tools to `cli/` directory
+- Added MCP server to `mcp/` directory
+
+---
+
+## [0.2.0] - 2025-10-10
+
+### Added
+
+#### Testing & Quality
+- Comprehensive test suite with 71 tests
+- 100% test pass rate
+- Test coverage for all major features
+- Config validation tests
+
+#### Optimization
+- Page count estimator (`estimate_pages.py`)
+- Framework config optimizations with `start_urls`
+- Better URL pattern coverage
+- Improved scraping efficiency
+
+#### New Configs
+- Kubernetes documentation config
+- Tailwind CSS config
+- Astro framework config
+
+### Changed
+- Optimized all framework configs
+- Improved categorization accuracy
+- Enhanced error messages
+
+---
+
+## [0.1.0] - 2025-10-05
+
+### Added
+
+#### Initial Release
+- Basic documentation scraper functionality
+- Manual skill creation
+- Framework configs (Godot, React, Vue, Django, FastAPI)
+- Smart categorization system
+- Code language detection
+- Pattern extraction
+- Local and API-based enhancement options
+- Basic packaging functionality
+
+#### Core Features
+- BFS traversal for documentation scraping
+- CSS selector-based content extraction
+- Smart categorization with scoring
+- Code block detection and formatting
+- Caching system for scraped data
+- Interactive mode for config creation
+
+#### Documentation
+- README with quick start guide
+- Basic usage documentation
+- Configuration file examples
+
+---
+
+## Release Links
+
+- [v1.2.0](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v1.2.0) - PDF Advanced Features
+- [v1.1.0](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v1.1.0) - Documentation Scraping Enhancements
+- [v1.0.0](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v1.0.0) - Production Release
+- [v0.4.0](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v0.4.0) - Large Documentation Support
+- [v0.3.0](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v0.3.0) - MCP Integration
+
+---
+
+## Version History Summary
+
+| Version | Date | Highlights |
+|---------|------|------------|
+| **2.0.0** | 2025-11-11 | 📦 PyPI publication, modern packaging, unified `skill-seekers` CLI |
+| **1.3.0** | 2025-10-26 | ⚡ Async scraping (2-3x faster), package structure, full llms.txt content |
+| **1.2.0** | 2025-10-23 | 📄 PDF advanced features: OCR, passwords, tables, 3x faster |
+| **1.1.0** | 2025-10-22 | 🌐 Unlimited scraping, parallel mode, new configs (Ansible, Laravel) |
+| **1.0.0** | 2025-10-19 | 🚀 Production release, auto-upload, 9 MCP tools |
+| **0.4.0** | 2025-10-18 | 📚 Large docs support (40K+ pages) |
+| **0.3.0** | 2025-10-15 | 🔌 MCP integration with Claude Code |
+| **0.2.0** | 2025-10-10 | 🧪 Testing & optimization |
+| **0.1.0** | 2025-10-05 | 🎬 Initial release |
+
+---
+
+[Unreleased]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v1.2.0...HEAD
+[2.0.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v1.3.0...v2.0.0
+[1.3.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v1.2.0...v1.3.0
+[1.2.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v1.1.0...v1.2.0
+[1.1.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v1.0.0...v1.1.0
+[1.0.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v0.4.0...v1.0.0
+[0.4.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v0.3.0...v0.4.0
+[0.3.0]: https://github.com/yusufkaraaslan/Skill_Seekers/compare/v0.2.0...v0.3.0
+[0.2.0]: https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v0.2.0
+[0.1.0]: https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v0.1.0

+ 860 - 0
libs/external/Skill_Seekers-development/CLAUDE.md

@@ -0,0 +1,860 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## 🎯 Current Status (November 30, 2025)
+
+**Version:** v2.1.1 (Production Ready - GitHub Analysis Enhanced!)
+**Active Development:** Flexible, incremental task-based approach
+
+### Recent Updates (November 2025):
+
+**🎉 MAJOR MILESTONE: Published on PyPI! (v2.0.0)**
+- **📦 PyPI Publication**: Install with `pip install skill-seekers` - https://pypi.org/project/skill-seekers/
+- **🔧 Modern Python Packaging**: pyproject.toml, src/ layout, entry points
+- **✅ CI/CD Fixed**: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12)
+- **📚 Documentation Complete**: README, CHANGELOG, FUTURE_RELEASES.md all updated
+- **🚀 Unified CLI**: Single `skill-seekers` command with Git-style subcommands
+- **🧪 Test Coverage**: 427 tests passing (up from 391), 39% coverage
+- **🌐 Community**: GitHub Discussion, Release notes, announcements published
+
+**🚀 Unified Multi-Source Scraping (v2.0.0)**
+- **NEW**: Combine documentation + GitHub + PDF in one skill
+- **NEW**: Automatic conflict detection between docs and code
+- **NEW**: Rule-based and AI-powered merging
+- **NEW**: 5 example unified configs (React, Django, FastAPI, Godot, FastAPI-test)
+- **Status**: ✅ All 22 unified tests passing (18 core + 4 MCP integration)
+
+**✅ Community Response (H1 Group):**
+- **Issue #8 Fixed** - Added BULLETPROOF_QUICKSTART.md and TROUBLESHOOTING.md for beginners
+- **Issue #7 Fixed** - Fixed all 11 configs (Django, Laravel, Astro, Tailwind) - 100% working
+- **Issue #4 Linked** - Connected to roadmap Tasks A2/A3 (knowledge sharing + website)
+- **PR #5 Reviewed** - Approved anchor stripping feature (security verified, 32/32 tests pass)
+- **MCP Setup Fixed** - Path expansion bug resolved in setup_mcp.sh
+
+**📦 Configs Status:**
+- ✅ **24 total configs available** (including unified configs)
+- ✅ 5 unified configs added (React, Django, FastAPI, Godot, FastAPI-test)
+- ✅ Core selectors tested and validated
+- 📝 Single-source configs: ansible-core, astro, claude-code, django, fastapi, godot, godot-large-example, hono, kubernetes, laravel, react, steam-economy-complete, tailwind, vue
+- 📝 Multi-source configs: django_unified, fastapi_unified, fastapi_unified_test, godot_unified, react_unified
+- 📝 Test/Example configs: godot_github, react_github, python-tutorial-test, example_pdf, test-manual
+
+**📋 Completed (November 29, 2025):**
+- **✅ DONE**: PyPI publication complete (v2.0.0)
+- **✅ DONE**: CI/CD fixed - all checks passing
+- **✅ DONE**: Documentation updated (README, CHANGELOG, FUTURE_RELEASES.md)
+- **✅ DONE**: Quality Assurance + Race Condition Fixes (v2.1.0)
+- **✅ DONE**: All critical bugs fixed (Issues #190, #192, #193)
+- **✅ DONE**: Test suite stabilized (427 tests passing)
+- **✅ DONE**: Unified tests fixed (all 22 passing)
+- **✅ DONE**: PR #195 merged - Unlimited local repository analysis
+- **✅ DONE**: PR #198 merged - Skip llms.txt config option
+- **✅ DONE**: Issue #203 - Configurable EXCLUDED_DIRS (19 tests, 2 commits)
+
+**📋 Next Up (Post-v2.1.0):**
+- **Priority 1**: Review open PRs (#187, #186)
+- **Priority 2**: Issue #202 - Add warning for missing local_repo_path
+- **Priority 3**: Task H1.3 - Create example project folder
+- **Priority 4**: Task A3.1 - GitHub Pages site (skillseekersweb.com)
+
+**📊 Roadmap Progress:**
+- 134 tasks organized into 22 feature groups
+- Project board: https://github.com/users/yusufkaraaslan/projects/2
+- See [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md) for complete task list
+
+---
+
+## 🔌 MCP Integration Available
+
+**This repository includes a fully tested MCP server with 9 tools:**
+- `mcp__skill-seeker__list_configs` - List all available preset configurations
+- `mcp__skill-seeker__generate_config` - Generate a new config file for any docs site
+- `mcp__skill-seeker__validate_config` - Validate a config file structure
+- `mcp__skill-seeker__estimate_pages` - Estimate page count before scraping
+- `mcp__skill-seeker__scrape_docs` - Scrape and build a skill
+- `mcp__skill-seeker__package_skill` - Package skill into .zip file (with auto-upload)
+- `mcp__skill-seeker__upload_skill` - Upload .zip to Claude (NEW)
+- `mcp__skill-seeker__split_config` - Split large documentation configs
+- `mcp__skill-seeker__generate_router` - Generate router/hub skills
+
+**Setup:** See [docs/MCP_SETUP.md](docs/MCP_SETUP.md) or run `./setup_mcp.sh`
+
+**Status:** ✅ Tested and working in production with Claude Code
+
+## Overview
+
+Skill Seeker automatically converts any documentation website into a Claude AI skill. It scrapes documentation, organizes content, extracts code patterns, and packages everything into an uploadable `.zip` file for Claude.
+
+## Prerequisites
+
+**Python Version:** Python 3.10 or higher (required for MCP integration)
+
+**Installation:**
+
+### Option 1: Install from PyPI (Recommended - Easiest!)
+```bash
+# Install globally or in virtual environment
+pip install skill-seekers
+
+# Use the unified CLI immediately
+skill-seekers scrape --config configs/react.json
+skill-seekers --help
+```
+
+### Option 2: Install from Source (For Development)
+```bash
+# Clone the repository
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+
+# Create virtual environment
+python3 -m venv venv
+source venv/bin/activate  # macOS/Linux (Windows: venv\Scripts\activate)
+
+# Install in editable mode
+pip install -e .
+
+# Or install dependencies manually
+pip install -r requirements.txt
+```
+
+**Why use a virtual environment?**
+- Keeps dependencies isolated from system Python
+- Prevents package version conflicts
+- Standard Python development practice
+- Required for running tests with pytest
+
+**Optional (for API-based enhancement):**
+```bash
+pip install anthropic
+export ANTHROPIC_API_KEY=sk-ant-...
+```
+
+## Core Commands
+
+### Quick Start - Use a Preset
+
+```bash
+# Single-source scraping (documentation only)
+skill-seekers scrape --config configs/godot.json
+skill-seekers scrape --config configs/react.json
+skill-seekers scrape --config configs/vue.json
+skill-seekers scrape --config configs/django.json
+skill-seekers scrape --config configs/laravel.json
+skill-seekers scrape --config configs/fastapi.json
+```
+
+### Unified Multi-Source Scraping (**NEW - v2.0.0**)
+
+```bash
+# Combine documentation + GitHub + PDF in one skill
+skill-seekers unified --config configs/react_unified.json
+skill-seekers unified --config configs/django_unified.json
+skill-seekers unified --config configs/fastapi_unified.json
+skill-seekers unified --config configs/godot_unified.json
+
+# Override merge mode
+skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
+
+# Result: One comprehensive skill with conflict detection
+```
+
+**What makes it special:**
+- ✅ Detects discrepancies between documentation and code
+- ✅ Shows both versions side-by-side with ⚠️ warnings
+- ✅ Identifies outdated docs and undocumented features
+- ✅ Single source of truth showing intent (docs) AND reality (code)
+
+**See full guide:** [docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)
+
+### First-Time User Workflow (Recommended)
+
+```bash
+# 1. Install from PyPI (one-time, easiest!)
+pip install skill-seekers
+
+# 2. Estimate page count BEFORE scraping (fast, no data download)
+skill-seekers estimate configs/godot.json
+# Time: ~1-2 minutes, shows estimated total pages and recommended max_pages
+
+# 3. Scrape with local enhancement (uses Claude Code Max, no API key)
+skill-seekers scrape --config configs/godot.json --enhance-local
+# Time: 20-40 minutes scraping + 60 seconds enhancement
+
+# 4. Package the skill
+skill-seekers package output/godot/
+
+# Result: godot.zip ready to upload to Claude
+```
+
+### Interactive Mode
+
+```bash
+# Step-by-step configuration wizard
+skill-seekers scrape --interactive
+```
+
+### Quick Mode (Minimal Config)
+
+```bash
+# Create skill from any documentation URL
+skill-seekers scrape --name react --url https://react.dev/ --description "React framework for UIs"
+```
+
+### Skip Scraping (Use Cached Data)
+
+```bash
+# Fast rebuild using previously scraped data
+skill-seekers scrape --config configs/godot.json --skip-scrape
+# Time: 1-3 minutes (instant rebuild)
+```
+
+### Async Mode (2-3x Faster Scraping)
+
+```bash
+# Enable async mode with 8 workers for best performance
+skill-seekers scrape --config configs/react.json --async --workers 8
+
+# Quick mode with async
+skill-seekers scrape --name react --url https://react.dev/ --async --workers 8
+
+# Dry run with async to test
+skill-seekers scrape --config configs/godot.json --async --workers 4 --dry-run
+```
+
+**Recommended Settings:**
+- Small docs (~100-500 pages): `--async --workers 4`
+- Medium docs (~500-2000 pages): `--async --workers 8`
+- Large docs (2000+ pages): `--async --workers 8 --no-rate-limit`
+
+**Performance:**
+- Sync: ~18 pages/sec, 120 MB memory
+- Async: ~55 pages/sec, 40 MB memory (3x faster!)
+
+**See full guide:** [ASYNC_SUPPORT.md](ASYNC_SUPPORT.md)
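+
+For orientation, here is a minimal sketch of the semaphore-bounded pattern described above, using httpx and asyncio.gather. The function names are illustrative; the real implementation lives in `scrape_page_async()` / `scrape_all_async()`:
+
+```python
+# Minimal sketch of semaphore-bounded async fetching with httpx (illustrative names).
+import asyncio
+import httpx
+
+async def fetch_page(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> str:
+    async with sem:  # cap concurrent requests (the --workers setting)
+        response = await client.get(url, timeout=30.0)
+        response.raise_for_status()
+        return response.text
+
+async def fetch_all(urls: list[str], workers: int = 8) -> list[str]:
+    sem = asyncio.Semaphore(workers)
+    async with httpx.AsyncClient(follow_redirects=True) as client:  # pooled connections
+        return await asyncio.gather(*(fetch_page(client, sem, u) for u in urls))
+
+# asyncio.run(fetch_all(["https://react.dev/"], workers=4))
+```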
+
+### Enhancement Options
+
+**LOCAL Enhancement (Recommended - No API Key Required):**
+```bash
+# During scraping
+skill-seekers scrape --config configs/react.json --enhance-local
+
+# Standalone after scraping
+skill-seekers enhance output/react/
+```
+
+**API Enhancement (Alternative - Requires API Key):**
+```bash
+# During scraping
+skill-seekers scrape --config configs/react.json --enhance
+
+# Standalone after scraping
+skill-seekers-enhance output/react/
+skill-seekers-enhance output/react/ --api-key sk-ant-...
+```
+
+### Package and Upload the Skill
+
+```bash
+# Package skill (opens folder, shows upload instructions)
+skill-seekers package output/godot/
+# Result: output/godot.zip
+
+# Package and auto-upload (requires ANTHROPIC_API_KEY)
+export ANTHROPIC_API_KEY=sk-ant-...
+skill-seekers package output/godot/ --upload
+
+# Upload existing .zip
+skill-seekers upload output/godot.zip
+
+# Package without opening folder
+skill-seekers package output/godot/ --no-open
+```
+
+### Force Re-scrape
+
+```bash
+# Delete cached data and re-scrape from scratch
+rm -rf output/godot_data/
+skill-seekers scrape --config configs/godot.json
+```
+
+### Estimate Page Count (Before Scraping)
+
+```bash
+# Quick estimation - discover up to 100 pages
+skill-seekers estimate configs/react.json --max-discovery 100
+# Time: ~30-60 seconds
+
+# Full estimation - discover up to 1000 pages (default)
+skill-seekers estimate configs/godot.json
+# Time: ~1-2 minutes
+
+# Deep estimation - discover up to 2000 pages
+skill-seekers estimate configs/vue.json --max-discovery 2000
+# Time: ~3-5 minutes
+
+# What it shows:
+# - Estimated total pages
+# - Recommended max_pages value
+# - Estimated scraping time
+# - Discovery rate (pages/sec)
+```
+
+**Why use estimation:**
+- Validates config URL patterns before full scrape
+- Helps set optimal `max_pages` value
+- Estimates total scraping time
+- Fast (only HEAD requests + minimal parsing)
+- No data downloaded or stored
+
+## Repository Architecture
+
+### File Structure (v2.0.0 - Modern Python Packaging)
+
+```
+Skill_Seekers/
+├── pyproject.toml              # Modern Python package configuration (PEP 621)
+├── src/                        # Source code (src/ layout best practice)
+│   └── skill_seekers/
+│       ├── __init__.py
+│       ├── cli/                # CLI tools (entry points)
+│       │   ├── doc_scraper.py      # Main scraper (~790 lines)
+│       │   ├── estimate_pages.py   # Page count estimator
+│       │   ├── enhance_skill.py    # AI enhancement (API-based)
+│       │   ├── package_skill.py    # Skill packager
+│       │   ├── github_scraper.py   # GitHub scraper
+│       │   ├── pdf_scraper.py      # PDF scraper
+│       │   ├── unified_scraper.py  # Unified multi-source scraper
+│       │   ├── merge_sources.py    # Source merger
+│       │   └── conflict_detector.py # Conflict detection
+│       └── mcp/                # MCP server integration
+│           └── server.py
+├── tests/                      # Test suite (391 tests passing)
+│   ├── test_scraper_features.py
+│   ├── test_config_validation.py
+│   ├── test_integration.py
+│   ├── test_mcp_server.py
+│   ├── test_unified.py         # Unified scraping tests (18 tests)
+│   ├── test_unified_mcp_integration.py  # (4 tests)
+│   └── ...
+├── configs/                    # Preset configurations (24 configs)
+│   ├── godot.json
+│   ├── react.json
+│   ├── django_unified.json     # Multi-source configs
+│   └── ...
+├── docs/                       # Documentation
+│   ├── CLAUDE.md               # This file
+│   ├── ENHANCEMENT.md          # Enhancement guide
+│   ├── UPLOAD_GUIDE.md         # Upload instructions
+│   └── UNIFIED_SCRAPING.md     # Unified scraping guide
+├── README.md                   # User documentation
+├── CHANGELOG.md                # Release history
+├── FUTURE_RELEASES.md          # Roadmap
+└── output/                     # Generated output (git-ignored)
+    ├── {name}_data/            # Scraped raw data (cached)
+    │   ├── pages/*.json        # Individual page data
+    │   └── summary.json        # Scraping summary
+    └── {name}/                 # Built skill directory
+        ├── SKILL.md            # Main skill file
+        ├── SKILL.md.backup     # Backup (if enhanced)
+        ├── references/         # Categorized documentation
+        │   ├── index.md
+        │   ├── getting_started.md
+        │   ├── api.md
+        │   └── ...
+        ├── scripts/            # Empty (user scripts)
+        └── assets/             # Empty (user assets)
+```
+
+**Key Changes in v2.0.0:**
+- **src/ layout**: Modern Python packaging structure
+- **pyproject.toml**: PEP 621 compliant configuration
+- **Entry points**: `skill-seekers` CLI with subcommands
+- **Published to PyPI**: `pip install skill-seekers`
+
+### Data Flow
+
+1. **Scrape Phase** (`scrape_all()` in src/skill_seekers/cli/doc_scraper.py):
+   - Input: Config JSON (name, base_url, selectors, url_patterns, categories)
+   - Process: BFS traversal from base_url, respecting include/exclude patterns
+   - Output: `output/{name}_data/pages/*.json` + `summary.json`
+
+2. **Build Phase** (`build_skill()` in src/skill_seekers/cli/doc_scraper.py):
+   - Input: Scraped JSON data from `output/{name}_data/`
+   - Process: Load pages → Smart categorize → Extract patterns → Generate references
+   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`
+
+3. **Enhancement Phase** (optional via enhance_skill.py or enhance_skill_local.py):
+   - Input: Built skill directory with references
+   - Process: Claude analyzes references and rewrites SKILL.md
+   - Output: Enhanced SKILL.md with real examples and guidance
+
+4. **Package Phase** (via package_skill.py):
+   - Input: Skill directory
+   - Process: Zip all files (excluding .backup)
+   - Output: `{name}.zip`
+
+5. **Upload Phase** (optional via upload_skill.py):
+   - Input: Skill .zip file
+   - Process: Upload to Claude AI via API
+   - Output: Skill available in Claude
+
+### Configuration File Structure
+
+Config files (`configs/*.json`) define scraping behavior:
+
+```json
+{
+  "name": "godot",
+  "description": "When to use this skill",
+  "base_url": "https://docs.godotengine.org/en/stable/",
+  "selectors": {
+    "main_content": "div[role='main']",
+    "title": "title",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": ["/search.html", "/_static/"]
+  },
+  "categories": {
+    "getting_started": ["introduction", "getting_started"],
+    "scripting": ["scripting", "gdscript"],
+    "api": ["api", "reference", "class"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 500
+}
+```
+
+**Config Parameters:**
+- `name`: Skill identifier (output directory name)
+- `description`: When Claude should use this skill
+- `base_url`: Starting URL for scraping
+- `selectors.main_content`: CSS selector for main content (common: `article`, `main`, `div[role="main"]`)
+- `selectors.title`: CSS selector for page title
+- `selectors.code_blocks`: CSS selector for code samples
+- `url_patterns.include`: Only scrape URLs containing these patterns
+- `url_patterns.exclude`: Skip URLs containing these patterns
+- `categories`: Keyword mapping for categorization
+- `rate_limit`: Delay between requests (seconds)
+- `max_pages`: Maximum pages to scrape
+- `skip_llms_txt`: Skip llms.txt detection, force HTML scraping (default: false)
+- `exclude_dirs_additional`: Add custom directories to default exclusions (for local repo analysis)
+- `exclude_dirs`: Replace default directory exclusions entirely (advanced, for local repo analysis)
+
+## Key Features & Implementation
+
+### Auto-Detect Existing Data
+Tool checks for `output/{name}_data/` and prompts to reuse, avoiding re-scraping (check_existing_data() in doc_scraper.py:653-660).
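+
+A hedged sketch of that check (illustrative only, not the exact implementation):
+
+```python
+# Illustrative sketch of the cached-data check; not the exact implementation.
+from pathlib import Path
+
+def check_existing_data(name: str, output_dir: str = "output") -> bool:
+    """Return True if previously scraped data exists and the user wants to reuse it."""
+    data_dir = Path(output_dir) / f"{name}_data"
+    if not (data_dir / "summary.json").exists():
+        return False
+    answer = input(f"Found cached data in {data_dir}. Reuse it? [Y/n] ").strip().lower()
+    return answer in ("", "y", "yes")
+```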
+
+### Configurable Directory Exclusions (Local Repository Analysis)
+
+When using `local_repo_path` for unlimited local repository analysis, you can customize which directories to exclude from analysis.
+
+**Smart Defaults:**
+Automatically excludes common directories: `venv`, `node_modules`, `__pycache__`, `.git`, `build`, `dist`, `.pytest_cache`, `htmlcov`, `.tox`, `.mypy_cache`, etc.
+
+**Extend Mode** (`exclude_dirs_additional`): Add custom exclusions to defaults
+```json
+{
+  "sources": [{
+    "type": "github",
+    "local_repo_path": "/path/to/repo",
+    "exclude_dirs_additional": ["proprietary", "legacy", "third_party"]
+  }]
+}
+```
+
+**Replace Mode** (`exclude_dirs`): Override defaults entirely (advanced)
+```json
+{
+  "sources": [{
+    "type": "github",
+    "local_repo_path": "/path/to/repo",
+    "exclude_dirs": ["node_modules", ".git", "custom_vendor"]
+  }]
+}
+```
+
+**Use Cases:**
+- Monorepos with custom directory structures
+- Enterprise projects with non-standard naming
+- Including unusual directories (e.g., analyzing venv code)
+- Minimal exclusions for small/simple projects
+
+See: `should_exclude_dir()` in github_scraper.py:304-306
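+
+A sketch of how the two modes are assumed to combine with the defaults (illustrative; the authoritative check is `should_exclude_dir()`):
+
+```python
+# Hypothetical sketch of how exclude_dirs / exclude_dirs_additional could combine.
+DEFAULT_EXCLUDED_DIRS = {
+    "venv", "node_modules", "__pycache__", ".git", "build", "dist",
+    ".pytest_cache", "htmlcov", ".tox", ".mypy_cache",
+}
+
+def resolve_excluded_dirs(source_config: dict) -> set[str]:
+    """Replace mode wins outright; extend mode adds to the smart defaults."""
+    if "exclude_dirs" in source_config:           # replace mode (advanced)
+        return set(source_config["exclude_dirs"])
+    extra = source_config.get("exclude_dirs_additional", [])
+    return DEFAULT_EXCLUDED_DIRS | set(extra)     # extend mode
+
+def should_exclude_dir(dir_name: str, excluded: set[str]) -> bool:
+    return dir_name in excluded
+```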
+
+### Language Detection
+Detects code languages from:
+1. CSS class attributes (`language-*`, `lang-*`)
+2. Heuristics (keywords like `def`, `const`, `func`, etc.)
+
+See: `detect_language()` in doc_scraper.py:135-165
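+
+A condensed sketch of that two-step approach (simplified heuristics, not the exact `detect_language()` code):
+
+```python
+# Simplified sketch: CSS class attributes first, keyword heuristics as a fallback.
+import re
+
+def detect_language(code_tag_classes: list[str], code_text: str) -> str:
+    # 1. CSS class attributes like "language-python" or "lang-js"
+    for cls in code_tag_classes:
+        match = re.match(r"(?:language|lang)-(\w+)", cls)
+        if match:
+            return match.group(1)
+    # 2. Keyword heuristics as a fallback
+    if re.search(r"\bdef \w+\(|\bimport \w+", code_text):
+        return "python"
+    if re.search(r"\bconst \w+|\bfunction \w+\(", code_text):
+        return "javascript"
+    if re.search(r"\bfunc \w+\(", code_text):
+        return "gdscript"  # heuristics are intentionally rough
+    return "text"
+```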
+
+### Pattern Extraction
+Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
+
+See: `extract_patterns()` in doc_scraper.py:167-183
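+
+Roughly, the marker-based extraction works like this (an illustrative sketch assuming page content is already split into text/code sections; not the exact `extract_patterns()` code):
+
+```python
+# Illustrative sketch: find "Example:" / "Pattern:" / "Usage:" markers and keep
+# the code block that follows each, capped at 5 per page.
+MARKERS = ("example:", "pattern:", "usage:")
+MAX_PATTERNS_PER_PAGE = 5
+
+def extract_patterns(sections: list[dict]) -> list[dict]:
+    """sections: ordered [{'text': str, 'code': str | None}, ...] for one page."""
+    patterns = []
+    for i, section in enumerate(sections):
+        text = section["text"].lower()
+        if any(marker in text for marker in MARKERS):
+            # check this section and the next couple for an attached code block
+            for follower in sections[i:i + 3]:
+                if follower.get("code"):
+                    patterns.append({"context": section["text"], "code": follower["code"]})
+                    break
+        if len(patterns) >= MAX_PATTERNS_PER_PAGE:
+            break
+    return patterns
+```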
+
+### Smart Categorization
+- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
+- Threshold of 2+ for categorization
+- Auto-infers categories from URL segments if none provided
+- Falls back to "other" category
+
+See: `smart_categorize()` and `infer_categories()` in doc_scraper.py:282-351
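+
+The scoring rule can be sketched as follows (a minimal illustration of the weights above, not the actual `smart_categorize()` implementation):
+
+```python
+# Minimal sketch of keyword scoring: 3 points for a URL hit, 2 for title, 1 for content.
+def categorize_page(url: str, title: str, content: str,
+                    categories: dict[str, list[str]]) -> str:
+    best_category, best_score = "other", 0
+    for category, keywords in categories.items():
+        score = 0
+        for kw in keywords:
+            if kw in url.lower():
+                score += 3
+            if kw in title.lower():
+                score += 2
+            if kw in content.lower():
+                score += 1
+        if score > best_score:
+            best_category, best_score = category, score
+    return best_category if best_score >= 2 else "other"  # threshold of 2+
+```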
+
+### Enhanced SKILL.md Generation
+Generated with:
+- Real code examples from documentation (language-annotated)
+- Quick reference patterns extracted from docs
+- Common pattern section
+- Category file listings
+
+See: `create_enhanced_skill_md()` in doc_scraper.py:426-542
+
+## Common Workflows
+
+### First Time (With Scraping + Enhancement)
+
+```bash
+# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
+skill-seekers scrape --config configs/godot.json --enhance-local
+
+# 2. Wait for enhancement terminal to close (~60 seconds)
+
+# 3. Verify quality
+cat output/godot/SKILL.md
+
+# 4. Package
+skill-seekers package output/godot/
+
+# Result: godot.zip ready for Claude
+# Time: 20-40 minutes (scraping) + 60 seconds (enhancement)
+```
+
+### Using Cached Data (Fast Iteration)
+
+```bash
+# 1. Use existing data + Local Enhancement
+skill-seekers scrape --config configs/godot.json --skip-scrape
+skill-seekers enhance output/godot/
+
+# 2. Package
+skill-seekers package output/godot/
+
+# Time: 1-3 minutes (build) + 60 seconds (enhancement)
+```
+
+### Without Enhancement (Basic)
+
+```bash
+# 1. Scrape + Build (no enhancement)
+skill-seekers scrape --config configs/godot.json
+
+# 2. Package
+skill-seekers package output/godot/
+
+# Note: SKILL.md will be basic template - enhancement recommended
+# Time: 20-40 minutes
+```
+
+### Creating a New Framework Config
+
+**Option 1: Interactive**
+```bash
+skill-seekers scrape --interactive
+# Follow prompts, it creates the config for you
+```
+
+**Option 2: Copy and Modify**
+```bash
+# Copy a preset
+cp configs/react.json configs/myframework.json
+
+# Edit it
+nano configs/myframework.json
+
+# Test with limited pages first
+# Set "max_pages": 20 in config
+
+# Use it
+skill-seekers scrape --config configs/myframework.json
+```
+
+## Testing & Verification
+
+### Finding the Right CSS Selectors
+
+Before creating a config, test selectors with BeautifulSoup:
+
+```python
+from bs4 import BeautifulSoup
+import requests
+
+url = "https://docs.example.com/page"
+soup = BeautifulSoup(requests.get(url).content, 'html.parser')
+
+# Try different selectors
+print(soup.select_one('article'))
+print(soup.select_one('main'))
+print(soup.select_one('div[role="main"]'))
+print(soup.select_one('div.content'))
+
+# Test code block selector
+print(soup.select('pre code'))
+print(soup.select('pre'))
+```
+
+### Verify Output Quality
+
+After building, verify the skill quality:
+
+```bash
+# Check SKILL.md has real examples
+cat output/godot/SKILL.md
+
+# Check category structure
+cat output/godot/references/index.md
+
+# List all reference files
+ls output/godot/references/
+
+# Check specific category content
+cat output/godot/references/getting_started.md
+
+# Verify code samples have language detection
+grep -A 3 "```" output/godot/references/*.md | head -20
+```
+
+### Test with Limited Pages
+
+For faster testing, edit the config to limit pages (for example, just 20):
+
+```json
+{
+  "max_pages": 20
+}
+```
+
+## Troubleshooting
+
+### No Content Extracted
+**Problem:** Pages scraped but content is empty
+
+**Solution:** Check `main_content` selector in config. Try:
+- `article`
+- `main`
+- `div[role="main"]`
+- `div.content`
+
+Use the BeautifulSoup testing approach above to find the right selector.
+
+### Poor Categorization
+**Problem:** Pages not categorized well
+
+**Solution:** Edit `categories` section in config with better keywords specific to the documentation structure. Check URL patterns in scraped data:
+
+```bash
+# See what URLs were scraped
+cat output/godot_data/summary.json | grep url | head -20
+```
+
+### Data Exists But Won't Use It
+**Problem:** Tool won't reuse existing data
+
+**Solution:** Force re-scrape:
+```bash
+rm -rf output/myframework_data/
+skill-seekers scrape --config configs/myframework.json
+```
+
+### Rate Limiting Issues
+**Problem:** Getting rate limited or blocked by documentation server
+
+**Solution:** Increase the `rate_limit` value in the config, e.g. from 0.5 to 1.0 seconds:
+```json
+{
+  "rate_limit": 1.0
+}
+```
+
+### Package Path Error
+**Problem:** doc_scraper.py shows wrong cli/package_skill.py path
+
+**Expected output:**
+```bash
+skill-seekers package output/godot/
+```
+
+**Not:**
+```bash
+python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/godot/
+```
+
+The correct command is the installed `skill-seekers package` entry point (backed by `src/skill_seekers/cli/package_skill.py` in this repository), not a script path outside the repo.
+
+## Key Code Locations (v2.0.0)
+
+**Documentation Scraper** (`src/skill_seekers/cli/doc_scraper.py`):
+- **URL validation**: `is_valid_url()`
+- **Content extraction**: `extract_content()`
+- **Language detection**: `detect_language()`
+- **Pattern extraction**: `extract_patterns()`
+- **Smart categorization**: `smart_categorize()`
+- **Category inference**: `infer_categories()`
+- **Quick reference generation**: `generate_quick_reference()`
+- **SKILL.md generation**: `create_enhanced_skill_md()`
+- **Scraping loop**: `scrape_all()`
+- **Main workflow**: `main()`
+
+**Other Key Files**:
+- **GitHub scraper**: `src/skill_seekers/cli/github_scraper.py`
+- **PDF scraper**: `src/skill_seekers/cli/pdf_scraper.py`
+- **Unified scraper**: `src/skill_seekers/cli/unified_scraper.py`
+- **Conflict detection**: `src/skill_seekers/cli/conflict_detector.py`
+- **Source merger**: `src/skill_seekers/cli/merge_sources.py`
+- **Package tool**: `src/skill_seekers/cli/package_skill.py`
+- **Upload tool**: `src/skill_seekers/cli/upload_skill.py`
+- **MCP server**: `src/skill_seekers/mcp/server.py`
+- **Entry points**: `pyproject.toml` (project.scripts section)
+
+## Enhancement Details
+
+### LOCAL Enhancement (Recommended)
+- Uses your Claude Code Max plan (no API costs)
+- Opens new terminal with Claude Code
+- Analyzes reference files automatically
+- Takes 30-60 seconds
+- Quality: 9/10 (comparable to API version)
+- Backs up original SKILL.md to SKILL.md.backup
+
+### API Enhancement (Alternative)
+- Uses Anthropic API (~$0.15-$0.30 per skill)
+- Requires ANTHROPIC_API_KEY
+- Same quality as LOCAL
+- Faster (no terminal launch)
+- Better for automation/CI
+
+**What Enhancement Does:**
+1. Reads reference documentation files
+2. Analyzes content with Claude
+3. Extracts 5-10 best code examples
+4. Creates comprehensive quick reference
+5. Adds domain-specific key concepts
+6. Provides navigation guidance for different skill levels
+7. Transforms 75-line templates into 500+ line comprehensive guides
+
+## Performance
+
+| Task | Time | Notes |
+|------|------|-------|
+| Scraping | 15-45 min | First time only |
+| Building | 1-3 min | Fast! |
+| Re-building | <1 min | With --skip-scrape |
+| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
+| Enhancement (API) | 20-40 sec | Requires API key |
+| Packaging | 5-10 sec | Final zip |
+
+## Available Configs (24 Total)
+
+### Single-Source Documentation Configs (14 configs)
+
+**Web Frameworks:**
+- ✅ `react.json` - React (article selector, 7,102 chars)
+- ✅ `vue.json` - Vue.js (main selector, 1,029 chars)
+- ✅ `astro.json` - Astro (article selector, 145 chars)
+- ✅ `django.json` - Django (article selector, 6,468 chars)
+- ✅ `laravel.json` - Laravel 9.x (#main-content selector, 16,131 chars)
+- ✅ `fastapi.json` - FastAPI (article selector, 11,906 chars)
+- ✅ `hono.json` - Hono web framework **NEW!**
+
+**DevOps & Automation:**
+- ✅ `ansible-core.json` - Ansible Core 2.19 (div[role='main'] selector, ~32K chars)
+- ✅ `kubernetes.json` - Kubernetes (main selector, 2,100 chars)
+
+**Game Engines:**
+- ✅ `godot.json` - Godot (div[role='main'] selector, 1,688 chars)
+- ✅ `godot-large-example.json` - Godot large docs example
+
+**CSS & Utilities:**
+- ✅ `tailwind.json` - Tailwind CSS (div.prose selector, 195 chars)
+
+**Gaming:**
+- ✅ `steam-economy-complete.json` - Steam Economy (div.documentation_bbcode, 588 chars)
+
+**Development Tools:**
+- ✅ `claude-code.json` - Claude Code documentation **NEW!**
+
+### Unified Multi-Source Configs (5 configs - **NEW v2.0!**)
+- ✅ `react_unified.json` - React (docs + GitHub + code analysis)
+- ✅ `django_unified.json` - Django (docs + GitHub + code analysis)
+- ✅ `fastapi_unified.json` - FastAPI (docs + GitHub + code analysis)
+- ✅ `fastapi_unified_test.json` - FastAPI test config
+- ✅ `godot_unified.json` - Godot (docs + GitHub + code analysis)
+
+### Test/Example Configs (5 configs)
+- 📝 `godot_github.json` - GitHub-only scraping example
+- 📝 `react_github.json` - GitHub-only scraping example
+- 📝 `python-tutorial-test.json` - Python tutorial test
+- 📝 `example_pdf.json` - PDF extraction example
+- 📝 `test-manual.json` - Manual testing config
+
+**Note:** All configs verified and working! Unified configs fully tested with 22 passing tests.
+**Last verified:** November 29, 2025 (Post-v2.1.0 bug fixes)
+
+## Additional Documentation
+
+**User Guides:**
+- **[README.md](README.md)** - Complete user documentation
+- **[BULLETPROOF_QUICKSTART.md](BULLETPROOF_QUICKSTART.md)** - Complete beginner guide
+- **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps
+- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** - Comprehensive troubleshooting
+
+**Technical Documentation:**
+- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Detailed technical architecture
+- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
+- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
+- **[docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)** - Multi-source scraping guide
+- **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP server setup
+
+**Project Planning:**
+- **[CHANGELOG.md](CHANGELOG.md)** - Release history and v2.0.0 details **UPDATED!**
+- **[FUTURE_RELEASES.md](FUTURE_RELEASES.md)** - Roadmap for v2.1.0+  **NEW!**
+- **[FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md)** - Complete task catalog (134 tasks)
+- **[NEXT_TASKS.md](NEXT_TASKS.md)** - What to work on next
+- **[TODO.md](TODO.md)** - Current focus
+- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure
+
+## Notes for Claude Code
+
+**Project Status (v2.0.0):**
+- ✅ **Published on PyPI**: Install with `pip install skill-seekers`
+- ✅ **Modern Python Packaging**: pyproject.toml, src/ layout, entry points
+- ✅ **Unified CLI**: Single `skill-seekers` command with Git-style subcommands
+- ✅ **CI/CD Working**: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12)
+- ✅ **Test Coverage**: 391 tests passing, 39% coverage
+- ✅ **Documentation**: Complete user and technical documentation
+
+**Architecture:**
+- **Python-based documentation scraper** with multi-source support
+- **Main scraper**: `src/skill_seekers/cli/doc_scraper.py` (~790 lines)
+- **Unified scraping**: Combines docs + GitHub + PDF with conflict detection
+- **Modern packaging**: PEP 621 compliant with proper dependency management
+- **MCP Integration**: 9 tools for Claude Code Max integration
+
+**Development Workflow:**
+1. **Install**: `pip install -e .` (editable mode for development)
+2. **Run tests**: `pytest tests/` (391 tests)
+3. **Build package**: `uv build` or `python -m build`
+4. **Publish**: `uv publish` (PyPI)
+
+**Key Points:**
+- Output is cached and reusable in `output/` (git-ignored)
+- Enhancement is optional but highly recommended
+- All 24 configs are working and tested
+- CI workflow requires `pip install -e .` to install package before running tests

+ 432 - 0
libs/external/Skill_Seekers-development/CONTRIBUTING.md

@@ -0,0 +1,432 @@
+# Contributing to Skill Seeker
+
+First off, thank you for considering contributing to Skill Seeker! It's people like you that make Skill Seeker such a great tool.
+
+## Table of Contents
+
+- [Branch Workflow](#branch-workflow)
+- [Code of Conduct](#code-of-conduct)
+- [How Can I Contribute?](#how-can-i-contribute)
+- [Development Setup](#development-setup)
+- [Pull Request Process](#pull-request-process)
+- [Coding Standards](#coding-standards)
+- [Testing](#testing)
+- [Documentation](#documentation)
+
+---
+
+## Branch Workflow
+
+**⚠️ IMPORTANT:** Skill Seekers uses a two-branch workflow.
+
+### Branch Structure
+
+```
+main (production)
+  ↑
+  │ (only maintainer merges)
+  │
+development (integration) ← default branch for PRs
+  ↑
+  │ (all contributor PRs go here)
+  │
+feature branches
+```
+
+### Branches
+
+- **`main`** - Production branch
+  - Always stable
+  - Only receives merges from `development` by maintainers
+  - Protected: requires tests + 1 review
+
+- **`development`** - Integration branch
+  - **Default branch for all PRs**
+  - Active development happens here
+  - Protected: requires tests to pass
+  - Gets merged to `main` by maintainers
+
+- **Feature branches** - Your work
+  - Created from `development`
+  - Named descriptively (e.g., `add-github-scraping`)
+  - Merged back to `development` via PR
+
+### Workflow Example
+
+```bash
+# 1. Fork and clone
+git clone https://github.com/YOUR_USERNAME/Skill_Seekers.git
+cd Skill_Seekers
+
+# 2. Add upstream
+git remote add upstream https://github.com/yusufkaraaslan/Skill_Seekers.git
+
+# 3. Create feature branch from development
+git checkout development
+git pull upstream development
+git checkout -b my-feature
+
+# 4. Make changes, commit, push
+git add .
+git commit -m "Add my feature"
+git push origin my-feature
+
+# 5. Create PR targeting 'development' branch
+```
+
+---
+
+## Code of Conduct
+
+This project and everyone participating in it is governed by our commitment to fostering an open and welcoming environment. Please be respectful and constructive in all interactions.
+
+---
+
+## How Can I Contribute?
+
+### Reporting Bugs
+
+Before creating bug reports, please check the [existing issues](https://github.com/yusufkaraaslan/Skill_Seekers/issues) to avoid duplicates.
+
+When creating a bug report, include:
+- **Clear title and description**
+- **Steps to reproduce** the issue
+- **Expected behavior** vs actual behavior
+- **Screenshots** if applicable
+- **Environment details** (OS, Python version, etc.)
+- **Error messages** and stack traces
+
+**Example:**
+```markdown
+**Bug:** MCP tool fails when config has no categories
+
+**Steps to Reproduce:**
+1. Create config with empty categories: `"categories": {}`
+2. Run `python3 cli/doc_scraper.py --config configs/test.json`
+3. See error
+
+**Expected:** Should use auto-inferred categories
+**Actual:** Crashes with KeyError
+
+**Environment:**
+- OS: Ubuntu 22.04
+- Python: 3.10.5
+- Version: 1.0.0
+```
+
+### Suggesting Enhancements
+
+Enhancement suggestions are tracked as [GitHub issues](https://github.com/yusufkaraaslan/Skill_Seekers/issues).
+
+Include:
+- **Clear title** describing the enhancement
+- **Detailed description** of the proposed functionality
+- **Use cases** that would benefit from this enhancement
+- **Examples** of how it would work
+- **Alternatives considered**
+
+### Adding New Framework Configs
+
+We welcome new framework configurations! To add one:
+
+1. Create a config file in `configs/`
+2. Test it thoroughly with different page counts
+3. Submit a PR with:
+   - The config file
+   - Brief description of the framework
+   - Test results (number of pages scraped, categories found)
+
+**Example PR:**
+```markdown
+**Add Svelte Documentation Config**
+
+Adds configuration for Svelte documentation (https://svelte.dev/docs).
+
+- Config: `configs/svelte.json`
+- Tested with max_pages: 100
+- Successfully categorized: getting_started, components, api, advanced
+- Total pages available: ~150
+```
+
+### Pull Requests
+
+We actively welcome your pull requests!
+
+**⚠️ IMPORTANT:** All PRs must target the `development` branch, not `main`.
+
+1. Fork the repo and create your branch from `development`
+2. If you've added code, add tests
+3. If you've changed APIs, update the documentation
+4. Ensure the test suite passes
+5. Make sure your code follows our coding standards
+6. Issue that pull request to `development` branch!
+
+---
+
+## Development Setup
+
+### Prerequisites
+
+- Python 3.10 or higher (required for MCP integration)
+- Git
+
+### Setup Steps
+
+1. **Fork and clone the repository**
+   ```bash
+   git clone https://github.com/YOUR_USERNAME/Skill_Seekers.git
+   cd Skill_Seekers
+   ```
+
+2. **Install dependencies**
+   ```bash
+   pip install requests beautifulsoup4
+   pip install pytest pytest-cov
+   pip install -r mcp/requirements.txt
+   ```
+
+3. **Create a feature branch from development**
+   ```bash
+   git checkout development
+   git pull upstream development
+   git checkout -b feature/my-awesome-feature
+   ```
+
+4. **Make your changes**
+   ```bash
+   # Edit files...
+   ```
+
+5. **Run tests**
+   ```bash
+   python -m pytest tests/ -v
+   ```
+
+6. **Commit your changes**
+   ```bash
+   git add .
+   git commit -m "Add awesome feature"
+   ```
+
+7. **Push to your fork**
+   ```bash
+   git push origin feature/my-awesome-feature
+   ```
+
+8. **Create a Pull Request**
+
+---
+
+## Pull Request Process
+
+### Before Submitting
+
+- [ ] Tests pass locally (`python -m pytest tests/ -v`)
+- [ ] Code follows PEP 8 style guidelines
+- [ ] Documentation is updated if needed
+- [ ] CHANGELOG.md is updated (if applicable)
+- [ ] Commit messages are clear and descriptive
+
+### PR Template
+
+```markdown
+## Description
+Brief description of what this PR does.
+
+## Type of Change
+- [ ] Bug fix (non-breaking change which fixes an issue)
+- [ ] New feature (non-breaking change which adds functionality)
+- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
+- [ ] Documentation update
+
+## How Has This Been Tested?
+Describe the tests you ran to verify your changes.
+
+## Checklist
+- [ ] My code follows the style guidelines of this project
+- [ ] I have performed a self-review of my own code
+- [ ] I have commented my code, particularly in hard-to-understand areas
+- [ ] I have made corresponding changes to the documentation
+- [ ] My changes generate no new warnings
+- [ ] I have added tests that prove my fix is effective or that my feature works
+- [ ] New and existing unit tests pass locally with my changes
+```
+
+### Review Process
+
+1. A maintainer will review your PR within 3-5 business days
+2. Address any feedback or requested changes
+3. Once approved, a maintainer will merge your PR
+4. Your contribution will be included in the next release!
+
+---
+
+## Coding Standards
+
+### Python Style Guide
+
+We follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) with some modifications:
+
+- **Line length:** 100 characters (not 79)
+- **Indentation:** 4 spaces
+- **Quotes:** Double quotes for strings
+- **Naming:**
+  - Functions/variables: `snake_case`
+  - Classes: `PascalCase`
+  - Constants: `UPPER_SNAKE_CASE`
+
+### Code Organization
+
+```python
+# 1. Standard library imports
+import os
+import sys
+from pathlib import Path
+
+# 2. Third-party imports
+import requests
+from bs4 import BeautifulSoup
+
+# 3. Local application imports
+from cli.utils import open_folder
+
+# 4. Constants
+MAX_PAGES = 1000
+DEFAULT_RATE_LIMIT = 0.5
+
+# 5. Functions and classes
+def my_function():
+    """Docstring describing what this function does."""
+    pass
+```
+
+### Documentation
+
+- All functions should have docstrings
+- Use type hints where appropriate
+- Add comments for complex logic
+
+```python
+def scrape_page(url: str, selectors: dict) -> dict:
+    """
+    Scrape a single page and extract content.
+
+    Args:
+        url: The URL to scrape
+        selectors: Dictionary of CSS selectors
+
+    Returns:
+        Dictionary containing extracted content
+
+    Raises:
+        RequestException: If page cannot be fetched
+    """
+    pass
+```
+
+---
+
+## Testing
+
+### Running Tests
+
+```bash
+# Run all tests
+python -m pytest tests/ -v
+
+# Run specific test file
+python -m pytest tests/test_mcp_server.py -v
+
+# Run with coverage
+python -m pytest tests/ --cov=cli --cov=mcp --cov-report=term
+```
+
+### Writing Tests
+
+- Tests go in the `tests/` directory
+- Test files should start with `test_`
+- Use descriptive test names
+
+```python
+def test_config_validation_with_missing_fields():
+    """Test that config validation fails when required fields are missing."""
+    config = {"name": "test"}  # Missing base_url
+    result = validate_config(config)
+    assert result is False
+```
+
+### Test Coverage
+
+- Aim for >80% code coverage
+- Critical paths should have 100% coverage
+- Add tests for bug fixes to prevent regressions
+
+---
+
+## Documentation
+
+### Where to Document
+
+- **README.md** - Overview, quick start, basic usage
+- **docs/** - Detailed guides and tutorials
+- **CHANGELOG.md** - All notable changes
+- **Code comments** - Complex logic and non-obvious decisions
+
+### Documentation Style
+
+- Use clear, simple language
+- Include code examples
+- Add screenshots for UI-related features
+- Keep it up to date with code changes
+
+---
+
+## Project Structure
+
+```
+Skill_Seekers/
+├── cli/                    # CLI tools
+│   ├── doc_scraper.py     # Main scraper
+│   ├── package_skill.py   # Packager
+│   ├── upload_skill.py    # Uploader
+│   └── utils.py           # Shared utilities
+├── mcp/                   # MCP server
+│   ├── server.py          # MCP implementation
+│   └── requirements.txt   # MCP dependencies
+├── configs/               # Framework configs
+├── docs/                  # Documentation
+├── tests/                 # Test suite
+└── .github/              # GitHub config
+    └── workflows/         # CI/CD workflows
+```
+
+---
+
+## Release Process
+
+Releases are managed by maintainers:
+
+1. Update version in relevant files
+2. Update CHANGELOG.md
+3. Create and push version tag
+4. GitHub Actions will create the release
+5. Announce on relevant channels
+
+---
+
+## Questions?
+
+- 💬 [Open a discussion](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)
+- 🐛 [Report a bug](https://github.com/yusufkaraaslan/Skill_Seekers/issues)
+- 📧 Contact: yusufkaraaslan.yk@pm.me
+
+---
+
+## Recognition
+
+Contributors will be recognized in:
+- README.md contributors section
+- CHANGELOG.md for each release
+- GitHub contributors page
+
+Thank you for contributing to Skill Seeker! 🎉

+ 393 - 0
libs/external/Skill_Seekers-development/FLEXIBLE_ROADMAP.md

@@ -0,0 +1,393 @@
+# Flexible Development Roadmap
+**Philosophy:** Small incremental tasks → Pick one → Complete → Move to next
+**No big milestones, just continuous progress!**
+
+---
+
+## 🎯 Current Status: v2.1.0 Released ✅
+
+**Latest Release:** v2.1.0 (November 29, 2025)
+
+**What Works:**
+- ✅ Documentation scraping (HTML websites)
+- ✅ GitHub repository scraping with unlimited local analysis
+- ✅ PDF extraction and conversion
+- ✅ Unified multi-source scraping (docs + GitHub + PDF)
+- ✅ 9 MCP tools fully functional
+- ✅ Auto-upload to Claude
+- ✅ 24 preset configs (including 5 unified configs)
+- ✅ Large docs support (40K+ pages)
+- ✅ Configurable directory exclusions
+- ✅ 427 tests passing
+
+---
+
+## 📋 Task Categories (Pick Any, Any Order)
+
+### 🌐 **Category A: Community & Sharing**
+Small tasks that build community features incrementally
+
+#### A1: Config Sharing (Website Feature)
+- [x] **Task A1.1:** Create simple JSON API endpoint to list configs ✅ **COMPLETE** (Issue #9)
+  - **Status:** Live at https://api.skillseekersweb.com
+  - **Features:** 6 REST endpoints, auto-categorization, auto-tags, filtering, SSL enabled
+  - **Branch:** `feature/a1-config-sharing`
+  - **Deployment:** Render with custom domain
+- [ ] **Task A1.2:** Add MCP tool `fetch_config` to download from website
+- [ ] **Task A1.3:** Create basic config upload form (HTML + backend)
+- [ ] **Task A1.4:** Add config rating/voting system
+- [ ] **Task A1.5:** Add config search/filter functionality
+- [ ] **Task A1.6:** Add user-submitted config review queue
+
+**Start Small:** ~~Pick A1.1 first (simple JSON endpoint)~~ ✅ A1.1 Complete! Pick A1.2 next (MCP tool)
+
+#### A2: Knowledge Sharing (Website Feature)
+- [ ] **Task A2.1:** Design knowledge database schema
+- [ ] **Task A2.2:** Create API endpoint to upload knowledge (.zip files)
+- [ ] **Task A2.3:** Add MCP tool `fetch_knowledge` to download from site
+- [ ] **Task A2.4:** Add knowledge preview/description
+- [ ] **Task A2.5:** Add knowledge categorization (by framework/topic)
+- [ ] **Task A2.6:** Add knowledge search functionality
+
+**Start Small:** Pick A2.1 first (schema design, no coding)
+
+#### A3: Simple Website Foundation
+- [ ] **Task A3.1:** Create single-page static site (GitHub Pages)
+- [ ] **Task A3.2:** Add config gallery view (display existing 12 configs)
+- [ ] **Task A3.3:** Add "Submit Config" link (opens GitHub issue for now)
+- [ ] **Task A3.4:** Add basic stats (total configs, downloads, etc.)
+- [ ] **Task A3.5:** Add simple blog using GitHub Issues
+- [ ] **Task A3.6:** Add RSS feed for updates
+
+**Start Small:** Pick A3.1 first (single HTML page on GitHub Pages)
+
+---
+
+### 🛠️ **Category B: New Input Formats**
+Add support for non-HTML documentation sources
+
+#### B1: PDF Documentation Support
+- [ ] **Task B1.1:** Research PDF parsing libraries (PyPDF2, pdfplumber, etc.)
+- [ ] **Task B1.2:** Create simple PDF text extractor (proof of concept)
+- [ ] **Task B1.3:** Add PDF page detection and chunking
+- [ ] **Task B1.4:** Extract code blocks from PDFs (syntax detection)
+- [ ] **Task B1.5:** Add PDF image extraction (diagrams, screenshots)
+- [ ] **Task B1.6:** Create `pdf_scraper.py` CLI tool
+- [ ] **Task B1.7:** Add MCP tool `scrape_pdf`
+- [ ] **Task B1.8:** Create PDF config format (similar to web configs)
+
+**Start Small:** Pick B1.1 first (just research, document findings)
+
+#### B2: Microsoft Word (.docx) Support
+- [ ] **Task B2.1:** Research .docx parsing (python-docx library)
+- [ ] **Task B2.2:** Create simple .docx text extractor
+- [ ] **Task B2.3:** Extract headings and create categories
+- [ ] **Task B2.4:** Extract code blocks from Word docs
+- [ ] **Task B2.5:** Extract tables and convert to markdown
+- [ ] **Task B2.6:** Create `docx_scraper.py` CLI tool
+- [ ] **Task B2.7:** Add MCP tool `scrape_docx`
+
+**Start Small:** Pick B2.1 first (research only)
+
+#### B3: Excel/Spreadsheet (.xlsx) Support
+- [ ] **Task B3.1:** Research Excel parsing (openpyxl, pandas)
+- [ ] **Task B3.2:** Create simple sheet → markdown converter
+- [ ] **Task B3.3:** Add table detection and formatting
+- [ ] **Task B3.4:** Extract API reference from spreadsheets (common pattern)
+- [ ] **Task B3.5:** Create `xlsx_scraper.py` CLI tool
+- [ ] **Task B3.6:** Add MCP tool `scrape_xlsx`
+
+**Start Small:** Pick B3.1 first (research only)
+
+#### B4: Markdown Files Support
+- [ ] **Task B4.1:** Create markdown file crawler (for local docs)
+- [ ] **Task B4.2:** Extract front matter (title, category, etc.)
+- [ ] **Task B4.3:** Build category tree from folder structure
+- [ ] **Task B4.4:** Add link resolution (internal references)
+- [ ] **Task B4.5:** Create `markdown_scraper.py` CLI tool
+- [ ] **Task B4.6:** Add MCP tool `scrape_markdown_dir`
+
+**Start Small:** Pick B4.1 first (simple file walker)
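+
+A minimal sketch of what B4.1 could start as (no front matter handling yet, just collecting files):
+
+```python
+from pathlib import Path
+
+def find_markdown_files(root: str) -> list[Path]:
+    """Collect all .md files under root, skipping hidden directories like .git."""
+    return [
+        path for path in Path(root).rglob("*.md")
+        if not any(part.startswith(".") for part in path.parts)
+    ]
+```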
+
+---
+
+### 💻 **Category C: Codebase Knowledge**
+Generate skills from actual code repositories
+
+#### C1: GitHub Repository Scraping
+- [ ] **Task C1.1:** Create GitHub API client (fetch repo structure)
+- [ ] **Task C1.2:** Extract README.md files
+- [ ] **Task C1.3:** Extract code comments and docstrings
+- [ ] **Task C1.4:** Detect programming language per file
+- [ ] **Task C1.5:** Extract function/class signatures
+- [ ] **Task C1.6:** Build usage examples from tests
+- [ ] **Task C1.7:** Extract GitHub Issues (open/closed, labels, milestones)
+- [ ] **Task C1.8:** Extract CHANGELOG.md and release notes
+- [ ] **Task C1.9:** Extract GitHub Releases with version history
+- [ ] **Task C1.10:** Create `github_scraper.py` CLI tool
+- [ ] **Task C1.11:** Add MCP tool `scrape_github`
+- [ ] **Task C1.12:** Add config format for GitHub repos
+
+**Start Small:** Pick C1.1 first (basic GitHub API connection)
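+
+A rough sketch of what C1.1 could look like, using the public GitHub REST API (unauthenticated, so subject to low rate limits; a starting point, not the project's actual implementation):
+
+```python
+import requests
+
+def fetch_repo_metadata(repo: str) -> dict:
+    """Fetch basic repository metadata (name, description, stars) from api.github.com."""
+    response = requests.get(f"https://api.github.com/repos/{repo}", timeout=30)
+    response.raise_for_status()
+    data = response.json()
+    return {
+        "name": data["full_name"],
+        "description": data["description"],
+        "stars": data["stargazers_count"],
+        "default_branch": data["default_branch"],
+    }
+
+if __name__ == "__main__":
+    print(fetch_repo_metadata("facebook/react"))
+```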
+
+#### C2: Local Codebase Scraping
+- [ ] **Task C2.1:** Create file tree walker (with .gitignore support)
+- [ ] **Task C2.2:** Extract docstrings (Python, JS, etc.)
+- [ ] **Task C2.3:** Extract function signatures and types
+- [ ] **Task C2.4:** Build API reference from code
+- [ ] **Task C2.5:** Extract inline comments as notes
+- [ ] **Task C2.6:** Create dependency graph
+- [ ] **Task C2.7:** Create `codebase_scraper.py` CLI tool
+- [ ] **Task C2.8:** Add MCP tool `scrape_codebase`
+
+**Start Small:** Pick C2.1 first (simple file walker)
+
+#### C3: Code Pattern Recognition
+- [ ] **Task C3.1:** Detect common patterns (singleton, factory, etc.)
+- [ ] **Task C3.2:** Extract usage examples from test files
+- [ ] **Task C3.3:** Build "how to" guides from code
+- [ ] **Task C3.4:** Extract configuration patterns
+- [ ] **Task C3.5:** Create architectural overview
+
+**Start Small:** Pick C3.1 first (pattern detection research)
+
+---
+
+### 🔌 **Category D: Context7 Integration**
+Explore integration with Context7 for enhanced context management
+
+#### D1: Context7 Research & Planning
+- [ ] **Task D1.1:** Research Context7 API and capabilities
+- [ ] **Task D1.2:** Document potential use cases for Skill Seeker
+- [ ] **Task D1.3:** Create integration design proposal
+- [ ] **Task D1.4:** Identify which features benefit most
+
+**Start Small:** Pick D1.1 first (pure research, no code)
+
+#### D2: Context7 Basic Integration
+- [ ] **Task D2.1:** Create Context7 API client
+- [ ] **Task D2.2:** Test basic context storage/retrieval
+- [ ] **Task D2.3:** Store scraped documentation in Context7
+- [ ] **Task D2.4:** Query Context7 during skill building
+- [ ] **Task D2.5:** Add MCP tool `sync_to_context7`
+
+**Start Small:** Pick D2.1 first (basic API connection)
+
+---
+
+### 🚀 **Category E: MCP Enhancements**
+Small improvements to existing MCP tools
+
+#### E1: New MCP Tools
+- [ ] **Task E1.1:** Add `fetch_config` MCP tool (download from website)
+- [ ] **Task E1.2:** Add `fetch_knowledge` MCP tool (download skills)
+- [x] **Task E1.3:** Add `scrape_pdf` MCP tool (✅ COMPLETED v1.0.0)
+- [ ] **Task E1.4:** Add `scrape_docx` MCP tool
+- [ ] **Task E1.5:** Add `scrape_xlsx` MCP tool
+- [ ] **Task E1.6:** Add `scrape_github` MCP tool (see C1.11)
+- [ ] **Task E1.7:** Add `scrape_codebase` MCP tool (see C2.8)
+- [ ] **Task E1.8:** Add `scrape_markdown_dir` MCP tool (see B4.6)
+- [ ] **Task E1.9:** Add `sync_to_context7` MCP tool (see D2.5)
+
+**Start Small:** Pick E1.1 first (once A1.2 is done)
+
+#### E2: MCP Quality Improvements
+- [ ] **Task E2.1:** Add error handling to all tools
+- [ ] **Task E2.2:** Add structured logging
+- [ ] **Task E2.3:** Add progress indicators for long operations
+- [ ] **Task E2.4:** Add validation for all inputs
+- [ ] **Task E2.5:** Add helpful error messages
+- [ ] **Task E2.6:** Add retry logic for network failures
+
+**Start Small:** Pick E2.1 first (one tool at a time)
+
+---
+
+### ⚡ **Category F: Performance & Reliability**
+Technical improvements to existing features
+
+#### F1: Core Scraper Improvements
+- [ ] **Task F1.1:** Add URL normalization (remove query params)
+- [ ] **Task F1.2:** Add duplicate page detection
+- [ ] **Task F1.3:** Add memory-efficient streaming for large docs
+- [ ] **Task F1.4:** Add HTML parser fallback (lxml → html5lib)
+- [ ] **Task F1.5:** Add network retry with exponential backoff
+- [ ] **Task F1.6:** Fix package path output bug
+
+**Start Small:** Pick F1.1 first (URL normalization only)
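+
+For F1.1, a minimal standard-library sketch (a starting point, not the project's actual implementation):
+
+```python
+from urllib.parse import urlsplit, urlunsplit
+
+def normalize_url(url: str) -> str:
+    """Drop the query string and fragment so the same page isn't scraped twice."""
+    scheme, netloc, path, _query, _fragment = urlsplit(url)
+    return urlunsplit((scheme, netloc, path.rstrip("/") or "/", "", ""))
+
+# normalize_url("https://docs.example.com/guide/?ref=nav#intro")
+# -> "https://docs.example.com/guide"
+```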
+
+#### F2: Incremental Updates
+- [ ] **Task F2.1:** Track page modification times (Last-Modified header)
+- [ ] **Task F2.2:** Store page checksums/hashes
+- [ ] **Task F2.3:** Compare on re-run, skip unchanged pages
+- [ ] **Task F2.4:** Update only changed content
+- [ ] **Task F2.5:** Preserve local annotations/edits
+
+**Start Small:** Pick F2.1 first (just tracking, no logic)
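+
+One possible shape for F2.1/F2.2 (tracking only, no skip logic; the state-file layout is an assumption):
+
+```python
+import hashlib
+import json
+from pathlib import Path
+
+def record_page_state(state_file: Path, url: str, html: str, last_modified: str | None) -> None:
+    """Store a content checksum and Last-Modified value per URL for later comparison."""
+    state = json.loads(state_file.read_text()) if state_file.exists() else {}
+    state[url] = {
+        "checksum": hashlib.sha256(html.encode("utf-8")).hexdigest(),
+        "last_modified": last_modified,
+    }
+    state_file.write_text(json.dumps(state, indent=2))
+```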
+
+---
+
+### 🎨 **Category G: Tools & Utilities**
+Small standalone tools that add value
+
+#### G1: Config Tools
+- [ ] **Task G1.1:** Create `validate_config.py` (enhanced validation)
+- [ ] **Task G1.2:** Create `test_selectors.py` (interactive selector tester)
+- [ ] **Task G1.3:** Create `auto_detect_selectors.py` (AI-powered)
+- [ ] **Task G1.4:** Create `compare_configs.py` (diff two configs)
+- [ ] **Task G1.5:** Create `optimize_config.py` (suggest improvements)
+
+**Start Small:** Pick G1.1 first (simple validation script)
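+
+A minimal sketch of G1.1 (the required field list is an assumption based on the documented config structure):
+
+```python
+import json
+import sys
+
+# Assumed minimum field set; adjust to match the real config schema
+REQUIRED_FIELDS = ["name", "description", "base_url", "selectors"]
+
+def check_config(path: str) -> list[str]:
+    """Return a list of problems; an empty list means the config looks usable."""
+    with open(path) as f:
+        config = json.load(f)
+    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in config]
+    if "max_pages" in config and not isinstance(config["max_pages"], int):
+        problems.append("max_pages must be an integer")
+    return problems
+
+if __name__ == "__main__":
+    issues = check_config(sys.argv[1])
+    print("✅ Config looks valid" if not issues else "\n".join(f"❌ {p}" for p in issues))
+```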
+
+#### G2: Skill Quality Tools
+- [ ] **Task G2.1:** Create `analyze_skill.py` (quality metrics)
+- [ ] **Task G2.2:** Add code example counter
+- [ ] **Task G2.3:** Add readability scoring
+- [ ] **Task G2.4:** Add completeness checker
+- [ ] **Task G2.5:** Create quality report generator
+
+**Start Small:** Pick G2.1 first (basic metrics)
+
+---
+
+### 📚 **Category H: Community Response**
+Respond to existing GitHub issues
+
+#### H1: Address Open Issues
+- [ ] **Task H1.1:** Respond to Issue #8: Prereqs to Getting Started
+- [ ] **Task H1.2:** Investigate Issue #7: Laravel scraping issue
+- [ ] **Task H1.3:** Create example project (Issue #4)
+- [ ] **Task H1.4:** Answer Issue #3: Pro plan compatibility
+- [ ] **Task H1.5:** Create self-documenting skill (Issue #1)
+
+**Start Small:** Pick H1.1 first (just respond, don't solve)
+
+---
+
+### 🎓 **Category I: Content & Documentation**
+Educational content and guides
+
+#### I1: Video Tutorials
+- [ ] **Task I1.1:** Write script for "Quick Start" video
+- [ ] **Task I1.2:** Record "Quick Start" (5 min)
+- [ ] **Task I1.3:** Write script for "MCP Setup" video
+- [ ] **Task I1.4:** Record "MCP Setup" (8 min)
+- [ ] **Task I1.5:** Write script for "Custom Config" video
+- [ ] **Task I1.6:** Record "Custom Config" (10 min)
+
+**Start Small:** Pick I1.1 first (just write script, no recording)
+
+#### I2: Written Guides
+- [ ] **Task I2.1:** Write troubleshooting guide
+- [ ] **Task I2.2:** Write best practices guide
+- [ ] **Task I2.3:** Write performance optimization guide
+- [ ] **Task I2.4:** Write community config contribution guide
+- [ ] **Task I2.5:** Write codebase scraping guide
+
+**Start Small:** Pick I2.1 first (common issues + solutions)
+
+---
+
+### 🧪 **Category J: Testing & Quality**
+Improve test coverage and quality
+
+#### J1: Test Expansion
+- [ ] **Task J1.1:** Install MCP package: `pip install mcp`
+- [ ] **Task J1.2:** Verify all 14 tests pass
+- [ ] **Task J1.3:** Add tests for new MCP tools (as they're created)
+- [ ] **Task J1.4:** Add integration tests for PDF scraper
+- [ ] **Task J1.5:** Add integration tests for GitHub scraper
+- [ ] **Task J1.6:** Add end-to-end workflow tests
+
+**Start Small:** Pick J1.1 first (just install package)
+
+---
+
+## 🎯 Recommended Starting Tasks (Pick 3-5)
+
+### Quick Wins (1-2 hours each):
+1. **H1.1** - Respond to Issue #8 (community engagement)
+2. **J1.1** - Install MCP package (fix tests)
+3. **A3.1** - Create simple GitHub Pages site (single HTML)
+4. **B1.1** - Research PDF parsing (no coding, just notes)
+5. **F1.1** - Add URL normalization (small code fix)
+
+### Medium Tasks (3-5 hours each):
+6. ~~**A1.1** - Create JSON API for configs (simple endpoint)~~ ✅ **COMPLETE**
+7. **G1.1** - Create config validator script
+8. **C1.1** - GitHub API client (basic connection)
+9. **I1.1** - Write Quick Start video script
+10. **E2.1** - Add error handling to one MCP tool
+
+### Bigger Tasks (5-10 hours each):
+11. **B1.2-B1.6** - Complete PDF scraper
+12. **C1.7-C1.9** - Complete GitHub scraper
+13. **A2.1-A2.3** - Knowledge sharing foundation
+14. **I1.2** - Record and publish Quick Start video
+
+---
+
+## 📊 Progress Tracking
+
+**Completed Tasks:** 1 (A1.1 ✅)
+**In Progress:** 0
+**Total Available Tasks:** 134
+
+### Current Sprint: Choose Your Own Adventure!
+**Pick 1-3 tasks** from any category that interest you most.
+
+**No pressure, no deadlines, just progress!** ✨
+
+---
+
+## 🎨 Flexibility Rules
+
+1. **Pick any task, any order** - No dependencies (mostly)
+2. **Start small** - Research tasks before implementation
+3. **One task at a time** - Focus, complete, move on
+4. **Switch anytime** - Not enjoying it? Pick another!
+5. **Document as you go** - Each task should update docs
+6. **Test incrementally** - Each task should have a quick test
+7. **Ship early** - Don't wait for "complete" features
+
+---
+
+## 🚀 How to Use This Roadmap
+
+### Step 1: Pick a Task
+- Read through categories
+- Pick something that sounds interesting
+- Check estimated time
+- Choose 1-3 tasks for this week
+
+### Step 2: Create Issue (Optional)
+- Create GitHub issue for tracking
+- Add labels (category, priority)
+- Add to project board
+
+### Step 3: Work on It
+- Complete the task
+- Test it
+- Document it
+- Mark as done ✅
+
+### Step 4: Ship It
+- Commit changes
+- Update changelog
+- Tag version (if significant)
+- Announce on GitHub
+
+### Step 5: Repeat
+- Pick next task
+- Keep moving forward!
+
+---
+
+**Philosophy:**
+**Small steps → Consistent progress → Compound results**
+
+**No rigid milestones. No big releases. Just continuous improvement!** 🎯
+
+---
+
+**Last Updated:** October 20, 2025

+ 292 - 0
libs/external/Skill_Seekers-development/FUTURE_RELEASES.md

@@ -0,0 +1,292 @@
+# Future Releases Roadmap
+
+This document outlines planned features, improvements, and the vision for upcoming releases of Skill Seekers.
+
+## Release Philosophy
+
+We follow semantic versioning (MAJOR.MINOR.PATCH) and maintain backward compatibility wherever possible. Each release focuses on delivering value to users while maintaining code quality and test coverage.
+
+---
+
+## ✅ Release: v2.1.0 (Released: November 29, 2025)
+
+**Focus:** Test Coverage & Quality Improvements
+
+### Completed Features
+
+#### Testing & Quality
+- [x] **Fix 12 unified scraping tests** ✅ - Complete test coverage for unified multi-source scraping
+  - ConfigValidator expecting dict instead of file path
+  - ConflictDetector expecting dict pages, not list
+  - Full integration test suite for unified workflow
+
+### Planned Features (Future v2.2.0)
+
+#### Testing & Quality
+
+- [ ] **Improve test coverage to 60%+** (currently 39%)
+  - Write tests for 0% coverage files:
+    - `generate_router.py` (110 lines) - Router skill generator
+    - `split_config.py` (165 lines) - Config splitter
+    - `unified_scraper.py` (208 lines) - Unified scraping CLI
+    - `package_multi.py` (37 lines) - Multi-package tool
+  - Improve coverage for low-coverage files:
+    - `mcp/server.py` (9% → 60%)
+    - `enhance_skill.py` (11% → 60%)
+    - `code_analyzer.py` (19% → 60%)
+
+- [ ] **Fix MCP test skipping issue** - 29 MCP tests pass individually but skip in full suite
+  - Resolve pytest isolation issue
+  - Ensure all tests run in CI/CD
+
+#### Features
+- [ ] **Task H1.3: Create example project folder**
+  - Real-world example projects using Skill Seekers
+  - Step-by-step tutorials
+  - Before/after comparisons
+
+- [ ] **Task J1.1: Install MCP package for testing**
+  - Better MCP integration testing
+  - Automated MCP server tests in CI
+
+- [ ] **Enhanced error handling**
+  - Better error messages for common issues
+  - Graceful degradation for missing dependencies
+  - Recovery from partial failures
+
+### Documentation
+- [ ] Video tutorials for common workflows
+- [ ] Troubleshooting guide expansion
+- [ ] Performance optimization guide
+
+---
+
+## Release: v2.2.0 (Estimated: Q1 2026)
+
+**Focus:** Web Presence & Community Growth
+
+### Planned Features
+
+#### Community & Documentation
+- [ ] **Task A3.1: GitHub Pages website** (skillseekersweb.com)
+  - Interactive documentation
+  - Live demos and examples
+  - Getting started wizard
+  - Community showcase
+
+- [ ] **Plugin system foundation**
+  - Allow custom scrapers via plugins
+  - Plugin discovery and installation
+  - Plugin documentation generator
+
+#### Enhancements
+- [ ] **Support for additional documentation formats**
+  - Sphinx documentation
+  - Docusaurus sites
+  - GitBook
+  - Read the Docs
+  - MkDocs Material
+
+- [ ] **Improved caching strategies**
+  - Intelligent cache invalidation
+  - Differential scraping (only changed pages)
+  - Cache compression
+  - Cross-session cache sharing
+
+#### Performance
+- [ ] **Scraping performance improvements**
+  - Connection pooling optimizations
+  - Smart rate limiting based on server response
+  - Adaptive concurrency
+  - Memory usage optimization for large docs
+
+---
+
+## Release: v2.3.0 (Estimated: Q2 2026)
+
+**Focus:** Developer Experience & Integrations
+
+### Planned Features
+
+#### Developer Tools
+- [ ] **Web UI for config generation**
+  - Visual config builder
+  - Real-time preview
+  - Template library
+  - Export/import configs
+
+- [ ] **CI/CD integration examples**
+  - GitHub Actions workflows
+  - GitLab CI
+  - Jenkins pipelines
+  - Automated skill updates on doc changes
+
+- [ ] **Docker containerization**
+  - Official Docker images
+  - docker-compose examples
+  - Kubernetes deployment guides
+
+#### API & Integrations
+- [ ] **GraphQL API support**
+  - Scrape GraphQL documentation
+  - Extract schema and queries
+  - Generate interactive examples
+
+- [ ] **REST API documentation formats**
+  - OpenAPI/Swagger
+  - Postman collections
+  - API Blueprint
+
+---
+
+## Long-term Vision (v3.0+)
+
+### Major Features Under Consideration
+
+#### Advanced Scraping
+- [ ] **Real-time documentation monitoring**
+  - Watch for documentation changes
+  - Automatic skill updates
+  - Change notifications
+  - Version diff reports
+
+- [ ] **Multi-language documentation**
+  - Automatic language detection
+  - Combined multi-language skills
+  - Translation quality checking
+
+#### Collaboration
+- [ ] **Collaborative skill curation**
+  - Shared skill repositories
+  - Community ratings and reviews
+  - Collaborative editing
+  - Fork and merge workflows
+
+- [ ] **Skill marketplace**
+  - Discover community-created skills
+  - Share your skills
+  - Quality ratings
+  - Usage statistics
+
+#### AI & Intelligence
+- [ ] **Enhanced AI analysis**
+  - Better conflict detection algorithms
+  - Automatic documentation quality scoring
+  - Suggested improvements
+  - Code example validation
+
+- [ ] **Semantic understanding**
+  - Natural language queries for skill content
+  - Intelligent categorization
+  - Auto-generated summaries
+  - Concept relationship mapping
+
+---
+
+## Backlog Ideas
+
+### Features Requested by Community
+- [ ] Support for video tutorial transcription
+- [ ] Integration with Notion, Confluence, and other wikis
+- [ ] Jupyter notebook scraping and conversion
+- [ ] Live documentation preview during scraping
+- [ ] Skill versioning and update management
+- [ ] A/B testing for skill quality
+- [ ] Analytics dashboard (scraping stats, error rates, etc.)
+
+### Technical Improvements
+- [ ] Migration to modern async framework (httpx everywhere)
+- [ ] Improved type safety (full mypy strict mode)
+- [ ] Better logging and debugging tools
+- [ ] Performance profiling dashboard
+- [ ] Memory optimization for very large docs (100K+ pages)
+
+### Ecosystem
+- [ ] VS Code extension
+- [ ] IntelliJ/PyCharm plugin
+- [ ] Command-line interactive mode (TUI)
+- [ ] Skill diff tool (compare versions)
+- [ ] Skill merge tool (combine multiple skills)
+
+---
+
+## How to Influence the Roadmap
+
+### Priority System
+
+Features are prioritized based on:
+1. **User impact** - How many users will benefit?
+2. **Technical feasibility** - How complex is the implementation?
+3. **Community interest** - How many upvotes/requests?
+4. **Strategic alignment** - Does it fit our vision?
+
+### Ways to Contribute
+
+#### 1. Vote on Features
+- ⭐ Star feature request issues
+- 💬 Comment with your use case
+- 🔼 Upvote discussions
+
+#### 2. Contribute Code
+See our [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md) for:
+- **134 tasks** across 22 feature groups
+- Tasks categorized by difficulty and area
+- Clear acceptance criteria
+- Estimated effort levels
+
+Pick any task and submit a PR! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+
+#### 3. Share Feedback
+- Open issues for bugs or feature requests
+- Share your success stories
+- Suggest improvements to existing features
+- Report performance issues
+
+#### 4. Help with Documentation
+- Write tutorials
+- Improve existing docs
+- Translate documentation
+- Create video guides
+
+---
+
+## Release Schedule
+
+We aim for predictable releases:
+
+- **Patch releases (2.0.x)**: As needed for critical bugs
+- **Minor releases (2.x.0)**: Every 2-3 months
+- **Major releases (x.0.0)**: Annually, with breaking changes announced 3 months in advance
+
+### Current Schedule
+
+| Version | Focus | ETA | Status |
+|---------|-------|-----|--------|
+| v2.0.0 | PyPI Publication | 2025-11-11 | ✅ Released |
+| v2.1.0 | Test Coverage & Quality | 2025-11-29 | ✅ Released |
+| v2.2.0 | Web Presence | Q1 2026 | 📋 Planned |
+| v2.3.0 | Developer Experience | Q2 2026 | 📋 Planned |
+| v3.0.0 | Major Evolution | 2026 | 💡 Conceptual |
+
+---
+
+## Stay Updated
+
+- 📋 **Project Board**: https://github.com/users/yusufkaraaslan/projects/2
+- 📚 **Full Roadmap**: [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md)
+- 📝 **Changelog**: [CHANGELOG.md](CHANGELOG.md)
+- 💬 **Discussions**: https://github.com/yusufkaraaslan/Skill_Seekers/discussions
+- 🐛 **Issues**: https://github.com/yusufkaraaslan/Skill_Seekers/issues
+
+---
+
+## Questions?
+
+Have questions about the roadmap or want to suggest a feature?
+
+1. Check if it's already in our [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md)
+2. Search [existing discussions](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)
+3. Open a new discussion or issue
+4. Reach out in our community channels
+
+**Together, we're building the future of documentation-to-AI skill conversion!** 🚀

+ 21 - 0
libs/external/Skill_Seekers-development/LICENSE

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 [Your Name/Username]
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

+ 196 - 0
libs/external/Skill_Seekers-development/QUICKSTART.md

@@ -0,0 +1,196 @@
+# Quick Start Guide
+
+## 🚀 4 Steps to Create a Skill
+
+### Step 1: Install Dependencies
+
+```bash
+pip3 install requests beautifulsoup4
+```
+
+> **Note:** Skill_Seekers automatically checks for llms.txt files first, which is 10x faster when available.
+
+### Step 2: Run the Tool
+
+**Option A: Use a Preset (Easiest)**
+```bash
+skill-seekers scrape --config configs/godot.json
+```
+
+**Option B: Interactive Mode**
+```bash
+skill-seekers scrape --interactive
+```
+
+**Option C: Quick Command**
+```bash
+skill-seekers scrape --name react --url https://react.dev/
+```
+
+**Option D: Unified Multi-Source (NEW - v2.0.0)**
+```bash
+# Combine documentation + GitHub code in one skill
+skill-seekers unified --config configs/react_unified.json
+```
+*Detects conflicts between docs and code automatically!*
+
+### Step 3: Enhance SKILL.md (Recommended)
+
+```bash
+# LOCAL enhancement (no API key, uses Claude Code Max)
+skill-seekers enhance output/godot/
+```
+
+**This takes 60 seconds and dramatically improves the SKILL.md quality!**
+
+### Step 4: Package the Skill
+
+```bash
+skill-seekers package output/godot/
+```
+
+**Done!** You now have `godot.zip` ready to use.
+
+---
+
+## 📋 Available Presets
+
+```bash
+# Godot Engine
+skill-seekers scrape --config configs/godot.json
+
+# React
+skill-seekers scrape --config configs/react.json
+
+# Vue.js
+skill-seekers scrape --config configs/vue.json
+
+# Django
+skill-seekers scrape --config configs/django.json
+
+# FastAPI
+skill-seekers scrape --config configs/fastapi.json
+
+# Unified Multi-Source (NEW!)
+skill-seekers unified --config configs/react_unified.json
+skill-seekers unified --config configs/django_unified.json
+skill-seekers unified --config configs/fastapi_unified.json
+skill-seekers unified --config configs/godot_unified.json
+```
+
+---
+
+## ⚡ Using Existing Data (Fast!)
+
+If you already scraped once:
+
+```bash
+skill-seekers scrape --config configs/godot.json
+
+# When prompted:
+✓ Found existing data: 245 pages
+Use existing data? (y/n): y
+
+# Builds in seconds!
+```
+
+Or use `--skip-scrape`:
+```bash
+skill-seekers scrape --config configs/godot.json --skip-scrape
+```
+
+---
+
+## 🎯 Complete Example (Recommended Workflow)
+
+```bash
+# 1. Install (once)
+pip3 install requests beautifulsoup4
+
+# 2. Scrape React docs with LOCAL enhancement
+skill-seekers scrape --config configs/react.json --enhance-local
+# Wait 15-30 minutes (scraping) + 60 seconds (enhancement)
+
+# 3. Package
+skill-seekers package output/react/
+
+# 4. Use react.zip in Claude!
+```
+
+**Alternative: Enhancement after scraping**
+```bash
+# 2a. Scrape only (no enhancement)
+skill-seekers scrape --config configs/react.json
+
+# 2b. Enhance later
+skill-seekers enhance output/react/
+
+# 3. Package
+skill-seekers package output/react/
+```
+
+---
+
+## 💡 Pro Tips
+
+### Test with a Small Page Limit First
+Edit the config file to test with just 20 pages:
+```json
+{
+  "max_pages": 20
+}
+```
+
+### Rebuild Instantly
+```bash
+# After first scrape, you can rebuild instantly:
+skill-seekers scrape --config configs/react.json --skip-scrape
+```
+
+### Create Custom Config
+```bash
+# Copy a preset
+cp configs/react.json configs/myframework.json
+
+# Edit it
+nano configs/myframework.json
+
+# Use it
+skill-seekers scrape --config configs/myframework.json
+```
+
+---
+
+## 📁 What You Get
+
+```
+output/
+├── godot_data/          # Raw scraped data (reusable!)
+└── godot/               # The skill
+    ├── SKILL.md        # With real code examples!
+    └── references/     # Organized docs
+```
+
+---
+
+## ❓ Need Help?
+
+See **README.md** for:
+- Complete documentation
+- Config file structure
+- Troubleshooting
+- Advanced usage
+
+---
+
+## 🎮 Let's Go!
+
+```bash
+# Godot
+skill-seekers scrape --config configs/godot.json
+
+# Or interactive
+skill-seekers scrape --interactive
+```
+
+That's it! 🚀

+ 1099 - 0
libs/external/Skill_Seekers-development/README.md

@@ -0,0 +1,1099 @@
+[![MseeP.ai Security Assessment Badge](https://mseep.net/pr/yusufkaraaslan-skill-seekers-badge.png)](https://mseep.ai/app/yusufkaraaslan-skill-seekers)
+
+# Skill Seeker
+
+[![Version](https://img.shields.io/badge/version-2.1.1-blue.svg)](https://github.com/yusufkaraaslan/Skill_Seekers/releases/tag/v2.1.1)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![MCP Integration](https://img.shields.io/badge/MCP-Integrated-blue.svg)](https://modelcontextprotocol.io)
+[![Tested](https://img.shields.io/badge/Tests-427%20Passing-brightgreen.svg)](tests/)
+[![Project Board](https://img.shields.io/badge/Project-Board-purple.svg)](https://github.com/users/yusufkaraaslan/projects/2)
+[![PyPI version](https://badge.fury.io/py/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
+
+**Automatically convert documentation websites, GitHub repositories, and PDFs into Claude AI skills in minutes.**
+
+> 📋 **[View Development Roadmap & Tasks](https://github.com/users/yusufkaraaslan/projects/2)** - 134 tasks across 10 categories, pick any to contribute!
+
+## What is Skill Seeker?
+
+Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready [Claude AI skills](https://www.anthropic.com/news/skills). Instead of manually reading and summarizing documentation, Skill Seeker:
+
+1. **Scrapes** multiple sources (docs, GitHub repos, PDFs) automatically
+2. **Analyzes** code repositories with deep AST parsing
+3. **Detects** conflicts between documentation and code implementation
+4. **Organizes** content into categorized reference files
+5. **Enhances** with AI to extract best examples and key concepts
+6. **Packages** everything into an uploadable `.zip` file for Claude
+
+**Result:** Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.
+
+## Why Use This?
+
+- 🎯 **For Developers**: Create skills from documentation + GitHub repos with conflict detection
+- 🎮 **For Game Devs**: Generate skills for game engines (Godot docs + GitHub, Unity, etc.)
+- 🔧 **For Teams**: Combine internal docs + code repositories into single source of truth
+- 📚 **For Learners**: Build comprehensive skills from docs, code examples, and PDFs
+- 🔍 **For Open Source**: Analyze repos to find documentation gaps and outdated examples
+
+## Key Features
+
+### 🌐 Documentation Scraping
+- ✅ **llms.txt Support** - Automatically detects and uses LLM-ready documentation files (10x faster)
+- ✅ **Universal Scraper** - Works with ANY documentation website
+- ✅ **Smart Categorization** - Automatically organizes content by topic
+- ✅ **Code Language Detection** - Recognizes Python, JavaScript, C++, GDScript, etc.
+- ✅ **8 Ready-to-Use Presets** - Godot, React, Vue, Django, FastAPI, and more
+
+### 📄 PDF Support (**v1.2.0**)
+- ✅ **Basic PDF Extraction** - Extract text, code, and images from PDF files
+- ✅ **OCR for Scanned PDFs** - Extract text from scanned documents
+- ✅ **Password-Protected PDFs** - Handle encrypted PDFs
+- ✅ **Table Extraction** - Extract complex tables from PDFs
+- ✅ **Parallel Processing** - 3x faster for large PDFs
+- ✅ **Intelligent Caching** - 50% faster on re-runs
+
+### 🐙 GitHub Repository Scraping (**v2.0.0**)
+- ✅ **Deep Code Analysis** - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
+- ✅ **API Extraction** - Functions, classes, methods with parameters and types
+- ✅ **Repository Metadata** - README, file tree, language breakdown, stars/forks
+- ✅ **GitHub Issues & PRs** - Fetch open/closed issues with labels and milestones
+- ✅ **CHANGELOG & Releases** - Automatically extract version history
+- ✅ **Conflict Detection** - Compare documented APIs vs actual code implementation
+- ✅ **MCP Integration** - Natural language: "Scrape GitHub repo facebook/react"
+
+### 🔄 Unified Multi-Source Scraping (**NEW - v2.0.0**)
+- ✅ **Combine Multiple Sources** - Mix documentation + GitHub + PDF in one skill
+- ✅ **Conflict Detection** - Automatically finds discrepancies between docs and code
+- ✅ **Intelligent Merging** - Rule-based or AI-powered conflict resolution
+- ✅ **Transparent Reporting** - Side-by-side comparison with ⚠️ warnings
+- ✅ **Documentation Gap Analysis** - Identifies outdated docs and undocumented features
+- ✅ **Single Source of Truth** - One skill showing both intent (docs) and reality (code)
+- ✅ **Backward Compatible** - Legacy single-source configs still work
+
+### 🤖 AI & Enhancement
+- ✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
+- ✅ **No API Costs** - FREE local enhancement using Claude Code Max
+- ✅ **MCP Server for Claude Code** - Use directly from Claude Code with natural language
+
+### ⚡ Performance & Scale
+- ✅ **Async Mode** - 2-3x faster scraping with async/await (use `--async` flag)
+- ✅ **Large Documentation Support** - Handle 10K-40K+ page docs with intelligent splitting
+- ✅ **Router/Hub Skills** - Intelligent routing to specialized sub-skills
+- ✅ **Parallel Scraping** - Process multiple skills simultaneously
+- ✅ **Checkpoint/Resume** - Never lose progress on long scrapes
+- ✅ **Caching System** - Scrape once, rebuild instantly
+
+### ✅ Quality Assurance
+- ✅ **Fully Tested** - 391 tests with comprehensive coverage
+
+---
+
+## 📦 Now Available on PyPI!
+
+**Skill Seekers is now published on the Python Package Index!** Install with a single command:
+
+```bash
+pip install skill-seekers
+```
+
+Get started in seconds. No cloning, no setup - just install and run. See installation options below.
+
+---
+
+## Quick Start
+
+### Option 1: Install from PyPI (Recommended)
+
+```bash
+# Install from PyPI (easiest method!)
+pip install skill-seekers
+
+# Use the unified CLI
+skill-seekers scrape --config configs/react.json
+skill-seekers github --repo facebook/react
+skill-seekers enhance output/react/
+skill-seekers package output/react/
+```
+
+**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free
+
+📖 **New to Skill Seekers?** Check out our [Quick Start Guide](QUICKSTART.md) or [Bulletproof Guide](BULLETPROOF_QUICKSTART.md)
+
+### Option 2: Install via uv (Modern Python Tool)
+
+```bash
+# Install with uv (fast, modern alternative)
+uv tool install skill-seekers
+
+# Or run directly without installing
+uv tool run --from skill-seekers skill-seekers scrape --config https://raw.githubusercontent.com/yusufkaraaslan/Skill_Seekers/main/configs/react.json
+
+# Unified CLI - simple commands
+skill-seekers scrape --config configs/react.json
+skill-seekers github --repo facebook/react
+skill-seekers package output/react/
+```
+
+**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free
+
+### Option 3: Development Install (From Source)
+
+```bash
+# Clone and install in editable mode
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+pip install -e .
+
+# Use the unified CLI
+skill-seekers scrape --config configs/react.json
+```
+
+### Option 4: Use from Claude Code (MCP Integration)
+
+```bash
+# One-time setup (5 minutes)
+./setup_mcp.sh
+
+# Then in Claude Code, just ask:
+"Generate a React skill from https://react.dev/"
+"Scrape PDF at docs/manual.pdf and create skill"
+```
+
+**Time:** Automated | **Quality:** Production-ready | **Cost:** Free
+
+### Option 5: Legacy CLI (Backwards Compatible)
+
+```bash
+# Install dependencies
+pip3 install requests beautifulsoup4
+
+# Run scripts directly (old method)
+python3 src/skill_seekers/cli/doc_scraper.py --config configs/react.json
+
+# Upload output/react.zip to Claude - Done!
+```
+
+**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free
+
+## Usage Examples
+
+### Documentation Scraping
+
+```bash
+# Scrape documentation website
+skill-seekers scrape --config configs/react.json
+
+# Quick scrape without config
+skill-seekers scrape --url https://react.dev --name react
+
+# With async mode (3x faster)
+skill-seekers scrape --config configs/godot.json --async --workers 8
+```
+
+### PDF Extraction
+
+```bash
+# Basic PDF extraction
+skill-seekers pdf --pdf docs/manual.pdf --name myskill
+
+# Advanced features: extract tables and process pages in parallel on 8 CPU cores
+skill-seekers pdf --pdf docs/manual.pdf --name myskill \
+    --extract-tables \
+    --parallel \
+    --workers 8
+
+# Scanned PDFs (requires: pip install pytesseract Pillow)
+skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
+
+# Password-protected PDFs
+skill-seekers pdf --pdf docs/encrypted.pdf --name myskill --password mypassword
+```
+
+**Time:** ~5-15 minutes (or 2-5 minutes with parallel) | **Quality:** Production-ready | **Cost:** Free
+
+### GitHub Repository Scraping
+
+```bash
+# Basic repository scraping
+skill-seekers github --repo facebook/react
+
+# Using a config file
+skill-seekers github --config configs/react_github.json
+
+# With authentication (higher rate limits)
+export GITHUB_TOKEN=ghp_your_token_here
+skill-seekers github --repo facebook/react
+
+# Customize what to include: GitHub Issues (capped at 100), CHANGELOG.md, and GitHub Releases
+skill-seekers github --repo django/django \
+    --include-issues \
+    --max-issues 100 \
+    --include-changelog \
+    --include-releases
+```
+
+**Time:** ~5-10 minutes | **Quality:** Production-ready | **Cost:** Free
+
+### Unified Multi-Source Scraping (**NEW - v2.0.0**)
+
+**The Problem:** Documentation and code often drift apart. Docs might be outdated, missing features that exist in code, or documenting features that were removed.
+
+**The Solution:** Combine documentation + GitHub + PDF into one unified skill that shows BOTH what's documented AND what actually exists, with clear warnings about discrepancies.
+
+```bash
+# Use existing unified configs
+skill-seekers unified --config configs/react_unified.json
+skill-seekers unified --config configs/django_unified.json
+
+# Or create unified config (mix documentation + GitHub)
+cat > configs/myframework_unified.json << 'EOF'
+{
+  "name": "myframework",
+  "description": "Complete framework knowledge from docs + code",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://docs.myframework.com/",
+      "extract_api": true,
+      "max_pages": 200
+    },
+    {
+      "type": "github",
+      "repo": "owner/myframework",
+      "include_code": true,
+      "code_analysis_depth": "surface"
+    }
+  ]
+}
+EOF
+
+# Run unified scraper
+skill-seekers unified --config configs/myframework_unified.json
+
+# Package and upload
+skill-seekers package output/myframework/
+# Upload output/myframework.zip to Claude - Done!
+```
+
+**Time:** ~30-45 minutes | **Quality:** Production-ready with conflict detection | **Cost:** Free
+
+**What Makes It Special:**
+
+✅ **Conflict Detection** - Automatically finds 4 types of discrepancies:
+- 🔴 **Missing in code** (high): Documented but not implemented
+- 🟡 **Missing in docs** (medium): Implemented but not documented
+- ⚠️ **Signature mismatch**: Different parameters/types
+- ℹ️ **Description mismatch**: Different explanations
+
+✅ **Transparent Reporting** - Shows both versions side-by-side:
+````markdown
+#### `move_local_x(delta: float)`
+
+⚠️ **Conflict**: Documentation signature differs from implementation
+
+**Documentation says:**
+```
+def move_local_x(delta: float)
+```
+
+**Code implementation:**
+```python
+def move_local_x(delta: float, snap: bool = False) -> None
+```
+````
+
+✅ **Advantages:**
+- **Identifies documentation gaps** - Find outdated or missing docs automatically
+- **Catches code changes** - Know when APIs change without docs being updated
+- **Single source of truth** - One skill showing intent (docs) AND reality (code)
+- **Actionable insights** - Get suggestions for fixing each conflict
+- **Development aid** - See what's actually in the codebase vs what's documented
+
+**Example Unified Configs:**
+- `configs/react_unified.json` - React docs + GitHub repo
+- `configs/django_unified.json` - Django docs + GitHub repo
+- `configs/fastapi_unified.json` - FastAPI docs + GitHub repo
+
+**Full Guide:** See [docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md) for complete documentation.
+
+## How It Works
+
+```mermaid
+graph LR
+    A[Documentation Website] --> B[Skill Seeker]
+    B --> C[Scraper]
+    B --> D[AI Enhancement]
+    B --> E[Packager]
+    C --> F[Organized References]
+    D --> F
+    F --> E
+    E --> G[Claude Skill .zip]
+    G --> H[Upload to Claude AI]
+```
+
+0. **Detect llms.txt** - Checks for llms-full.txt, llms.txt, llms-small.txt first
+1. **Scrape**: Extracts all pages from documentation
+2. **Categorize**: Organizes content into topics (API, guides, tutorials, etc.)
+3. **Enhance**: AI analyzes docs and creates comprehensive SKILL.md with examples
+4. **Package**: Bundles everything into a Claude-ready `.zip` file
+
+## 📋 Prerequisites
+
+**Before you start, make sure you have:**
+
+1. **Python 3.10 or higher** - [Download](https://www.python.org/downloads/) | Check: `python3 --version`
+2. **Git** - [Download](https://git-scm.com/) | Check: `git --version`
+3. **15-30 minutes** for first-time setup
+
+**First time user?** → **[Start Here: Bulletproof Quick Start Guide](BULLETPROOF_QUICKSTART.md)** 🎯
+
+This guide walks you through EVERYTHING step-by-step (Python install, git clone, first skill creation).
+
+---
+
+## 🚀 Quick Start
+
+### Method 1: MCP Server for Claude Code (Easiest)
+
+Use Skill Seeker directly from Claude Code with natural language!
+
+```bash
+# Clone repository
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+
+# One-time setup (5 minutes)
+./setup_mcp.sh
+
+# Restart Claude Code, then just ask:
+```
+
+**In Claude Code:**
+```
+List all available configs
+Generate config for Tailwind at https://tailwindcss.com/docs
+Scrape docs using configs/react.json
+Package skill at output/react/
+```
+
+**Benefits:**
+- ✅ No manual CLI commands
+- ✅ Natural language interface
+- ✅ Integrated with your workflow
+- ✅ 9 tools available instantly (includes automatic upload!)
+- ✅ **Tested and working** in production
+
+**Full guides:**
+- 📘 [MCP Setup Guide](docs/MCP_SETUP.md) - Complete installation instructions
+- 🧪 [MCP Testing Guide](docs/TEST_MCP_IN_CLAUDE_CODE.md) - Test all 9 tools
+- 📦 [Large Documentation Guide](docs/LARGE_DOCUMENTATION.md) - Handle 10K-40K+ pages
+- 📤 [Upload Guide](docs/UPLOAD_GUIDE.md) - How to upload skills to Claude
+
+### Method 2: CLI (Traditional)
+
+#### One-Time Setup: Create Virtual Environment
+
+```bash
+# Clone repository
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+
+# Create virtual environment
+python3 -m venv venv
+
+# Activate virtual environment
+source venv/bin/activate  # macOS/Linux
+# OR on Windows: venv\Scripts\activate
+
+# Install dependencies
+pip install requests beautifulsoup4 pytest
+
+# Save dependencies
+pip freeze > requirements.txt
+
+# Optional: Install anthropic for API-based enhancement (not needed for LOCAL enhancement)
+# pip install anthropic
+```
+
+**Always activate the virtual environment before using Skill Seeker:**
+```bash
+source venv/bin/activate  # Run this each time you start a new terminal session
+```
+
+#### Easiest: Use a Preset
+
+```bash
+# Make sure venv is activated (you should see (venv) in your prompt)
+source venv/bin/activate
+
+# Optional: Estimate pages first (fast, 1-2 minutes)
+skill-seekers estimate configs/godot.json
+
+# Use Godot preset
+skill-seekers scrape --config configs/godot.json
+
+# Use React preset
+skill-seekers scrape --config configs/react.json
+
+# See all presets
+ls configs/
+```
+
+### Interactive Mode
+
+```bash
+skill-seekers scrape --interactive
+```
+
+### Quick Mode
+
+```bash
+skill-seekers scrape \
+  --name react \
+  --url https://react.dev/ \
+  --description "React framework for UIs"
+```
+
+## 📤 Uploading Skills to Claude
+
+Once your skill is packaged, you need to upload it to Claude:
+
+### Option 1: Automatic Upload (API-based)
+
+```bash
+# Set your API key (one-time)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Package and upload automatically
+skill-seekers package output/react/ --upload
+
+# OR upload existing .zip
+skill-seekers upload output/react.zip
+```
+
+**Benefits:**
+- ✅ Fully automatic
+- ✅ No manual steps
+- ✅ Works from command line
+
+**Requirements:**
+- Anthropic API key (get from https://console.anthropic.com/)
+
+### Option 2: Manual Upload (No API Key)
+
+```bash
+# Package skill
+skill-seekers package output/react/
+
+# This will:
+# 1. Create output/react.zip
+# 2. Open the output/ folder automatically
+# 3. Show upload instructions
+
+# Then manually upload:
+# - Go to https://claude.ai/skills
+# - Click "Upload Skill"
+# - Select output/react.zip
+# - Done!
+```
+
+**Benefits:**
+- ✅ No API key needed
+- ✅ Works for everyone
+- ✅ Folder opens automatically
+
+### Option 3: Claude Code (MCP) - Smart & Automatic
+
+```
+In Claude Code, just ask:
+"Package and upload the React skill"
+
+# With API key set:
+# - Packages the skill
+# - Uploads to Claude automatically
+# - Done! ✅
+
+# Without API key:
+# - Packages the skill
+# - Shows where to find the .zip
+# - Provides manual upload instructions
+```
+
+**Benefits:**
+- ✅ Natural language
+- ✅ Smart auto-detection (uploads if API key available)
+- ✅ Works with or without API key
+- ✅ No errors or failures
+
+---
+
+## 📁 Simple Structure
+
+```
+doc-to-skill/
+├── cli/
+│   ├── doc_scraper.py      # Main scraping tool
+│   ├── package_skill.py    # Package to .zip
+│   ├── upload_skill.py     # Auto-upload (API)
+│   └── enhance_skill.py    # AI enhancement
+├── mcp/                    # MCP server for Claude Code
+│   └── server.py           # 9 MCP tools
+├── configs/                # Preset configurations
+│   ├── godot.json         # Godot Engine
+│   ├── react.json         # React
+│   ├── vue.json           # Vue.js
+│   ├── django.json        # Django
+│   └── fastapi.json       # FastAPI
+└── output/                 # All output (auto-created)
+    ├── godot_data/        # Scraped data
+    ├── godot/             # Built skill
+    └── godot.zip          # Packaged skill
+```
+
+## ✨ Features
+
+### 1. Fast Page Estimation (NEW!)
+
+```bash
+skill-seekers estimate configs/react.json
+
+# Output:
+📊 ESTIMATION RESULTS
+✅ Pages Discovered: 180
+📈 Estimated Total: 230
+⏱️  Time Elapsed: 1.2 minutes
+💡 Recommended max_pages: 280
+```
+
+**Benefits:**
+- Know page count BEFORE scraping (saves time)
+- Validates URL patterns work correctly
+- Estimates total scraping time
+- Recommends optimal `max_pages` setting
+- Fast (1-2 minutes vs 20-40 minutes full scrape)
+
+### 2. Auto-Detect Existing Data
+
+```bash
+skill-seekers scrape --config configs/godot.json
+
+# If data exists:
+✓ Found existing data: 245 pages
+Use existing data? (y/n): y
+⏭️  Skipping scrape, using existing data
+```
+
+### 3. Knowledge Generation
+
+**Automatic pattern extraction:**
+- Extracts common code patterns from docs
+- Detects programming language
+- Creates quick reference with real examples
+- Smarter categorization with scoring
+
+**Enhanced SKILL.md:**
+- Real code examples from documentation
+- Language-annotated code blocks
+- Common patterns section
+- Quick reference from actual usage examples
+
+### 4. Smart Categorization
+
+Automatically infers categories from:
+- URL structure
+- Page titles
+- Content keywords
+- With scoring for better accuracy
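+
+The scoring works roughly in this spirit (a simplified sketch; the real keywords come from your config's `categories` section, these are only illustrative):
+
+```python
+CATEGORY_KEYWORDS = {
+    "getting_started": ["intro", "install", "quickstart"],
+    "api": ["api", "reference", "class"],
+    "tutorials": ["tutorial", "guide", "example"],
+}
+
+def categorize(url: str, title: str) -> str:
+    """Score each category by keyword hits in the URL and title; highest score wins."""
+    text = f"{url} {title}".lower()
+    scores = {cat: sum(text.count(kw) for kw in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
+    best = max(scores, key=scores.get)
+    return best if scores[best] > 0 else "general"
+```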
+
+### 5. Code Language Detection
+
+```python
+# Automatically detects:
+- Python (def, import, from)
+- JavaScript (const, let, =>)
+- GDScript (func, var, extends)
+- C++ (#include, int main)
+- And more...
+```
+
+### 6. Skip Scraping
+
+```bash
+# Scrape once
+skill-seekers scrape --config configs/react.json
+
+# Later, just rebuild (instant)
+skill-seekers scrape --config configs/react.json --skip-scrape
+```
+
+### 7. Async Mode for Faster Scraping (2-3x Speed!)
+
+```bash
+# Enable async mode with 8 workers (recommended for large docs)
+skill-seekers scrape --config configs/react.json --async --workers 8
+
+# Small docs (~100-500 pages)
+skill-seekers scrape --config configs/mydocs.json --async --workers 4
+
+# Large docs (2000+ pages) with no rate limiting
+skill-seekers scrape --config configs/largedocs.json --async --workers 8 --no-rate-limit
+```
+
+**Performance Comparison:**
+- **Sync mode (threads):** ~18 pages/sec, 120 MB memory
+- **Async mode:** ~55 pages/sec, 40 MB memory
+- **Result:** 3x faster, 66% less memory!
+
+**When to use:**
+- ✅ Large documentation (500+ pages)
+- ✅ Network latency is high
+- ✅ Memory is constrained
+- ❌ Small docs (< 100 pages) - overhead not worth it
+
+**See full guide:** [ASYNC_SUPPORT.md](ASYNC_SUPPORT.md)
+
+### 8. AI-Powered SKILL.md Enhancement
+
+```bash
+# Option 1: During scraping (API-based, requires API key)
+pip3 install anthropic
+export ANTHROPIC_API_KEY=sk-ant-...
+skill-seekers scrape --config configs/react.json --enhance
+
+# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
+skill-seekers scrape --config configs/react.json --enhance-local
+
+# Option 3: After scraping (API-based, standalone)
+skill-seekers enhance output/react/
+
+# Option 4: After scraping (LOCAL, no API key, standalone)
+skill-seekers enhance output/react/
+```
+
+**What it does:**
+- Reads your reference documentation
+- Uses Claude to generate an excellent SKILL.md
+- Extracts best code examples (5-10 practical examples)
+- Creates comprehensive quick reference
+- Adds domain-specific key concepts
+- Provides navigation guidance for different skill levels
+- Automatically backs up original
+- **Quality:** Transforms 75-line templates into 500+ line comprehensive guides
+
+**LOCAL Enhancement (Recommended):**
+- Uses your Claude Code Max plan (no API costs)
+- Opens new terminal with Claude Code
+- Analyzes reference files automatically
+- Takes 30-60 seconds
+- Quality: 9/10 (comparable to API version)
+
+### 9. Large Documentation Support (10K-40K+ Pages)
+
+**For massive documentation sites like Godot (40K pages), AWS, or Microsoft Docs:**
+
+```bash
+# 1. Estimate first (discover page count)
+skill-seekers estimate configs/godot.json
+
+# 2. Auto-split into focused sub-skills
+python3 -m skill_seekers.cli.split_config configs/godot.json --strategy router
+
+# Creates:
+# - godot-scripting.json (5K pages)
+# - godot-2d.json (8K pages)
+# - godot-3d.json (10K pages)
+# - godot-physics.json (6K pages)
+# - godot-shaders.json (11K pages)
+
+# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
+for config in configs/godot-*.json; do
+  skill-seekers scrape --config $config &
+done
+wait
+
+# 4. Generate intelligent router/hub skill
+python3 -m skill_seekers.cli.generate_router configs/godot-*.json
+
+# 5. Package all skills
+python3 -m skill_seekers.cli.package_multi output/godot*/
+
+# 6. Upload all .zip files to Claude
+# Users just ask questions naturally!
+# Router automatically directs to the right sub-skill!
+```
+
+**Split Strategies:**
+- **auto** - Intelligently detects best strategy based on page count
+- **category** - Split by documentation categories (scripting, 2d, 3d, etc.)
+- **router** - Create hub skill + specialized sub-skills (RECOMMENDED)
+- **size** - Split every N pages (for docs without clear categories)
+
+**Benefits:**
+- ✅ Faster scraping (parallel execution)
+- ✅ More focused skills (better Claude performance)
+- ✅ Easier maintenance (update one topic at a time)
+- ✅ Natural user experience (router handles routing)
+- ✅ Avoids context window limits
+
+**Configuration:**
+```json
+{
+  "name": "godot",
+  "max_pages": 40000,
+  "split_strategy": "router",
+  "split_config": {
+    "target_pages_per_skill": 5000,
+    "create_router": true,
+    "split_by_categories": ["scripting", "2d", "3d", "physics"]
+  }
+}
+```
+
+**Full Guide:** [Large Documentation Guide](docs/LARGE_DOCUMENTATION.md)
+
+### 10. Checkpoint/Resume for Long Scrapes
+
+**Never lose progress on long-running scrapes:**
+
+```bash
+# Enable in config (interval = save a checkpoint every 1000 pages)
+{
+  "checkpoint": {
+    "enabled": true,
+    "interval": 1000
+  }
+}
+
+# If scrape is interrupted (Ctrl+C or crash)
+skill-seekers scrape --config configs/godot.json --resume
+
+# Resume from last checkpoint
+✅ Resuming from checkpoint (12,450 pages scraped)
+⏭️  Skipping 12,450 already-scraped pages
+🔄 Continuing from where we left off...
+
+# Start fresh (clear checkpoint)
+skill-seekers scrape --config configs/godot.json --fresh
+```
+
+**Benefits:**
+- ✅ Auto-saves every 1000 pages (configurable)
+- ✅ Saves on interruption (Ctrl+C)
+- ✅ Resume with `--resume` flag
+- ✅ Never lose hours of scraping progress
+
+## 🎯 Complete Workflows
+
+### First Time (With Scraping + Enhancement)
+
+```bash
+# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
+skill-seekers scrape --config configs/godot.json --enhance-local
+
+# 2. Wait for new terminal to close (enhancement completes)
+# Check the enhanced SKILL.md:
+cat output/godot/SKILL.md
+
+# 3. Package
+skill-seekers package output/godot/
+
+# 4. Done! You have godot.zip with excellent SKILL.md
+```
+
+**Time:** 20-40 minutes (scraping) + 60 seconds (enhancement) = ~21-41 minutes
+
+### Using Existing Data (Fast!)
+
+```bash
+# 1. Use cached data + Local Enhancement
+skill-seekers scrape --config configs/godot.json --skip-scrape
+skill-seekers enhance output/godot/
+
+# 2. Package
+skill-seekers package output/godot/
+
+# 3. Done!
+```
+
+**Time:** 1-3 minutes (build) + 60 seconds (enhancement) = ~2-4 minutes total
+
+### Without Enhancement (Basic)
+
+```bash
+# 1. Scrape + Build (no enhancement)
+skill-seekers scrape --config configs/godot.json
+
+# 2. Package
+skill-seekers package output/godot/
+
+# 3. Done! (SKILL.md will be basic template)
+```
+
+**Time:** 20-40 minutes
+**Note:** SKILL.md will be generic - enhancement strongly recommended!
+
+## 📋 Available Presets
+
+| Config | Framework | Description |
+|--------|-----------|-------------|
+| `godot.json` | Godot Engine | Game development |
+| `react.json` | React | UI framework |
+| `vue.json` | Vue.js | Progressive framework |
+| `django.json` | Django | Python web framework |
+| `fastapi.json` | FastAPI | Modern Python API |
+| `ansible-core.json` | Ansible Core 2.19 | Automation & configuration |
+
+### Using Presets
+
+```bash
+# Godot
+skill-seekers scrape --config configs/godot.json
+
+# React
+skill-seekers scrape --config configs/react.json
+
+# Vue
+skill-seekers scrape --config configs/vue.json
+
+# Django
+skill-seekers scrape --config configs/django.json
+
+# FastAPI
+skill-seekers scrape --config configs/fastapi.json
+
+# Ansible
+skill-seekers scrape --config configs/ansible-core.json
+```
+
+## 🎨 Creating Your Own Config
+
+### Option 1: Interactive
+
+```bash
+skill-seekers scrape --interactive
+# Follow prompts, it will create the config for you
+```
+
+### Option 2: Copy and Edit
+
+```bash
+# Copy a preset
+cp configs/react.json configs/myframework.json
+
+# Edit it
+nano configs/myframework.json
+
+# Use it
+skill-seekers scrape --config configs/myframework.json
+```
+
+### Config Structure
+
+```json
+{
+  "name": "myframework",
+  "description": "When to use this skill",
+  "base_url": "https://docs.myframework.com/",
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/docs", "/guide"],
+    "exclude": ["/blog", "/about"]
+  },
+  "categories": {
+    "getting_started": ["intro", "quickstart"],
+    "api": ["api", "reference"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 500
+}
+```
+
+## 📊 What Gets Created
+
+```
+output/
+├── godot_data/              # Scraped raw data
+│   ├── pages/              # JSON files (one per page)
+│   └── summary.json        # Overview
+│
+└── godot/                   # The skill
+    ├── SKILL.md            # Enhanced with real examples
+    ├── references/         # Categorized docs
+    │   ├── index.md
+    │   ├── getting_started.md
+    │   ├── scripting.md
+    │   └── ...
+    ├── scripts/            # Empty (add your own)
+    └── assets/             # Empty (add your own)
+```
+
+## 🎯 Command Line Options
+
+```bash
+# Interactive mode
+skill-seekers scrape --interactive
+
+# Use config file
+skill-seekers scrape --config configs/godot.json
+
+# Quick mode
+skill-seekers scrape --name react --url https://react.dev/
+
+# Skip scraping (use existing data)
+skill-seekers scrape --config configs/godot.json --skip-scrape
+
+# With description
+skill-seekers scrape \
+  --name react \
+  --url https://react.dev/ \
+  --description "React framework for building UIs"
+```
+
+## 💡 Tips
+
+### 1. Test Small First
+
+Edit `max_pages` in config to test:
+```json
+{
+  "max_pages": 20  // Test with just 20 pages
+}
+```
+
+### 2. Reuse Scraped Data
+
+```bash
+# Scrape once
+skill-seekers scrape --config configs/react.json
+
+# Rebuild multiple times (instant)
+skill-seekers scrape --config configs/react.json --skip-scrape
+skill-seekers scrape --config configs/react.json --skip-scrape
+```
+
+### 3. Finding Selectors
+
+```python
+# Test in Python
+from bs4 import BeautifulSoup
+import requests
+
+url = "https://docs.example.com/page"
+soup = BeautifulSoup(requests.get(url).content, 'html.parser')
+
+# Try different selectors
+print(soup.select_one('article'))
+print(soup.select_one('main'))
+print(soup.select_one('div[role="main"]'))
+```
+
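+Continuing with the `soup` object from the snippet above, one rough way to choose between candidates is to compare how much text each selector yields (a quick heuristic, not a built-in tool):
+
+```python
+# Rank candidate selectors by how much text they extract
+for selector in ("article", "main", "div[role='main']", "div.content"):
+    node = soup.select_one(selector)
+    chars = len(node.get_text(strip=True)) if node else 0
+    print(f"{selector:20} -> {chars} characters")
+```
+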
+### 4. Check Output Quality
+
+```bash
+# After building, check:
+cat output/godot/SKILL.md  # Should have real examples
+cat output/godot/references/index.md  # Categories
+```
+
+## 🐛 Troubleshooting
+
+### No Content Extracted?
+- Check your `main_content` selector
+- Try: `article`, `main`, `div[role="main"]`
+
+### Old Data Exists But You Want a Fresh Scrape?
+```bash
+# Force re-scrape
+rm -rf output/myframework_data/
+skill-seekers scrape --config configs/myframework.json
+```
+
+### Categories Look Wrong?
+Edit the `categories` section of the config with more specific keywords.
+
+### Want to Update Docs?
+```bash
+# Delete old data
+rm -rf output/godot_data/
+
+# Re-scrape
+skill-seekers scrape --config configs/godot.json
+```
+
+## 📈 Performance
+
+| Task | Time | Notes |
+|------|------|-------|
+| Scraping (sync) | 15-45 min | First time only, thread-based |
+| Scraping (async) | 5-15 min | 2-3x faster with --async flag |
+| Building | 1-3 min | Fast! |
+| Re-building | <1 min | With --skip-scrape |
+| Packaging | 5-10 sec | Final zip |
+
+## ✅ Summary
+
+**One tool does everything:**
+1. ✅ Scrapes documentation
+2. ✅ Auto-detects existing data
+3. ✅ Extracts code examples and common patterns
+4. ✅ Creates enhanced skills
+5. ✅ Works with presets or custom configs
+6. ✅ Supports skip-scraping for fast iteration
+
+**Simple structure:**
+- `doc_scraper.py` - The tool
+- `configs/` - Presets
+- `output/` - Scraped data and generated skills
+
+**Better output:**
+- Real code examples with language detection
+- Common patterns extracted from docs
+- Smart categorization
+- Enhanced SKILL.md with actual examples
+
+## 📚 Documentation
+
+### Getting Started
+- **[BULLETPROOF_QUICKSTART.md](BULLETPROOF_QUICKSTART.md)** - 🎯 **START HERE** if you're new!
+- **[QUICKSTART.md](QUICKSTART.md)** - Quick start for experienced users
+- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** - Common issues and solutions
+
+### Guides
+- **[docs/LARGE_DOCUMENTATION.md](docs/LARGE_DOCUMENTATION.md)** - Handle 10K-40K+ page docs
+- **[ASYNC_SUPPORT.md](ASYNC_SUPPORT.md)** - Async mode guide (2-3x faster scraping)
+- **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide
+- **[docs/TERMINAL_SELECTION.md](docs/TERMINAL_SELECTION.md)** - Configure terminal app for local enhancement
+- **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude
+- **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP integration setup
+
+### Technical
+- **[docs/CLAUDE.md](docs/CLAUDE.md)** - Technical architecture
+- **[STRUCTURE.md](STRUCTURE.md)** - Repository structure
+
+## 🎮 Ready?
+
+```bash
+# Try Godot
+skill-seekers scrape --config configs/godot.json
+
+# Try React
+skill-seekers scrape --config configs/react.json
+
+# Or go interactive
+skill-seekers scrape --interactive
+```
+
+## 📝 License
+
+MIT License - see [LICENSE](LICENSE) file for details
+
+---
+
+Happy skill building! 🚀

+ 266 - 0
libs/external/Skill_Seekers-development/ROADMAP.md

@@ -0,0 +1,266 @@
+# Skill Seeker Development Roadmap
+
+## Vision
+Transform Skill Seeker into the easiest way to create Claude AI skills from **any knowledge source** - documentation websites, PDFs, codebases, GitHub repos, Office docs, and more - with both CLI and MCP interfaces.
+
+## 🎯 New Approach: Flexible, Incremental Development
+
+**Philosophy:** Small tasks → Pick one → Complete → Move on
+
+Instead of rigid milestones, we now use a **flexible task-based approach**:
+- 100+ small, independent tasks across 10 categories
+- Pick any task, any order
+- Start small, ship often
+- No deadlines, just continuous progress
+
+**See:** [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md) for the complete task list!
+
+---
+
+## 🎯 Milestones
+
+### ✅ v1.0 - Production Release (COMPLETED - Oct 19, 2025)
+**Released:** October 19, 2025 | **Tag:** v1.0.0
+
+#### Core Features ✅
+- [x] Documentation scraping with BFS
+- [x] Smart categorization
+- [x] Language detection
+- [x] Pattern extraction
+- [x] 12 preset configurations (Godot, React, Vue, Django, FastAPI, Tailwind, Kubernetes, Astro, etc.)
+- [x] Comprehensive test suite (14 tests, 100% pass rate)
+
+#### MCP Integration ✅
+- [x] Monorepo refactor (cli/ and mcp/)
+- [x] MCP server with 9 tools (fully functional)
+- [x] All MCP tools tested and working
+- [x] Complete MCP documentation
+- [x] Setup automation (setup_mcp.sh)
+
+#### Large Documentation Support ✅
+- [x] Config splitting for 40K+ page docs
+- [x] Router/hub skill generation
+- [x] Checkpoint/resume functionality
+- [x] Parallel scraping support
+
+#### Auto-Upload Feature ✅
+- [x] Smart API key detection
+- [x] Automatic upload to Claude
+- [x] Cross-platform folder opening
+- [x] Graceful fallback to manual upload
+
+**Statistics:**
+- 9 MCP tools (fully working)
+- 12 preset configurations
+- 14/14 tests passing (100%)
+- ~3,800 lines of code
+- Complete documentation suite
+
+---
+
+## 📋 Task Categories (Flexible Development)
+
+See [FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md) for detailed task breakdown.
+
+### Category Summary:
+- **🌐 Community & Sharing** - Config/knowledge sharing website features
+- **🛠️ New Input Formats** - PDF, Word, Excel, Markdown support
+- **💻 Codebase Knowledge** - GitHub repos, local code scraping
+- **🔌 Context7 Integration** - Enhanced context management
+- **🚀 MCP Enhancements** - New tools and quality improvements
+- **⚡ Performance & Reliability** - Core improvements
+- **🎨 Tools & Utilities** - Standalone helper tools
+- **📚 Community Response** - Address GitHub issues
+- **🎓 Content & Documentation** - Videos and guides
+- **🧪 Testing & Quality** - Test coverage expansion
+
+---
+
+### ~~📋 v1.1 - Website Launch (PLANNED)~~ → Now flexible tasks!
+**Goal:** Create professional website and community presence
+**Timeline:** November 2025 (Due: Nov 3, 2025)
+
+**Features:**
+- Professional landing page (skillseekersweb.com)
+- Documentation migration to website
+- Preset showcase gallery (interactive)
+- Blog with release notes and tutorials
+- SEO optimization
+- Analytics integration
+
+**Community:**
+- Video tutorial series
+- Contributing guidelines
+- Issue templates and workflows
+- GitHub Project board
+- Community engagement
+
+---
+
+### 📋 v1.2 - Core Improvements (PLANNED)
+**Goal:** Address technical debt and performance
+**Timeline:** Late November 2025
+
+**Technical Enhancements:**
+- URL normalization/deduplication
+- Memory optimization for large docs
+- HTML parser fallback (lxml)
+- Selector validation tool
+- Incremental update system
+
+**MCP Enhancements:**
+- Interactive config wizard via MCP
+- Real-time progress updates
+- Auto-detect documentation patterns
+- Enhanced error handling and logging
+- Batch operations
+
+---
+
+### 📋 v2.0 - Intelligence Layer (PLANNED)
+**Goal:** Smart defaults and auto-configuration
+**Timeline:** December 2025
+
+**Features:**
+- **Auto-detection:**
+  - Automatically find best selectors
+  - Detect documentation framework (Docusaurus, GitBook, etc.)
+  - Suggest optimal rate_limit and max_pages
+
+- **Quality Metrics:**
+  - Analyze generated SKILL.md quality
+  - Suggest improvements
+  - Validate code examples
+
+- **Templates:**
+  - Pre-built configs for popular frameworks
+  - Community config sharing
+  - One-click generation for common docs
+
+**Example:**
+```
+User: "Create skill from https://tailwindcss.com/docs"
+Tool: Auto-detects Tailwind, uses template, generates in 30 seconds
+```
+
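+One purely illustrative sketch of what the framework auto-detection piece could look like — nothing like this exists in the codebase yet, and the marker strings are assumptions:
+
+```python
+import requests
+
+# Rough heuristic: look for telltale strings common doc generators leave in their HTML
+MARKERS = {
+    "docusaurus": ["docusaurus"],
+    "gitbook": ["gitbook"],
+    "sphinx": ["sphinx", "_static/"],
+    "mkdocs": ["mkdocs"],
+}
+
+def guess_framework(url: str) -> str:
+    html = requests.get(url, timeout=10).text.lower()
+    for framework, needles in MARKERS.items():
+        if any(needle in html for needle in needles):
+            return framework
+    return "unknown"
+
+print(guess_framework("https://tailwindcss.com/docs"))
+```
+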
+---
+
+### 💭 v3.0 - Platform Features (IDEAS)
+**Goal:** Build ecosystem around skill generation
+
+**Possible Features:**
+- Web UI for config generation
+- GitHub Actions integration
+- Skill marketplace
+- Analytics dashboard
+- API for programmatic access
+
+---
+
+## 🎨 Feature Ideas
+
+### High Priority
+1. **Selector Auto-Detection** - Analyze page, suggest selectors
+2. **Progress Streaming** - Real-time updates during scraping
+3. **Config Validation UI** - Visual feedback on config quality
+4. **Batch Processing** - Handle multiple sites at once
+
+### Medium Priority
+5. **Skill Quality Score** - Rate generated skills
+6. **Enhanced SKILL.md** - Better templates, more examples
+7. **Documentation Framework Detection** - Auto-detect Docusaurus, VuePress, etc.
+8. **Custom Categories AI** - Use AI to suggest categories
+
+### Low Priority
+9. **Web Dashboard** - Browser-based interface
+10. **Skill Analytics** - Track usage, quality metrics
+11. **Community Configs** - Share and discover configs
+12. **Plugin System** - Extend with custom scrapers
+
+---
+
+## 🔬 Research Areas
+
+### MCP Enhancements
+- [ ] Investigate MCP progress/streaming APIs
+- [ ] Test MCP with large documentation sites
+- [ ] Explore MCP caching strategies
+
+### AI Integration
+- [ ] Use Claude to auto-generate categories
+- [ ] AI-powered selector detection
+- [ ] Quality analysis with LLMs
+
+### Performance
+- [ ] Parallel scraping
+- [ ] Incremental updates
+- [ ] Smart caching
+
+---
+
+## 📊 Metrics & Goals
+
+### Current State (Oct 20, 2025) ✅
+- ✅ 12 preset configs (Godot, React, Vue, Django, FastAPI, Tailwind, Kubernetes, Astro, etc.)
+- ✅ 14/14 tests (100% pass rate)
+- ✅ 9 MCP tools (fully functional)
+- ✅ ~3,800 lines of code
+- ✅ Complete documentation suite
+- ✅ Production-ready v1.0.0 release
+- ✅ Auto-upload functionality
+- ✅ Large documentation support (40K+ pages)
+
+### Goals for v1.1 (Website Launch)
+- 🎯 Professional website live
+- 🎯 Video tutorial series (5 videos)
+- 🎯 20+ GitHub stars
+- 🎯 Community engagement started
+- 🎯 Documentation site migration
+
+### Goals for v1.2 (Core Improvements)
+- 🎯 Enhanced MCP features
+- 🎯 Performance optimization
+- 🎯 Better error handling
+- 🎯 Incremental update system
+
+### Goals for v2.0 (Intelligence)
+- 🎯 50+ preset configs
+- 🎯 Auto-detection for 80%+ of sites
+- 🎯 <1 minute skill generation
+- 🎯 Community contributions
+- 🎯 Quality scoring system
+
+---
+
+## 🤝 Contributing
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for:
+- How to add new MCP tools
+- Testing guidelines
+- Code style
+- PR process
+
+---
+
+## 📅 Release Schedule
+
+| Version | Target Date | Status | Focus |
+|---------|-------------|--------|-------|
+| v1.0.0 | Oct 19, 2025 | ✅ **RELEASED** | Core CLI + MCP Integration |
+| v1.1.0 | Nov 3, 2025 | 📋 Planned | Website Launch |
+| v1.2.0 | Late Nov 2025 | 📋 Planned | Core Improvements |
+| v2.0.0 | Dec 2025 | 📋 Planned | Intelligence Layer |
+| v3.0.0 | Q1 2026 | 💭 Ideas | Platform Features |
+
+---
+
+## 🔗 Related Projects
+
+- [Model Context Protocol](https://modelcontextprotocol.io/)
+- [Claude Code](https://claude.ai/code)
+- [Anthropic Claude](https://claude.ai)
+- Documentation frameworks we support: Docusaurus, GitBook, VuePress, Sphinx, MkDocs
+
+---
+
+**Last Updated:** October 20, 2025

+ 124 - 0
libs/external/Skill_Seekers-development/STRUCTURE.md

@@ -0,0 +1,124 @@
+# Repository Structure
+
+```
+Skill_Seekers/
+│
+├── 📄 Root Documentation
+│   ├── README.md                  # Main documentation (start here!)
+│   ├── CLAUDE.md                  # Quick reference for Claude Code
+│   ├── QUICKSTART.md              # 3-step quick start guide
+│   ├── ROADMAP.md                 # Development roadmap
+│   ├── TODO.md                    # Current sprint tasks
+│   ├── STRUCTURE.md               # This file
+│   ├── LICENSE                    # MIT License
+│   └── .gitignore                 # Git ignore rules
+│
+├── 🔧 CLI Tools (cli/)
+│   ├── doc_scraper.py             # Main scraping tool
+│   ├── estimate_pages.py          # Page count estimator
+│   ├── enhance_skill.py           # AI enhancement (API-based)
+│   ├── enhance_skill_local.py     # AI enhancement (LOCAL, no API)
+│   ├── package_skill.py           # Skill packaging tool
+│   └── run_tests.py               # Test runner
+│
+├── 🌐 MCP Server (mcp/)
+│   ├── server.py                  # Main MCP server
+│   ├── requirements.txt           # MCP dependencies
+│   └── README.md                  # MCP setup guide
+│
+├── 📁 configs/                    # Preset configurations
+│   ├── godot.json
+│   ├── react.json
+│   ├── vue.json
+│   ├── django.json
+│   ├── fastapi.json
+│   ├── kubernetes.json
+│   └── steam-economy-complete.json
+│
+├── 🧪 tests/                      # Test suite (71 tests, 100% pass rate)
+│   ├── test_config_validation.py
+│   ├── test_integration.py
+│   └── test_scraper_features.py
+│
+├── 📚 docs/                       # Detailed documentation
+│   ├── CLAUDE.md                  # Technical architecture
+│   ├── ENHANCEMENT.md             # AI enhancement guide
+│   ├── USAGE.md                   # Complete usage guide
+│   ├── TESTING.md                 # Testing guide
+│   └── UPLOAD_GUIDE.md            # How to upload skills
+│
+├── 🔀 .github/                    # GitHub configuration
+│   ├── SETUP_GUIDE.md             # GitHub project setup
+│   ├── ISSUES_TO_CREATE.md        # Issue templates
+│   └── ISSUE_TEMPLATE/            # Issue templates
+│
+└── 📦 output/                     # Generated skills (git-ignored)
+    ├── {name}_data/               # Scraped raw data (cached)
+    └── {name}/                    # Built skills
+        ├── SKILL.md               # Main skill file
+        └── references/            # Reference documentation
+```
+
+## Key Files
+
+### For Users:
+- **README.md** - Start here for overview and installation
+- **QUICKSTART.md** - Get started in 3 steps
+- **configs/** - 7 ready-to-use presets
+- **mcp/README.md** - MCP server setup for Claude Code
+
+### For CLI Usage:
+- **cli/doc_scraper.py** - Main scraping tool
+- **cli/estimate_pages.py** - Page count estimator
+- **cli/enhance_skill_local.py** - Local enhancement (no API key)
+- **cli/package_skill.py** - Package skills to .zip
+
+### For MCP Usage (Claude Code):
+- **mcp/server.py** - MCP server (6 tools)
+- **mcp/README.md** - Setup instructions
+- **configs/** - Shared configurations
+
+### For Developers:
+- **docs/CLAUDE.md** - Architecture and internals
+- **docs/USAGE.md** - Complete usage guide
+- **docs/TESTING.md** - Testing guide
+- **tests/** - 71 tests (100% pass rate)
+
+### For Contributors:
+- **ROADMAP.md** - Development roadmap
+- **TODO.md** - Current sprint tasks
+- **.github/SETUP_GUIDE.md** - GitHub setup
+- **LICENSE** - MIT License
+
+## Architecture
+
+### Monorepo Structure
+
+The repository is organized as a monorepo with two main components:
+
+1. **CLI Tools** (`cli/`): Standalone Python scripts for direct command-line usage
+2. **MCP Server** (`mcp/`): Model Context Protocol server for Claude Code integration
+
+Both components share the same configuration files and output directory.
+
+### Data Flow
+
+```
+Config (configs/*.json)
+  ↓
+CLI Tools OR MCP Server
+  ↓
+Scraper (cli/doc_scraper.py)
+  ↓
+Output (output/{name}_data/)
+  ↓
+Builder (cli/doc_scraper.py)
+  ↓
+Skill (output/{name}/)
+  ↓
+Enhancer (optional)
+  ↓
+Packager (cli/package_skill.py)
+  ↓
+Skill .zip (output/{name}.zip)
+```
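+
+The same flow can be driven end to end from a small script. A minimal sketch, assuming the `skill-seekers` commands shown in the README are installed and `configs/react.json` is the config in use:
+
+```python
+import subprocess
+
+# Each command mirrors one stage of the data flow above
+for step in (
+    ["skill-seekers", "scrape", "--config", "configs/react.json"],  # scrape + build
+    ["skill-seekers", "enhance", "output/react/"],                  # optional enhancement
+    ["skill-seekers", "package", "output/react/"],                  # produces output/react.zip
+):
+    subprocess.run(step, check=True)
+```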

+ 446 - 0
libs/external/Skill_Seekers-development/TROUBLESHOOTING.md

@@ -0,0 +1,446 @@
+# Troubleshooting Guide
+
+Common issues and solutions when using Skill Seeker.
+
+---
+
+## Installation Issues
+
+### Python Not Found
+
+**Error:**
+```
+python3: command not found
+```
+
+**Solutions:**
+1. **Check if Python is installed:**
+   ```bash
+   which python3
+   python --version  # Try without the 3
+   ```
+
+2. **Install Python:**
+   - **macOS:** `brew install python3`
+   - **Linux:** `sudo apt install python3 python3-pip`
+   - **Windows:** Download from python.org, check "Add to PATH"
+
+3. **Use python instead of python3:**
+   ```bash
+   python cli/doc_scraper.py --help
+   ```
+
+### Module Not Found
+
+**Error:**
+```
+ModuleNotFoundError: No module named 'requests'
+ModuleNotFoundError: No module named 'bs4'
+ModuleNotFoundError: No module named 'mcp'
+```
+
+**Solutions:**
+1. **Install dependencies:**
+   ```bash
+   pip3 install requests beautifulsoup4
+   pip3 install -r mcp/requirements.txt  # For MCP
+   ```
+
+2. **Use --user flag if permission denied:**
+   ```bash
+   pip3 install --user requests beautifulsoup4
+   ```
+
+3. **Check pip is working:**
+   ```bash
+   pip3 --version
+   ```
+
+### Permission Denied
+
+**Error:**
+```
+Permission denied: '/usr/local/lib/python3.x/...'
+```
+
+**Solutions:**
+1. **Use --user flag:**
+   ```bash
+   pip3 install --user requests beautifulsoup4
+   ```
+
+2. **Use sudo (not recommended):**
+   ```bash
+   sudo pip3 install requests beautifulsoup4
+   ```
+
+3. **Use virtual environment (best practice):**
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate
+   pip install requests beautifulsoup4
+   ```
+
+---
+
+## Runtime Issues
+
+### File Not Found
+
+**Error:**
+```
+FileNotFoundError: [Errno 2] No such file or directory: 'cli/doc_scraper.py'
+```
+
+**Solutions:**
+1. **Check you're in the Skill_Seekers directory:**
+   ```bash
+   pwd
+   # Should show: .../Skill_Seekers
+
+   ls
+   # Should show: README.md, cli/, mcp/, configs/
+   ```
+
+2. **Change to the correct directory:**
+   ```bash
+   cd ~/Projects/Skill_Seekers  # Adjust path
+   ```
+
+### Config File Not Found
+
+**Error:**
+```
+FileNotFoundError: configs/react.json
+```
+
+**Solutions:**
+1. **Check config exists:**
+   ```bash
+   ls configs/
+   # Should show: godot.json, react.json, vue.json, etc.
+   ```
+
+2. **Use full path:**
+   ```bash
+   skill-seekers scrape --config $(pwd)/configs/react.json
+   ```
+
+3. **Create missing config:**
+   ```bash
+   skill-seekers scrape --interactive
+   ```
+
+---
+
+## MCP Setup Issues
+
+### MCP Server Not Loading
+
+**Symptoms:**
+- Tools don't appear in Claude Code
+- "List all available configs" doesn't work
+
+**Solutions:**
+
+1. **Check configuration file:**
+   ```bash
+   cat ~/.config/claude-code/mcp.json
+   ```
+
+2. **Verify paths are ABSOLUTE (not placeholders):**
+   ```json
+   {
+     "mcpServers": {
+       "skill-seeker": {
+         "args": [
+           "/Users/yourname/Projects/Skill_Seekers/mcp/server.py"
+         ]
+       }
+     }
+   }
+   ```
+   ❌ **Bad:** `$REPO_PATH` or `/path/to/Skill_Seekers`
+   ✅ **Good:** `/Users/john/Projects/Skill_Seekers`
+
+3. **Test server manually:**
+   ```bash
+   cd ~/Projects/Skill_Seekers
+   python3 mcp/server.py
+   # Should start without errors (Ctrl+C to stop)
+   ```
+
+4. **Re-run setup script:**
+   ```bash
+   ./setup_mcp.sh
+   # Select "y" for auto-configure
+   ```
+
+5. **RESTART Claude Code completely:**
+   - Quit (don't just close window)
+   - Reopen
+
+### Placeholder Paths in Config
+
+**Problem:** Config has `$REPO_PATH` or `/Users/username/` instead of real paths
+
+**Solution:**
+```bash
+# Get your actual path
+cd ~/Projects/Skill_Seekers
+pwd
+# Copy this path
+
+# Edit config
+nano ~/.config/claude-code/mcp.json
+
+# Replace ALL instances of placeholders with your actual path
+# Save (Ctrl+O, Enter, Ctrl+X)
+
+# Restart Claude Code
+```
+
+### Tools Appear But Don't Work
+
+**Symptoms:**
+- Tools listed but commands fail
+- "Error executing tool" messages
+
+**Solutions:**
+
+1. **Check working directory:**
+   ```json
+   {
+     "cwd": "/FULL/PATH/TO/Skill_Seekers"
+   }
+   ```
+
+2. **Verify files exist:**
+   ```bash
+   ls cli/doc_scraper.py
+   ls mcp/server.py
+   ```
+
+3. **Test CLI tools directly:**
+   ```bash
+   skill-seekers scrape --help
+   ```
+
+---
+
+## Scraping Issues
+
+### Slow or Hanging
+
+**Solutions:**
+
+1. **Check network connection:**
+   ```bash
+   ping google.com
+   curl -I https://docs.yoursite.com
+   ```
+
+2. **Use smaller max_pages for testing:**
+   ```bash
+   skill-seekers scrape --config configs/test.json --max-pages 5
+   ```
+
+3. **Increase rate_limit in config:**
+   ```json
+   {
+     "rate_limit": 1.0  // Increase from 0.5
+   }
+   ```
+
+### No Content Extracted
+
+**Problem:** Pages scraped but content is empty
+
+**Solutions:**
+
+1. **Check selector in config:**
+   ```bash
+   # Test with browser dev tools
+   # Look for: article, main, div[role="main"], div.content
+   ```
+
+2. **Verify website is accessible:**
+   ```bash
+   curl https://docs.example.com
+   ```
+
+3. **Try different selectors:**
+   ```json
+   {
+     "selectors": {
+       "main_content": "article"  // Try: main, div.content, etc.
+     }
+   }
+   ```
+
+### Rate Limiting / 429 Errors
+
+**Error:**
+```
+HTTP Error 429: Too Many Requests
+```
+
+**Solutions:**
+
+1. **Increase rate_limit:**
+   ```json
+   {
+     "rate_limit": 2.0  // Wait 2 seconds between requests
+   }
+   ```
+
+2. **Reduce max_pages:**
+   ```json
+   {
+     "max_pages": 50  // Scrape fewer pages
+   }
+   ```
+
+3. **Try again later:** wait an hour or so, then re-run the same command.
+
+---
+
+## Platform-Specific Issues
+
+### macOS
+
+**Issue:** Can't run `./setup_mcp.sh`
+
+**Solution:**
+```bash
+chmod +x setup_mcp.sh
+./setup_mcp.sh
+```
+
+**Issue:** Homebrew not installed
+
+**Solution:**
+```bash
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+```
+
+### Linux
+
+**Issue:** pip3 not found
+
+**Solution:**
+```bash
+sudo apt update
+sudo apt install python3-pip
+```
+
+**Issue:** Permission errors
+
+**Solution:**
+```bash
+# Use --user flag
+pip3 install --user requests beautifulsoup4
+```
+
+### Windows (WSL)
+
+**Issue:** Python not in PATH
+
+**Solution:**
+1. Reinstall Python
+2. Check "Add Python to PATH"
+3. Or add manually to PATH
+
+**Issue:** Line ending errors
+
+**Solution:**
+```bash
+dos2unix setup_mcp.sh
+./setup_mcp.sh
+```
+
+---
+
+## Verification Commands
+
+Use these to check your setup:
+
+```bash
+# 1. Check Python
+python3 --version  # Should be 3.10+
+
+# 2. Check dependencies
+pip3 list | grep requests
+pip3 list | grep beautifulsoup4
+pip3 list | grep mcp
+
+# 3. Check files exist
+ls cli/doc_scraper.py
+ls mcp/server.py
+ls configs/
+
+# 4. Check MCP config
+cat ~/.config/claude-code/mcp.json
+
+# 5. Test scraper
+skill-seekers scrape --help
+
+# 6. Test MCP server (exit code 124 means it ran until the timeout, i.e. it started cleanly)
+timeout 3 python3 mcp/server.py; [ $? -eq 124 ] && echo "Server OK"
+
+# 7. Check git repo
+git status
+git log --oneline -5
+```
+
+---
+
+## Getting Help
+
+If none of these solutions work:
+
+1. **Check existing issues:**
+   https://github.com/yusufkaraaslan/Skill_Seekers/issues
+
+2. **Open a new issue with:**
+   - Your OS (macOS 13, Ubuntu 22.04, etc.)
+   - Python version (`python3 --version`)
+   - Full error message
+   - What command you ran
+   - Output of verification commands above
+
+3. **Include this debug info:**
+   ```bash
+   # System info
+   uname -a
+   python3 --version
+   pip3 --version
+
+   # Skill Seeker info
+   cd ~/Projects/Skill_Seekers  # Your path
+   pwd
+   git log --oneline -1
+   ls -la cli/ mcp/ configs/
+
+   # MCP config (if using MCP)
+   cat ~/.config/claude-code/mcp.json
+   ```
+
+---
+
+## Quick Fixes Checklist
+
+- [ ] In the Skill_Seekers directory? (`pwd`)
+- [ ] Python 3.10+ installed? (`python3 --version`)
+- [ ] Dependencies installed? (`pip3 list | grep requests`)
+- [ ] Config file exists? (`ls configs/yourconfig.json`)
+- [ ] Internet connection working? (`ping google.com`)
+- [ ] For MCP: Config uses absolute paths? (not `$REPO_PATH`)
+- [ ] For MCP: Claude Code restarted? (quit and reopen)
+
+---
+
+**Still stuck?** Open an issue: https://github.com/yusufkaraaslan/Skill_Seekers/issues/new

+ 31 - 0
libs/external/Skill_Seekers-development/configs/ansible-core.json

@@ -0,0 +1,31 @@
+{
+  "name": "ansible-core",
+  "description": "Ansible Core 2.19 skill for automation and configuration management",
+  "base_url": "https://docs.ansible.com/ansible-core/2.19/",
+  "selectors": {
+    "main_content": "div[role=main]",
+    "title": "title",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": ["/_static/", "/_images/", "/_downloads/", "/search.html", "/genindex.html", "/py-modindex.html", "/index.html", "/roadmap/"]
+  },
+  "categories": {
+    "getting_started": ["getting_started", "getting-started", "introduction", "overview"],
+    "installation": ["installation_guide", "installation", "setup"],
+    "inventory": ["inventory_guide", "inventory"],
+    "playbooks": ["playbook_guide", "playbooks", "playbook"],
+    "modules": ["module_plugin_guide", "modules", "plugins"],
+    "collections": ["collections_guide", "collections"],
+    "vault": ["vault_guide", "vault", "encryption"],
+    "commands": ["command_guide", "commands", "cli"],
+    "porting": ["porting_guides", "porting", "migration"],
+    "os_specific": ["os_guide", "platform"],
+    "tips": ["tips_tricks", "tips", "tricks", "best-practices"],
+    "community": ["community", "contributing", "contributions"],
+    "development": ["dev_guide", "development", "developing"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 800
+}

+ 30 - 0
libs/external/Skill_Seekers-development/configs/astro.json

@@ -0,0 +1,30 @@
+{
+  "name": "astro",
+  "description": "Astro web framework for content-focused websites. Use for Astro components, islands architecture, content collections, SSR/SSG, and modern web development.",
+  "base_url": "https://docs.astro.build/en/getting-started/",
+  "start_urls": [
+    "https://docs.astro.build/en/getting-started/",
+    "https://docs.astro.build/en/install/auto/",
+    "https://docs.astro.build/en/core-concepts/project-structure/",
+    "https://docs.astro.build/en/core-concepts/astro-components/",
+    "https://docs.astro.build/en/core-concepts/astro-pages/"
+  ],
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/en/"],
+    "exclude": ["/blog", "/integrations"]
+  },
+  "categories": {
+    "getting_started": ["getting-started", "install", "tutorial"],
+    "core_concepts": ["core-concepts", "project-structure", "components", "pages"],
+    "guides": ["guides", "deploy", "migrate"],
+    "configuration": ["configuration", "config", "typescript"],
+    "integrations": ["integrations", "framework", "adapter"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 100
+}

+ 37 - 0
libs/external/Skill_Seekers-development/configs/claude-code.json

@@ -0,0 +1,37 @@
+{
+  "name": "claude-code",
+  "description": "Claude Code CLI and development environment. Use for Claude Code features, tools, workflows, MCP integration, configuration, and AI-assisted development.",
+  "base_url": "https://docs.claude.com/en/docs/claude-code/",
+  "start_urls": [
+    "https://docs.claude.com/en/docs/claude-code/overview",
+    "https://docs.claude.com/en/docs/claude-code/quickstart",
+    "https://docs.claude.com/en/docs/claude-code/common-workflows",
+    "https://docs.claude.com/en/docs/claude-code/mcp",
+    "https://docs.claude.com/en/docs/claude-code/settings",
+    "https://docs.claude.com/en/docs/claude-code/troubleshooting",
+    "https://docs.claude.com/en/docs/claude-code/iam"
+  ],
+  "selectors": {
+    "main_content": "#content-container",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/claude-code/"],
+    "exclude": ["/api-reference/", "/claude-ai/", "/claude.ai/", "/prompt-engineering/", "/changelog/"]
+  },
+  "categories": {
+    "getting_started": ["overview", "quickstart", "installation", "setup", "terminal-config"],
+    "workflows": ["workflow", "common-workflows", "git", "testing", "debugging", "interactive"],
+    "mcp": ["mcp", "model-context-protocol"],
+    "configuration": ["config", "settings", "preferences", "customize", "hooks", "statusline", "model-config", "memory", "output-styles"],
+    "agents": ["agent", "task", "subagent", "sub-agent", "specialized"],
+    "skills": ["skill", "agent-skill"],
+    "integrations": ["ide-integrations", "vs-code", "jetbrains", "plugin", "marketplace"],
+    "deployment": ["bedrock", "vertex", "deployment", "network", "gateway", "devcontainer", "sandboxing", "third-party"],
+    "reference": ["reference", "api", "command", "cli-reference", "slash", "checkpointing", "headless", "sdk"],
+    "enterprise": ["iam", "security", "monitoring", "analytics", "costs", "legal", "data-usage"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 200
+}

+ 34 - 0
libs/external/Skill_Seekers-development/configs/django.json

@@ -0,0 +1,34 @@
+{
+  "name": "django",
+  "description": "Django web framework for Python. Use for Django models, views, templates, ORM, authentication, and web development.",
+  "base_url": "https://docs.djangoproject.com/en/stable/",
+  "start_urls": [
+    "https://docs.djangoproject.com/en/stable/intro/",
+    "https://docs.djangoproject.com/en/stable/topics/db/models/",
+    "https://docs.djangoproject.com/en/stable/topics/http/views/",
+    "https://docs.djangoproject.com/en/stable/topics/templates/",
+    "https://docs.djangoproject.com/en/stable/topics/forms/",
+    "https://docs.djangoproject.com/en/stable/topics/auth/",
+    "https://docs.djangoproject.com/en/stable/ref/models/"
+  ],
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": ["/intro/", "/topics/", "/ref/", "/howto/"],
+    "exclude": ["/faq/", "/misc/", "/releases/"]
+  },
+  "categories": {
+    "getting_started": ["intro", "tutorial", "install"],
+    "models": ["models", "database", "orm", "queries"],
+    "views": ["views", "urlconf", "routing"],
+    "templates": ["templates", "template"],
+    "forms": ["forms", "form"],
+    "authentication": ["auth", "authentication", "user"],
+    "api": ["ref", "reference"]
+  },
+  "rate_limit": 0.3,
+  "max_pages": 500
+}

+ 49 - 0
libs/external/Skill_Seekers-development/configs/django_unified.json

@@ -0,0 +1,49 @@
+{
+  "name": "django",
+  "description": "Complete Django framework knowledge combining official documentation and Django codebase. Use when building Django applications, understanding ORM internals, or debugging Django issues.",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://docs.djangoproject.com/en/stable/",
+      "extract_api": true,
+      "selectors": {
+        "main_content": "article",
+        "title": "h1",
+        "code_blocks": "pre"
+      },
+      "url_patterns": {
+        "include": [],
+        "exclude": ["/search/", "/genindex/"]
+      },
+      "categories": {
+        "getting_started": ["intro", "tutorial", "install"],
+        "models": ["models", "orm", "queries", "database"],
+        "views": ["views", "urls", "templates"],
+        "forms": ["forms", "modelforms"],
+        "admin": ["admin"],
+        "api": ["ref/"],
+        "topics": ["topics/"],
+        "security": ["security", "csrf", "authentication"]
+      },
+      "rate_limit": 0.5,
+      "max_pages": 300
+    },
+    {
+      "type": "github",
+      "repo": "django/django",
+      "include_issues": true,
+      "max_issues": 100,
+      "include_changelog": true,
+      "include_releases": true,
+      "include_code": true,
+      "code_analysis_depth": "surface",
+      "file_patterns": [
+        "django/db/**/*.py",
+        "django/views/**/*.py",
+        "django/forms/**/*.py",
+        "django/contrib/admin/**/*.py"
+      ]
+    }
+  ]
+}

+ 17 - 0
libs/external/Skill_Seekers-development/configs/example_pdf.json

@@ -0,0 +1,17 @@
+{
+  "name": "example_manual",
+  "description": "Example PDF documentation skill",
+  "pdf_path": "docs/manual.pdf",
+  "extract_options": {
+    "chunk_size": 10,
+    "min_quality": 5.0,
+    "extract_images": true,
+    "min_image_size": 100
+  },
+  "categories": {
+    "getting_started": ["introduction", "getting started", "quick start", "setup"],
+    "tutorial": ["tutorial", "guide", "walkthrough", "example"],
+    "api": ["api", "reference", "function", "class", "method"],
+    "advanced": ["advanced", "optimization", "performance", "best practices"]
+  }
+}

+ 33 - 0
libs/external/Skill_Seekers-development/configs/fastapi.json

@@ -0,0 +1,33 @@
+{
+  "name": "fastapi",
+  "description": "FastAPI modern Python web framework. Use for building APIs, async endpoints, dependency injection, and Python backend development.",
+  "base_url": "https://fastapi.tiangolo.com/",
+  "start_urls": [
+    "https://fastapi.tiangolo.com/tutorial/",
+    "https://fastapi.tiangolo.com/tutorial/first-steps/",
+    "https://fastapi.tiangolo.com/tutorial/path-params/",
+    "https://fastapi.tiangolo.com/tutorial/body/",
+    "https://fastapi.tiangolo.com/tutorial/dependencies/",
+    "https://fastapi.tiangolo.com/advanced/",
+    "https://fastapi.tiangolo.com/reference/"
+  ],
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/tutorial/", "/advanced/", "/reference/"],
+    "exclude": ["/help/", "/external-links/", "/deployment/"]
+  },
+  "categories": {
+    "getting_started": ["first-steps", "tutorial", "intro"],
+    "path_operations": ["path", "operations", "routing"],
+    "request_data": ["request", "body", "query", "parameters"],
+    "dependencies": ["dependencies", "injection"],
+    "security": ["security", "oauth", "authentication"],
+    "database": ["database", "sql", "orm"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 250
+}

+ 45 - 0
libs/external/Skill_Seekers-development/configs/fastapi_unified.json

@@ -0,0 +1,45 @@
+{
+  "name": "fastapi",
+  "description": "Complete FastAPI knowledge combining official documentation and FastAPI codebase. Use when building FastAPI applications, understanding async patterns, or working with Pydantic models.",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://fastapi.tiangolo.com/",
+      "extract_api": true,
+      "selectors": {
+        "main_content": "article",
+        "title": "h1",
+        "code_blocks": "pre code"
+      },
+      "url_patterns": {
+        "include": [],
+        "exclude": ["/img/", "/js/"]
+      },
+      "categories": {
+        "getting_started": ["tutorial", "first-steps"],
+        "path_operations": ["path-params", "query-params", "body"],
+        "dependencies": ["dependencies"],
+        "security": ["security", "oauth2"],
+        "database": ["sql-databases"],
+        "advanced": ["advanced", "async", "middleware"],
+        "deployment": ["deployment"]
+      },
+      "rate_limit": 0.5,
+      "max_pages": 150
+    },
+    {
+      "type": "github",
+      "repo": "tiangolo/fastapi",
+      "include_issues": true,
+      "max_issues": 100,
+      "include_changelog": true,
+      "include_releases": true,
+      "include_code": true,
+      "code_analysis_depth": "surface",
+      "file_patterns": [
+        "fastapi/**/*.py"
+      ]
+    }
+  ]
+}

+ 41 - 0
libs/external/Skill_Seekers-development/configs/fastapi_unified_test.json

@@ -0,0 +1,41 @@
+{
+  "name": "fastapi_test",
+  "description": "FastAPI test - unified scraping with limited pages",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://fastapi.tiangolo.com/",
+      "extract_api": true,
+      "selectors": {
+        "main_content": "article",
+        "title": "h1",
+        "code_blocks": "pre code"
+      },
+      "url_patterns": {
+        "include": [],
+        "exclude": ["/img/", "/js/"]
+      },
+      "categories": {
+        "getting_started": ["tutorial", "first-steps"],
+        "path_operations": ["path-params", "query-params"],
+        "api": ["reference"]
+      },
+      "rate_limit": 0.5,
+      "max_pages": 20
+    },
+    {
+      "type": "github",
+      "repo": "tiangolo/fastapi",
+      "include_issues": false,
+      "include_changelog": false,
+      "include_releases": true,
+      "include_code": true,
+      "code_analysis_depth": "surface",
+      "file_patterns": [
+        "fastapi/routing.py",
+        "fastapi/applications.py"
+      ]
+    }
+  ]
+}

+ 63 - 0
libs/external/Skill_Seekers-development/configs/godot-large-example.json

@@ -0,0 +1,63 @@
+{
+  "name": "godot",
+  "description": "Godot Engine game development. Use for Godot projects, GDScript/C# coding, scene setup, node systems, 2D/3D development, physics, animation, UI, shaders, or any Godot-specific questions.",
+  "base_url": "https://docs.godotengine.org/en/stable/",
+  "start_urls": [
+    "https://docs.godotengine.org/en/stable/getting_started/introduction/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/2d/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/3d/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/physics/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/animation/index.html",
+    "https://docs.godotengine.org/en/stable/classes/index.html"
+  ],
+  "selectors": {
+    "main_content": "div[role='main']",
+    "title": "title",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": [
+      "/getting_started/",
+      "/tutorials/",
+      "/classes/"
+    ],
+    "exclude": [
+      "/genindex.html",
+      "/search.html",
+      "/_static/",
+      "/_sources/"
+    ]
+  },
+  "categories": {
+    "getting_started": ["introduction", "getting_started", "first", "your_first"],
+    "scripting": ["scripting", "gdscript", "c#", "csharp"],
+    "2d": ["/2d/", "sprite", "canvas", "tilemap"],
+    "3d": ["/3d/", "spatial", "mesh", "3d_"],
+    "physics": ["physics", "collision", "rigidbody", "characterbody"],
+    "animation": ["animation", "tween", "animationplayer"],
+    "ui": ["ui", "control", "gui", "theme"],
+    "shaders": ["shader", "material", "visual_shader"],
+    "audio": ["audio", "sound"],
+    "networking": ["networking", "multiplayer", "rpc"],
+    "export": ["export", "platform", "deploy"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 40000,
+
+  "_comment": "=== NEW: Split Strategy Configuration ===",
+  "split_strategy": "router",
+  "split_config": {
+    "target_pages_per_skill": 5000,
+    "create_router": true,
+    "split_by_categories": ["scripting", "2d", "3d", "physics", "shaders"],
+    "router_name": "godot",
+    "parallel_scraping": true
+  },
+
+  "_comment2": "=== NEW: Checkpoint Configuration ===",
+  "checkpoint": {
+    "enabled": true,
+    "interval": 1000
+  }
+}

+ 47 - 0
libs/external/Skill_Seekers-development/configs/godot.json

@@ -0,0 +1,47 @@
+{
+  "name": "godot",
+  "description": "Godot Engine game development. Use for Godot projects, GDScript/C# coding, scene setup, node systems, 2D/3D development, physics, animation, UI, shaders, or any Godot-specific questions.",
+  "base_url": "https://docs.godotengine.org/en/stable/",
+  "start_urls": [
+    "https://docs.godotengine.org/en/stable/getting_started/introduction/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/2d/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/3d/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/physics/index.html",
+    "https://docs.godotengine.org/en/stable/tutorials/animation/index.html",
+    "https://docs.godotengine.org/en/stable/classes/index.html"
+  ],
+  "selectors": {
+    "main_content": "div[role='main']",
+    "title": "title",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": [
+      "/getting_started/",
+      "/tutorials/",
+      "/classes/"
+    ],
+    "exclude": [
+      "/genindex.html",
+      "/search.html",
+      "/_static/",
+      "/_sources/"
+    ]
+  },
+  "categories": {
+    "getting_started": ["introduction", "getting_started", "first", "your_first"],
+    "scripting": ["scripting", "gdscript", "c#", "csharp"],
+    "2d": ["/2d/", "sprite", "canvas", "tilemap"],
+    "3d": ["/3d/", "spatial", "mesh", "3d_"],
+    "physics": ["physics", "collision", "rigidbody", "characterbody"],
+    "animation": ["animation", "tween", "animationplayer"],
+    "ui": ["ui", "control", "gui", "theme"],
+    "shaders": ["shader", "material", "visual_shader"],
+    "audio": ["audio", "sound"],
+    "networking": ["networking", "multiplayer", "rpc"],
+    "export": ["export", "platform", "deploy"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 500
+}

+ 19 - 0
libs/external/Skill_Seekers-development/configs/godot_github.json

@@ -0,0 +1,19 @@
+{
+  "name": "godot",
+  "repo": "godotengine/godot",
+  "description": "Godot Engine - Multi-platform 2D and 3D game engine",
+  "github_token": null,
+  "include_issues": true,
+  "max_issues": 100,
+  "include_changelog": true,
+  "include_releases": true,
+  "include_code": false,
+  "file_patterns": [
+    "core/**/*.h",
+    "core/**/*.cpp",
+    "scene/**/*.h",
+    "scene/**/*.cpp",
+    "servers/**/*.h",
+    "servers/**/*.cpp"
+  ]
+}

+ 50 - 0
libs/external/Skill_Seekers-development/configs/godot_unified.json

@@ -0,0 +1,50 @@
+{
+  "name": "godot",
+  "description": "Complete Godot Engine knowledge base combining official documentation and source code analysis",
+  "merge_mode": "claude-enhanced",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://docs.godotengine.org/en/stable/",
+      "extract_api": true,
+      "selectors": {
+        "main_content": "div[role='main']",
+        "title": "title",
+        "code_blocks": "pre"
+      },
+      "url_patterns": {
+        "include": [],
+        "exclude": ["/search.html", "/_static/", "/_images/"]
+      },
+      "categories": {
+        "getting_started": ["introduction", "getting_started", "step_by_step"],
+        "scripting": ["scripting", "gdscript", "c_sharp"],
+        "2d": ["2d", "canvas", "sprite", "animation"],
+        "3d": ["3d", "spatial", "mesh", "shader"],
+        "physics": ["physics", "collision", "rigidbody"],
+        "api": ["api", "class", "reference", "method"]
+      },
+      "rate_limit": 0.5,
+      "max_pages": 500
+    },
+    {
+      "type": "github",
+      "repo": "godotengine/godot",
+      "github_token": null,
+      "code_analysis_depth": "deep",
+      "include_code": true,
+      "include_issues": true,
+      "max_issues": 100,
+      "include_changelog": true,
+      "include_releases": true,
+      "file_patterns": [
+        "core/**/*.h",
+        "core/**/*.cpp",
+        "scene/**/*.h",
+        "scene/**/*.cpp",
+        "servers/**/*.h",
+        "servers/**/*.cpp"
+      ]
+    }
+  ]
+}

+ 18 - 0
libs/external/Skill_Seekers-development/configs/hono.json

@@ -0,0 +1,18 @@
+{
+  "name": "hono",
+  "description": "Hono web application framework for building fast, lightweight APIs. Use for Hono routing, middleware, context handling, and modern JavaScript/TypeScript web development.",
+  "llms_txt_url": "https://hono.dev/llms-full.txt",
+  "base_url": "https://hono.dev/docs",
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": []
+  },
+  "categories": {},
+  "rate_limit": 0.5,
+  "max_pages": 50
+}

+ 48 - 0
libs/external/Skill_Seekers-development/configs/kubernetes.json

@@ -0,0 +1,48 @@
+{
+  "name": "kubernetes",
+  "description": "Kubernetes container orchestration platform. Use for K8s clusters, deployments, pods, services, networking, storage, configuration, and DevOps tasks.",
+  "base_url": "https://kubernetes.io/docs/",
+  "start_urls": [
+    "https://kubernetes.io/docs/home/",
+    "https://kubernetes.io/docs/concepts/",
+    "https://kubernetes.io/docs/tasks/",
+    "https://kubernetes.io/docs/tutorials/",
+    "https://kubernetes.io/docs/reference/"
+  ],
+  "selectors": {
+    "main_content": "main",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": [
+      "/docs/concepts/",
+      "/docs/tasks/",
+      "/docs/tutorials/",
+      "/docs/reference/",
+      "/docs/setup/"
+    ],
+    "exclude": [
+      "/search/",
+      "/blog/",
+      "/training/",
+      "/partners/",
+      "/community/",
+      "/_print/",
+      "/case-studies/"
+    ]
+  },
+  "categories": {
+    "getting_started": ["getting-started", "setup", "learning-environment"],
+    "concepts": ["concepts", "overview", "architecture"],
+    "workloads": ["workloads", "pods", "deployments", "replicaset", "statefulset", "daemonset"],
+    "services": ["services", "networking", "ingress", "service"],
+    "storage": ["storage", "volumes", "persistent"],
+    "configuration": ["configuration", "configmap", "secret"],
+    "security": ["security", "rbac", "policies", "authentication"],
+    "tasks": ["tasks", "administer", "configure"],
+    "tutorials": ["tutorials", "stateless", "stateful"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 1000
+}

+ 34 - 0
libs/external/Skill_Seekers-development/configs/laravel.json

@@ -0,0 +1,34 @@
+{
+  "name": "laravel",
+  "description": "Laravel PHP web framework. Use for Laravel models, routes, controllers, Blade templates, Eloquent ORM, authentication, and PHP web development.",
+  "base_url": "https://laravel.com/docs/9.x/",
+  "start_urls": [
+    "https://laravel.com/docs/9.x/installation",
+    "https://laravel.com/docs/9.x/routing",
+    "https://laravel.com/docs/9.x/controllers",
+    "https://laravel.com/docs/9.x/views",
+    "https://laravel.com/docs/9.x/blade",
+    "https://laravel.com/docs/9.x/eloquent",
+    "https://laravel.com/docs/9.x/migrations",
+    "https://laravel.com/docs/9.x/authentication"
+  ],
+  "selectors": {
+    "main_content": "#main-content",
+    "title": "h1",
+    "code_blocks": "pre"
+  },
+  "url_patterns": {
+    "include": ["/docs/9.x/", "/docs/10.x/", "/docs/11.x/"],
+    "exclude": ["/api/", "/packages/"]
+  },
+  "categories": {
+    "getting_started": ["installation", "configuration", "structure", "deployment"],
+    "routing": ["routing", "middleware", "controllers"],
+    "views": ["views", "blade", "templates"],
+    "models": ["eloquent", "database", "migrations", "seeding", "queries"],
+    "authentication": ["authentication", "authorization", "passwords"],
+    "api": ["api", "resources", "requests", "responses"]
+  },
+  "rate_limit": 0.3,
+  "max_pages": 500
+}

+ 17 - 0
libs/external/Skill_Seekers-development/configs/python-tutorial-test.json

@@ -0,0 +1,17 @@
+{
+  "name": "python-tutorial-test",
+  "description": "Python tutorial for testing MCP tools",
+  "base_url": "https://docs.python.org/3/tutorial/",
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": []
+  },
+  "categories": {},
+  "rate_limit": 0.3,
+  "max_pages": 10
+}

+ 31 - 0
libs/external/Skill_Seekers-development/configs/react.json

@@ -0,0 +1,31 @@
+{
+  "name": "react",
+  "description": "React framework for building user interfaces. Use for React components, hooks, state management, JSX, and modern frontend development.",
+  "base_url": "https://react.dev/",
+  "start_urls": [
+    "https://react.dev/learn",
+    "https://react.dev/learn/quick-start",
+    "https://react.dev/learn/thinking-in-react",
+    "https://react.dev/reference/react",
+    "https://react.dev/reference/react-dom",
+    "https://react.dev/reference/react/hooks"
+  ],
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/learn", "/reference"],
+    "exclude": ["/community", "/blog"]
+  },
+  "categories": {
+    "getting_started": ["quick-start", "installation", "tutorial"],
+    "hooks": ["usestate", "useeffect", "usememo", "usecallback", "usecontext", "useref", "hook"],
+    "components": ["component", "props", "jsx"],
+    "state": ["state", "context", "reducer"],
+    "api": ["api", "reference"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 300
+}

+ 15 - 0
libs/external/Skill_Seekers-development/configs/react_github.json

@@ -0,0 +1,15 @@
+{
+  "name": "react",
+  "repo": "facebook/react",
+  "description": "React JavaScript library for building user interfaces",
+  "github_token": null,
+  "include_issues": true,
+  "max_issues": 100,
+  "include_changelog": true,
+  "include_releases": true,
+  "include_code": false,
+  "file_patterns": [
+    "packages/**/*.js",
+    "packages/**/*.ts"
+  ]
+}

+ 44 - 0
libs/external/Skill_Seekers-development/configs/react_unified.json

@@ -0,0 +1,44 @@
+{
+  "name": "react",
+  "description": "Complete React knowledge base combining official documentation and React codebase insights. Use when working with React, understanding API changes, or debugging React internals.",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://react.dev/",
+      "extract_api": true,
+      "selectors": {
+        "main_content": "article",
+        "title": "h1",
+        "code_blocks": "pre code"
+      },
+      "url_patterns": {
+        "include": [],
+        "exclude": ["/blog/", "/community/"]
+      },
+      "categories": {
+        "getting_started": ["learn", "installation", "quick-start"],
+        "components": ["components", "props", "state"],
+        "hooks": ["hooks", "usestate", "useeffect", "usecontext"],
+        "api": ["api", "reference"],
+        "advanced": ["context", "refs", "portals", "suspense"]
+      },
+      "rate_limit": 0.5,
+      "max_pages": 200
+    },
+    {
+      "type": "github",
+      "repo": "facebook/react",
+      "include_issues": true,
+      "max_issues": 100,
+      "include_changelog": true,
+      "include_releases": true,
+      "include_code": true,
+      "code_analysis_depth": "surface",
+      "file_patterns": [
+        "packages/react/src/**/*.js",
+        "packages/react-dom/src/**/*.js"
+      ]
+    }
+  ]
+}

+ 108 - 0
libs/external/Skill_Seekers-development/configs/steam-economy-complete.json

@@ -0,0 +1,108 @@
+{
+  "name": "steam-economy-complete",
+  "description": "Complete Steam Economy system including inventory, microtransactions, trading, and monetization. Use for ISteamInventory API, ISteamEconomy API, IInventoryService Web API, Steam Wallet integration, in-app purchases, item definitions, trading, crafting, market integration, and all economy features for game developers.",
+  "base_url": "https://partner.steamgames.com/doc/",
+  "start_urls": [
+    "https://partner.steamgames.com/doc/features/inventory",
+    "https://partner.steamgames.com/doc/features/microtransactions",
+    "https://partner.steamgames.com/doc/features/microtransactions/implementation",
+    "https://partner.steamgames.com/doc/api/ISteamInventory",
+    "https://partner.steamgames.com/doc/webapi/ISteamEconomy",
+    "https://partner.steamgames.com/doc/webapi/IInventoryService",
+    "https://partner.steamgames.com/doc/features/inventory/economy"
+  ],
+  "selectors": {
+    "main_content": "div.documentation_bbcode",
+    "title": "div.docPageTitle",
+    "code_blocks": "div.bb_code"
+  },
+  "url_patterns": {
+    "include": [
+      "/features/inventory",
+      "/features/microtransactions",
+      "/api/ISteamInventory",
+      "/webapi/ISteamEconomy",
+      "/webapi/IInventoryService"
+    ],
+    "exclude": [
+      "/home",
+      "/sales",
+      "/marketing",
+      "/legal",
+      "/finance",
+      "/login",
+      "/search",
+      "/steamworks/apps",
+      "/steamworks/partner"
+    ]
+  },
+  "categories": {
+    "getting_started": [
+      "overview",
+      "getting started",
+      "introduction",
+      "quickstart",
+      "setup"
+    ],
+    "inventory_system": [
+      "inventory",
+      "item definition",
+      "item schema",
+      "item properties",
+      "itemdefs",
+      "ISteamInventory"
+    ],
+    "microtransactions": [
+      "microtransaction",
+      "purchase",
+      "payment",
+      "checkout",
+      "wallet",
+      "transaction"
+    ],
+    "economy_api": [
+      "ISteamEconomy",
+      "economy",
+      "asset",
+      "context"
+    ],
+    "inventory_webapi": [
+      "IInventoryService",
+      "webapi",
+      "web api",
+      "http"
+    ],
+    "trading": [
+      "trading",
+      "trade",
+      "exchange",
+      "market"
+    ],
+    "crafting": [
+      "crafting",
+      "recipe",
+      "combine",
+      "exchange"
+    ],
+    "pricing": [
+      "pricing",
+      "price",
+      "cost",
+      "currency"
+    ],
+    "implementation": [
+      "integration",
+      "implementation",
+      "configure",
+      "best practices"
+    ],
+    "examples": [
+      "example",
+      "sample",
+      "tutorial",
+      "walkthrough"
+    ]
+  },
+  "rate_limit": 0.7,
+  "max_pages": 1000
+}

+ 30 - 0
libs/external/Skill_Seekers-development/configs/tailwind.json

@@ -0,0 +1,30 @@
+{
+  "name": "tailwind",
+  "description": "Tailwind CSS utility-first framework for rapid UI development. Use for Tailwind utilities, responsive design, custom configurations, and modern CSS workflows.",
+  "base_url": "https://tailwindcss.com/docs",
+  "start_urls": [
+    "https://tailwindcss.com/docs/installation",
+    "https://tailwindcss.com/docs/utility-first",
+    "https://tailwindcss.com/docs/responsive-design",
+    "https://tailwindcss.com/docs/hover-focus-and-other-states"
+  ],
+  "selectors": {
+    "main_content": "div.prose",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/docs"],
+    "exclude": ["/blog", "/resources"]
+  },
+  "categories": {
+    "getting_started": ["installation", "editor-setup", "intellisense"],
+    "core_concepts": ["utility-first", "responsive", "hover-focus", "dark-mode"],
+    "layout": ["container", "columns", "flex", "grid"],
+    "typography": ["font-family", "font-size", "text-align", "text-color"],
+    "backgrounds": ["background-color", "background-image", "gradient"],
+    "customization": ["configuration", "theme", "plugins"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 100
+}

+ 17 - 0
libs/external/Skill_Seekers-development/configs/test-manual.json

@@ -0,0 +1,17 @@
+{
+  "name": "test-manual",
+  "description": "Manual test config",
+  "base_url": "https://test.example.com/",
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": []
+  },
+  "categories": {},
+  "rate_limit": 0.5,
+  "max_pages": 50
+}

+ 31 - 0
libs/external/Skill_Seekers-development/configs/vue.json

@@ -0,0 +1,31 @@
+{
+  "name": "vue",
+  "description": "Vue.js progressive JavaScript framework. Use for Vue components, reactivity, composition API, and frontend development.",
+  "base_url": "https://vuejs.org/",
+  "start_urls": [
+    "https://vuejs.org/guide/introduction.html",
+    "https://vuejs.org/guide/quick-start.html",
+    "https://vuejs.org/guide/essentials/application.html",
+    "https://vuejs.org/guide/components/registration.html",
+    "https://vuejs.org/guide/reusability/composables.html",
+    "https://vuejs.org/api/"
+  ],
+  "selectors": {
+    "main_content": "main",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/guide/", "/api/", "/examples/"],
+    "exclude": ["/about/", "/sponsor/", "/partners/"]
+  },
+  "categories": {
+    "getting_started": ["quick-start", "introduction", "essentials"],
+    "components": ["component", "props", "events"],
+    "reactivity": ["reactivity", "reactive", "ref", "computed"],
+    "composition_api": ["composition", "setup"],
+    "api": ["api", "reference"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 200
+}

+ 195 - 0
libs/external/Skill_Seekers-development/demo_conflicts.py

@@ -0,0 +1,195 @@
+#!/usr/bin/env python3
+"""
+Demo: Conflict Detection and Reporting
+
+This demonstrates the unified scraper's ability to detect and report
+conflicts between documentation and code implementation.
+"""
+
+import sys
+import json
+from pathlib import Path
+
+# Add CLI to path
+sys.path.insert(0, str(Path(__file__).parent / 'cli'))
+
+from conflict_detector import ConflictDetector
+
+print("=" * 70)
+print("UNIFIED SCRAPER - CONFLICT DETECTION DEMO")
+print("=" * 70)
+print()
+
+# Load test data
+print("📂 Loading test data...")
+print("   - Documentation APIs from example docs")
+print("   - Code APIs from example repository")
+print()
+
+with open('cli/conflicts.json', 'r') as f:
+    conflicts_data = json.load(f)
+
+conflicts = conflicts_data['conflicts']
+summary = conflicts_data['summary']
+
+print(f"✅ Loaded {summary['total']} conflicts")
+print()
+
+# Display summary
+print("=" * 70)
+print("CONFLICT SUMMARY")
+print("=" * 70)
+print()
+
+print(f"📊 **Total Conflicts**: {summary['total']}")
+print()
+
+print("**By Type:**")
+for conflict_type, count in summary['by_type'].items():
+    if count > 0:
+        emoji = "📖" if conflict_type == "missing_in_docs" else "💻" if conflict_type == "missing_in_code" else "⚠️"
+        print(f"   {emoji} {conflict_type}: {count}")
+print()
+
+print("**By Severity:**")
+for severity, count in summary['by_severity'].items():
+    if count > 0:
+        emoji = "🔴" if severity == "high" else "🟡" if severity == "medium" else "🟢"
+        print(f"   {emoji} {severity.upper()}: {count}")
+print()
+
+# Display detailed conflicts
+print("=" * 70)
+print("DETAILED CONFLICT REPORTS")
+print("=" * 70)
+print()
+
+# Group by severity
+high = [c for c in conflicts if c['severity'] == 'high']
+medium = [c for c in conflicts if c['severity'] == 'medium']
+low = [c for c in conflicts if c['severity'] == 'low']
+
+# Show high severity first
+if high:
+    print("🔴 **HIGH SEVERITY CONFLICTS** (Requires immediate attention)")
+    print("-" * 70)
+    for conflict in high:
+        print()
+        print(f"**API**: `{conflict['api_name']}`")
+        print(f"**Type**: {conflict['type']}")
+        print(f"**Issue**: {conflict['difference']}")
+        print(f"**Suggestion**: {conflict['suggestion']}")
+
+        if conflict['docs_info']:
+            print(f"\n**Documented as**:")
+            print(f"  Signature: {conflict['docs_info'].get('raw_signature', 'N/A')}")
+
+        if conflict['code_info']:
+            print(f"\n**Implemented as**:")
+            params = conflict['code_info'].get('parameters', [])
+            param_str = ', '.join(f"{p['name']}: {p.get('type_hint', 'Any')}" for p in params if p['name'] != 'self')
+            print(f"  Signature: {conflict['code_info']['name']}({param_str})")
+            print(f"  Return type: {conflict['code_info'].get('return_type', 'None')}")
+            print(f"  Location: {conflict['code_info'].get('source', 'N/A')}:{conflict['code_info'].get('line', '?')}")
+    print()
+
+# Show medium severity
+if medium:
+    print("🟡 **MEDIUM SEVERITY CONFLICTS** (Review recommended)")
+    print("-" * 70)
+    for conflict in medium[:3]:  # Show first 3
+        print()
+        print(f"**API**: `{conflict['api_name']}`")
+        print(f"**Type**: {conflict['type']}")
+        print(f"**Issue**: {conflict['difference']}")
+
+        if conflict['code_info']:
+            print(f"**Location**: {conflict['code_info'].get('source', 'N/A')}")
+
+    if len(medium) > 3:
+        print(f"\n   ... and {len(medium) - 3} more medium severity conflicts")
+    print()
+
+# Example: How conflicts appear in final skill
+print("=" * 70)
+print("HOW CONFLICTS APPEAR IN SKILL.MD")
+print("=" * 70)
+print()
+
+example_conflict = high[0] if high else medium[0] if medium else conflicts[0]
+
+print("```markdown")
+print("## 🔧 API Reference")
+print()
+print("### ⚠️ APIs with Conflicts")
+print()
+print(f"#### `{example_conflict['api_name']}`")
+print()
+print(f"⚠️ **Conflict**: {example_conflict['difference']}")
+print()
+
+if example_conflict.get('docs_info'):
+    print("**Documentation says:**")
+    print("```")
+    print(example_conflict['docs_info'].get('raw_signature', 'N/A'))
+    print("```")
+    print()
+
+if example_conflict.get('code_info'):
+    print("**Code implementation:**")
+    print("```python")
+    params = example_conflict['code_info'].get('parameters', [])
+    param_strs = []
+    for p in params:
+        if p['name'] == 'self':
+            continue
+        param_str = p['name']
+        if p.get('type_hint'):
+            param_str += f": {p['type_hint']}"
+        if p.get('default'):
+            param_str += f" = {p['default']}"
+        param_strs.append(param_str)
+
+    sig = f"def {example_conflict['code_info']['name']}({', '.join(param_strs)})"
+    if example_conflict['code_info'].get('return_type'):
+        sig += f" -> {example_conflict['code_info']['return_type']}"
+
+    print(sig)
+    print("```")
+print()
+
+print("*Source: both (conflict)*")
+print("```")
+print()
+
+# Key takeaways
+print("=" * 70)
+print("KEY TAKEAWAYS")
+print("=" * 70)
+print()
+
+print("✅ **What the Unified Scraper Does:**")
+print("   1. Extracts APIs from both documentation and code")
+print("   2. Compares them to detect discrepancies")
+print("   3. Classifies conflicts by type and severity")
+print("   4. Provides actionable suggestions")
+print("   5. Shows both versions transparently in the skill")
+print()
+
+print("⚠️ **Common Conflict Types:**")
+print("   - **Missing in docs**: Undocumented features in code")
+print("   - **Missing in code**: Documented but not implemented")
+print("   - **Signature mismatch**: Different parameters/types")
+print("   - **Description mismatch**: Different explanations")
+print()
+
+print("🎯 **Value:**")
+print("   - Identifies documentation gaps")
+print("   - Catches outdated documentation")
+print("   - Highlights implementation differences")
+print("   - Creates single source of truth showing reality")
+print()
+
+print("=" * 70)
+print("END OF DEMO")
+print("=" * 70)

+ 400 - 0
libs/external/Skill_Seekers-development/docs/CLAUDE.md

@@ -0,0 +1,400 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+This is a Python-based documentation scraper that converts ANY documentation website into a Claude skill. It's a single-file tool (`doc_scraper.py`) that scrapes documentation, extracts code patterns, detects programming languages, and generates structured skill files ready for use with Claude.
+
+## Dependencies
+
+```bash
+pip3 install requests beautifulsoup4
+```
+
+## Core Commands
+
+### Run with a preset configuration
+```bash
+python3 cli/doc_scraper.py --config configs/godot.json
+python3 cli/doc_scraper.py --config configs/react.json
+python3 cli/doc_scraper.py --config configs/vue.json
+python3 cli/doc_scraper.py --config configs/django.json
+python3 cli/doc_scraper.py --config configs/fastapi.json
+```
+
+### Interactive mode (for new frameworks)
+```bash
+python3 cli/doc_scraper.py --interactive
+```
+
+### Quick mode (minimal config)
+```bash
+python3 cli/doc_scraper.py --name react --url https://react.dev/ --description "React framework"
+```
+
+### Skip scraping (use cached data)
+```bash
+python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
+```
+
+### Resume interrupted scrapes
+```bash
+# If scrape was interrupted
+python3 cli/doc_scraper.py --config configs/godot.json --resume
+
+# Start fresh (clear checkpoint)
+python3 cli/doc_scraper.py --config configs/godot.json --fresh
+```
+
+### Large documentation (10K-40K+ pages)
+```bash
+# 1. Estimate page count
+python3 cli/estimate_pages.py configs/godot.json
+
+# 2. Split into focused sub-skills
+python3 cli/split_config.py configs/godot.json --strategy router
+
+# 3. Generate router skill
+python3 cli/generate_router.py configs/godot-*.json
+
+# 4. Package multiple skills
+python3 cli/package_multi.py output/godot*/
+```
+
+### AI-powered SKILL.md enhancement
+```bash
+# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
+pip3 install anthropic
+export ANTHROPIC_API_KEY=sk-ant-...
+python3 cli/doc_scraper.py --config configs/react.json --enhance
+
+# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
+python3 cli/doc_scraper.py --config configs/react.json --enhance-local
+
+# Option 3: Standalone after scraping (API-based)
+python3 cli/enhance_skill.py output/react/
+
+# Option 4: Standalone after scraping (LOCAL, no API key)
+python3 cli/enhance_skill_local.py output/react/
+```
+
+The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes reference files and enhances SKILL.md automatically. This requires Claude Code Max plan but no API key.
+
+### MCP Integration (Claude Code)
+```bash
+# One-time setup
+./setup_mcp.sh
+
+# Then in Claude Code, use natural language:
+"List all available configs"
+"Generate config for Tailwind at https://tailwindcss.com/docs"
+"Split configs/godot.json using router strategy"
+"Generate router for configs/godot-*.json"
+"Package skill at output/react/"
+```
+
+9 MCP tools available: list_configs, generate_config, validate_config, estimate_pages, scrape_docs, package_skill, upload_skill, split_config, generate_router
+
+### Test with limited pages (edit config first)
+Set `"max_pages": 20` in the config file to test with fewer pages.
+
+## Architecture
+
+### Single-File Design
+The entire tool is contained in `doc_scraper.py` (~737 lines). It follows a class-based architecture with a single `DocToSkillConverter` class that handles:
+- **Web scraping**: BFS traversal with URL validation
+- **Content extraction**: CSS selectors for title, content, code blocks
+- **Language detection**: Heuristic-based detection from code samples (Python, JavaScript, GDScript, C++, etc.)
+- **Pattern extraction**: Identifies common coding patterns from documentation
+- **Categorization**: Smart categorization using URL structure, page titles, and content keywords with scoring
+- **Skill generation**: Creates SKILL.md with real code examples and categorized reference files
+
+### Data Flow
+1. **Scrape Phase**:
+   - Input: Config JSON (name, base_url, selectors, url_patterns, categories, rate_limit, max_pages)
+   - Process: BFS traversal starting from base_url, respecting include/exclude patterns (sketched below)
+   - Output: `output/{name}_data/pages/*.json` + `summary.json`
+
+2. **Build Phase**:
+   - Input: Scraped JSON data from `output/{name}_data/`
+   - Process: Load pages → Smart categorize → Extract patterns → Generate references
+   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`
+
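+The scrape phase is essentially a polite breadth-first crawl. A minimal sketch of that loop is shown below; names and structure are illustrative, not the actual `scrape_all()` implementation:
+
+```python
+import time
+from collections import deque
+from urllib.parse import urljoin
+
+import requests
+from bs4 import BeautifulSoup
+
+def crawl(config):
+    """Breadth-first crawl honoring include/exclude patterns, rate_limit, and max_pages."""
+    queue = deque([config["base_url"]])
+    seen, pages = set(), []
+
+    def allowed(url):
+        inc, exc = config["url_patterns"]["include"], config["url_patterns"]["exclude"]
+        return (not inc or any(p in url for p in inc)) and not any(p in url for p in exc)
+
+    while queue and len(pages) < config["max_pages"]:
+        url = queue.popleft()
+        if url in seen or not allowed(url):
+            continue
+        seen.add(url)
+
+        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
+        node = soup.select_one(config["selectors"]["main_content"])
+        if node:
+            pages.append({"url": url, "text": node.get_text(" ", strip=True)})
+
+        # Enqueue same-site links discovered on this page
+        for a in soup.find_all("a", href=True):
+            link = urljoin(url, a["href"]).split("#")[0]
+            if link.startswith(config["base_url"]):
+                queue.append(link)
+
+        time.sleep(config["rate_limit"])  # be polite to the docs server
+    return pages
+```
+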
+### Directory Structure
+```
+Skill_Seekers/
+├── cli/                        # CLI tools
+│   ├── doc_scraper.py         # Main scraping & building tool
+│   ├── enhance_skill.py       # AI enhancement (API-based)
+│   ├── enhance_skill_local.py # AI enhancement (LOCAL, no API)
+│   ├── estimate_pages.py      # Page count estimator
+│   ├── split_config.py        # Large docs splitter (NEW)
+│   ├── generate_router.py     # Router skill generator (NEW)
+│   ├── package_skill.py       # Single skill packager
+│   └── package_multi.py       # Multi-skill packager (NEW)
+├── mcp/                        # MCP server
+│   ├── server.py              # 9 MCP tools (includes upload)
+│   └── README.md
+├── configs/                    # Preset configurations
+│   ├── godot.json
+│   ├── godot-large-example.json  # Large docs example (NEW)
+│   ├── react.json
+│   └── ...
+├── docs/                       # Documentation
+│   ├── CLAUDE.md              # Technical architecture (this file)
+│   ├── LARGE_DOCUMENTATION.md # Large docs guide (NEW)
+│   ├── ENHANCEMENT.md
+│   ├── MCP_SETUP.md
+│   └── ...
+└── output/                     # Generated output (git-ignored)
+    ├── {name}_data/           # Raw scraped data (cached)
+    │   ├── pages/             # Individual page JSONs
+    │   ├── summary.json       # Scraping summary
+    │   └── checkpoint.json    # Resume checkpoint (NEW)
+    └── {name}/                # Generated skill
+        ├── SKILL.md           # Main skill file with examples
+        ├── SKILL.md.backup    # Backup (if enhanced)
+        ├── references/        # Categorized documentation
+        │   ├── index.md
+        │   ├── getting_started.md
+        │   ├── api.md
+        │   └── ...
+        ├── scripts/           # Empty (for user scripts)
+        └── assets/            # Empty (for user assets)
+```
+
+### Configuration Format
+Config files in `configs/*.json` contain:
+- `name`: Skill identifier (e.g., "godot", "react")
+- `description`: When to use this skill
+- `base_url`: Starting URL for scraping
+- `selectors`: CSS selectors for content extraction
+  - `main_content`: Main documentation content (e.g., "article", "div[role='main']")
+  - `title`: Page title selector
+  - `code_blocks`: Code sample selector (e.g., "pre code", "pre")
+- `url_patterns`: URL filtering
+  - `include`: Only scrape URLs containing these patterns
+  - `exclude`: Skip URLs containing these patterns
+- `categories`: Keyword-based categorization mapping
+- `rate_limit`: Delay between requests (seconds)
+- `max_pages`: Maximum pages to scrape
+- `split_strategy`: (Optional) How to split large docs: "auto", "category", "router", "size"
+- `split_config`: (Optional) Split configuration
+  - `target_pages_per_skill`: Pages per sub-skill (default: 5000)
+  - `create_router`: Create router/hub skill (default: true)
+  - `split_by_categories`: Category names to split by
+- `checkpoint`: (Optional) Checkpoint/resume configuration
+  - `enabled`: Enable checkpointing (default: false)
+  - `interval`: Save every N pages (default: 1000)
+
+### Key Features
+
+**Auto-detect existing data**: Tool checks for `output/{name}_data/` and prompts to reuse, avoiding re-scraping.
+
+**Language detection**: Detects code languages from:
+1. CSS class attributes (`language-*`, `lang-*`)
+2. Heuristics (keywords like `def`, `const`, `func`, etc.)
+
+**Pattern extraction**: Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
+
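+A rough sketch of that scan is below; the pairing of markers with code blocks is positional and simplified here, and the input shape (page text plus a list of already-extracted code blocks) is an assumption:
+
+```python
+MARKERS = ("Example:", "Pattern:", "Usage:")
+
+def extract_patterns(text, code_blocks, limit=5):
+    """Pair marker lines with the page's code blocks, in order, up to `limit` patterns."""
+    patterns = []
+    blocks = iter(code_blocks)
+    for line in text.splitlines():
+        stripped = line.strip()
+        if stripped.startswith(MARKERS):           # str.startswith accepts a tuple
+            block = next(blocks, None)
+            if block is None:
+                break
+            patterns.append({"label": stripped, "code": block})
+            if len(patterns) >= limit:
+                break
+    return patterns
+```
+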
+**Smart categorization**:
+- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
+- Threshold of 2+ for categorization
+- Auto-infers categories from URL segments if none provided
+- Falls back to "other" category (see the sketch below)
+
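+A minimal sketch of that scoring rule, using the weights and threshold described above (the real logic is `smart_categorize()`; the page dict shape here is an assumption):
+
+```python
+def categorize(page, categories, threshold=2):
+    """Pick the best-scoring category: 3 points for a URL match, 2 for title, 1 for content."""
+    best, best_score = "other", 0
+    for category, keywords in categories.items():
+        score = 0
+        for kw in keywords:
+            if kw in page["url"].lower():
+                score += 3
+            if kw in page["title"].lower():
+                score += 2
+            if kw in page["content"].lower():
+                score += 1
+        if score > best_score:
+            best, best_score = category, score
+    return best if best_score >= threshold else "other"
+```
+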
+**Enhanced SKILL.md**: Generated with:
+- Real code examples from documentation (language-annotated)
+- Quick reference patterns extracted from docs
+- Common pattern section
+- Category file listings
+
+**AI-Powered Enhancement**: Two scripts to dramatically improve SKILL.md quality:
+- `enhance_skill.py`: Uses Anthropic API (~$0.15-$0.30 per skill, requires API key)
+- `enhance_skill_local.py`: Uses Claude Code Max (free, no API key needed)
+- Transforms generic 75-line templates into comprehensive 500+ line guides
+- Extracts best examples, explains key concepts, adds navigation guidance
+- Quality rating: 9/10 (based on the steam-economy test)
+
+**Large Documentation Support (NEW)**: Handle 10K-40K+ page documentation:
+- `split_config.py`: Split large configs into multiple focused sub-skills
+- `generate_router.py`: Create intelligent router/hub skills that direct queries
+- `package_multi.py`: Package multiple skills at once
+- 4 split strategies: auto, category, router, size
+- Parallel scraping support for faster processing
+- MCP integration for natural language usage
+
+**Checkpoint/Resume (NEW)**: Never lose progress on long scrapes:
+- Auto-saves every N pages (configurable, default: 1000)
+- Resume with `--resume` flag
+- Clear checkpoint with `--fresh` flag
+- Saves on interruption (Ctrl+C); a minimal sketch follows
+
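+A minimal sketch of how such a checkpoint could be persisted and resumed; the file name matches the directory structure above, but the exact checkpoint.json schema is an assumption:
+
+```python
+import json
+from pathlib import Path
+
+def save_checkpoint(data_dir, visited, queue, pages_done, interval=1000):
+    """Write crawl state every `interval` pages so --resume can pick up where it left off."""
+    if pages_done % interval == 0:
+        state = {"visited": sorted(visited), "queue": list(queue), "pages_done": pages_done}
+        (Path(data_dir) / "checkpoint.json").write_text(json.dumps(state))
+
+def load_checkpoint(data_dir):
+    """Return saved state, or None when starting fresh (--fresh simply deletes the file)."""
+    path = Path(data_dir) / "checkpoint.json"
+    return json.loads(path.read_text()) if path.exists() else None
+```
+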
+## Key Code Locations
+
+- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
+- **Content extraction**: `extract_content()` doc_scraper.py:64-131
+- **Language detection**: `detect_language()` doc_scraper.py:133-163
+- **Pattern extraction**: `extract_patterns()` doc_scraper.py:165-181
+- **Smart categorization**: `smart_categorize()` doc_scraper.py:280-321
+- **Category inference**: `infer_categories()` doc_scraper.py:323-349
+- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:351-370
+- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:424-540
+- **Scraping loop**: `scrape_all()` doc_scraper.py:226-249
+- **Main workflow**: `main()` doc_scraper.py:661-733
+
+## Workflow Examples
+
+### First time scraping (with scraping)
+```bash
+# 1. Scrape + Build
+python3 cli/doc_scraper.py --config configs/godot.json
+# Time: 20-40 minutes
+
+# 2. Package
+python3 cli/package_skill.py output/godot/
+
+# Result: godot.zip
+```
+
+### Using cached data (fast iteration)
+```bash
+# 1. Use existing data
+python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
+# Time: 1-3 minutes
+
+# 2. Package
+python3 cli/package_skill.py output/godot/
+```
+
+### Creating a new framework config
+```bash
+# Option 1: Interactive
+python3 cli/doc_scraper.py --interactive
+
+# Option 2: Copy and modify
+cp configs/react.json configs/myframework.json
+# Edit configs/myframework.json
+python3 cli/doc_scraper.py --config configs/myframework.json
+```
+
+### Large documentation workflow (40K pages)
+```bash
+# 1. Estimate page count (fast, 1-2 minutes)
+python3 cli/estimate_pages.py configs/godot.json
+
+# 2. Split into focused sub-skills
+python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000
+
+# Creates: godot-scripting.json, godot-2d.json, godot-3d.json, etc.
+
+# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
+for config in configs/godot-*.json; do
+  python3 cli/doc_scraper.py --config $config &
+done
+wait
+
+# 4. Generate intelligent router skill
+python3 cli/generate_router.py configs/godot-*.json
+
+# 5. Package all skills
+python3 cli/package_multi.py output/godot*/
+
+# 6. Upload all .zip files to Claude
+# Result: Router automatically directs queries to the right sub-skill!
+```
+
+**Time savings:** Parallel scraping reduces 20-40 hours to 4-8 hours
+
+**See full guide:** [Large Documentation Guide](LARGE_DOCUMENTATION.md)
+
+## Testing Selectors
+
+To find the right CSS selectors for a documentation site:
+
+```python
+from bs4 import BeautifulSoup
+import requests
+
+url = "https://docs.example.com/page"
+soup = BeautifulSoup(requests.get(url).content, 'html.parser')
+
+# Try different selectors
+print(soup.select_one('article'))
+print(soup.select_one('main'))
+print(soup.select_one('div[role="main"]'))
+```
+
+## Running Tests
+
+**IMPORTANT: You must install the package before running tests**
+
+```bash
+# 1. Install package in editable mode (one-time setup)
+pip install -e .
+
+# 2. Run all tests
+pytest
+
+# 3. Run specific test files
+pytest tests/test_config_validation.py
+pytest tests/test_github_scraper.py
+
+# 4. Run with verbose output
+pytest -v
+
+# 5. Run with coverage report
+pytest --cov=src/skill_seekers --cov-report=html
+```
+
+**Why install first?**
+- Tests import from `skill_seekers.cli` which requires the package to be installed
+- Modern Python packaging best practice (PEP 517/518)
+- CI/CD automatically installs with `pip install -e .`
+- conftest.py will show a helpful error if the package is not installed
+
+**Test Coverage:**
+- 391+ tests passing
+- 39% code coverage
+- All core features tested
+- CI/CD tests on Ubuntu + macOS with Python 3.10-3.12
+
+## Troubleshooting
+
+**No content extracted**: Check `main_content` selector. Common values: `article`, `main`, `div[role="main"]`, `div.content`
+
+**Poor categorization**: Edit `categories` section in config with better keywords specific to the documentation structure
+
+**Force re-scrape**: Delete cached data with `rm -rf output/{name}_data/`
+
+**Rate limiting issues**: Increase `rate_limit` value in config (e.g., from 0.5 to 1.0 seconds)
+
+## Output Quality Checks
+
+After building, verify quality:
+```bash
+cat output/godot/SKILL.md              # Should have real code examples
+cat output/godot/references/index.md   # Should show categories
+ls output/godot/references/            # Should have category .md files
+```
+
+## llms.txt Support
+
+Skill_Seekers automatically detects llms.txt files before HTML scraping:
+
+### Detection Order
+1. `{base_url}/llms-full.txt` (complete documentation)
+2. `{base_url}/llms.txt` (standard version)
+3. `{base_url}/llms-small.txt` (quick reference)
+
+### Benefits
+- ⚡ 10x faster (< 5 seconds vs 20-60 seconds)
+- ✅ More reliable (maintained by docs authors)
+- 🎯 Better quality (pre-formatted for LLMs)
+- 🚫 No rate limiting needed
+
+### Example Sites
+- Hono: https://hono.dev/llms-full.txt
+
+If no llms.txt is found, automatically falls back to HTML scraping.

+ 250 - 0
libs/external/Skill_Seekers-development/docs/ENHANCEMENT.md

@@ -0,0 +1,250 @@
+# AI-Powered SKILL.md Enhancement
+
+Two scripts are available to dramatically improve your SKILL.md file:
+1. **`enhance_skill_local.py`** - Uses Claude Code Max (no API key, **recommended**)
+2. **`enhance_skill.py`** - Uses Anthropic API (~$0.15-$0.30 per skill)
+
+Both analyze reference documentation and extract the best examples and guidance.
+
+## Why Use Enhancement?
+
+**Problem:** The auto-generated SKILL.md is often too generic:
+- Empty Quick Reference section
+- No practical code examples
+- Generic "When to Use" triggers
+- Doesn't highlight key features
+
+**Solution:** Let Claude read your reference docs and create a much better SKILL.md with:
+- ✅ Best code examples extracted from documentation
+- ✅ Practical quick reference with real patterns
+- ✅ Domain-specific guidance
+- ✅ Clear navigation tips
+- ✅ Key concepts explained
+
+## Quick Start (LOCAL - No API Key)
+
+**Recommended for Claude Code Max users:**
+
+```bash
+# Option 1: Standalone enhancement
+python3 cli/enhance_skill_local.py output/steam-inventory/
+
+# Option 2: Integrated with scraper
+python3 cli/doc_scraper.py --config configs/steam-inventory.json --enhance-local
+```
+
+**What happens:**
+1. Opens new terminal window
+2. Runs Claude Code with enhancement prompt
+3. Claude analyzes reference files (~15-20K chars)
+4. Generates enhanced SKILL.md (30-60 seconds)
+5. Terminal auto-closes when done
+
+**Requirements:**
+- Claude Code Max plan (you're already using it!)
+- macOS (auto-launch works); on other operating systems, run the command manually in a terminal
+
+## API-Based Enhancement (Alternative)
+
+**If you prefer API-based approach:**
+
+### Installation
+
+```bash
+pip3 install anthropic
+```
+
+### Setup API Key
+
+```bash
+# Option 1: Environment variable (recommended)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Option 2: Pass directly with --api-key
+python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...
+```
+
+### Usage
+
+```bash
+# Standalone enhancement
+python3 cli/enhance_skill.py output/steam-inventory/
+
+# Integrated with scraper
+python3 cli/doc_scraper.py --config configs/steam-inventory.json --enhance
+
+# Dry run (see what would be done)
+python3 cli/enhance_skill.py output/react/ --dry-run
+```
+
+## What It Does
+
+1. **Reads reference files** (api_reference.md, webapi.md, etc.)
+2. **Sends to Claude** with instructions to:
+   - Extract 5-10 best code examples
+   - Create practical quick reference
+   - Write domain-specific "When to Use" triggers
+   - Add helpful navigation guidance
+3. **Backs up original** SKILL.md to SKILL.md.backup
+4. **Saves enhanced version** as new SKILL.md (a condensed sketch follows)
+
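+A condensed sketch of that flow, assuming the Anthropic Python SDK's `messages.create` interface; the prompt wording and file handling are illustrative, not the exact enhance_skill.py behavior:
+
+```python
+import shutil
+from pathlib import Path
+
+import anthropic  # pip3 install anthropic
+
+def enhance(skill_dir):
+    skill_dir = Path(skill_dir)
+    references = "\n\n".join(p.read_text() for p in sorted((skill_dir / "references").glob("*.md")))
+
+    prompt = (
+        "Rewrite SKILL.md using these reference docs. Extract the best code examples, "
+        "build a practical quick reference, and add navigation guidance.\n\n" + references
+    )
+
+    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
+    reply = client.messages.create(
+        model="claude-sonnet-4-20250514",
+        max_tokens=4096,
+        messages=[{"role": "user", "content": prompt}],
+    )
+
+    skill_md = skill_dir / "SKILL.md"
+    shutil.copy(skill_md, skill_md.with_name("SKILL.md.backup"))  # keep the original
+    skill_md.write_text(reply.content[0].text)
+```
+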
+## Example Enhancement
+
+### Before (Auto-Generated)
+```markdown
+## Quick Reference
+
+### Common Patterns
+
+*Quick reference patterns will be added as you use the skill.*
+```
+
+### After (AI-Enhanced)
+````markdown
+## Quick Reference
+
+### Common API Patterns
+
+**Granting promotional items:**
+```cpp
+void CInventory::GrantPromoItems()
+{
+    SteamItemDef_t newItems[2];
+    newItems[0] = 110;
+    newItems[1] = 111;
+    SteamInventory()->AddPromoItems( &s_GenerateRequestResult, newItems, 2 );
+}
+```
+
+**Getting all items in player inventory:**
+```cpp
+SteamInventoryResult_t resultHandle;
+bool success = SteamInventory()->GetAllItems( &resultHandle );
+```
+[... 8 more practical examples ...]
+````
+
+## Cost Estimate
+
+- **Input**: ~50,000-100,000 tokens (reference docs)
+- **Output**: ~4,000 tokens (enhanced SKILL.md)
+- **Model**: claude-sonnet-4-20250514
+- **Estimated cost**: $0.15-$0.30 per skill
+
+## Troubleshooting
+
+### "No API key provided"
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...
+# or
+python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...
+```
+
+### "No reference files found"
+Make sure you've run the scraper first:
+```bash
+python3 cli/doc_scraper.py --config configs/react.json
+```
+
+### "anthropic package not installed"
+```bash
+pip3 install anthropic
+```
+
+### Don't like the result?
+```bash
+# Restore original
+mv output/steam-inventory/SKILL.md.backup output/steam-inventory/SKILL.md
+
+# Try again (it may generate different content)
+python3 cli/enhance_skill.py output/steam-inventory/
+```
+
+## Tips
+
+1. **Run after scraping completes** - Enhancement works best with complete reference docs
+2. **Review the output** - AI is good but not perfect, check the generated SKILL.md
+3. **Keep the backup** - Original is saved as SKILL.md.backup
+4. **Re-run if needed** - Each run may produce slightly different results
+5. **Works offline after first run** - Reference files are local
+
+## Real-World Results
+
+**Test Case: steam-economy skill**
+- **Before:** 75 lines, generic template, empty Quick Reference
+- **After:** 570 lines, 10 practical API examples, key concepts explained
+- **Time:** 60 seconds
+- **Quality Rating:** 9/10
+
+The LOCAL enhancement successfully:
+- Extracted best HTTP/JSON examples from 24 pages of documentation
+- Explained domain concepts (Asset Classes, Context IDs, Transaction Lifecycle)
+- Created navigation guidance for beginners through advanced users
+- Added best practices for security, economy design, and API integration
+
+## Limitations
+
+**LOCAL Enhancement (`enhance_skill_local.py`):**
+- Requires Claude Code Max plan
+- macOS auto-launch only (manual on other OS)
+- Opens new terminal window
+- Takes ~60 seconds
+
+**API Enhancement (`enhance_skill.py`):**
+- Requires Anthropic API key (paid)
+- Cost: ~$0.15-$0.30 per skill
+- Limited to ~100K tokens of reference input
+
+**Both:**
+- May occasionally miss the best examples
+- Can't understand context beyond the reference docs
+- Doesn't modify reference files (only SKILL.md)
+
+## Enhancement Options Comparison
+
+| Aspect | Manual Edit | LOCAL Enhancement | API Enhancement |
+|--------|-------------|-------------------|-----------------|
+| Time | 15-30 minutes | 30-60 seconds | 30-60 seconds |
+| Code examples | You pick | AI picks best | AI picks best |
+| Quick reference | Write yourself | Auto-generated | Auto-generated |
+| Domain guidance | Your knowledge | From docs | From docs |
+| Consistency | Varies | Consistent | Consistent |
+| Cost | Free (your time) | Free (Max plan) | ~$0.20 per skill |
+| Setup | None | None | API key needed |
+| Quality | High (if expert) | 9/10 | 9/10 |
+| **Recommended?** | For experts only | ✅ **Yes** | If no Max plan |
+
+## When to Use
+
+**Use enhancement when:**
+- You want high-quality SKILL.md quickly
+- Working with large documentation (50+ pages)
+- Creating skills for unfamiliar frameworks
+- Need practical code examples extracted
+- Want consistent quality across multiple skills
+
+**Skip enhancement when:**
+- Budget constrained (use manual editing)
+- Very small documentation (<10 pages)
+- You know the framework intimately
+- Documentation has no code examples
+
+## Advanced: Customization
+
+To customize how Claude enhances the SKILL.md, edit `enhance_skill.py` and modify the `_build_enhancement_prompt()` method around line 130.
+
+Example customization:
+```python
+prompt += """
+ADDITIONAL REQUIREMENTS:
+- Focus on security best practices
+- Include performance tips
+- Add troubleshooting section
+"""
+```
+
+## See Also
+
+- [README.md](../README.md) - Main documentation
+- [CLAUDE.md](CLAUDE.md) - Architecture guide
+- [doc_scraper.py](../cli/doc_scraper.py) - Main scraping tool

+ 431 - 0
libs/external/Skill_Seekers-development/docs/LARGE_DOCUMENTATION.md

@@ -0,0 +1,431 @@
+# Handling Large Documentation Sites (10K+ Pages)
+
+Complete guide for scraping and managing large documentation sites with Skill Seeker.
+
+---
+
+## Table of Contents
+
+- [When to Split Documentation](#when-to-split-documentation)
+- [Split Strategies](#split-strategies)
+- [Quick Start](#quick-start)
+- [Detailed Workflows](#detailed-workflows)
+- [Best Practices](#best-practices)
+- [Examples](#examples)
+- [Troubleshooting](#troubleshooting)
+
+---
+
+## When to Split Documentation
+
+### Size Guidelines
+
+| Documentation Size | Recommendation | Strategy |
+|-------------------|----------------|----------|
+| < 5,000 pages | **One skill** | No splitting needed |
+| 5,000 - 10,000 pages | **Consider splitting** | Category-based |
+| 10,000 - 30,000 pages | **Recommended** | Router + Categories |
+| 30,000+ pages | **Strongly recommended** | Router + Categories |
+
+### Why Split Large Documentation?
+
+**Benefits:**
+- ✅ Faster scraping (parallel execution)
+- ✅ More focused skills (better Claude performance)
+- ✅ Easier maintenance (update one topic at a time)
+- ✅ Better user experience (precise answers)
+- ✅ Avoids context window limits
+
+**Trade-offs:**
+- ⚠️ Multiple skills to manage
+- ⚠️ Initial setup more complex
+- ⚠️ Router adds one extra skill
+
+---
+
+## Split Strategies
+
+### 1. **No Split** (One Big Skill)
+**Best for:** Small to medium documentation (< 5K pages)
+
+```bash
+# Just use the config as-is
+python3 cli/doc_scraper.py --config configs/react.json
+```
+
+**Pros:** Simple, one skill to maintain
+**Cons:** Can be slow for large docs, may hit limits
+
+---
+
+### 2. **Category Split** (Multiple Focused Skills)
+**Best for:** 5K-15K pages with clear topic divisions
+
+```bash
+# Auto-split by categories
+python3 cli/split_config.py configs/godot.json --strategy category
+
+# Creates:
+# - godot-scripting.json
+# - godot-2d.json
+# - godot-3d.json
+# - godot-physics.json
+# - etc.
+```
+
+**Pros:** Focused skills, clear separation
+**Cons:** User must know which skill to use
+
+---
+
+### 3. **Router + Categories** (Intelligent Hub) ⭐ RECOMMENDED
+**Best for:** 10K+ pages, best user experience
+
+```bash
+# Create router + sub-skills
+python3 cli/split_config.py configs/godot.json --strategy router
+
+# Creates:
+# - godot.json (router/hub)
+# - godot-scripting.json
+# - godot-2d.json
+# - etc.
+```
+
+**Pros:** Best of both worlds, intelligent routing, natural UX
+**Cons:** Slightly more complex setup
+
+---
+
+### 4. **Size-Based Split**
+**Best for:** Docs without clear categories
+
+```bash
+# Split every 5000 pages
+python3 cli/split_config.py configs/bigdocs.json --strategy size --target-pages 5000
+
+# Creates:
+# - bigdocs-part1.json
+# - bigdocs-part2.json
+# - bigdocs-part3.json
+# - etc.
+```
+
+**Pros:** Simple, predictable
+**Cons:** May split related topics
+
+---
+
+## Quick Start
+
+### Option 1: Automatic (Recommended)
+
+```bash
+# 1. Create config
+python3 cli/doc_scraper.py --interactive
+# Name: godot
+# URL: https://docs.godotengine.org
+# ... fill in prompts ...
+
+# 2. Estimate pages (discovers it's large)
+python3 cli/estimate_pages.py configs/godot.json
+# Output: ⚠️  40,000 pages detected - splitting recommended
+
+# 3. Auto-split with router
+python3 cli/split_config.py configs/godot.json --strategy router
+
+# 4. Scrape all sub-skills
+for config in configs/godot-*.json; do
+  python3 cli/doc_scraper.py --config $config &
+done
+wait
+
+# 5. Generate router
+python3 cli/generate_router.py configs/godot-*.json
+
+# 6. Package all
+python3 cli/package_multi.py output/godot*/
+
+# 7. Upload all .zip files to Claude
+```
+
+---
+
+### Option 2: Manual Control
+
+```bash
+# 1. Define split in config
+nano configs/godot.json
+
+# Add:
+{
+  "split_strategy": "router",
+  "split_config": {
+    "target_pages_per_skill": 5000,
+    "create_router": true,
+    "split_by_categories": ["scripting", "2d", "3d", "physics"]
+  }
+}
+
+# 2. Split
+python3 cli/split_config.py configs/godot.json
+
+# 3. Continue as above...
+```
+
+---
+
+## Detailed Workflows
+
+### Workflow 1: Router + Categories (40K Pages)
+
+**Scenario:** Godot documentation (40,000 pages)
+
+**Step 1: Estimate**
+```bash
+python3 cli/estimate_pages.py configs/godot.json
+
+# Output:
+# Estimated: 40,000 pages
+# Recommended: Split into 8 skills (5K each)
+```
+
+**Step 2: Split Configuration**
+```bash
+python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000
+
+# Creates:
+# configs/godot.json (router)
+# configs/godot-scripting.json (5K pages)
+# configs/godot-2d.json (8K pages)
+# configs/godot-3d.json (10K pages)
+# configs/godot-physics.json (6K pages)
+# configs/godot-shaders.json (11K pages)
+```
+
+**Step 3: Scrape Sub-Skills (Parallel)**
+```bash
+# Open multiple terminals or use background jobs
+python3 cli/doc_scraper.py --config configs/godot-scripting.json &
+python3 cli/doc_scraper.py --config configs/godot-2d.json &
+python3 cli/doc_scraper.py --config configs/godot-3d.json &
+python3 cli/doc_scraper.py --config configs/godot-physics.json &
+python3 cli/doc_scraper.py --config configs/godot-shaders.json &
+
+# Wait for all to complete
+wait
+
+# Time: 4-8 hours (parallel) vs 20-40 hours (sequential)
+```
+
+**Step 4: Generate Router**
+```bash
+python3 cli/generate_router.py configs/godot-*.json
+
+# Creates:
+# output/godot/SKILL.md (router skill)
+```
+
+**Step 5: Package All**
+```bash
+python3 cli/package_multi.py output/godot*/
+
+# Creates:
+# output/godot.zip (router)
+# output/godot-scripting.zip
+# output/godot-2d.zip
+# output/godot-3d.zip
+# output/godot-physics.zip
+# output/godot-shaders.zip
+```
+
+**Step 6: Upload to Claude**
+Upload all 6 .zip files to Claude. The router will intelligently direct queries to the right sub-skill!
+
+---
+
+### Workflow 2: Category Split Only (15K Pages)
+
+**Scenario:** Vue.js documentation (15,000 pages)
+
+**No router needed - just focused skills:**
+
+```bash
+# 1. Split
+python3 cli/split_config.py configs/vue.json --strategy category
+
+# 2. Scrape each
+for config in configs/vue-*.json; do
+  python3 cli/doc_scraper.py --config $config
+done
+
+# 3. Package
+python3 cli/package_multi.py output/vue*/
+
+# 4. Upload all to Claude
+```
+
+**Result:** 5 focused Vue skills (components, reactivity, routing, etc.)
+
+---
+
+## Best Practices
+
+### 1. **Choose Target Size Wisely**
+
+```bash
+# Small focused skills (3K-5K pages) - more skills, very focused
+python3 cli/split_config.py config.json --target-pages 3000
+
+# Medium skills (5K-8K pages) - balanced (RECOMMENDED)
+python3 cli/split_config.py config.json --target-pages 5000
+
+# Larger skills (8K-10K pages) - fewer skills, broader
+python3 cli/split_config.py config.json --target-pages 8000
+```
+
+### 2. **Use Parallel Scraping**
+
+```bash
+# Serial (slow - 40 hours)
+for config in configs/godot-*.json; do
+  python3 cli/doc_scraper.py --config $config
+done
+
+# Parallel (fast - 8 hours) ⭐
+for config in configs/godot-*.json; do
+  python3 cli/doc_scraper.py --config $config &
+done
+wait
+```
+
+### 3. **Test Before Full Scrape**
+
+```bash
+# Test with limited pages first
+nano configs/godot-2d.json
+# Set: "max_pages": 50
+
+python3 cli/doc_scraper.py --config configs/godot-2d.json
+
+# If output looks good, increase to full
+```
+
+### 4. **Use Checkpoints for Long Scrapes**
+
+```bash
+# Enable checkpoints in config
+{
+  "checkpoint": {
+    "enabled": true,
+    "interval": 1000
+  }
+}
+
+# If scrape fails, resume
+python3 cli/doc_scraper.py --config config.json --resume
+```
+
+---
+
+## Examples
+
+### Example 1: AWS Documentation (Hypothetical 50K Pages)
+
+```bash
+# 1. Split by AWS services
+python3 cli/split_config.py configs/aws.json --strategy router --target-pages 5000
+
+# Creates ~10 skills:
+# - aws (router)
+# - aws-compute (EC2, Lambda)
+# - aws-storage (S3, EBS)
+# - aws-database (RDS, DynamoDB)
+# - etc.
+
+# 2. Scrape in parallel (overnight)
+# 3. Upload all skills to Claude
+# 4. User asks "How do I create an S3 bucket?"
+# 5. Router activates aws-storage skill
+# 6. Focused, accurate answer!
+```
+
+### Example 2: Microsoft Docs (100K+ Pages)
+
+```bash
+# Too large even with splitting - use selective categories
+
+# Only scrape key topics
+python3 cli/split_config.py configs/microsoft.json --strategy category
+
+# Edit configs to include only:
+# - microsoft-azure (Azure docs only)
+# - microsoft-dotnet (.NET docs only)
+# - microsoft-typescript (TS docs only)
+
+# Skip less relevant sections
+```
+
+---
+
+## Troubleshooting
+
+### Issue: "Splitting creates too many skills"
+
+**Solution:** Increase target size or combine categories
+
+```bash
+# Instead of 5K per skill, use 8K
+python3 cli/split_config.py config.json --target-pages 8000
+
+# Or manually combine categories in config
+```
+
+### Issue: "Router not routing correctly"
+
+**Solution:** Check routing keywords in router SKILL.md
+
+```bash
+# Review router
+cat output/godot/SKILL.md
+
+# Update keywords if needed
+nano output/godot/SKILL.md
+```
+
+### Issue: "Parallel scraping fails"
+
+**Solution:** Reduce parallelism or check rate limits
+
+```bash
+# Scrape 2-3 at a time instead of all
+python3 cli/doc_scraper.py --config config1.json &
+python3 cli/doc_scraper.py --config config2.json &
+wait
+
+python3 cli/doc_scraper.py --config config3.json &
+python3 cli/doc_scraper.py --config config4.json &
+wait
+```
+
+---
+
+## Summary
+
+**For 40K+ Page Documentation:**
+
+1. ✅ **Estimate first**: `python3 cli/estimate_pages.py config.json`
+2. ✅ **Split with router**: `python3 cli/split_config.py config.json --strategy router`
+3. ✅ **Scrape in parallel**: Multiple terminals or background jobs
+4. ✅ **Generate router**: `python3 cli/generate_router.py configs/*-*.json`
+5. ✅ **Package all**: `python3 cli/package_multi.py output/*/`
+6. ✅ **Upload to Claude**: All .zip files
+
+**Result:** Intelligent, fast, focused skills that work seamlessly together!
+
+---
+
+**Questions? See:**
+- [Main README](../README.md)
+- [MCP Setup Guide](MCP_SETUP.md)
+- [Enhancement Guide](ENHANCEMENT.md)

+ 60 - 0
libs/external/Skill_Seekers-development/docs/LLMS_TXT_SUPPORT.md

@@ -0,0 +1,60 @@
+# llms.txt Support
+
+## Overview
+
+Skill_Seekers now automatically detects and uses llms.txt files when available, providing 10x faster documentation ingestion.
+
+## What is llms.txt?
+
+The llms.txt convention is a growing standard where documentation sites provide pre-formatted, LLM-ready markdown files:
+
+- `llms-full.txt` - Complete documentation
+- `llms.txt` - Standard balanced version
+- `llms-small.txt` - Quick reference
+
+## How It Works
+
+1. Before HTML scraping, Skill_Seekers checks for llms.txt files
+2. If found, downloads and parses the markdown
+3. If not found, falls back to HTML scraping
+4. Zero config changes needed
+
+## Configuration
+
+### Automatic Detection (Recommended)
+
+No config changes needed. Just run normally:
+
+```bash
+python3 cli/doc_scraper.py --config configs/hono.json
+```
+
+### Explicit URL
+
+Optionally specify llms.txt URL:
+
+```json
+{
+  "name": "hono",
+  "llms_txt_url": "https://hono.dev/llms-full.txt",
+  "base_url": "https://hono.dev/docs"
+}
+```
+
+## Performance Comparison
+
+| Method | Time | Requests |
+|--------|------|----------|
+| HTML Scraping (20 pages) | 20-60s | 20+ |
+| llms.txt | < 5s | 1 |
+
+## Supported Sites
+
+Sites known to provide llms.txt:
+
+- Hono: https://hono.dev/llms-full.txt
+- (More to be discovered)
+
+## Fallback Behavior
+
+If llms.txt download or parsing fails, automatically falls back to HTML scraping with no user intervention required.
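+
+A minimal sketch of that detection order and fallback (function name and timeout are illustrative):
+
+```python
+import requests
+
+CANDIDATES = ("llms-full.txt", "llms.txt", "llms-small.txt")
+
+def fetch_llms_txt(base_url):
+    """Return the first llms.txt variant that exists, or None to trigger HTML scraping."""
+    for name in CANDIDATES:
+        url = base_url.rstrip("/") + "/" + name
+        try:
+            resp = requests.get(url, timeout=10)
+            if resp.ok and resp.text.strip():
+                return resp.text
+        except requests.RequestException:
+            continue
+    return None
+
+content = fetch_llms_txt("https://hono.dev")
+if content is None:
+    pass  # fall back to the normal HTML scraping path
+```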

+ 618 - 0
libs/external/Skill_Seekers-development/docs/MCP_SETUP.md

@@ -0,0 +1,618 @@
+# Complete MCP Setup Guide for Claude Code
+
+Step-by-step guide to set up the Skill Seeker MCP server with Claude Code.
+
+**✅ Fully Tested and Working**: All 9 MCP tools verified in production use with Claude Code
+- ✅ 34 comprehensive unit tests (100% pass rate)
+- ✅ Integration tested via actual Claude Code MCP protocol
+- ✅ All 9 tools working with natural language commands (includes upload support!)
+
+---
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Installation](#installation)
+- [Configuration](#configuration)
+- [Verification](#verification)
+- [Usage Examples](#usage-examples)
+- [Troubleshooting](#troubleshooting)
+- [Advanced Configuration](#advanced-configuration)
+
+---
+
+## Prerequisites
+
+### Required Software
+
+1. **Python 3.10 or higher**
+   ```bash
+   python3 --version
+   # Should show: Python 3.10.x or higher
+   ```
+
+2. **Claude Code installed**
+   - Download from [claude.ai/code](https://claude.ai/code)
+   - Requires Claude Pro or Claude Code Max subscription
+
+3. **Skill Seeker repository cloned**
+   ```bash
+   git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+   cd Skill_Seekers
+   ```
+
+### System Requirements
+
+- **Operating System**: macOS, Linux, or Windows (WSL)
+- **Disk Space**: 100 MB for dependencies + space for generated skills
+- **Network**: Internet connection for documentation scraping
+
+---
+
+## Installation
+
+### Step 1: Install Python Dependencies
+
+```bash
+# Navigate to repository root
+cd /path/to/Skill_Seekers
+
+# Install MCP server dependencies
+pip3 install -r skill_seeker_mcp/requirements.txt
+
+# Install CLI tool dependencies (for scraping)
+pip3 install requests beautifulsoup4
+```
+
+**Expected output:**
+```
+Successfully installed mcp-0.9.0 requests-2.31.0 beautifulsoup4-4.12.3
+```
+
+### Step 2: Verify Installation
+
+```bash
+# Test MCP server can start
+timeout 3 python3 skill_seeker_mcp/server.py || echo "Server OK (timeout expected)"
+
+# Should exit cleanly or timeout (both are normal)
+```
+
+**Optional: Run Tests**
+
+```bash
+# Install test dependencies
+pip3 install pytest
+
+# Run MCP server tests (25 tests)
+python3 -m pytest tests/test_mcp_server.py -v
+
+# Expected: 25 passed in ~0.3s
+```
+
+### Step 3: Note Your Repository Path
+
+```bash
+# Get absolute path
+pwd
+
+# Example output: /Users/username/Projects/Skill_Seekers
+# or: /home/username/Skill_Seekers
+```
+
+**Save this path** - you'll need it for configuration!
+
+---
+
+## Configuration
+
+### Step 1: Locate Claude Code MCP Configuration
+
+Claude Code stores MCP configuration in:
+
+- **macOS**: `~/.config/claude-code/mcp.json`
+- **Linux**: `~/.config/claude-code/mcp.json`
+- **Windows (WSL)**: `~/.config/claude-code/mcp.json`
+
+### Step 2: Create/Edit Configuration File
+
+```bash
+# Create config directory if it doesn't exist
+mkdir -p ~/.config/claude-code
+
+# Edit the configuration
+nano ~/.config/claude-code/mcp.json
+```
+
+### Step 3: Add Skill Seeker MCP Server
+
+**Full Configuration Example:**
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/Users/username/Projects/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/Users/username/Projects/Skill_Seekers",
+      "env": {}
+    }
+  }
+}
+```
+
+**IMPORTANT:** Replace `/Users/username/Projects/Skill_Seekers` with YOUR actual repository path!
+
+**If you already have other MCP servers:**
+
+```json
+{
+  "mcpServers": {
+    "existing-server": {
+      "command": "node",
+      "args": ["/path/to/existing/server.js"]
+    },
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/Users/username/Projects/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/Users/username/Projects/Skill_Seekers"
+    }
+  }
+}
+```
+
+### Step 4: Save and Restart Claude Code
+
+1. Save the file (`Ctrl+O` in nano, then `Enter`)
+2. Exit editor (`Ctrl+X` in nano)
+3. **Completely restart Claude Code** (quit and reopen)
+
+---
+
+## Verification
+
+### Step 1: Check MCP Server Loaded
+
+In Claude Code, type:
+```
+List all available MCP tools
+```
+
+You should see 9 Skill Seeker tools:
+- `generate_config`
+- `estimate_pages`
+- `scrape_docs`
+- `package_skill`
+- `upload_skill`
+- `list_configs`
+- `validate_config`
+- `split_config`
+- `generate_router`
+
+### Step 2: Test a Simple Command
+
+```
+List all available configs
+```
+
+**Expected response:**
+```
+Available configurations:
+1. godot - Godot Engine documentation
+2. react - React framework
+3. vue - Vue.js framework
+4. django - Django web framework
+5. fastapi - FastAPI Python framework
+6. kubernetes - Kubernetes documentation
+7. steam-economy-complete - Steam Economy API
+```
+
+### Step 3: Test Config Generation
+
+```
+Generate a config for Tailwind CSS at https://tailwindcss.com/docs
+```
+
+**Expected response:**
+```
+✅ Config created: configs/tailwind.json
+```
+
+**Verify the file exists:**
+```bash
+ls configs/tailwind.json
+```
+
+---
+
+## Usage Examples
+
+### Example 1: Generate Skill from Scratch
+
+```
+User: Generate config for Svelte docs at https://svelte.dev/docs
+
+Claude: ✅ Config created: configs/svelte.json
+
+User: Estimate pages for configs/svelte.json
+
+Claude: 📊 Estimated pages: 150
+        Recommended max_pages: 180
+
+User: Scrape docs using configs/svelte.json
+
+Claude: ✅ Skill created at output/svelte/
+        Run: python3 cli/package_skill.py output/svelte/
+
+User: Package skill at output/svelte/
+
+Claude: ✅ Created: output/svelte.zip
+        Ready to upload to Claude!
+```
+
+### Example 2: Use Existing Config
+
+```
+User: List all available configs
+
+Claude: [Shows 7 configs]
+
+User: Scrape docs using configs/react.json with max 50 pages
+
+Claude: ✅ Skill created at output/react/
+
+User: Package skill at output/react/
+
+Claude: ✅ Created: output/react.zip
+```
+
+### Example 3: Validate Before Scraping
+
+```
+User: Validate configs/godot.json
+
+Claude: ✅ Config is valid
+        - Base URL: https://docs.godotengine.org/en/stable/
+        - Max pages: 500
+        - Rate limit: 0.5s
+        - Categories: 3
+
+User: Estimate pages for configs/godot.json
+
+Claude: 📊 Estimated pages: 450
+        Current max_pages (500) is sufficient
+
+User: Scrape docs using configs/godot.json
+
+Claude: [Scraping starts...]
+```
+
+---
+
+## Troubleshooting
+
+### Issue: MCP Server Not Loading
+
+**Symptoms:**
+- Skill Seeker tools don't appear in Claude Code
+- No response when asking about configs
+
+**Solutions:**
+
+1. **Check configuration path:**
+   ```bash
+   cat ~/.config/claude-code/mcp.json
+   ```
+
+2. **Verify Python path:**
+   ```bash
+   which python3
+   # Should show: /usr/bin/python3 or /usr/local/bin/python3
+   ```
+
+3. **Test server manually:**
+   ```bash
+   cd /path/to/Skill_Seekers
+   python3 skill_seeker_mcp/server.py
+   # Should start without errors
+   ```
+
+4. **Check Claude Code logs:**
+   - macOS: `~/Library/Logs/Claude Code/`
+   - Linux: `~/.config/claude-code/logs/`
+
+5. **Completely restart Claude Code:**
+   - Quit Claude Code (don't just close window)
+   - Reopen Claude Code
+
+### Issue: "ModuleNotFoundError: No module named 'mcp'"
+
+**Solution:**
+```bash
+pip3 install -r skill_seeker_mcp/requirements.txt
+```
+
+### Issue: "Permission denied" when running server
+
+**Solution:**
+```bash
+chmod +x skill_seeker_mcp/server.py
+```
+
+### Issue: Tools appear but don't work
+
+**Symptoms:**
+- Tools listed but commands fail
+- "Error executing tool" messages
+
+**Solutions:**
+
+1. **Check working directory in config:**
+   ```json
+   {
+     "cwd": "/FULL/PATH/TO/Skill_Seekers"
+   }
+   ```
+
+2. **Verify CLI tools exist:**
+   ```bash
+   ls cli/doc_scraper.py
+   ls cli/estimate_pages.py
+   ls cli/package_skill.py
+   ```
+
+3. **Test CLI tools directly:**
+   ```bash
+   python3 cli/doc_scraper.py --help
+   ```
+
+### Issue: Slow or hanging operations
+
+**Solutions:**
+
+1. **Check rate limit in config:**
+   - Default: 0.5 seconds
+   - Increase if needed: 1.0 or 2.0 seconds
+
+2. **Use smaller max_pages for testing:**
+   ```
+   Generate config with max_pages=20 for testing
+   ```
+
+3. **Check network connection:**
+   ```bash
+   curl -I https://docs.example.com
+   ```
+
+---
+
+## Advanced Configuration
+
+### Custom Environment Variables
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": ["/path/to/Skill_Seekers/skill_seeker_mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers",
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-...",
+        "PYTHONPATH": "/custom/path"
+      }
+    }
+  }
+}
+```
+
+### Multiple Python Versions
+
+If you have multiple Python versions:
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "/usr/local/bin/python3.11",
+      "args": ["/path/to/Skill_Seekers/skill_seeker_mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers"
+    }
+  }
+}
+```
+
+### Virtual Environment
+
+To use a Python virtual environment:
+
+```bash
+# Create venv
+cd /path/to/Skill_Seekers
+python3 -m venv venv
+source venv/bin/activate
+pip install -r skill_seeker_mcp/requirements.txt
+pip install requests beautifulsoup4
+which python3
+# Copy this path for config
+```
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "/path/to/Skill_Seekers/venv/bin/python3",
+      "args": ["/path/to/Skill_Seekers/skill_seeker_mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers"
+    }
+  }
+}
+```
+
+### Debug Mode
+
+Enable verbose logging:
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "-u",
+        "/path/to/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/path/to/Skill_Seekers",
+      "env": {
+        "DEBUG": "1"
+      }
+    }
+  }
+}
+```
+
+---
+
+## Complete Example Configuration
+
+**Minimal (recommended for most users):**
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/Users/username/Projects/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/Users/username/Projects/Skill_Seekers"
+    }
+  }
+}
+```
+
+**With API enhancement:**
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/Users/username/Projects/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/Users/username/Projects/Skill_Seekers",
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-your-key-here"
+      }
+    }
+  }
+}
+```
+
+---
+
+## End-to-End Workflow
+
+### Complete Setup and First Skill
+
+```bash
+# 1. Install
+cd ~/Projects
+git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
+cd Skill_Seekers
+pip3 install -r skill_seeker_mcp/requirements.txt
+pip3 install requests beautifulsoup4
+
+# 2. Configure
+mkdir -p ~/.config/claude-code
+cat > ~/.config/claude-code/mcp.json << 'EOF'
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/Users/username/Projects/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/Users/username/Projects/Skill_Seekers"
+    }
+  }
+}
+EOF
+# (Replace paths with your actual paths!)
+
+# 3. Restart Claude Code
+
+# 4. Test in Claude Code:
+```
+
+**In Claude Code:**
+```
+User: List all available configs
+User: Scrape docs using configs/react.json with max 50 pages
+User: Package skill at output/react/
+```
+
+**Result:** `output/react.zip` ready to upload!
+
+---
+
+## Next Steps
+
+After successful setup:
+
+1. **Try preset configs:**
+   - React: `scrape docs using configs/react.json`
+   - Vue: `scrape docs using configs/vue.json`
+   - Django: `scrape docs using configs/django.json`
+
+2. **Create custom configs:**
+   - `generate config for [framework] at [url]`
+
+3. **Test with small limits first:**
+   - Use `max_pages` parameter: `scrape docs using configs/test.json with max 20 pages`
+
+4. **Explore enhancement:**
+   - Use `--enhance-local` flag for AI-powered SKILL.md improvement
+
+---
+
+## Getting Help
+
+- **Documentation**: See [mcp/README.md](../mcp/README.md)
+- **Issues**: [GitHub Issues](https://github.com/yusufkaraaslan/Skill_Seekers/issues)
+- **Examples**: See [.github/ISSUES_TO_CREATE.md](../.github/ISSUES_TO_CREATE.md) for test cases
+
+---
+
+## Quick Reference Card
+
+```
+SETUP:
+1. Install dependencies: pip3 install -r skill_seeker_mcp/requirements.txt
+2. Configure: ~/.config/claude-code/mcp.json
+3. Restart Claude Code
+
+VERIFY:
+- "List all available configs"
+- "Validate configs/react.json"
+
+GENERATE SKILL:
+1. "Generate config for [name] at [url]"
+2. "Estimate pages for configs/[name].json"
+3. "Scrape docs using configs/[name].json"
+4. "Package skill at output/[name]/"
+
+TROUBLESHOOTING:
+- Check: cat ~/.config/claude-code/mcp.json
+- Test: python3 skill_seeker_mcp/server.py
+- Logs: ~/Library/Logs/Claude Code/
+```
+
+---
+
+Happy skill creating! 🚀

+ 579 - 0
libs/external/Skill_Seekers-development/docs/PDF_ADVANCED_FEATURES.md

@@ -0,0 +1,579 @@
+# PDF Advanced Features Guide
+
+Comprehensive guide to advanced PDF extraction features (Priority 2 & 3).
+
+## Overview
+
+Skill Seeker's PDF extractor now includes powerful advanced features for handling complex PDF scenarios:
+
+**Priority 2 Features (More PDF Types):**
+- ✅ OCR support for scanned PDFs
+- ✅ Password-protected PDF support
+- ✅ Complex table extraction
+
+**Priority 3 Features (Performance Optimizations):**
+- ✅ Parallel page processing
+- ✅ Intelligent caching of expensive operations
+
+## Table of Contents
+
+1. [OCR Support for Scanned PDFs](#ocr-support)
+2. [Password-Protected PDFs](#password-protected-pdfs)
+3. [Table Extraction](#table-extraction)
+4. [Parallel Processing](#parallel-processing)
+5. [Caching](#caching)
+6. [Combined Usage](#combined-usage)
+7. [Performance Benchmarks](#performance-benchmarks)
+
+---
+
+## OCR Support
+
+Extract text from scanned PDFs using Optical Character Recognition.
+
+### Installation
+
+```bash
+# Install Tesseract OCR engine
+# Ubuntu/Debian
+sudo apt-get install tesseract-ocr
+
+# macOS
+brew install tesseract
+
+# Install Python packages
+pip install pytesseract Pillow
+```
+
+### Usage
+
+```bash
+# Basic OCR
+python3 cli/pdf_extractor_poc.py scanned.pdf --ocr
+
+# OCR with other options
+python3 cli/pdf_extractor_poc.py scanned.pdf --ocr --verbose -o output.json
+
+# Full skill creation with OCR
+python3 cli/pdf_scraper.py --pdf scanned.pdf --name myskill --ocr
+```
+
+### How It Works
+
+1. **Detection**: For each page, checks if text content is < 50 characters
+2. **Fallback**: If low text detected and OCR enabled, renders page as image
+3. **Processing**: Runs Tesseract OCR on the image
+4. **Selection**: Uses OCR text if it's longer than extracted text
+5. **Logging**: Shows OCR extraction results in verbose mode (see the sketch below)
+
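+A condensed sketch of that fallback, assuming PyMuPDF (`fitz`), Pillow, and pytesseract as installed above; the 50-character threshold matches the description, the rest is illustrative:
+
+```python
+import io
+
+import fitz  # PyMuPDF
+import pytesseract
+from PIL import Image
+
+def page_text_with_ocr(page, min_chars=50):
+    """Run OCR only when the embedded text layer looks too small to be real."""
+    text = page.get_text()
+    if len(text.strip()) >= min_chars:
+        return text
+
+    pix = page.get_pixmap(dpi=300)                  # render the page as an image
+    img = Image.open(io.BytesIO(pix.tobytes("png")))
+    ocr_text = pytesseract.image_to_string(img)
+    return ocr_text if len(ocr_text) > len(text) else text
+```
+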
+### Example Output
+
+```
+📄 Extracting from: scanned.pdf
+   Pages: 50
+   OCR: ✅ enabled
+
+  Page 1: 245 chars, 0 code blocks, 2 headings, 0 images, 0 tables
+   OCR extracted 245 chars (was 12)
+  Page 2: 389 chars, 1 code blocks, 3 headings, 0 images, 0 tables
+   OCR extracted 389 chars (was 5)
+```
+
+### Limitations
+
+- Requires Tesseract installed on system
+- Slower than regular text extraction (~2-5 seconds per page)
+- Quality depends on PDF scan quality
+- Works best with high-resolution scans
+
+### Best Practices
+
+- Use `--parallel` with OCR for faster processing
+- Combine with `--verbose` to see OCR progress
+- Test on a few pages first before processing large documents
+
+---
+
+## Password-Protected PDFs
+
+Handle encrypted PDFs with password protection.
+
+### Usage
+
+```bash
+# Basic usage
+python3 cli/pdf_extractor_poc.py encrypted.pdf --password mypassword
+
+# With full workflow
+python3 cli/pdf_scraper.py --pdf encrypted.pdf --name myskill --password mypassword
+```
+
+### How It Works
+
+1. **Detection**: Checks if PDF is encrypted (`doc.is_encrypted`)
+2. **Authentication**: Attempts to authenticate with provided password
+3. **Validation**: Returns error if password is incorrect or missing
+4. **Processing**: Continues normal extraction if authentication succeeds (sketched below)
+
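+A minimal sketch of that check with PyMuPDF (error handling simplified):
+
+```python
+import fitz  # PyMuPDF
+
+def open_pdf(path, password=None):
+    doc = fitz.open(path)
+    if doc.is_encrypted:
+        if not password:
+            raise SystemExit("PDF is encrypted but no password provided (use --password)")
+        if not doc.authenticate(password):   # returns 0 on failure
+            raise SystemExit("Invalid password")
+    return doc
+```
+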
+### Example Output
+
+```
+📄 Extracting from: encrypted.pdf
+   🔐 PDF is encrypted, trying password...
+   ✅ Password accepted
+   Pages: 100
+   Metadata: {...}
+```
+
+### Error Handling
+
+```
+# Missing password
+❌ PDF is encrypted but no password provided
+   Use --password option to provide password
+
+# Wrong password
+❌ Invalid password
+```
+
+### Security Notes
+
+- Password is passed via command line (visible in process list)
+- For sensitive documents, consider environment variables
+- Password is not stored in output JSON
+
+---
+
+## Table Extraction
+
+Extract tables from PDFs and include them in skill references.
+
+### Usage
+
+```bash
+# Extract tables
+python3 cli/pdf_extractor_poc.py data.pdf --extract-tables
+
+# With other options
+python3 cli/pdf_extractor_poc.py data.pdf --extract-tables --verbose -o output.json
+
+# Full skill creation with tables
+python3 cli/pdf_scraper.py --pdf data.pdf --name myskill --extract-tables
+```
+
+### How It Works
+
+1. **Detection**: Uses PyMuPDF's `find_tables()` method
+2. **Extraction**: Extracts each table's data as a 2D array (rows × columns)
+3. **Metadata**: Captures the bounding box, row count, and column count
+4. **Integration**: Includes tables in the page data and summary (see the sketch below)
+
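+A minimal sketch of this flow using PyMuPDF's table finder; the field names mirror the output structure below, but the real extractor may differ in detail:
+
+```python
+import fitz  # PyMuPDF
+
+def extract_tables_from_page(page):
+    """Return table data and metadata for every table found on a page."""
+    tables = []
+    for i, table in enumerate(page.find_tables().tables):
+        rows = table.extract()  # 2D list: rows x columns
+        tables.append({
+            "table_index": i,
+            "rows": rows,
+            "bbox": list(table.bbox),
+            "row_count": len(rows),
+            "col_count": len(rows[0]) if rows else 0,
+        })
+    return tables
+```
+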
+### Example Output
+
+```
+📄 Extracting from: data.pdf
+   Table extraction: ✅ enabled
+
+  Page 5: 892 chars, 2 code blocks, 4 headings, 0 images, 2 tables
+   Found table 0: 10x4
+   Found table 1: 15x6
+
+✅ Extraction complete:
+   Tables found: 25
+```
+
+### Table Data Structure
+
+```json
+{
+  "tables": [
+    {
+      "table_index": 0,
+      "rows": [
+        ["Header 1", "Header 2", "Header 3"],
+        ["Data 1", "Data 2", "Data 3"],
+        ...
+      ],
+      "bbox": [x0, y0, x1, y1],
+      "row_count": 10,
+      "col_count": 4
+    }
+  ]
+}
+```
+
+### Integration with Skills
+
+Tables are automatically included in reference files when building skills:
+
+```markdown
+## Data Tables
+
+### Table 1 (Page 5)
+| Header 1 | Header 2 | Header 3 |
+|----------|----------|----------|
+| Data 1   | Data 2   | Data 3   |
+```
+
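+The conversion from extracted rows to a markdown table can be done along these lines; this is an illustrative helper (assuming the first row is the header), not the skill builder's actual code:
+
+```python
+def table_to_markdown(rows):
+    """Render a 2D table (first row treated as the header) as markdown."""
+    if not rows:
+        return ""
+    header, *body = rows
+    lines = [
+        "| " + " | ".join(str(c) for c in header) + " |",
+        "|" + "|".join("----------" for _ in header) + "|",
+    ]
+    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in body]
+    return "\n".join(lines)
+```
+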
+### Limitations
+
+- Quality depends on PDF table structure
+- Works best with well-formatted tables
+- Complex merged cells may not extract correctly
+
+---
+
+## Parallel Processing
+
+Process pages in parallel for 3x faster extraction.
+
+### Usage
+
+```bash
+# Enable parallel processing (auto-detects CPU count)
+python3 cli/pdf_extractor_poc.py large.pdf --parallel
+
+# Specify worker count
+python3 cli/pdf_extractor_poc.py large.pdf --parallel --workers 8
+
+# With full workflow
+python3 cli/pdf_scraper.py --pdf large.pdf --name myskill --parallel --workers 8
+```
+
+### How It Works
+
+1. **Worker Pool**: Creates a ThreadPoolExecutor with N workers
+2. **Distribution**: Distributes pages across the workers
+3. **Extraction**: Each worker processes its pages independently
+4. **Collection**: Results are collected and merged in page order
+5. **Threshold**: Only activates for PDFs with more than 5 pages (see the sketch below)
+
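+A minimal sketch of this pattern, assuming a per-page extraction callable that takes a 0-based page index; the real extractor wires this into its own methods:
+
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+def extract_all_pages(extract_one_page, num_pages, max_workers=8):
+    """Extract pages concurrently; results come back in page order."""
+    if num_pages <= 5:
+        # Small PDFs: thread-pool overhead outweighs the benefit
+        return [extract_one_page(i) for i in range(num_pages)]
+
+    with ThreadPoolExecutor(max_workers=max_workers) as pool:
+        return list(pool.map(extract_one_page, range(num_pages)))
+```
+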
+### Example Output
+
+```
+📄 Extracting from: large.pdf
+   Pages: 500
+   Parallel processing: ✅ enabled (8 workers)
+
+🚀 Extracting 500 pages in parallel (8 workers)...
+
+✅ Extraction complete:
+   Total characters: 1,250,000
+   Code blocks found: 450
+```
+
+### Performance
+
+| Pages | Sequential | Parallel (4 workers) | Parallel (8 workers) |
+|-------|-----------|---------------------|---------------------|
+| 50    | 25s       | 10s (2.5x)          | 8s (3.1x)           |
+| 100   | 50s       | 18s (2.8x)          | 15s (3.3x)          |
+| 500   | 4m 10s    | 1m 30s (2.8x)       | 1m 15s (3.3x)       |
+| 1000  | 8m 20s    | 3m 00s (2.8x)       | 2m 30s (3.3x)       |
+
+### Best Practices
+
+- Use `--workers` equal to CPU core count
+- Combine with `--no-cache` for first-time processing
+- Monitor system resources (RAM, CPU)
+- Not recommended for very large images (memory intensive)
+
+### Limitations
+
+- Requires `concurrent.futures` (Python 3.2+)
+- Uses more memory (N workers × page size)
+- May not be beneficial for PDFs with many large images
+
+---
+
+## Caching
+
+Intelligent caching of expensive operations for faster re-extraction.
+
+### Usage
+
+```bash
+# Caching enabled by default
+python3 cli/pdf_extractor_poc.py input.pdf
+
+# Disable caching
+python3 cli/pdf_extractor_poc.py input.pdf --no-cache
+```
+
+### How It Works
+
+1. **Cache Key**: Each page is cached by page number
+2. **Check**: Before extraction, looks up the page in the cache
+3. **Store**: After extraction, stores the result in the cache
+4. **Reuse**: On re-run, returns the cached data instantly (see the sketch below)
+
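+Conceptually, the cache is a per-page dictionary checked before each extraction; a minimal sketch, not the extractor's exact code:
+
+```python
+page_cache = {}  # in-memory only; cleared when the process exits
+
+def extract_page_cached(page_num, extract_one_page):
+    """Return cached page data if present, otherwise extract and store it."""
+    if page_num in page_cache:
+        return page_cache[page_num]        # "Page N: Using cached data"
+    data = extract_one_page(page_num)      # expensive: text, code, tables, ...
+    page_cache[page_num] = data
+    return data
+```
+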
+### What Gets Cached
+
+- Page text and markdown
+- Code block detection results
+- Language detection results
+- Quality scores
+- Image extraction results
+- Table extraction results
+
+### Example Output
+
+```
+  Page 1: Using cached data
+  Page 2: Using cached data
+  Page 3: 892 chars, 2 code blocks, 4 headings, 0 images, 0 tables
+```
+
+### Cache Lifetime
+
+- In-memory only (cleared when process exits)
+- Useful for:
+  - Testing extraction parameters
+  - Re-running with different filters
+  - Development and debugging
+
+### When to Disable
+
+- First-time extraction
+- PDF file has changed
+- Different extraction options
+- Memory constraints
+
+---
+
+## Combined Usage
+
+### Maximum Performance
+
+Extract everything as fast as possible:
+
+```bash
+python3 cli/pdf_scraper.py \
+  --pdf docs/manual.pdf \
+  --name myskill \
+  --extract-images \
+  --extract-tables \
+  --parallel \
+  --workers 8 \
+  --min-quality 5.0
+```
+
+### Scanned PDF with Tables
+
+```bash
+python3 cli/pdf_scraper.py \
+  --pdf docs/scanned.pdf \
+  --name myskill \
+  --ocr \
+  --extract-tables \
+  --parallel \
+  --workers 4
+```
+
+### Encrypted PDF with All Features
+
+```bash
+python3 cli/pdf_scraper.py \
+  --pdf docs/encrypted.pdf \
+  --name myskill \
+  --password mypassword \
+  --extract-images \
+  --extract-tables \
+  --parallel \
+  --workers 8 \
+  --verbose
+```
+
+---
+
+## Performance Benchmarks
+
+### Test Setup
+
+- **Hardware**: 8-core CPU, 16GB RAM
+- **PDF**: 500-page technical manual
+- **Content**: Mixed text, code, images, tables
+
+### Results
+
+| Configuration | Time | Speedup |
+|--------------|------|---------|
+| Basic (sequential) | 4m 10s | 1.0x (baseline) |
+| + Caching | 2m 30s | 1.7x |
+| + Parallel (4 workers) | 1m 30s | 2.8x |
+| + Parallel (8 workers) | 1m 15s | 3.3x |
+| + All optimizations | 1m 10s | 3.6x |
+
+### Feature Overhead
+
+| Feature | Time Impact | Memory Impact |
+|---------|------------|---------------|
+| OCR | +2-5s per page | +50MB per page |
+| Table extraction | +0.5s per page | +10MB |
+| Image extraction | +0.2s per image | Varies |
+| Parallel (8 workers) | -66% total time | +8x memory |
+| Caching | -50% on re-run | +100MB |
+
+---
+
+## Troubleshooting
+
+### OCR Issues
+
+**Problem**: `pytesseract not found`
+
+```bash
+# Install pytesseract
+pip install pytesseract
+
+# Install Tesseract engine
+sudo apt-get install tesseract-ocr  # Ubuntu
+brew install tesseract               # macOS
+```
+
+**Problem**: Low OCR quality
+
+- Use higher DPI PDFs
+- Check scan quality
+- Try different Tesseract language packs
+
+### Parallel Processing Issues
+
+**Problem**: Out of memory errors
+
+```bash
+# Reduce worker count
+python3 cli/pdf_extractor_poc.py large.pdf --parallel --workers 2
+
+# Or disable parallel
+python3 cli/pdf_extractor_poc.py large.pdf
+```
+
+**Problem**: Not faster than sequential
+
+- Check CPU usage (may be I/O bound)
+- Try with larger PDFs (> 50 pages)
+- Monitor system resources
+
+### Table Extraction Issues
+
+**Problem**: Tables not detected
+
+- Check if tables are actual tables (not images)
+- Try different PDF viewers to verify structure
+- Use `--verbose` to see detection attempts
+
+**Problem**: Malformed table data
+
+- Complex merged cells may not extract correctly
+- Try extracting specific pages only
+- Manual post-processing may be needed
+
+---
+
+## Best Practices
+
+### For Large PDFs (500+ pages)
+
+1. Use parallel processing:
+   ```bash
+   python3 cli/pdf_scraper.py --pdf large.pdf --parallel --workers 8
+   ```
+
+2. Extract to JSON first, then build skill:
+   ```bash
+   python3 cli/pdf_extractor_poc.py large.pdf -o extracted.json --parallel
+   python3 cli/pdf_scraper.py --from-json extracted.json --name myskill
+   ```
+
+3. Monitor system resources
+
+### For Scanned PDFs
+
+1. Use OCR with parallel processing:
+   ```bash
+   python3 cli/pdf_scraper.py --pdf scanned.pdf --ocr --parallel --workers 4
+   ```
+
+2. Test on sample pages first
+3. Use `--verbose` to monitor OCR performance
+
+### For Encrypted PDFs
+
+1. Use environment variable for password:
+   ```bash
+   export PDF_PASSWORD="mypassword"
+   python3 cli/pdf_scraper.py --pdf encrypted.pdf --password "$PDF_PASSWORD"
+   ```
+
+2. Clear your shell history afterwards so the exported password is not retained
+
+### For PDFs with Tables
+
+1. Enable table extraction:
+   ```bash
+   python3 cli/pdf_scraper.py --pdf data.pdf --extract-tables
+   ```
+
+2. Check table quality in output JSON
+3. Manual review recommended for critical data
+
+---
+
+## API Reference
+
+### PDFExtractor Class
+
+```python
+from pdf_extractor_poc import PDFExtractor
+
+extractor = PDFExtractor(
+    pdf_path="input.pdf",
+    verbose=True,
+    chunk_size=10,
+    min_quality=5.0,
+    extract_images=True,
+    image_dir="images/",
+    min_image_size=100,
+    # Advanced features
+    use_ocr=True,
+    password="mypassword",
+    extract_tables=True,
+    parallel=True,
+    max_workers=8,
+    use_cache=True
+)
+
+result = extractor.extract_all()
+```
+
+### Configuration Options
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `pdf_path` | str | required | Path to PDF file |
+| `verbose` | bool | False | Enable verbose logging |
+| `chunk_size` | int | 10 | Pages per chunk |
+| `min_quality` | float | 0.0 | Min code quality (0-10) |
+| `extract_images` | bool | False | Extract images to files |
+| `image_dir` | str | None | Image output directory |
+| `min_image_size` | int | 100 | Min image dimension |
+| `use_ocr` | bool | False | Enable OCR |
+| `password` | str | None | PDF password |
+| `extract_tables` | bool | False | Extract tables |
+| `parallel` | bool | False | Parallel processing |
+| `max_workers` | int | CPU count | Worker threads |
+| `use_cache` | bool | True | Enable caching |
+
+---
+
+## Summary
+
+✅ **6 Advanced Features** implemented (Priority 2 & 3)
+✅ **3x Performance Boost** with parallel processing
+✅ **OCR Support** for scanned PDFs
+✅ **Password Protection** support
+✅ **Table Extraction** from complex PDFs
+✅ **Intelligent Caching** for faster re-runs
+
+The PDF extractor now handles virtually any PDF scenario with maximum performance!

+ 521 - 0
libs/external/Skill_Seekers-development/docs/PDF_CHUNKING.md

@@ -0,0 +1,521 @@
+# PDF Page Detection and Chunking (Task B1.3)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Task:** B1.3 - Add PDF page detection and chunking
+
+---
+
+## Overview
+
+Task B1.3 enhances the PDF extractor with intelligent page chunking and chapter detection capabilities. This allows large PDF documentation to be split into manageable, logical sections for better processing and organization.
+
+## New Features
+
+### ✅ 1. Page Chunking
+
+Break large PDFs into smaller, manageable chunks:
+- Configurable chunk size (default: 10 pages per chunk)
+- Smart chunking that respects chapter boundaries
+- Chunk metadata includes page ranges and chapter titles
+
+**Usage:**
+```bash
+# Default chunking (10 pages per chunk)
+python3 cli/pdf_extractor_poc.py input.pdf
+
+# Custom chunk size (20 pages per chunk)
+python3 cli/pdf_extractor_poc.py input.pdf --chunk-size 20
+
+# Disable chunking (single chunk with all pages)
+python3 cli/pdf_extractor_poc.py input.pdf --chunk-size 0
+```
+
+### ✅ 2. Chapter/Section Detection
+
+Automatically detect chapter and section boundaries:
+- Detects H1 and H2 headings as chapter markers
+- Recognizes common chapter patterns:
+  - "Chapter 1", "Chapter 2", etc.
+  - "Part 1", "Part 2", etc.
+  - "Section 1", "Section 2", etc.
+  - Numbered sections like "1. Introduction"
+
+**Chapter Detection Logic:**
+1. Check for H1/H2 headings at page start
+2. Pattern match against common chapter formats
+3. Extract chapter title for metadata
+
+### ✅ 3. Code Block Merging
+
+Intelligently merge code blocks split across pages:
+- Detects when code continues from one page to the next
+- Checks language and detection method consistency
+- Looks for continuation indicators:
+  - Doesn't end with `}`, `;`
+  - Ends with `,`, `\`
+  - Incomplete syntax structures
+
+**Example:**
+```
+Page 5:  def calculate_total(items):
+             total = 0
+             for item in items:
+
+Page 6:         total += item.price
+             return total
+```
+
+The merger will combine these into a single code block.
+
+---
+
+## Output Format
+
+### Enhanced JSON Structure
+
+The output now includes chunking and chapter information:
+
+```json
+{
+  "source_file": "manual.pdf",
+  "metadata": { ... },
+  "total_pages": 150,
+  "total_chunks": 15,
+  "chapters": [
+    {
+      "title": "Getting Started",
+      "start_page": 1,
+      "end_page": 12
+    },
+    {
+      "title": "API Reference",
+      "start_page": 13,
+      "end_page": 45
+    }
+  ],
+  "chunks": [
+    {
+      "chunk_number": 1,
+      "start_page": 1,
+      "end_page": 12,
+      "chapter_title": "Getting Started",
+      "pages": [ ... ]
+    },
+    {
+      "chunk_number": 2,
+      "start_page": 13,
+      "end_page": 22,
+      "chapter_title": "API Reference",
+      "pages": [ ... ]
+    }
+  ],
+  "pages": [ ... ]
+}
+```
+
+### Chunk Object
+
+Each chunk contains:
+- `chunk_number` - Sequential chunk identifier (1-indexed)
+- `start_page` - First page in chunk (1-indexed)
+- `end_page` - Last page in chunk (1-indexed)
+- `chapter_title` - Detected chapter title (if any)
+- `pages` - Array of page objects in this chunk
+
+### Merged Code Block Indicator
+
+Code blocks merged from multiple pages include a flag:
+```json
+{
+  "code": "def example():\n    ...",
+  "language": "python",
+  "detection_method": "font",
+  "merged_from_next_page": true
+}
+```
+
+---
+
+## Implementation Details
+
+### Chapter Detection Algorithm
+
+```python
+def detect_chapter_start(self, page_data):
+    """
+    Detect if a page starts a new chapter/section.
+
+    Returns (is_chapter_start, chapter_title) tuple.
+    """
+    # Check H1/H2 headings first
+    headings = page_data.get('headings', [])
+    if headings:
+        first_heading = headings[0]
+        if first_heading['level'] in ['h1', 'h2']:
+            return True, first_heading['text']
+
+    # Pattern match against common chapter formats
+    text = page_data.get('text', '')
+    first_line = text.split('\n')[0] if text else ''
+
+    chapter_patterns = [
+        r'^Chapter\s+\d+',
+        r'^Part\s+\d+',
+        r'^Section\s+\d+',
+        r'^\d+\.\s+[A-Z]',  # "1. Introduction"
+    ]
+
+    for pattern in chapter_patterns:
+        if re.match(pattern, first_line, re.IGNORECASE):
+            return True, first_line.strip()
+
+    return False, None
+```
+
+### Code Block Merging Algorithm
+
+```python
+def merge_continued_code_blocks(self, pages):
+    """
+    Merge code blocks that are split across pages.
+    """
+    for i in range(len(pages) - 1):
+        current_page = pages[i]
+        next_page = pages[i + 1]
+
+        # Skip pages that have no code blocks
+        if not current_page['code_samples'] or not next_page['code_samples']:
+            continue
+
+        # Get last code block of current page
+        last_code = current_page['code_samples'][-1]
+
+        # Get first code block of next page
+        first_next_code = next_page['code_samples'][0]
+
+        # Check if they're likely the same code block
+        if (last_code['language'] == first_next_code['language'] and
+            last_code['detection_method'] == first_next_code['detection_method']):
+
+            # Check for continuation indicators
+            last_code_text = last_code['code'].rstrip()
+            continuation_indicators = [
+                not last_code_text.endswith('}'),
+                not last_code_text.endswith(';'),
+                last_code_text.endswith(','),
+                last_code_text.endswith('\\'),
+            ]
+
+            if any(continuation_indicators):
+                # Merge the blocks
+                merged_code = last_code['code'] + '\n' + first_next_code['code']
+                last_code['code'] = merged_code
+                last_code['merged_from_next_page'] = True
+
+                # Remove duplicate from next page
+                next_page['code_samples'].pop(0)
+
+    return pages
+```
+
+### Chunking Algorithm
+
+```python
+def create_chunks(self, pages):
+    """
+    Create chunks of pages respecting chapter boundaries.
+    """
+    chunks = []
+    current_chunk = []
+    current_chapter = None
+    chunk_start = 0  # index of the first page in the current chunk
+
+    for i, page in enumerate(pages):
+        # Detect chapter start
+        is_chapter, chapter_title = self.detect_chapter_start(page)
+
+        if is_chapter and current_chunk:
+            # Save current chunk before starting new one
+            chunks.append({
+                'chunk_number': len(chunks) + 1,
+                'start_page': chunk_start + 1,
+                'end_page': i,
+                'pages': current_chunk,
+                'chapter_title': current_chapter
+            })
+            current_chunk = []
+            chunk_start = i
+
+        if is_chapter:
+            current_chapter = chapter_title
+
+        current_chunk.append(page)
+
+        # Check if chunk size reached (but don't break chapters)
+        if not is_chapter and len(current_chunk) >= self.chunk_size:
+            # Create chunk (same structure as above)
+            chunks.append(...)
+            current_chunk = []
+            chunk_start = i + 1
+
+    # Flush the final partial chunk
+    if current_chunk:
+        chunks.append(...)
+
+    return chunks
+```
+
+---
+
+## Usage Examples
+
+### Basic Chunking
+
+```bash
+# Extract with default 10-page chunks
+python3 cli/pdf_extractor_poc.py manual.pdf -o manual.json
+
+# Output includes chunks
+cat manual.json | jq '.total_chunks'
+# Output: 15
+```
+
+### Large PDF Processing
+
+```bash
+# Large PDF with bigger chunks (50 pages each)
+python3 cli/pdf_extractor_poc.py large_manual.pdf --chunk-size 50 -o output.json -v
+
+# Verbose output shows:
+# 📦 Creating chunks (chunk_size=50)...
+# 🔗 Merging code blocks across pages...
+# ✅ Extraction complete:
+#    Chunks created: 8
+#    Chapters detected: 12
+```
+
+### No Chunking (Single Output)
+
+```bash
+# Process all pages as single chunk
+python3 cli/pdf_extractor_poc.py small_doc.pdf --chunk-size 0 -o output.json
+```
+
+---
+
+## Performance
+
+### Chunking Performance
+
+- **Chapter Detection:** ~0.1ms per page (negligible overhead)
+- **Code Merging:** ~0.5ms per page (fast)
+- **Chunk Creation:** ~1ms total (very fast)
+
+**Total overhead:** < 1% of extraction time
+
+### Memory Benefits
+
+Chunking large PDFs helps reduce memory usage:
+- **Without chunking:** Entire PDF loaded in memory
+- **With chunking:** Process chunk-by-chunk (future enhancement)
+
+**Current implementation** still loads entire PDF but provides structured output for chunked processing downstream.
+
+---
+
+## Limitations
+
+### Current Limitations
+
+1. **Chapter Pattern Matching**
+   - Limited to common English chapter patterns
+   - May miss non-standard chapter formats
+   - No support for non-English chapters (e.g., "Capítulo", "Chapitre")
+
+2. **Code Merging Heuristics**
+   - Based on simple continuation indicators
+   - May miss some edge cases
+   - No AST-based validation
+
+3. **Chunk Size**
+   - Fixed page count (not by content size)
+   - Doesn't account for page content volume
+   - No auto-sizing based on memory constraints
+
+### Known Issues
+
+1. **Multi-Chapter Pages**
+   - If a single page contains multiple chapters, only the first is detected
+   - Workaround: Use smaller chunk sizes
+
+2. **False Code Merges**
+   - Rare cases where separate code blocks are merged
+   - Detection: Look for `merged_from_next_page` flag
+
+3. **Table of Contents**
+   - TOC pages may be detected as chapters
+   - Workaround: Manual filtering in downstream processing
+
+---
+
+## Comparison: Before vs After
+
+| Feature | Before (B1.2) | After (B1.3) |
+|---------|---------------|--------------|
+| Page chunking | None | ✅ Configurable |
+| Chapter detection | None | ✅ Auto-detect |
+| Code spanning pages | Split | ✅ Merged |
+| Large PDF handling | Difficult | ✅ Chunked |
+| Memory efficiency | Poor | Better (structure for future) |
+| Output organization | Flat | ✅ Hierarchical |
+
+---
+
+## Testing
+
+### Test Chapter Detection
+
+Create a test PDF with chapters:
+1. Page 1: "Chapter 1: Introduction"
+2. Page 15: "Chapter 2: Getting Started"
+3. Page 30: "Chapter 3: API Reference"
+
+```bash
+python3 cli/pdf_extractor_poc.py test.pdf -o test.json --chunk-size 20 -v
+
+# Verify chapters detected
+cat test.json | jq '.chapters'
+```
+
+Expected output:
+```json
+[
+  {
+    "title": "Chapter 1: Introduction",
+    "start_page": 1,
+    "end_page": 14
+  },
+  {
+    "title": "Chapter 2: Getting Started",
+    "start_page": 15,
+    "end_page": 29
+  },
+  {
+    "title": "Chapter 3: API Reference",
+    "start_page": 30,
+    "end_page": 50
+  }
+]
+```
+
+### Test Code Merging
+
+Create a test PDF with code spanning pages:
+- Page 1 ends with: `def example():\n    total = 0`
+- Page 2 starts with: `    for i in range(10):\n        total += i`
+
+```bash
+python3 cli/pdf_extractor_poc.py test.pdf -o test.json -v
+
+# Check for merged code blocks
+cat test.json | jq '.pages[0].code_samples[] | select(.merged_from_next_page == true)'
+```
+
+---
+
+## Next Steps (Future Tasks)
+
+### Task B1.4: Improve Code Block Detection
+- Add syntax validation
+- Use AST parsing for better language detection
+- Improve continuation detection accuracy
+
+### Task B1.5: Add Image Extraction
+- Extract images from chunks
+- OCR for code in images
+- Diagram detection and extraction
+
+### Task B1.6: Full PDF Scraper CLI
+- Build on chunking foundation
+- Category detection for chunks
+- Multi-PDF support
+
+---
+
+## Integration with Skill Seeker
+
+The chunking feature lays groundwork for:
+1. **Memory-efficient processing** - Process PDFs chunk-by-chunk
+2. **Better categorization** - Chapters become categories
+3. **Improved SKILL.md** - Organize by detected chapters
+4. **Large PDF support** - Handle 500+ page manuals
+
+**Example workflow:**
+```bash
+# Extract large manual with chapters
+python3 cli/pdf_extractor_poc.py large_manual.pdf --chunk-size 25 -o manual.json
+
+# Future: Build skill from chunks
+python3 cli/build_skill_from_pdf.py manual.json
+
+# Result: SKILL.md organized by detected chapters
+```
+
+---
+
+## API Usage
+
+### Using PDFExtractor with Chunking
+
+```python
+from cli.pdf_extractor_poc import PDFExtractor
+
+# Create extractor with 15-page chunks
+extractor = PDFExtractor('manual.pdf', verbose=True, chunk_size=15)
+
+# Extract
+result = extractor.extract_all()
+
+# Access chunks
+for chunk in result['chunks']:
+    print(f"Chunk {chunk['chunk_number']}: {chunk['chapter_title']}")
+    print(f"  Pages: {chunk['start_page']}-{chunk['end_page']}")
+    print(f"  Total pages: {len(chunk['pages'])}")
+
+# Access chapters
+for chapter in result['chapters']:
+    print(f"Chapter: {chapter['title']}")
+    print(f"  Pages: {chapter['start_page']}-{chapter['end_page']}")
+```
+
+### Processing Chunks Independently
+
+```python
+# Extract
+result = extractor.extract_all()
+
+# Process each chunk separately
+for chunk in result['chunks']:
+    # Get pages in chunk
+    pages = chunk['pages']
+
+    # Process pages
+    for page in pages:
+        # Extract code samples
+        for code in page['code_samples']:
+            print(f"Found {code['language']} code")
+
+            # Check if merged from next page
+            if code.get('merged_from_next_page'):
+                print("  (merged from next page)")
+```
+
+---
+
+## Conclusion
+
+Task B1.3 successfully implements:
+- ✅ Page chunking with configurable size
+- ✅ Automatic chapter/section detection
+- ✅ Code block merging across pages
+- ✅ Enhanced output format with structure
+- ✅ Foundation for large PDF handling
+
+**Performance:** Minimal overhead (<1%)
+**Compatibility:** Backward compatible (pages array still included)
+**Quality:** Significantly improved organization
+
+**Ready for B1.4:** Code block detection improvements
+
+---
+
+**Task Completed:** October 21, 2025
+**Next Task:** B1.4 - Improve code block extraction with syntax detection

+ 420 - 0
libs/external/Skill_Seekers-development/docs/PDF_EXTRACTOR_POC.md

@@ -0,0 +1,420 @@
+# PDF Extractor - Proof of Concept (Task B1.2)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Task:** B1.2 - Create simple PDF text extractor (proof of concept)
+
+---
+
+## Overview
+
+This is a proof-of-concept PDF text and code extractor built for Skill Seeker. It demonstrates the feasibility of extracting documentation content from PDF files using PyMuPDF (fitz).
+
+## Features
+
+### ✅ Implemented
+
+1. **Text Extraction** - Extract plain text from all PDF pages
+2. **Markdown Conversion** - Convert PDF content to markdown format
+3. **Code Block Detection** - Multiple detection methods:
+   - **Font-based:** Detects monospace fonts (Courier, Mono, Consolas, etc.)
+   - **Indent-based:** Detects consistently indented code blocks
+   - **Pattern-based:** Detects function/class definitions, imports
+4. **Language Detection** - Auto-detect programming language from code content
+5. **Heading Extraction** - Extract document structure from markdown
+6. **Image Counting** - Track diagrams and screenshots
+7. **JSON Output** - Compatible format with existing doc_scraper.py
+
+### 🎯 Detection Methods
+
+#### Font-Based Detection
+Analyzes font properties to find monospace fonts typically used for code (see the sketch after this list):
+- Courier, Courier New
+- Monaco, Menlo
+- Consolas
+- DejaVu Sans Mono
+
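+A minimal sketch of font-based detection using PyMuPDF's span data; the real extractor adds grouping and deduplication on top of this:
+
+```python
+MONOSPACE_HINTS = ('courier', 'mono', 'consolas', 'menlo', 'monaco')
+
+def find_code_text_by_font(page):
+    """Collect text drawn in monospace fonts, a common marker for code."""
+    code_lines = []
+    for block in page.get_text("dict")["blocks"]:
+        for line in block.get("lines", []):
+            for span in line.get("spans", []):
+                font = span.get("font", "").lower()
+                if any(hint in font for hint in MONOSPACE_HINTS):
+                    code_lines.append(span["text"])
+    return "\n".join(code_lines)
+```
+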
+#### Indentation-Based Detection
+Identifies code blocks by consistent indentation patterns (see the sketch after this list):
+- 4 spaces or tabs
+- Minimum 2 consecutive lines
+- Minimum 20 characters
+
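+A minimal sketch of this heuristic over a page's plain text, with illustrative thresholds:
+
+```python
+def find_code_blocks_by_indent(text, min_lines=2, min_chars=20):
+    """Group consecutive indented lines into candidate code blocks."""
+    blocks, current = [], []
+
+    def flush():
+        block = "\n".join(current)
+        if len(current) >= min_lines and len(block) >= min_chars:
+            blocks.append(block)
+        current.clear()
+
+    for line in text.splitlines():
+        if line.strip() and line.startswith(("    ", "\t")):
+            current.append(line)
+        else:
+            flush()
+    flush()
+    return blocks
+```
+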
+#### Pattern-Based Detection
+Uses regex to find common code structures:
+- Function definitions (Python, JS, Go, etc.)
+- Class definitions
+- Import/require statements
+
+### 🔍 Language Detection
+
+Supports detection of 19 programming languages (a keyword-based sketch follows this list):
+- Python, JavaScript, Java, C, C++, C#
+- Go, Rust, PHP, Ruby, Swift, Kotlin
+- Shell, SQL, HTML, CSS
+- JSON, YAML, XML
+
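+A keyword-based sketch of this approach; the real pattern table is larger and tuned per language:
+
+```python
+import re
+
+LANGUAGE_PATTERNS = {
+    'python':     [r'\bdef \w+\(', r'\bimport \w+', r'print\('],
+    'javascript': [r'\bfunction \w+\(', r'\bconst \w+\s*=', r'=>'],
+    'sql':        [r'\bSELECT\b[\s\S]*\bFROM\b', r'\bINSERT\s+INTO\b'],
+}
+
+def detect_language_from_code(code):
+    """Return the language whose patterns match most often, or 'unknown'."""
+    scores = {
+        lang: sum(1 for p in patterns if re.search(p, code, re.IGNORECASE))
+        for lang, patterns in LANGUAGE_PATTERNS.items()
+    }
+    best = max(scores, key=scores.get)
+    return best if scores[best] > 0 else 'unknown'
+```
+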
+---
+
+## Installation
+
+### Prerequisites
+
+```bash
+pip install PyMuPDF
+```
+
+### Verify Installation
+
+```bash
+python3 -c "import fitz; print(fitz.__doc__)"
+```
+
+---
+
+## Usage
+
+### Basic Usage
+
+```bash
+# Extract from PDF (print to stdout)
+python3 cli/pdf_extractor_poc.py input.pdf
+
+# Save to JSON file
+python3 cli/pdf_extractor_poc.py input.pdf --output result.json
+
+# Verbose mode (shows progress)
+python3 cli/pdf_extractor_poc.py input.pdf --verbose
+
+# Pretty-printed JSON
+python3 cli/pdf_extractor_poc.py input.pdf --pretty
+```
+
+### Examples
+
+```bash
+# Extract Python documentation
+python3 cli/pdf_extractor_poc.py docs/python_guide.pdf -o python_extracted.json -v
+
+# Extract with verbose and pretty output
+python3 cli/pdf_extractor_poc.py manual.pdf -o manual.json -v --pretty
+
+# Quick test (print to screen)
+python3 cli/pdf_extractor_poc.py sample.pdf --pretty
+```
+
+---
+
+## Output Format
+
+### JSON Structure
+
+```json
+{
+  "source_file": "input.pdf",
+  "metadata": {
+    "title": "Documentation Title",
+    "author": "Author Name",
+    "subject": "Subject",
+    "creator": "PDF Creator",
+    "producer": "PDF Producer"
+  },
+  "total_pages": 50,
+  "total_chars": 125000,
+  "total_code_blocks": 87,
+  "total_headings": 45,
+  "total_images": 12,
+  "languages_detected": {
+    "python": 52,
+    "javascript": 20,
+    "sql": 10,
+    "shell": 5
+  },
+  "pages": [
+    {
+      "page_number": 1,
+      "text": "Plain text content...",
+      "markdown": "# Heading\nContent...",
+      "headings": [
+        {
+          "level": "h1",
+          "text": "Getting Started"
+        }
+      ],
+      "code_samples": [
+        {
+          "code": "def hello():\n    print('Hello')",
+          "language": "python",
+          "detection_method": "font",
+          "font": "Courier-New"
+        }
+      ],
+      "images_count": 2,
+      "char_count": 2500,
+      "code_blocks_count": 3
+    }
+  ]
+}
+```
+
+### Page Object
+
+Each page contains:
+- `page_number` - 1-indexed page number
+- `text` - Plain text content
+- `markdown` - Markdown-formatted content
+- `headings` - Array of heading objects
+- `code_samples` - Array of detected code blocks
+- `images_count` - Number of images on page
+- `char_count` - Character count
+- `code_blocks_count` - Number of code blocks found
+
+### Code Sample Object
+
+Each code sample includes:
+- `code` - The actual code text
+- `language` - Detected language (or 'unknown')
+- `detection_method` - How it was found ('font', 'indent', or 'pattern')
+- `font` - Font name (if detected by font method)
+- `pattern_type` - Type of pattern (if detected by pattern method)
+
+---
+
+## Technical Details
+
+### Detection Accuracy
+
+**Font-based detection:** ⭐⭐⭐⭐⭐ (Best)
+- Highly accurate for well-formatted PDFs
+- Relies on proper font usage in source document
+- Works with: Technical docs, programming books, API references
+
+**Indent-based detection:** ⭐⭐⭐⭐ (Good)
+- Good for structured code blocks
+- May capture non-code indented content
+- Works with: Tutorials, guides, examples
+
+**Pattern-based detection:** ⭐⭐⭐ (Fair)
+- Captures specific code constructs
+- May miss complex or unusual code
+- Works with: Code snippets, function examples
+
+### Language Detection Accuracy
+
+- **High confidence:** Python, JavaScript, Java, Go, SQL
+- **Medium confidence:** C++, Rust, PHP, Ruby, Swift
+- **Basic detection:** Shell, JSON, YAML, XML
+
+Detection is based on keyword patterns, not AST parsing.
+
+### Performance
+
+Tested on various PDF sizes:
+- Small (1-10 pages): < 1 second
+- Medium (10-100 pages): 1-5 seconds
+- Large (100-500 pages): 5-30 seconds
+- Very Large (500+ pages): 30+ seconds
+
+Memory usage: ~50-200 MB depending on PDF size and image content.
+
+---
+
+## Limitations
+
+### Current Limitations
+
+1. **No OCR** - Cannot extract text from scanned/image PDFs
+2. **No Table Extraction** - Tables are treated as plain text
+3. **No Image Extraction** - Only counts images, doesn't extract them
+4. **Simple Deduplication** - May miss some duplicate code blocks
+5. **No Multi-column Support** - May jumble multi-column layouts
+
+### Known Issues
+
+1. **Code Split Across Pages** - Code blocks spanning pages may be split
+2. **Complex Layouts** - May struggle with complex PDF layouts
+3. **Non-standard Fonts** - May miss code in non-standard monospace fonts
+4. **Unicode Issues** - Some special characters may not be preserved correctly
+
+---
+
+## Comparison with Web Scraper
+
+| Feature | Web Scraper | PDF Extractor POC |
+|---------|-------------|-------------------|
+| Content source | HTML websites | PDF files |
+| Code detection | CSS selectors | Font/indent/pattern |
+| Language detection | CSS classes + heuristics | Pattern matching |
+| Structure | Excellent | Good |
+| Links | Full support | Not supported |
+| Images | Referenced | Counted only |
+| Categories | Auto-categorized | Not implemented |
+| Output format | JSON | JSON (compatible) |
+
+---
+
+## Next Steps (Tasks B1.3-B1.8)
+
+### B1.3: Add PDF Page Detection and Chunking
+- Split large PDFs into manageable chunks
+- Handle page-spanning code blocks
+- Add chapter/section detection
+
+### B1.4: Extract Code Blocks from PDFs
+- Improve code block detection accuracy
+- Add syntax validation
+- Better language detection (use tree-sitter?)
+
+### B1.5: Add PDF Image Extraction
+- Extract diagrams as separate files
+- Extract screenshots
+- OCR support for code in images
+
+### B1.6: Create `pdf_scraper.py` CLI Tool
+- Full-featured CLI like `doc_scraper.py`
+- Config file support
+- Category detection
+- Multi-PDF support
+
+### B1.7: Add MCP Tool `scrape_pdf`
+- Integrate with MCP server
+- Add to existing 9 MCP tools
+- Test with Claude Code
+
+### B1.8: Create PDF Config Format
+- Define JSON config for PDF sources
+- Similar to web scraper configs
+- Support multiple PDFs per skill
+
+---
+
+## Testing
+
+### Manual Testing
+
+1. **Create test PDF** (or use existing PDF documentation)
+2. **Run extractor:**
+   ```bash
+   python3 cli/pdf_extractor_poc.py test.pdf -o test_result.json -v --pretty
+   ```
+3. **Verify output:**
+   - Check `total_code_blocks` > 0
+   - Verify `languages_detected` includes expected languages
+   - Inspect `code_samples` for accuracy
+
+### Test with Real Documentation
+
+Recommended test PDFs:
+- Python documentation (python.org)
+- Django documentation
+- PostgreSQL manual
+- Any programming language reference
+
+### Expected Results
+
+Good PDF (well-formatted with monospace code):
+- Detection rate: 80-95%
+- Language accuracy: 85-95%
+- False positives: < 5%
+
+Poor PDF (scanned or badly formatted):
+- Detection rate: 20-50%
+- Language accuracy: 60-80%
+- False positives: 10-30%
+
+---
+
+## Code Examples
+
+### Using PDFExtractor Class Directly
+
+```python
+from cli.pdf_extractor_poc import PDFExtractor
+
+# Create extractor
+extractor = PDFExtractor('docs/manual.pdf', verbose=True)
+
+# Extract all pages
+result = extractor.extract_all()
+
+# Access data
+print(f"Total pages: {result['total_pages']}")
+print(f"Code blocks: {result['total_code_blocks']}")
+print(f"Languages: {result['languages_detected']}")
+
+# Iterate pages
+for page in result['pages']:
+    print(f"\nPage {page['page_number']}:")
+    print(f"  Code blocks: {page['code_blocks_count']}")
+    for code in page['code_samples']:
+        print(f"  - {code['language']}: {len(code['code'])} chars")
+```
+
+### Custom Language Detection
+
+```python
+from cli.pdf_extractor_poc import PDFExtractor
+
+extractor = PDFExtractor('input.pdf')
+
+# Override language detection by wrapping the built-in method.
+# Note: this only takes effect if the extractor looks the method up on the
+# instance (self.detect_language_from_code); otherwise the class itself
+# needs to be modified to support custom detection.
+original_detect = extractor.detect_language_from_code
+
+def custom_detect(code):
+    if 'SELECT' in code.upper():
+        return 'sql'
+    return original_detect(code)
+
+extractor.detect_language_from_code = custom_detect
+```
+
+---
+
+## Contributing
+
+### Adding New Languages
+
+To add language detection for a new language, edit `detect_language_from_code()`:
+
+```python
+patterns = {
+    # ... existing languages ...
+    'newlang': [r'pattern1', r'pattern2', r'pattern3'],
+}
+```
+
+### Adding Detection Methods
+
+To add a new detection method, create a method like:
+
+```python
+def detect_code_blocks_by_newmethod(self, page):
+    """Detect code using new method"""
+    code_blocks = []
+    # ... your detection logic ...
+    return code_blocks
+```
+
+Then add it to `extract_page()`:
+
+```python
+newmethod_code_blocks = self.detect_code_blocks_by_newmethod(page)
+all_code_blocks = font_code_blocks + indent_code_blocks + pattern_code_blocks + newmethod_code_blocks
+```
+
+---
+
+## Conclusion
+
+This POC successfully demonstrates:
+- ✅ PyMuPDF can extract text from PDF documentation
+- ✅ Multiple detection methods can identify code blocks
+- ✅ Language detection works for common languages
+- ✅ JSON output is compatible with existing doc_scraper.py
+- ✅ Performance is acceptable for typical documentation PDFs
+
+**Ready for B1.3:** The foundation is solid. Next step is adding page chunking and handling large PDFs.
+
+---
+
+**POC Completed:** October 21, 2025
+**Next Task:** B1.3 - Add PDF page detection and chunking

+ 553 - 0
libs/external/Skill_Seekers-development/docs/PDF_IMAGE_EXTRACTION.md

@@ -0,0 +1,553 @@
+# PDF Image Extraction (Task B1.5)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Task:** B1.5 - Add PDF image extraction (diagrams, screenshots)
+
+---
+
+## Overview
+
+Task B1.5 adds the ability to extract images (diagrams, screenshots, charts) from PDF documentation and save them as separate files. This is essential for preserving visual documentation elements in skills.
+
+## New Features
+
+### ✅ 1. Image Extraction to Files
+
+Extract embedded images from PDFs and save them to disk:
+
+```bash
+# Extract images along with text
+python3 cli/pdf_extractor_poc.py manual.pdf --extract-images
+
+# Specify output directory
+python3 cli/pdf_extractor_poc.py manual.pdf --extract-images --image-dir assets/images/
+
+# Filter small images (icons, bullets)
+python3 cli/pdf_extractor_poc.py manual.pdf --extract-images --min-image-size 200
+```
+
+### ✅ 2. Size-Based Filtering
+
+Automatically filter out small images (icons, bullets, decorations):
+
+- **Default threshold:** 100x100 pixels
+- **Configurable:** `--min-image-size`
+- **Purpose:** Focus on meaningful diagrams and screenshots
+
+### ✅ 3. Image Metadata
+
+Each extracted image includes comprehensive metadata:
+
+```json
+{
+  "filename": "manual_page5_img1.png",
+  "path": "output/manual_images/manual_page5_img1.png",
+  "page_number": 5,
+  "width": 800,
+  "height": 600,
+  "format": "png",
+  "size_bytes": 45821,
+  "xref": 42
+}
+```
+
+### ✅ 4. Automatic Directory Creation
+
+Images are automatically organized:
+
+- **Default:** `output/{pdf_name}_images/`
+- **Naming:** `{pdf_name}_page{N}_img{M}.{ext}`
+- **Formats:** PNG, JPEG, GIF, BMP, etc.
+
+---
+
+## Usage Examples
+
+### Basic Image Extraction
+
+```bash
+# Extract all images from PDF
+python3 cli/pdf_extractor_poc.py tutorial.pdf --extract-images -v
+```
+
+**Output:**
+```
+📄 Extracting from: tutorial.pdf
+   Pages: 50
+   Metadata: {...}
+   Image directory: output/tutorial_images
+
+  Page 1: 2500 chars, 3 code blocks, 2 headings, 0 images
+  Page 2: 1800 chars, 1 code blocks, 1 headings, 2 images
+    Extracted image: tutorial_page2_img1.png (800x600)
+    Extracted image: tutorial_page2_img2.jpeg (1024x768)
+  ...
+
+✅ Extraction complete:
+   Images found: 45
+   Images extracted: 32
+   Image directory: output/tutorial_images
+```
+
+### Custom Image Directory
+
+```bash
+# Save images to specific directory
+python3 cli/pdf_extractor_poc.py manual.pdf --extract-images --image-dir docs/images/
+```
+
+Result: Images saved to `docs/images/manual_page*_img*.{ext}`
+
+### Filter Small Images
+
+```bash
+# Only extract images >= 200x200 pixels
+python3 cli/pdf_extractor_poc.py guide.pdf --extract-images --min-image-size 200 -v
+```
+
+**Verbose output shows filtering:**
+```
+  Page 5: 3200 chars, 4 code blocks, 3 headings, 3 images
+    Skipping small image: 32x32
+    Skipping small image: 64x48
+    Extracted image: guide_page5_img3.png (1200x800)
+```
+
+### Complete Extraction Workflow
+
+```bash
+# Extract everything: text, code, images
+python3 cli/pdf_extractor_poc.py documentation.pdf \
+  --extract-images \
+  --min-image-size 150 \
+  --min-quality 6.0 \
+  --chunk-size 20 \
+  --output documentation.json \
+  --verbose \
+  --pretty
+```
+
+---
+
+## Output Format
+
+### Enhanced JSON Structure
+
+The output now includes image extraction data:
+
+```json
+{
+  "source_file": "manual.pdf",
+  "total_pages": 50,
+  "total_images": 45,
+  "total_extracted_images": 32,
+  "image_directory": "output/manual_images",
+  "extracted_images": [
+    {
+      "filename": "manual_page2_img1.png",
+      "path": "output/manual_images/manual_page2_img1.png",
+      "page_number": 2,
+      "width": 800,
+      "height": 600,
+      "format": "png",
+      "size_bytes": 45821,
+      "xref": 42
+    }
+  ],
+  "pages": [
+    {
+      "page_number": 1,
+      "images_count": 3,
+      "extracted_images": [
+        {
+          "filename": "manual_page1_img1.jpeg",
+          "path": "output/manual_images/manual_page1_img1.jpeg",
+          "width": 1024,
+          "height": 768,
+          "format": "jpeg",
+          "size_bytes": 87543
+        }
+      ]
+    }
+  ]
+}
+```
+
+### File System Layout
+
+```
+output/
+├── manual.json                          # Extraction results
+└── manual_images/                       # Image directory
+    ├── manual_page2_img1.png           # Page 2, Image 1
+    ├── manual_page2_img2.jpeg          # Page 2, Image 2
+    ├── manual_page5_img1.png           # Page 5, Image 1
+    └── ...
+```
+
+---
+
+## Technical Implementation
+
+### Image Extraction Method
+
+```python
+def extract_images_from_page(self, page, page_num):
+    """Extract images from PDF page and save to disk"""
+
+    extracted = []
+    image_list = page.get_images()
+    # Base name for output files, e.g. "manual" from "manual.pdf"
+    # (assumes the extractor stores its input path as self.pdf_path)
+    pdf_basename = Path(self.pdf_path).stem
+
+    for img_index, img in enumerate(image_list):
+        # Get image data from PDF
+        xref = img[0]
+        base_image = self.doc.extract_image(xref)
+
+        image_bytes = base_image["image"]
+        image_ext = base_image["ext"]
+        width = base_image.get("width", 0)
+        height = base_image.get("height", 0)
+
+        # Filter small images
+        if width < self.min_image_size or height < self.min_image_size:
+            continue
+
+        # Generate filename
+        image_filename = f"{pdf_basename}_page{page_num+1}_img{img_index+1}.{image_ext}"
+        image_path = Path(self.image_dir) / image_filename
+
+        # Save image
+        with open(image_path, "wb") as f:
+            f.write(image_bytes)
+
+        # Store metadata
+        image_info = {
+            'filename': image_filename,
+            'path': str(image_path),
+            'page_number': page_num + 1,
+            'width': width,
+            'height': height,
+            'format': image_ext,
+            'size_bytes': len(image_bytes),
+        }
+
+        extracted.append(image_info)
+
+    return extracted
+```
+
+---
+
+## Performance
+
+### Extraction Speed
+
+| PDF Size | Images | Extraction Time | Overhead |
+|----------|--------|-----------------|----------|
+| Small (10 pages, 5 images) | 5 | +200ms | ~10% |
+| Medium (100 pages, 50 images) | 50 | +2s | ~15% |
+| Large (500 pages, 200 images) | 200 | +8s | ~20% |
+
+**Note:** Image extraction adds 10-20% overhead depending on image count and size.
+
+### Storage Requirements
+
+- **PNG images:** ~10-500 KB each (diagrams)
+- **JPEG images:** ~50-2000 KB each (screenshots)
+- **Typical documentation (100 pages):** ~50-200 MB total
+
+---
+
+## Supported Image Formats
+
+PyMuPDF automatically handles format detection and extraction:
+
+- ✅ PNG (lossless, best for diagrams)
+- ✅ JPEG (lossy, best for photos)
+- ✅ GIF (animated, rare in PDFs)
+- ✅ BMP (uncompressed)
+- ✅ TIFF (high quality)
+
+Images are extracted in their original format.
+
+---
+
+## Filtering Strategy
+
+### Why Filter Small Images?
+
+PDFs often contain:
+- **Icons:** 16x16, 32x32 (UI elements)
+- **Bullets:** 8x8, 12x12 (decorative)
+- **Logos:** 50x50, 100x100 (branding)
+
+These are usually not useful for documentation skills.
+
+### Recommended Thresholds
+
+| Use Case | Min Size | Reasoning |
+|----------|----------|-----------|
+| **General docs** | 100x100 | Filters icons, keeps diagrams |
+| **Technical diagrams** | 200x200 | Only meaningful charts |
+| **Screenshots** | 300x300 | Only full-size screenshots |
+| **All images** | 0 | No filtering |
+
+**Set with:** `--min-image-size N`
+
+---
+
+## Integration with Skill Seeker
+
+### Future Workflow (Task B1.6+)
+
+When building PDF-based skills, images will be:
+
+1. **Extracted** from PDF documentation
+2. **Organized** into skill's `assets/` directory
+3. **Referenced** in SKILL.md and reference files
+4. **Packaged** in final .zip file
+
+**Example:**
+```markdown
+# API Architecture
+
+See diagram below for the complete API flow:
+
+![API Flow](assets/images/api_flow.png)
+
+The diagram shows...
+```
+
+---
+
+## Limitations
+
+### Current Limitations
+
+1. **No OCR**
+   - Cannot extract text from images
+   - Code screenshots are not parsed
+   - Future: Add OCR support for code in images
+
+2. **No Image Analysis**
+   - Cannot detect diagram types (flowchart, UML, etc.)
+   - Cannot extract captions
+   - Future: Add AI-based image classification
+
+3. **No Deduplication**
+   - Same image on multiple pages extracted multiple times
+   - Future: Add image hash-based deduplication
+
+4. **Format Preservation**
+   - Images saved in original format (no conversion)
+   - No optimization or compression
+
+### Known Issues
+
+1. **Vector Graphics**
+   - Some PDFs use vector graphics (not images)
+   - These are not extracted (rendered as part of page)
+   - Workaround: Use PDF-to-image tools first
+
+2. **Embedded vs Referenced**
+   - Only embedded images are extracted
+   - External image references are not followed
+
+3. **Image Quality**
+   - Quality depends on PDF source
+   - Low-res source = low-res output
+
+---
+
+## Troubleshooting
+
+### No Images Extracted
+
+**Problem:** `total_extracted_images: 0` but PDF has visible images
+
+**Possible causes:**
+1. Images are vector graphics (not raster)
+2. Images smaller than `--min-image-size` threshold
+3. Images are page backgrounds (not embedded images)
+
+**Solution:**
+```bash
+# Try with no size filter
+python3 cli/pdf_extractor_poc.py input.pdf --extract-images --min-image-size 0 -v
+```
+
+### Permission Errors
+
+**Problem:** `PermissionError: [Errno 13] Permission denied`
+
+**Solution:**
+```bash
+# Ensure output directory is writable
+mkdir -p output/images
+chmod 755 output/images
+
+# Or specify different directory
+python3 cli/pdf_extractor_poc.py input.pdf --extract-images --image-dir ~/my_images/
+```
+
+### Disk Space
+
+**Problem:** Running out of disk space
+
+**Solution:**
+```bash
+# Check PDF size first
+du -h input.pdf
+
+# Estimate: ~100-200 MB per 100 pages with images
+# Use higher min-image-size to extract fewer images
+python3 cli/pdf_extractor_poc.py input.pdf --extract-images --min-image-size 300
+```
+
+---
+
+## Examples
+
+### Extract Diagram-Heavy Documentation
+
+```bash
+# Architecture documentation with many diagrams
+python3 cli/pdf_extractor_poc.py architecture.pdf \
+  --extract-images \
+  --min-image-size 250 \
+  --image-dir docs/diagrams/ \
+  -v
+```
+
+**Result:** High-quality diagrams extracted, icons filtered out.
+
+### Tutorial with Screenshots
+
+```bash
+# Tutorial with step-by-step screenshots
+python3 cli/pdf_extractor_poc.py tutorial.pdf \
+  --extract-images \
+  --min-image-size 400 \
+  --image-dir tutorial_screenshots/ \
+  -v
+```
+
+**Result:** Full screenshots extracted, UI icons ignored.
+
+### API Reference with Small Charts
+
+```bash
+# API docs with various image sizes
+python3 cli/pdf_extractor_poc.py api_reference.pdf \
+  --extract-images \
+  --min-image-size 150 \
+  -o api.json \
+  --pretty
+```
+
+**Result:** Charts and graphs extracted, small icons filtered.
+
+---
+
+## Command-Line Reference
+
+### Image Extraction Options
+
+```
+--extract-images
+    Enable image extraction to files
+    Default: disabled
+
+--image-dir PATH
+    Directory to save extracted images
+    Default: output/{pdf_name}_images/
+
+--min-image-size PIXELS
+    Minimum image dimension (width or height)
+    Filters out icons and small decorations
+    Default: 100
+```
+
+### Complete Example
+
+```bash
+python3 cli/pdf_extractor_poc.py manual.pdf \
+  --extract-images \
+  --image-dir assets/images/ \
+  --min-image-size 200 \
+  --min-quality 7.0 \
+  --chunk-size 15 \
+  --output manual.json \
+  --verbose \
+  --pretty
+```
+
+---
+
+## Comparison: Before vs After
+
+| Feature | Before (B1.4) | After (B1.5) |
+|---------|---------------|--------------|
+| Image detection | ✅ Count only | ✅ Count + Extract |
+| Image files | ❌ Not saved | ✅ Saved to disk |
+| Image metadata | ❌ None | ✅ Full metadata |
+| Size filtering | ❌ None | ✅ Configurable |
+| Directory organization | ❌ N/A | ✅ Automatic |
+| Format support | ❌ N/A | ✅ All formats |
+
+---
+
+## Next Steps
+
+### Task B1.6: Full PDF Scraper CLI
+
+The image extraction feature will be integrated into the full PDF scraper:
+
+```bash
+# Future: Full PDF scraper with images
+python3 cli/pdf_scraper.py \
+  --config configs/manual_pdf.json \
+  --extract-images \
+  --enhance-local
+```
+
+### Task B1.7: MCP Tool Integration
+
+Images will be available through MCP:
+
+```python
+# Future: MCP tool
+result = mcp.scrape_pdf(
+    pdf_path="manual.pdf",
+    extract_images=True,
+    min_image_size=200
+)
+```
+
+---
+
+## Conclusion
+
+Task B1.5 successfully implements:
+- ✅ Image extraction from PDF pages
+- ✅ Automatic file saving with metadata
+- ✅ Size-based filtering (configurable)
+- ✅ Organized directory structure
+- ✅ Multiple format support
+
+**Impact:**
+- Preserves visual documentation
+- Essential for diagram-heavy docs
+- Improves skill completeness
+
+**Performance:** 10-20% overhead (acceptable)
+
+**Compatibility:** Backward compatible (images optional)
+
+**Ready for B1.6:** Full PDF scraper CLI tool
+
+---
+
+**Task Completed:** October 21, 2025
+**Next Task:** B1.6 - Create `pdf_scraper.py` CLI tool

+ 437 - 0
libs/external/Skill_Seekers-development/docs/PDF_MCP_TOOL.md

@@ -0,0 +1,437 @@
+# PDF Scraping MCP Tool (Task B1.7)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Task:** B1.7 - Add MCP tool `scrape_pdf`
+
+---
+
+## Overview
+
+Task B1.7 adds the `scrape_pdf` MCP tool to the Skill Seeker MCP server, making PDF documentation scraping available through the Model Context Protocol. This allows Claude Code and other MCP clients to scrape PDF documentation directly.
+
+## Features
+
+### ✅ MCP Tool Integration
+
+- **Tool name:** `scrape_pdf`
+- **Description:** Scrape PDF documentation and build Claude skill
+- **Supports:** All three usage modes (config, direct, from-json)
+- **Integration:** Uses `cli/pdf_scraper.py` backend
+
+### ✅ Three Usage Modes
+
+1. **Config File Mode** - Use PDF config JSON
+2. **Direct PDF Mode** - Quick conversion from PDF file
+3. **From JSON Mode** - Build from pre-extracted data
+
+---
+
+## Usage
+
+### Mode 1: Config File
+
+```python
+# Through MCP
+result = await mcp.call_tool("scrape_pdf", {
+    "config_path": "configs/manual_pdf.json"
+})
+```
+
+**Example config** (`configs/manual_pdf.json`):
+```json
+{
+  "name": "mymanual",
+  "description": "My Manual documentation",
+  "pdf_path": "docs/manual.pdf",
+  "extract_options": {
+    "chunk_size": 10,
+    "min_quality": 6.0,
+    "extract_images": true,
+    "min_image_size": 150
+  },
+  "categories": {
+    "getting_started": ["introduction", "setup"],
+    "api": ["api", "reference"],
+    "tutorial": ["tutorial", "example"]
+  }
+}
+```
+
+**Output:**
+```
+🔍 Extracting from PDF: docs/manual.pdf
+📄 Extracting from: docs/manual.pdf
+   Pages: 150
+   ...
+✅ Extraction complete
+
+🏗️  Building skill: mymanual
+📋 Categorizing content...
+✅ Created 3 categories
+
+📝 Generating reference files...
+   Generated: output/mymanual/references/getting_started.md
+   Generated: output/mymanual/references/api.md
+   Generated: output/mymanual/references/tutorial.md
+
+✅ Skill built successfully: output/mymanual/
+
+📦 Next step: Package with: python3 cli/package_skill.py output/mymanual/
+```
+
+### Mode 2: Direct PDF
+
+```python
+# Through MCP
+result = await mcp.call_tool("scrape_pdf", {
+    "pdf_path": "manual.pdf",
+    "name": "mymanual",
+    "description": "My Manual Docs"
+})
+```
+
+**Uses default settings:**
+- Chunk size: 10
+- Min quality: 5.0
+- Extract images: true
+- Chapter-based categorization
+
+### Mode 3: From Extracted JSON
+
+```python
+# Step 1: Extract to JSON (separate tool or CLI)
+# python3 cli/pdf_extractor_poc.py manual.pdf -o manual_extracted.json
+
+# Step 2: Build skill from JSON via MCP
+result = await mcp.call_tool("scrape_pdf", {
+    "from_json": "output/manual_extracted.json"
+})
+```
+
+**Benefits:**
+- Separate extraction and building
+- Fast iteration on skill structure
+- No re-extraction needed
+
+---
+
+## MCP Tool Definition
+
+### Input Schema
+
+```json
+{
+  "name": "scrape_pdf",
+  "description": "Scrape PDF documentation and build Claude skill. Extracts text, code, and images from PDF files (NEW in B1.7).",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "config_path": {
+        "type": "string",
+        "description": "Path to PDF config JSON file (e.g., configs/manual_pdf.json)"
+      },
+      "pdf_path": {
+        "type": "string",
+        "description": "Direct PDF path (alternative to config_path)"
+      },
+      "name": {
+        "type": "string",
+        "description": "Skill name (required with pdf_path)"
+      },
+      "description": {
+        "type": "string",
+        "description": "Skill description (optional)"
+      },
+      "from_json": {
+        "type": "string",
+        "description": "Build from extracted JSON file (e.g., output/manual_extracted.json)"
+      }
+    },
+    "required": []
+  }
+}
+```
+
+### Return Format
+
+Returns `TextContent` with:
+- Success: stdout from `pdf_scraper.py`
+- Failure: stderr + stdout for debugging
+
+---
+
+## Implementation
+
+### MCP Server Changes
+
+**Location:** `skill_seeker_mcp/server.py`
+
+**Changes:**
+1. Added `scrape_pdf` to `list_tools()` (lines 220-249)
+2. Added handler in `call_tool()` (lines 276-277)
+3. Implemented `scrape_pdf_tool()` function (lines 591-625)
+
+### Code Implementation
+
+```python
+async def scrape_pdf_tool(args: dict) -> list[TextContent]:
+    """Scrape PDF documentation and build skill (NEW in B1.7)"""
+    config_path = args.get("config_path")
+    pdf_path = args.get("pdf_path")
+    name = args.get("name")
+    description = args.get("description")
+    from_json = args.get("from_json")
+
+    # Build command
+    cmd = [sys.executable, str(CLI_DIR / "pdf_scraper.py")]
+
+    # Mode 1: Config file
+    if config_path:
+        cmd.extend(["--config", config_path])
+
+    # Mode 2: Direct PDF
+    elif pdf_path and name:
+        cmd.extend(["--pdf", pdf_path, "--name", name])
+        if description:
+            cmd.extend(["--description", description])
+
+    # Mode 3: From JSON
+    elif from_json:
+        cmd.extend(["--from-json", from_json])
+
+    else:
+        return [TextContent(type="text", text="❌ Error: Must specify --config, --pdf + --name, or --from-json")]
+
+    # Run pdf_scraper.py
+    result = subprocess.run(cmd, capture_output=True, text=True)
+
+    if result.returncode == 0:
+        return [TextContent(type="text", text=result.stdout)]
+    else:
+        return [TextContent(type="text", text=f"Error: {result.stderr}\n\n{result.stdout}")]
+```
+
+---
+
+## Integration with MCP Workflow
+
+### Complete Workflow Through MCP
+
+```python
+# 1. Create PDF config (optional - can use direct mode)
+config_result = await mcp.call_tool("generate_config", {
+    "name": "api_manual",
+    "url": "N/A",  # Not used for PDF
+    "description": "API Manual from PDF"
+})
+
+# 2. Scrape PDF
+scrape_result = await mcp.call_tool("scrape_pdf", {
+    "pdf_path": "docs/api_manual.pdf",
+    "name": "api_manual",
+    "description": "API Manual Documentation"
+})
+
+# 3. Package skill
+package_result = await mcp.call_tool("package_skill", {
+    "skill_dir": "output/api_manual/",
+    "auto_upload": True  # Upload if ANTHROPIC_API_KEY set
+})
+
+# 4. Upload (if not auto-uploaded)
+if "ANTHROPIC_API_KEY" in os.environ:
+    upload_result = await mcp.call_tool("upload_skill", {
+        "skill_zip": "output/api_manual.zip"
+    })
+```
+
+### Combined with Web Scraping
+
+```python
+# Scrape web documentation
+web_result = await mcp.call_tool("scrape_docs", {
+    "config_path": "configs/framework.json"
+})
+
+# Scrape PDF supplement
+pdf_result = await mcp.call_tool("scrape_pdf", {
+    "pdf_path": "docs/framework_api.pdf",
+    "name": "framework_pdf"
+})
+
+# Package both
+await mcp.call_tool("package_skill", {"skill_dir": "output/framework/"})
+await mcp.call_tool("package_skill", {"skill_dir": "output/framework_pdf/"})
+```
+
+---
+
+## Error Handling
+
+### Common Errors
+
+**Error 1: Missing required parameters**
+```
+❌ Error: Must specify --config, --pdf + --name, or --from-json
+```
+**Solution:** Provide one of the three modes
+
+**Error 2: PDF file not found**
+```
+Error: [Errno 2] No such file or directory: 'manual.pdf'
+```
+**Solution:** Check PDF path is correct
+
+**Error 3: PyMuPDF not installed**
+```
+ERROR: PyMuPDF not installed
+Install with: pip install PyMuPDF
+```
+**Solution:** Install PyMuPDF: `pip install PyMuPDF`
+
+**Error 4: Invalid JSON config**
+```
+Error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1
+```
+**Solution:** Check config file is valid JSON
+
+---
+
+## Testing
+
+### Test MCP Tool
+
+```bash
+# 1. Start MCP server
+python3 skill_seeker_mcp/server.py
+
+# 2. Test with MCP client or via Claude Code
+
+# 3. Verify tool is listed
+# Should see "scrape_pdf" in available tools
+```
+
+### Test All Modes
+
+**Mode 1: Config**
+```python
+result = await mcp.call_tool("scrape_pdf", {
+    "config_path": "configs/example_pdf.json"
+})
+assert "✅ Skill built successfully" in result[0].text
+```
+
+**Mode 2: Direct**
+```python
+result = await mcp.call_tool("scrape_pdf", {
+    "pdf_path": "test.pdf",
+    "name": "test_skill"
+})
+assert "✅ Skill built successfully" in result[0].text
+```
+
+**Mode 3: From JSON**
+```python
+# First extract
+subprocess.run(["python3", "cli/pdf_extractor_poc.py", "test.pdf", "-o", "test.json"])
+
+# Then build via MCP
+result = await mcp.call_tool("scrape_pdf", {
+    "from_json": "test.json"
+})
+assert "✅ Skill built successfully" in result[0].text
+```
+
+---
+
+## Comparison with Other MCP Tools
+
+| Tool | Input | Output | Use Case |
+|------|-------|--------|----------|
+| `scrape_docs` | HTML URL | Skill | Web documentation |
+| `scrape_pdf` | PDF file | Skill | PDF documentation |
+| `generate_config` | URL | Config | Create web config |
+| `package_skill` | Skill dir | .zip | Package for upload |
+| `upload_skill` | .zip file | Upload | Send to Claude |
+
+---
+
+## Performance
+
+### MCP Tool Overhead
+
+- **MCP overhead:** ~50-100ms
+- **Extraction time:** Same as CLI (15s-5m depending on PDF)
+- **Building time:** Same as CLI (5s-45s)
+
+**Total:** MCP adds negligible overhead (<1%)
+
+### Async Execution
+
+The MCP tool runs `pdf_scraper.py` synchronously via `subprocess.run()`. For long-running PDFs:
+- Client waits for completion
+- No progress updates during extraction
+- Consider using `--from-json` mode for faster iteration
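+
+A minimal sketch of how the same call could be made non-blocking with `asyncio` (illustrative only, assuming the `cmd` list built by the tool; this is not part of the current server):
+
+```python
+import asyncio
+
+async def run_pdf_scraper_async(cmd):
+    """Run pdf_scraper.py without blocking the event loop (illustrative sketch)."""
+    proc = await asyncio.create_subprocess_exec(
+        *cmd,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+    stdout, stderr = await proc.communicate()
+    if proc.returncode == 0:
+        return stdout.decode()
+    raise RuntimeError(f"pdf_scraper failed: {stderr.decode()}")
+```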
+
+---
+
+## Future Enhancements
+
+### Potential Improvements
+
+1. **Async Extraction**
+   - Stream progress updates to client
+   - Allow cancellation
+   - Background processing
+
+2. **Batch Processing**
+   - Process multiple PDFs in parallel
+   - Merge into single skill
+   - Shared categories
+
+3. **Enhanced Options**
+   - Pass all extraction options through MCP
+   - Dynamic quality threshold
+   - Image filter controls
+
+4. **Status Checking**
+   - Query extraction status
+   - Get progress percentage
+   - Estimate time remaining
+
+---
+
+## Conclusion
+
+Task B1.7 successfully implements:
+- ✅ MCP tool `scrape_pdf`
+- ✅ Three usage modes (config, direct, from-json)
+- ✅ Integration with MCP server
+- ✅ Error handling
+- ✅ Compatible with existing MCP workflow
+
+**Impact:**
+- PDF scraping available through MCP
+- Seamless integration with Claude Code
+- Unified workflow for web + PDF documentation
+- 10th MCP tool in Skill Seeker
+
+**Total MCP Tools:** 10
+1. generate_config
+2. estimate_pages
+3. scrape_docs
+4. package_skill
+5. upload_skill
+6. list_configs
+7. validate_config
+8. split_config
+9. generate_router
+10. **scrape_pdf** (NEW)
+
+---
+
+**Task Completed:** October 21, 2025
+**B1 Group Complete:** All 8 tasks (B1.1-B1.8) finished!
+
+**Next:** Task group B2 (Microsoft Word .docx support)

+ 491 - 0
libs/external/Skill_Seekers-development/docs/PDF_PARSING_RESEARCH.md

@@ -0,0 +1,491 @@
+# PDF Parsing Libraries Research (Task B1.1)
+
+**Date:** October 21, 2025
+**Task:** B1.1 - Research PDF parsing libraries
+**Purpose:** Evaluate Python libraries for extracting text and code from PDF documentation
+
+---
+
+## Executive Summary
+
+After comprehensive research, **PyMuPDF (fitz)** is recommended as the primary library for Skill Seeker's PDF parsing needs, with **pdfplumber** as a secondary option for complex table extraction.
+
+### Quick Recommendation:
+- **Primary Choice:** PyMuPDF (fitz) - Fast, comprehensive, well-maintained
+- **Secondary/Fallback:** pdfplumber - Better for tables, slower but more precise
+- **Avoid:** PyPDF2 (deprecated, merged into pypdf)
+
+---
+
+## Library Comparison Matrix
+
+| Library | Speed | Text Quality | Code Detection | Tables | Maintenance | License |
+|---------|-------|--------------|----------------|--------|-------------|---------|
+| **PyMuPDF** | ⚡⚡⚡⚡⚡ Very fast (42ms) | High | Excellent | Good | Active | AGPL/Commercial |
+| **pdfplumber** | ⚡⚡ Slower (2.5s) | Very High | Excellent | Excellent | Active | MIT |
+| **pypdf** | ⚡⚡⚡ Fast | Medium | Good | Basic | Active | BSD |
+| **pdfminer.six** | ⚡ Slow | Very High | Good | Medium | Active | MIT |
+| **pypdfium2** | ⚡⚡⚡⚡⚡ Fastest (3ms) | Medium | Good | Basic | Active | Apache-2.0 |
+
+---
+
+## Detailed Analysis
+
+### 1. PyMuPDF (fitz) ⭐ RECOMMENDED
+
+**Performance:** 42 milliseconds (60x faster than pdfminer.six)
+
+**Installation:**
+```bash
+pip install PyMuPDF
+```
+
+**Pros:**
+- ✅ Extremely fast (C-based MuPDF backend)
+- ✅ Comprehensive features (text, images, tables, metadata)
+- ✅ Supports markdown output
+- ✅ Can extract images and diagrams
+- ✅ Well-documented and actively maintained
+- ✅ Handles complex layouts well
+
+**Cons:**
+- ⚠️ AGPL license (requires commercial license for proprietary projects)
+- ⚠️ Requires MuPDF binary installation (handled by pip)
+- ⚠️ Slightly larger dependency footprint
+
+**Code Example:**
+```python
+import fitz  # PyMuPDF
+
+# Extract text from entire PDF
+def extract_pdf_text(pdf_path):
+    doc = fitz.open(pdf_path)
+    text = ''
+    for page in doc:
+        text += page.get_text()
+    doc.close()
+    return text
+
+# Extract text from single page
+def extract_page_text(pdf_path, page_num):
+    doc = fitz.open(pdf_path)
+    page = doc.load_page(page_num)
+    text = page.get_text()
+    doc.close()
+    return text
+
+# Extract with markdown formatting
+def extract_as_markdown(pdf_path):
+    doc = fitz.open(pdf_path)
+    markdown = ''
+    for page in doc:
+        markdown += page.get_text("markdown")
+    doc.close()
+    return markdown
+```
+
+**Use Cases for Skill Seeker:**
+- Fast extraction of code examples from PDF docs
+- Preserving formatting for code blocks
+- Extracting diagrams and screenshots
+- High-volume documentation scraping
+
+---
+
+### 2. pdfplumber ⭐ RECOMMENDED (for tables)
+
+**Performance:** ~2.5 seconds (slower but more precise)
+
+**Installation:**
+```bash
+pip install pdfplumber
+```
+
+**Pros:**
+- ✅ MIT license (fully open source)
+- ✅ Exceptional table extraction
+- ✅ Visual debugging tool
+- ✅ Precise layout preservation
+- ✅ Built on pdfminer (proven text extraction)
+- ✅ No binary dependencies
+
+**Cons:**
+- ⚠️ Slower than PyMuPDF
+- ⚠️ Higher memory usage for large PDFs
+- ⚠️ Requires more configuration for optimal results
+
+**Code Example:**
+```python
+import pdfplumber
+
+# Extract text from PDF
+def extract_with_pdfplumber(pdf_path):
+    with pdfplumber.open(pdf_path) as pdf:
+        text = ''
+        for page in pdf.pages:
+            text += page.extract_text() or ''  # extract_text() can return None on empty pages
+        return text
+
+# Extract tables
+def extract_tables(pdf_path):
+    tables = []
+    with pdfplumber.open(pdf_path) as pdf:
+        for page in pdf.pages:
+            page_tables = page.extract_tables()
+            tables.extend(page_tables)
+    return tables
+
+# Extract specific region (for code blocks)
+def extract_region(pdf_path, page_num, bbox):
+    with pdfplumber.open(pdf_path) as pdf:
+        page = pdf.pages[page_num]
+        cropped = page.crop(bbox)
+        return cropped.extract_text()
+```
+
+**Use Cases for Skill Seeker:**
+- Extracting API reference tables from PDFs
+- Precise code block extraction with layout
+- Documentation with complex table structures
+
+---
+
+### 3. pypdf (formerly PyPDF2)
+
+**Performance:** Fast (medium speed)
+
+**Installation:**
+```bash
+pip install pypdf
+```
+
+**Pros:**
+- ✅ BSD license
+- ✅ Simple API
+- ✅ Can modify PDFs (merge, split, encrypt)
+- ✅ Actively maintained (PyPDF2 merged back)
+- ✅ No external dependencies
+
+**Cons:**
+- ⚠️ Limited complex layout support
+- ⚠️ Basic text extraction only
+- ⚠️ Poor with scanned/image PDFs
+- ⚠️ No table extraction
+
+**Code Example:**
+```python
+from pypdf import PdfReader
+
+# Extract text
+def extract_with_pypdf(pdf_path):
+    reader = PdfReader(pdf_path)
+    text = ''
+    for page in reader.pages:
+        text += page.extract_text()
+    return text
+```
+
+**Use Cases for Skill Seeker:**
+- Simple text extraction
+- Fallback when PyMuPDF licensing is an issue
+- Basic PDF manipulation tasks
+
+---
+
+### 4. pdfminer.six
+
+**Performance:** Slow (~2.5 seconds)
+
+**Installation:**
+```bash
+pip install pdfminer.six
+```
+
+**Pros:**
+- ✅ MIT license
+- ✅ Excellent text quality (preserves formatting)
+- ✅ Handles complex layouts
+- ✅ Pure Python (no binaries)
+
+**Cons:**
+- ⚠️ Slowest option
+- ⚠️ Complex API
+- ⚠️ Poor documentation
+- ⚠️ Limited table support
+
+**Use Cases for Skill Seeker:**
+- Not recommended (pdfplumber is built on this with better API)
+
+---
+
+### 5. pypdfium2
+
+**Performance:** Very fast (3ms - fastest tested)
+
+**Installation:**
+```bash
+pip install pypdfium2
+```
+
+**Pros:**
+- ✅ Extremely fast
+- ✅ Apache 2.0 license
+- ✅ Lightweight
+- ✅ Clean output
+
+**Cons:**
+- ⚠️ Basic features only
+- ⚠️ Limited documentation
+- ⚠️ No table extraction
+- ⚠️ Newer/less proven
+
+**Use Cases for Skill Seeker:**
+- High-speed basic extraction
+- Potential future optimization
+
+---
+
+## Licensing Considerations
+
+### Open Source Projects (Skill Seeker):
+- **PyMuPDF:** ✅ AGPL license is fine for open-source projects
+- **pdfplumber:** ✅ MIT license (most permissive)
+- **pypdf:** ✅ BSD license (permissive)
+
+### Important Note:
+PyMuPDF requires AGPL compliance (source code must be shared) OR a commercial license for proprietary use. Since Skill Seeker is open source on GitHub, AGPL is acceptable.
+
+---
+
+## Performance Benchmarks
+
+Based on 2025 testing:
+
+| Library | Time (single page) | Time (100 pages) |
+|---------|-------------------|------------------|
+| pypdfium2 | 0.003s | 0.3s |
+| PyMuPDF | 0.042s | 4.2s |
+| pypdf | 0.1s | 10s |
+| pdfplumber | 2.5s | 250s |
+| pdfminer.six | 2.5s | 250s |
+
+**Winner:** pypdfium2 (speed) / PyMuPDF (features + speed balance)
+
+---
+
+## Recommendations for Skill Seeker
+
+### Primary Approach: PyMuPDF (fitz)
+
+**Why:**
+1. **Speed** - 60x faster than alternatives
+2. **Features** - Text, images, markdown output, metadata
+3. **Quality** - High-quality text extraction
+4. **Maintained** - Active development, good docs
+5. **License** - AGPL is fine for open source
+
+**Implementation Strategy:**
+```python
+import fitz  # PyMuPDF
+
+def extract_pdf_documentation(pdf_path):
+    """
+    Extract documentation from PDF with code block detection
+    """
+    doc = fitz.open(pdf_path)
+    pages = []
+
+    for page_num, page in enumerate(doc):
+        # Get text with layout info
+        text = page.get_text("text")
+
+        # Get markdown (preserves code blocks)
+        markdown = page.get_text("markdown")
+
+        # Get images (for diagrams)
+        images = page.get_images()
+
+        pages.append({
+            'page_number': page_num,
+            'text': text,
+            'markdown': markdown,
+            'images': images
+        })
+
+    doc.close()
+    return pages
+```
+
+### Fallback Approach: pdfplumber
+
+**When to use:**
+- PDF has complex tables that PyMuPDF misses
+- Need visual debugging
+- License concerns (use MIT instead of AGPL)
+
+**Implementation Strategy:**
+```python
+import pdfplumber
+
+def extract_pdf_tables(pdf_path):
+    """
+    Extract tables from PDF documentation
+    """
+    with pdfplumber.open(pdf_path) as pdf:
+        tables = []
+        for page in pdf.pages:
+            page_tables = page.extract_tables()
+            if page_tables:
+                tables.extend(page_tables)
+        return tables
+```
+
+---
+
+## Code Block Detection Strategy
+
+PDFs don't have semantic "code block" markers like HTML. Detection strategies:
+
+### 1. Font-based Detection
+```python
+# PyMuPDF can detect font changes
+def detect_code_by_font(page):
+    blocks = page.get_text("dict")["blocks"]
+    code_blocks = []
+
+    for block in blocks:
+        if 'lines' in block:
+            for line in block['lines']:
+                for span in line['spans']:
+                    font = span['font']
+                    # Monospace fonts indicate code
+                    if 'Courier' in font or 'Mono' in font:
+                        code_blocks.append(span['text'])
+
+    return code_blocks
+```
+
+### 2. Indentation-based Detection
+```python
+def detect_code_by_indent(text):
+    lines = text.split('\n')
+    code_blocks = []
+    current_block = []
+
+    for line in lines:
+        # Code often has consistent indentation
+        if line.startswith('    ') or line.startswith('\t'):
+            current_block.append(line)
+        elif current_block:
+            code_blocks.append('\n'.join(current_block))
+            current_block = []
+
+    # Flush a trailing block if the text ends inside indented code
+    if current_block:
+        code_blocks.append('\n'.join(current_block))
+
+    return code_blocks
+```
+
+### 3. Pattern-based Detection
+```python
+import re
+
+def detect_code_by_pattern(text):
+    # Look for common code patterns
+    patterns = [
+        r'(def \w+\(.*?\):)',  # Python functions
+        r'(function \w+\(.*?\) \{)',  # JavaScript
+        r'(class \w+:)',  # Python classes
+        r'(import \w+)',  # Import statements
+    ]
+
+    code_snippets = []
+    for pattern in patterns:
+        matches = re.findall(pattern, text)
+        code_snippets.extend(matches)
+
+    return code_snippets
+```
+
+---
+
+## Next Steps (Task B1.2+)
+
+### Immediate Next Task: B1.2 - Create Simple PDF Text Extractor
+
+**Goal:** Proof of concept using PyMuPDF
+
+**Implementation Plan:**
+1. Create `cli/pdf_extractor_poc.py`
+2. Extract text from sample PDF
+3. Detect code blocks using font/pattern matching
+4. Output to JSON (similar to web scraper)
+
+**Dependencies:**
+```bash
+pip install PyMuPDF
+```
+
+**Expected Output:**
+```json
+{
+  "pages": [
+    {
+      "page_number": 1,
+      "text": "...",
+      "code_blocks": ["def main():", "import sys"],
+      "images": []
+    }
+  ]
+}
+```
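+
+A minimal sketch of what such a proof of concept could look like, combining the PyMuPDF extraction and indentation-based detection snippets above (illustrative; function names and CLI shape are assumptions, not the final `cli/pdf_extractor_poc.py`):
+
+```python
+import json
+import sys
+
+import fitz  # PyMuPDF
+
+def extract_to_json(pdf_path, output_path):
+    """Extract per-page text and naive code blocks, then write JSON in the format above."""
+    doc = fitz.open(pdf_path)
+    pages = []
+    for page_num, page in enumerate(doc, start=1):
+        text = page.get_text()
+        # Very rough code detection: keep indented lines (see detection strategies above)
+        code_blocks = [line for line in text.split('\n')
+                       if line.startswith('    ') or line.startswith('\t')]
+        pages.append({
+            'page_number': page_num,
+            'text': text,
+            'code_blocks': code_blocks,
+            'images': page.get_images(),
+        })
+    doc.close()
+    with open(output_path, 'w') as f:
+        json.dump({'pages': pages}, f, indent=2)
+
+if __name__ == '__main__':
+    extract_to_json(sys.argv[1], sys.argv[2])
+```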
+
+### Future Tasks:
+- **B1.3:** Add page chunking (split large PDFs)
+- **B1.4:** Improve code block detection
+- **B1.5:** Extract images/diagrams
+- **B1.6:** Create full `pdf_scraper.py` CLI
+- **B1.7:** Add MCP tool integration
+- **B1.8:** Create PDF config format
+
+---
+
+## Additional Resources
+
+### Documentation:
+- PyMuPDF: https://pymupdf.readthedocs.io/
+- pdfplumber: https://github.com/jsvine/pdfplumber
+- pypdf: https://pypdf.readthedocs.io/
+
+### Comparison Studies:
+- 2025 Comparative Study: https://arxiv.org/html/2410.09871v1
+- Performance Benchmarks: https://github.com/py-pdf/benchmarks
+
+### Example Use Cases:
+- Extracting API docs from PDF manuals
+- Converting PDF guides to markdown
+- Building skills from PDF-only documentation
+
+---
+
+## Conclusion
+
+**For Skill Seeker's PDF documentation extraction:**
+
+1. **Use PyMuPDF (fitz)** as primary library
+2. **Add pdfplumber** for complex table extraction
+3. **Detect code blocks** using font + pattern matching
+4. **Preserve formatting** with markdown output
+5. **Extract images** for diagrams/screenshots
+
+**Estimated Implementation Time:**
+- B1.2 (POC): 2-3 hours
+- B1.3-B1.5 (Features): 5-8 hours
+- B1.6 (CLI): 3-4 hours
+- B1.7 (MCP): 2-3 hours
+- B1.8 (Config): 1-2 hours
+- **Total: 13-20 hours** for complete PDF support
+
+**License:** AGPL (PyMuPDF) is acceptable for Skill Seeker (open source)
+
+---
+
+**Research completed:** ✅ October 21, 2025
+**Next task:** B1.2 - Create simple PDF text extractor (proof of concept)

+ 616 - 0
libs/external/Skill_Seekers-development/docs/PDF_SCRAPER.md

@@ -0,0 +1,616 @@
+# PDF Scraper CLI Tool (Tasks B1.6 + B1.8)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Tasks:** B1.6 - Create pdf_scraper.py CLI tool, B1.8 - PDF config format
+
+---
+
+## Overview
+
+The PDF scraper (`pdf_scraper.py`) is a complete CLI tool that converts PDF documentation into Claude AI skills. It integrates all PDF extraction features (B1.1-B1.5) with the Skill Seeker workflow to produce packaged, uploadable skills.
+
+## Features
+
+### ✅ Complete Workflow
+
+1. **Extract** - Uses `pdf_extractor_poc.py` for extraction
+2. **Categorize** - Organizes content by chapters or keywords
+3. **Build** - Creates skill structure (SKILL.md, references/)
+4. **Package** - Ready for `package_skill.py`
+
+### ✅ Three Usage Modes
+
+1. **Config File** - Use JSON configuration (recommended)
+2. **Direct PDF** - Quick conversion from PDF file
+3. **From JSON** - Build skill from pre-extracted data
+
+### ✅ Automatic Categorization
+
+- Chapter-based (from PDF structure)
+- Keyword-based (configurable)
+- Fallback to single category
+
+### ✅ Quality Filtering
+
+- Uses quality scores from B1.4
+- Extracts top code examples
+- Filters by minimum quality threshold
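+
+A minimal sketch of what this filtering step might look like; the field names follow the B1.4 output format, but the helper itself is illustrative rather than the actual implementation:
+
+```python
+def select_top_examples(code_blocks, min_quality=6.0, limit=5):
+    """Keep blocks at or above the quality threshold, best-scoring first (illustrative)."""
+    kept = [b for b in code_blocks if b.get('quality_score', 0) >= min_quality]
+    return sorted(kept, key=lambda b: b['quality_score'], reverse=True)[:limit]
+```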
+
+---
+
+## Usage
+
+### Mode 1: Config File (Recommended)
+
+```bash
+# Create config file
+cat > configs/my_manual.json <<EOF
+{
+  "name": "mymanual",
+  "description": "My Manual documentation",
+  "pdf_path": "docs/manual.pdf",
+  "extract_options": {
+    "chunk_size": 10,
+    "min_quality": 6.0,
+    "extract_images": true,
+    "min_image_size": 150
+  },
+  "categories": {
+    "getting_started": ["introduction", "setup"],
+    "api": ["api", "reference", "function"],
+    "tutorial": ["tutorial", "example", "guide"]
+  }
+}
+EOF
+
+# Run scraper
+python3 cli/pdf_scraper.py --config configs/my_manual.json
+```
+
+**Output:**
+```
+🔍 Extracting from PDF: docs/manual.pdf
+📄 Extracting from: docs/manual.pdf
+   Pages: 150
+   ...
+✅ Extraction complete
+
+💾 Saved extracted data to: output/mymanual_extracted.json
+
+🏗️  Building skill: mymanual
+📋 Categorizing content...
+✅ Created 3 categories
+   - Getting Started: 25 pages
+   - Api: 80 pages
+   - Tutorial: 45 pages
+
+📝 Generating reference files...
+   Generated: output/mymanual/references/getting_started.md
+   Generated: output/mymanual/references/api.md
+   Generated: output/mymanual/references/tutorial.md
+   Generated: output/mymanual/references/index.md
+   Generated: output/mymanual/SKILL.md
+
+✅ Skill built successfully: output/mymanual/
+
+📦 Next step: Package with: python3 cli/package_skill.py output/mymanual/
+```
+
+### Mode 2: Direct PDF
+
+```bash
+# Quick conversion without config file
+python3 cli/pdf_scraper.py --pdf manual.pdf --name mymanual --description "My Manual Docs"
+```
+
+**Uses default settings:**
+- Chunk size: 10
+- Min quality: 5.0
+- Extract images: true
+- Min image size: 100px
+- No custom categories (chapter-based)
+
+### Mode 3: From Extracted JSON
+
+```bash
+# Step 1: Extract only (saves JSON)
+python3 cli/pdf_extractor_poc.py manual.pdf -o manual_extracted.json --extract-images
+
+# Step 2: Build skill from JSON (fast, can iterate)
+python3 cli/pdf_scraper.py --from-json manual_extracted.json
+```
+
+**Benefits:**
+- Separate extraction and building
+- Iterate on skill structure without re-extracting
+- Faster development cycle
+
+---
+
+## Config File Format (Task B1.8)
+
+### Complete Example
+
+```json
+{
+  "name": "godot_manual",
+  "description": "Godot Engine documentation from PDF manual",
+  "pdf_path": "docs/godot_manual.pdf",
+  "extract_options": {
+    "chunk_size": 15,
+    "min_quality": 6.0,
+    "extract_images": true,
+    "min_image_size": 200
+  },
+  "categories": {
+    "getting_started": [
+      "introduction",
+      "getting started",
+      "installation",
+      "first steps"
+    ],
+    "scripting": [
+      "gdscript",
+      "scripting",
+      "code",
+      "programming"
+    ],
+    "3d": [
+      "3d",
+      "spatial",
+      "mesh",
+      "shader"
+    ],
+    "2d": [
+      "2d",
+      "sprite",
+      "tilemap",
+      "animation"
+    ],
+    "api": [
+      "api",
+      "class reference",
+      "method",
+      "property"
+    ]
+  }
+}
+```
+
+### Field Reference
+
+#### Required Fields
+
+- **`name`** (string): Skill identifier
+  - Used for directory names
+  - Should be lowercase, no spaces
+  - Example: `"python_guide"`
+
+- **`pdf_path`** (string): Path to PDF file
+  - Absolute or relative to working directory
+  - Example: `"docs/manual.pdf"`
+
+#### Optional Fields
+
+- **`description`** (string): Skill description
+  - Shows in SKILL.md
+  - Explains when to use the skill
+  - Default: `"Documentation skill for {name}"`
+
+- **`extract_options`** (object): Extraction settings
+  - `chunk_size` (number): Pages per chunk (default: 10)
+  - `min_quality` (number): Minimum code quality 0-10 (default: 5.0)
+  - `extract_images` (boolean): Extract images to files (default: true)
+  - `min_image_size` (number): Minimum image dimension in pixels (default: 100)
+
+- **`categories`** (object): Keyword-based categorization
+  - Keys: Category names (will be sanitized for filenames)
+  - Values: Arrays of keywords to match
+  - If omitted: Uses chapter-based categorization from PDF
+
+---
+
+## Output Structure
+
+### Generated Files
+
+```
+output/
+├── mymanual_extracted.json          # Raw extraction data (B1.5 format)
+└── mymanual/                        # Skill directory
+    ├── SKILL.md                     # Main skill file
+    ├── references/                  # Reference documentation
+    │   ├── index.md                 # Category index
+    │   ├── getting_started.md       # Category 1
+    │   ├── api.md                   # Category 2
+    │   └── tutorial.md              # Category 3
+    ├── scripts/                     # Empty (for user scripts)
+    └── assets/                      # Assets directory
+        └── images/                  # Extracted images (if enabled)
+            ├── mymanual_page5_img1.png
+            └── mymanual_page12_img2.jpeg
+```
+
+### SKILL.md Format
+
+````markdown
+# Mymanual Documentation Skill
+
+My Manual documentation
+
+## When to use this skill
+
+Use this skill when the user asks about mymanual documentation,
+including API references, tutorials, examples, and best practices.
+
+## What's included
+
+This skill contains:
+
+- **Getting Started**: 25 pages
+- **Api**: 80 pages
+- **Tutorial**: 45 pages
+
+## Quick Reference
+
+### Top Code Examples
+
+**Example 1** (Quality: 8.5/10):
+
+```python
+def initialize_system():
+    config = load_config()
+    setup_logging(config)
+    return System(config)
+```
+
+**Example 2** (Quality: 8.2/10):
+
+```javascript
+const app = createApp({
+  data() {
+    return { count: 0 }
+  }
+})
+```
+
+## Navigation
+
+See `references/index.md` for complete documentation structure.
+
+## Languages Covered
+
+- python: 45 examples
+- javascript: 32 examples
+- shell: 8 examples
+```
+
+### Reference File Format
+
+Each category gets its own reference file:
+
+````markdown
+# Getting Started
+
+## Installation
+
+This guide will walk you through installing the software...
+
+### Code Examples
+
+```bash
+curl -O https://example.com/install.sh
+bash install.sh
+```
+
+---
+
+## Configuration
+
+After installation, configure your environment...
+
+### Code Examples
+
+```yaml
+server:
+  port: 8080
+  host: localhost
+```
+
+---
+````
+
+---
+
+## Categorization Logic
+
+### Chapter-Based (Automatic)
+
+If PDF has detectable chapters (from B1.3):
+
+1. Extract chapter titles and page ranges
+2. Create one category per chapter
+3. Assign pages to chapters by page number
+
+**Advantages:**
+- Automatic, no config needed
+- Respects document structure
+- Accurate page assignment
+
+**Example chapters:**
+- "Chapter 1: Introduction" → `chapter_1_introduction.md`
+- "Part 2: Advanced Topics" → `part_2_advanced_topics.md`
+
+### Keyword-Based (Configurable)
+
+If `categories` config is provided:
+
+1. Score each page against keyword lists
+2. Assign to highest-scoring category
+3. Fall back to "other" if no match
+
+**Advantages:**
+- Flexible, customizable
+- Works with PDFs without clear chapters
+- Can combine related sections
+
+**Scoring:**
+- Keyword in page text: +1 point
+- Keyword in page heading: +2 points
+- Assigned to category with highest score
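+
+A minimal sketch of this scoring, assuming each page carries `text` and `heading` fields; treat it as illustrative pseudocode for the real categorizer:
+
+```python
+def categorize_page(page, categories):
+    """Score a page against each category's keywords and pick the best match (illustrative)."""
+    text = page.get('text', '').lower()
+    heading = page.get('heading', '').lower()
+
+    best_category, best_score = 'other', 0
+    for category, keywords in categories.items():
+        score = 0
+        for keyword in keywords:
+            if keyword in text:
+                score += 1      # keyword in page text: +1 point
+            if keyword in heading:
+                score += 2      # keyword in page heading: +2 points
+        if score > best_score:
+            best_category, best_score = category, score
+    return best_category
+```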
+
+---
+
+## Integration with Skill Seeker
+
+### Complete Workflow
+
+```bash
+# 1. Create PDF config
+cat > configs/api_manual.json <<EOF
+{
+  "name": "api_manual",
+  "pdf_path": "docs/api.pdf",
+  "extract_options": {
+    "min_quality": 7.0,
+    "extract_images": true
+  }
+}
+EOF
+
+# 2. Run PDF scraper
+python3 cli/pdf_scraper.py --config configs/api_manual.json
+
+# 3. Package skill
+python3 cli/package_skill.py output/api_manual/
+
+# 4. Upload to Claude (if ANTHROPIC_API_KEY set)
+python3 cli/package_skill.py output/api_manual/ --upload
+
+# Result: api_manual.zip ready for Claude!
+```
+
+### Enhancement (Optional)
+
+```bash
+# After building, enhance with AI
+python3 cli/enhance_skill_local.py output/api_manual/
+
+# Or with API
+export ANTHROPIC_API_KEY=sk-ant-...
+python3 cli/enhance_skill.py output/api_manual/
+```
+
+---
+
+## Performance
+
+### Benchmark
+
+| PDF Size | Pages | Extraction | Building | Total |
+|----------|-------|------------|----------|-------|
+| Small | 50 | 30s | 5s | 35s |
+| Medium | 200 | 2m | 15s | 2m 15s |
+| Large | 500 | 5m | 45s | 5m 45s |
+
+**Extraction**: PDF → JSON (CPU-intensive)
+**Building**: JSON → Skill (fast, I/O-bound)
+
+### Optimization Tips
+
+1. **Use `--from-json` for iteration**
+   - Extract once, build many times
+   - Test categorization without re-extraction
+
+2. **Adjust chunk size**
+   - Larger chunks: Faster extraction
+   - Smaller chunks: Better chapter detection
+
+3. **Filter aggressively**
+   - Higher `min_quality`: Fewer low-quality code blocks
+   - Higher `min_image_size`: Fewer small images
+
+---
+
+## Examples
+
+### Example 1: Programming Language Manual
+
+```json
+{
+  "name": "python_reference",
+  "description": "Python 3.12 Language Reference",
+  "pdf_path": "python-3.12-reference.pdf",
+  "extract_options": {
+    "chunk_size": 20,
+    "min_quality": 7.0,
+    "extract_images": false
+  },
+  "categories": {
+    "basics": ["introduction", "basic", "syntax", "types"],
+    "functions": ["function", "lambda", "decorator"],
+    "classes": ["class", "object", "inheritance"],
+    "modules": ["module", "package", "import"],
+    "stdlib": ["library", "standard library", "built-in"]
+  }
+}
+```
+
+### Example 2: API Documentation
+
+```json
+{
+  "name": "rest_api_docs",
+  "description": "REST API Documentation",
+  "pdf_path": "api_docs.pdf",
+  "extract_options": {
+    "chunk_size": 10,
+    "min_quality": 6.0,
+    "extract_images": true,
+    "min_image_size": 200
+  },
+  "categories": {
+    "authentication": ["auth", "login", "token", "oauth"],
+    "users": ["user", "account", "profile"],
+    "products": ["product", "catalog", "inventory"],
+    "orders": ["order", "purchase", "checkout"],
+    "webhooks": ["webhook", "event", "callback"]
+  }
+}
+```
+
+### Example 3: Framework Documentation
+
+```json
+{
+  "name": "django_docs",
+  "description": "Django Web Framework Documentation",
+  "pdf_path": "django-4.2-docs.pdf",
+  "extract_options": {
+    "chunk_size": 15,
+    "min_quality": 6.5,
+    "extract_images": true
+  }
+}
+```
+*Note: No categories - uses chapter-based categorization*
+
+---
+
+## Troubleshooting
+
+### No Categories Created
+
+**Problem:** Only "content" or "other" category
+
+**Possible causes:**
+1. No chapters detected in PDF
+2. Keywords don't match content
+3. Config has empty categories
+
+**Solution:**
+```bash
+# Check extracted chapters
+cat output/mymanual_extracted.json | jq '.chapters'
+
+# If empty, add keyword categories to config
+# Or let it create single "content" category (OK for small PDFs)
+```
+
+### Low-Quality Code Blocks
+
+**Problem:** Too many poor code examples
+
+**Solution:**
+```json
+{
+  "extract_options": {
+    "min_quality": 7.0  // Increase threshold
+  }
+}
+```
+
+### Images Not Extracted
+
+**Problem:** No images in `assets/images/`
+
+**Solution:**
+```json
+{
+  "extract_options": {
+    "extract_images": true,  // Enable extraction
+    "min_image_size": 50     // Lower threshold
+  }
+}
+```
+
+---
+
+## Comparison with Web Scraper
+
+| Feature | Web Scraper | PDF Scraper |
+|---------|-------------|-------------|
+| Input | HTML websites | PDF files |
+| Crawling | Multi-page BFS | Single-file extraction |
+| Structure detection | CSS selectors | Font/heading analysis |
+| Categorization | URL patterns | Chapters/keywords |
+| Images | Referenced | Embedded (extracted) |
+| Code detection | `<pre><code>` | Font/indent/pattern |
+| Language detection | CSS classes | Pattern matching |
+| Quality scoring | No | Yes (B1.4) |
+| Chunking | No | Yes (B1.3) |
+
+---
+
+## Next Steps
+
+### Task B1.7: MCP Tool Integration
+
+The PDF scraper will be available through MCP:
+
+```python
+# Future: MCP tool
+result = mcp.scrape_pdf(
+    config_path="configs/manual.json"
+)
+
+# Or direct
+result = mcp.scrape_pdf(
+    pdf_path="manual.pdf",
+    name="mymanual",
+    extract_images=True
+)
+```
+
+---
+
+## Conclusion
+
+Tasks B1.6 and B1.8 successfully implement:
+
+**B1.6 - PDF Scraper CLI:**
+- ✅ Complete extraction → building workflow
+- ✅ Three usage modes (config, direct, from-json)
+- ✅ Automatic categorization (chapter or keyword-based)
+- ✅ Integration with Skill Seeker workflow
+- ✅ Quality filtering and top examples
+
+**B1.8 - PDF Config Format:**
+- ✅ JSON configuration format
+- ✅ Extraction options (chunk size, quality, images)
+- ✅ Category definitions (keyword-based)
+- ✅ Compatible with web scraper config style
+
+**Impact:**
+- Complete PDF documentation support
+- Parallel workflow to web scraping
+- Reusable extraction results
+- High-quality skill generation
+
+**Ready for B1.7:** MCP tool integration
+
+---
+
+**Tasks Completed:** October 21, 2025
+**Next Task:** B1.7 - Add MCP tool `scrape_pdf`

+ 576 - 0
libs/external/Skill_Seekers-development/docs/PDF_SYNTAX_DETECTION.md

@@ -0,0 +1,576 @@
+# PDF Code Block Syntax Detection (Task B1.4)
+
+**Status:** ✅ Completed
+**Date:** October 21, 2025
+**Task:** B1.4 - Extract code blocks from PDFs with syntax detection
+
+---
+
+## Overview
+
+Task B1.4 enhances the PDF extractor with advanced code block detection capabilities including:
+- **Confidence scoring** for language detection
+- **Syntax validation** to filter out false positives
+- **Quality scoring** to rank code blocks by usefulness
+- **Automatic filtering** of low-quality code
+
+This dramatically improves the accuracy and usefulness of extracted code samples from PDF documentation.
+
+---
+
+## New Features
+
+### ✅ 1. Confidence-Based Language Detection
+
+Enhanced language detection now returns both language and confidence score:
+
+**Before (B1.2):**
+```python
+lang = detect_language_from_code(code)  # Returns: 'python'
+```
+
+**After (B1.4):**
+```python
+lang, confidence = detect_language_from_code(code)  # Returns: ('python', 0.85)
+```
+
+**Confidence Calculation:**
+- Pattern matches are weighted (1-5 points)
+- Scores are normalized to 0-1 range
+- Higher confidence = more reliable detection
+
+**Example Pattern Weights:**
+```python
+'python': [
+    (r'\bdef\s+\w+\s*\(', 3),       # Strong indicator
+    (r'\bimport\s+\w+', 2),          # Medium indicator
+    (r':\s*$', 1),                   # Weak indicator (lines ending with :)
+]
+```
+
+### ✅ 2. Syntax Validation
+
+Validates detected code blocks to filter false positives:
+
+**Validation Checks:**
+1. **Not empty** - Rejects empty code blocks
+2. **Indentation consistency** (Python) - Detects mixed tabs/spaces
+3. **Balanced brackets** - Checks for unclosed parentheses, braces
+4. **Language-specific syntax** (JSON) - Attempts to parse
+5. **Natural language detection** - Filters out prose misidentified as code
+6. **Comment ratio** - Rejects blocks that are mostly comments
+
+**Output:**
+```json
+{
+  "code": "def example():\n    return True",
+  "language": "python",
+  "is_valid": true,
+  "validation_issues": []
+}
+```
+
+**Invalid example:**
+```json
+{
+  "code": "This is not code",
+  "language": "unknown",
+  "is_valid": false,
+  "validation_issues": ["May be natural language, not code"]
+}
+```
+
+### ✅ 3. Quality Scoring
+
+Each code block receives a quality score (0-10) based on multiple factors:
+
+**Scoring Factors:**
+1. **Language confidence** (+0 to +2.0 points)
+2. **Code length** (optimal: 20-500 chars, +1.0)
+3. **Line count** (optimal: 2-50 lines, +1.0)
+4. **Has definitions** (functions/classes, +1.5)
+5. **Meaningful variable names** (+1.0)
+6. **Syntax validation** (+1.0 if valid, -0.5 per issue)
+
+**Quality Tiers:**
+- **High quality (7-10):** Complete, valid, useful code examples
+- **Medium quality (4-7):** Partial or simple code snippets
+- **Low quality (0-4):** Fragments, false positives, invalid code
+
+**Example:**
+```python
+# High-quality code block (score: 8.5/10)
+def calculate_total(items):
+    total = 0
+    for item in items:
+        total += item.price
+    return total
+
+# Low-quality code block (score: 2.0/10)
+x = y
+```
+
+### ✅ 4. Quality Filtering
+
+Filter out low-quality code blocks automatically:
+
+```bash
+# Keep only high-quality code (score >= 7.0)
+python3 cli/pdf_extractor_poc.py input.pdf --min-quality 7.0
+
+# Keep medium and high quality (score >= 4.0)
+python3 cli/pdf_extractor_poc.py input.pdf --min-quality 4.0
+
+# No filtering (default)
+python3 cli/pdf_extractor_poc.py input.pdf
+```
+
+**Benefits:**
+- Reduces noise in output
+- Focuses on useful examples
+- Improves downstream skill quality
+
+### ✅ 5. Quality Statistics
+
+New summary statistics show overall code quality:
+
+```
+📊 Code Quality Statistics:
+   Average quality: 6.8/10
+   Average confidence: 78.5%
+   Valid code blocks: 45/52 (86.5%)
+   High quality (7+): 28
+   Medium quality (4-7): 17
+   Low quality (<4): 7
+```
+
+---
+
+## Output Format
+
+### Enhanced Code Block Object
+
+Each code block now includes quality metadata:
+
+```json
+{
+  "code": "def example():\n    return True",
+  "language": "python",
+  "confidence": 0.85,
+  "quality_score": 7.5,
+  "is_valid": true,
+  "validation_issues": [],
+  "detection_method": "font",
+  "font": "Courier-New"
+}
+```
+
+### Quality Statistics Object
+
+Top-level summary of code quality:
+
+```json
+{
+  "quality_statistics": {
+    "average_quality": 6.8,
+    "average_confidence": 0.785,
+    "valid_code_blocks": 45,
+    "invalid_code_blocks": 7,
+    "validation_rate": 0.865,
+    "high_quality_blocks": 28,
+    "medium_quality_blocks": 17,
+    "low_quality_blocks": 7
+  }
+}
+```
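+
+These figures can be derived from the per-block metadata; a minimal sketch (field names match the output above, the helper itself is illustrative):
+
+```python
+def summarize_quality(code_blocks):
+    """Aggregate per-block quality metadata into the statistics object above (illustrative)."""
+    total = len(code_blocks)
+    if total == 0:
+        return {}
+    scores = [b['quality_score'] for b in code_blocks]
+    valid = sum(1 for b in code_blocks if b.get('is_valid'))
+    return {
+        'average_quality': round(sum(scores) / total, 1),
+        'average_confidence': round(sum(b['confidence'] for b in code_blocks) / total, 3),
+        'valid_code_blocks': valid,
+        'invalid_code_blocks': total - valid,
+        'validation_rate': round(valid / total, 3),
+        'high_quality_blocks': sum(1 for s in scores if s >= 7),
+        'medium_quality_blocks': sum(1 for s in scores if 4 <= s < 7),
+        'low_quality_blocks': sum(1 for s in scores if s < 4),
+    }
+```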
+
+---
+
+## Usage Examples
+
+### Basic Extraction with Quality Stats
+
+```bash
+python3 cli/pdf_extractor_poc.py manual.pdf -o output.json --pretty
+```
+
+**Output:**
+```
+✅ Extraction complete:
+   Total characters: 125,000
+   Code blocks found: 52
+   Headings found: 45
+   Images found: 12
+   Chunks created: 5
+   Chapters detected: 3
+   Languages detected: python, javascript, sql
+
+📊 Code Quality Statistics:
+   Average quality: 6.8/10
+   Average confidence: 78.5%
+   Valid code blocks: 45/52 (86.5%)
+   High quality (7+): 28
+   Medium quality (4-7): 17
+   Low quality (<4): 7
+```
+
+### Filter Low-Quality Code
+
+```bash
+# Keep only high-quality examples
+python3 cli/pdf_extractor_poc.py tutorial.pdf --min-quality 7.0 -v
+
+# Verbose output shows filtering:
+# 📄 Extracting from: tutorial.pdf
+# ...
+#   Filtered out 12 low-quality code blocks (min_quality=7.0)
+#
+# ✅ Extraction complete:
+#    Code blocks found: 28 (after filtering)
+```
+
+### Inspect Quality Scores
+
+```bash
+# Extract and view quality scores
+python3 cli/pdf_extractor_poc.py input.pdf -o output.json
+
+# View quality scores with jq
+cat output.json | jq '.pages[0].code_samples[] | {language, quality_score, is_valid}'
+```
+
+**Output:**
+```json
+{
+  "language": "python",
+  "quality_score": 8.5,
+  "is_valid": true
+}
+{
+  "language": "javascript",
+  "quality_score": 6.2,
+  "is_valid": true
+}
+{
+  "language": "unknown",
+  "quality_score": 2.1,
+  "is_valid": false
+}
+```
+
+---
+
+## Technical Implementation
+
+### Language Detection with Confidence
+
+```python
+def detect_language_from_code(self, code):
+    """Enhanced with weighted pattern matching"""
+
+    patterns = {
+        'python': [
+            (r'\bdef\s+\w+\s*\(', 3),  # Weight: 3
+            (r'\bimport\s+\w+', 2),     # Weight: 2
+            (r':\s*$', 1),              # Weight: 1
+        ],
+        # ... other languages
+    }
+
+    # Calculate scores for each language
+    scores = {}
+    for lang, lang_patterns in patterns.items():
+        score = 0
+        for pattern, weight in lang_patterns:
+            if re.search(pattern, code, re.IGNORECASE | re.MULTILINE):
+                score += weight
+        if score > 0:
+            scores[lang] = score
+
+    # No pattern matched: report unknown with zero confidence
+    if not scores:
+        return 'unknown', 0.0
+
+    # Get best match
+    best_lang = max(scores, key=scores.get)
+    confidence = min(scores[best_lang] / 10.0, 1.0)
+
+    return best_lang, confidence
+```
+
+### Syntax Validation
+
+```python
+def validate_code_syntax(self, code, language):
+    """Validate code syntax"""
+    issues = []
+
+    if language == 'python':
+        # Check indentation consistency
+        indent_chars = set()
+        for line in code.split('\n'):
+            if line.startswith(' '):
+                indent_chars.add('space')
+            elif line.startswith('\t'):
+                indent_chars.add('tab')
+
+        if len(indent_chars) > 1:
+            issues.append('Mixed tabs and spaces')
+
+        # Check balanced brackets
+        open_count = code.count('(') + code.count('[') + code.count('{')
+        close_count = code.count(')') + code.count(']') + code.count('}')
+        if abs(open_count - close_count) > 2:
+            issues.append('Unbalanced brackets')
+
+    # Check if it's actually natural language
+    common_words = ['the', 'and', 'for', 'with', 'this', 'that']
+    word_count = sum(1 for word in common_words if word in code.lower())
+    if word_count > 5:
+        issues.append('May be natural language, not code')
+
+    return len(issues) == 0, issues
+```
+
+### Quality Scoring
+
+```python
+def score_code_quality(self, code, language, confidence):
+    """Score code quality (0-10)"""
+    score = 5.0  # Neutral baseline
+
+    # Factor 1: Language confidence
+    score += confidence * 2.0
+
+    # Factor 2: Code length (optimal range)
+    code_length = len(code.strip())
+    if 20 <= code_length <= 500:
+        score += 1.0
+
+    # Factor 3: Has function/class definitions
+    if re.search(r'\b(def|function|class|func)\b', code):
+        score += 1.5
+
+    # Factor 4: Meaningful variable names
+    meaningful_vars = re.findall(r'\b[a-z_][a-z0-9_]{3,}\b', code.lower())
+    if len(meaningful_vars) >= 2:
+        score += 1.0
+
+    # Factor 5: Syntax validation
+    is_valid, issues = self.validate_code_syntax(code, language)
+    if is_valid:
+        score += 1.0
+    else:
+        score -= len(issues) * 0.5
+
+    return max(0, min(10, score))  # Clamp to 0-10
+```
+
+---
+
+## Performance Impact
+
+### Overhead Analysis
+
+| Operation | Time per page | Impact |
+|-----------|---------------|--------|
+| Confidence scoring | +0.2ms | Negligible |
+| Syntax validation | +0.5ms | Negligible |
+| Quality scoring | +0.3ms | Negligible |
+| **Total overhead** | **+1.0ms** | **<2%** |
+
+**Benchmark:**
+- Small PDF (10 pages): +10ms total (~1% overhead)
+- Medium PDF (100 pages): +100ms total (~2% overhead)
+- Large PDF (500 pages): +500ms total (~2% overhead)
+
+### Memory Usage
+
+- Quality metadata adds ~200 bytes per code block
+- Statistics add ~500 bytes to output
+- **Impact:** Negligible (<1% increase)
+
+---
+
+## Comparison: Before vs After
+
+| Metric | Before (B1.3) | After (B1.4) | Improvement |
+|--------|---------------|--------------|-------------|
+| Language detection | Single return | Lang + confidence | ✅ More reliable |
+| Syntax validation | None | Multiple checks | ✅ Filters false positives |
+| Quality scoring | None | 0-10 scale | ✅ Ranks code blocks |
+| False positives | ~15-20% | ~3-5% | ✅ 75% reduction |
+| Code quality avg | Unknown | Measurable | ✅ Trackable |
+| Filtering | None | Automatic | ✅ Cleaner output |
+
+---
+
+## Testing
+
+### Test Quality Scoring
+
+```bash
+# Create test PDF with various code qualities
+# - High-quality: Complete function with meaningful names
+# - Medium-quality: Simple variable assignments
+# - Low-quality: Natural language text
+
+python3 cli/pdf_extractor_poc.py test.pdf -o test.json -v
+
+# Check quality scores
+cat test.json | jq '.pages[].code_samples[] | {language, quality_score}'
+```
+
+**Expected Results:**
+```json
+{"language": "python", "quality_score": 8.5}
+{"language": "javascript", "quality_score": 6.2}
+{"language": "unknown", "quality_score": 1.8}
+```
+
+### Test Validation
+
+```bash
+# Check validation results
+cat test.json | jq '.pages[].code_samples[] | select(.is_valid == false)'
+```
+
+**Should show:**
+- Empty code blocks
+- Natural language misdetected as code
+- Code with severe syntax errors
+
+### Test Filtering
+
+```bash
+# Extract with different quality thresholds
+python3 cli/pdf_extractor_poc.py test.pdf --min-quality 7.0 -o high_quality.json
+python3 cli/pdf_extractor_poc.py test.pdf --min-quality 4.0 -o medium_quality.json
+python3 cli/pdf_extractor_poc.py test.pdf --min-quality 0.0 -o all_quality.json
+
+# Compare counts
+echo "High quality:"; cat high_quality.json | jq '[.pages[].code_samples[]] | length'
+echo "Medium+:"; cat medium_quality.json | jq '[.pages[].code_samples[]] | length'
+echo "All:"; cat all_quality.json | jq '[.pages[].code_samples[]] | length'
+```
+
+---
+
+## Limitations
+
+### Current Limitations
+
+1. **Validation is heuristic-based**
+   - No AST parsing (yet)
+   - Some edge cases may be missed
+   - Language-specific validation only for Python, JS, Java, C
+
+2. **Quality scoring is subjective**
+   - Based on heuristics, not compilation
+   - May not match human judgment perfectly
+   - Tuned for documentation examples, not production code
+
+3. **Confidence scoring is pattern-based**
+   - No machine learning
+   - Limited to defined patterns
+   - May struggle with uncommon languages
+
+### Known Issues
+
+1. **Short Code Snippets**
+   - May score lower than deserved
+   - Example: `x = 5` is valid but scores low
+
+2. **Comments-Heavy Code**
+   - Well-commented code may be penalized
+   - Workaround: Adjust comment ratio threshold
+
+3. **Domain-Specific Languages**
+   - Not covered by pattern detection
+   - Will be marked as 'unknown'
+
+---
+
+## Future Enhancements
+
+### Potential Improvements
+
+1. **AST-Based Validation**
+   - Use Python's `ast` module for Python code
+   - Use esprima/acorn for JavaScript
+   - Actual syntax parsing instead of heuristics
+
+2. **Machine Learning Detection**
+   - Train classifier on code vs non-code
+   - More accurate language detection
+   - Context-aware quality scoring
+
+3. **Custom Quality Metrics**
+   - User-defined quality factors
+   - Domain-specific scoring
+   - Configurable weights
+
+4. **More Language Support**
+   - Add TypeScript, Dart, Lua, etc.
+   - Better pattern coverage
+   - Language-specific validation
+
+---
+
+## Integration with Skill Seeker
+
+### Improved Skill Quality
+
+With B1.4 enhancements, PDF-based skills will have:
+
+1. **Higher quality code examples**
+   - Automatic filtering of noise
+   - Only meaningful snippets included
+
+2. **Better categorization**
+   - Confidence scores help categorization
+   - Language-specific references
+
+3. **Validation feedback**
+   - Know which code blocks may have issues
+   - Fix before packaging skill
+
+### Example Workflow
+
+```bash
+# Step 1: Extract with high-quality filter
+python3 cli/pdf_extractor_poc.py manual.pdf --min-quality 7.0 -o manual.json -v
+
+# Step 2: Review quality statistics
+cat manual.json | jq '.quality_statistics'
+
+# Step 3: Inspect any invalid blocks
+cat manual.json | jq '.pages[].code_samples[] | select(.is_valid == false)'
+
+# Step 4: Build skill (future task B1.6)
+python3 cli/pdf_scraper.py --from-json manual.json
+```
+
+---
+
+## Conclusion
+
+Task B1.4 successfully implements:
+- ✅ Confidence-based language detection
+- ✅ Syntax validation for common languages
+- ✅ Quality scoring (0-10 scale)
+- ✅ Automatic quality filtering
+- ✅ Comprehensive quality statistics
+
+**Impact:**
+- 75% reduction in false positives
+- More reliable code extraction
+- Better skill quality
+- Measurable code quality metrics
+
+**Performance:** <2% overhead (negligible)
+
+**Compatibility:** Backward compatible (existing fields preserved)
+
+**Ready for B1.5:** Image extraction from PDFs
+
+---
+
+**Task Completed:** October 21, 2025
+**Next Task:** B1.5 - Add PDF image extraction (diagrams, screenshots)

+ 94 - 0
libs/external/Skill_Seekers-development/docs/TERMINAL_SELECTION.md

@@ -0,0 +1,94 @@
+# Terminal Selection Guide
+
+When using `--enhance-local`, Skill Seeker opens a new terminal window to run Claude Code. This guide explains how to control which terminal app is used.
+
+## Priority Order
+
+The script automatically detects which terminal to use in this order:
+
+1. **`SKILL_SEEKER_TERMINAL` environment variable** (highest priority)
+2. **`TERM_PROGRAM` environment variable** (inherit current terminal)
+3. **Terminal.app** (fallback default)
+
+## Setting Your Preferred Terminal
+
+### Option 1: Set Environment Variable (Recommended)
+
+Add this to your shell config (`~/.zshrc` or `~/.bashrc`):
+
+```bash
+# For Ghostty users
+export SKILL_SEEKER_TERMINAL="Ghostty"
+
+# For iTerm users
+export SKILL_SEEKER_TERMINAL="iTerm"
+
+# For WezTerm users
+export SKILL_SEEKER_TERMINAL="WezTerm"
+```
+
+Then reload your shell:
+```bash
+source ~/.zshrc  # or source ~/.bashrc
+```
+
+### Option 2: Set Per-Session
+
+Set the variable before running the command:
+
+```bash
+SKILL_SEEKER_TERMINAL="Ghostty" python3 cli/doc_scraper.py --config configs/react.json --enhance-local
+```
+
+### Option 3: Inherit Current Terminal (Automatic)
+
+If you run the script from Ghostty, iTerm2, or WezTerm, it will automatically open the enhancement in the same terminal app.
+
+**Note:** IDE terminals (VS Code, Zed, JetBrains) use unique `TERM_PROGRAM` values, so they fall back to Terminal.app unless you set `SKILL_SEEKER_TERMINAL`.
+
+## Supported Terminals
+
+- **Ghostty** (`ghostty`)
+- **iTerm2** (`iTerm.app`)
+- **Terminal.app** (`Apple_Terminal`)
+- **WezTerm** (`WezTerm`)
+
+## Example Output
+
+When terminal detection works:
+```
+🚀 Launching Claude Code in new terminal...
+   Using terminal: Ghostty (from SKILL_SEEKER_TERMINAL)
+```
+
+When running from an IDE terminal:
+```
+🚀 Launching Claude Code in new terminal...
+⚠️  unknown TERM_PROGRAM (zed)
+   → Using Terminal.app as fallback
+```
+
+**Tip:** Set `SKILL_SEEKER_TERMINAL` to avoid the fallback behavior.
+
+## Troubleshooting
+
+**Q: The wrong terminal opens even though I set `SKILL_SEEKER_TERMINAL`**
+
+A: Make sure you reloaded your shell after editing `~/.zshrc`:
+```bash
+source ~/.zshrc
+```
+
+**Q: I want to use a different terminal temporarily**
+
+A: Set the variable inline:
+```bash
+SKILL_SEEKER_TERMINAL="iTerm" python3 cli/doc_scraper.py --enhance-local ...
+```
+
+**Q: Can I use a custom terminal app?**
+
+A: Yes! Just use the app name as it appears in `/Applications/`:
+```bash
+export SKILL_SEEKER_TERMINAL="Alacritty"
+```

+ 716 - 0
libs/external/Skill_Seekers-development/docs/TESTING.md

@@ -0,0 +1,716 @@
+# Testing Guide for Skill Seeker
+
+Comprehensive testing documentation for the Skill Seeker project.
+
+## Quick Start
+
+```bash
+# Run all tests
+python3 run_tests.py
+
+# Run all tests with verbose output
+python3 run_tests.py -v
+
+# Run specific test suite
+python3 run_tests.py --suite config
+python3 run_tests.py --suite features
+python3 run_tests.py --suite integration
+
+# Stop on first failure
+python3 run_tests.py --failfast
+
+# List all available tests
+python3 run_tests.py --list
+```
+
+## Test Structure
+
+```
+tests/
+├── __init__.py                          # Test package marker
+├── test_config_validation.py            # Config validation tests (30+ tests)
+├── test_scraper_features.py             # Core feature tests (25+ tests)
+├── test_integration.py                  # Integration tests (15+ tests)
+├── test_pdf_extractor.py                # PDF extraction tests (23 tests)
+├── test_pdf_scraper.py                  # PDF workflow tests (18 tests)
+└── test_pdf_advanced_features.py        # PDF advanced features (26 tests) NEW
+```
+
+## Test Suites
+
+### 1. Config Validation Tests (`test_config_validation.py`)
+
+Tests the `validate_config()` function with comprehensive coverage.
+
+**Test Categories:**
+- ✅ Valid configurations (minimal and complete)
+- ✅ Missing required fields (`name`, `base_url`)
+- ✅ Invalid name formats (special characters)
+- ✅ Valid name formats (alphanumeric, hyphens, underscores)
+- ✅ Invalid URLs (missing protocol)
+- ✅ Valid URL protocols (http, https)
+- ✅ Selector validation (structure and recommended fields)
+- ✅ URL patterns validation (include/exclude lists)
+- ✅ Categories validation (structure and keywords)
+- ✅ Rate limit validation (range 0-10, type checking)
+- ✅ Max pages validation (range 1-10000, type checking)
+- ✅ Start URLs validation (format and protocol)
+
+**Example Test:**
+```python
+def test_valid_complete_config(self):
+    """Test valid complete configuration"""
+    config = {
+        'name': 'godot',
+        'base_url': 'https://docs.godotengine.org/en/stable/',
+        'selectors': {
+            'main_content': 'div[role="main"]',
+            'title': 'title',
+            'code_blocks': 'pre code'
+        },
+        'rate_limit': 0.5,
+        'max_pages': 500
+    }
+    errors = validate_config(config)
+    self.assertEqual(len(errors), 0)
+```
+
+**Running:**
+```bash
+python3 run_tests.py --suite config -v
+```
+
+---
+
+### 2. Scraper Features Tests (`test_scraper_features.py`)
+
+Tests core scraper functionality including URL validation, language detection, pattern extraction, and categorization.
+
+**Test Categories:**
+
+**URL Validation:**
+- ✅ URL matching include patterns
+- ✅ URL matching exclude patterns
+- ✅ Different domain rejection
+- ✅ No pattern configuration
+
+**Language Detection:**
+- ✅ Detection from CSS classes (`language-*`, `lang-*`)
+- ✅ Detection from parent elements
+- ✅ Python detection (import, from, def)
+- ✅ JavaScript detection (const, let, arrow functions)
+- ✅ GDScript detection (func, var)
+- ✅ C++ detection (#include, int main)
+- ✅ Unknown language fallback
+
+**Pattern Extraction:**
+- ✅ Extraction with "Example:" marker
+- ✅ Extraction with "Usage:" marker
+- ✅ Pattern limit (max 5)
+
+**Categorization:**
+- ✅ Categorization by URL keywords
+- ✅ Categorization by title keywords
+- ✅ Categorization by content keywords
+- ✅ Fallback to "other" category
+- ✅ Empty category removal
+
+**Text Cleaning:**
+- ✅ Multiple spaces normalization
+- ✅ Newline normalization
+- ✅ Tab normalization
+- ✅ Whitespace stripping
+
+**Example Test:**
+```python
+def test_detect_python_from_heuristics(self):
+    """Test Python detection from code content"""
+    html = '<code>import os\nfrom pathlib import Path</code>'
+    elem = BeautifulSoup(html, 'html.parser').find('code')
+    lang = self.converter.detect_language(elem, elem.get_text())
+    self.assertEqual(lang, 'python')
+```
+
+**Running:**
+```bash
+python3 run_tests.py --suite features -v
+```
+
+---
+
+### 3. Integration Tests (`test_integration.py`)
+
+Tests complete workflows and interactions between components.
+
+**Test Categories:**
+
+**Dry-Run Mode:**
+- ✅ No directories created in dry-run mode
+- ✅ Dry-run flag properly set
+- ✅ Normal mode creates directories
+
+**Config Loading:**
+- ✅ Load valid configuration files
+- ✅ Invalid JSON error handling
+- ✅ Nonexistent file error handling
+- ✅ Validation errors during load
+
+**Real Config Validation:**
+- ✅ Godot config validation
+- ✅ React config validation
+- ✅ Vue config validation
+- ✅ Django config validation
+- ✅ FastAPI config validation
+- ✅ Steam Economy config validation
+
+**URL Processing:**
+- ✅ URL normalization
+- ✅ Start URLs fallback to base_url
+- ✅ Multiple start URLs handling
+
+**Content Extraction:**
+- ✅ Empty content handling
+- ✅ Basic content extraction
+- ✅ Code sample extraction with language detection
+
+**Example Test:**
+```python
+def test_dry_run_no_directories_created(self):
+    """Test that dry-run mode doesn't create directories"""
+    converter = DocToSkillConverter(self.config, dry_run=True)
+
+    data_dir = Path(f"output/{self.config['name']}_data")
+    skill_dir = Path(f"output/{self.config['name']}")
+
+    self.assertFalse(data_dir.exists())
+    self.assertFalse(skill_dir.exists())
+```
+
+**Running:**
+```bash
+python3 run_tests.py --suite integration -v
+```
+
+---
+
+### 4. PDF Extraction Tests (`test_pdf_extractor.py`) **NEW**
+
+Tests PDF content extraction functionality (B1.2-B1.5).
+
+**Note:** These tests require PyMuPDF (`pip install PyMuPDF`). They will be skipped if not installed.
+
+**Test Categories:**
+
+**Language Detection (5 tests):**
+- ✅ Python detection with confidence scoring
+- ✅ JavaScript detection with confidence
+- ✅ C++ detection with confidence
+- ✅ Unknown language returns low confidence
+- ✅ Confidence always between 0 and 1
+
+**Syntax Validation (5 tests):**
+- ✅ Valid Python syntax validation
+- ✅ Invalid Python indentation detection
+- ✅ Unbalanced brackets detection
+- ✅ Valid JavaScript syntax validation
+- ✅ Natural language fails validation
+
+**Quality Scoring (4 tests):**
+- ✅ Quality score between 0 and 10
+- ✅ High-quality code gets good score (>7)
+- ✅ Low-quality code gets low score (<4)
+- ✅ Quality considers multiple factors
+
+**Chapter Detection (4 tests):**
+- ✅ Detect chapters with numbers
+- ✅ Detect uppercase chapter headers
+- ✅ Detect section headings (e.g., "2.1")
+- ✅ Normal text not detected as chapter
+
+**Code Block Merging (2 tests):**
+- ✅ Merge code blocks split across pages
+- ✅ Don't merge different languages
+
+**Code Detection Methods (2 tests):**
+- ✅ Pattern-based detection (keywords)
+- ✅ Indent-based detection
+
+**Quality Filtering (1 test):**
+- ✅ Filter by minimum quality threshold
+
+**Example Test:**
+```python
+def test_detect_python_with_confidence(self):
+    """Test Python detection returns language and confidence"""
+    extractor = self.PDFExtractor.__new__(self.PDFExtractor)
+    code = "def hello():\n    print('world')\n    return True"
+
+    language, confidence = extractor.detect_language_from_code(code)
+
+    self.assertEqual(language, "python")
+    self.assertGreater(confidence, 0.7)
+    self.assertLessEqual(confidence, 1.0)
+```
+
+**Running:**
+```bash
+python3 -m pytest tests/test_pdf_extractor.py -v
+```
+
+---
+
+### 5. PDF Workflow Tests (`test_pdf_scraper.py`) **NEW**
+
+Tests PDF to skill conversion workflow (B1.6).
+
+**Note:** These tests require PyMuPDF (`pip install PyMuPDF`). They will be skipped if not installed.
+
+**Test Categories:**
+
+**PDFToSkillConverter (3 tests):**
+- ✅ Initialization with name and PDF path
+- ✅ Initialization with config file
+- ✅ Requires name or config_path
+
+**Categorization (3 tests):**
+- ✅ Categorize by keywords
+- ✅ Categorize by chapters
+- ✅ Handle missing chapters
+
+**Skill Building (3 tests):**
+- ✅ Create required directory structure
+- ✅ Create SKILL.md with metadata
+- ✅ Create reference files for categories
+
+**Code Block Handling (2 tests):**
+- ✅ Include code blocks in references
+- ✅ Prefer high-quality code
+
+**Image Handling (2 tests):**
+- ✅ Save images to assets directory
+- ✅ Reference images in markdown
+
+**Error Handling (3 tests):**
+- ✅ Handle missing PDF files
+- ✅ Handle invalid config JSON
+- ✅ Handle missing required config fields
+
+**JSON Workflow (2 tests):**
+- ✅ Load from extracted JSON
+- ✅ Build from JSON without extraction
+
+**Example Test:**
+```python
+def test_build_skill_creates_structure(self):
+    """Test that build_skill creates required directory structure"""
+    converter = self.PDFToSkillConverter(
+        name="test_skill",
+        pdf_path="test.pdf",
+        output_dir=self.temp_dir
+    )
+
+    converter.extracted_data = {
+        "pages": [{"page_number": 1, "text": "Test", "code_blocks": [], "images": []}],
+        "total_pages": 1
+    }
+    converter.categories = {"test": [converter.extracted_data["pages"][0]]}
+
+    converter.build_skill()
+
+    skill_dir = Path(self.temp_dir) / "test_skill"
+    self.assertTrue(skill_dir.exists())
+    self.assertTrue((skill_dir / "references").exists())
+    self.assertTrue((skill_dir / "scripts").exists())
+    self.assertTrue((skill_dir / "assets").exists())
+```
+
+**Running:**
+```bash
+python3 -m pytest tests/test_pdf_scraper.py -v
+```
+
+---
+
+### 6. PDF Advanced Features Tests (`test_pdf_advanced_features.py`) **NEW**
+
+Tests advanced PDF features (Priority 2 & 3).
+
+**Note:** These tests require PyMuPDF (`pip install PyMuPDF`). OCR tests also require pytesseract and Pillow. They will be skipped if not installed.
+
+**Test Categories:**
+
+**OCR Support (5 tests):**
+- ✅ OCR flag initialization
+- ✅ OCR disabled behavior
+- ✅ OCR only triggers for minimal text
+- ✅ Warning when pytesseract unavailable
+- ✅ OCR extraction triggered correctly
+
+**Password Protection (4 tests):**
+- ✅ Password parameter initialization
+- ✅ Encrypted PDF detection
+- ✅ Wrong password handling
+- ✅ Missing password error
+
+**Table Extraction (5 tests):**
+- ✅ Table extraction flag initialization
+- ✅ No extraction when disabled
+- ✅ Basic table extraction
+- ✅ Multiple tables per page
+- ✅ Error handling during extraction
+
+**Caching (5 tests):**
+- ✅ Cache initialization
+- ✅ Set and get cached values
+- ✅ Cache miss returns None
+- ✅ Caching can be disabled
+- ✅ Cache overwrite
+
+**Parallel Processing (4 tests):**
+- ✅ Parallel flag initialization
+- ✅ Disabled by default
+- ✅ Worker count auto-detection
+- ✅ Custom worker count
+
+**Integration (3 tests):**
+- ✅ Full initialization with all features
+- ✅ Various feature combinations
+- ✅ Page data includes tables
+
+**Example Test:**
+```python
+def test_table_extraction_basic(self):
+    """Test basic table extraction"""
+    extractor = self.PDFExtractor.__new__(self.PDFExtractor)
+    extractor.extract_tables = True
+    extractor.verbose = False
+
+    # Create mock table
+    mock_table = Mock()
+    mock_table.extract.return_value = [
+        ["Header 1", "Header 2", "Header 3"],
+        ["Data 1", "Data 2", "Data 3"]
+    ]
+    mock_table.bbox = (0, 0, 100, 100)
+
+    mock_tables = Mock()
+    mock_tables.tables = [mock_table]
+
+    mock_page = Mock()
+    mock_page.find_tables.return_value = mock_tables
+
+    tables = extractor.extract_tables_from_page(mock_page)
+
+    self.assertEqual(len(tables), 1)
+    self.assertEqual(tables[0]['row_count'], 2)
+    self.assertEqual(tables[0]['col_count'], 3)
+```
+
+**Running:**
+```bash
+python3 -m pytest tests/test_pdf_advanced_features.py -v
+```
+
+---
+
+## Test Runner Features
+
+The custom test runner (`run_tests.py`) provides:
+
+### Colored Output
+- 🟢 Green for passing tests
+- 🔴 Red for failures and errors
+- 🟡 Yellow for skipped tests
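+
+A minimal sketch of how this kind of colored output can be produced with ANSI escape codes (illustrative only; the actual `run_tests.py` implementation may differ):
+
+```python
+# Illustrative sketch -- not the actual run_tests.py implementation
+GREEN, RED, YELLOW, RESET = "\033[92m", "\033[91m", "\033[93m", "\033[0m"
+
+def colorize(status: str, label: str) -> str:
+    """Wrap a result label in the color conventionally used for its status."""
+    color = {"pass": GREEN, "fail": RED, "skip": YELLOW}.get(status, RESET)
+    return f"{color}{label}{RESET}"
+
+print(colorize("pass", "✓ test_valid_complete_config"))
+print(colorize("fail", "✗ test_invalid_url"))
+print(colorize("skip", "⊘ test_pdf_extraction (PyMuPDF not installed)"))
+```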
+
+### Detailed Summary
+```
+======================================================================
+TEST SUMMARY
+======================================================================
+
+Total Tests: 70
+✓ Passed: 68
+✗ Failed: 2
+⊘ Skipped: 0
+
+Success Rate: 97.1%
+
+Test Breakdown by Category:
+  TestConfigValidation: 28/30 passed
+  TestURLValidation: 6/6 passed
+  TestLanguageDetection: 10/10 passed
+  TestPatternExtraction: 3/3 passed
+  TestCategorization: 5/5 passed
+  TestDryRunMode: 3/3 passed
+  TestConfigLoading: 4/4 passed
+  TestRealConfigFiles: 6/6 passed
+  TestContentExtraction: 3/3 passed
+
+======================================================================
+```
+
+### Command-Line Options
+
+```bash
+# Verbose output (show each test name)
+python3 run_tests.py -v
+
+# Quiet output (minimal)
+python3 run_tests.py -q
+
+# Stop on first failure
+python3 run_tests.py --failfast
+
+# Run specific suite
+python3 run_tests.py --suite config
+
+# List all tests
+python3 run_tests.py --list
+```
+
+---
+
+## Running Individual Tests
+
+### Run Single Test File
+```bash
+python3 -m unittest tests.test_config_validation
+python3 -m unittest tests.test_scraper_features
+python3 -m unittest tests.test_integration
+```
+
+### Run Single Test Class
+```bash
+python3 -m unittest tests.test_config_validation.TestConfigValidation
+python3 -m unittest tests.test_scraper_features.TestLanguageDetection
+```
+
+### Run Single Test Method
+```bash
+python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config
+python3 -m unittest tests.test_scraper_features.TestLanguageDetection.test_detect_python_from_heuristics
+```
+
+---
+
+## Test Coverage
+
+### Current Coverage
+
+| Component | Tests | Coverage |
+|-----------|-------|----------|
+| Config Validation | 30+ | 100% |
+| URL Validation | 6 | 95% |
+| Language Detection | 10 | 90% |
+| Pattern Extraction | 3 | 85% |
+| Categorization | 5 | 90% |
+| Text Cleaning | 4 | 100% |
+| Dry-Run Mode | 3 | 100% |
+| Config Loading | 4 | 95% |
+| Real Configs | 6 | 100% |
+| Content Extraction | 3 | 80% |
+| **PDF Extraction** | **23** | **90%** |
+| **PDF Workflow** | **18** | **85%** |
+| **PDF Advanced Features** | **26** | **95%** |
+
+**Total: 142 tests (75 core tests + 67 PDF tests)**
+
+**Note:** PDF tests (67 total) require PyMuPDF and will be skipped if not installed. When PyMuPDF is available, all 142 tests run.
+
+### Not Yet Covered
+- Network operations (actual scraping)
+- Enhancement scripts (`enhance_skill.py`, `enhance_skill_local.py`)
+- Package creation (`package_skill.py`)
+- Interactive mode
+- SKILL.md generation
+- Reference file creation
+- PDF extraction with real PDF files (tests use mocked data)
+
+---
+
+## Writing New Tests
+
+### Test Template
+
+```python
+#!/usr/bin/env python3
+"""
+Test suite for [feature name]
+Tests [description of what's being tested]
+"""
+
+import sys
+import os
+import unittest
+
+# Add parent directory to path
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from doc_scraper import DocToSkillConverter
+
+
+class TestYourFeature(unittest.TestCase):
+    """Test [feature] functionality"""
+
+    def setUp(self):
+        """Set up test fixtures"""
+        self.config = {
+            'name': 'test',
+            'base_url': 'https://example.com/',
+            'selectors': {
+                'main_content': 'article',
+                'title': 'h1',
+                'code_blocks': 'pre code'
+            },
+            'rate_limit': 0.1,
+            'max_pages': 10
+        }
+        self.converter = DocToSkillConverter(self.config, dry_run=True)
+
+    def tearDown(self):
+        """Clean up after tests"""
+        pass
+
+    def test_your_feature(self):
+        """Test description"""
+        # Arrange
+        test_input = "something"
+
+        # Act
+        result = self.converter.some_method(test_input)
+
+        # Assert
+        self.assertEqual(result, expected_value)
+
+
+if __name__ == '__main__':
+    unittest.main()
+```
+
+### Best Practices
+
+1. **Use descriptive test names**: `test_valid_name_formats` not `test1`
+2. **Follow AAA pattern**: Arrange, Act, Assert
+3. **One assertion per test** when possible
+4. **Test edge cases**: empty inputs, invalid inputs, boundary values
+5. **Use setUp/tearDown**: for common initialization and cleanup
+6. **Mock external dependencies**: don't make real network calls
+7. **Keep tests independent**: tests should not depend on each other
+8. **Use dry_run=True**: for converter tests to avoid file creation
+
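+For point 6 (mock external dependencies), here is a minimal sketch using `unittest.mock`; `fetch_title` is a hypothetical stand-in for whatever function performs the HTTP request in your code:
+
+```python
+import unittest
+from unittest.mock import patch, Mock
+
+import requests
+
+
+def fetch_title(url: str) -> str:
+    """Tiny stand-in for a scraper helper (hypothetical, for illustration only)."""
+    response = requests.get(url, timeout=10)
+    return response.text
+
+
+class TestNetworkMocking(unittest.TestCase):
+    """Stub out HTTP traffic so the test never touches the network"""
+
+    @patch("requests.get")
+    def test_fetch_uses_mocked_response(self, mock_get):
+        # Arrange: return a canned payload instead of a real HTTP response
+        mock_get.return_value = Mock(status_code=200, text="<h1>Docs</h1>")
+
+        # Act
+        result = fetch_title("https://example.com/docs/")
+
+        # Assert
+        self.assertEqual(result, "<h1>Docs</h1>")
+        mock_get.assert_called_once()
+
+
+if __name__ == "__main__":
+    unittest.main()
+```
+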
+---
+
+## Continuous Integration
+
+### GitHub Actions (Future)
+
+```yaml
+name: Tests
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: '3.7'
+      - run: pip install requests beautifulsoup4
+      - run: python3 run_tests.py
+```
+
+---
+
+## Troubleshooting
+
+### Tests Fail with Import Errors
+```bash
+# Make sure you're in the repository root
+cd /path/to/Skill_Seekers
+
+# Run tests from root directory
+python3 run_tests.py
+```
+
+### Tests Create Output Directories
+```bash
+# Clean up test artifacts
+rm -rf output/test-*
+
+# Make sure tests use dry_run=True
+# Check test setUp methods
+```
+
+### Specific Test Keeps Failing
+```bash
+# Run only that test with verbose output
+python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v
+
+# Check the error message carefully
+# Verify test expectations match implementation
+```
+
+---
+
+## Performance
+
+Test execution times:
+- **Config Validation**: ~0.1 seconds (30 tests)
+- **Scraper Features**: ~0.3 seconds (25 tests)
+- **Integration Tests**: ~0.5 seconds (15 tests)
+- **Total**: ~1 second (70 core tests)
+
+---
+
+## Contributing Tests
+
+When adding new features:
+
+1. Write tests **before** implementing the feature (TDD)
+2. Ensure tests cover:
+   - ✅ Happy path (valid inputs)
+   - ✅ Edge cases (empty, null, boundary values)
+   - ✅ Error cases (invalid inputs)
+3. Run tests before committing:
+   ```bash
+   python3 run_tests.py
+   ```
+4. Aim for >80% coverage for new code
+
+---
+
+## Additional Resources
+
+- **unittest documentation**: https://docs.python.org/3/library/unittest.html
+- **pytest** (alternative): https://pytest.org/ (more powerful, but requires installation)
+- **Test-Driven Development**: https://en.wikipedia.org/wiki/Test-driven_development
+
+---
+
+## Summary
+
+✅ **142 comprehensive tests** covering all major features (75 + 67 PDF)
+✅ **PDF support testing** with 67 tests for B1 tasks + Priority 2 & 3
+✅ **Colored test runner** with detailed summaries
+✅ **Fast execution** (~1 second for full suite)
+✅ **Easy to extend** with clear patterns and templates
+✅ **Good coverage** of critical paths
+
+**PDF Tests Status:**
+- 23 tests for PDF extraction (language detection, syntax validation, quality scoring, chapter detection)
+- 18 tests for PDF workflow (initialization, categorization, skill building, code/image handling)
+- **26 tests for advanced features (OCR, passwords, tables, parallel, caching)** NEW!
+- Tests are skipped gracefully when PyMuPDF is not installed
+- Full test coverage when PyMuPDF + optional dependencies are available
+
+**Advanced PDF Features Tested:**
+- ✅ OCR support for scanned PDFs (5 tests)
+- ✅ Password-protected PDFs (4 tests)
+- ✅ Table extraction (5 tests)
+- ✅ Parallel processing (4 tests)
+- ✅ Caching (5 tests)
+- ✅ Integration (3 tests)
+
+Run tests frequently to catch bugs early! 🚀

+ 342 - 0
libs/external/Skill_Seekers-development/docs/TEST_MCP_IN_CLAUDE_CODE.md

@@ -0,0 +1,342 @@
+# Testing MCP Server in Claude Code
+
+This guide shows you how to test the Skill Seeker MCP server **through actual Claude Code** using the MCP protocol (not just Python function calls).
+
+## Important: What We Tested vs What You Need to Test
+
+### What I Tested (Python Direct Calls) ✅
+I tested the MCP server **functions** by calling them directly with Python:
+```python
+await server.list_configs_tool({})
+await server.generate_config_tool({...})
+```
+
+This verified the **code works**, but didn't test the **MCP protocol integration**.
+
+### What You Need to Test (Actual MCP Protocol) 🎯
+You need to test via **Claude Code** using the MCP protocol:
+```
+In Claude Code:
+> List all available configs
+> mcp__skill-seeker__list_configs
+```
+
+This verifies the **full integration** works.
+
+## Setup Instructions
+
+### Step 1: Configure Claude Code
+
+Create the MCP configuration file:
+
+```bash
+# Create config directory
+mkdir -p ~/.config/claude-code
+
+# Create/edit MCP configuration
+nano ~/.config/claude-code/mcp.json
+```
+
+Add this configuration (replace `/path/to/` with your actual path):
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/skill_seeker_mcp/server.py"
+      ],
+      "cwd": "/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers"
+    }
+  }
+}
+```
+
+Or use the setup script:
+```bash
+./setup_mcp.sh
+```
+
+### Step 2: Restart Claude Code
+
+**IMPORTANT:** Completely quit and restart Claude Code (don't just close the window).
+
+### Step 3: Verify MCP Server Loaded
+
+In Claude Code, check if the server loaded:
+
+```
+Show me all available MCP tools
+```
+
+You should see 6 tools with the prefix `mcp__skill-seeker__`:
+- `mcp__skill-seeker__list_configs`
+- `mcp__skill-seeker__generate_config`
+- `mcp__skill-seeker__validate_config`
+- `mcp__skill-seeker__estimate_pages`
+- `mcp__skill-seeker__scrape_docs`
+- `mcp__skill-seeker__package_skill`
+
+## Testing All 6 MCP Tools
+
+### Test 1: list_configs
+
+**In Claude Code, type:**
+```
+List all available Skill Seeker configs
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__list_configs
+```
+
+**Expected Output:**
+```
+📋 Available Configs:
+
+  • django.json
+  • fastapi.json
+  • godot.json
+  • react.json
+  • vue.json
+  ...
+```
+
+### Test 2: generate_config
+
+**In Claude Code, type:**
+```
+Generate a config for Astro documentation at https://docs.astro.build with max 15 pages
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__generate_config with:
+- name: astro-test
+- url: https://docs.astro.build
+- description: Astro framework testing
+- max_pages: 15
+```
+
+**Expected Output:**
+```
+✅ Config created: configs/astro-test.json
+```
+
+### Test 3: validate_config
+
+**In Claude Code, type:**
+```
+Validate the astro-test config
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__validate_config for configs/astro-test.json
+```
+
+**Expected Output:**
+```
+✅ Config is valid!
+  Name: astro-test
+  Base URL: https://docs.astro.build
+  Max pages: 15
+```
+
+### Test 4: estimate_pages
+
+**In Claude Code, type:**
+```
+Estimate pages for the astro-test config
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__estimate_pages for configs/astro-test.json
+```
+
+**Expected Output:**
+```
+📊 ESTIMATION RESULTS
+Estimated Total: ~25 pages
+Recommended max_pages: 75
+```
+
+### Test 5: scrape_docs
+
+**In Claude Code, type:**
+```
+Scrape docs using the astro-test config
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__scrape_docs with configs/astro-test.json
+```
+
+**Expected Output:**
+```
+✅ Skill built: output/astro-test/
+Scraped X pages
+Created Y categories
+```
+
+### Test 6: package_skill
+
+**In Claude Code, type:**
+```
+Package the astro-test skill
+```
+
+**Or explicitly:**
+```
+Use mcp__skill-seeker__package_skill for output/astro-test/
+```
+
+**Expected Output:**
+```
+✅ Package created: output/astro-test.zip
+Size: X KB
+```
+
+## Complete Workflow Test
+
+Test the entire workflow in Claude Code with natural language:
+
+```
+Step 1:
+> List all available configs
+
+Step 2:
+> Generate config for Svelte at https://svelte.dev/docs with description "Svelte framework" and max 20 pages
+
+Step 3:
+> Validate configs/svelte.json
+
+Step 4:
+> Estimate pages for configs/svelte.json
+
+Step 5:
+> Scrape docs using configs/svelte.json
+
+Step 6:
+> Package skill at output/svelte/
+```
+
+Expected result: `output/svelte.zip` ready to upload to Claude!
+
+## Troubleshooting
+
+### Issue: Tools Not Appearing
+
+**Symptoms:**
+- Claude Code doesn't recognize skill-seeker commands
+- No `mcp__skill-seeker__` tools listed
+
+**Solutions:**
+
+1. Check configuration exists:
+   ```bash
+   cat ~/.config/claude-code/mcp.json
+   ```
+
+2. Verify server can start:
+   ```bash
+   cd /path/to/Skill_Seekers
+   python3 skill_seeker_mcp/server.py
+   # Should start without errors (Ctrl+C to exit)
+   ```
+
+3. Check dependencies installed:
+   ```bash
+   pip3 list | grep mcp
+   # Should show: mcp x.x.x
+   ```
+
+4. Completely restart Claude Code (quit and reopen)
+
+5. Check Claude Code logs:
+   - macOS: `~/Library/Logs/Claude Code/`
+   - Linux: `~/.config/claude-code/logs/`
+
+### Issue: "Permission Denied"
+
+```bash
+chmod +x skill_seeker_mcp/server.py
+```
+
+### Issue: "Module Not Found"
+
+```bash
+pip3 install -r skill_seeker_mcp/requirements.txt
+pip3 install requests beautifulsoup4
+```
+
+## Verification Checklist
+
+Use this checklist to verify MCP integration:
+
+- [ ] Configuration file created at `~/.config/claude-code/mcp.json`
+- [ ] Repository path in config is absolute and correct
+- [ ] Python dependencies installed (`mcp`, `requests`, `beautifulsoup4`)
+- [ ] Server starts without errors when run manually
+- [ ] Claude Code completely restarted (quit and reopened)
+- [ ] Tools appear when asking "show me all MCP tools"
+- [ ] Tools have `mcp__skill-seeker__` prefix
+- [ ] Can list configs successfully
+- [ ] Can generate a test config
+- [ ] Can scrape and package a small skill
+
+## What Makes This Different from My Tests
+
+| What I Tested | What You Should Test |
+|---------------|---------------------|
+| Python function calls | Claude Code MCP protocol |
+| `await server.list_configs_tool({})` | Natural language in Claude Code |
+| Direct Python imports | Full MCP server integration |
+| Validates code works | Validates Claude Code integration |
+| Quick unit testing | Real-world usage testing |
+
+## Success Criteria
+
+✅ **MCP Integration is Working When:**
+
+1. You can ask Claude Code to "list all available configs"
+2. Claude Code responds with the actual config list
+3. You can generate, validate, scrape, and package skills
+4. All through natural language commands in Claude Code
+5. No Python code needed - just conversation!
+
+## Next Steps After Successful Testing
+
+Once MCP integration works:
+
+1. **Create your first skill:**
+   ```
+   > Generate config for TailwindCSS at https://tailwindcss.com/docs
+   > Scrape docs using configs/tailwind.json
+   > Package skill at output/tailwind/
+   ```
+
+2. **Upload to Claude:**
+   - Take the generated `.zip` file
+   - Upload to Claude.ai
+   - Start using your new skill!
+
+3. **Share feedback:**
+   - Report any issues on GitHub
+   - Share successful skills created
+   - Suggest improvements
+
+## Reference
+
+- **Full Setup Guide:** [docs/MCP_SETUP.md](docs/MCP_SETUP.md)
+- **MCP Documentation:** [mcp/README.md](mcp/README.md)
+- **Main README:** [README.md](README.md)
+- **Setup Script:** `./setup_mcp.sh`
+
+---
+
+**Important:** This document is for testing the **actual MCP protocol integration** with Claude Code, not just the Python functions. Make sure you're testing through Claude Code's UI, not Python scripts!

+ 633 - 0
libs/external/Skill_Seekers-development/docs/UNIFIED_SCRAPING.md

@@ -0,0 +1,633 @@
+# Unified Multi-Source Scraping
+
+**Version:** 2.0 (Feature complete as of October 2025)
+
+## Overview
+
+Unified multi-source scraping allows you to combine knowledge from multiple sources into a single comprehensive Claude skill. Instead of choosing between documentation, GitHub repositories, or PDF manuals, you can now extract and intelligently merge information from all of them.
+
+## Why Unified Scraping?
+
+**The Problem**: Documentation and code often drift apart over time. Official docs might be outdated, miss features that exist in the code, or document features that have been removed. Scraping docs and code separately creates two incomplete skills.
+
+**The Solution**: Unified scraping:
+- Extracts information from multiple sources (documentation, GitHub, PDFs)
+- **Detects conflicts** between documentation and actual code implementation
+- **Intelligently merges** conflicting information with transparency
+- **Highlights discrepancies** with inline warnings (⚠️)
+- Creates a single, comprehensive skill that shows the complete picture
+
+## Quick Start
+
+### 1. Create a Unified Config
+
+Create a config file with multiple sources:
+
+```json
+{
+  "name": "react",
+  "description": "Complete React knowledge from docs + codebase",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://react.dev/",
+      "extract_api": true,
+      "max_pages": 200
+    },
+    {
+      "type": "github",
+      "repo": "facebook/react",
+      "include_code": true,
+      "code_analysis_depth": "surface",
+      "max_issues": 100
+    }
+  ]
+}
+```
+
+### 2. Scrape and Build
+
+```bash
+python3 cli/unified_scraper.py --config configs/react_unified.json
+```
+
+The tool will:
+1. ✅ **Phase 1**: Scrape all sources (docs + GitHub)
+2. ✅ **Phase 2**: Detect conflicts between sources
+3. ✅ **Phase 3**: Merge conflicts intelligently
+4. ✅ **Phase 4**: Build unified skill with conflict transparency
+
+### 3. Package and Upload
+
+```bash
+python3 cli/package_skill.py output/react/
+```
+
+## Config Format
+
+### Unified Config Structure
+
+```json
+{
+  "name": "skill-name",
+  "description": "When to use this skill",
+  "merge_mode": "rule-based|claude-enhanced",
+  "sources": [
+    {
+      "type": "documentation|github|pdf",
+      ...source-specific fields...
+    }
+  ]
+}
+```
+
+### Documentation Source
+
+```json
+{
+  "type": "documentation",
+  "base_url": "https://docs.example.com/",
+  "extract_api": true,
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": [],
+    "exclude": ["/blog/"]
+  },
+  "categories": {
+    "getting_started": ["intro", "tutorial"],
+    "api": ["api", "reference"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 200
+}
+```
+
+### GitHub Source
+
+```json
+{
+  "type": "github",
+  "repo": "owner/repo",
+  "github_token": "ghp_...",
+  "include_issues": true,
+  "max_issues": 100,
+  "include_changelog": true,
+  "include_releases": true,
+  "include_code": true,
+  "code_analysis_depth": "surface|deep|full",
+  "file_patterns": [
+    "src/**/*.js",
+    "lib/**/*.ts"
+  ]
+}
+```
+
+**Code Analysis Depth**:
+- `surface` (default): Basic structure, no code analysis
+- `deep`: Extract class/function signatures, parameters, return types
+- `full`: Complete AST analysis (expensive)
+
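+As a rough illustration of what `deep` analysis involves for Python sources, signatures can be pulled out with the standard `ast` module (a sketch only; the real `code_analyzer.py` also handles other depths and languages):
+
+```python
+import ast
+from typing import List
+
+def extract_signatures(source: str) -> List[str]:
+    """Collect function/method signatures from Python source (deep-style analysis)."""
+    signatures = []
+    for node in ast.walk(ast.parse(source)):
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            args = ", ".join(a.arg for a in node.args.args)
+            signatures.append(f"{node.name}({args})")
+    return signatures
+
+source = "def move_local_x(self, delta, snap=False):\n    pass\n"
+print(extract_signatures(source))  # ['move_local_x(self, delta, snap)']
+```
+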
+### PDF Source
+
+```json
+{
+  "type": "pdf",
+  "path": "/path/to/manual.pdf",
+  "extract_tables": false,
+  "ocr": false,
+  "password": "optional-password"
+}
+```
+
+## Conflict Detection
+
+The unified scraper automatically detects 4 types of conflicts:
+
+### 1. Missing in Documentation
+
+**Severity**: Medium
+**Description**: API exists in code but is not documented
+
+**Example**:
+```python
+# Code has this method:
+def move_local_x(self, delta: float, snap: bool = False) -> None:
+    """Move node along local X axis"""
+
+# But documentation doesn't mention it
+```
+
+**Suggestion**: Add documentation for this API
+
+### 2. Missing in Code
+
+**Severity**: High
+**Description**: API is documented but not found in codebase
+
+**Example**:
+```python
+# Docs say:
+def rotate(angle: float) -> None
+
+# But code doesn't have this function
+```
+
+**Suggestion**: Update the documentation to remove this API, or add it to the codebase
+
+### 3. Signature Mismatch
+
+**Severity**: Medium-High
+**Description**: API exists in both but signatures differ
+
+**Example**:
+```python
+# Docs say:
+def move_local_x(delta: float)
+
+# Code has:
+def move_local_x(delta: float, snap: bool = False)
+```
+
+**Suggestion**: Update documentation to match actual signature
+
+### 4. Description Mismatch
+
+**Severity**: Low
+**Description**: Different descriptions/docstrings
+
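+A minimal sketch of how these checks can be expressed, comparing `{name: [parameters]}` maps extracted from docs and code (illustrative only; the real `conflict_detector.py` covers more cases and severity rules):
+
+```python
+def detect_conflicts(docs_api, code_api):
+    """Compare {name: [param, ...]} maps from docs and code and report discrepancies."""
+    conflicts = []
+    for name in sorted(docs_api.keys() | code_api.keys()):
+        if name not in code_api:
+            conflicts.append({"api": name, "type": "missing_in_code", "severity": "high"})
+        elif name not in docs_api:
+            conflicts.append({"api": name, "type": "missing_in_docs", "severity": "medium"})
+        elif docs_api[name] != code_api[name]:
+            conflicts.append({"api": name, "type": "signature_mismatch", "severity": "medium"})
+    return conflicts
+
+docs = {"move_local_x": ["delta"], "rotate": ["angle"]}
+code = {"move_local_x": ["delta", "snap"], "move_local_y": ["delta"]}
+print(detect_conflicts(docs, code))
+# move_local_x -> signature_mismatch, move_local_y -> missing_in_docs, rotate -> missing_in_code
+```
+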
+## Merge Modes
+
+### Rule-Based Merge (Default)
+
+Fast, deterministic merging using predefined rules:
+
+1. **If API only in docs** → Include with `[DOCS_ONLY]` tag
+2. **If API only in code** → Include with `[UNDOCUMENTED]` tag
+3. **If both match perfectly** → Include normally
+4. **If conflict exists** → Prefer code signature, keep docs description
+
+**When to use**:
+- Fast merging (< 1 second)
+- Automated workflows
+- You don't need human oversight
+
+**Example**:
+```bash
+python3 cli/unified_scraper.py --config config.json --merge-mode rule-based
+```
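+
+The four rules above can be sketched as a single merge function (the name `merge_entry` and the entry fields are hypothetical; see `merge_sources.py` for the actual implementation):
+
+```python
+from typing import Optional
+
+def merge_entry(docs_entry: Optional[dict], code_entry: Optional[dict]) -> dict:
+    """Apply the rule-based merge to a single API entry from the two sources."""
+    if docs_entry and not code_entry:
+        return dict(docs_entry, tag="[DOCS_ONLY]")
+    if code_entry and not docs_entry:
+        return dict(code_entry, tag="[UNDOCUMENTED]")
+    if docs_entry["signature"] == code_entry["signature"]:
+        return dict(docs_entry, tag=None)  # sources agree
+    # Conflict: prefer the code signature, keep the docs description
+    return {
+        "signature": code_entry["signature"],
+        "description": docs_entry.get("description", ""),
+        "tag": "[CONFLICT]",  # tag name here is illustrative
+    }
+```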
+
+### Claude-Enhanced Merge
+
+AI-powered reconciliation using local Claude Code:
+
+1. Opens new terminal with Claude Code
+2. Provides conflict context and instructions
+3. Claude analyzes and creates reconciled API reference
+4. Human can review and adjust before finalizing
+
+**When to use**:
+- Complex conflicts requiring judgment
+- You want highest quality merge
+- You have time for human oversight
+
+**Example**:
+```bash
+python3 cli/unified_scraper.py --config config.json --merge-mode claude-enhanced
+```
+
+## Skill Output Structure
+
+The unified scraper creates this structure:
+
+```
+output/skill-name/
+├── SKILL.md                     # Main skill file with merged APIs
+├── references/
+│   ├── documentation/           # Documentation references
+│   │   └── index.md
+│   ├── github/                  # GitHub references
+│   │   ├── README.md
+│   │   ├── issues.md
+│   │   └── releases.md
+│   ├── pdf/                     # PDF references (if applicable)
+│   │   └── index.md
+│   ├── api/                     # Merged API reference
+│   │   └── merged_api.md
+│   └── conflicts.md             # Detailed conflict report
+├── scripts/                     # Empty (for user scripts)
+└── assets/                      # Empty (for user assets)
+```
+
+### SKILL.md Format
+
+```markdown
+# React
+
+Complete React knowledge base combining official documentation and React codebase insights.
+
+## 📚 Sources
+
+This skill combines knowledge from multiple sources:
+
+- ✅ **Documentation**: https://react.dev/
+  - Pages: 200
+- ✅ **GitHub Repository**: facebook/react
+  - Code Analysis: surface
+  - Issues: 100
+
+## ⚠️ Data Quality
+
+**5 conflicts detected** between sources.
+
+**Conflict Breakdown:**
+- missing_in_docs: 3
+- missing_in_code: 2
+
+See `references/conflicts.md` for detailed conflict information.
+
+## 🔧 API Reference
+
+*Merged from documentation and code analysis*
+
+### ✅ Verified APIs
+
+*Documentation and code agree*
+
+#### `useState(initialValue)`
+
+...
+
+### ⚠️ APIs with Conflicts
+
+*Documentation and code differ*
+
+#### `useEffect(callback, deps?)`
+
+⚠️ **Conflict**: Documentation signature differs from code implementation
+
+**Documentation says:**
+```
+useEffect(callback: () => void, deps: any[])
+```
+
+**Code implementation:**
+```
+useEffect(callback: () => void | (() => void), deps?: readonly any[])
+```
+
+*Source: both*
+
+---
+```
+
+## Examples
+
+### Example 1: React (Docs + GitHub)
+
+```json
+{
+  "name": "react",
+  "description": "Complete React framework knowledge",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://react.dev/",
+      "extract_api": true,
+      "max_pages": 200
+    },
+    {
+      "type": "github",
+      "repo": "facebook/react",
+      "include_code": true,
+      "code_analysis_depth": "surface"
+    }
+  ]
+}
+```
+
+### Example 2: Django (Docs + GitHub)
+
+```json
+{
+  "name": "django",
+  "description": "Complete Django framework knowledge",
+  "merge_mode": "rule-based",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://docs.djangoproject.com/en/stable/",
+      "extract_api": true,
+      "max_pages": 300
+    },
+    {
+      "type": "github",
+      "repo": "django/django",
+      "include_code": true,
+      "code_analysis_depth": "deep",
+      "file_patterns": [
+        "django/db/**/*.py",
+        "django/views/**/*.py"
+      ]
+    }
+  ]
+}
+```
+
+### Example 3: Mixed Sources (Docs + GitHub + PDF)
+
+```json
+{
+  "name": "godot",
+  "description": "Complete Godot Engine knowledge",
+  "merge_mode": "claude-enhanced",
+  "sources": [
+    {
+      "type": "documentation",
+      "base_url": "https://docs.godotengine.org/en/stable/",
+      "extract_api": true,
+      "max_pages": 500
+    },
+    {
+      "type": "github",
+      "repo": "godotengine/godot",
+      "include_code": true,
+      "code_analysis_depth": "deep"
+    },
+    {
+      "type": "pdf",
+      "path": "/path/to/godot_manual.pdf",
+      "extract_tables": true
+    }
+  ]
+}
+```
+
+## Command Reference
+
+### Unified Scraper
+
+```bash
+# Basic usage
+python3 cli/unified_scraper.py --config configs/react_unified.json
+
+# Override merge mode
+python3 cli/unified_scraper.py --config configs/react_unified.json --merge-mode claude-enhanced
+
+# Use cached data (skip re-scraping)
+python3 cli/unified_scraper.py --config configs/react_unified.json --skip-scrape
+```
+
+### Validate Config
+
+```bash
+python3 -c "
+import sys
+sys.path.insert(0, 'cli')
+from config_validator import validate_config
+
+validator = validate_config('configs/react_unified.json')
+print(f'Format: {\"Unified\" if validator.is_unified else \"Legacy\"}')
+print(f'Sources: {len(validator.config.get(\"sources\", []))}')
+print(f'Needs API merge: {validator.needs_api_merge()}')
+"
+```
+
+## MCP Integration
+
+The unified scraper is fully integrated with MCP. The `scrape_docs` tool automatically detects unified vs legacy configs and routes to the appropriate scraper.
+
+```python
+# MCP tool usage
+{
+  "name": "scrape_docs",
+  "arguments": {
+    "config_path": "configs/react_unified.json",
+    "merge_mode": "rule-based"  # Optional override
+  }
+}
+```
+
+The tool will:
+1. Auto-detect unified format
+2. Route to `unified_scraper.py`
+3. Apply specified merge mode
+4. Return comprehensive output
+
+## Backward Compatibility
+
+**Legacy configs still work!** The system automatically detects legacy single-source configs and routes to the original `doc_scraper.py`.
+
+```json
+// Legacy config (still works)
+{
+  "name": "react",
+  "base_url": "https://react.dev/",
+  ...
+}
+
+// Automatically detected as legacy format
+// Routes to doc_scraper.py
+```
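+
+Detection can be as simple as checking whether the config declares a `sources` list (a sketch of the idea; the real `config_validator.py` may apply additional checks):
+
+```python
+import json
+
+def is_unified_config(path: str) -> bool:
+    """Treat a config as unified when it declares a 'sources' list."""
+    with open(path, encoding="utf-8") as f:
+        config = json.load(f)
+    return isinstance(config.get("sources"), list)
+
+# is_unified_config("configs/react_unified.json") -> True, routed to unified_scraper.py
+# is_unified_config("configs/react.json")         -> False, routed to doc_scraper.py
+```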
+
+## Testing
+
+Run integration tests:
+
+```bash
+python3 cli/test_unified_simple.py
+```
+
+Tests validate:
+- ✅ Unified config validation
+- ✅ Backward compatibility with legacy configs
+- ✅ Mixed source type support
+- ✅ Error handling for invalid configs
+
+## Architecture
+
+### Components
+
+1. **config_validator.py**: Validates unified and legacy configs
+2. **code_analyzer.py**: Extracts code signatures at configurable depth
+3. **conflict_detector.py**: Detects API conflicts between sources
+4. **merge_sources.py**: Implements rule-based and Claude-enhanced merging
+5. **unified_scraper.py**: Main orchestrator
+6. **unified_skill_builder.py**: Generates final skill structure
+7. **skill_seeker_mcp/server.py**: MCP integration with auto-detection
+
+### Data Flow
+
+```
+Unified Config
+     ↓
+ConfigValidator (validates format)
+     ↓
+UnifiedScraper.run()
+     ↓
+┌────────────────────────────────────┐
+│ Phase 1: Scrape All Sources        │
+│  - Documentation → doc_scraper     │
+│  - GitHub → github_scraper         │
+│  - PDF → pdf_scraper               │
+└────────────────────────────────────┘
+     ↓
+┌────────────────────────────────────┐
+│ Phase 2: Detect Conflicts          │
+│  - ConflictDetector                │
+│  - Compare docs APIs vs code APIs  │
+│  - Classify by type and severity   │
+└────────────────────────────────────┘
+     ↓
+┌────────────────────────────────────┐
+│ Phase 3: Merge Sources              │
+│  - RuleBasedMerger (fast)          │
+│  - OR ClaudeEnhancedMerger (AI)    │
+│  - Create unified API reference    │
+└────────────────────────────────────┘
+     ↓
+┌────────────────────────────────────┐
+│ Phase 4: Build Skill                │
+│  - UnifiedSkillBuilder             │
+│  - Generate SKILL.md with conflicts│
+│  - Create reference structure      │
+│  - Generate conflicts report       │
+└────────────────────────────────────┘
+     ↓
+Unified Skill (.zip ready)
+```
+
+## Best Practices
+
+### 1. Start with Rule-Based Merge
+
+Rule-based is fast and works well for most cases. Only use Claude-enhanced if you need human oversight.
+
+### 2. Use Surface-Level Code Analysis
+
+`code_analysis_depth: "surface"` is usually sufficient. Deep analysis is expensive and rarely needed.
+
+### 3. Limit GitHub Issues
+
+`max_issues: 100` is a good default; going beyond 200 issues rarely adds value.
+
+### 4. Be Specific with File Patterns
+
+```json
+"file_patterns": [
+  "src/**/*.js",     // Good: specific paths
+  "lib/**/*.ts"
+]
+
+// Not recommended:
+"file_patterns": ["**/*.js"]  // Too broad, slow
+```
+
+### 5. Monitor Conflict Reports
+
+Always review `references/conflicts.md` to understand discrepancies between sources.
+
+## Troubleshooting
+
+### No Conflicts Detected
+
+**Possible causes**:
+- `extract_api: false` in documentation source
+- `include_code: false` in GitHub source
+- Code analysis found no APIs (check `code_analysis_depth`)
+
+**Solution**: Ensure both sources have API extraction enabled
+
+### Too Many Conflicts
+
+**Possible causes**:
+- Fuzzy matching threshold too strict
+- Documentation uses different naming conventions
+- Old documentation version
+
+**Solution**: Review conflicts manually and adjust merge strategy
+
+### Merge Takes Too Long
+
+**Possible causes**:
+- Using `code_analysis_depth: "full"` (very slow)
+- Too many file patterns
+- Large repository
+
+**Solution**:
+- Use `"surface"` or `"deep"` analysis
+- Narrow file patterns
+- Lower `rate_limit` only if the scraping phase, not the merge itself, is the bottleneck
+
+## Future Enhancements
+
+Planned features:
+- [ ] Automated conflict resolution strategies
+- [ ] Conflict trend analysis across versions
+- [ ] Multi-version comparison (docs v1 vs v2)
+- [ ] Custom merge rules DSL
+- [ ] Conflict confidence scores
+
+## Support
+
+For issues, questions, or suggestions:
+- GitHub Issues: https://github.com/yusufkaraaslan/Skill_Seekers/issues
+- Documentation: https://github.com/yusufkaraaslan/Skill_Seekers/docs
+
+## Changelog
+
+**v2.0 (October 2025)**: Unified multi-source scraping feature complete
+- ✅ Config validation for unified format
+- ✅ Deep code analysis with AST parsing
+- ✅ Conflict detection (4 types, 3 severity levels)
+- ✅ Rule-based merging
+- ✅ Claude-enhanced merging
+- ✅ Unified skill builder with inline conflict warnings
+- ✅ MCP integration with auto-detection
+- ✅ Backward compatibility with legacy configs
+- ✅ Comprehensive tests and documentation

+ 351 - 0
libs/external/Skill_Seekers-development/docs/UPLOAD_GUIDE.md

@@ -0,0 +1,351 @@
+# How to Upload Skills to Claude
+
+## Quick Answer
+
+**You have 3 options to upload the `.zip` file:**
+
+### Option 1: Automatic Upload (Recommended for CLI)
+
+```bash
+# Set your API key (one-time setup)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Package and upload automatically
+python3 cli/package_skill.py output/react/ --upload
+
+# OR upload existing .zip
+python3 cli/upload_skill.py output/react.zip
+```
+
+✅ **Fully automatic** | No manual steps | Requires API key
+
+### Option 2: Manual Upload (No API Key)
+
+```bash
+# Package the skill
+python3 cli/package_skill.py output/react/
+
+# This will:
+# 1. Create output/react.zip
+# 2. Open output/ folder automatically
+# 3. Show clear upload instructions
+
+# Then upload manually to https://claude.ai/skills
+```
+
+✅ **No API key needed** | Works for everyone | Simple
+
+### Option 3: Claude Code MCP (Easiest)
+
+```
+In Claude Code, just say:
+"Package and upload the React skill"
+
+# Automatically packages and uploads!
+```
+
+✅ **Natural language** | Fully automatic | Best UX
+
+---
+
+## What's Inside the Zip?
+
+The `.zip` file contains:
+
+```
+steam-economy.zip
+├── SKILL.md              ← Main skill file (Claude reads this first)
+└── references/           ← Reference documentation
+    ├── index.md          ← Category index
+    ├── api_reference.md  ← API docs
+    ├── pricing.md        ← Pricing docs
+    ├── trading.md        ← Trading docs
+    └── ...               ← Other categorized docs
+```
+
+**Note:** The zip only includes what Claude needs. It excludes:
+- `.backup` files
+- Build artifacts
+- Temporary files
+
+## What Does package_skill.py Do?
+
+The package script:
+
+1. **Finds your skill directory** (e.g., `output/steam-economy/`)
+2. **Validates SKILL.md exists** (required!)
+3. **Creates a .zip file** with the same name
+4. **Includes all files** except backups
+5. **Saves to** `output/` directory
+
+**Example:**
+```bash
+python3 cli/package_skill.py output/steam-economy/
+
+📦 Packaging skill: steam-economy
+   Source: output/steam-economy
+   Output: output/steam-economy.zip
+   + SKILL.md
+   + references/api_reference.md
+   + references/pricing.md
+   + references/trading.md
+   + ...
+
+✅ Package created: output/steam-economy.zip
+   Size: 14,290 bytes (14.0 KB)
+```
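+
+Conceptually, packaging is just zipping the skill directory with backup files filtered out. A minimal sketch of that idea (not the actual `package_skill.py`):
+
+```python
+import zipfile
+from pathlib import Path
+
+def package_skill(skill_dir: str, output_zip: str) -> None:
+    """Zip a skill directory, skipping .backup files."""
+    skill_path = Path(skill_dir)
+    if not (skill_path / "SKILL.md").exists():
+        raise FileNotFoundError("SKILL.md is required")
+    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zf:
+        for file in skill_path.rglob("*"):
+            if file.is_file() and not file.name.endswith(".backup"):
+                zf.write(file, file.relative_to(skill_path))
+
+# package_skill("output/steam-economy", "output/steam-economy.zip")
+```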
+
+## Complete Workflow
+
+### Step 1: Scrape & Build
+```bash
+python3 cli/doc_scraper.py --config configs/steam-economy.json
+```
+
+**Output:**
+- `output/steam-economy_data/` (raw scraped data)
+- `output/steam-economy/` (skill directory)
+
+### Step 2: Enhance (Recommended)
+```bash
+python3 cli/enhance_skill_local.py output/steam-economy/
+```
+
+**What it does:**
+- Analyzes reference files
+- Creates comprehensive SKILL.md
+- Backs up original to SKILL.md.backup
+
+**Output:**
+- `output/steam-economy/SKILL.md` (enhanced)
+- `output/steam-economy/SKILL.md.backup` (original)
+
+### Step 3: Package
+```bash
+python3 cli/package_skill.py output/steam-economy/
+```
+
+**Output:**
+- `output/steam-economy.zip` ← **THIS IS WHAT YOU UPLOAD**
+
+### Step 4: Upload to Claude
+1. Go to Claude (claude.ai)
+2. Click "Add Skill" or skill upload button
+3. Select `output/steam-economy.zip`
+4. Done!
+
+## What Files Are Required?
+
+**Minimum required structure:**
+```
+your-skill/
+└── SKILL.md          ← Required! Claude reads this first
+```
+
+**Recommended structure:**
+```
+your-skill/
+├── SKILL.md          ← Main skill file (required)
+└── references/       ← Reference docs (highly recommended)
+    ├── index.md
+    └── *.md          ← Category files
+```
+
+**Optional (can add manually):**
+```
+your-skill/
+├── SKILL.md
+├── references/
+├── scripts/          ← Helper scripts
+│   └── *.py
+└── assets/           ← Templates, examples
+    └── *.txt
+```
+
+## File Size Limits
+
+The package script shows size after packaging:
+```
+✅ Package created: output/steam-economy.zip
+   Size: 14,290 bytes (14.0 KB)
+```
+
+**Typical sizes:**
+- Small skill: 5-20 KB
+- Medium skill: 20-100 KB
+- Large skill: 100-500 KB
+
+Claude has generous size limits, so most documentation-based skills fit easily.
+
+## Quick Reference
+
+### Package a Skill
+```bash
+python3 cli/package_skill.py output/steam-economy/
+```
+
+### Package Multiple Skills
+```bash
+# Package all skills in output/
+for dir in output/*/; do
+  if [ -f "$dir/SKILL.md" ]; then
+    python3 cli/package_skill.py "$dir"
+  fi
+done
+```
+
+### Check What's in a Zip
+```bash
+unzip -l output/steam-economy.zip
+```
+
+### Test a Packaged Skill Locally
+```bash
+# Extract to temp directory
+mkdir temp-test
+unzip output/steam-economy.zip -d temp-test/
+cat temp-test/SKILL.md
+```
+
+## Troubleshooting
+
+### "SKILL.md not found"
+```bash
+# Make sure you scraped and built first
+python3 cli/doc_scraper.py --config configs/steam-economy.json
+
+# Then package
+python3 cli/package_skill.py output/steam-economy/
+```
+
+### "Directory not found"
+```bash
+# Check what skills are available
+ls output/
+
+# Use correct path
+python3 cli/package_skill.py output/YOUR-SKILL-NAME/
+```
+
+### Zip is Too Large
+Most skills are small, but if yours is large:
+```bash
+# Check size
+ls -lh output/steam-economy.zip
+
+# If needed, check what's taking space
+unzip -l output/steam-economy.zip | sort -k1 -rn | head -20
+```
+
+Reference files are usually small. Large sizes often mean:
+- Many images (skills typically don't need images)
+- Large code examples (these are fine, just be aware)
+
+## What Does Claude Do With the Zip?
+
+When you upload a skill zip:
+
+1. **Claude extracts it**
+2. **Reads SKILL.md first** - This tells Claude:
+   - When to activate this skill
+   - What the skill does
+   - Quick reference examples
+   - How to navigate the references
+3. **Indexes reference files** - Claude can search through:
+   - `references/*.md` files
+   - Find specific APIs, examples, concepts
+4. **Activates automatically** - When you ask about topics matching the skill
+
+## Example: Using the Packaged Skill
+
+After uploading `steam-economy.zip`:
+
+**You ask:** "How do I implement microtransactions in my Steam game?"
+
+**Claude:**
+- Recognizes this matches steam-economy skill
+- Reads SKILL.md for quick reference
+- Searches references/microtransactions.md
+- Provides detailed answer with code examples
+
+## API-Based Automatic Upload
+
+### Setup (One-Time)
+
+```bash
+# Get your API key from https://console.anthropic.com/
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Add to your shell profile to persist
+echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc  # or ~/.zshrc
+```
+
+### Usage
+
+```bash
+# Upload existing .zip
+python3 cli/upload_skill.py output/react.zip
+
+# OR package and upload in one command
+python3 cli/package_skill.py output/react/ --upload
+```
+
+### How It Works
+
+The upload tool uses the Anthropic `/v1/skills` API endpoint to:
+1. Read your .zip file
+2. Authenticate with your API key
+3. Upload to Claude's skill storage
+4. Verify upload success
+
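+A rough sketch of that flow with `requests` is shown below; the exact request fields, headers, and response handling used by `upload_skill.py` may differ:
+
+```python
+import os
+import requests
+
+def upload_skill(zip_path: str) -> None:
+    """Illustrative upload flow; field names and response shape are assumptions."""
+    api_key = os.environ["ANTHROPIC_API_KEY"]
+    with open(zip_path, "rb") as f:
+        response = requests.post(
+            "https://api.anthropic.com/v1/skills",   # endpoint named in this guide
+            headers={"x-api-key": api_key},
+            files={"file": (os.path.basename(zip_path), f, "application/zip")},
+            timeout=120,
+        )
+    response.raise_for_status()
+    print(f"Uploaded {zip_path}: HTTP {response.status_code}")
+```
+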
+### Troubleshooting
+
+**"ANTHROPIC_API_KEY not set"**
+```bash
+# Check if set
+echo $ANTHROPIC_API_KEY
+
+# If empty, set it
+export ANTHROPIC_API_KEY=sk-ant-...
+```
+
+**"Authentication failed"**
+- Verify your API key is correct
+- Check https://console.anthropic.com/ for valid keys
+
+**"Upload timed out"**
+- Check your internet connection
+- Try again or use manual upload
+
+**Upload fails with error**
+- Falls back to showing manual upload instructions
+- You can still upload via https://claude.ai/skills
+
+---
+
+## Summary
+
+**What you need to do:**
+
+### With API Key (Automatic):
+1. ✅ Scrape: `python3 cli/doc_scraper.py --config configs/YOUR-CONFIG.json`
+2. ✅ Enhance: `python3 cli/enhance_skill_local.py output/YOUR-SKILL/`
+3. ✅ Package & Upload: `python3 cli/package_skill.py output/YOUR-SKILL/ --upload`
+4. ✅ Done! Skill is live in Claude
+
+### Without API Key (Manual):
+1. ✅ Scrape: `python3 cli/doc_scraper.py --config configs/YOUR-CONFIG.json`
+2. ✅ Enhance: `python3 cli/enhance_skill_local.py output/YOUR-SKILL/`
+3. ✅ Package: `python3 cli/package_skill.py output/YOUR-SKILL/`
+4. ✅ Upload: Go to https://claude.ai/skills and upload the `.zip`
+
+**What you upload:**
+- The `.zip` file from `output/` directory
+- Example: `output/steam-economy.zip`
+
+**What's in the zip:**
+- `SKILL.md` (required)
+- `references/*.md` (recommended)
+- Any scripts/assets you added (optional)
+
+That's it! 🚀

+ 811 - 0
libs/external/Skill_Seekers-development/docs/USAGE.md

@@ -0,0 +1,811 @@
+# Complete Usage Guide for Skill Seeker
+
+Comprehensive reference for all commands, options, and workflows.
+
+## Table of Contents
+
+- [Quick Reference](#quick-reference)
+- [Main Tool: doc_scraper.py](#main-tool-doc_scraperpy)
+- [Estimator: estimate_pages.py](#estimator-estimate_pagespy)
+- [Enhancement Tools](#enhancement-tools)
+- [Packaging Tool](#packaging-tool)
+- [Testing Tools](#testing-tools)
+- [Available Configs](#available-configs)
+- [Common Workflows](#common-workflows)
+- [Troubleshooting](#troubleshooting)
+
+---
+
+## Quick Reference
+
+```bash
+# 1. Estimate pages (fast, 1-2 min)
+python3 cli/estimate_pages.py configs/react.json
+
+# 2. Scrape documentation (20-40 min)
+python3 cli/doc_scraper.py --config configs/react.json
+
+# 3. Enhance with Claude Code (60 sec)
+python3 cli/enhance_skill_local.py output/react/
+
+# 4. Package to .zip (instant)
+python3 cli/package_skill.py output/react/
+
+# 5. Test everything (1 sec)
+python3 cli/run_tests.py
+```
+
+---
+
+## Main Tool: doc_scraper.py
+
+### Full Help
+
+```
+usage: doc_scraper.py [-h] [--interactive] [--config CONFIG] [--name NAME]
+                      [--url URL] [--description DESCRIPTION] [--skip-scrape]
+                      [--dry-run] [--enhance] [--enhance-local]
+                      [--api-key API_KEY]
+
+Convert documentation websites to Claude skills
+
+options:
+  -h, --help            Show this help message and exit
+  --interactive, -i     Interactive configuration mode
+  --config, -c CONFIG   Load configuration from file (e.g., configs/godot.json)
+  --name NAME           Skill name
+  --url URL             Base documentation URL
+  --description, -d DESCRIPTION
+                        Skill description
+  --skip-scrape         Skip scraping, use existing data
+  --dry-run             Preview what will be scraped without actually scraping
+  --enhance             Enhance SKILL.md using Claude API after building
+                        (requires API key)
+  --enhance-local       Enhance SKILL.md using Claude Code in new terminal
+                        (no API key needed)
+  --api-key API_KEY     Anthropic API key for --enhance (or set ANTHROPIC_API_KEY)
+```
+
+### Usage Examples
+
+**1. Use Preset Config (Recommended)**
+```bash
+python3 cli/doc_scraper.py --config configs/godot.json
+python3 cli/doc_scraper.py --config configs/react.json
+python3 cli/doc_scraper.py --config configs/vue.json
+python3 cli/doc_scraper.py --config configs/django.json
+python3 cli/doc_scraper.py --config configs/fastapi.json
+```
+
+**2. Interactive Mode**
+```bash
+python3 cli/doc_scraper.py --interactive
+# Wizard walks you through:
+# - Skill name
+# - Base URL
+# - Description
+# - Selectors (optional)
+# - URL patterns (optional)
+# - Rate limit
+# - Max pages
+```
+
+**3. Quick Mode (Minimal)**
+```bash
+python3 cli/doc_scraper.py \
+  --name react \
+  --url https://react.dev/ \
+  --description "React framework for building UIs"
+```
+
+**4. Dry-Run (Preview)**
+```bash
+python3 cli/doc_scraper.py --config configs/react.json --dry-run
+# Shows what will be scraped without downloading data
+# No directories created
+# Fast validation
+```
+
+**5. Skip Scraping (Use Cached Data)**
+```bash
+python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
+# Uses existing output/godot_data/
+# Fast rebuild (1-3 minutes)
+# Useful for testing changes
+```
+
+**6. With Local Enhancement**
+```bash
+python3 cli/doc_scraper.py --config configs/react.json --enhance-local
+# Scrapes + enhances in one command
+# Opens new terminal for Claude Code
+# No API key needed
+```
+
+**7. With API Enhancement**
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...
+python3 cli/doc_scraper.py --config configs/react.json --enhance
+
+# Or with inline API key:
+python3 cli/doc_scraper.py --config configs/react.json --enhance --api-key sk-ant-...
+```
+
+### Output Structure
+
+```
+output/
+├── {name}_data/              # Scraped raw data (cached)
+│   ├── pages/
+│   │   ├── page_0.json
+│   │   ├── page_1.json
+│   │   └── ...
+│   └── summary.json          # Scraping stats
+│
+└── {name}/                   # Built skill directory
+    ├── SKILL.md              # Main skill file
+    ├── SKILL.md.backup       # Backup (if enhanced)
+    ├── references/           # Categorized docs
+    │   ├── index.md
+    │   ├── getting_started.md
+    │   ├── api.md
+    │   └── ...
+    ├── scripts/              # Empty (user scripts)
+    └── assets/               # Empty (user assets)
+```
+
+---
+
+## Estimator: estimate_pages.py
+
+### Full Help
+
+```
+usage: estimate_pages.py [-h] [--max-discovery MAX_DISCOVERY]
+                         [--timeout TIMEOUT]
+                         config
+
+Estimate page count for Skill Seeker configs
+
+positional arguments:
+  config                Path to config JSON file
+
+options:
+  -h, --help            Show this help message and exit
+  --max-discovery, -m MAX_DISCOVERY
+                        Maximum pages to discover (default: 1000)
+  --timeout, -t TIMEOUT
+                        HTTP request timeout in seconds (default: 30)
+```
+
+### Usage Examples
+
+**1. Quick Estimate (100 pages)**
+```bash
+python3 cli/estimate_pages.py configs/react.json --max-discovery 100
+# Time: ~30-60 seconds
+# Good for: Quick validation
+```
+
+**2. Standard Estimate (1000 pages - default)**
+```bash
+python3 cli/estimate_pages.py configs/godot.json
+# Time: ~1-2 minutes
+# Good for: Most use cases
+```
+
+**3. Deep Estimate (2000 pages)**
+```bash
+python3 cli/estimate_pages.py configs/vue.json --max-discovery 2000
+# Time: ~3-5 minutes
+# Good for: Large documentation sites
+```
+
+**4. Custom Timeout**
+```bash
+python3 cli/estimate_pages.py configs/django.json --timeout 60
+# Useful for slow servers
+```
+
+### Output Example
+
+```
+🔍 Estimating pages for: react
+📍 Base URL: https://react.dev/
+🎯 Start URLs: 6
+⏱️  Rate limit: 0.5s
+🔢 Max discovery: 1000
+
+⏳ Discovered: 180 pages (1.3 pages/sec)
+
+======================================================================
+📊 ESTIMATION RESULTS
+======================================================================
+
+Config: react
+Base URL: https://react.dev/
+
+✅ Pages Discovered: 180
+⏳ Pages Pending: 50
+📈 Estimated Total: 230
+
+⏱️  Time Elapsed: 140.5s
+⚡ Discovery Rate: 1.28 pages/sec
+
+======================================================================
+💡 RECOMMENDATIONS
+======================================================================
+
+✅ Current max_pages (300) is sufficient
+
+⏱️  Estimated full scrape time: 1.9 minutes
+   (Based on rate_limit: 0.5s)
+```
+
+**What It Shows:**
+- Estimated total pages to scrape
+- Whether current `max_pages` is sufficient
+- Recommended `max_pages` value
+- Estimated scraping time
+- Discovery rate (pages/sec)
+
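+For example, with an estimated total of 230 pages and a `rate_limit` of 0.5s, the full scrape estimate is roughly 230 × 0.5s ≈ 115s, or about 1.9 minutes, which is where the figure in the output above comes from.
+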
+---
+
+## Enhancement Tools
+
+### enhance_skill_local.py (Recommended)
+
+**No API key needed - uses Claude Code Max plan**
+
+```bash
+# Usage
+python3 cli/enhance_skill_local.py output/react/
+python3 cli/enhance_skill_local.py output/godot/
+
+# What it does:
+# 1. Reads SKILL.md and references/
+# 2. Opens new terminal with Claude Code
+# 3. Claude enhances SKILL.md
+# 4. Backs up original to SKILL.md.backup
+# 5. Saves enhanced version
+
+# Time: ~60 seconds
+# Cost: Free (uses your Claude Code Max plan)
+```
+
+### enhance_skill.py (Alternative)
+
+**Requires Anthropic API key**
+
+```bash
+# Install dependency first
+pip3 install anthropic
+
+# Usage with environment variable
+export ANTHROPIC_API_KEY=sk-ant-...
+python3 cli/enhance_skill.py output/react/
+
+# Usage with inline API key
+python3 cli/enhance_skill.py output/godot/ --api-key sk-ant-...
+
+# What it does:
+# 1. Reads SKILL.md and references/
+# 2. Calls Claude API (Sonnet 4)
+# 3. Enhances SKILL.md
+# 4. Backs up original to SKILL.md.backup
+# 5. Saves enhanced version
+
+# Time: ~30-60 seconds
+# Cost: ~$0.01-0.10 per skill (depending on size)
+```
+
+---
+
+## Packaging Tool
+
+### package_skill.py
+
+```bash
+# Usage
+python3 cli/package_skill.py output/react/
+python3 cli/package_skill.py output/godot/
+
+# What it does:
+# 1. Validates SKILL.md exists
+# 2. Creates .zip with all skill files
+# 3. Saves to output/{name}.zip
+
+# Output:
+# output/react.zip
+# output/godot.zip
+
+# Time: Instant
+```
+
+---
+
+## Testing Tools
+
+### run_tests.py
+
+```bash
+# Run all tests (default)
+python3 cli/run_tests.py
+# 71 tests, ~1 second
+
+# Verbose output
+python3 cli/run_tests.py -v
+python3 cli/run_tests.py --verbose
+
+# Quiet output
+python3 cli/run_tests.py -q
+python3 cli/run_tests.py --quiet
+
+# Stop on first failure
+python3 cli/run_tests.py -f
+python3 cli/run_tests.py --failfast
+
+# Run specific test suite
+python3 cli/run_tests.py --suite config
+python3 cli/run_tests.py --suite features
+python3 cli/run_tests.py --suite integration
+
+# List all tests
+python3 cli/run_tests.py --list
+```
+
+### Individual Tests
+
+```bash
+# Run single test file
+python3 -m unittest tests.test_config_validation
+python3 -m unittest tests.test_scraper_features
+python3 -m unittest tests.test_integration
+
+# Run single test class
+python3 -m unittest tests.test_config_validation.TestConfigValidation
+
+# Run single test method
+python3 -m unittest tests.test_config_validation.TestConfigValidation.test_valid_complete_config
+```
+
+---
+
+## Available Configs
+
+### Preset Configs (Ready to Use)
+
+| Config | Framework | Pages | Description |
+|--------|-----------|-------|-------------|
+| `godot.json` | Godot Engine | ~500 | Game engine documentation |
+| `react.json` | React | ~300 | React framework docs |
+| `vue.json` | Vue.js | ~250 | Vue.js framework docs |
+| `django.json` | Django | ~400 | Django web framework |
+| `fastapi.json` | FastAPI | ~200 | FastAPI Python framework |
+| `steam-economy-complete.json` | Steam | ~100 | Steam Economy API docs |
+
+### View Config Details
+
+```bash
+# List all configs
+ls configs/
+
+# View config content
+cat configs/react.json
+python3 -m json.tool configs/godot.json
+```
+
+### Config Structure
+
+```json
+{
+  "name": "react",
+  "base_url": "https://react.dev/",
+  "description": "React - JavaScript library for building UIs",
+  "start_urls": [
+    "https://react.dev/learn",
+    "https://react.dev/reference/react",
+    "https://react.dev/reference/react-dom"
+  ],
+  "selectors": {
+    "main_content": "article",
+    "title": "h1",
+    "code_blocks": "pre code"
+  },
+  "url_patterns": {
+    "include": ["/learn/", "/reference/"],
+    "exclude": ["/blog/", "/community/"]
+  },
+  "categories": {
+    "getting_started": ["learn", "tutorial", "intro"],
+    "api": ["reference", "api", "hooks"],
+    "guides": ["guide"]
+  },
+  "rate_limit": 0.5,
+  "max_pages": 300
+}
+```
+
+---
+
+## Common Workflows
+
+### Workflow 1: Use Preset (Fastest)
+
+```bash
+# 1. Estimate (optional, 1-2 min)
+python3 cli/estimate_pages.py configs/react.json
+
+# 2. Scrape with local enhancement (25 min)
+python3 cli/doc_scraper.py --config configs/react.json --enhance-local
+
+# 3. Package (instant)
+python3 cli/package_skill.py output/react/
+
+# Result: output/react.zip
+# Upload to Claude!
+```
+
+### Workflow 2: Custom Documentation
+
+```bash
+# 1. Create config
+cat > configs/my-docs.json << 'EOF'
+{
+  "name": "my-docs",
+  "base_url": "https://docs.example.com/",
+  "description": "My documentation site",
+  "rate_limit": 0.5,
+  "max_pages": 200
+}
+EOF
+
+# 2. Estimate
+python3 cli/estimate_pages.py configs/my-docs.json
+
+# 3. Dry-run test
+python3 cli/doc_scraper.py --config configs/my-docs.json --dry-run
+
+# 4. Full scrape
+python3 cli/doc_scraper.py --config configs/my-docs.json
+
+# 5. Enhance
+python3 cli/enhance_skill_local.py output/my-docs/
+
+# 6. Package
+python3 cli/package_skill.py output/my-docs/
+```
+
+### Workflow 3: Interactive Mode
+
+```bash
+# 1. Start interactive wizard
+python3 cli/doc_scraper.py --interactive
+
+# 2. Answer prompts:
+#    - Name: my-framework
+#    - URL: https://framework.dev/
+#    - Description: My favorite framework
+#    - Selectors: (uses defaults)
+#    - Rate limit: 0.5
+#    - Max pages: 100
+
+# 3. Enhance
+python3 cli/enhance_skill_local.py output/my-framework/
+
+# 4. Package
+python3 cli/package_skill.py output/my-framework/
+```
+
+### Workflow 4: Quick Mode
+
+```bash
+python3 cli/doc_scraper.py \
+  --name vue \
+  --url https://vuejs.org/ \
+  --description "Vue.js framework" \
+  --enhance-local
+```
+
+### Workflow 5: Rebuild from Cache
+
+```bash
+# Already scraped once?
+# Skip re-scraping, just rebuild
+python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
+
+# Try new enhancement
+python3 cli/enhance_skill_local.py output/godot/
+
+# Re-package
+python3 cli/package_skill.py output/godot/
+```
+
+### Workflow 6: Testing New Config
+
+```bash
+# 1. Create test config with low max_pages
+cat > configs/test.json << 'EOF'
+{
+  "name": "test-site",
+  "base_url": "https://docs.test.com/",
+  "max_pages": 20,
+  "rate_limit": 0.1
+}
+EOF
+
+# 2. Estimate
+python3 cli/estimate_pages.py configs/test.json --max-discovery 50
+
+# 3. Dry-run
+python3 cli/doc_scraper.py --config configs/test.json --dry-run
+
+# 4. Small scrape
+python3 cli/doc_scraper.py --config configs/test.json
+
+# 5. Validate output
+ls output/test-site/
+ls output/test-site/references/
+
+# 6. If good, increase max_pages and re-run
+```
+
+---
+
+## Troubleshooting
+
+### Issue: "Rate limit exceeded"
+
+```bash
+# Increase rate_limit in config
+# Default: 0.5 seconds
+# Conservative: 1.0 seconds
+# Very conservative: 2.0 seconds
+
+# Edit config:
+{
+  "rate_limit": 1.0
+}
+```
+
+### Issue: "Too many pages"
+
+```bash
+# Estimate first
+python3 cli/estimate_pages.py configs/my-config.json
+
+# Set max_pages based on estimate
+# Add buffer: estimated + 50
+
+# Edit config:
+{
+  "max_pages": 350  # for 300 estimated
+}
+```
+
+### Issue: "No content extracted"
+
+```bash
+# Wrong selectors
+# Test selectors manually:
+curl -s https://docs.example.com/ | grep -i 'article\|main\|content'
+
+# Common selectors:
+"main_content": "article"
+"main_content": "main"
+"main_content": ".content"
+"main_content": "#main-content"
+"main_content": "div[role=\"main\"]"
+
+# Update config with correct selector
+```
+
+### Issue: "Tests failing"
+
+```bash
+# Run specific failing test
+python3 -m unittest tests.test_config_validation.TestConfigValidation.test_name -v
+
+# Check error message
+# Verify expectations match implementation
+```
+
+### Issue: "Enhancement fails"
+
+```bash
+# Local enhancement:
+# Make sure Claude Code is running
+# Check terminal output
+
+# API enhancement:
+# Verify API key is set:
+echo $ANTHROPIC_API_KEY
+
+# Or use inline:
+python3 cli/enhance_skill.py output/react/ --api-key sk-ant-...
+```
+
+### Issue: "Package fails"
+
+```bash
+# Verify SKILL.md exists
+ls output/my-skill/SKILL.md
+
+# If missing, build first:
+python3 cli/doc_scraper.py --config configs/my-skill.json --skip-scrape
+```
+
+### Issue: "Can't find output"
+
+```bash
+# Check output directory
+ls output/
+
+# Skill data (cached):
+ls output/{name}_data/
+
+# Built skill:
+ls output/{name}/
+
+# Packaged skill:
+ls output/{name}.zip
+```
+
+---
+
+## Advanced Usage
+
+### Custom Selectors
+
+```json
+{
+  "selectors": {
+    "main_content": "div.documentation",
+    "title": "h1.page-title",
+    "code_blocks": "pre.highlight code",
+    "navigation": "nav.sidebar"
+  }
+}
+```
+
+### URL Pattern Filtering
+
+```json
+{
+  "url_patterns": {
+    "include": [
+      "/docs/",
+      "/guide/",
+      "/api/",
+      "/tutorial/"
+    ],
+    "exclude": [
+      "/blog/",
+      "/news/",
+      "/community/",
+      "/showcase/"
+    ]
+  }
+}
+```
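+
+The include/exclude lists act as substring filters on each discovered URL. A minimal sketch of how such filters are typically applied (the real logic lives in `cli/doc_scraper.py` and may order the checks differently):
+
+```python
+def url_allowed(url: str, patterns: dict) -> bool:
+    """Illustrative include/exclude check for a discovered URL."""
+    include = patterns.get("include", [])
+    exclude = patterns.get("exclude", [])
+    if any(p in url for p in exclude):   # exclusions win in this sketch
+        return False
+    if include:                          # if includes are given, require at least one match
+        return any(p in url for p in include)
+    return True                          # no include list: allow everything
+
+patterns = {"include": ["/docs/", "/guide/"], "exclude": ["/blog/"]}
+print(url_allowed("https://docs.example.com/docs/routing", patterns))  # True
+print(url_allowed("https://docs.example.com/blog/2024/", patterns))    # False
+```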
+
+### Custom Categories
+
+```json
+{
+  "categories": {
+    "getting_started": ["intro", "tutorial", "quickstart", "installation"],
+    "core_concepts": ["concept", "fundamental", "architecture"],
+    "api": ["reference", "api", "method", "function"],
+    "guides": ["guide", "how-to", "example"],
+    "advanced": ["advanced", "expert", "performance"]
+  }
+}
+```
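+
+Each category maps to a list of keywords, and a page is bucketed by matching those keywords against its URL or title. A hedged sketch of that idea (not the exact scraper code):
+
+```python
+def categorize(title: str, url: str, categories: dict) -> str:
+    """Illustrative keyword-based bucketing of a scraped page."""
+    haystack = f"{title} {url}".lower()
+    for category, keywords in categories.items():
+        if any(keyword in haystack for keyword in keywords):
+            return category
+    return "general"  # fallback bucket for pages matching no keywords
+
+categories = {
+    "getting_started": ["intro", "tutorial", "quickstart", "installation"],
+    "api": ["reference", "api", "method", "function"],
+}
+print(categorize("Installation Guide", "https://docs.example.com/install/", categories))
+# -> getting_started
+```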
+
+### Multiple Start URLs
+
+```json
+{
+  "start_urls": [
+    "https://docs.example.com/getting-started/",
+    "https://docs.example.com/api/",
+    "https://docs.example.com/guides/",
+    "https://docs.example.com/examples/"
+  ]
+}
+```
+
+---
+
+## Performance Tips
+
+1. **Estimate first**: Save 20-40 minutes by validating config
+2. **Use dry-run**: Test selectors before full scrape
+3. **Cache data**: Use `--skip-scrape` for fast rebuilds
+4. **Adjust rate_limit**: Balance speed vs politeness
+5. **Set appropriate max_pages**: Don't scrape more than needed
+6. **Use start_urls**: Target specific documentation sections
+7. **Filter URLs**: Use include/exclude patterns
+8. **Run tests**: Catch issues early
+
+---
+
+## Environment Variables
+
+```bash
+# Anthropic API key (for API enhancement)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Optional: Set custom output directory
+export SKILL_SEEKER_OUTPUT_DIR=/path/to/output
+```
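+
+Scripts read these variables with standard library calls at startup. As a sketch of how an enhancement run might resolve the key (the actual `cli/enhance_skill.py` may handle this differently):
+
+```python
+import os
+
+# Illustrative only: fall back to the env var when no --api-key flag is passed
+api_key = os.environ.get("ANTHROPIC_API_KEY")
+if not api_key:
+    raise SystemExit("Set ANTHROPIC_API_KEY or pass --api-key")
+
+output_dir = os.environ.get("SKILL_SEEKER_OUTPUT_DIR", "output")  # default to ./output
+print(f"Writing skills to: {output_dir}")
+```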
+
+---
+
+## Exit Codes
+
+- `0`: Success
+- `1`: Error (general)
+- `2`: Warning (estimation hit limit)
+
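+These codes make the tools easy to drive from automation. A small illustrative example that runs the estimator via `subprocess` and branches on the documented codes:
+
+```python
+import subprocess
+
+result = subprocess.run(
+    ["python3", "cli/estimate_pages.py", "configs/react.json"],
+    capture_output=True, text=True,
+)
+
+if result.returncode == 2:
+    print("Estimation stopped at its discovery limit (exit code 2)")
+elif result.returncode != 0:
+    print(f"Estimation failed:\n{result.stderr}")
+else:
+    print(result.stdout)
+```
+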
+---
+
+## File Locations
+
+```
+Skill_Seekers/
+├── cli/                     # Command-line tools
+│   ├── doc_scraper.py       # Main tool
+│   ├── estimate_pages.py    # Estimator
+│   ├── enhance_skill.py     # API enhancement
+│   ├── enhance_skill_local.py  # Local enhancement
+│   ├── package_skill.py     # Packager
+│   └── run_tests.py         # Test runner
+├── configs/                 # Preset configs
+├── tests/                   # Test suite
+├── docs/                    # Documentation
+└── output/                  # Generated output
+```
+
+---
+
+## Getting Help
+
+```bash
+# Tool-specific help
+python3 cli/doc_scraper.py --help
+python3 cli/estimate_pages.py --help
+python3 cli/run_tests.py --help
+
+# Documentation
+cat CLAUDE.md              # Quick reference for Claude Code
+cat docs/CLAUDE.md         # Detailed technical docs
+cat docs/TESTING.md        # Testing guide
+cat docs/USAGE.md          # This file
+cat docs/ENHANCEMENT.md    # Enhancement guide
+cat docs/UPLOAD_GUIDE.md   # Upload instructions
+cat README.md              # Project overview
+```
+
+---
+
+## Summary
+
+**Essential Commands:**
+```bash
+python3 cli/estimate_pages.py configs/react.json              # Estimate
+python3 cli/doc_scraper.py --config configs/react.json        # Scrape
+python3 cli/enhance_skill_local.py output/react/              # Enhance
+python3 cli/package_skill.py output/react/                    # Package
+python3 cli/run_tests.py                                      # Test
+```
+
+**Quick Start:**
+```bash
+pip3 install requests beautifulsoup4
+python3 cli/doc_scraper.py --config configs/react.json --enhance-local
+python3 cli/package_skill.py output/react/
+# Upload output/react.zip to Claude!
+```
+
+Happy skill creating! 🚀

+ 867 - 0
libs/external/Skill_Seekers-development/docs/plans/2025-10-24-active-skills-design.md

@@ -0,0 +1,867 @@
+# Active Skills Design - Demand-Driven Documentation Loading
+
+**Date:** 2025-10-24
+**Type:** Architecture Design
+**Status:** Phase 1 Implemented ✅
+**Author:** Edgar + Claude (Brainstorming Session)
+
+---
+
+## Executive Summary
+
+Transform Skill_Seekers from creating **passive documentation dumps** into **active, intelligent skills** that load documentation on-demand. This eliminates context bloat (300k → 5-10k per query) while maintaining full access to complete documentation.
+
+**Key Innovation:** Skills become lightweight routers with heavy tools in `scripts/`, not documentation repositories.
+
+---
+
+## Problem Statement
+
+### Current Architecture: Passive Skills
+
+**What happens today:**
+```
+Agent: "How do I use Hono middleware?"
+  ↓
+Skill: *Claude loads 203k llms-txt.md into context*
+  ↓
+Agent: *answers using loaded docs*
+  ↓
+Result: Context bloat, slower performance, hits limits
+```
+
+**Issues:**
+1. **Context Bloat**: 319k llms-full.txt loaded entirely into context
+2. **Wasted Resources**: Agent needs 5k but gets 319k
+3. **Truncation Loss**: 36% of content lost (319k → 203k) due to size limits
+4. **File Extension Bug**: llms.txt files stored as .txt instead of .md
+5. **Single Variant**: Only downloads one file (usually llms-full.txt)
+
+### Current File Structure
+
+```
+output/hono/
+├── SKILL.md ──────────► Documentation dump + instructions
+├── references/
+│   └── llms-txt.md ───► 203k (36% truncated from 319k original)
+├── scripts/ ──────────► EMPTY (placeholder only!)
+└── assets/ ───────────► EMPTY (placeholder only!)
+```
+
+---
+
+## Proposed Architecture: Active Skills
+
+### Core Concept
+
+**Skills = Routers + Tools**, not documentation dumps.
+
+**New workflow:**
+```
+Agent: "How do I use Hono middleware?"
+  ↓
+Skill: *runs scripts/search.py "middleware"*
+  ↓
+Script: *loads llms-full.md, extracts middleware section, returns 8k*
+  ↓
+Agent: *answers using ONLY 8k* (CLEAN CONTEXT!)
+  ↓
+Result: 40x less context, no truncation, full access to docs
+```
+
+### Benefits
+
+| Metric | Before | After | Improvement |
+|--------|--------|-------|-------------|
+| Context per query | 203k | 5-10k | **20-40x reduction** |
+| Content loss | 36% truncated | 0% (no truncation) | **Full fidelity** |
+| Variants available | 1 | 3 | **User choice** |
+| File format | .txt (wrong) | .md (correct) | **Fixed** |
+| Agent workflow | Passive read | Active tools | **Autonomous** |
+
+---
+
+## Design Components
+
+### Component 1: Multi-Variant Download
+
+**Change:** Download ALL 3 variants, not just one.
+
+**File naming (FIXED):**
+- `https://hono.dev/llms-full.txt` → `llms-full.md` ✅
+- `https://hono.dev/llms.txt` → `llms.md` ✅
+- `https://hono.dev/llms-small.txt` → `llms-small.md` ✅
+
+**Sizes (Hono example):**
+- `llms-full.md` - 319k (complete documentation)
+- `llms-small.md` - 176k (curated essentials)
+- `llms.md` - 5.4k (quick reference)
+
+**Storage:**
+```
+output/hono/references/
+├── llms-full.md    # 319k - everything (RENAMED from .txt)
+├── llms-small.md   # 176k - curated (RENAMED from .txt)
+└── llms.md         # 5.4k - quick ref (RENAMED from .txt)
+
+output/hono/assets/
+└── catalog.json    # Generated index (NEW)
+```
+
+**Implementation in `_try_llms_txt()`:**
+```python
+def _try_llms_txt(self) -> bool:
+    """Download ALL llms.txt variants for active skills"""
+
+    # 1. Detect all available variants
+    detector = LlmsTxtDetector(self.base_url)
+    variants = detector.detect_all()  # NEW method
+
+    downloaded = {}
+    for variant_info in variants:
+        url = variant_info['url']       # https://hono.dev/llms-full.txt
+        variant = variant_info['variant']  # 'full', 'standard', 'small'
+
+        downloader = LlmsTxtDownloader(url)
+        content = downloader.download()
+
+        if content:
+            # ✨ FIX: Rename .txt → .md immediately
+            clean_name = f"llms-{variant}.md"
+            downloaded[variant] = {
+                'content': content,
+                'filename': clean_name
+            }
+
+    # 2. Save ALL variants (not just one)
+    for variant, data in downloaded.items():
+        path = os.path.join(self.skill_dir, "references", data['filename'])
+        with open(path, 'w', encoding='utf-8') as f:
+            f.write(data['content'])
+
+    # 3. Generate catalog from smallest variant
+    if 'small' in downloaded:
+        self._generate_catalog(downloaded['small']['content'])
+
+    return True
+```
+
+---
+
+### Component 2: The Catalog System
+
+**Purpose:** Lightweight index of what exists, not the content itself.
+
+**File:** `assets/catalog.json`
+
+**Structure:**
+```json
+{
+  "metadata": {
+    "framework": "hono",
+    "version": "auto-detected",
+    "generated": "2025-10-24T14:30:00Z",
+    "total_sections": 93,
+    "variants": {
+      "quick": "llms-small.md",
+      "standard": "llms.md",
+      "complete": "llms-full.md"
+    }
+  },
+  "sections": [
+    {
+      "id": "routing",
+      "title": "Routing",
+      "h1_marker": "# Routing",
+      "topics": ["routes", "path", "params", "wildcard"],
+      "size_bytes": 4800,
+      "variants": ["quick", "complete"],
+      "complexity": "beginner"
+    },
+    {
+      "id": "middleware",
+      "title": "Middleware",
+      "h1_marker": "# Middleware",
+      "topics": ["cors", "auth", "logging", "compression"],
+      "size_bytes": 8200,
+      "variants": ["quick", "complete"],
+      "complexity": "intermediate"
+    }
+  ],
+  "search_index": {
+    "cors": ["middleware"],
+    "routing": ["routing", "path-parameters"],
+    "authentication": ["middleware", "jwt"],
+    "context": ["context-handling"],
+    "streaming": ["streaming-responses"]
+  }
+}
+```
+
+**Generation (from llms-small.md):**
+```python
+def _generate_catalog(self, llms_small_content):
+    """Generate catalog.json from llms-small.md TOC"""
+    catalog = {
+        "metadata": {...},
+        "sections": [],
+        "search_index": {}
+    }
+
+    # Split by h1 headers
+    sections = re.split(r'\n# ', llms_small_content)
+
+    for section_text in sections[1:]:
+        lines = section_text.split('\n')
+        title = lines[0].strip()
+
+        # Extract h2 topics
+        topics = re.findall(r'^## (.+)$', section_text, re.MULTILINE)
+        topics = [t.strip().lower() for t in topics]
+
+        section_info = {
+            "id": title.lower().replace(' ', '-'),
+            "title": title,
+            "h1_marker": f"# {title}",
+            "topics": topics + [title.lower()],
+            "size_bytes": len(section_text),
+            "variants": ["quick", "complete"]
+        }
+
+        catalog["sections"].append(section_info)
+
+        # Build search index
+        for topic in section_info["topics"]:
+            if topic not in catalog["search_index"]:
+                catalog["search_index"][topic] = []
+            catalog["search_index"][topic].append(section_info["id"])
+
+    # Save to assets/catalog.json
+    catalog_path = os.path.join(self.skill_dir, "assets", "catalog.json")
+    with open(catalog_path, 'w', encoding='utf-8') as f:
+        json.dump(catalog, f, indent=2)
+```
+
+---
+
+### Component 3: Active Scripts
+
+**Location:** `scripts/` directory (currently empty)
+
+#### Script 1: `scripts/search.py`
+
+**Purpose:** Search and return only relevant documentation sections.
+
+```python
+#!/usr/bin/env python3
+"""
+ABOUTME: Searches framework documentation and returns relevant sections
+ABOUTME: Loads only what's needed - keeps agent context clean
+"""
+
+import json
+import sys
+import re
+from pathlib import Path
+
+def search(query, detail="auto"):
+    """
+    Search documentation and return relevant sections.
+
+    Args:
+        query: Search term (e.g., "middleware", "cors", "routing")
+        detail: "quick" | "standard" | "complete" | "auto"
+
+    Returns:
+        Markdown text of relevant sections only
+    """
+    # Load catalog
+    catalog_path = Path(__file__).parent.parent / "assets" / "catalog.json"
+    with open(catalog_path, 'r', encoding='utf-8') as f:
+        catalog = json.load(f)
+
+    # 1. Find matching sections using search index
+    query_lower = query.lower()
+    matching_section_ids = set()
+
+    for keyword, section_ids in catalog["search_index"].items():
+        if query_lower in keyword or keyword in query_lower:
+            matching_section_ids.update(section_ids)
+
+    # Get section details
+    matches = [s for s in catalog["sections"] if s["id"] in matching_section_ids]
+
+    if not matches:
+        return f"❌ No sections found for '{query}'. Try: python scripts/list_topics.py"
+
+    # 2. Determine detail level
+    if detail == "auto":
+        # Use quick for overview, complete for deep dive
+        total_size = sum(s["size_bytes"] for s in matches)
+        if total_size > 50000:  # > 50k
+            variant = "quick"
+        else:
+            variant = "complete"
+    else:
+        variant = detail
+
+    variant_file = catalog["metadata"]["variants"].get(variant, "complete")
+
+    # 3. Load documentation file
+    doc_path = Path(__file__).parent.parent / "references" / variant_file
+    with open(doc_path, 'r', encoding='utf-8') as f:
+        doc_content = f.read()
+
+    # 4. Extract matched sections
+    results = []
+    for match in matches:
+        h1_marker = match["h1_marker"]
+
+        # Find section boundaries
+        start = doc_content.find(h1_marker)
+        if start == -1:
+            continue
+
+        # Find next h1 (or end of file)
+        next_h1 = doc_content.find("\n# ", start + len(h1_marker))
+        if next_h1 == -1:
+            section_text = doc_content[start:]
+        else:
+            section_text = doc_content[start:next_h1]
+
+        results.append({
+            'title': match['title'],
+            'size': len(section_text),
+            'content': section_text
+        })
+
+    # 5. Format output
+    output = [f"# Search Results for '{query}' ({len(results)} sections found)\n"]
+    output.append(f"**Variant used:** {variant} ({variant_file})")
+    output.append(f"**Total size:** {sum(r['size'] for r in results):,} bytes\n")
+    output.append("---\n")
+
+    for result in results:
+        output.append(result['content'])
+        output.append("\n---\n")
+
+    return '\n'.join(output)
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description="Search documentation and return only the relevant sections")
+    parser.add_argument("query", help="Search term, e.g. 'middleware' or 'routing'")
+    parser.add_argument("--detail", default="auto",
+                        choices=["auto", "quick", "standard", "complete"],
+                        help="Detail level (default: auto)")
+    args = parser.parse_args()
+
+    print(search(args.query, args.detail))
+```
+
+#### Script 2: `scripts/list_topics.py`
+
+**Purpose:** Show all available documentation sections.
+
+```python
+#!/usr/bin/env python3
+"""
+ABOUTME: Lists all available documentation sections with sizes
+ABOUTME: Helps agent discover what documentation exists
+"""
+
+import json
+from pathlib import Path
+
+def list_topics():
+    """List all available documentation sections."""
+    catalog_path = Path(__file__).parent.parent / "assets" / "catalog.json"
+    with open(catalog_path, 'r', encoding='utf-8') as f:
+        catalog = json.load(f)
+
+    print(f"# Available Documentation Topics ({catalog['metadata']['framework']})\n")
+    print(f"**Total sections:** {catalog['metadata']['total_sections']}")
+    print(f"**Variants:** {', '.join(catalog['metadata']['variants'].keys())}\n")
+    print("---\n")
+
+    # Group by complexity if available
+    by_complexity = {}
+    for section in catalog["sections"]:
+        complexity = section.get("complexity", "general")
+        if complexity not in by_complexity:
+            by_complexity[complexity] = []
+        by_complexity[complexity].append(section)
+
+    for complexity in ["beginner", "intermediate", "advanced", "general"]:
+        if complexity not in by_complexity:
+            continue
+
+        sections = by_complexity[complexity]
+        print(f"## {complexity.title()} ({len(sections)} sections)\n")
+
+        for section in sections:
+            size_kb = section["size_bytes"] / 1024
+            topics_str = ", ".join(section["topics"][:3])
+            print(f"- **{section['title']}** ({size_kb:.1f}k)")
+            print(f"  Topics: {topics_str}")
+            print(f"  Search: `python scripts/search.py {section['id']}`\n")
+
+if __name__ == "__main__":
+    list_topics()
+```
+
+#### Script 3: `scripts/get_section.py`
+
+**Purpose:** Extract a complete section by exact title.
+
+```python
+#!/usr/bin/env python3
+"""
+ABOUTME: Extracts a complete documentation section by title
+ABOUTME: Returns full section from llms-full.md (no truncation)
+"""
+
+import json
+import sys
+from pathlib import Path
+
+def get_section(title, variant="complete"):
+    """
+    Get a complete section by exact title.
+
+    Args:
+        title: Section title (e.g., "Middleware", "Routing")
+        variant: Which file to use (quick/standard/complete)
+
+    Returns:
+        Complete section content
+    """
+    catalog_path = Path(__file__).parent.parent / "assets" / "catalog.json"
+    with open(catalog_path, 'r', encoding='utf-8') as f:
+        catalog = json.load(f)
+
+    # Find section
+    section = None
+    for s in catalog["sections"]:
+        if s["title"].lower() == title.lower():
+            section = s
+            break
+
+    if not section:
+        return f"❌ Section '{title}' not found. Try: python scripts/list_topics.py"
+
+    # Load doc
+    variant_file = catalog["metadata"]["variants"].get(variant, "complete")
+    doc_path = Path(__file__).parent.parent / "references" / variant_file
+    with open(doc_path, 'r', encoding='utf-8') as f:
+        doc_content = f.read()
+
+    # Extract section
+    h1_marker = section["h1_marker"]
+    start = doc_content.find(h1_marker)
+
+    if start == -1:
+        return f"❌ Section '{title}' not found in {variant_file}"
+
+    next_h1 = doc_content.find("\n# ", start + len(h1_marker))
+    if next_h1 == -1:
+        section_text = doc_content[start:]
+    else:
+        section_text = doc_content[start:next_h1]
+
+    return section_text
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print("Usage: python get_section.py <title> [variant]")
+        print("Example: python get_section.py Middleware")
+        print("Example: python get_section.py Routing quick")
+        sys.exit(1)
+
+    title = sys.argv[1]
+    variant = sys.argv[2] if len(sys.argv) > 2 else "complete"
+
+    print(get_section(title, variant))
+```
+
+---
+
+### Component 4: Active SKILL.md Template
+
+**New template for llms.txt-based skills:**
+
+```markdown
+---
+name: {name}
+description: {description}
+type: active
+---
+
+# {Name} Skill
+
+**⚡ This is an ACTIVE skill** - Uses scripts to load documentation on-demand instead of dumping everything into context.
+
+## 🎯 Strategy: Demand-Driven Documentation
+
+**Traditional approach:**
+- Load 300k+ documentation into context
+- Agent reads everything to answer one question
+- Context bloat, slower performance
+
+**Active approach:**
+- Load 5-10k of relevant sections on-demand
+- Agent calls scripts to fetch what's needed
+- Clean context, faster performance
+
+## 📚 Available Documentation
+
+This skill provides access to {num_sections} documentation sections across 3 detail levels:
+
+- **Quick Reference** (`llms-small.md`): {small_size}k - Curated essentials
+- **Standard** (`llms.md`): {standard_size}k - Core concepts
+- **Complete** (`llms-full.md`): {full_size}k - Everything
+
+## 🔧 Tools Available
+
+### 1. Search Documentation
+Find and load only relevant sections:
+
+```bash
+python scripts/search.py "middleware"
+python scripts/search.py "routing" --detail quick
+```
+
+**Returns:** 5-10k of relevant content (not 300k!)
+
+### 2. List All Topics
+See what documentation exists:
+
+```bash
+python scripts/list_topics.py
+```
+
+**Returns:** Table of contents with section sizes and search hints
+
+### 3. Get Complete Section
+Extract a full section by title:
+
+```bash
+python scripts/get_section.py "Middleware"
+python scripts/get_section.py "Routing" quick
+```
+
+**Returns:** Complete section from chosen variant
+
+## 💡 Recommended Workflow
+
+1. **Discover:** `python scripts/list_topics.py` to see what's available
+2. **Search:** `python scripts/search.py "your topic"` to find relevant sections
+3. **Deep Dive:** Use returned content to answer questions in detail
+4. **Iterate:** Search more specific topics as needed
+
+## ⚠️ Important
+
+**DON'T:** Read `references/*.md` files directly into context
+**DO:** Use scripts to fetch only what you need
+
+This keeps your context clean and focused!
+
+## 📊 Index
+
+Complete section catalog available in `assets/catalog.json` with search mappings and size information.
+
+## 🔄 Updating
+
+To refresh with latest documentation:
+```bash
+python3 cli/doc_scraper.py --config configs/{name}.json
+```
+```
+
+---
+
+## Implementation Plan
+
+### Phase 1: Foundation (Quick Fixes)
+
+**Tasks:**
+1. Fix `.txt` → `.md` renaming in downloader
+2. Download all 3 variants (not just one)
+3. Store all variants in `references/` with correct names
+4. Remove content truncation (2500 chars → unlimited)
+
+**Time:** 1-2 hours
+**Files:** `cli/doc_scraper.py`, `cli/llms_txt_downloader.py`
+
+### Phase 2: Catalog System
+
+**Tasks:**
+1. Implement `_generate_catalog()` method
+2. Parse llms-small.md to extract sections
+3. Build search index from topics
+4. Generate `assets/catalog.json`
+
+**Time:** 2-3 hours
+**Files:** `cli/doc_scraper.py`
+
+### Phase 3: Active Scripts
+
+**Tasks:**
+1. Create `scripts/search.py`
+2. Create `scripts/list_topics.py`
+3. Create `scripts/get_section.py`
+4. Make scripts executable (`chmod +x`)
+
+**Time:** 2-3 hours
+**Files:** New scripts in `scripts/` template directory
+
+### Phase 4: Template Updates
+
+**Tasks:**
+1. Create new active SKILL.md template
+2. Update `create_enhanced_skill_md()` to use active template for llms.txt skills
+3. Update documentation to explain active skills
+
+**Time:** 1 hour
+**Files:** `cli/doc_scraper.py`, `README.md`, `CLAUDE.md`
+
+### Phase 5: Testing & Refinement
+
+**Tasks:**
+1. Test with Hono skill (has all 3 variants)
+2. Test search accuracy
+3. Measure context reduction
+4. Document examples
+
+**Time:** 2-3 hours
+
+**Total Estimated Time:** 8-12 hours
+
+---
+
+## Migration Path
+
+### Backward Compatibility
+
+**Existing skills:** No changes (passive skills still work)
+**New llms.txt skills:** Automatically use active architecture
+**User choice:** Can disable via config flag
+
+### Config Option
+
+```json
+{
+  "name": "hono",
+  "llms_txt_url": "https://hono.dev/llms-full.txt",
+  "active_skill": true,  // NEW: Enable active architecture (default: true)
+  "base_url": "https://hono.dev/docs"
+}
+```
+
+### Detection Logic
+
+```python
+# In _try_llms_txt()
+active_mode = self.config.get('active_skill', True)  # Default true
+
+if active_mode:
+    # Download all variants, generate catalog, create scripts
+    self._build_active_skill(downloaded)
+else:
+    # Traditional: single file, no scripts
+    self._build_passive_skill(downloaded)
+```
+
+---
+
+## Benefits Analysis
+
+### Context Efficiency
+
+| Scenario | Passive Skill | Active Skill | Improvement |
+|----------|---------------|--------------|-------------|
+| Simple query | 203k loaded | 5k loaded | **40x reduction** |
+| Multi-topic query | 203k loaded | 15k loaded | **13x reduction** |
+| Deep dive | 203k loaded | 30k loaded | **6x reduction** |
+
+### Data Fidelity
+
+| Aspect | Passive | Active |
+|--------|---------|--------|
+| Content truncation | 36% lost | 0% lost |
+| Code truncation | 600 chars max | Unlimited |
+| Variants available | 1 | 3 |
+
+### Agent Capabilities
+
+**Passive Skills:**
+- ❌ Cannot choose detail level
+- ❌ Cannot search efficiently
+- ❌ Must read entire context
+- ❌ Limited by context window
+
+**Active Skills:**
+- ✅ Chooses appropriate detail level
+- ✅ Searches catalog efficiently
+- ✅ Loads only what's needed
+- ✅ Unlimited documentation access
+
+---
+
+## Trade-offs
+
+### Advantages
+
+1. **Massive context reduction** (20-40x less per query)
+2. **No content loss** (all 3 variants preserved)
+3. **Correct file format** (.md not .txt)
+4. **Agent autonomy** (tools to fetch docs)
+5. **Scalable** (works with 1MB+ docs)
+
+### Disadvantages
+
+1. **Complexity** (scripts + catalog vs simple files)
+2. **Initial overhead** (catalog generation)
+3. **Agent learning curve** (must learn to use scripts)
+4. **Dependency** (Python required to run scripts)
+
+### Risk Mitigation
+
+**Risk:** Scripts don't work in Claude's sandbox
+**Mitigation:** Test thoroughly, provide fallback to passive mode
+
+**Risk:** Catalog generation fails
+**Mitigation:** Graceful degradation to single-file mode
+
+**Risk:** Agent doesn't use scripts
+**Mitigation:** Clear SKILL.md instructions, examples in quick reference
+
+---
+
+## Success Metrics
+
+### Technical Metrics
+
+- ✅ Context per query < 20k (down from 203k)
+- ✅ All 3 variants downloaded and named correctly
+- ✅ 0% content truncation
+- ✅ Catalog generation < 5 seconds
+- ✅ Search script < 1 second response time
+
+### User Experience Metrics
+
+- ✅ Agent successfully uses scripts without prompting
+- ✅ Answers are equally or more accurate than passive mode
+- ✅ Agent can handle queries about all documentation sections
+- ✅ No "context limit exceeded" errors
+
+---
+
+## Future Enhancements
+
+### Phase 6: Smart Caching
+
+Cache frequently accessed sections in SKILL.md quick reference:
+```python
+# Track access frequency in catalog.json
+"sections": [
+  {
+    "id": "middleware",
+    "access_count": 47,  # NEW: Track usage
+    "last_accessed": "2025-10-24T14:30:00Z"
+  }
+]
+
+# Include top 10 most-accessed sections directly in SKILL.md
+```
+
+### Phase 7: Semantic Search
+
+Use embeddings for better search:
+```python
+# Generate embeddings for each section
+"sections": [
+  {
+    "id": "middleware",
+    "embedding": [...],  # NEW: Vector embedding
+    "topics": ["cors", "auth"]
+  }
+]
+
+# In search.py: Use cosine similarity for better matches
+```
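+
+A rough sketch of what the lookup side could look like once per-section embeddings are stored in `catalog.json`. Embedding generation itself is out of scope here; `query_embedding` is assumed to be produced by whatever model is chosen:
+
+```python
+import json
+import math
+from pathlib import Path
+
+def cosine(a, b):
+    """Cosine similarity between two equal-length vectors."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+def semantic_search(query_embedding, top_k=3):
+    """Rank catalog sections by similarity to a pre-computed query embedding."""
+    catalog_path = Path(__file__).parent.parent / "assets" / "catalog.json"
+    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
+    scored = [
+        (cosine(query_embedding, s["embedding"]), s["id"])
+        for s in catalog["sections"] if "embedding" in s
+    ]
+    return sorted(scored, reverse=True)[:top_k]
+```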
+
+### Phase 8: Progressive Loading
+
+Load increasingly detailed docs:
+```python
+# First: Load llms.md (5.4k - overview)
+# If insufficient: Load llms-small.md section (15k)
+# If still insufficient: Load llms-full.md section (30k)
+```
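+
+One way to express that escalation, assuming the Phase 2 catalog exists and reusing the Phase 3 `get_section()` helper (sketch only; `is_sufficient` stands in for the agent's own judgment):
+
+```python
+def progressive_lookup(title, is_sufficient):
+    """Escalate llms.md -> llms-small.md -> llms-full.md until the answer suffices."""
+    from get_section import get_section  # assumes scripts/ is on sys.path
+
+    text = ""
+    # Mirrors the order above: standard (llms.md), then quick (llms-small.md), then complete
+    for variant in ("standard", "quick", "complete"):
+        text = get_section(title, variant)
+        if is_sufficient(text):
+            break
+    return text
+```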
+
+---
+
+## Conclusion
+
+Active skills represent a fundamental shift from **documentation repositories** to **documentation routers**. By treating skills as intelligent intermediaries rather than static dumps, we can:
+
+1. **Eliminate context bloat** (40x reduction)
+2. **Preserve full fidelity** (0% truncation)
+3. **Enable agent autonomy** (tools to fetch docs)
+4. **Scale indefinitely** (no size limits)
+
+This design maintains backward compatibility while unlocking new capabilities for modern, LLM-optimized documentation sources like llms.txt.
+
+**Recommendation:** Implement in phases, starting with foundation fixes, then catalog system, then active scripts. Test thoroughly with Hono before making it the default for all llms.txt-based skills.
+
+---
+
+## References
+
+- Original brainstorming session: 2025-10-24
+- llms.txt convention: https://llmstxt.org/
+- Hono example: https://hono.dev/llms-full.txt
+- Skill_Seekers repository: Current project
+
+---
+
+## Appendix: Example Workflows
+
+### Example 1: Agent Searches for "Middleware"
+
+```bash
+# Agent runs:
+python scripts/search.py "middleware"
+
+# Script returns ~8k of middleware documentation from llms-full.md
+# Agent uses that 8k to answer the question
+# Total context used: 8k (not 319k!)
+```
+
+### Example 2: Agent Explores Documentation
+
+```bash
+# 1. Agent lists topics
+python scripts/list_topics.py
+# Returns: Table of contents (2k)
+
+# 2. Agent picks a topic
+python scripts/get_section.py "Routing"
+# Returns: Complete Routing section (5k)
+
+# 3. Agent searches related topics
+python scripts/search.py "path parameters"
+# Returns: Routing + Path section (7k)
+
+# Total context used across 3 queries: 14k (not 3 × 319k = 957k!)
+```
+
+### Example 3: Agent Needs Quick Answer
+
+```bash
+# Agent uses quick variant for overview
+python scripts/search.py "cors" --detail quick
+
+# Returns: Short CORS explanation from llms-small.md (2k)
+# If insufficient, agent can follow up with:
+python scripts/get_section.py "Middleware"  # Full section from llms-full.md
+```
+
+---
+
+**Document Status:** Ready for review and implementation planning.

+ 682 - 0
libs/external/Skill_Seekers-development/docs/plans/2025-10-24-active-skills-phase1.md

@@ -0,0 +1,682 @@
+# Active Skills Phase 1: Foundation Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Fix fundamental issues in llms.txt handling: rename .txt→.md, download all 3 variants, remove truncation.
+
+**Architecture:** Modify existing llms.txt download/parse/build workflow to handle multiple variants correctly, store with proper extensions, and preserve complete content without truncation.
+
+**Tech Stack:** Python 3.10+, requests, BeautifulSoup4, existing Skill_Seekers architecture
+
+---
+
+## Task 1: Add Multi-Variant Detection
+
+**Files:**
+- Modify: `cli/llms_txt_detector.py`
+- Test: `tests/test_llms_txt_detector.py`
+
+**Step 1: Write failing test for detect_all() method**
+
+```python
+# tests/test_llms_txt_detector.py (add new test)
+
+def test_detect_all_variants():
+    """Test detecting all llms.txt variants"""
+    from unittest.mock import patch, Mock
+
+    detector = LlmsTxtDetector("https://hono.dev/docs")
+
+    with patch('cli.llms_txt_detector.requests.head') as mock_head:
+        # Mock responses for different variants
+        def mock_response(url, **kwargs):
+            response = Mock()
+            # All 3 variants exist for Hono
+            if 'llms-full.txt' in url or 'llms.txt' in url or 'llms-small.txt' in url:
+                response.status_code = 200
+            else:
+                response.status_code = 404
+            return response
+
+        mock_head.side_effect = mock_response
+
+        variants = detector.detect_all()
+
+        assert len(variants) == 3
+        assert any(v['variant'] == 'full' for v in variants)
+        assert any(v['variant'] == 'standard' for v in variants)
+        assert any(v['variant'] == 'small' for v in variants)
+        assert all('url' in v for v in variants)
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v`
+
+Expected: FAIL with "AttributeError: 'LlmsTxtDetector' object has no attribute 'detect_all'"
+
+**Step 3: Implement detect_all() method**
+
+```python
+# cli/llms_txt_detector.py (add new method)
+
+def detect_all(self) -> List[Dict[str, str]]:
+    """
+    Detect all available llms.txt variants.
+
+    Returns:
+        List of dicts with 'url' and 'variant' keys for each found variant
+    """
+    found_variants = []
+
+    for filename, variant in self.VARIANTS:
+        parsed = urlparse(self.base_url)
+        root_url = f"{parsed.scheme}://{parsed.netloc}"
+        url = f"{root_url}/{filename}"
+
+        if self._check_url_exists(url):
+            found_variants.append({
+                'url': url,
+                'variant': variant
+            })
+
+    return found_variants
+```
+
+**Step 4: Add import for List and Dict at top of file**
+
+```python
+# cli/llms_txt_detector.py (add to imports)
+from typing import Optional, Dict, List
+```
+
+**Step 5: Run test to verify it passes**
+
+Run: `source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v`
+
+Expected: PASS
+
+**Step 6: Commit**
+
+```bash
+git add cli/llms_txt_detector.py tests/test_llms_txt_detector.py
+git commit -m "feat: add detect_all() for multi-variant detection"
+```
+
+---
+
+## Task 2: Add File Extension Renaming to Downloader
+
+**Files:**
+- Modify: `cli/llms_txt_downloader.py`
+- Test: `tests/test_llms_txt_downloader.py`
+
+**Step 1: Write failing test for get_proper_filename() method**
+
+```python
+# tests/test_llms_txt_downloader.py (add new test)
+
+def test_get_proper_filename():
+    """Test filename conversion from .txt to .md"""
+    downloader = LlmsTxtDownloader("https://hono.dev/llms-full.txt")
+
+    filename = downloader.get_proper_filename()
+
+    assert filename == "llms-full.md"
+    assert not filename.endswith('.txt')
+
+def test_get_proper_filename_standard():
+    """Test standard variant naming"""
+    downloader = LlmsTxtDownloader("https://hono.dev/llms.txt")
+
+    filename = downloader.get_proper_filename()
+
+    assert filename == "llms.md"
+
+def test_get_proper_filename_small():
+    """Test small variant naming"""
+    downloader = LlmsTxtDownloader("https://hono.dev/llms-small.txt")
+
+    filename = downloader.get_proper_filename()
+
+    assert filename == "llms-small.md"
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v`
+
+Expected: FAIL with "AttributeError: 'LlmsTxtDownloader' object has no attribute 'get_proper_filename'"
+
+**Step 3: Implement get_proper_filename() method**
+
+```python
+# cli/llms_txt_downloader.py (add new method)
+
+def get_proper_filename(self) -> str:
+    """
+    Extract filename from URL and convert .txt to .md
+
+    Returns:
+        Proper filename with .md extension
+
+    Examples:
+        https://hono.dev/llms-full.txt -> llms-full.md
+        https://hono.dev/llms.txt -> llms.md
+        https://hono.dev/llms-small.txt -> llms-small.md
+    """
+    # Extract filename from URL
+    from urllib.parse import urlparse
+    parsed = urlparse(self.url)
+    filename = parsed.path.split('/')[-1]
+
+    # Replace .txt with .md
+    if filename.endswith('.txt'):
+        filename = filename[:-4] + '.md'
+
+    return filename
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v`
+
+Expected: PASS (all 3 tests)
+
+**Step 5: Commit**
+
+```bash
+git add cli/llms_txt_downloader.py tests/test_llms_txt_downloader.py
+git commit -m "feat: add get_proper_filename() for .txt to .md conversion"
+```
+
+---
+
+## Task 3: Update _try_llms_txt() to Download All Variants
+
+**Files:**
+- Modify: `cli/doc_scraper.py:337-384` (_try_llms_txt method)
+- Test: `tests/test_integration.py`
+
+**Step 1: Write failing test for multi-variant download**
+
+```python
+# tests/test_integration.py (add to TestFullLlmsTxtWorkflow class)
+
+def test_multi_variant_download(self):
+    """Test downloading all 3 llms.txt variants"""
+    from unittest.mock import patch, Mock
+    import tempfile
+    import os
+
+    config = {
+        'name': 'test-multi-variant',
+        'base_url': 'https://hono.dev/docs'
+    }
+
+    # Mock all 3 variants
+    sample_full = "# Full\n" + "x" * 1000
+    sample_standard = "# Standard\n" + "x" * 200
+    sample_small = "# Small\n" + "x" * 500
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        with patch('cli.llms_txt_detector.requests.head') as mock_head, \
+             patch('cli.llms_txt_downloader.requests.get') as mock_get:
+
+            # Mock detection (all exist)
+            mock_head_response = Mock()
+            mock_head_response.status_code = 200
+            mock_head.return_value = mock_head_response
+
+            # Mock downloads
+            def mock_download(url, **kwargs):
+                response = Mock()
+                response.status_code = 200
+                if 'llms-full.txt' in url:
+                    response.text = sample_full
+                elif 'llms-small.txt' in url:
+                    response.text = sample_small
+                else:  # llms.txt
+                    response.text = sample_standard
+                return response
+
+            mock_get.side_effect = mock_download
+
+            # Run scraper
+            scraper = DocumentationScraper(config, dry_run=False)
+            result = scraper._try_llms_txt()
+
+            # Verify all 3 files created
+            refs_dir = os.path.join(scraper.skill_dir, 'references')
+
+            assert os.path.exists(os.path.join(refs_dir, 'llms-full.md'))
+            assert os.path.exists(os.path.join(refs_dir, 'llms.md'))
+            assert os.path.exists(os.path.join(refs_dir, 'llms-small.md'))
+
+            # Verify content not truncated
+            with open(os.path.join(refs_dir, 'llms-full.md')) as f:
+                content = f.read()
+                assert len(content) == len(sample_full)
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v`
+
+Expected: FAIL - only one file created, not all 3
+
+**Step 3: Modify _try_llms_txt() to use detect_all()**
+
+```python
+# cli/doc_scraper.py (replace _try_llms_txt method, lines 337-384)
+
+def _try_llms_txt(self) -> bool:
+    """
+    Try to use llms.txt instead of HTML scraping.
+    Downloads ALL available variants and stores with .md extension.
+
+    Returns:
+        True if llms.txt was found and processed successfully
+    """
+    print(f"\n🔍 Checking for llms.txt at {self.base_url}...")
+
+    # Check for explicit config URL first
+    explicit_url = self.config.get('llms_txt_url')
+    if explicit_url:
+        print(f"\n📌 Using explicit llms_txt_url from config: {explicit_url}")
+
+        downloader = LlmsTxtDownloader(explicit_url)
+        content = downloader.download()
+
+        if content:
+            # Save with proper .md extension
+            filename = downloader.get_proper_filename()
+            filepath = os.path.join(self.skill_dir, "references", filename)
+            os.makedirs(os.path.dirname(filepath), exist_ok=True)
+
+            with open(filepath, 'w', encoding='utf-8') as f:
+                f.write(content)
+            print(f"  💾 Saved {filename} ({len(content)} chars)")
+
+            # Parse and save pages
+            parser = LlmsTxtParser(content)
+            pages = parser.parse()
+
+            if pages:
+                for page in pages:
+                    self.save_page(page)
+                    self.pages.append(page)
+
+                self.llms_txt_detected = True
+                self.llms_txt_variant = 'explicit'
+                return True
+
+    # Auto-detection: Find ALL variants
+    detector = LlmsTxtDetector(self.base_url)
+    variants = detector.detect_all()
+
+    if not variants:
+        print("ℹ️  No llms.txt found, using HTML scraping")
+        return False
+
+    print(f"✅ Found {len(variants)} llms.txt variant(s)")
+
+    # Download ALL variants
+    downloaded = {}
+    for variant_info in variants:
+        url = variant_info['url']
+        variant = variant_info['variant']
+
+        print(f"  📥 Downloading {variant}...")
+        downloader = LlmsTxtDownloader(url)
+        content = downloader.download()
+
+        if content:
+            filename = downloader.get_proper_filename()
+            downloaded[variant] = {
+                'content': content,
+                'filename': filename,
+                'size': len(content)
+            }
+            print(f"     ✓ {filename} ({len(content)} chars)")
+
+    if not downloaded:
+        print("⚠️  Failed to download any variants, falling back to HTML scraping")
+        return False
+
+    # Save ALL variants to references/
+    os.makedirs(os.path.join(self.skill_dir, "references"), exist_ok=True)
+
+    for variant, data in downloaded.items():
+        filepath = os.path.join(self.skill_dir, "references", data['filename'])
+        with open(filepath, 'w', encoding='utf-8') as f:
+            f.write(data['content'])
+        print(f"  💾 Saved {data['filename']}")
+
+    # Parse LARGEST variant for skill building
+    largest = max(downloaded.items(), key=lambda x: x[1]['size'])
+    print(f"\n📄 Parsing {largest[1]['filename']} for skill building...")
+
+    parser = LlmsTxtParser(largest[1]['content'])
+    pages = parser.parse()
+
+    if not pages:
+        print("⚠️  Failed to parse llms.txt, falling back to HTML scraping")
+        return False
+
+    print(f"  ✓ Parsed {len(pages)} sections")
+
+    # Save pages for skill building
+    for page in pages:
+        self.save_page(page)
+        self.pages.append(page)
+
+    self.llms_txt_detected = True
+    self.llms_txt_variants = list(downloaded.keys())
+
+    return True
+```
+
+**Step 4: Add llms_txt_variants attribute to __init__**
+
+```python
+# cli/doc_scraper.py (in __init__ method, after llms_txt_variant line)
+
+self.llms_txt_variants = []  # Track all downloaded variants
+```
+
+**Step 5: Run test to verify it passes**
+
+Run: `source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v`
+
+Expected: PASS
+
+**Step 6: Commit**
+
+```bash
+git add cli/doc_scraper.py tests/test_integration.py
+git commit -m "feat: download all llms.txt variants with proper .md extension"
+```
+
+---
+
+## Task 4: Remove Content Truncation
+
+**Files:**
+- Modify: `cli/doc_scraper.py:714-730` (create_reference_file method)
+
+**Step 1: Write failing test for no truncation**
+
+```python
+# tests/test_integration.py (add new test)
+
+def test_no_content_truncation():
+    """Test that content is NOT truncated in reference files"""
+    from unittest.mock import Mock
+    import tempfile
+    import os
+
+    config = {
+        'name': 'test-no-truncate',
+        'base_url': 'https://example.com/docs'
+    }
+
+    # Create scraper with long content
+    scraper = DocumentationScraper(config, dry_run=False)
+
+    # Create page with content > 2500 chars
+    long_content = "x" * 5000
+    long_code = "y" * 1000
+
+    pages = [{
+        'title': 'Long Page',
+        'url': 'https://example.com/long',
+        'content': long_content,
+        'code_samples': [
+            {'code': long_code, 'language': 'python'}
+        ],
+        'headings': []
+    }]
+
+    # Create reference file
+    scraper.create_reference_file('test', pages)
+
+    # Verify no truncation
+    ref_file = os.path.join(scraper.skill_dir, 'references', 'test.md')
+    with open(ref_file, 'r') as f:
+        content = f.read()
+
+    assert long_content in content  # Full content included
+    assert long_code in content     # Full code included
+    assert '[Content truncated]' not in content
+    assert '...' not in content or content.count('...') == 0
+```
+
+**Step 2: Run test to verify it fails**
+
+Run: `source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v`
+
+Expected: FAIL - content contains "[Content truncated]" or "..."
+
+**Step 3: Remove truncation from create_reference_file()**
+
+```python
+# cli/doc_scraper.py (modify create_reference_file method, lines 712-731)
+
+# OLD (line 714-716):
+#     if page.get('content'):
+#         content = page['content'][:2500]
+#         if len(page['content']) > 2500:
+#             content += "\n\n*[Content truncated]*"
+
+# NEW (replace with):
+    if page.get('content'):
+        content = page['content']  # NO TRUNCATION
+        lines.append(content)
+        lines.append("")
+
+# OLD (line 728-730):
+#     lines.append(code[:600])
+#     if len(code) > 600:
+#         lines.append("...")
+
+# NEW (replace with):
+    lines.append(code)  # NO TRUNCATION
+    # No "..." suffix
+```
+
+**Complete replacement of lines 712-731:**
+
+```python
+# cli/doc_scraper.py:712-731 (complete replacement)
+
+        # Content (NO TRUNCATION)
+        if page.get('content'):
+            lines.append(page['content'])
+            lines.append("")
+
+        # Code examples with language (NO TRUNCATION)
+        if page.get('code_samples'):
+            lines.append("**Examples:**\n")
+            for i, sample in enumerate(page['code_samples'][:4], 1):
+                lang = sample.get('language', 'unknown')
+                code = sample.get('code', sample if isinstance(sample, str) else '')
+                lines.append(f"Example {i} ({lang}):")
+                lines.append(f"```{lang}")
+                lines.append(code)  # Full code, no truncation
+                lines.append("```\n")
+```
+
+**Step 4: Run test to verify it passes**
+
+Run: `source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v`
+
+Expected: PASS
+
+**Step 5: Run full test suite to check for regressions**
+
+Run: `source .venv/bin/activate && pytest tests/ -v`
+
+Expected: All 201+ tests pass
+
+**Step 6: Commit**
+
+```bash
+git add cli/doc_scraper.py tests/test_integration.py
+git commit -m "feat: remove content truncation in reference files"
+```
+
+---
+
+## Task 5: Update Documentation
+
+**Files:**
+- Modify: `docs/plans/2025-10-24-active-skills-design.md`
+- Modify: `CHANGELOG.md`
+
+**Step 1: Update design doc status**
+
+```markdown
+# docs/plans/2025-10-24-active-skills-design.md (update header)
+
+**Status:** Phase 1 Implemented ✅
+```
+
+**Step 2: Add CHANGELOG entry**
+
+```markdown
+# CHANGELOG.md (add new section at top)
+
+## [Unreleased]
+
+### Added - Phase 1: Active Skills Foundation
+- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
+- Automatic .txt → .md file extension conversion
+- No content truncation: preserves complete documentation
+- `detect_all()` method for finding all llms.txt variants
+- `get_proper_filename()` for correct .md naming
+
+### Changed
+- `_try_llms_txt()` now downloads all available variants instead of just one
+- Reference files now contain complete content (no 2500 char limit)
+- Code samples now include full code (no 600 char limit)
+
+### Fixed
+- File extension bug: llms.txt files now saved as .md
+- Content loss: 0% truncation (was 36%)
+```
+
+**Step 3: Commit**
+
+```bash
+git add docs/plans/2025-10-24-active-skills-design.md CHANGELOG.md
+git commit -m "docs: update status for Phase 1 completion"
+```
+
+---
+
+## Task 6: Manual Verification
+
+**Files:**
+- None (manual testing)
+
+**Step 1: Test with Hono config**
+
+Run: `source .venv/bin/activate && python3 cli/doc_scraper.py --config configs/hono.json`
+
+**Expected output:**
+```
+🔍 Checking for llms.txt at https://hono.dev/docs...
+📌 Using explicit llms_txt_url from config: https://hono.dev/llms-full.txt
+  💾 Saved llms-full.md (319000 chars)
+📄 Parsing llms-full.md for skill building...
+  ✓ Parsed 93 sections
+✅ Used llms.txt (explicit) - skipping HTML scraping
+```
+
+**Step 2: Verify all 3 files exist with correct extensions**
+
+Note: auto-detection is what downloads all three variants. If `configs/hono.json` sets an explicit `llms_txt_url` (as in the Step 1 output above), only that single variant is saved; remove the key before running this check.
+
+Run: `ls -lah output/hono/references/llms*.md`
+
+Expected:
+```
+llms-full.md    319k
+llms.md         5.4k
+llms-small.md   176k
+```
+
+**Step 3: Verify no truncation in reference files**
+
+Run: `grep -c "Content truncated" output/hono/references/*.md`
+
+Expected: 0 matches (no truncation messages)
+
+**Step 4: Check file sizes are correct**
+
+Run: `wc -c output/hono/references/llms-full.md`
+
+Expected: Should match original download size (~319k), not reduced to 203k
+
+**Step 5: Verify all tests still pass**
+
+Run: `source .venv/bin/activate && pytest tests/ -v`
+
+Expected: All tests pass (201+)
+
+---
+
+## Completion Checklist
+
+- [ ] Task 1: Multi-variant detection (detect_all)
+- [ ] Task 2: File extension renaming (get_proper_filename)
+- [ ] Task 3: Download all variants (_try_llms_txt)
+- [ ] Task 4: Remove truncation (create_reference_file)
+- [ ] Task 5: Update documentation
+- [ ] Task 6: Manual verification
+- [ ] All tests passing
+- [ ] No regressions in existing functionality
+
+---
+
+## Success Criteria
+
+**Technical:**
+- ✅ All 3 variants downloaded when available
+- ✅ Files saved with .md extension (not .txt)
+- ✅ 0% content truncation (was 36%)
+- ✅ All existing tests pass
+- ✅ New tests cover all changes
+
+**User Experience:**
+- ✅ Hono skill has all 3 files: llms-full.md, llms.md, llms-small.md
+- ✅ Reference files contain complete documentation
+- ✅ No "[Content truncated]" messages in output
+
+---
+
+## Related Skills
+
+- @superpowers:test-driven-development - Used throughout for TDD approach
+- @superpowers:verification-before-completion - Used in Task 6 for manual verification
+
+---
+
+## Notes
+
+- This plan implements Phase 1 from `docs/plans/2025-10-24-active-skills-design.md`
+- Phase 2 (Catalog System) and Phase 3 (Active Scripts) will be separate plans
+- All changes maintain backward compatibility with existing HTML scraping
+- File extension fix (.txt → .md) is critical for proper skill functionality
+
+---
+
+## Estimated Time
+
+- Task 1: 15 minutes
+- Task 2: 15 minutes
+- Task 3: 30 minutes
+- Task 4: 20 minutes
+- Task 5: 10 minutes
+- Task 6: 15 minutes
+
+**Total: ~1.5 hours**

+ 11 - 0
libs/external/Skill_Seekers-development/example-mcp-config.json

@@ -0,0 +1,11 @@
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/mcp/server.py"
+      ],
+      "cwd": "/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers"
+    }
+  }
+}

+ 13 - 0
libs/external/Skill_Seekers-development/mypy.ini

@@ -0,0 +1,13 @@
+[mypy]
+python_version = 3.10
+warn_return_any = False
+warn_unused_configs = True
+disallow_untyped_defs = False
+check_untyped_defs = True
+ignore_missing_imports = True
+no_implicit_optional = True
+show_error_codes = True
+
+# Gradual typing - be lenient for now
+disallow_incomplete_defs = False
+disallow_untyped_calls = False

+ 149 - 0
libs/external/Skill_Seekers-development/pyproject.toml

@@ -0,0 +1,149 @@
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "skill-seekers"
+version = "2.1.1"
+description = "Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills"
+readme = "README.md"
+requires-python = ">=3.10"
+license = {text = "MIT"}
+authors = [
+    {name = "Yusuf Karaaslan"}
+]
+keywords = [
+    "claude",
+    "ai",
+    "documentation",
+    "scraping",
+    "skills",
+    "llm",
+    "mcp",
+    "automation"
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Software Development :: Documentation",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Topic :: Text Processing :: Markup :: Markdown",
+]
+
+# Core dependencies
+dependencies = [
+    "requests>=2.32.5",
+    "beautifulsoup4>=4.14.2",
+    "PyGithub>=2.5.0",
+    "mcp>=1.18.0",
+    "httpx>=0.28.1",
+    "httpx-sse>=0.4.3",
+    "PyMuPDF>=1.24.14",
+    "Pillow>=11.0.0",
+    "pytesseract>=0.3.13",
+    "pydantic>=2.12.3",
+    "pydantic-settings>=2.11.0",
+    "python-dotenv>=1.1.1",
+    "jsonschema>=4.25.1",
+    "click>=8.3.0",
+    "Pygments>=2.19.2",
+]
+
+[project.optional-dependencies]
+# Development dependencies
+dev = [
+    "pytest>=8.4.2",
+    "pytest-cov>=7.0.0",
+    "coverage>=7.11.0",
+]
+
+# MCP server dependencies (included by default, but optional)
+mcp = [
+    "mcp>=1.18.0",
+    "httpx>=0.28.1",
+    "httpx-sse>=0.4.3",
+    "uvicorn>=0.38.0",
+    "starlette>=0.48.0",
+    "sse-starlette>=3.0.2",
+]
+
+# All optional dependencies combined
+all = [
+    "pytest>=8.4.2",
+    "pytest-cov>=7.0.0",
+    "coverage>=7.11.0",
+    "mcp>=1.18.0",
+    "httpx>=0.28.1",
+    "httpx-sse>=0.4.3",
+    "uvicorn>=0.38.0",
+    "starlette>=0.48.0",
+    "sse-starlette>=3.0.2",
+]
+
+[project.urls]
+Homepage = "https://github.com/yusufkaraaslan/Skill_Seekers"
+Repository = "https://github.com/yusufkaraaslan/Skill_Seekers"
+"Bug Tracker" = "https://github.com/yusufkaraaslan/Skill_Seekers/issues"
+Documentation = "https://github.com/yusufkaraaslan/Skill_Seekers#readme"
+
+[project.scripts]
+# Main unified CLI
+skill-seekers = "skill_seekers.cli.main:main"
+
+# Individual tool entry points
+skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
+skill-seekers-github = "skill_seekers.cli.github_scraper:main"
+skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
+skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
+skill-seekers-enhance = "skill_seekers.cli.enhance_skill_local:main"
+skill-seekers-package = "skill_seekers.cli.package_skill:main"
+skill-seekers-upload = "skill_seekers.cli.upload_skill:main"
+skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
+
+[tool.setuptools]
+packages = ["skill_seekers", "skill_seekers.cli", "skill_seekers.mcp", "skill_seekers.mcp.tools"]
+
+[tool.setuptools.package-dir]
+"" = "src"
+
+[tool.setuptools.package-data]
+skill_seekers = ["py.typed"]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+python_classes = ["Test*"]
+python_functions = ["test_*"]
+addopts = "-v --tb=short --strict-markers"
+
+[tool.coverage.run]
+source = ["src/skill_seekers"]
+omit = ["*/tests/*", "*/__pycache__/*", "*/venv/*"]
+
+[tool.coverage.report]
+exclude_lines = [
+    "pragma: no cover",
+    "def __repr__",
+    "raise AssertionError",
+    "raise NotImplementedError",
+    "if __name__ == .__main__.:",
+    "if TYPE_CHECKING:",
+    "@abstractmethod",
+]
+
+[tool.uv]
+dev-dependencies = [
+    "pytest>=8.4.2",
+    "pytest-cov>=7.0.0",
+    "coverage>=7.11.0",
+]
+
+[tool.uv.sources]
+# Use PyPI for all dependencies

+ 42 - 0
libs/external/Skill_Seekers-development/requirements.txt

@@ -0,0 +1,42 @@
+annotated-types==0.7.0
+anyio==4.11.0
+attrs==25.4.0
+beautifulsoup4==4.14.2
+certifi==2025.10.5
+charset-normalizer==3.4.4
+click==8.3.0
+coverage==7.11.0
+h11==0.16.0
+httpcore==1.0.9
+httpx==0.28.1
+httpx-sse==0.4.3
+idna==3.11
+iniconfig==2.3.0
+jsonschema==4.25.1
+jsonschema-specifications==2025.9.1
+mcp==1.18.0
+packaging==25.0
+pluggy==1.6.0
+pydantic==2.12.3
+pydantic-settings==2.11.0
+pydantic_core==2.41.4
+PyGithub==2.5.0
+Pygments==2.19.2
+PyMuPDF==1.24.14
+Pillow==11.0.0
+pytesseract==0.3.13
+pytest==8.4.2
+pytest-cov==7.0.0
+python-dotenv==1.1.1
+python-multipart==0.0.20
+referencing==0.37.0
+requests==2.32.5
+rpds-py==0.27.1
+sniffio==1.3.1
+soupsieve==2.8
+sse-starlette==3.0.2
+starlette==0.48.0
+typing-inspection==0.4.2
+typing_extensions==4.15.0
+urllib3==2.5.0
+uvicorn==0.38.0

+ 266 - 0
libs/external/Skill_Seekers-development/setup_mcp.sh

@@ -0,0 +1,266 @@
+#!/bin/bash
+# Skill Seeker MCP Server - Quick Setup Script
+# This script automates the MCP server setup for Claude Code
+
+set -e  # Exit on error
+
+echo "=================================================="
+echo "Skill Seeker MCP Server - Quick Setup"
+echo "=================================================="
+echo ""
+
+# Colors for output
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+# Step 1: Check Python version
+echo "Step 1: Checking Python version..."
+if ! command -v python3 &> /dev/null; then
+    echo -e "${RED}❌ Error: python3 not found${NC}"
+    echo "Please install Python 3.7 or higher"
+    exit 1
+fi
+
+PYTHON_VERSION=$(python3 --version | cut -d' ' -f2)
+echo -e "${GREEN}✓${NC} Python $PYTHON_VERSION found"
+echo ""
+
+# Step 2: Get repository path
+REPO_PATH=$(pwd)
+echo "Step 2: Repository location"
+echo "Path: $REPO_PATH"
+echo ""
+
+# Step 3: Install dependencies
+echo "Step 3: Installing Python dependencies..."
+
+# Check if we're in a virtual environment
+if [[ -n "$VIRTUAL_ENV" ]]; then
+    echo -e "${GREEN}✓${NC} Virtual environment detected: $VIRTUAL_ENV"
+    PIP_INSTALL_CMD="pip install"
+elif [[ -d "venv" ]]; then
+    echo -e "${YELLOW}⚠${NC} Virtual environment found but not activated"
+    echo "Activating venv..."
+    source venv/bin/activate
+    PIP_INSTALL_CMD="pip install"
+else
+    echo -e "${YELLOW}⚠${NC} No virtual environment found"
+    echo "It's recommended to use a virtual environment to avoid conflicts."
+    echo ""
+    read -p "Would you like to create one now? (y/n) " -n 1 -r
+    echo ""
+
+    if [[ $REPLY =~ ^[Yy]$ ]]; then
+        echo "Creating virtual environment..."
+        python3 -m venv venv || {
+            echo -e "${RED}❌ Failed to create virtual environment${NC}"
+            echo "Falling back to system install..."
+            PIP_INSTALL_CMD="pip3 install --user --break-system-packages"
+        }
+
+        if [[ -d "venv" ]]; then
+            source venv/bin/activate
+            PIP_INSTALL_CMD="pip install"
+            echo -e "${GREEN}✓${NC} Virtual environment created and activated"
+        fi
+    else
+        echo "Proceeding with system install (using --user --break-system-packages)..."
+        echo -e "${YELLOW}Note:${NC} This may override system-managed packages"
+        PIP_INSTALL_CMD="pip3 install --user --break-system-packages"
+    fi
+fi
+
+echo "This will install: mcp, requests, beautifulsoup4"
+read -p "Continue? (y/n) " -n 1 -r
+echo ""
+
+if [[ $REPLY =~ ^[Yy]$ ]]; then
+    echo "Installing package in editable mode..."
+    $PIP_INSTALL_CMD -e . || {
+        echo -e "${RED}❌ Failed to install package${NC}"
+        exit 1
+    }
+
+    echo -e "${GREEN}✓${NC} Dependencies installed successfully"
+else
+    echo "Skipping dependency installation"
+fi
+echo ""
+
+# Step 4: Test MCP server
+echo "Step 4: Testing MCP server..."
+timeout 3 python3 src/skill_seekers/mcp/server.py 2>/dev/null || {
+    if [ $? -eq 124 ]; then
+        echo -e "${GREEN}✓${NC} MCP server starts correctly (timeout expected)"
+    else
+        echo -e "${YELLOW}⚠${NC} MCP server test inconclusive, but may still work"
+    fi
+}
+echo ""
+
+# Step 5: Optional - Run tests
+echo "Step 5: Run test suite? (optional)"
+read -p "Run MCP tests to verify everything works? (y/n) " -n 1 -r
+echo ""
+
+if [[ $REPLY =~ ^[Yy]$ ]]; then
+    # Check if pytest is installed
+    if ! command -v pytest &> /dev/null; then
+        echo "Installing pytest..."
+        $PIP_INSTALL_CMD pytest || {
+            echo -e "${YELLOW}⚠${NC} Could not install pytest, skipping tests"
+        }
+    fi
+
+    if command -v pytest &> /dev/null; then
+        echo "Running MCP server tests..."
+        python3 -m pytest tests/test_mcp_server.py -v --tb=short || {
+            echo -e "${RED}❌ Some tests failed${NC}"
+            echo "The server may still work, but please check the errors above"
+        }
+    fi
+else
+    echo "Skipping tests"
+fi
+echo ""
+
+# Step 6: Configure Claude Code
+echo "Step 6: Configure Claude Code"
+echo "=================================================="
+echo ""
+echo "You need to add this configuration to Claude Code:"
+echo ""
+echo -e "${YELLOW}Configuration file:${NC} ~/.config/claude-code/mcp.json"
+echo ""
+echo "Add this JSON configuration (paths are auto-detected for YOUR system):"
+echo ""
+echo -e "${GREEN}{"
+echo "  \"mcpServers\": {"
+echo "    \"skill-seeker\": {"
+echo "      \"command\": \"python3\","
+echo "      \"args\": ["
+echo "        \"$REPO_PATH/src/skill_seekers/mcp/server.py\""
+echo "      ],"
+echo "      \"cwd\": \"$REPO_PATH\""
+echo "    }"
+echo "  }"
+echo -e "}${NC}"
+echo ""
+echo -e "${YELLOW}Note:${NC} The paths above are YOUR actual paths (not placeholders!)"
+echo ""
+
+# Ask if user wants auto-configure
+echo ""
+read -p "Auto-configure Claude Code now? (y/n) " -n 1 -r
+echo ""
+
+if [[ $REPLY =~ ^[Yy]$ ]]; then
+    # Check if config already exists
+    if [ -f ~/.config/claude-code/mcp.json ]; then
+        echo -e "${YELLOW}⚠ Warning: ~/.config/claude-code/mcp.json already exists${NC}"
+        echo "Current contents:"
+        cat ~/.config/claude-code/mcp.json
+        echo ""
+        read -p "Overwrite? (y/n) " -n 1 -r
+        echo ""
+        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
+            echo "Skipping auto-configuration"
+            echo "Please manually add the skill-seeker server to your config"
+            exit 0
+        fi
+    fi
+
+    # Create config directory
+    mkdir -p ~/.config/claude-code
+
+    # Write configuration with actual expanded path
+    cat > ~/.config/claude-code/mcp.json << EOF
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "$REPO_PATH/src/skill_seekers/mcp/server.py"
+      ],
+      "cwd": "$REPO_PATH"
+    }
+  }
+}
+EOF
+
+    echo -e "${GREEN}✓${NC} Configuration written to ~/.config/claude-code/mcp.json"
+    echo ""
+    echo "Configuration contents:"
+    cat ~/.config/claude-code/mcp.json
+    echo ""
+
+    # Verify the path exists
+    if [ -f "$REPO_PATH/src/skill_seekers/mcp/server.py" ]; then
+        echo -e "${GREEN}✓${NC} Verified: MCP server file exists at $REPO_PATH/src/skill_seekers/mcp/server.py"
+    else
+        echo -e "${RED}❌ Warning: MCP server not found at $REPO_PATH/src/skill_seekers/mcp/server.py${NC}"
+        echo "Please check the path!"
+    fi
+else
+    echo "Skipping auto-configuration"
+    echo "Please manually configure Claude Code using the JSON above"
+    echo ""
+    echo "IMPORTANT: Replace \$REPO_PATH with the actual path: $REPO_PATH"
+fi
+echo ""
+
+# Step 7: Test the configuration
+if [ -f ~/.config/claude-code/mcp.json ]; then
+    echo "Step 7: Testing MCP configuration..."
+    echo "Checking if paths are correct..."
+
+    # Extract the configured path
+    if command -v jq &> /dev/null; then
+        CONFIGURED_PATH=$(jq -r '.mcpServers["skill-seeker"].args[0]' ~/.config/claude-code/mcp.json 2>/dev/null || echo "")
+        if [ -n "$CONFIGURED_PATH" ] && [ -f "$CONFIGURED_PATH" ]; then
+            echo -e "${GREEN}✓${NC} MCP server path is valid: $CONFIGURED_PATH"
+        elif [ -n "$CONFIGURED_PATH" ]; then
+            echo -e "${YELLOW}⚠${NC} Warning: Configured path doesn't exist: $CONFIGURED_PATH"
+        fi
+    else
+        echo "Install 'jq' for config validation: brew install jq (macOS) or apt install jq (Linux)"
+    fi
+fi
+echo ""
+
+# Step 8: Final instructions
+echo "=================================================="
+echo "Setup Complete!"
+echo "=================================================="
+echo ""
+echo "Next steps:"
+echo ""
+echo "  1. ${YELLOW}Restart Claude Code${NC} (quit and reopen, don't just close window)"
+echo "  2. In Claude Code, test with: ${GREEN}\"List all available configs\"${NC}"
+echo "  3. You should see 9 Skill Seeker tools available"
+echo ""
+echo "Available MCP Tools:"
+echo "  • generate_config   - Create new config files"
+echo "  • estimate_pages    - Estimate scraping time"
+echo "  • scrape_docs       - Scrape documentation"
+echo "  • package_skill     - Create .zip files"
+echo "  • list_configs      - Show available configs"
+echo "  • validate_config   - Validate config files"
+echo ""
+echo "Example commands to try in Claude Code:"
+echo "  • ${GREEN}List all available configs${NC}"
+echo "  • ${GREEN}Validate configs/react.json${NC}"
+echo "  • ${GREEN}Generate config for Tailwind at https://tailwindcss.com/docs${NC}"
+echo ""
+echo "Documentation:"
+echo "  • MCP Setup Guide: ${YELLOW}docs/MCP_SETUP.md${NC}"
+echo "  • Full docs: ${YELLOW}README.md${NC}"
+echo ""
+echo "Troubleshooting:"
+echo "  • Check logs: ~/Library/Logs/Claude Code/ (macOS)"
+echo "  • Test server: python3 src/skill_seekers/mcp/server.py"
+echo "  • Run tests: python3 -m pytest tests/test_mcp_server.py -v"
+echo ""
+echo "Happy skill creating! 🚀"

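Step 7 leans on `jq` for checking the generated config; a short Python sketch performing the same check can serve as a jq-free fallback. It assumes only the file layout that setup_mcp.sh writes above.

    # Sketch: verify ~/.config/claude-code/mcp.json without jq.
    # Assumes the structure written by setup_mcp.sh above.
    import json
    from pathlib import Path

    config_path = Path.home() / ".config" / "claude-code" / "mcp.json"
    config = json.loads(config_path.read_text())

    server = config["mcpServers"]["skill-seeker"]
    print("server script exists:", Path(server["args"][0]).is_file())
    print("working directory exists:", Path(server["cwd"]).is_dir())
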
+ 22 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/__init__.py

@@ -0,0 +1,22 @@
+"""
+Skill Seekers - Convert documentation, GitHub repos, and PDFs into Claude AI skills.
+
+This package provides tools for automatically scraping, organizing, and packaging
+documentation from various sources into uploadable Claude AI skills.
+"""
+
+__version__ = "2.0.0"
+__author__ = "Yusuf Karaaslan"
+__license__ = "MIT"
+
+# Expose main components for easier imports
+from skill_seekers.cli import __version__ as cli_version
+from skill_seekers.mcp import __version__ as mcp_version
+
+__all__ = [
+    "__version__",
+    "__author__",
+    "__license__",
+    "cli_version",
+    "mcp_version",
+]

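A trivial sketch of what the re-exports above make available after an editable install; the version strings correspond to the `__version__` values defined in this commit (the MCP value is whatever `skill_seekers.mcp` exports).

    # Sketch: check the package-level re-exports after `pip install -e .`
    import skill_seekers

    print(skill_seekers.__version__)   # "2.0.0"
    print(skill_seekers.cli_version)   # "2.0.0" (from skill_seekers.cli)
    print(skill_seekers.mcp_version)   # version string exported by skill_seekers.mcp
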
+ 39 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/__init__.py

@@ -0,0 +1,39 @@
+"""Skill Seekers CLI tools package.
+
+This package provides command-line tools for converting documentation
+websites into Claude AI skills.
+
+Main modules:
+    - doc_scraper: Main documentation scraping and skill building tool
+    - llms_txt_detector: Detect llms.txt files at documentation URLs
+    - llms_txt_downloader: Download llms.txt content
+    - llms_txt_parser: Parse llms.txt markdown content
+    - pdf_scraper: Extract documentation from PDF files
+    - enhance_skill: AI-powered skill enhancement (API-based)
+    - enhance_skill_local: AI-powered skill enhancement (local)
+    - estimate_pages: Estimate page count before scraping
+    - package_skill: Package skills into .zip files
+    - upload_skill: Upload skills to Claude
+    - utils: Shared utility functions
+"""
+
+from .llms_txt_detector import LlmsTxtDetector
+from .llms_txt_downloader import LlmsTxtDownloader
+from .llms_txt_parser import LlmsTxtParser
+
+try:
+    from .utils import open_folder, read_reference_files
+except ImportError:
+    # utils.py might not exist in all configurations
+    open_folder = None
+    read_reference_files = None
+
+__version__ = "2.0.0"
+
+__all__ = [
+    "LlmsTxtDetector",
+    "LlmsTxtDownloader",
+    "LlmsTxtParser",
+    "open_folder",
+    "read_reference_files",
+]

+ 500 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/code_analyzer.py

@@ -0,0 +1,500 @@
+#!/usr/bin/env python3
+"""
+Code Analyzer for GitHub Repositories
+
+Extracts code signatures at configurable depth levels:
+- surface: File tree only (existing behavior)
+- deep: Parse files for signatures, parameters, types
+- full: Complete AST analysis (future enhancement)
+
+Supports multiple languages with language-specific parsers.
+"""
+
+import ast
+import re
+import logging
+from typing import Dict, List, Any, Optional
+from dataclasses import dataclass, asdict
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class Parameter:
+    """Represents a function parameter."""
+    name: str
+    type_hint: Optional[str] = None
+    default: Optional[str] = None
+
+
+@dataclass
+class FunctionSignature:
+    """Represents a function/method signature."""
+    name: str
+    parameters: List[Parameter]
+    return_type: Optional[str] = None
+    docstring: Optional[str] = None
+    line_number: Optional[int] = None
+    is_async: bool = False
+    is_method: bool = False
+    decorators: Optional[List[str]] = None
+
+    def __post_init__(self):
+        if self.decorators is None:
+            self.decorators = []
+
+
+@dataclass
+class ClassSignature:
+    """Represents a class signature."""
+    name: str
+    base_classes: List[str]
+    methods: List[FunctionSignature]
+    docstring: Optional[str] = None
+    line_number: Optional[int] = None
+
+
+class CodeAnalyzer:
+    """
+    Analyzes code at different depth levels.
+    """
+
+    def __init__(self, depth: str = 'surface'):
+        """
+        Initialize code analyzer.
+
+        Args:
+            depth: Analysis depth ('surface', 'deep', 'full')
+        """
+        self.depth = depth
+
+    def analyze_file(self, file_path: str, content: str, language: str) -> Dict[str, Any]:
+        """
+        Analyze a single file based on depth level.
+
+        Args:
+            file_path: Path to file in repository
+            content: File content as string
+            language: Programming language (Python, JavaScript, etc.)
+
+        Returns:
+            Dict containing extracted signatures
+        """
+        if self.depth == 'surface':
+            return {}  # Surface level doesn't analyze individual files
+
+        logger.debug(f"Analyzing {file_path} (language: {language}, depth: {self.depth})")
+
+        try:
+            if language == 'Python':
+                return self._analyze_python(content, file_path)
+            elif language in ['JavaScript', 'TypeScript']:
+                return self._analyze_javascript(content, file_path)
+            elif language in ['C', 'C++']:
+                return self._analyze_cpp(content, file_path)
+            else:
+                logger.debug(f"No analyzer for language: {language}")
+                return {}
+        except Exception as e:
+            logger.warning(f"Error analyzing {file_path}: {e}")
+            return {}
+
+    def _analyze_python(self, content: str, file_path: str) -> Dict[str, Any]:
+        """Analyze Python file using AST."""
+        try:
+            tree = ast.parse(content)
+        except SyntaxError as e:
+            logger.debug(f"Syntax error in {file_path}: {e}")
+            return {}
+
+        classes = []
+        functions = []
+
+        for node in ast.walk(tree):
+            if isinstance(node, ast.ClassDef):
+                class_sig = self._extract_python_class(node)
+                classes.append(asdict(class_sig))
+            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                # Only top-level functions (not methods).
+                # Guard: confirm parent.body is a list before using the 'in' operator,
+                # since some AST nodes expose a non-list 'body' attribute.
+                is_method = False
+                try:
+                    is_method = any(isinstance(parent, ast.ClassDef)
+                                  for parent in ast.walk(tree)
+                                  if hasattr(parent, 'body') and isinstance(parent.body, list) and node in parent.body)
+                except (TypeError, AttributeError):
+                    # If body is not iterable or check fails, assume it's a top-level function
+                    is_method = False
+
+                if not is_method:
+                    func_sig = self._extract_python_function(node)
+                    functions.append(asdict(func_sig))
+
+        return {
+            'classes': classes,
+            'functions': functions
+        }
+
+    def _extract_python_class(self, node: ast.ClassDef) -> ClassSignature:
+        """Extract class signature from AST node."""
+        # Extract base classes
+        bases = []
+        for base in node.bases:
+            if isinstance(base, ast.Name):
+                bases.append(base.id)
+            elif isinstance(base, ast.Attribute):
+                bases.append(f"{base.value.id}.{base.attr}" if hasattr(base.value, 'id') else base.attr)
+
+        # Extract methods
+        methods = []
+        for item in node.body:
+            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                method_sig = self._extract_python_function(item, is_method=True)
+                methods.append(method_sig)
+
+        # Extract docstring
+        docstring = ast.get_docstring(node)
+
+        return ClassSignature(
+            name=node.name,
+            base_classes=bases,
+            methods=methods,
+            docstring=docstring,
+            line_number=node.lineno
+        )
+
+    def _extract_python_function(self, node, is_method: bool = False) -> FunctionSignature:
+        """Extract function signature from AST node."""
+        # Extract parameters
+        params = []
+        for arg in node.args.args:
+            param_type = None
+            if arg.annotation:
+                param_type = ast.unparse(arg.annotation) if hasattr(ast, 'unparse') else None
+
+            params.append(Parameter(
+                name=arg.arg,
+                type_hint=param_type
+            ))
+
+        # Extract defaults
+        defaults = node.args.defaults
+        if defaults:
+            # Defaults are aligned to the end of params
+            num_no_default = len(params) - len(defaults)
+            for i, default in enumerate(defaults):
+                param_idx = num_no_default + i
+                if param_idx < len(params):
+                    try:
+                        params[param_idx].default = ast.unparse(default) if hasattr(ast, 'unparse') else str(default)
+                    except Exception:
+                        params[param_idx].default = "..."
+
+        # Extract return type
+        return_type = None
+        if node.returns:
+            try:
+                return_type = ast.unparse(node.returns) if hasattr(ast, 'unparse') else None
+            except Exception:
+                pass
+
+        # Extract decorators
+        decorators = []
+        for decorator in node.decorator_list:
+            try:
+                if hasattr(ast, 'unparse'):
+                    decorators.append(ast.unparse(decorator))
+                elif isinstance(decorator, ast.Name):
+                    decorators.append(decorator.id)
+            except Exception:
+                pass
+
+        # Extract docstring
+        docstring = ast.get_docstring(node)
+
+        return FunctionSignature(
+            name=node.name,
+            parameters=params,
+            return_type=return_type,
+            docstring=docstring,
+            line_number=node.lineno,
+            is_async=isinstance(node, ast.AsyncFunctionDef),
+            is_method=is_method,
+            decorators=decorators
+        )
+
+    def _analyze_javascript(self, content: str, file_path: str) -> Dict[str, Any]:
+        """
+        Analyze JavaScript/TypeScript file using regex patterns.
+
+        Note: This is a simplified approach. For production, consider using
+        a proper JS/TS parser like esprima or ts-morph.
+        """
+        classes = []
+        functions = []
+
+        # Extract class definitions
+        class_pattern = r'class\s+(\w+)(?:\s+extends\s+(\w+))?\s*\{'
+        for match in re.finditer(class_pattern, content):
+            class_name = match.group(1)
+            base_class = match.group(2) if match.group(2) else None
+
+            # Try to extract methods (simplified)
+            class_block_start = match.end()
+            # This is a simplification - proper parsing would track braces
+            class_block_end = content.find('}', class_block_start)
+            if class_block_end != -1:
+                class_body = content[class_block_start:class_block_end]
+                methods = self._extract_js_methods(class_body)
+            else:
+                methods = []
+
+            classes.append({
+                'name': class_name,
+                'base_classes': [base_class] if base_class else [],
+                'methods': methods,
+                'docstring': None,
+                'line_number': content[:match.start()].count('\n') + 1
+            })
+
+        # Extract top-level functions
+        func_pattern = r'(?:async\s+)?function\s+(\w+)\s*\(([^)]*)\)'
+        for match in re.finditer(func_pattern, content):
+            func_name = match.group(1)
+            params_str = match.group(2)
+            is_async = 'async' in match.group(0)
+
+            params = self._parse_js_parameters(params_str)
+
+            functions.append({
+                'name': func_name,
+                'parameters': params,
+                'return_type': None,  # JS doesn't have type annotations (unless TS)
+                'docstring': None,
+                'line_number': content[:match.start()].count('\n') + 1,
+                'is_async': is_async,
+                'is_method': False,
+                'decorators': []
+            })
+
+        # Extract arrow functions assigned to const/let
+        arrow_pattern = r'(?:const|let|var)\s+(\w+)\s*=\s*(?:async\s+)?\(([^)]*)\)\s*=>'
+        for match in re.finditer(arrow_pattern, content):
+            func_name = match.group(1)
+            params_str = match.group(2)
+            is_async = 'async' in match.group(0)
+
+            params = self._parse_js_parameters(params_str)
+
+            functions.append({
+                'name': func_name,
+                'parameters': params,
+                'return_type': None,
+                'docstring': None,
+                'line_number': content[:match.start()].count('\n') + 1,
+                'is_async': is_async,
+                'is_method': False,
+                'decorators': []
+            })
+
+        return {
+            'classes': classes,
+            'functions': functions
+        }
+
+    def _extract_js_methods(self, class_body: str) -> List[Dict]:
+        """Extract method signatures from class body."""
+        methods = []
+
+        # Match method definitions
+        method_pattern = r'(?:async\s+)?(\w+)\s*\(([^)]*)\)'
+        for match in re.finditer(method_pattern, class_body):
+            method_name = match.group(1)
+            params_str = match.group(2)
+            is_async = 'async' in match.group(0)
+
+            # Skip control-flow keywords that the method regex can falsely match
+            if method_name in ['if', 'for', 'while', 'switch']:
+                continue
+
+            params = self._parse_js_parameters(params_str)
+
+            methods.append({
+                'name': method_name,
+                'parameters': params,
+                'return_type': None,
+                'docstring': None,
+                'line_number': None,
+                'is_async': is_async,
+                'is_method': True,
+                'decorators': []
+            })
+
+        return methods
+
+    def _parse_js_parameters(self, params_str: str) -> List[Dict]:
+        """Parse JavaScript parameter string."""
+        params = []
+
+        if not params_str.strip():
+            return params
+
+        # Split by comma (simplified - doesn't handle complex default values)
+        param_list = [p.strip() for p in params_str.split(',')]
+
+        for param in param_list:
+            if not param:
+                continue
+
+            # Check for default value
+            if '=' in param:
+                name, default = param.split('=', 1)
+                name = name.strip()
+                default = default.strip()
+            else:
+                name = param
+                default = None
+
+            # Check for type annotation (TypeScript)
+            type_hint = None
+            if ':' in name:
+                name, type_hint = name.split(':', 1)
+                name = name.strip()
+                type_hint = type_hint.strip()
+
+            params.append({
+                'name': name,
+                'type_hint': type_hint,
+                'default': default
+            })
+
+        return params
+
+    def _analyze_cpp(self, content: str, file_path: str) -> Dict[str, Any]:
+        """
+        Analyze C/C++ header file using regex patterns.
+
+        Note: This is a simplified approach focusing on header files.
+        For production, consider using libclang or similar.
+        """
+        classes = []
+        functions = []
+
+        # Extract class definitions (simplified - doesn't handle nested classes)
+        class_pattern = r'class\s+(\w+)(?:\s*:\s*public\s+(\w+))?\s*\{'
+        for match in re.finditer(class_pattern, content):
+            class_name = match.group(1)
+            base_class = match.group(2) if match.group(2) else None
+
+            classes.append({
+                'name': class_name,
+                'base_classes': [base_class] if base_class else [],
+                'methods': [],  # Simplified - would need to parse class body
+                'docstring': None,
+                'line_number': content[:match.start()].count('\n') + 1
+            })
+
+        # Extract function declarations
+        func_pattern = r'(\w+(?:\s*\*|\s*&)?)\s+(\w+)\s*\(([^)]*)\)'
+        for match in re.finditer(func_pattern, content):
+            return_type = match.group(1).strip()
+            func_name = match.group(2)
+            params_str = match.group(3)
+
+            # Skip common keywords
+            if func_name in ['if', 'for', 'while', 'switch', 'return']:
+                continue
+
+            params = self._parse_cpp_parameters(params_str)
+
+            functions.append({
+                'name': func_name,
+                'parameters': params,
+                'return_type': return_type,
+                'docstring': None,
+                'line_number': content[:match.start()].count('\n') + 1,
+                'is_async': False,
+                'is_method': False,
+                'decorators': []
+            })
+
+        return {
+            'classes': classes,
+            'functions': functions
+        }
+
+    def _parse_cpp_parameters(self, params_str: str) -> List[Dict]:
+        """Parse C++ parameter string."""
+        params = []
+
+        if not params_str.strip() or params_str.strip() == 'void':
+            return params
+
+        # Split by comma (simplified)
+        param_list = [p.strip() for p in params_str.split(',')]
+
+        for param in param_list:
+            if not param:
+                continue
+
+            # Check for default value
+            default = None
+            if '=' in param:
+                param, default = param.rsplit('=', 1)
+                param = param.strip()
+                default = default.strip()
+
+            # Extract type and name (simplified)
+            # Format: "type name" or "type* name" or "type& name"
+            parts = param.split()
+            if len(parts) >= 2:
+                param_type = ' '.join(parts[:-1])
+                param_name = parts[-1]
+            else:
+                param_type = param
+                param_name = "unknown"
+
+            params.append({
+                'name': param_name,
+                'type_hint': param_type,
+                'default': default
+            })
+
+        return params
+
+
+if __name__ == '__main__':
+    # Test the analyzer
+    python_code = '''
+class Node2D:
+    """Base class for 2D nodes."""
+
+    def move_local_x(self, delta: float, snap: bool = False) -> None:
+        """Move node along local X axis."""
+        pass
+
+    async def tween_position(self, target: tuple, duration: float = 1.0):
+        """Animate position to target."""
+        pass
+
+def create_sprite(texture: str) -> Node2D:
+    """Create a new sprite node."""
+    return Node2D()
+'''
+
+    analyzer = CodeAnalyzer(depth='deep')
+    result = analyzer.analyze_file('test.py', python_code, 'Python')
+
+    print("Analysis Result:")
+    print(f"Classes: {len(result.get('classes', []))}")
+    print(f"Functions: {len(result.get('functions', []))}")
+
+    if result.get('classes'):
+        cls = result['classes'][0]
+        print(f"\nClass: {cls['name']}")
+        print(f"  Methods: {len(cls['methods'])}")
+        for method in cls['methods']:
+            params = ', '.join([f"{p['name']}: {p['type_hint']}" + (f" = {p['default']}" if p.get('default') else "")
+                               for p in method['parameters']])
+            print(f"    {method['name']}({params}) -> {method['return_type']}")

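For reference, a minimal sketch of driving the analyzer above on a JavaScript snippet (the Python path is already demonstrated in the file's `__main__` block). It assumes an editable install so the `skill_seekers.cli.code_analyzer` import path from this commit resolves; the sample source and printed values are illustrative only.

    # Sketch: run the regex-based JavaScript analysis at 'deep' depth.
    # Assumes the package is installed (pip install -e .); the JS sample is made up.
    from skill_seekers.cli.code_analyzer import CodeAnalyzer

    js_source = """
    class Store extends EventEmitter {
      async load(path, retries = 3) {}
    }
    const toJSON = (value) => value;
    """

    analyzer = CodeAnalyzer(depth='deep')
    result = analyzer.analyze_file('store.js', js_source, 'JavaScript')

    for cls in result.get('classes', []):
        print(cls['name'], cls['base_classes'])          # Store ['EventEmitter']
    for func in result.get('functions', []):
        print(func['name'], [p['name'] for p in func['parameters']])  # toJSON ['value']
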
+ 376 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/config_validator.py

@@ -0,0 +1,376 @@
+#!/usr/bin/env python3
+"""
+Unified Config Validator
+
+Validates unified config format that supports multiple sources:
+- documentation (website scraping)
+- github (repository scraping)
+- pdf (PDF document scraping)
+
+Also provides backward compatibility detection for legacy configs.
+"""
+
+import json
+import logging
+from typing import Dict, Any, List, Optional, Union
+from pathlib import Path
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+class ConfigValidator:
+    """
+    Validates unified config format and provides backward compatibility.
+    """
+
+    # Valid source types
+    VALID_SOURCE_TYPES = {'documentation', 'github', 'pdf'}
+
+    # Valid merge modes
+    VALID_MERGE_MODES = {'rule-based', 'claude-enhanced'}
+
+    # Valid code analysis depth levels
+    VALID_DEPTH_LEVELS = {'surface', 'deep', 'full'}
+
+    def __init__(self, config_or_path: Union[Dict[str, Any], str]):
+        """
+        Initialize validator with config dict or file path.
+
+        Args:
+            config_or_path: Either a config dict or path to config JSON file
+        """
+        if isinstance(config_or_path, dict):
+            self.config_path = None
+            self.config = config_or_path
+        else:
+            self.config_path = config_or_path
+            self.config = self._load_config()
+        self.is_unified = self._detect_format()
+
+    def _load_config(self) -> Dict[str, Any]:
+        """Load JSON config file."""
+        try:
+            with open(self.config_path, 'r', encoding='utf-8') as f:
+                return json.load(f)
+        except FileNotFoundError:
+            raise ValueError(f"Config file not found: {self.config_path}")
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON in config file: {e}")
+
+    def _detect_format(self) -> bool:
+        """
+        Detect if config is unified format or legacy.
+
+        Returns:
+            True if unified format (has 'sources' array)
+            False if legacy format
+        """
+        return 'sources' in self.config and isinstance(self.config['sources'], list)
+
+    def validate(self) -> bool:
+        """
+        Validate config based on detected format.
+
+        Returns:
+            True if valid
+
+        Raises:
+            ValueError if invalid with detailed error message
+        """
+        if self.is_unified:
+            return self._validate_unified()
+        else:
+            return self._validate_legacy()
+
+    def _validate_unified(self) -> bool:
+        """Validate unified config format."""
+        logger.info("Validating unified config format...")
+
+        # Required top-level fields
+        if 'name' not in self.config:
+            raise ValueError("Missing required field: 'name'")
+
+        if 'description' not in self.config:
+            raise ValueError("Missing required field: 'description'")
+
+        if 'sources' not in self.config:
+            raise ValueError("Missing required field: 'sources'")
+
+        # Validate sources array
+        sources = self.config['sources']
+
+        if not isinstance(sources, list):
+            raise ValueError("'sources' must be an array")
+
+        if len(sources) == 0:
+            raise ValueError("'sources' array cannot be empty")
+
+        # Validate merge_mode (optional)
+        merge_mode = self.config.get('merge_mode', 'rule-based')
+        if merge_mode not in self.VALID_MERGE_MODES:
+            raise ValueError(f"Invalid merge_mode: '{merge_mode}'. Must be one of {self.VALID_MERGE_MODES}")
+
+        # Validate each source
+        for i, source in enumerate(sources):
+            self._validate_source(source, i)
+
+        logger.info(f"✅ Unified config valid: {len(sources)} sources")
+        return True
+
+    def _validate_source(self, source: Dict[str, Any], index: int):
+        """Validate individual source configuration."""
+        # Check source has 'type' field
+        if 'type' not in source:
+            raise ValueError(f"Source {index}: Missing required field 'type'")
+
+        source_type = source['type']
+
+        if source_type not in self.VALID_SOURCE_TYPES:
+            raise ValueError(
+                f"Source {index}: Invalid type '{source_type}'. "
+                f"Must be one of {self.VALID_SOURCE_TYPES}"
+            )
+
+        # Type-specific validation
+        if source_type == 'documentation':
+            self._validate_documentation_source(source, index)
+        elif source_type == 'github':
+            self._validate_github_source(source, index)
+        elif source_type == 'pdf':
+            self._validate_pdf_source(source, index)
+
+    def _validate_documentation_source(self, source: Dict[str, Any], index: int):
+        """Validate documentation source configuration."""
+        if 'base_url' not in source:
+            raise ValueError(f"Source {index} (documentation): Missing required field 'base_url'")
+
+        # Optional but recommended fields
+        if 'selectors' not in source:
+            logger.warning(f"Source {index} (documentation): No 'selectors' specified, using defaults")
+
+        if 'max_pages' in source and not isinstance(source['max_pages'], int):
+            raise ValueError(f"Source {index} (documentation): 'max_pages' must be an integer")
+
+    def _validate_github_source(self, source: Dict[str, Any], index: int):
+        """Validate GitHub source configuration."""
+        if 'repo' not in source:
+            raise ValueError(f"Source {index} (github): Missing required field 'repo'")
+
+        # Validate repo format (owner/repo)
+        repo = source['repo']
+        if '/' not in repo:
+            raise ValueError(
+                f"Source {index} (github): Invalid repo format '{repo}'. "
+                f"Must be 'owner/repo' (e.g., 'facebook/react')"
+            )
+
+        # Validate code_analysis_depth if specified
+        if 'code_analysis_depth' in source:
+            depth = source['code_analysis_depth']
+            if depth not in self.VALID_DEPTH_LEVELS:
+                raise ValueError(
+                    f"Source {index} (github): Invalid code_analysis_depth '{depth}'. "
+                    f"Must be one of {self.VALID_DEPTH_LEVELS}"
+                )
+
+        # Validate max_issues if specified
+        if 'max_issues' in source and not isinstance(source['max_issues'], int):
+            raise ValueError(f"Source {index} (github): 'max_issues' must be an integer")
+
+    def _validate_pdf_source(self, source: Dict[str, Any], index: int):
+        """Validate PDF source configuration."""
+        if 'path' not in source:
+            raise ValueError(f"Source {index} (pdf): Missing required field 'path'")
+
+        # Check if file exists
+        pdf_path = source['path']
+        if not Path(pdf_path).exists():
+            logger.warning(f"Source {index} (pdf): File not found: {pdf_path}")
+
+    def _validate_legacy(self) -> bool:
+        """
+        Validate legacy config format (backward compatibility).
+
+        Legacy configs are the old format used by doc_scraper, github_scraper, pdf_scraper.
+        """
+        logger.info("Detected legacy config format (backward compatible)")
+
+        # Detect which legacy type based on fields
+        if 'base_url' in self.config:
+            logger.info("Legacy type: documentation")
+        elif 'repo' in self.config:
+            logger.info("Legacy type: github")
+        elif 'pdf' in self.config or 'path' in self.config:
+            logger.info("Legacy type: pdf")
+        else:
+            raise ValueError("Cannot detect legacy config type (missing base_url, repo, or pdf)")
+
+        return True
+
+    def convert_legacy_to_unified(self) -> Dict[str, Any]:
+        """
+        Convert legacy config to unified format.
+
+        Returns:
+            Unified config dict
+        """
+        if self.is_unified:
+            logger.info("Config already in unified format")
+            return self.config
+
+        logger.info("Converting legacy config to unified format...")
+
+        # Detect legacy type and convert
+        if 'base_url' in self.config:
+            return self._convert_legacy_documentation()
+        elif 'repo' in self.config:
+            return self._convert_legacy_github()
+        elif 'pdf' in self.config or 'path' in self.config:
+            return self._convert_legacy_pdf()
+        else:
+            raise ValueError("Cannot convert: unknown legacy format")
+
+    def _convert_legacy_documentation(self) -> Dict[str, Any]:
+        """Convert legacy documentation config to unified."""
+        unified = {
+            'name': self.config.get('name', 'unnamed'),
+            'description': self.config.get('description', 'Documentation skill'),
+            'merge_mode': 'rule-based',
+            'sources': [
+                {
+                    'type': 'documentation',
+                    **{k: v for k, v in self.config.items()
+                       if k not in ['name', 'description']}
+                }
+            ]
+        }
+        return unified
+
+    def _convert_legacy_github(self) -> Dict[str, Any]:
+        """Convert legacy GitHub config to unified."""
+        unified = {
+            'name': self.config.get('name', 'unnamed'),
+            'description': self.config.get('description', 'GitHub repository skill'),
+            'merge_mode': 'rule-based',
+            'sources': [
+                {
+                    'type': 'github',
+                    **{k: v for k, v in self.config.items()
+                       if k not in ['name', 'description']}
+                }
+            ]
+        }
+        return unified
+
+    def _convert_legacy_pdf(self) -> Dict[str, Any]:
+        """Convert legacy PDF config to unified."""
+        unified = {
+            'name': self.config.get('name', 'unnamed'),
+            'description': self.config.get('description', 'PDF document skill'),
+            'merge_mode': 'rule-based',
+            'sources': [
+                {
+                    'type': 'pdf',
+                    **{k: v for k, v in self.config.items()
+                       if k not in ['name', 'description']}
+                }
+            ]
+        }
+        return unified
+
+    def get_sources_by_type(self, source_type: str) -> List[Dict[str, Any]]:
+        """
+        Get all sources of a specific type.
+
+        Args:
+            source_type: 'documentation', 'github', or 'pdf'
+
+        Returns:
+            List of sources matching the type
+        """
+        if not self.is_unified:
+            # For legacy, convert and get sources
+            unified = self.convert_legacy_to_unified()
+            sources = unified['sources']
+        else:
+            sources = self.config['sources']
+
+        return [s for s in sources if s.get('type') == source_type]
+
+    def has_multiple_sources(self) -> bool:
+        """Check if config has multiple sources (requires merging)."""
+        if not self.is_unified:
+            return False
+        return len(self.config['sources']) > 1
+
+    def needs_api_merge(self) -> bool:
+        """
+        Check if config needs API merging.
+
+        Returns True if both documentation and github sources exist
+        with API extraction enabled.
+        """
+        if not self.has_multiple_sources():
+            return False
+
+        has_docs_api = any(
+            s.get('type') == 'documentation' and s.get('extract_api', True)
+            for s in self.config['sources']
+        )
+
+        has_github_code = any(
+            s.get('type') == 'github' and s.get('include_code', False)
+            for s in self.config['sources']
+        )
+
+        return has_docs_api and has_github_code
+
+
+def validate_config(config_path: str) -> ConfigValidator:
+    """
+    Validate config file and return validator instance.
+
+    Args:
+        config_path: Path to config JSON file
+
+    Returns:
+        ConfigValidator instance
+
+    Raises:
+        ValueError if config is invalid
+    """
+    validator = ConfigValidator(config_path)
+    validator.validate()
+    return validator
+
+
+if __name__ == '__main__':
+    import sys
+
+    if len(sys.argv) < 2:
+        print("Usage: python config_validator.py <config.json>")
+        sys.exit(1)
+
+    config_file = sys.argv[1]
+
+    try:
+        validator = validate_config(config_file)
+
+        print(f"\n✅ Config valid!")
+        print(f"   Format: {'Unified' if validator.is_unified else 'Legacy'}")
+        print(f"   Name: {validator.config.get('name')}")
+
+        if validator.is_unified:
+            sources = validator.config['sources']
+            print(f"   Sources: {len(sources)}")
+            for i, source in enumerate(sources):
+                print(f"     {i+1}. {source['type']}")
+
+            if validator.needs_api_merge():
+                merge_mode = validator.config.get('merge_mode', 'rule-based')
+                print(f"   ⚠️  API merge required (mode: {merge_mode})")
+
+    except ValueError as e:
+        print(f"\n❌ Config invalid: {e}")
+        sys.exit(1)

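Since `ConfigValidator` accepts either a file path or an in-memory dict, a quick sketch of validating a unified config without touching disk could look like the following; the skill name, URL, and repo below are placeholders rather than configs shipped in this commit.

    # Sketch: validate an in-memory unified config (no file needed).
    # name/base_url/repo values are placeholders for illustration.
    from skill_seekers.cli.config_validator import ConfigValidator

    config = {
        "name": "example-skill",
        "description": "Docs plus source for an example project",
        "merge_mode": "rule-based",
        "sources": [
            {"type": "documentation", "base_url": "https://example.com/docs", "max_pages": 100},
            {"type": "github", "repo": "example/example",
             "code_analysis_depth": "deep", "include_code": True},
        ],
    }

    validator = ConfigValidator(config)
    validator.validate()                     # raises ValueError if anything is wrong
    print(validator.has_multiple_sources())  # True
    print(validator.needs_api_merge())       # True: docs extract_api defaults to True, github sets include_code
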
+ 513 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/conflict_detector.py

@@ -0,0 +1,513 @@
+#!/usr/bin/env python3
+"""
+Conflict Detector for Multi-Source Skills
+
+Detects conflicts between documentation and code:
+- missing_in_docs: API exists in code but not documented
+- missing_in_code: API documented but doesn't exist in code
+- signature_mismatch: Different parameters/types between docs and code
+- description_mismatch: Docs say one thing, code comments say another
+
+Used by unified scraper to identify discrepancies before merging.
+"""
+
+import json
+import logging
+from typing import Dict, List, Any, Optional, Tuple
+from dataclasses import dataclass, asdict
+from difflib import SequenceMatcher
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class Conflict:
+    """Represents a conflict between documentation and code."""
+    type: str  # 'missing_in_docs', 'missing_in_code', 'signature_mismatch', 'description_mismatch'
+    severity: str  # 'low', 'medium', 'high'
+    api_name: str
+    docs_info: Optional[Dict[str, Any]] = None
+    code_info: Optional[Dict[str, Any]] = None
+    difference: Optional[str] = None
+    suggestion: Optional[str] = None
+
+
+class ConflictDetector:
+    """
+    Detects conflicts between documentation and code sources.
+    """
+
+    def __init__(self, docs_data: Dict[str, Any], github_data: Dict[str, Any]):
+        """
+        Initialize conflict detector.
+
+        Args:
+            docs_data: Data from documentation scraper
+            github_data: Data from GitHub scraper with code analysis
+        """
+        self.docs_data = docs_data
+        self.github_data = github_data
+
+        # Extract API information from both sources
+        self.docs_apis = self._extract_docs_apis()
+        self.code_apis = self._extract_code_apis()
+
+        logger.info(f"Loaded {len(self.docs_apis)} APIs from documentation")
+        logger.info(f"Loaded {len(self.code_apis)} APIs from code")
+
+    def _extract_docs_apis(self) -> Dict[str, Dict[str, Any]]:
+        """
+        Extract API information from documentation data.
+
+        Returns:
+            Dict mapping API name to API info
+        """
+        apis = {}
+
+        # Documentation structure varies, but typically has 'pages' or 'references'
+        pages = self.docs_data.get('pages', {})
+
+        # Handle both dict and list formats
+        if isinstance(pages, dict):
+            # Format: {url: page_data, ...}
+            for url, page_data in pages.items():
+                content = page_data.get('content', '')
+                title = page_data.get('title', '')
+
+                # Simple heuristic: if title or URL contains "api", "reference", "class", "function"
+                # it might be an API page
+                if any(keyword in title.lower() or keyword in url.lower()
+                       for keyword in ['api', 'reference', 'class', 'function', 'method']):
+
+                    # Extract API signatures from content (simplified)
+                    extracted_apis = self._parse_doc_content_for_apis(content, url)
+                    apis.update(extracted_apis)
+        elif isinstance(pages, list):
+            # Format: [{url: '...', apis: [...]}, ...]
+            for page in pages:
+                url = page.get('url', '')
+                page_apis = page.get('apis', [])
+
+                # If APIs are already extracted in the page data
+                for api in page_apis:
+                    api_name = api.get('name', '')
+                    if api_name:
+                        apis[api_name] = {
+                            'parameters': api.get('parameters', []),
+                            'return_type': api.get('return_type', 'Any'),
+                            'source_url': url
+                        }
+
+        return apis
+
+    def _parse_doc_content_for_apis(self, content: str, source_url: str) -> Dict[str, Dict]:
+        """
+        Parse documentation content to extract API signatures.
+
+        This is a simplified approach - real implementation would need
+        to understand the documentation format (Sphinx, JSDoc, etc.)
+        """
+        apis = {}
+
+        # Look for function/method signatures in code blocks
+        # Common patterns:
+        # - function_name(param1, param2)
+        # - ClassName.method_name(param1, param2)
+        # - def function_name(param1: type, param2: type) -> return_type
+
+        import re
+
+        # Pattern for common API signatures
+        patterns = [
+            # Python style: def name(params) -> return
+            r'def\s+(\w+)\s*\(([^)]*)\)(?:\s*->\s*(\w+))?',
+            # JavaScript style: function name(params)
+            r'function\s+(\w+)\s*\(([^)]*)\)',
+            # C++ style: return_type name(params)
+            r'(\w+)\s+(\w+)\s*\(([^)]*)\)',
+            # Method style: ClassName.method_name(params)
+            r'(\w+)\.(\w+)\s*\(([^)]*)\)'
+        ]
+
+        for pattern in patterns:
+            for match in re.finditer(pattern, content):
+                groups = match.groups()
+
+                # Parse based on pattern matched
+                if 'def' in pattern:
+                    # Python function
+                    name = groups[0]
+                    params_str = groups[1]
+                    return_type = groups[2] if len(groups) > 2 else None
+                elif 'function' in pattern:
+                    # JavaScript function
+                    name = groups[0]
+                    params_str = groups[1]
+                    return_type = None
+                elif '.' in pattern:
+                    # Class method
+                    class_name = groups[0]
+                    method_name = groups[1]
+                    name = f"{class_name}.{method_name}"
+                    params_str = groups[2] if len(groups) > 2 else groups[1]
+                    return_type = None
+                else:
+                    # C++ function
+                    return_type = groups[0]
+                    name = groups[1]
+                    params_str = groups[2]
+
+                # Parse parameters
+                params = self._parse_param_string(params_str)
+
+                apis[name] = {
+                    'name': name,
+                    'parameters': params,
+                    'return_type': return_type,
+                    'source': source_url,
+                    'raw_signature': match.group(0)
+                }
+
+        return apis
+
+    def _parse_param_string(self, params_str: str) -> List[Dict]:
+        """Parse parameter string into list of parameter dicts."""
+        if not params_str.strip():
+            return []
+
+        params = []
+        for param in params_str.split(','):
+            param = param.strip()
+            if not param:
+                continue
+
+            # Try to extract name and type
+            param_info = {'name': param, 'type': None, 'default': None}
+
+            # Check for type annotation (: type)
+            if ':' in param:
+                parts = param.split(':', 1)
+                param_info['name'] = parts[0].strip()
+                type_part = parts[1].strip()
+
+                # Check for default value (= value)
+                if '=' in type_part:
+                    type_str, default_str = type_part.split('=', 1)
+                    param_info['type'] = type_str.strip()
+                    param_info['default'] = default_str.strip()
+                else:
+                    param_info['type'] = type_part
+
+            # Check for default without type (= value)
+            elif '=' in param:
+                parts = param.split('=', 1)
+                param_info['name'] = parts[0].strip()
+                param_info['default'] = parts[1].strip()
+
+            params.append(param_info)
+
+        return params
+
+    def _extract_code_apis(self) -> Dict[str, Dict[str, Any]]:
+        """
+        Extract API information from GitHub code analysis.
+
+        Returns:
+            Dict mapping API name to API info
+        """
+        apis = {}
+
+        code_analysis = self.github_data.get('code_analysis', {})
+        if not code_analysis:
+            return apis
+
+        # Support both 'files' and 'analyzed_files' keys
+        files = code_analysis.get('files', code_analysis.get('analyzed_files', []))
+
+        for file_info in files:
+            file_path = file_info.get('file', 'unknown')
+
+            # Extract classes and their methods
+            for class_info in file_info.get('classes', []):
+                class_name = class_info['name']
+
+                # Add class itself
+                apis[class_name] = {
+                    'name': class_name,
+                    'type': 'class',
+                    'source': file_path,
+                    'line': class_info.get('line_number'),
+                    'base_classes': class_info.get('base_classes', []),
+                    'docstring': class_info.get('docstring')
+                }
+
+                # Add methods
+                for method in class_info.get('methods', []):
+                    method_name = f"{class_name}.{method['name']}"
+                    apis[method_name] = {
+                        'name': method_name,
+                        'type': 'method',
+                        'parameters': method.get('parameters', []),
+                        'return_type': method.get('return_type'),
+                        'source': file_path,
+                        'line': method.get('line_number'),
+                        'docstring': method.get('docstring'),
+                        'is_async': method.get('is_async', False)
+                    }
+
+            # Extract standalone functions
+            for func_info in file_info.get('functions', []):
+                func_name = func_info['name']
+                apis[func_name] = {
+                    'name': func_name,
+                    'type': 'function',
+                    'parameters': func_info.get('parameters', []),
+                    'return_type': func_info.get('return_type'),
+                    'source': file_path,
+                    'line': func_info.get('line_number'),
+                    'docstring': func_info.get('docstring'),
+                    'is_async': func_info.get('is_async', False)
+                }
+
+        return apis
+
+    def detect_all_conflicts(self) -> List[Conflict]:
+        """
+        Detect all types of conflicts.
+
+        Returns:
+            List of Conflict objects
+        """
+        logger.info("Detecting conflicts between documentation and code...")
+
+        conflicts = []
+
+        # 1. Find APIs missing in documentation
+        conflicts.extend(self._find_missing_in_docs())
+
+        # 2. Find APIs missing in code
+        conflicts.extend(self._find_missing_in_code())
+
+        # 3. Find signature mismatches
+        conflicts.extend(self._find_signature_mismatches())
+
+        logger.info(f"Found {len(conflicts)} conflicts total")
+
+        return conflicts
+
+    def _find_missing_in_docs(self) -> List[Conflict]:
+        """Find APIs that exist in code but not in documentation."""
+        conflicts = []
+
+        for api_name, code_info in self.code_apis.items():
+            # Simple name matching (can be enhanced with fuzzy matching)
+            if api_name not in self.docs_apis:
+                # Check if it's a private/internal API (often not documented)
+                is_private = api_name.startswith('_') or '__' in api_name
+                severity = 'low' if is_private else 'medium'
+
+                conflicts.append(Conflict(
+                    type='missing_in_docs',
+                    severity=severity,
+                    api_name=api_name,
+                    code_info=code_info,
+                    difference=f"API exists in code ({code_info['source']}) but not found in documentation",
+                    suggestion="Add documentation for this API" if not is_private else "Consider if this internal API should be documented"
+                ))
+
+        logger.info(f"Found {len(conflicts)} APIs missing in documentation")
+        return conflicts
+
+    def _find_missing_in_code(self) -> List[Conflict]:
+        """Find APIs that are documented but don't exist in code."""
+        conflicts = []
+
+        for api_name, docs_info in self.docs_apis.items():
+            if api_name not in self.code_apis:
+                conflicts.append(Conflict(
+                    type='missing_in_code',
+                    severity='high',  # This is serious - documented but doesn't exist
+                    api_name=api_name,
+                    docs_info=docs_info,
+                    difference=f"API documented ({docs_info.get('source', 'unknown')}) but not found in code",
+                    suggestion="Update documentation to remove this API, or add it to codebase"
+                ))
+
+        logger.info(f"Found {len(conflicts)} APIs missing in code")
+        return conflicts
+
+    def _find_signature_mismatches(self) -> List[Conflict]:
+        """Find APIs where signature differs between docs and code."""
+        conflicts = []
+
+        # Find APIs that exist in both
+        common_apis = set(self.docs_apis.keys()) & set(self.code_apis.keys())
+
+        for api_name in common_apis:
+            docs_info = self.docs_apis[api_name]
+            code_info = self.code_apis[api_name]
+
+            # Compare signatures
+            mismatch = self._compare_signatures(docs_info, code_info)
+
+            if mismatch:
+                conflicts.append(Conflict(
+                    type='signature_mismatch',
+                    severity=mismatch['severity'],
+                    api_name=api_name,
+                    docs_info=docs_info,
+                    code_info=code_info,
+                    difference=mismatch['difference'],
+                    suggestion=mismatch['suggestion']
+                ))
+
+        logger.info(f"Found {len(conflicts)} signature mismatches")
+        return conflicts
+
+    def _compare_signatures(self, docs_info: Dict, code_info: Dict) -> Optional[Dict]:
+        """
+        Compare signatures between docs and code.
+
+        Returns:
+            Dict with mismatch details if conflict found, None otherwise
+        """
+        docs_params = docs_info.get('parameters', [])
+        code_params = code_info.get('parameters', [])
+
+        # Compare parameter counts
+        if len(docs_params) != len(code_params):
+            return {
+                'severity': 'medium',
+                'difference': f"Parameter count mismatch: docs has {len(docs_params)}, code has {len(code_params)}",
+                'suggestion': f"Documentation shows {len(docs_params)} parameters, but code has {len(code_params)}"
+            }
+
+        # Compare parameter names and types
+        for i, (doc_param, code_param) in enumerate(zip(docs_params, code_params)):
+            doc_name = doc_param.get('name', '')
+            code_name = code_param.get('name', '')
+
+            # Parameter name mismatch
+            if doc_name != code_name:
+                # Use fuzzy matching for slight variations
+                similarity = SequenceMatcher(None, doc_name, code_name).ratio()
+                if similarity < 0.8:  # Not similar enough
+                    return {
+                        'severity': 'medium',
+                        'difference': f"Parameter {i+1} name mismatch: '{doc_name}' in docs vs '{code_name}' in code",
+                        'suggestion': f"Update documentation to use parameter name '{code_name}'"
+                    }
+
+            # Type mismatch
+            doc_type = doc_param.get('type')
+            code_type = code_param.get('type_hint')
+
+            if doc_type and code_type and doc_type != code_type:
+                return {
+                    'severity': 'low',
+                    'difference': f"Parameter '{doc_name}' type mismatch: '{doc_type}' in docs vs '{code_type}' in code",
+                    'suggestion': f"Verify correct type for parameter '{doc_name}'"
+                }
+
+        # Compare return types if both have them
+        docs_return = docs_info.get('return_type')
+        code_return = code_info.get('return_type')
+
+        if docs_return and code_return and docs_return != code_return:
+            return {
+                'severity': 'low',
+                'difference': f"Return type mismatch: '{docs_return}' in docs vs '{code_return}' in code",
+                'suggestion': "Verify correct return type"
+            }
+
+        return None
+
+    def generate_summary(self, conflicts: List[Conflict]) -> Dict[str, Any]:
+        """
+        Generate summary statistics for conflicts.
+
+        Args:
+            conflicts: List of Conflict objects
+
+        Returns:
+            Summary dict with statistics
+        """
+        summary = {
+            'total': len(conflicts),
+            'by_type': {},
+            'by_severity': {},
+            'apis_affected': len(set(c.api_name for c in conflicts))
+        }
+
+        # Count by type
+        for conflict_type in ['missing_in_docs', 'missing_in_code', 'signature_mismatch', 'description_mismatch']:
+            count = sum(1 for c in conflicts if c.type == conflict_type)
+            summary['by_type'][conflict_type] = count
+
+        # Count by severity
+        for severity in ['low', 'medium', 'high']:
+            count = sum(1 for c in conflicts if c.severity == severity)
+            summary['by_severity'][severity] = count
+
+        return summary
+
+    def save_conflicts(self, conflicts: List[Conflict], output_path: str):
+        """
+        Save conflicts to JSON file.
+
+        Args:
+            conflicts: List of Conflict objects
+            output_path: Path to output JSON file
+        """
+        data = {
+            'conflicts': [asdict(c) for c in conflicts],
+            'summary': self.generate_summary(conflicts)
+        }
+
+        with open(output_path, 'w', encoding='utf-8') as f:
+            json.dump(data, f, indent=2, ensure_ascii=False)
+
+        logger.info(f"Conflicts saved to: {output_path}")
+
+
+if __name__ == '__main__':
+    import sys
+
+    if len(sys.argv) < 3:
+        print("Usage: python conflict_detector.py <docs_data.json> <github_data.json>")
+        sys.exit(1)
+
+    docs_file = sys.argv[1]
+    github_file = sys.argv[2]
+
+    # Load data
+    with open(docs_file, 'r') as f:
+        docs_data = json.load(f)
+
+    with open(github_file, 'r') as f:
+        github_data = json.load(f)
+
+    # Detect conflicts
+    detector = ConflictDetector(docs_data, github_data)
+    conflicts = detector.detect_all_conflicts()
+
+    # Print summary
+    summary = detector.generate_summary(conflicts)
+    print("\n📊 Conflict Summary:")
+    print(f"   Total conflicts: {summary['total']}")
+    print(f"   APIs affected: {summary['apis_affected']}")
+    print("\n   By Type:")
+    for conflict_type, count in summary['by_type'].items():
+        if count > 0:
+            print(f"     {conflict_type}: {count}")
+    print("\n   By Severity:")
+    for severity, count in summary['by_severity'].items():
+        if count > 0:
+            emoji = '🔴' if severity == 'high' else '🟡' if severity == 'medium' else '🟢'
+            print(f"     {emoji} {severity}: {count}")
+
+    # Save to file
+    output_file = 'conflicts.json'
+    detector.save_conflicts(conflicts, output_file)
+    print(f"\n✅ Full report saved to: {output_file}")

+ 72 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/constants.py

@@ -0,0 +1,72 @@
+"""Configuration constants for Skill Seekers CLI.
+
+This module centralizes all magic numbers and configuration values used
+across the CLI tools to improve maintainability and clarity.
+"""
+
+# ===== SCRAPING CONFIGURATION =====
+
+# Default scraping limits
+DEFAULT_RATE_LIMIT = 0.5  # seconds between requests
+DEFAULT_MAX_PAGES = 500   # maximum pages to scrape
+DEFAULT_CHECKPOINT_INTERVAL = 1000  # pages between checkpoints
+DEFAULT_ASYNC_MODE = False  # use async mode for parallel scraping (opt-in)
+
+# Content analysis limits
+CONTENT_PREVIEW_LENGTH = 500  # characters to check for categorization
+MAX_PAGES_WARNING_THRESHOLD = 10000  # warn if config exceeds this
+
+# Quality thresholds
+MIN_CATEGORIZATION_SCORE = 2  # minimum score for category assignment
+URL_MATCH_POINTS = 3  # points for URL keyword match
+TITLE_MATCH_POINTS = 2  # points for title keyword match
+CONTENT_MATCH_POINTS = 1  # points for content keyword match
+
+# ===== ENHANCEMENT CONFIGURATION =====
+
+# API-based enhancement limits (uses Anthropic API)
+API_CONTENT_LIMIT = 100000  # max characters for API enhancement
+API_PREVIEW_LIMIT = 40000   # max characters for preview
+
+# Local enhancement limits (uses Claude Code Max)
+LOCAL_CONTENT_LIMIT = 50000  # max characters for local enhancement
+LOCAL_PREVIEW_LIMIT = 20000  # max characters for preview
+
+# ===== PAGE ESTIMATION =====
+
+# Estimation and discovery settings
+DEFAULT_MAX_DISCOVERY = 1000  # default max pages to discover
+DISCOVERY_THRESHOLD = 10000   # threshold for warnings
+
+# ===== FILE LIMITS =====
+
+# Output and processing limits
+MAX_REFERENCE_FILES = 100  # maximum reference files per skill
+MAX_CODE_BLOCKS_PER_PAGE = 5  # maximum code blocks to extract per page
+
+# ===== EXPORT CONSTANTS =====
+
+__all__ = [
+    # Scraping
+    'DEFAULT_RATE_LIMIT',
+    'DEFAULT_MAX_PAGES',
+    'DEFAULT_CHECKPOINT_INTERVAL',
+    'DEFAULT_ASYNC_MODE',
+    'CONTENT_PREVIEW_LENGTH',
+    'MAX_PAGES_WARNING_THRESHOLD',
+    'MIN_CATEGORIZATION_SCORE',
+    'URL_MATCH_POINTS',
+    'TITLE_MATCH_POINTS',
+    'CONTENT_MATCH_POINTS',
+    # Enhancement
+    'API_CONTENT_LIMIT',
+    'API_PREVIEW_LIMIT',
+    'LOCAL_CONTENT_LIMIT',
+    'LOCAL_PREVIEW_LIMIT',
+    # Estimation
+    'DEFAULT_MAX_DISCOVERY',
+    'DISCOVERY_THRESHOLD',
+    # Limits
+    'MAX_REFERENCE_FILES',
+    'MAX_CODE_BLOCKS_PER_PAGE',
+]
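The three *_MATCH_POINTS constants mirror the 3/2/1 weights hard-coded in smart_categorize() in doc_scraper.py later in this diff. A minimal sketch of the intended scoring (the keyword list and page values below are illustrative assumptions, not shipped code):

from skill_seekers.cli.constants import (
    URL_MATCH_POINTS,
    TITLE_MATCH_POINTS,
    CONTENT_MATCH_POINTS,
    MIN_CATEGORIZATION_SCORE,
)

def category_score(keywords: list[str], url: str, title: str, content: str) -> int:
    """Sum keyword hits across URL, title, and content preview."""
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in url:
            score += URL_MATCH_POINTS      # URL segments are the strongest signal
        if keyword in title:
            score += TITLE_MATCH_POINTS
        if keyword in content:
            score += CONTENT_MATCH_POINTS
    return score

# A page joins a category once its score reaches MIN_CATEGORIZATION_SCORE (2)
score = category_score(["tutorial", "guide"], "/docs/tutorial/intro", "intro to widgets", "")
assert score >= MIN_CATEGORIZATION_SCORE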

+ 1822 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/doc_scraper.py

@@ -0,0 +1,1822 @@
+#!/usr/bin/env python3
+"""
+Documentation to Claude Skill Converter
+Single tool to scrape any documentation and create high-quality Claude skills.
+
+Usage:
+    skill-seekers scrape --interactive
+    skill-seekers scrape --config configs/godot.json
+    skill-seekers scrape --url https://react.dev/ --name react
+"""
+
+import os
+import sys
+import json
+import time
+import re
+import argparse
+import hashlib
+import logging
+import asyncio
+import requests
+import httpx
+from pathlib import Path
+from urllib.parse import urljoin, urlparse
+from bs4 import BeautifulSoup
+from collections import deque, defaultdict
+from typing import Optional, Dict, List, Tuple, Set, Deque, Any
+
+# Add the src/ directory to sys.path so the skill_seekers.* imports below resolve when run as a script
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+
+from skill_seekers.cli.llms_txt_detector import LlmsTxtDetector
+from skill_seekers.cli.llms_txt_parser import LlmsTxtParser
+from skill_seekers.cli.llms_txt_downloader import LlmsTxtDownloader
+from skill_seekers.cli.constants import (
+    DEFAULT_RATE_LIMIT,
+    DEFAULT_MAX_PAGES,
+    DEFAULT_CHECKPOINT_INTERVAL,
+    DEFAULT_ASYNC_MODE,
+    CONTENT_PREVIEW_LENGTH,
+    MAX_PAGES_WARNING_THRESHOLD,
+    MIN_CATEGORIZATION_SCORE
+)
+
+# Configure logging
+logger = logging.getLogger(__name__)
+
+
+def setup_logging(verbose: bool = False, quiet: bool = False) -> None:
+    """Configure logging based on verbosity level.
+
+    Args:
+        verbose: Enable DEBUG level logging
+        quiet: Enable WARNING level logging only
+    """
+    if quiet:
+        level = logging.WARNING
+    elif verbose:
+        level = logging.DEBUG
+    else:
+        level = logging.INFO
+
+    logging.basicConfig(
+        level=level,
+        format='%(message)s',
+        force=True
+    )
+
+
+class DocToSkillConverter:
+    def __init__(self, config: Dict[str, Any], dry_run: bool = False, resume: bool = False) -> None:
+        self.config = config
+        self.name = config['name']
+        self.base_url = config['base_url']
+        self.dry_run = dry_run
+        self.resume = resume
+
+        # Paths
+        self.data_dir = f"output/{self.name}_data"
+        self.skill_dir = f"output/{self.name}"
+        self.checkpoint_file = f"{self.data_dir}/checkpoint.json"
+
+        # Checkpoint config
+        checkpoint_config = config.get('checkpoint', {})
+        self.checkpoint_enabled = checkpoint_config.get('enabled', False)
+        self.checkpoint_interval = checkpoint_config.get('interval', DEFAULT_CHECKPOINT_INTERVAL)
+
+        # llms.txt detection state
+        skip_llms_txt_value = config.get('skip_llms_txt', False)
+        if not isinstance(skip_llms_txt_value, bool):
+            logger.warning(
+                "Invalid value for 'skip_llms_txt': %r (expected bool). Defaulting to False.",
+                skip_llms_txt_value
+            )
+            self.skip_llms_txt = False
+        else:
+            self.skip_llms_txt = skip_llms_txt_value
+        self.llms_txt_detected = False
+        self.llms_txt_variant = None
+        self.llms_txt_variants: List[str] = []  # Track all downloaded variants
+
+        # Parallel scraping config
+        self.workers = config.get('workers', 1)
+        self.async_mode = config.get('async_mode', DEFAULT_ASYNC_MODE)
+
+        # State
+        self.visited_urls: set[str] = set()
+        # Support multiple starting URLs
+        start_urls = config.get('start_urls', [self.base_url])
+        self.pending_urls = deque(start_urls)
+        self.pages: List[Dict[str, Any]] = []
+        self.pages_scraped = 0
+
+        # Thread-safe lock for parallel scraping
+        if self.workers > 1:
+            import threading
+            self.lock = threading.Lock()
+
+        # Create directories (unless dry-run)
+        if not dry_run:
+            os.makedirs(f"{self.data_dir}/pages", exist_ok=True)
+            os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
+            os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
+            os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
+
+        # Load checkpoint if resuming
+        if resume and not dry_run:
+            self.load_checkpoint()
+    
+    def is_valid_url(self, url: str) -> bool:
+        """Check if URL should be scraped based on patterns.
+
+        Args:
+            url (str): URL to validate
+
+        Returns:
+            bool: True if URL matches include patterns and doesn't match exclude patterns
+        """
+        if not url.startswith(self.base_url):
+            return False
+
+        # Include patterns
+        includes = self.config.get('url_patterns', {}).get('include', [])
+        if includes and not any(pattern in url for pattern in includes):
+            return False
+
+        # Exclude patterns
+        excludes = self.config.get('url_patterns', {}).get('exclude', [])
+        if any(pattern in url for pattern in excludes):
+            return False
+
+        return True
+
+    def save_checkpoint(self) -> None:
+        """Save progress checkpoint"""
+        if not self.checkpoint_enabled or self.dry_run:
+            return
+
+        checkpoint_data = {
+            "config": self.config,
+            "visited_urls": list(self.visited_urls),
+            "pending_urls": list(self.pending_urls),
+            "pages_scraped": self.pages_scraped,
+            "last_updated": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+            "checkpoint_interval": self.checkpoint_interval
+        }
+
+        try:
+            with open(self.checkpoint_file, 'w') as f:
+                json.dump(checkpoint_data, f, indent=2)
+            logger.info("  💾 Checkpoint saved (%d pages)", self.pages_scraped)
+        except Exception as e:
+            logger.warning("  ⚠️  Failed to save checkpoint: %s", e)
+
+    def load_checkpoint(self) -> None:
+        """Load progress from checkpoint"""
+        if not os.path.exists(self.checkpoint_file):
+            logger.info("ℹ️  No checkpoint found, starting fresh")
+            return
+
+        try:
+            with open(self.checkpoint_file, 'r') as f:
+                checkpoint_data = json.load(f)
+
+            self.visited_urls = set(checkpoint_data["visited_urls"])
+            self.pending_urls = deque(checkpoint_data["pending_urls"])
+            self.pages_scraped = checkpoint_data["pages_scraped"]
+
+            logger.info("✅ Resumed from checkpoint")
+            logger.info("   Pages already scraped: %d", self.pages_scraped)
+            logger.info("   URLs visited: %d", len(self.visited_urls))
+            logger.info("   URLs pending: %d", len(self.pending_urls))
+            logger.info("   Last updated: %s", checkpoint_data['last_updated'])
+            logger.info("")
+
+        except Exception as e:
+            logger.warning("⚠️  Failed to load checkpoint: %s", e)
+            logger.info("   Starting fresh")
+
+    def clear_checkpoint(self) -> None:
+        """Remove checkpoint file"""
+        if os.path.exists(self.checkpoint_file):
+            try:
+                os.remove(self.checkpoint_file)
+                logger.info("✅ Checkpoint cleared")
+            except Exception as e:
+                logger.warning("⚠️  Failed to clear checkpoint: %s", e)
+
+    def extract_content(self, soup: Any, url: str) -> Dict[str, Any]:
+        """Extract content with improved code and pattern detection"""
+        page = {
+            'url': url,
+            'title': '',
+            'content': '',
+            'headings': [],
+            'code_samples': [],
+            'patterns': [],  # NEW: Extract common patterns
+            'links': []
+        }
+        
+        selectors = self.config.get('selectors', {})
+        
+        # Extract title
+        title_elem = soup.select_one(selectors.get('title', 'title'))
+        if title_elem:
+            page['title'] = self.clean_text(title_elem.get_text())
+        
+        # Find main content
+        main_selector = selectors.get('main_content', 'div[role="main"]')
+        main = soup.select_one(main_selector)
+        
+        if not main:
+            logger.warning("⚠ No content: %s", url)
+            return page
+        
+        # Extract headings with better structure
+        for h in main.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
+            text = self.clean_text(h.get_text())
+            if text:
+                page['headings'].append({
+                    'level': h.name,
+                    'text': text,
+                    'id': h.get('id', '')
+                })
+        
+        # Extract code with language detection
+        code_selector = selectors.get('code_blocks', 'pre code')
+        for code_elem in main.select(code_selector):
+            code = code_elem.get_text()
+            if len(code.strip()) > 10:
+                # Try to detect language
+                lang = self.detect_language(code_elem, code)
+                page['code_samples'].append({
+                    'code': code.strip(),
+                    'language': lang
+                })
+        
+        # Extract patterns (NEW: common code patterns)
+        page['patterns'] = self.extract_patterns(main, page['code_samples'])
+        
+        # Extract paragraphs
+        paragraphs = []
+        for p in main.find_all('p'):
+            text = self.clean_text(p.get_text())
+            if text and len(text) > 20:  # Skip very short paragraphs
+                paragraphs.append(text)
+        
+        page['content'] = '\n\n'.join(paragraphs)
+
+        # Extract links from entire page (not just main content)
+        # This allows discovery of navigation links outside the main content area
+        for link in soup.find_all('a', href=True):
+            href = urljoin(url, link['href'])
+            # Strip anchor fragments to avoid treating #anchors as separate pages
+            href = href.split('#')[0]
+            if self.is_valid_url(href) and href not in page['links']:
+                page['links'].append(href)
+
+        return page
+
+    def _extract_language_from_classes(self, classes):
+        """Extract language from class list
+
+        Supports multiple patterns:
+        - language-{lang} (e.g., "language-python")
+        - lang-{lang} (e.g., "lang-javascript")
+        - brush: {lang} (e.g., "brush: java")
+        - bare language name (e.g., "python", "java")
+
+        Returns:
+            str or None: Detected language name, or None if no class matches.
+        """
+        # Define common programming languages
+        known_languages = [
+            "javascript", "java", "xml", "html", "python", "bash", "cpp", "typescript",
+            "go", "rust", "php", "ruby", "swift", "kotlin", "csharp", "c", "sql",
+            "yaml", "json", "markdown", "css", "scss", "sass", "jsx", "tsx", "vue",
+            "shell", "powershell", "r", "scala", "dart", "perl", "lua", "elixir"
+        ]
+
+        for cls in classes:
+            # Clean special characters (except word chars and hyphens)
+            cls = re.sub(r'[^\w-]', '', cls)
+
+            if 'language-' in cls:
+                return cls.replace('language-', '')
+
+            if 'lang-' in cls:
+                return cls.replace('lang-', '')
+
+            # Check for brush: pattern (e.g., "brush: java")
+            if 'brush' in cls.lower():
+                lang = cls.lower().replace('brush', '').strip()
+                if lang in known_languages:
+                    return lang
+
+            # Check for bare language name
+            if cls in known_languages:
+                return cls
+
+        return None
+
+    def detect_language(self, elem, code):
+        """Detect programming language from code block"""
+
+        # Check element classes
+        lang = self._extract_language_from_classes(elem.get('class', []))
+        if lang:
+            return lang
+
+        # Check parent pre element
+        parent = elem.parent
+        if parent and parent.name == 'pre':
+            lang = self._extract_language_from_classes(parent.get('class', []))
+            if lang:
+                return lang
+
+        # Heuristic detection
+        if 'import ' in code and 'from ' in code:
+            return 'python'
+        if 'const ' in code or 'let ' in code or '=>' in code:
+            return 'javascript'
+        if 'func ' in code and 'var ' in code:
+            return 'gdscript'
+        if 'def ' in code and ':' in code:
+            return 'python'
+        if '#include' in code or 'int main' in code:
+            return 'cpp'
+        # C# detection
+        if 'using System' in code or 'namespace ' in code:
+            return 'csharp'
+        if '{ get; set; }' in code:
+            return 'csharp'
+        if any(keyword in code for keyword in ['public class ', 'private class ', 'internal class ', 'public static void ']):
+            return 'csharp'
+
+        return 'unknown'
+    
+    def extract_patterns(self, main: Any, code_samples: List[Dict[str, Any]]) -> List[Dict[str, str]]:
+        """Extract common coding patterns (NEW FEATURE)"""
+        patterns = []
+        
+        # Look for "Example:" or "Pattern:" sections
+        for elem in main.find_all(['p', 'div']):
+            text = elem.get_text().lower()
+            if any(word in text for word in ['example:', 'pattern:', 'usage:', 'typical use']):
+                # Get the code that follows
+                next_code = elem.find_next(['pre', 'code'])
+                if next_code:
+                    patterns.append({
+                        'description': self.clean_text(elem.get_text()),
+                        'code': next_code.get_text().strip()
+                    })
+        
+        return patterns[:5]  # Limit to 5 most relevant patterns
+    
+    def clean_text(self, text: str) -> str:
+        """Clean text content"""
+        text = re.sub(r'\s+', ' ', text)
+        return text.strip()
+    
+    def save_page(self, page: Dict[str, Any]) -> None:
+        """Save page data"""
+        url_hash = hashlib.md5(page['url'].encode()).hexdigest()[:10]
+        safe_title = re.sub(r'[^\w\s-]', '', page['title'])[:50]
+        safe_title = re.sub(r'[-\s]+', '_', safe_title)
+        
+        filename = f"{safe_title}_{url_hash}.json"
+        filepath = os.path.join(self.data_dir, "pages", filename)
+        
+        with open(filepath, 'w', encoding='utf-8') as f:
+            json.dump(page, f, indent=2, ensure_ascii=False)
+    
+    def scrape_page(self, url: str) -> None:
+        """Scrape a single page with thread-safe operations.
+
+        Args:
+            url (str): URL to scrape
+
+        Returns:
+            None: The page is saved to disk and appended to self.pages;
+            failures are logged and skipped.
+
+        Note:
+            Uses threading locks when workers > 1 for thread safety
+        """
+        try:
+            # Scraping part (no lock needed - independent)
+            headers = {'User-Agent': 'Mozilla/5.0 (Documentation Scraper)'}
+            response = requests.get(url, headers=headers, timeout=30)
+            response.raise_for_status()
+
+            soup = BeautifulSoup(response.content, 'html.parser')
+            page = self.extract_content(soup, url)
+
+            # Thread-safe operations (lock required)
+            if self.workers > 1:
+                with self.lock:
+                    logger.info("  %s", url)
+                    self.save_page(page)
+                    self.pages.append(page)
+
+                    # Add new URLs
+                    for link in page['links']:
+                        if link not in self.visited_urls and link not in self.pending_urls:
+                            self.pending_urls.append(link)
+            else:
+                # Single-threaded mode (no lock needed)
+                logger.info("  %s", url)
+                self.save_page(page)
+                self.pages.append(page)
+
+                # Add new URLs
+                for link in page['links']:
+                    if link not in self.visited_urls and link not in self.pending_urls:
+                        self.pending_urls.append(link)
+
+            # Rate limiting
+            rate_limit = self.config.get('rate_limit', DEFAULT_RATE_LIMIT)
+            if rate_limit > 0:
+                time.sleep(rate_limit)
+
+        except Exception as e:
+            if self.workers > 1:
+                with self.lock:
+                    logger.error("  ✗ Error scraping %s: %s: %s", url, type(e).__name__, e)
+            else:
+                logger.error("  ✗ Error scraping page: %s: %s", type(e).__name__, e)
+                logger.error("     URL: %s", url)
+
+    async def scrape_page_async(self, url: str, semaphore: asyncio.Semaphore, client: httpx.AsyncClient) -> None:
+        """Scrape a single page asynchronously.
+
+        Args:
+            url: URL to scrape
+            semaphore: Asyncio semaphore for concurrency control
+            client: Shared httpx AsyncClient for connection pooling
+
+        Note:
+            Uses asyncio.Lock for async-safe operations instead of threading.Lock
+        """
+        async with semaphore:  # Limit concurrent requests
+            try:
+                # Async HTTP request
+                headers = {'User-Agent': 'Mozilla/5.0 (Documentation Scraper)'}
+                response = await client.get(url, headers=headers, timeout=30.0)
+                response.raise_for_status()
+
+                # BeautifulSoup parsing (still synchronous, but fast)
+                soup = BeautifulSoup(response.content, 'html.parser')
+                page = self.extract_content(soup, url)
+
+                # Async-safe operations (no lock needed - single event loop)
+                logger.info("  %s", url)
+                self.save_page(page)
+                self.pages.append(page)
+
+                # Add new URLs
+                for link in page['links']:
+                    if link not in self.visited_urls and link not in self.pending_urls:
+                        self.pending_urls.append(link)
+
+                # Rate limiting
+                rate_limit = self.config.get('rate_limit', DEFAULT_RATE_LIMIT)
+                if rate_limit > 0:
+                    await asyncio.sleep(rate_limit)
+
+            except Exception as e:
+                logger.error("  ✗ Error scraping %s: %s: %s", url, type(e).__name__, e)
+
+    def _try_llms_txt(self) -> bool:
+        """
+        Try to use llms.txt instead of HTML scraping.
+        Downloads ALL available variants and stores with .md extension.
+
+        Returns:
+            True if llms.txt was found and processed successfully
+        """
+        logger.info("\n🔍 Checking for llms.txt at %s...", self.base_url)
+
+        # Check for explicit config URL first
+        explicit_url = self.config.get('llms_txt_url')
+        if explicit_url:
+            logger.info("\n📌 Using explicit llms_txt_url from config: %s", explicit_url)
+
+            # Download explicit file first
+            downloader = LlmsTxtDownloader(explicit_url)
+            content = downloader.download()
+
+            if content:
+                # Save explicit file with proper .md extension
+                filename = downloader.get_proper_filename()
+                filepath = os.path.join(self.skill_dir, "references", filename)
+                os.makedirs(os.path.dirname(filepath), exist_ok=True)
+
+                with open(filepath, 'w', encoding='utf-8') as f:
+                    f.write(content)
+                logger.info("  💾 Saved %s (%d chars)", filename, len(content))
+
+                # Also try to detect and download ALL other variants
+                detector = LlmsTxtDetector(self.base_url)
+                variants = detector.detect_all()
+
+                if variants:
+                    logger.info("\n🔍 Found %d total variant(s), downloading remaining...", len(variants))
+                    for variant_info in variants:
+                        url = variant_info['url']
+                        variant = variant_info['variant']
+
+                        # Skip the explicit one we already downloaded
+                        if url == explicit_url:
+                            continue
+
+                        logger.info("  📥 Downloading %s...", variant)
+                        extra_downloader = LlmsTxtDownloader(url)
+                        extra_content = extra_downloader.download()
+
+                        if extra_content:
+                            extra_filename = extra_downloader.get_proper_filename()
+                            extra_filepath = os.path.join(self.skill_dir, "references", extra_filename)
+                            with open(extra_filepath, 'w', encoding='utf-8') as f:
+                                f.write(extra_content)
+                            logger.info("     ✓ %s (%d chars)", extra_filename, len(extra_content))
+
+                # Parse explicit file for skill building
+                parser = LlmsTxtParser(content)
+                pages = parser.parse()
+
+                if pages:
+                    for page in pages:
+                        self.save_page(page)
+                        self.pages.append(page)
+
+                    self.llms_txt_detected = True
+                    self.llms_txt_variant = 'explicit'
+                    return True
+
+        # Auto-detection: Find ALL variants
+        detector = LlmsTxtDetector(self.base_url)
+        variants = detector.detect_all()
+
+        if not variants:
+            logger.info("ℹ️  No llms.txt found, using HTML scraping")
+            return False
+
+        logger.info("✅ Found %d llms.txt variant(s)", len(variants))
+
+        # Download ALL variants
+        downloaded = {}
+        for variant_info in variants:
+            url = variant_info['url']
+            variant = variant_info['variant']
+
+            logger.info("  📥 Downloading %s...", variant)
+            downloader = LlmsTxtDownloader(url)
+            content = downloader.download()
+
+            if content:
+                filename = downloader.get_proper_filename()
+                downloaded[variant] = {
+                    'content': content,
+                    'filename': filename,
+                    'size': len(content)
+                }
+                logger.info("     ✓ %s (%d chars)", filename, len(content))
+
+        if not downloaded:
+            logger.warning("⚠️  Failed to download any variants, falling back to HTML scraping")
+            return False
+
+        # Save ALL variants to references/
+        os.makedirs(os.path.join(self.skill_dir, "references"), exist_ok=True)
+
+        for variant, data in downloaded.items():
+            filepath = os.path.join(self.skill_dir, "references", data['filename'])
+            with open(filepath, 'w', encoding='utf-8') as f:
+                f.write(data['content'])
+            logger.info("  💾 Saved %s", data['filename'])
+
+        # Parse LARGEST variant for skill building
+        largest = max(downloaded.items(), key=lambda x: x[1]['size'])
+        logger.info("\n📄 Parsing %s for skill building...", largest[1]['filename'])
+
+        parser = LlmsTxtParser(largest[1]['content'])
+        pages = parser.parse()
+
+        if not pages:
+            logger.warning("⚠️  Failed to parse llms.txt, falling back to HTML scraping")
+            return False
+
+        logger.info("  ✓ Parsed %d sections", len(pages))
+
+        # Save pages for skill building
+        for page in pages:
+            self.save_page(page)
+            self.pages.append(page)
+
+        self.llms_txt_detected = True
+        self.llms_txt_variant = largest[0]  # variant actually parsed for skill building
+        self.llms_txt_variants = list(downloaded.keys())
+
+        return True
+
+    def scrape_all(self) -> None:
+        """Scrape all pages (supports llms.txt and HTML scraping)
+
+        Routes to async version if async_mode is enabled in config.
+        """
+        # Route to async version if enabled
+        if self.async_mode:
+            asyncio.run(self.scrape_all_async())
+            return
+
+        # Try llms.txt first (unless dry-run or explicitly disabled)
+        if not self.dry_run and not self.skip_llms_txt:
+            llms_result = self._try_llms_txt()
+            if llms_result:
+                logger.info("\n✅ Used llms.txt (%s) - skipping HTML scraping", self.llms_txt_variant)
+                self.save_summary()
+                return
+
+        # HTML scraping (sync/thread-based logic)
+        logger.info("\n" + "=" * 60)
+        if self.dry_run:
+            logger.info("DRY RUN: %s", self.name)
+        else:
+            logger.info("SCRAPING: %s", self.name)
+        logger.info("=" * 60)
+        logger.info("Base URL: %s", self.base_url)
+
+        if self.dry_run:
+            logger.info("Mode: Preview only (no actual scraping)\n")
+        else:
+            logger.info("Output: %s", self.data_dir)
+            if self.workers > 1:
+                logger.info("Workers: %d parallel threads", self.workers)
+            logger.info("")
+
+        max_pages = self.config.get('max_pages', DEFAULT_MAX_PAGES)
+
+        # Handle unlimited mode
+        if max_pages is None or max_pages == -1:
+            logger.warning("⚠️  UNLIMITED MODE: No page limit (will scrape all pages)\n")
+            unlimited = True
+        else:
+            unlimited = False
+
+        # Dry run: preview first 20 URLs
+        preview_limit = 20 if self.dry_run else max_pages
+
+        # Single-threaded mode (original sequential logic)
+        if self.workers <= 1:
+            while self.pending_urls and (unlimited or len(self.visited_urls) < preview_limit):
+                url = self.pending_urls.popleft()
+
+                if url in self.visited_urls:
+                    continue
+
+                self.visited_urls.add(url)
+
+                if self.dry_run:
+                    # Just show what would be scraped
+                    logger.info("  [Preview] %s", url)
+                    try:
+                        headers = {'User-Agent': 'Mozilla/5.0 (Documentation Scraper - Dry Run)'}
+                        response = requests.get(url, headers=headers, timeout=10)
+                        soup = BeautifulSoup(response.content, 'html.parser')
+
+                        main_selector = self.config.get('selectors', {}).get('main_content', 'div[role="main"]')
+                        main = soup.select_one(main_selector)
+
+                        if main:
+                            for link in main.find_all('a', href=True):
+                                href = urljoin(url, link['href'])
+                                if self.is_valid_url(href) and href not in self.visited_urls:
+                                    self.pending_urls.append(href)
+                    except Exception as e:
+                        # Failed to extract links in fast mode, continue anyway
+                        logger.warning("⚠️  Warning: Could not extract links from %s: %s", url, e)
+                else:
+                    self.scrape_page(url)
+                    self.pages_scraped += 1
+
+                    if self.checkpoint_enabled and self.pages_scraped % self.checkpoint_interval == 0:
+                        self.save_checkpoint()
+
+                if len(self.visited_urls) % 10 == 0:
+                    logger.info("  [%d pages]", len(self.visited_urls))
+
+        # Multi-threaded mode (parallel scraping)
+        else:
+            from concurrent.futures import ThreadPoolExecutor, as_completed
+
+            logger.info("🚀 Starting parallel scraping with %d workers\n", self.workers)
+
+            with ThreadPoolExecutor(max_workers=self.workers) as executor:
+                futures = []
+
+                while self.pending_urls and (unlimited or len(self.visited_urls) < preview_limit):
+                    # Get next batch of URLs (thread-safe)
+                    batch = []
+                    batch_size = min(self.workers * 2, len(self.pending_urls))
+
+                    with self.lock:
+                        for _ in range(batch_size):
+                            if not self.pending_urls:
+                                break
+                            url = self.pending_urls.popleft()
+
+                            if url not in self.visited_urls:
+                                self.visited_urls.add(url)
+                                batch.append(url)
+
+                    # Submit batch to executor
+                    for url in batch:
+                        if unlimited or len(self.visited_urls) <= preview_limit:
+                            future = executor.submit(self.scrape_page, url)
+                            futures.append(future)
+
+                    # Wait for some to complete before submitting more
+                    completed = 0
+                    for future in as_completed(futures[:batch_size]):
+                        # Check for exceptions
+                        try:
+                            future.result()  # Raises exception if scrape_page failed
+                        except Exception as e:
+                            with self.lock:
+                                logger.warning("  ⚠️  Worker exception: %s", e)
+
+                        completed += 1
+
+                        with self.lock:
+                            self.pages_scraped += 1
+
+                            if self.checkpoint_enabled and self.pages_scraped % self.checkpoint_interval == 0:
+                                self.save_checkpoint()
+
+                            if self.pages_scraped % 10 == 0:
+                                logger.info("  [%d pages scraped]", self.pages_scraped)
+
+                    # Remove completed futures
+                    futures = [f for f in futures if not f.done()]
+
+                # Wait for remaining futures
+                for future in as_completed(futures):
+                    # Check for exceptions
+                    try:
+                        future.result()
+                    except Exception as e:
+                        with self.lock:
+                            logger.warning("  ⚠️  Worker exception: %s", e)
+
+                    with self.lock:
+                        self.pages_scraped += 1
+
+        if self.dry_run:
+            logger.info("\n✅ Dry run complete: would scrape ~%d pages", len(self.visited_urls))
+            if len(self.visited_urls) >= preview_limit:
+                logger.info("   (showing first %d, actual scraping may find more)", preview_limit)
+            logger.info("\n💡 To actually scrape, run without --dry-run")
+        else:
+            logger.info("\n✅ Scraped %d pages", len(self.visited_urls))
+            self.save_summary()
+
+    async def scrape_all_async(self) -> None:
+        """Scrape all pages asynchronously (async/await version).
+
+        This method provides significantly better performance for parallel scraping
+        compared to thread-based scraping, with lower memory overhead and better
+        CPU utilization.
+
+        Performance: ~2-3x faster than sync mode with same worker count.
+        """
+        # Try llms.txt first (unless dry-run or explicitly disabled)
+        if not self.dry_run and not self.skip_llms_txt:
+            llms_result = self._try_llms_txt()
+            if llms_result:
+                logger.info("\n✅ Used llms.txt (%s) - skipping HTML scraping", self.llms_txt_variant)
+                self.save_summary()
+                return
+
+        # HTML scraping (async version)
+        logger.info("\n" + "=" * 60)
+        if self.dry_run:
+            logger.info("DRY RUN (ASYNC): %s", self.name)
+        else:
+            logger.info("SCRAPING (ASYNC): %s", self.name)
+        logger.info("=" * 60)
+        logger.info("Base URL: %s", self.base_url)
+
+        if self.dry_run:
+            logger.info("Mode: Preview only (no actual scraping)\n")
+        else:
+            logger.info("Output: %s", self.data_dir)
+            logger.info("Workers: %d concurrent tasks (async)", self.workers)
+            logger.info("")
+
+        max_pages = self.config.get('max_pages', DEFAULT_MAX_PAGES)
+
+        # Handle unlimited mode
+        if max_pages is None or max_pages == -1:
+            logger.warning("⚠️  UNLIMITED MODE: No page limit (will scrape all pages)\n")
+            unlimited = True
+            preview_limit = float('inf')
+        else:
+            unlimited = False
+            preview_limit = 20 if self.dry_run else max_pages
+
+        # Create semaphore for concurrency control
+        semaphore = asyncio.Semaphore(self.workers)
+
+        # Create shared HTTP client with connection pooling
+        async with httpx.AsyncClient(
+            timeout=30.0,
+            limits=httpx.Limits(max_connections=self.workers * 2)
+        ) as client:
+            tasks = []
+
+            while self.pending_urls and (unlimited or len(self.visited_urls) < preview_limit):
+                # Get next batch of URLs
+                batch = []
+                batch_size = min(self.workers * 2, len(self.pending_urls))
+
+                for _ in range(batch_size):
+                    if not self.pending_urls:
+                        break
+                    url = self.pending_urls.popleft()
+
+                    if url not in self.visited_urls:
+                        self.visited_urls.add(url)
+                        batch.append(url)
+
+                # Create async tasks for batch
+                for url in batch:
+                    if unlimited or len(self.visited_urls) <= preview_limit:
+                        if self.dry_run:
+                            logger.info("  [Preview] %s", url)
+                        else:
+                            task = asyncio.create_task(
+                                self.scrape_page_async(url, semaphore, client)
+                            )
+                            tasks.append(task)
+
+                # Wait for batch to complete before continuing
+                if tasks:
+                    await asyncio.gather(*tasks, return_exceptions=True)
+                    tasks = []
+                    self.pages_scraped = len(self.visited_urls)
+
+                    # Progress indicator
+                    if self.pages_scraped % 10 == 0 and not self.dry_run:
+                        logger.info("  [%d pages scraped]", self.pages_scraped)
+
+                    # Checkpoint saving
+                    if not self.dry_run and self.checkpoint_enabled:
+                        if self.pages_scraped % self.checkpoint_interval == 0:
+                            self.save_checkpoint()
+
+            # Wait for any remaining tasks
+            if tasks:
+                await asyncio.gather(*tasks, return_exceptions=True)
+
+        if self.dry_run:
+            logger.info("\n✅ Dry run complete: would scrape ~%d pages", len(self.visited_urls))
+            if len(self.visited_urls) >= preview_limit:
+                logger.info("   (showing first %d, actual scraping may find more)", int(preview_limit))
+            logger.info("\n💡 To actually scrape, run without --dry-run")
+        else:
+            logger.info("\n✅ Scraped %d pages (async mode)", len(self.visited_urls))
+            self.save_summary()
+
+    def save_summary(self) -> None:
+        """Save scraping summary"""
+        summary = {
+            'name': self.name,
+            'total_pages': len(self.pages),
+            'base_url': self.base_url,
+            'llms_txt_detected': self.llms_txt_detected,
+            'llms_txt_variant': self.llms_txt_variant,
+            'pages': [{'title': p['title'], 'url': p['url']} for p in self.pages]
+        }
+
+        with open(f"{self.data_dir}/summary.json", 'w', encoding='utf-8') as f:
+            json.dump(summary, f, indent=2, ensure_ascii=False)
+    
+    def load_scraped_data(self) -> List[Dict[str, Any]]:
+        """Load previously scraped data"""
+        pages = []
+        pages_dir = Path(self.data_dir) / "pages"
+        
+        if not pages_dir.exists():
+            return []
+        
+        for json_file in pages_dir.glob("*.json"):
+            try:
+                with open(json_file, 'r', encoding='utf-8') as f:
+                    pages.append(json.load(f))
+            except Exception as e:
+                logger.error("⚠️  Error loading scraped data file %s: %s: %s", json_file, type(e).__name__, e)
+                logger.error("   Suggestion: File may be corrupted, consider re-scraping with --fresh")
+        
+        return pages
+    
+    def smart_categorize(self, pages: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
+        """Improved categorization with better pattern matching"""
+        category_defs = self.config.get('categories', {})
+        
+        # Default smart categories if none provided
+        if not category_defs:
+            category_defs = self.infer_categories(pages)
+
+        categories: Dict[str, List[Dict[str, Any]]] = {cat: [] for cat in category_defs.keys()}
+        categories['other'] = []
+        
+        for page in pages:
+            url = page['url'].lower()
+            title = page['title'].lower()
+            content = page.get('content', '').lower()[:CONTENT_PREVIEW_LENGTH]  # Check first N chars for categorization
+            
+            categorized = False
+            
+            # Match against keywords
+            for cat, keywords in category_defs.items():
+                score = 0
+                for keyword in keywords:
+                    keyword = keyword.lower()
+                    if keyword in url:
+                        score += 3
+                    if keyword in title:
+                        score += 2
+                    if keyword in content:
+                        score += 1
+                
+                if score >= MIN_CATEGORIZATION_SCORE:  # Threshold for categorization
+                    categories[cat].append(page)
+                    categorized = True
+                    break
+            
+            if not categorized:
+                categories['other'].append(page)
+        
+        # Remove empty categories
+        categories = {k: v for k, v in categories.items() if v}
+        
+        return categories
+    
+    def infer_categories(self, pages: List[Dict[str, Any]]) -> Dict[str, List[str]]:
+        """Infer categories from URL patterns (IMPROVED)"""
+        url_segments: defaultdict[str, int] = defaultdict(int)
+        
+        for page in pages:
+            path = urlparse(page['url']).path
+            segments = [s for s in path.split('/') if s and s not in ['en', 'stable', 'latest', 'docs']]
+            
+            for seg in segments:
+                url_segments[seg] += 1
+        
+        # Top segments become categories
+        top_segments = sorted(url_segments.items(), key=lambda x: x[1], reverse=True)[:8]
+        
+        categories = {}
+        for seg, count in top_segments:
+            if count >= 3:  # At least 3 pages
+                categories[seg] = [seg]
+        
+        # Add common defaults
+        if 'tutorial' not in categories and any('tutorial' in url for url in [p['url'] for p in pages]):
+            categories['tutorials'] = ['tutorial', 'guide', 'getting-started']
+        
+        if 'api' not in categories and any('api' in url or 'reference' in url for url in [p['url'] for p in pages]):
+            categories['api'] = ['api', 'reference', 'class']
+        
+        return categories
+    
+    def generate_quick_reference(self, pages: List[Dict[str, Any]]) -> List[Dict[str, str]]:
+        """Generate quick reference from common patterns (NEW FEATURE)"""
+        quick_ref = []
+        
+        # Collect all patterns
+        all_patterns = []
+        for page in pages:
+            all_patterns.extend(page.get('patterns', []))
+        
+        # Get most common code patterns
+        seen_codes = set()
+        for pattern in all_patterns:
+            code = pattern['code']
+            if code not in seen_codes and len(code) < 300:
+                quick_ref.append(pattern)
+                seen_codes.add(code)
+                if len(quick_ref) >= 15:
+                    break
+        
+        return quick_ref
+    
+    def create_reference_file(self, category: str, pages: List[Dict[str, Any]]) -> None:
+        """Create enhanced reference file"""
+        if not pages:
+            return
+        
+        lines = []
+        lines.append(f"# {self.name.title()} - {category.replace('_', ' ').title()}\n")
+        lines.append(f"**Pages:** {len(pages)}\n")
+        lines.append("---\n")
+        
+        for page in pages:
+            lines.append(f"## {page['title']}\n")
+            lines.append(f"**URL:** {page['url']}\n")
+            
+            # Table of contents from headings
+            if page.get('headings'):
+                lines.append("**Contents:**")
+                for h in page['headings'][:10]:
+                    level = int(h['level'][1]) if len(h['level']) > 1 else 1
+                    indent = "  " * max(0, level - 2)
+                    lines.append(f"{indent}- {h['text']}")
+                lines.append("")
+            
+            # Content (NO TRUNCATION)
+            if page.get('content'):
+                lines.append(page['content'])
+                lines.append("")
+
+            # Code examples with language (NO TRUNCATION)
+            if page.get('code_samples'):
+                lines.append("**Examples:**\n")
+                for i, sample in enumerate(page['code_samples'][:4], 1):
+                    # Support dict samples ({'code', 'language'}) and plain-string samples
+                    lang = sample.get('language', 'unknown') if isinstance(sample, dict) else 'unknown'
+                    code = sample.get('code', '') if isinstance(sample, dict) else str(sample)
+                    lines.append(f"Example {i} ({lang}):")
+                    lines.append(f"```{lang}")
+                    lines.append(code)  # Full code, no truncation
+                    lines.append("```\n")
+            
+            lines.append("---\n")
+        
+        filepath = os.path.join(self.skill_dir, "references", f"{category}.md")
+        with open(filepath, 'w', encoding='utf-8') as f:
+            f.write('\n'.join(lines))
+
+        logger.info("  ✓ %s.md (%d pages)", category, len(pages))
+    
+    def create_enhanced_skill_md(self, categories: Dict[str, List[Dict[str, Any]]], quick_ref: List[Dict[str, str]]) -> None:
+        """Create SKILL.md with actual examples (IMPROVED)"""
+        description = self.config.get('description', f'Comprehensive assistance with {self.name}')
+        
+        # Extract actual code examples from docs
+        example_codes = []
+        for pages in categories.values():
+            for page in pages[:3]:  # First 3 pages per category
+                for sample in page.get('code_samples', [])[:2]:  # First 2 samples per page
+                    # Support dict samples and plain-string samples
+                    code = sample.get('code', '') if isinstance(sample, dict) else str(sample)
+                    lang = sample.get('language', 'unknown') if isinstance(sample, dict) else 'unknown'
+                    if len(code) < 200 and lang != 'unknown':
+                        example_codes.append((lang, code))
+                    if len(example_codes) >= 10:
+                        break
+                if len(example_codes) >= 10:
+                    break
+            if len(example_codes) >= 10:
+                break
+        
+        content = f"""---
+name: {self.name}
+description: {description}
+---
+
+# {self.name.title()} Skill
+
+Comprehensive assistance with {self.name} development, generated from official documentation.
+
+## When to Use This Skill
+
+This skill should be triggered when:
+- Working with {self.name}
+- Asking about {self.name} features or APIs
+- Implementing {self.name} solutions
+- Debugging {self.name} code
+- Learning {self.name} best practices
+
+## Quick Reference
+
+### Common Patterns
+
+"""
+        
+        # Add actual quick reference patterns
+        if quick_ref:
+            for i, pattern in enumerate(quick_ref[:8], 1):
+                content += f"**Pattern {i}:** {pattern.get('description', 'Example pattern')}\n\n"
+                content += "```\n"
+                content += pattern.get('code', '')[:300]
+                content += "\n```\n\n"
+        else:
+            content += "*Quick reference patterns will be added as you use the skill.*\n\n"
+        
+        # Add example codes from docs
+        if example_codes:
+            content += "### Example Code Patterns\n\n"
+            for i, (lang, code) in enumerate(example_codes[:5], 1):
+                content += f"**Example {i}** ({lang}):\n```{lang}\n{code}\n```\n\n"
+        
+        content += f"""## Reference Files
+
+This skill includes comprehensive documentation in `references/`:
+
+"""
+        
+        for cat in sorted(categories.keys()):
+            content += f"- **{cat}.md** - {cat.replace('_', ' ').title()} documentation\n"
+        
+        content += """
+Use `view` to read specific reference files when detailed information is needed.
+
+## Working with This Skill
+
+### For Beginners
+Start with the getting_started or tutorials reference files for foundational concepts.
+
+### For Specific Features
+Use the appropriate category reference file (api, guides, etc.) for detailed information.
+
+### For Code Examples
+The quick reference section above contains common patterns extracted from the official docs.
+
+## Resources
+
+### references/
+Organized documentation extracted from official sources. These files contain:
+- Detailed explanations
+- Code examples with language annotations
+- Links to original documentation
+- Table of contents for quick navigation
+
+### scripts/
+Add helper scripts here for common automation tasks.
+
+### assets/
+Add templates, boilerplate, or example projects here.
+
+## Notes
+
+- This skill was automatically generated from official documentation
+- Reference files preserve the structure and examples from source docs
+- Code examples include language detection for better syntax highlighting
+- Quick reference patterns are extracted from common usage examples in the docs
+
+## Updating
+
+To refresh this skill with updated documentation:
+1. Re-run the scraper with the same configuration
+2. The skill will be rebuilt with the latest information
+"""
+        
+        filepath = os.path.join(self.skill_dir, "SKILL.md")
+        with open(filepath, 'w', encoding='utf-8') as f:
+            f.write(content)
+
+        logger.info("  ✓ SKILL.md (enhanced with %d examples)", len(example_codes))
+    
+    def create_index(self, categories: Dict[str, List[Dict[str, Any]]]) -> None:
+        """Create navigation index"""
+        lines = []
+        lines.append(f"# {self.name.title()} Documentation Index\n")
+        lines.append("## Categories\n")
+        
+        for cat, pages in sorted(categories.items()):
+            lines.append(f"### {cat.replace('_', ' ').title()}")
+            lines.append(f"**File:** `{cat}.md`")
+            lines.append(f"**Pages:** {len(pages)}\n")
+        
+        filepath = os.path.join(self.skill_dir, "references", "index.md")
+        with open(filepath, 'w', encoding='utf-8') as f:
+            f.write('\n'.join(lines))
+
+        logger.info("  ✓ index.md")
+    
+    def build_skill(self) -> bool:
+        """Build the skill from scraped data.
+
+        Loads scraped JSON files, categorizes pages, extracts patterns,
+        and generates SKILL.md and reference files.
+
+        Returns:
+            bool: True if build succeeded, False otherwise
+        """
+        logger.info("\n" + "=" * 60)
+        logger.info("BUILDING SKILL: %s", self.name)
+        logger.info("=" * 60 + "\n")
+
+        # Load data
+        logger.info("Loading scraped data...")
+        pages = self.load_scraped_data()
+
+        if not pages:
+            logger.error("✗ No scraped data found!")
+            return False
+
+        logger.info("  ✓ Loaded %d pages\n", len(pages))
+
+        # Categorize
+        logger.info("Categorizing pages...")
+        categories = self.smart_categorize(pages)
+        logger.info("  ✓ Created %d categories\n", len(categories))
+
+        # Generate quick reference
+        logger.info("Generating quick reference...")
+        quick_ref = self.generate_quick_reference(pages)
+        logger.info("  ✓ Extracted %d patterns\n", len(quick_ref))
+
+        # Create reference files
+        logger.info("Creating reference files...")
+        for cat, cat_pages in categories.items():
+            self.create_reference_file(cat, cat_pages)
+
+        # Create index
+        self.create_index(categories)
+        logger.info("")
+
+        # Create enhanced SKILL.md
+        logger.info("Creating SKILL.md...")
+        self.create_enhanced_skill_md(categories, quick_ref)
+
+        logger.info("\n✅ Skill built: %s/", self.skill_dir)
+        return True
+
+
+def validate_config(config: Dict[str, Any]) -> Tuple[List[str], List[str]]:
+    """Validate configuration structure and values.
+
+    Args:
+        config (dict): Configuration dictionary to validate
+
+    Returns:
+        tuple: (errors, warnings) where each is a list of strings
+
+    Example:
+        >>> errors, warnings = validate_config({'name': 'test', 'base_url': 'https://example.com'})
+        >>> if errors:
+        ...     print("Invalid config:", errors)
+    """
+    errors = []
+    warnings = []
+
+    # Required fields
+    required_fields = ['name', 'base_url']
+    for field in required_fields:
+        if field not in config:
+            errors.append(f"Missing required field: '{field}'")
+
+    # Validate name (alphanumeric, hyphens, underscores only)
+    if 'name' in config:
+        if not re.match(r'^[a-zA-Z0-9_-]+$', config['name']):
+            errors.append(f"Invalid name: '{config['name']}' (use only letters, numbers, hyphens, underscores)")
+
+    # Validate base_url
+    if 'base_url' in config:
+        if not config['base_url'].startswith(('http://', 'https://')):
+            errors.append(f"Invalid base_url: '{config['base_url']}' (must start with http:// or https://)")
+
+    # Validate selectors structure
+    if 'selectors' in config:
+        if not isinstance(config['selectors'], dict):
+            errors.append("'selectors' must be a dictionary")
+        else:
+            recommended_selectors = ['main_content', 'title', 'code_blocks']
+            for selector in recommended_selectors:
+                if selector not in config['selectors']:
+                    warnings.append(f"Missing recommended selector: '{selector}'")
+    else:
+        warnings.append("Missing 'selectors' section (recommended)")
+
+    # Validate url_patterns
+    if 'url_patterns' in config:
+        if not isinstance(config['url_patterns'], dict):
+            errors.append("'url_patterns' must be a dictionary")
+        else:
+            for key in ['include', 'exclude']:
+                if key in config['url_patterns']:
+                    if not isinstance(config['url_patterns'][key], list):
+                        errors.append(f"'url_patterns.{key}' must be a list")
+
+    # Validate categories
+    if 'categories' in config:
+        if not isinstance(config['categories'], dict):
+            errors.append("'categories' must be a dictionary")
+        else:
+            for cat_name, keywords in config['categories'].items():
+                if not isinstance(keywords, list):
+                    errors.append(f"'categories.{cat_name}' must be a list of keywords")
+
+    # Validate rate_limit
+    if 'rate_limit' in config:
+        try:
+            rate = float(config['rate_limit'])
+            if rate < 0:
+                errors.append(f"'rate_limit' must be non-negative (got {rate})")
+            elif rate > 10:
+                warnings.append(f"'rate_limit' is very high ({rate}s) - this may slow down scraping significantly")
+        except (ValueError, TypeError):
+            errors.append(f"'rate_limit' must be a number (got {config['rate_limit']})")
+
+    # Validate max_pages
+    if 'max_pages' in config:
+        max_p_value = config['max_pages']
+
+        # Allow None for unlimited
+        if max_p_value is None:
+            warnings.append("'max_pages' is None (unlimited) - this will scrape ALL pages. Use with caution!")
+        else:
+            try:
+                max_p = int(max_p_value)
+                # Allow -1 for unlimited
+                if max_p == -1:
+                    warnings.append("'max_pages' is -1 (unlimited) - this will scrape ALL pages. Use with caution!")
+                elif max_p < 1:
+                    errors.append(f"'max_pages' must be at least 1 or -1 for unlimited (got {max_p})")
+                elif max_p > MAX_PAGES_WARNING_THRESHOLD:
+                    warnings.append(f"'max_pages' is very high ({max_p}) - scraping may take a very long time")
+            except (ValueError, TypeError):
+                errors.append(f"'max_pages' must be an integer, -1, or null (got {config['max_pages']})")
+
+    # Validate start_urls if present
+    if 'start_urls' in config:
+        if not isinstance(config['start_urls'], list):
+            errors.append("'start_urls' must be a list")
+        else:
+            for url in config['start_urls']:
+                if not url.startswith(('http://', 'https://')):
+                    errors.append(f"Invalid start_url: '{url}' (must start with http:// or https://)")
+
+    return errors, warnings
+
+
+def load_config(config_path: str) -> Dict[str, Any]:
+    """Load and validate configuration from JSON file.
+
+    Args:
+        config_path (str): Path to JSON configuration file
+
+    Returns:
+        dict: Validated configuration dictionary
+
+    Raises:
+        SystemExit: If config is invalid or file not found
+
+    Example:
+        >>> config = load_config('configs/react.json')
+        >>> print(config['name'])
+        react
+    """
+    try:
+        with open(config_path, 'r') as f:
+            config = json.load(f)
+    except json.JSONDecodeError as e:
+        logger.error("❌ Error: Invalid JSON in config file: %s", config_path)
+        logger.error("   Details: %s", e)
+        logger.error("   Suggestion: Check syntax at line %d, column %d", e.lineno, e.colno)
+        sys.exit(1)
+    except FileNotFoundError:
+        logger.error("❌ Error: Config file not found: %s", config_path)
+        logger.error("   Suggestion: Create a config file or use an existing one from configs/")
+        logger.error("   Available configs: react.json, vue.json, django.json, godot.json")
+        sys.exit(1)
+
+    # Validate config
+    errors, warnings = validate_config(config)
+
+    # Show warnings (non-blocking)
+    if warnings:
+        logger.warning("⚠️  Configuration warnings in %s:", config_path)
+        for warning in warnings:
+            logger.warning("   - %s", warning)
+        logger.info("")
+
+    # Show errors (blocking)
+    if errors:
+        logger.error("❌ Configuration validation errors in %s:", config_path)
+        for error in errors:
+            logger.error("   - %s", error)
+        logger.error("\n   Suggestion: Fix the above errors or check configs/ for working examples")
+        sys.exit(1)
+
+    return config
+
+
+def interactive_config() -> Dict[str, Any]:
+    """Interactive configuration wizard for creating new configs.
+
+    Prompts user for all required configuration fields step-by-step
+    and returns a complete configuration dictionary.
+
+    Returns:
+        dict: Complete configuration dictionary with user-provided values
+
+    Example:
+        >>> config = interactive_config()
+        # User enters: name=react, url=https://react.dev, etc.
+        >>> config['name']
+        'react'
+    """
+    logger.info("\n" + "="*60)
+    logger.info("Documentation to Skill Converter")
+    logger.info("="*60 + "\n")
+
+    config: Dict[str, Any] = {}
+    
+    # Basic info
+    config['name'] = input("Skill name (e.g., 'react', 'godot'): ").strip()
+    config['description'] = input("Skill description: ").strip()
+    config['base_url'] = input("Base URL (e.g., https://docs.example.com/): ").strip()
+    
+    if not config['base_url'].endswith('/'):
+        config['base_url'] += '/'
+    
+    # Selectors
+    logger.info("\nCSS Selectors (press Enter for defaults):")
+    selectors = {}
+    selectors['main_content'] = input("  Main content [div[role='main']]: ").strip() or "div[role='main']"
+    selectors['title'] = input("  Title [title]: ").strip() or "title"
+    selectors['code_blocks'] = input("  Code blocks [pre code]: ").strip() or "pre code"
+    config['selectors'] = selectors
+    
+    # URL patterns
+    logger.info("\nURL Patterns (comma-separated, optional):")
+    include = input("  Include: ").strip()
+    exclude = input("  Exclude: ").strip()
+    config['url_patterns'] = {
+        'include': [p.strip() for p in include.split(',') if p.strip()],
+        'exclude': [p.strip() for p in exclude.split(',') if p.strip()]
+    }
+    
+    # Settings
+    rate = input(f"\nRate limit (seconds) [{DEFAULT_RATE_LIMIT}]: ").strip()
+    config['rate_limit'] = float(rate) if rate else DEFAULT_RATE_LIMIT
+
+    max_p = input(f"Max pages [{DEFAULT_MAX_PAGES}]: ").strip()
+    config['max_pages'] = int(max_p) if max_p else DEFAULT_MAX_PAGES
+    
+    return config
+
+
+def check_existing_data(name: str) -> Tuple[bool, int]:
+    """Check if scraped data already exists for a skill.
+
+    Args:
+        name (str): Skill name to check
+
+    Returns:
+        tuple: (exists, page_count) where exists is bool and page_count is int
+
+    Example:
+        >>> exists, count = check_existing_data('react')
+        >>> if exists:
+        ...     print(f"Found {count} existing pages")
+    """
+    data_dir = f"output/{name}_data"
+    if os.path.exists(data_dir) and os.path.exists(f"{data_dir}/summary.json"):
+        with open(f"{data_dir}/summary.json", 'r') as f:
+            summary = json.load(f)
+        return True, summary.get('total_pages', 0)
+    return False, 0
+
+
+def setup_argument_parser() -> argparse.ArgumentParser:
+    """Setup and configure command-line argument parser.
+
+    Creates an ArgumentParser with all CLI options for the doc scraper tool,
+    including configuration, scraping, enhancement, and performance options.
+
+    Returns:
+        argparse.ArgumentParser: Configured argument parser
+
+    Example:
+        >>> parser = setup_argument_parser()
+        >>> args = parser.parse_args(['--config', 'configs/react.json'])
+        >>> print(args.config)
+        configs/react.json
+    """
+    parser = argparse.ArgumentParser(
+        description='Convert documentation websites to Claude skills',
+        formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+
+    parser.add_argument('--interactive', '-i', action='store_true',
+                       help='Interactive configuration mode')
+    parser.add_argument('--config', '-c', type=str,
+                       help='Load configuration from file (e.g., configs/godot.json)')
+    parser.add_argument('--name', type=str,
+                       help='Skill name')
+    parser.add_argument('--url', type=str,
+                       help='Base documentation URL')
+    parser.add_argument('--description', '-d', type=str,
+                       help='Skill description')
+    parser.add_argument('--skip-scrape', action='store_true',
+                       help='Skip scraping, use existing data')
+    parser.add_argument('--dry-run', action='store_true',
+                       help='Preview what will be scraped without actually scraping')
+    parser.add_argument('--enhance', action='store_true',
+                       help='Enhance SKILL.md using Claude API after building (requires API key)')
+    parser.add_argument('--enhance-local', action='store_true',
+                       help='Enhance SKILL.md using Claude Code (no API key needed, runs in background)')
+    parser.add_argument('--interactive-enhancement', action='store_true',
+                       help='Open terminal window for enhancement (use with --enhance-local)')
+    parser.add_argument('--api-key', type=str,
+                       help='Anthropic API key for --enhance (or set ANTHROPIC_API_KEY)')
+    parser.add_argument('--resume', action='store_true',
+                       help='Resume from last checkpoint (for interrupted scrapes)')
+    parser.add_argument('--fresh', action='store_true',
+                       help='Clear checkpoint and start fresh')
+    parser.add_argument('--rate-limit', '-r', type=float, metavar='SECONDS',
+                       help=f'Override rate limit in seconds (default: from config or {DEFAULT_RATE_LIMIT}). Use 0 for no delay.')
+    parser.add_argument('--workers', '-w', type=int, metavar='N',
+                       help='Number of parallel workers for faster scraping (default: 1, max: 10)')
+    parser.add_argument('--async', dest='async_mode', action='store_true',
+                       help='Enable async mode for better parallel performance (2-3x faster than threads)')
+    parser.add_argument('--no-rate-limit', action='store_true',
+                       help='Disable rate limiting completely (same as --rate-limit 0)')
+    parser.add_argument('--verbose', '-v', action='store_true',
+                       help='Enable verbose output (DEBUG level logging)')
+    parser.add_argument('--quiet', '-q', action='store_true',
+                       help='Minimize output (WARNING level logging only)')
+
+    return parser
+
+
+def get_configuration(args: argparse.Namespace) -> Dict[str, Any]:
+    """Load or create configuration from command-line arguments.
+
+    Handles three configuration modes:
+    1. Load from JSON file (--config)
+    2. Interactive configuration wizard (--interactive or missing args)
+    3. Quick mode from command-line arguments (--name, --url)
+
+    Also applies CLI overrides for rate limiting and worker count.
+
+    Args:
+        args: Parsed command-line arguments from argparse
+
+    Returns:
+        dict: Configuration dictionary with all required fields
+
+    Example:
+        >>> args = parser.parse_args(['--name', 'react', '--url', 'https://react.dev'])
+        >>> config = get_configuration(args)
+        >>> print(config['name'])
+        react
+    """
+    # Get base configuration
+    if args.config:
+        config = load_config(args.config)
+    elif args.interactive or not (args.name and args.url):
+        config = interactive_config()
+    else:
+        config = {
+            'name': args.name,
+            'description': args.description or f'Comprehensive assistance with {args.name}',
+            'base_url': args.url,
+            'selectors': {
+                'main_content': "div[role='main']",
+                'title': 'title',
+                'code_blocks': 'pre code'
+            },
+            'url_patterns': {'include': [], 'exclude': []},
+            'rate_limit': DEFAULT_RATE_LIMIT,
+            'max_pages': DEFAULT_MAX_PAGES
+        }
+
+    # Apply CLI overrides for rate limiting
+    if args.no_rate_limit:
+        config['rate_limit'] = 0
+        logger.info("⚡ Rate limiting disabled")
+    elif args.rate_limit is not None:
+        config['rate_limit'] = args.rate_limit
+        if args.rate_limit == 0:
+            logger.info("⚡ Rate limiting disabled")
+        else:
+            logger.info("⚡ Rate limit override: %ss per page", args.rate_limit)
+
+    # Apply CLI overrides for worker count
+    if args.workers:
+        # Validate workers count
+        if args.workers < 1:
+            logger.error("❌ Error: --workers must be at least 1 (got %d)", args.workers)
+            logger.error("   Suggestion: Use --workers 1 (default) or omit the flag")
+            sys.exit(1)
+        if args.workers > 10:
+            logger.warning("⚠️  Warning: --workers capped at 10 (requested %d)", args.workers)
+            args.workers = 10
+        config['workers'] = args.workers
+        if args.workers > 1:
+            logger.info("🚀 Parallel scraping enabled: %d workers", args.workers)
+
+    # Apply CLI override for async mode
+    if args.async_mode:
+        config['async_mode'] = True
+        if config.get('workers', 1) > 1:
+            logger.info("⚡ Async mode enabled (2-3x faster than threads)")
+        else:
+            logger.warning("⚠️  Async mode enabled but workers=1. Consider using --workers 4 for better performance")
+
+    return config
+
+
+def execute_scraping_and_building(config: Dict[str, Any], args: argparse.Namespace) -> Optional['DocToSkillConverter']:
+    """Execute the scraping and skill building process.
+
+    Handles dry run mode, existing data checks, scraping with checkpoints,
+    keyboard interrupts, and skill building. This is the core workflow
+    orchestration for the scraping phase.
+
+    Args:
+        config (dict): Configuration dictionary with scraping parameters
+        args: Parsed command-line arguments
+
+    Returns:
+        DocToSkillConverter: The converter instance after scraping/building,
+                            or None if process was aborted
+
+    Example:
+        >>> config = {'name': 'react', 'base_url': 'https://react.dev'}
+        >>> converter = execute_scraping_and_building(config, args)
+        >>> if converter:
+        ...     print("Scraping complete!")
+    """
+    # Dry run mode - preview only
+    if args.dry_run:
+        logger.info("\n" + "=" * 60)
+        logger.info("DRY RUN MODE")
+        logger.info("=" * 60)
+        logger.info("This will show what would be scraped without saving anything.\n")
+
+        converter = DocToSkillConverter(config, dry_run=True)
+        converter.scrape_all()
+
+        logger.info("\n📋 Configuration Summary:")
+        logger.info("   Name: %s", config['name'])
+        logger.info("   Base URL: %s", config['base_url'])
+        logger.info("   Max pages: %d", config.get('max_pages', DEFAULT_MAX_PAGES))
+        logger.info("   Rate limit: %ss", config.get('rate_limit', DEFAULT_RATE_LIMIT))
+        logger.info("   Categories: %d", len(config.get('categories', {})))
+        return None
+
+    # Check for existing data
+    exists, page_count = check_existing_data(config['name'])
+
+    if exists and not args.skip_scrape and not args.fresh:
+        # Check force_rescrape flag from config
+        if config.get('force_rescrape', False):
+            # Auto-delete cached data and rescrape
+            logger.info("\n✓ Found existing data: %d pages", page_count)
+            logger.info("  force_rescrape enabled - deleting cached data and rescaping")
+            import shutil
+            data_dir = f"output/{config['name']}_data"
+            if os.path.exists(data_dir):
+                shutil.rmtree(data_dir)
+                logger.info(f"  Deleted: {data_dir}")
+        else:
+            # Only prompt if force_rescrape is False
+            logger.info("\n✓ Found existing data: %d pages", page_count)
+            response = input("Use existing data? (y/n): ").strip().lower()
+            if response == 'y':
+                args.skip_scrape = True
+    elif exists and args.fresh:
+        logger.info("\n✓ Found existing data: %d pages", page_count)
+        logger.info("  --fresh flag set, will re-scrape from scratch")
+
+    # Create converter
+    converter = DocToSkillConverter(config, resume=args.resume)
+
+    # Handle fresh start (clear checkpoint)
+    if args.fresh:
+        converter.clear_checkpoint()
+
+    # Scrape or skip
+    if not args.skip_scrape:
+        try:
+            converter.scrape_all()
+            # Save final checkpoint
+            if converter.checkpoint_enabled:
+                converter.save_checkpoint()
+                logger.info("\n💾 Final checkpoint saved")
+                # Clear checkpoint after successful completion
+                converter.clear_checkpoint()
+                logger.info("✅ Scraping complete - checkpoint cleared")
+        except KeyboardInterrupt:
+            logger.warning("\n\nScraping interrupted.")
+            if converter.checkpoint_enabled:
+                converter.save_checkpoint()
+                logger.info("💾 Progress saved to checkpoint")
+                logger.info("   Resume with: --config %s --resume", args.config if args.config else 'config.json')
+            response = input("Continue with skill building? (y/n): ").strip().lower()
+            if response != 'y':
+                return None
+    else:
+        logger.info("\n⏭️  Skipping scrape, using existing data")
+
+    # Build skill
+    success = converter.build_skill()
+
+    if not success:
+        sys.exit(1)
+
+    return converter
+
+
+def execute_enhancement(config: Dict[str, Any], args: argparse.Namespace) -> None:
+    """Execute optional SKILL.md enhancement with Claude.
+
+    Supports two enhancement modes:
+    1. API-based enhancement (requires ANTHROPIC_API_KEY)
+    2. Local enhancement using Claude Code (no API key needed)
+
+    Prints appropriate messages and suggestions based on whether
+    enhancement was requested and whether it succeeded.
+
+    Args:
+        config (dict): Configuration dictionary with skill name
+        args: Parsed command-line arguments with enhancement flags
+
+    Example:
+        >>> execute_enhancement(config, args)
+        # Runs enhancement if --enhance or --enhance-local flag is set
+    """
+    import subprocess
+
+    # Optional enhancement with Claude API
+    if args.enhance:
+        logger.info("\n" + "=" * 60)
+        logger.info("ENHANCING SKILL.MD WITH CLAUDE API")
+        logger.info("=" * 60 + "\n")
+
+        try:
+            enhance_cmd = ['python3', 'cli/enhance_skill.py', f'output/{config["name"]}/']
+            if args.api_key:
+                enhance_cmd.extend(['--api-key', args.api_key])
+
+            result = subprocess.run(enhance_cmd, check=True)
+            if result.returncode == 0:
+                logger.info("\n✅ Enhancement complete!")
+        except subprocess.CalledProcessError:
+            logger.warning("\n⚠ Enhancement failed, but skill was still built")
+        except FileNotFoundError:
+            logger.warning("\n⚠ enhance_skill.py not found. Run manually:")
+            logger.info("  skill-seekers-enhance output/%s/", config['name'])
+
+    # Optional enhancement with Claude Code (local, no API key)
+    if args.enhance_local:
+        logger.info("\n" + "=" * 60)
+        if args.interactive_enhancement:
+            logger.info("ENHANCING SKILL.MD WITH CLAUDE CODE (INTERACTIVE)")
+        else:
+            logger.info("ENHANCING SKILL.MD WITH CLAUDE CODE (HEADLESS)")
+        logger.info("=" * 60 + "\n")
+
+        try:
+            enhance_cmd = ['skill-seekers-enhance', f'output/{config["name"]}/']
+            if args.interactive_enhancement:
+                enhance_cmd.append('--interactive-enhancement')
+
+            result = subprocess.run(enhance_cmd, check=True)
+
+            if result.returncode == 0:
+                logger.info("\n✅ Enhancement complete!")
+        except subprocess.CalledProcessError:
+            logger.warning("\n⚠ Enhancement failed, but skill was still built")
+        except FileNotFoundError:
+            logger.warning("\n⚠ skill-seekers-enhance command not found. Run manually:")
+            logger.info("  skill-seekers-enhance output/%s/", config['name'])
+
+    # Print packaging instructions
+    logger.info("\n📦 Package your skill:")
+    logger.info("  skill-seekers-package output/%s/", config['name'])
+
+    # Suggest enhancement if not done
+    if not args.enhance and not args.enhance_local:
+        logger.info("\n💡 Optional: Enhance SKILL.md with Claude:")
+        logger.info("  Local (recommended):  skill-seekers-enhance output/%s/", config['name'])
+        logger.info("                        or re-run with: --enhance-local")
+        logger.info("  API-based:            skill-seekers-enhance-api output/%s/", config['name'])
+        logger.info("                        or re-run with: --enhance")
+        logger.info("\n💡 Tip: Use --interactive-enhancement with --enhance-local to open terminal window")
+
+
+def main() -> None:
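+    """CLI entry point: parse arguments, set up logging, then scrape, build, and optionally enhance the skill."""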
+    parser = setup_argument_parser()
+    args = parser.parse_args()
+
+    # Setup logging based on verbosity flags
+    setup_logging(verbose=args.verbose, quiet=args.quiet)
+
+    config = get_configuration(args)
+
+    # Execute scraping and building
+    converter = execute_scraping_and_building(config, args)
+
+    # Exit if dry run or aborted
+    if converter is None:
+        return
+
+    # Execute enhancement and print instructions
+    execute_enhancement(config, args)
+
+
+if __name__ == "__main__":
+    main()

+ 273 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/enhance_skill.py

@@ -0,0 +1,273 @@
+#!/usr/bin/env python3
+"""
+SKILL.md Enhancement Script
+Uses Claude API to improve SKILL.md by analyzing reference documentation.
+
+Usage:
+    skill-seekers enhance output/steam-inventory/
+    skill-seekers enhance output/react/
+    skill-seekers enhance output/godot/ --api-key YOUR_API_KEY
+"""
+
+import os
+import sys
+import json
+import argparse
+from pathlib import Path
+
+# Add parent directory to path for imports when run as script
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from skill_seekers.cli.constants import API_CONTENT_LIMIT, API_PREVIEW_LIMIT
+from skill_seekers.cli.utils import read_reference_files
+
+try:
+    import anthropic
+except ImportError:
+    print("❌ Error: anthropic package not installed")
+    print("Install with: pip3 install anthropic")
+    sys.exit(1)
+
+
+class SkillEnhancer:
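+    """Enhance a skill's SKILL.md by sending its reference documentation to the Claude API."""
+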
+    def __init__(self, skill_dir, api_key=None):
+        self.skill_dir = Path(skill_dir)
+        self.references_dir = self.skill_dir / "references"
+        self.skill_md_path = self.skill_dir / "SKILL.md"
+
+        # Get API key
+        self.api_key = api_key or os.environ.get('ANTHROPIC_API_KEY')
+        if not self.api_key:
+            raise ValueError(
+                "No API key provided. Set ANTHROPIC_API_KEY environment variable "
+                "or use --api-key argument"
+            )
+
+        self.client = anthropic.Anthropic(api_key=self.api_key)
+
+    def read_current_skill_md(self):
+        """Read existing SKILL.md"""
+        if not self.skill_md_path.exists():
+            return None
+        return self.skill_md_path.read_text(encoding='utf-8')
+
+    def enhance_skill_md(self, references, current_skill_md):
+        """Use Claude to enhance SKILL.md"""
+
+        # Build prompt
+        prompt = self._build_enhancement_prompt(references, current_skill_md)
+
+        print("\n🤖 Asking Claude to enhance SKILL.md...")
+        print(f"   Input: {len(prompt):,} characters")
+
+        try:
+            message = self.client.messages.create(
+                model="claude-sonnet-4-20250514",
+                max_tokens=4096,
+                temperature=0.3,
+                messages=[{
+                    "role": "user",
+                    "content": prompt
+                }]
+            )
+
+            enhanced_content = message.content[0].text
+            return enhanced_content
+
+        except Exception as e:
+            print(f"❌ Error calling Claude API: {e}")
+            return None
+
+    def _build_enhancement_prompt(self, references, current_skill_md):
+        """Build the prompt for Claude"""
+
+        # Extract skill name and description
+        skill_name = self.skill_dir.name
+
+        prompt = f"""You are enhancing a Claude skill's SKILL.md file. This skill is about: {skill_name}
+
+I've scraped documentation and organized it into reference files. Your job is to create an EXCELLENT SKILL.md that will help Claude use this documentation effectively.
+
+CURRENT SKILL.MD:
+{'```markdown' if current_skill_md else '(none - create from scratch)'}
+{current_skill_md or 'No existing SKILL.md'}
+{'```' if current_skill_md else ''}
+
+REFERENCE DOCUMENTATION:
+"""
+
+        for filename, content in references.items():
+            prompt += f"\n\n## {filename}\n```markdown\n{content[:30000]}\n```\n"
+
+        prompt += """
+
+YOUR TASK:
+Create an enhanced SKILL.md that includes:
+
+1. **Clear "When to Use This Skill" section** - Be specific about trigger conditions
+2. **Excellent Quick Reference section** - Extract 5-10 of the BEST, most practical code examples from the reference docs
+   - Choose SHORT, clear examples that demonstrate common tasks
+   - Include both simple and intermediate examples
+   - Annotate examples with clear descriptions
+   - Use proper language tags (cpp, python, javascript, json, etc.)
+3. **Detailed Reference Files description** - Explain what's in each reference file
+4. **Practical "Working with This Skill" section** - Give users clear guidance on how to navigate the skill
+5. **Key Concepts section** (if applicable) - Explain core concepts
+6. **Keep the frontmatter** (---\nname: ...\n---) intact
+
+IMPORTANT:
+- Extract REAL examples from the reference docs, don't make them up
+- Prioritize SHORT, clear examples (5-20 lines max)
+- Make it actionable and practical
+- Don't be too verbose - be concise but useful
+- Maintain the markdown structure for Claude skills
+- Keep code examples properly formatted with language tags
+
+OUTPUT:
+Return ONLY the complete SKILL.md content, starting with the frontmatter (---).
+"""
+
+        return prompt
+
+    def save_enhanced_skill_md(self, content):
+        """Save the enhanced SKILL.md"""
+        # Backup original
+        if self.skill_md_path.exists():
+            backup_path = self.skill_md_path.with_suffix('.md.backup')
+            self.skill_md_path.rename(backup_path)
+            print(f"  💾 Backed up original to: {backup_path.name}")
+
+        # Save enhanced version
+        self.skill_md_path.write_text(content, encoding='utf-8')
+        print(f"  ✅ Saved enhanced SKILL.md")
+
+    def run(self):
+        """Main enhancement workflow"""
+        print(f"\n{'='*60}")
+        print(f"ENHANCING SKILL: {self.skill_dir.name}")
+        print(f"{'='*60}\n")
+
+        # Read reference files
+        print("📖 Reading reference documentation...")
+        references = read_reference_files(
+            self.skill_dir,
+            max_chars=API_CONTENT_LIMIT,
+            preview_limit=API_PREVIEW_LIMIT
+        )
+
+        if not references:
+            print("❌ No reference files found to analyze")
+            return False
+
+        print(f"  ✓ Read {len(references)} reference files")
+        total_size = sum(len(c) for c in references.values())
+        print(f"  ✓ Total size: {total_size:,} characters\n")
+
+        # Read current SKILL.md
+        current_skill_md = self.read_current_skill_md()
+        if current_skill_md:
+            print(f"  ℹ Found existing SKILL.md ({len(current_skill_md)} chars)")
+        else:
+            print(f"  ℹ No existing SKILL.md, will create new one")
+
+        # Enhance with Claude
+        enhanced = self.enhance_skill_md(references, current_skill_md)
+
+        if not enhanced:
+            print("❌ Enhancement failed")
+            return False
+
+        print(f"  ✓ Generated enhanced SKILL.md ({len(enhanced)} chars)\n")
+
+        # Save
+        print("💾 Saving enhanced SKILL.md...")
+        self.save_enhanced_skill_md(enhanced)
+
+        print(f"\n✅ Enhancement complete!")
+        print(f"\nNext steps:")
+        print(f"  1. Review: {self.skill_md_path}")
+        print(f"  2. If you don't like it, restore backup: {self.skill_md_path.with_suffix('.md.backup')}")
+        print(f"  3. Package your skill:")
+        print(f"     skill-seekers package {self.skill_dir}/")
+
+        return True
+
+
+def main():
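+    """CLI entry point: validate the skill directory, then run (or dry-run) the API-based enhancement."""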
+    parser = argparse.ArgumentParser(
+        description='Enhance SKILL.md using Claude API',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Using ANTHROPIC_API_KEY environment variable
+  export ANTHROPIC_API_KEY=sk-ant-...
+  skill-seekers enhance output/steam-inventory/
+
+  # Providing API key directly
+  skill-seekers enhance output/react/ --api-key sk-ant-...
+
+  # Show what would be done (dry run)
+  skill-seekers enhance output/godot/ --dry-run
+"""
+    )
+
+    parser.add_argument('skill_dir', type=str,
+                       help='Path to skill directory (e.g., output/steam-inventory/)')
+    parser.add_argument('--api-key', type=str,
+                       help='Anthropic API key (or set ANTHROPIC_API_KEY env var)')
+    parser.add_argument('--dry-run', action='store_true',
+                       help='Show what would be done without calling API')
+
+    args = parser.parse_args()
+
+    # Validate skill directory
+    skill_dir = Path(args.skill_dir)
+    if not skill_dir.exists():
+        print(f"❌ Error: Directory not found: {skill_dir}")
+        sys.exit(1)
+
+    if not skill_dir.is_dir():
+        print(f"❌ Error: Not a directory: {skill_dir}")
+        sys.exit(1)
+
+    # Dry run mode
+    if args.dry_run:
+        print(f"🔍 DRY RUN MODE")
+        print(f"   Would enhance: {skill_dir}")
+        print(f"   References: {skill_dir / 'references'}")
+        print(f"   SKILL.md: {skill_dir / 'SKILL.md'}")
+
+        refs_dir = skill_dir / "references"
+        if refs_dir.exists():
+            ref_files = list(refs_dir.glob("*.md"))
+            print(f"   Found {len(ref_files)} reference files:")
+            for rf in ref_files:
+                size = rf.stat().st_size
+                print(f"     - {rf.name} ({size:,} bytes)")
+
+        print("\nTo actually run enhancement:")
+        print(f"  skill-seekers enhance {skill_dir}")
+        return
+
+    # Create enhancer and run
+    try:
+        enhancer = SkillEnhancer(skill_dir, api_key=args.api_key)
+        success = enhancer.run()
+        sys.exit(0 if success else 1)
+
+    except ValueError as e:
+        print(f"❌ Error: {e}")
+        print("\nSet your API key:")
+        print("  export ANTHROPIC_API_KEY=sk-ant-...")
+        print("Or provide it directly:")
+        print(f"  skill-seekers enhance {skill_dir} --api-key sk-ant-...")
+        sys.exit(1)
+    except Exception as e:
+        print(f"❌ Unexpected error: {e}")
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()

+ 451 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/enhance_skill_local.py

@@ -0,0 +1,451 @@
+#!/usr/bin/env python3
+"""
+SKILL.md Enhancement Script (Local - Using Claude Code)
+Runs Claude Code to enhance SKILL.md (headless by default, or in a new terminal with --interactive-enhancement), then reports back.
+No API key needed - uses your existing Claude Code Max plan!
+
+Usage:
+    skill-seekers enhance output/steam-inventory/
+    skill-seekers enhance output/react/
+
+Terminal Selection:
+    The script automatically detects which terminal app to use:
+    1. SKILL_SEEKER_TERMINAL env var (highest priority)
+       Example: export SKILL_SEEKER_TERMINAL="Ghostty"
+    2. TERM_PROGRAM env var (current terminal)
+    3. Terminal.app (fallback)
+
+    Supported terminals: Ghostty, iTerm, Terminal, WezTerm
+"""
+
+import os
+import sys
+import time
+import subprocess
+import tempfile
+from pathlib import Path
+
+# Add parent directory to path for imports when run as script
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from skill_seekers.cli.constants import LOCAL_CONTENT_LIMIT, LOCAL_PREVIEW_LIMIT
+from skill_seekers.cli.utils import read_reference_files
+
+
+def detect_terminal_app():
+    """Detect which terminal app to use with cascading priority.
+
+    Priority order:
+        1. SKILL_SEEKER_TERMINAL environment variable (explicit user preference)
+        2. TERM_PROGRAM environment variable (inherit current terminal)
+        3. Terminal.app (fallback default)
+
+    Returns:
+        tuple: (terminal_app_name, detection_method)
+            - terminal_app_name (str): Name of terminal app to launch (e.g., "Ghostty", "Terminal")
+            - detection_method (str): How the terminal was detected (for logging)
+
+    Examples:
+        >>> os.environ['SKILL_SEEKER_TERMINAL'] = 'Ghostty'
+        >>> detect_terminal_app()
+        ('Ghostty', 'SKILL_SEEKER_TERMINAL')
+
+        >>> os.environ['TERM_PROGRAM'] = 'iTerm.app'
+        >>> detect_terminal_app()
+        ('iTerm', 'TERM_PROGRAM')
+    """
+    # Map TERM_PROGRAM values to macOS app names
+    TERMINAL_MAP = {
+        'Apple_Terminal': 'Terminal',
+        'iTerm.app': 'iTerm',
+        'ghostty': 'Ghostty',
+        'WezTerm': 'WezTerm',
+    }
+
+    # Priority 1: Check SKILL_SEEKER_TERMINAL env var (explicit preference)
+    preferred_terminal = os.environ.get('SKILL_SEEKER_TERMINAL', '').strip()
+    if preferred_terminal:
+        return preferred_terminal, 'SKILL_SEEKER_TERMINAL'
+
+    # Priority 2: Check TERM_PROGRAM (inherit current terminal)
+    term_program = os.environ.get('TERM_PROGRAM', '').strip()
+    if term_program and term_program in TERMINAL_MAP:
+        return TERMINAL_MAP[term_program], 'TERM_PROGRAM'
+
+    # Priority 3: Fallback to Terminal.app
+    if term_program:
+        # TERM_PROGRAM is set but unknown
+        return 'Terminal', f'unknown TERM_PROGRAM ({term_program})'
+    else:
+        # No TERM_PROGRAM set
+        return 'Terminal', 'default'
+
+
+class LocalSkillEnhancer:
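+    """Enhance a skill's SKILL.md locally via the Claude Code CLI (no API key required)."""
+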
+    def __init__(self, skill_dir):
+        self.skill_dir = Path(skill_dir)
+        self.references_dir = self.skill_dir / "references"
+        self.skill_md_path = self.skill_dir / "SKILL.md"
+
+    def create_enhancement_prompt(self):
+        """Create the prompt file for Claude Code"""
+
+        # Read reference files
+        references = read_reference_files(
+            self.skill_dir,
+            max_chars=LOCAL_CONTENT_LIMIT,
+            preview_limit=LOCAL_PREVIEW_LIMIT
+        )
+
+        if not references:
+            print("❌ No reference files found")
+            return None
+
+        # Read current SKILL.md
+        current_skill_md = ""
+        if self.skill_md_path.exists():
+            current_skill_md = self.skill_md_path.read_text(encoding='utf-8')
+
+        # Build prompt
+        prompt = f"""I need you to enhance the SKILL.md file for the {self.skill_dir.name} skill.
+
+CURRENT SKILL.MD:
+{'-'*60}
+{current_skill_md if current_skill_md else '(No existing SKILL.md - create from scratch)'}
+{'-'*60}
+
+REFERENCE DOCUMENTATION:
+{'-'*60}
+"""
+
+        for filename, content in references.items():
+            prompt += f"\n## {filename}\n{content[:15000]}\n"
+
+        prompt += f"""
+{'-'*60}
+
+YOUR TASK:
+Create an EXCELLENT SKILL.md file that will help Claude use this documentation effectively.
+
+Requirements:
+1. **Clear "When to Use This Skill" section**
+   - Be SPECIFIC about trigger conditions
+   - List concrete use cases
+
+2. **Excellent Quick Reference section**
+   - Extract 5-10 of the BEST, most practical code examples from the reference docs
+   - Choose SHORT, clear examples (5-20 lines max)
+   - Include both simple and intermediate examples
+   - Use proper language tags (cpp, python, javascript, json, etc.)
+   - Add clear descriptions for each example
+
+3. **Detailed Reference Files description**
+   - Explain what's in each reference file
+   - Help users navigate the documentation
+
+4. **Practical "Working with This Skill" section**
+   - Clear guidance for beginners, intermediate, and advanced users
+   - Navigation tips
+
+5. **Key Concepts section** (if applicable)
+   - Explain core concepts
+   - Define important terminology
+
+IMPORTANT:
+- Extract REAL examples from the reference docs above
+- Prioritize SHORT, clear examples
+- Make it actionable and practical
+- Keep the frontmatter (---\\nname: ...\\n---) intact
+- Use proper markdown formatting
+
+SAVE THE RESULT:
+Save the complete enhanced SKILL.md to: {self.skill_md_path.absolute()}
+
+First, backup the original to: {self.skill_md_path.with_suffix('.md.backup').absolute()}
+"""
+
+        return prompt
+
+    def run(self, headless=True, timeout=600):
+        """Main enhancement workflow
+
+        Args:
+            headless: If True, run claude directly without opening terminal (default: True)
+            timeout: Maximum time to wait for enhancement in seconds (default: 600 = 10 minutes)
+        """
+        print(f"\n{'='*60}")
+        print(f"LOCAL ENHANCEMENT: {self.skill_dir.name}")
+        print(f"{'='*60}\n")
+
+        # Validate
+        if not self.skill_dir.exists():
+            print(f"❌ Directory not found: {self.skill_dir}")
+            return False
+
+        # Read reference files
+        print("📖 Reading reference documentation...")
+        references = read_reference_files(
+            self.skill_dir,
+            max_chars=LOCAL_CONTENT_LIMIT,
+            preview_limit=LOCAL_PREVIEW_LIMIT
+        )
+
+        if not references:
+            print("❌ No reference files found to analyze")
+            return False
+
+        print(f"  ✓ Read {len(references)} reference files")
+        total_size = sum(len(c) for c in references.values())
+        print(f"  ✓ Total size: {total_size:,} characters\n")
+
+        # Create prompt
+        print("📝 Creating enhancement prompt...")
+        prompt = self.create_enhancement_prompt()
+
+        if not prompt:
+            return False
+
+        # Save prompt to temp file
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False, encoding='utf-8') as f:
+            prompt_file = f.name
+            f.write(prompt)
+
+        print(f"  ✓ Prompt saved ({len(prompt):,} characters)\n")
+
+        # Headless mode: Run claude directly without opening terminal
+        if headless:
+            return self._run_headless(prompt_file, timeout)
+
+        # Terminal mode: Launch Claude Code in new terminal
+        print("🚀 Launching Claude Code in new terminal...")
+        print("   This will:")
+        print("   1. Open a new terminal window")
+        print("   2. Run Claude Code with the enhancement task")
+        print("   3. Claude will read the docs and enhance SKILL.md")
+        print("   4. Terminal will auto-close when done")
+        print()
+
+        # Create a shell script to run in the terminal
+        shell_script = f'''#!/bin/bash
+claude {prompt_file}
+echo ""
+echo "✅ Enhancement complete!"
+echo "Press any key to close..."
+read -n 1
+rm {prompt_file}
+'''
+
+        # Save shell script
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.sh', delete=False) as f:
+            script_file = f.name
+            f.write(shell_script)
+
+        os.chmod(script_file, 0o755)
+
+        # Launch in new terminal (macOS specific)
+        if sys.platform == 'darwin':
+            # Detect which terminal app to use
+            terminal_app, detection_method = detect_terminal_app()
+
+            # Show detection info
+            if detection_method == 'SKILL_SEEKER_TERMINAL':
+                print(f"   Using terminal: {terminal_app} (from SKILL_SEEKER_TERMINAL)")
+            elif detection_method == 'TERM_PROGRAM':
+                print(f"   Using terminal: {terminal_app} (inherited from current terminal)")
+            elif detection_method.startswith('unknown TERM_PROGRAM'):
+                print(f"⚠️  {detection_method}")
+                print(f"   → Using Terminal.app as fallback")
+            else:
+                print(f"   Using terminal: {terminal_app} (default)")
+
+            try:
+                subprocess.Popen(['open', '-a', terminal_app, script_file])
+            except Exception as e:
+                print(f"⚠️  Error launching {terminal_app}: {e}")
+                print(f"\nManually run: {script_file}")
+                return False
+        else:
+            print("⚠️  Auto-launch only works on macOS")
+            print(f"\nManually run this command in a new terminal:")
+            print(f"  claude '{prompt_file}'")
+            print(f"\nThen delete the prompt file:")
+            print(f"  rm '{prompt_file}'")
+            return False
+
+        print("✅ New terminal launched with Claude Code!")
+        print()
+        print("📊 Status:")
+        print(f"  - Prompt file: {prompt_file}")
+        print(f"  - Skill directory: {self.skill_dir.absolute()}")
+        print(f"  - SKILL.md will be saved to: {self.skill_md_path.absolute()}")
+        print(f"  - Original backed up to: {self.skill_md_path.with_suffix('.md.backup').absolute()}")
+        print()
+        print("⏳ Wait for Claude Code to finish in the other terminal...")
+        print("   (Usually takes 30-60 seconds)")
+        print()
+        print("💡 When done:")
+        print(f"  1. Check the enhanced SKILL.md: {self.skill_md_path}")
+        print(f"  2. If you don't like it, restore: mv {self.skill_md_path.with_suffix('.md.backup')} {self.skill_md_path}")
+        print(f"  3. Package: skill-seekers package {self.skill_dir}/")
+
+        return True
+
+    def _run_headless(self, prompt_file, timeout):
+        """Run Claude enhancement in headless mode (no terminal window)
+
+        Args:
+            prompt_file: Path to prompt file
+            timeout: Maximum seconds to wait
+
+        Returns:
+            bool: True if enhancement succeeded
+        """
+
+        print("✨ Running Claude Code enhancement (headless mode)...")
+        print(f"   Timeout: {timeout} seconds ({timeout//60} minutes)")
+        print()
+
+        # Record initial state
+        initial_mtime = self.skill_md_path.stat().st_mtime if self.skill_md_path.exists() else 0
+        initial_size = self.skill_md_path.stat().st_size if self.skill_md_path.exists() else 0
+
+        # Start timer
+        start_time = time.time()
+
+        try:
+            # Run claude command directly (this WAITS for completion)
+            print("   Running: claude {prompt_file}")
+            print("   ⏳ Please wait...")
+            print()
+
+            result = subprocess.run(
+                ['claude', prompt_file],
+                capture_output=True,
+                text=True,
+                timeout=timeout
+            )
+
+            elapsed = time.time() - start_time
+
+            # Check if successful
+            if result.returncode == 0:
+                # Verify SKILL.md was actually updated
+                if self.skill_md_path.exists():
+                    new_mtime = self.skill_md_path.stat().st_mtime
+                    new_size = self.skill_md_path.stat().st_size
+
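+                    # Heuristic: treat the run as successful only if SKILL.md was rewritten
+                    # (newer mtime) and grew in size; a valid but shorter rewrite would be
+                    # reported as a failure.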
+                    if new_mtime > initial_mtime and new_size > initial_size:
+                        print(f"✅ Enhancement complete! ({elapsed:.1f} seconds)")
+                        print(f"   SKILL.md updated: {new_size:,} bytes")
+                        print()
+
+                        # Clean up prompt file
+                        try:
+                            os.unlink(prompt_file)
+                        except OSError:
+                            pass
+
+                        return True
+                    else:
+                        print(f"⚠️  Claude finished but SKILL.md was not updated")
+                        print(f"   This might indicate an error during enhancement")
+                        print()
+                        return False
+                else:
+                    print(f"❌ SKILL.md not found after enhancement")
+                    return False
+            else:
+                print(f"❌ Claude Code returned error (exit code: {result.returncode})")
+                if result.stderr:
+                    print(f"   Error: {result.stderr[:200]}")
+                return False
+
+        except subprocess.TimeoutExpired:
+            elapsed = time.time() - start_time
+            print(f"\n⚠️  Enhancement timed out after {elapsed:.0f} seconds")
+            print(f"   Timeout limit: {timeout} seconds")
+            print()
+            print("   Possible reasons:")
+            print("   - Skill is very large (many references)")
+            print("   - Claude is taking longer than usual")
+            print("   - Network issues")
+            print()
+            print("   Try:")
+            print("   1. Use terminal mode: --interactive-enhancement")
+            print("   2. Reduce reference content")
+            print("   3. Try again later")
+
+            # Clean up
+            try:
+                os.unlink(prompt_file)
+            except OSError:
+                pass
+
+            return False
+
+        except FileNotFoundError:
+            print("❌ 'claude' command not found")
+            print()
+            print("   Make sure Claude Code CLI is installed:")
+            print("   See: https://docs.claude.com/claude-code")
+            print()
+            print("   Try terminal mode instead: --interactive-enhancement")
+
+            return False
+
+        except Exception as e:
+            print(f"❌ Unexpected error: {e}")
+            return False
+
+
+def main():
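+    """CLI entry point: parse arguments and run local enhancement in headless or interactive mode."""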
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description="Enhance a skill with Claude Code (local)",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Headless mode (default - runs in background)
+  skill-seekers enhance output/react/
+
+  # Interactive mode (opens terminal window)
+  skill-seekers enhance output/react/ --interactive-enhancement
+
+  # Custom timeout
+  skill-seekers enhance output/react/ --timeout 1200
+"""
+    )
+
+    parser.add_argument(
+        'skill_directory',
+        help='Path to skill directory (e.g., output/react/)'
+    )
+
+    parser.add_argument(
+        '--interactive-enhancement',
+        action='store_true',
+        help='Open terminal window for enhancement (default: headless mode)'
+    )
+
+    parser.add_argument(
+        '--timeout',
+        type=int,
+        default=600,
+        help='Timeout in seconds for headless mode (default: 600 = 10 minutes)'
+    )
+
+    args = parser.parse_args()
+
+    # Run enhancement
+    enhancer = LocalSkillEnhancer(args.skill_directory)
+    headless = not args.interactive_enhancement  # Invert: default is headless
+    success = enhancer.run(headless=headless, timeout=args.timeout)
+
+    sys.exit(0 if success else 1)
+
+
+if __name__ == "__main__":
+    main()

+ 288 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/estimate_pages.py

@@ -0,0 +1,288 @@
+#!/usr/bin/env python3
+"""
+Page Count Estimator for Skill Seeker
+Quickly estimates how many pages a config will scrape by crawling links only, without saving any content
+"""
+
+import sys
+import os
+import requests
+from bs4 import BeautifulSoup
+from urllib.parse import urljoin, urlparse
+import time
+import json
+
+# Add parent directory to path for imports when run as script
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from skill_seekers.cli.constants import (
+    DEFAULT_RATE_LIMIT,
+    DEFAULT_MAX_DISCOVERY,
+    DISCOVERY_THRESHOLD
+)
+
+
+def estimate_pages(config, max_discovery=DEFAULT_MAX_DISCOVERY, timeout=30):
+    """
+    Estimate total pages that will be scraped
+
+    Args:
+        config: Configuration dictionary
+        max_discovery: Maximum pages to discover (safety limit, use -1 for unlimited)
+        timeout: Timeout for HTTP requests in seconds
+
+    Returns:
+        dict with estimation results
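+        (keys: discovered, pending, estimated_total, elapsed_seconds,
+        discovery_rate, hit_limit, unlimited)
+
+    Example:
+        >>> # Minimal sketch; assumes a config with at least 'name' and 'base_url'
+        >>> results = estimate_pages({'name': 'react', 'base_url': 'https://react.dev/'}, max_discovery=20)
+        >>> results['estimated_total'] >= results['discovered']
+        True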
+    """
+    base_url = config['base_url']
+    start_urls = config.get('start_urls', [base_url])
+    url_patterns = config.get('url_patterns', {'include': [], 'exclude': []})
+    rate_limit = config.get('rate_limit', DEFAULT_RATE_LIMIT)
+
+    visited = set()
+    pending = list(start_urls)
+    discovered = 0
+
+    include_patterns = url_patterns.get('include', [])
+    exclude_patterns = url_patterns.get('exclude', [])
+
+    # Handle unlimited mode
+    unlimited = (max_discovery == -1 or max_discovery is None)
+
+    print(f"🔍 Estimating pages for: {config['name']}")
+    print(f"📍 Base URL: {base_url}")
+    print(f"🎯 Start URLs: {len(start_urls)}")
+    print(f"⏱️  Rate limit: {rate_limit}s")
+
+    if unlimited:
+        print(f"🔢 Max discovery: UNLIMITED (will discover all pages)")
+        print(f"⚠️  WARNING: This may take a long time!")
+    else:
+        print(f"🔢 Max discovery: {max_discovery}")
+
+    print()
+
+    start_time = time.time()
+
+    # Loop condition: stop if no more URLs, or if limit reached (when not unlimited)
+    while pending and (unlimited or discovered < max_discovery):
+        url = pending.pop(0)
+
+        # Skip if already visited
+        if url in visited:
+            continue
+
+        visited.add(url)
+        discovered += 1
+
+        # Progress indicator
+        if discovered % 10 == 0:
+            elapsed = time.time() - start_time
+            rate = discovered / elapsed if elapsed > 0 else 0
+            print(f"⏳ Discovered: {discovered} pages ({rate:.1f} pages/sec)", end='\r')
+
+        try:
+            # HEAD request first to check if page exists (faster)
+            head_response = requests.head(url, timeout=timeout, allow_redirects=True)
+
+            # Skip non-HTML content
+            content_type = head_response.headers.get('Content-Type', '')
+            if 'text/html' not in content_type:
+                continue
+
+            # Now GET the page to find links
+            response = requests.get(url, timeout=timeout)
+            response.raise_for_status()
+
+            soup = BeautifulSoup(response.content, 'html.parser')
+
+            # Find all links
+            for link in soup.find_all('a', href=True):
+                href = link['href']
+                full_url = urljoin(url, href)
+
+                # Normalize URL
+                parsed = urlparse(full_url)
+                full_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+
+                # Check if URL is valid
+                if not is_valid_url(full_url, base_url, include_patterns, exclude_patterns):
+                    continue
+
+                # Add to pending if not visited
+                if full_url not in visited and full_url not in pending:
+                    pending.append(full_url)
+
+            # Rate limiting
+            time.sleep(rate_limit)
+
+        except requests.RequestException:
+            # Silently skip network errors during estimation
+            pass
+        except Exception:
+            # Silently skip any other errors
+            pass
+
+    elapsed = time.time() - start_time
+
+    # Results
+    results = {
+        'discovered': discovered,
+        'pending': len(pending),
+        'estimated_total': discovered + len(pending),
+        'elapsed_seconds': round(elapsed, 2),
+        'discovery_rate': round(discovered / elapsed if elapsed > 0 else 0, 2),
+        'hit_limit': (not unlimited) and (discovered >= max_discovery),
+        'unlimited': unlimited
+    }
+
+    return results
+
+
+def is_valid_url(url, base_url, include_patterns, exclude_patterns):
+    """Check if URL should be crawled"""
+    # Must be same domain
+    if not url.startswith(base_url.rstrip('/')):
+        return False
+
+    # Check exclude patterns first
+    if exclude_patterns:
+        for pattern in exclude_patterns:
+            if pattern in url:
+                return False
+
+    # Check include patterns (if specified)
+    if include_patterns:
+        for pattern in include_patterns:
+            if pattern in url:
+                return True
+        return False
+
+    # If no include patterns, accept by default
+    return True
+
+
+def print_results(results, config):
+    """Print estimation results"""
+    print()
+    print("=" * 70)
+    print("📊 ESTIMATION RESULTS")
+    print("=" * 70)
+    print()
+    print(f"Config: {config['name']}")
+    print(f"Base URL: {config['base_url']}")
+    print()
+    print(f"✅ Pages Discovered: {results['discovered']}")
+    print(f"⏳ Pages Pending: {results['pending']}")
+    print(f"📈 Estimated Total: {results['estimated_total']}")
+    print()
+    print(f"⏱️  Time Elapsed: {results['elapsed_seconds']}s")
+    print(f"⚡ Discovery Rate: {results['discovery_rate']} pages/sec")
+
+    if results.get('unlimited', False):
+        print()
+        print("✅ UNLIMITED MODE - Discovered all reachable pages")
+        print(f"   Total pages: {results['estimated_total']}")
+    elif results['hit_limit']:
+        print()
+        print("⚠️  Hit discovery limit - actual total may be higher")
+        print("   Increase max_discovery parameter for more accurate estimate")
+
+    print()
+    print("=" * 70)
+    print("💡 RECOMMENDATIONS")
+    print("=" * 70)
+    print()
+
+    estimated = results['estimated_total']
+    current_max = config.get('max_pages', 100)
+
+    if estimated <= current_max:
+        print(f"✅ Current max_pages ({current_max}) is sufficient")
+    else:
+        recommended = min(estimated + 50, DISCOVERY_THRESHOLD)  # Add 50 buffer, cap at threshold
+        print(f"⚠️  Current max_pages ({current_max}) may be too low")
+        print(f"📝 Recommended max_pages: {recommended}")
+        print(f"   (Estimated {estimated} + 50 buffer)")
+
+    # Estimate time for full scrape
+    rate_limit = config.get('rate_limit', DEFAULT_RATE_LIMIT)
+    estimated_time = (estimated * rate_limit) / 60  # in minutes
+
+    print()
+    print(f"⏱️  Estimated full scrape time: {estimated_time:.1f} minutes")
+    print(f"   (Based on rate_limit: {rate_limit}s)")
+
+    print()
+
+
+def load_config(config_path):
+    """Load configuration from JSON file"""
+    try:
+        with open(config_path, 'r') as f:
+            config = json.load(f)
+        return config
+    except FileNotFoundError:
+        print(f"❌ Error: Config file not found: {config_path}")
+        sys.exit(1)
+    except json.JSONDecodeError as e:
+        print(f"❌ Error: Invalid JSON in config file: {e}")
+        sys.exit(1)
+
+
+def main():
+    """Main entry point"""
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description='Estimate page count for Skill Seeker configs',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Estimate pages for a config
+  skill-seekers estimate configs/react.json
+
+  # Estimate with higher discovery limit
+  skill-seekers estimate configs/godot.json --max-discovery 2000
+
+  # Quick estimate (stop at 100 pages)
+  skill-seekers estimate configs/vue.json --max-discovery 100
+        """
+    )
+
+    parser.add_argument('config', help='Path to config JSON file')
+    parser.add_argument('--max-discovery', '-m', type=int, default=DEFAULT_MAX_DISCOVERY,
+                       help=f'Maximum pages to discover (default: {DEFAULT_MAX_DISCOVERY}, use -1 for unlimited)')
+    parser.add_argument('--unlimited', '-u', action='store_true',
+                       help='Remove discovery limit - discover all pages (same as --max-discovery -1)')
+    parser.add_argument('--timeout', '-t', type=int, default=30,
+                       help='HTTP request timeout in seconds (default: 30)')
+
+    args = parser.parse_args()
+
+    # Handle unlimited flag
+    max_discovery = -1 if args.unlimited else args.max_discovery
+
+    # Load config
+    config = load_config(args.config)
+
+    # Run estimation
+    try:
+        results = estimate_pages(config, max_discovery, args.timeout)
+        print_results(results, config)
+
+        # Return exit code based on results
+        if results['hit_limit']:
+            return 2  # Warning: hit limit
+        return 0  # Success
+
+    except KeyboardInterrupt:
+        print("\n\n⚠️  Estimation interrupted by user")
+        return 1
+    except Exception as e:
+        print(f"\n\n❌ Error during estimation: {e}")
+        return 1
+
+
+if __name__ == '__main__':
+    sys.exit(main())

+ 274 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/generate_router.py

@@ -0,0 +1,274 @@
+#!/usr/bin/env python3
+"""
+Router Skill Generator
+
+Creates a router/hub skill that intelligently directs queries to specialized sub-skills.
+This is used for large documentation sites split into multiple focused skills.
+"""
+
+import json
+import sys
+import argparse
+from pathlib import Path
+from typing import Dict, List, Any, Tuple
+
+
+class RouterGenerator:
+    """Generates router skills that direct to specialized sub-skills"""
+
+    def __init__(self, config_paths: List[str], router_name: str = None):
+        self.config_paths = [Path(p) for p in config_paths]
+        self.configs = [self.load_config(p) for p in self.config_paths]
+        self.router_name = router_name or self.infer_router_name()
+        self.base_config = self.configs[0]  # Use first as template
+
+    def load_config(self, path: Path) -> Dict[str, Any]:
+        """Load a config file"""
+        try:
+            with open(path, 'r') as f:
+                return json.load(f)
+        except Exception as e:
+            print(f"❌ Error loading {path}: {e}")
+            sys.exit(1)
+
+    def infer_router_name(self) -> str:
+        """Infer router name from sub-skill names"""
+        # Find common prefix
+        names = [cfg['name'] for cfg in self.configs]
+        if not names:
+            return "router"
+
+        # Get common prefix before first dash
+        first_name = names[0]
+        if '-' in first_name:
+            return first_name.split('-')[0]
+        return first_name
+
+    def extract_routing_keywords(self) -> Dict[str, List[str]]:
+        """Extract keywords for routing to each skill"""
+        routing = {}
+
+        for config in self.configs:
+            name = config['name']
+            keywords = []
+
+            # Extract from categories
+            if 'categories' in config:
+                keywords.extend(config['categories'].keys())
+
+            # Extract from name (part after dash)
+            if '-' in name:
+                skill_topic = name.split('-', 1)[1]
+                keywords.append(skill_topic)
+
+            routing[name] = keywords
+
+        return routing
+
+    def generate_skill_md(self) -> str:
+        """Generate router SKILL.md content"""
+        routing_keywords = self.extract_routing_keywords()
+
+        skill_md = f"""# {self.router_name.replace('-', ' ').title()} Documentation (Router)
+
+## When to Use This Skill
+
+{self.base_config.get('description', f'Use for {self.router_name} development and programming.')}
+
+This is a router skill that directs your questions to specialized sub-skills for efficient, focused assistance.
+
+## How It Works
+
+This skill analyzes your question and activates the appropriate specialized skill(s):
+
+"""
+
+        # List sub-skills
+        for config in self.configs:
+            name = config['name']
+            desc = config.get('description', '')
+            # Remove router name prefix from description if present
+            if desc.startswith(f"{self.router_name.title()} -"):
+                desc = desc.split(' - ', 1)[1]
+
+            skill_md += f"### {name}\n{desc}\n\n"
+
+        # Routing logic
+        skill_md += """## Routing Logic
+
+The router analyzes your question for topic keywords and activates relevant skills:
+
+**Keywords → Skills:**
+"""
+
+        for skill_name, keywords in routing_keywords.items():
+            keyword_str = ", ".join(keywords)
+            skill_md += f"- {keyword_str} → **{skill_name}**\n"
+
+        # Quick reference
+        skill_md += f"""
+
+## Quick Reference
+
+For quick answers, this router provides basic overview information. For detailed documentation, the specialized skills contain comprehensive references.
+
+### Getting Started
+
+1. Ask your question naturally - mention the topic area
+2. The router will activate the appropriate skill(s)
+3. You'll receive focused, detailed answers from specialized documentation
+
+### Examples
+
+**Question:** "How do I create a 2D sprite?"
+**Activates:** {self.router_name}-2d skill
+
+**Question:** "GDScript function syntax"
+**Activates:** {self.router_name}-scripting skill
+
+**Question:** "Physics collision handling in 3D"
+**Activates:** {self.router_name}-3d + {self.router_name}-physics skills
+
+### All Available Skills
+
+"""
+
+        # List all skills
+        for config in self.configs:
+            skill_md += f"- **{config['name']}**\n"
+
+        skill_md += f"""
+
+## Need Help?
+
+Simply ask your question and mention the topic. The router will find the right specialized skill for you!
+
+---
+
+*This is a router skill. For complete documentation, see the specialized skills listed above.*
+"""
+
+        return skill_md
+
+    def create_router_config(self) -> Dict[str, Any]:
+        """Create router configuration"""
+        routing_keywords = self.extract_routing_keywords()
+
+        router_config = {
+            "name": self.router_name,
+            "description": self.base_config.get('description', f'{self.router_name.title()} documentation router'),
+            "base_url": self.base_config['base_url'],
+            "selectors": self.base_config.get('selectors', {}),
+            "url_patterns": self.base_config.get('url_patterns', {}),
+            "rate_limit": self.base_config.get('rate_limit', 0.5),
+            "max_pages": 500,  # Router only scrapes overview pages
+            "_router": True,
+            "_sub_skills": [cfg['name'] for cfg in self.configs],
+            "_routing_keywords": routing_keywords
+        }
+
+        return router_config
+
+    def generate(self, output_dir: Path = None) -> Tuple[Path, Path]:
+        """Generate router skill and config"""
+        if output_dir is None:
+            output_dir = self.config_paths[0].parent
+
+        output_dir = Path(output_dir)
+
+        # Generate SKILL.md
+        skill_md = self.generate_skill_md()
+        skill_path = output_dir.parent / f"output/{self.router_name}/SKILL.md"
+        skill_path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(skill_path, 'w') as f:
+            f.write(skill_md)
+
+        # Generate config
+        router_config = self.create_router_config()
+        config_path = output_dir / f"{self.router_name}.json"
+
+        with open(config_path, 'w') as f:
+            json.dump(router_config, f, indent=2)
+
+        return config_path, skill_path
+
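+# Illustrative programmatic use (a sketch; the config paths are hypothetical, taken from the
+# CLI examples below):
+#
+#     generator = RouterGenerator(["configs/godot-2d.json", "configs/godot-3d.json"])
+#     config_path, skill_path = generator.generate()
+#     # The router name is inferred from the common prefix ("godot"), so this would write
+#     # configs/godot.json and output/godot/SKILL.md relative to the working directory.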
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate router/hub skill for split documentation",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Generate router from multiple configs
+  python3 generate_router.py configs/godot-2d.json configs/godot-3d.json configs/godot-scripting.json
+
+  # Use glob pattern
+  python3 generate_router.py configs/godot-*.json
+
+  # Custom router name
+  python3 generate_router.py configs/godot-*.json --name godot-hub
+
+  # Custom output directory
+  python3 generate_router.py configs/godot-*.json --output-dir configs/routers/
+        """
+    )
+
+    parser.add_argument(
+        'configs',
+        nargs='+',
+        help='Sub-skill config files'
+    )
+
+    parser.add_argument(
+        '--name',
+        help='Router skill name (default: inferred from sub-skills)'
+    )
+
+    parser.add_argument(
+        '--output-dir',
+        help='Output directory (default: same as input configs)'
+    )
+
+    args = parser.parse_args()
+
+    # Filter out router configs (avoid recursion)
+    config_files = []
+    for path_str in args.configs:
+        path = Path(path_str)
+        if path.exists() and not path.stem.endswith('-router'):
+            config_files.append(path_str)
+
+    if not config_files:
+        print("❌ Error: No valid config files provided")
+        sys.exit(1)
+
+    print(f"\n{'='*60}")
+    print("ROUTER SKILL GENERATOR")
+    print(f"{'='*60}")
+    print(f"Sub-skills: {len(config_files)}")
+    for cfg in config_files:
+        print(f"  - {Path(cfg).stem}")
+    print("")
+
+    # Generate router
+    generator = RouterGenerator(config_files, args.name)
+    config_path, skill_path = generator.generate(args.output_dir)
+
+    print(f"✅ Router config created: {config_path}")
+    print(f"✅ Router SKILL.md created: {skill_path}")
+    print("")
+    print(f"{'='*60}")
+    print("NEXT STEPS")
+    print(f"{'='*60}")
+    print(f"1. Review router SKILL.md: {skill_path}")
+    print(f"2. Optionally scrape router (for overview pages):")
+    print(f"     skill-seekers scrape --config {config_path}")
+    print("3. Package router skill:")
+    print(f"     skill-seekers package output/{generator.router_name}/")
+    print("4. Upload router + all sub-skills to Claude")
+    print("")
+
+
+if __name__ == "__main__":
+    main()

+ 900 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/github_scraper.py

@@ -0,0 +1,900 @@
+#!/usr/bin/env python3
+"""
+GitHub Repository to Claude Skill Converter (Tasks C1.1-C1.12)
+
+Converts GitHub repositories into Claude AI skills by extracting:
+- README and documentation
+- Code structure and signatures
+- GitHub Issues, Changelog, and Releases
+- Usage examples from tests
+
+Usage:
+    skill-seekers github --repo facebook/react
+    skill-seekers github --config configs/react_github.json
+    skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
+"""
+
+import os
+import sys
+import json
+import re
+import argparse
+import logging
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+from datetime import datetime
+
+try:
+    from github import Github, GithubException, Repository
+    from github.GithubException import RateLimitExceededException
+except ImportError:
+    print("Error: PyGithub not installed. Run: pip install PyGithub")
+    sys.exit(1)
+
+# Configure logging FIRST (before using logger)
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+# Import code analyzer for deep code analysis
+try:
+    from .code_analyzer import CodeAnalyzer
+    CODE_ANALYZER_AVAILABLE = True
+except ImportError:
+    CODE_ANALYZER_AVAILABLE = False
+    logger.warning("Code analyzer not available - deep analysis disabled")
+
+# Directories to exclude from local repository analysis
+EXCLUDED_DIRS = {
+    'venv', 'env', '.venv', '.env',  # Virtual environments
+    'node_modules', '__pycache__', '.pytest_cache',  # Dependencies and caches
+    '.git', '.svn', '.hg',  # Version control
+    'build', 'dist', '*.egg-info',  # Build artifacts
+    'htmlcov', '.coverage',  # Coverage reports
+    '.tox', '.nox',  # Testing environments
+    '.mypy_cache', '.ruff_cache',  # Linter caches
+}
+
+
+class GitHubScraper:
+    """
+    GitHub Repository Scraper (C1.1-C1.9)
+
+    Extracts repository information for skill generation:
+    - Repository structure
+    - README files
+    - Code comments and docstrings
+    - Programming language detection
+    - Function/class signatures
+    - Test examples
+    - GitHub Issues
+    - CHANGELOG
+    - Releases
+    """
+
+    def __init__(self, config: Dict[str, Any], local_repo_path: Optional[str] = None):
+        """Initialize GitHub scraper with configuration."""
+        self.config = config
+        self.repo_name = config['repo']
+        self.name = config.get('name', self.repo_name.split('/')[-1])
+        self.description = config.get('description', f'Skill for {self.repo_name}')
+
+        # Local repository path (optional - enables unlimited analysis)
+        self.local_repo_path = local_repo_path or config.get('local_repo_path')
+        if self.local_repo_path:
+            self.local_repo_path = os.path.expanduser(self.local_repo_path)
+            logger.info(f"Local repository mode enabled: {self.local_repo_path}")
+
+        # Configure directory exclusions (smart defaults + optional customization)
+        self.excluded_dirs = set(EXCLUDED_DIRS)  # Start with smart defaults
+
+        # Option 1: Replace mode - Use only specified exclusions
+        if 'exclude_dirs' in config:
+            self.excluded_dirs = set(config['exclude_dirs'])
+            logger.warning(
+                f"Using custom directory exclusions ({len(self.excluded_dirs)} dirs) - "
+                "defaults overridden"
+            )
+            logger.debug(f"Custom exclusions: {sorted(self.excluded_dirs)}")
+
+        # Option 2: Extend mode - Add to default exclusions
+        elif 'exclude_dirs_additional' in config:
+            additional = set(config['exclude_dirs_additional'])
+            self.excluded_dirs = self.excluded_dirs.union(additional)
+            logger.info(
+                f"Added {len(additional)} custom directory exclusions "
+                f"(total: {len(self.excluded_dirs)})"
+            )
+            logger.debug(f"Additional exclusions: {sorted(additional)}")
+
+        # GitHub client setup (C1.1)
+        token = self._get_token()
+        self.github = Github(token) if token else Github()
+        self.repo: Optional[Repository.Repository] = None
+
+        # Options
+        self.include_issues = config.get('include_issues', True)
+        self.max_issues = config.get('max_issues', 100)
+        self.include_changelog = config.get('include_changelog', True)
+        self.include_releases = config.get('include_releases', True)
+        self.include_code = config.get('include_code', False)
+        self.code_analysis_depth = config.get('code_analysis_depth', 'surface')  # 'surface', 'deep', 'full'
+        self.file_patterns = config.get('file_patterns', [])
+
+        # Initialize code analyzer if deep analysis requested
+        self.code_analyzer = None
+        if self.code_analysis_depth != 'surface' and CODE_ANALYZER_AVAILABLE:
+            self.code_analyzer = CodeAnalyzer(depth=self.code_analysis_depth)
+            logger.info(f"Code analysis depth: {self.code_analysis_depth}")
+
+        # Output paths
+        self.skill_dir = f"output/{self.name}"
+        self.data_file = f"output/{self.name}_github_data.json"
+
+        # Extracted data storage
+        self.extracted_data = {
+            'repo_info': {},
+            'readme': '',
+            'file_tree': [],
+            'languages': {},
+            'signatures': [],
+            'test_examples': [],
+            'issues': [],
+            'changelog': '',
+            'releases': []
+        }
+
+    def _get_token(self) -> Optional[str]:
+        """
+        Get GitHub token from env var or config (both options supported).
+        Priority: GITHUB_TOKEN env var > config file > None
+        """
+        # Try environment variable first (recommended)
+        token = os.getenv('GITHUB_TOKEN')
+        if token:
+            logger.info("Using GitHub token from GITHUB_TOKEN environment variable")
+            return token
+
+        # Fall back to config file
+        token = self.config.get('github_token')
+        if token:
+            logger.warning("Using GitHub token from config file (less secure)")
+            return token
+
+        logger.warning("No GitHub token provided - using unauthenticated access (lower rate limits)")
+        return None
+
+    def scrape(self) -> Dict[str, Any]:
+        """
+        Main scraping entry point.
+        Executes all C1 tasks in sequence.
+        """
+        try:
+            logger.info(f"Starting GitHub scrape for: {self.repo_name}")
+
+            # C1.1: Fetch repository
+            self._fetch_repository()
+
+            # C1.2: Extract README
+            self._extract_readme()
+
+            # C1.3-C1.6: Extract code structure
+            self._extract_code_structure()
+
+            # C1.7: Extract Issues
+            if self.include_issues:
+                self._extract_issues()
+
+            # C1.8: Extract CHANGELOG
+            if self.include_changelog:
+                self._extract_changelog()
+
+            # C1.9: Extract Releases
+            if self.include_releases:
+                self._extract_releases()
+
+            # Save extracted data
+            self._save_data()
+
+            logger.info(f"✅ Scraping complete! Data saved to: {self.data_file}")
+            return self.extracted_data
+
+        except RateLimitExceededException:
+            logger.error("GitHub API rate limit exceeded. Please wait or use authentication token.")
+            raise
+        except GithubException as e:
+            logger.error(f"GitHub API error: {e}")
+            raise
+        except Exception as e:
+            logger.error(f"Unexpected error during scraping: {e}")
+            raise
+
+    def _fetch_repository(self):
+        """C1.1: Fetch repository structure using GitHub API."""
+        logger.info(f"Fetching repository: {self.repo_name}")
+
+        try:
+            self.repo = self.github.get_repo(self.repo_name)
+
+            # Extract basic repo info
+            self.extracted_data['repo_info'] = {
+                'name': self.repo.name,
+                'full_name': self.repo.full_name,
+                'description': self.repo.description,
+                'url': self.repo.html_url,
+                'homepage': self.repo.homepage,
+                'stars': self.repo.stargazers_count,
+                'forks': self.repo.forks_count,
+                'open_issues': self.repo.open_issues_count,
+                'default_branch': self.repo.default_branch,
+                'created_at': self.repo.created_at.isoformat() if self.repo.created_at else None,
+                'updated_at': self.repo.updated_at.isoformat() if self.repo.updated_at else None,
+                'language': self.repo.language,
+                'license': self.repo.license.name if self.repo.license else None,
+                'topics': self.repo.get_topics()
+            }
+
+            logger.info(f"Repository fetched: {self.repo.full_name} ({self.repo.stargazers_count} stars)")
+
+        except GithubException as e:
+            if e.status == 404:
+                raise ValueError(f"Repository not found: {self.repo_name}")
+            raise
+
+    def _extract_readme(self):
+        """C1.2: Extract README.md files."""
+        logger.info("Extracting README...")
+
+        # Try common README locations
+        readme_files = ['README.md', 'README.rst', 'README.txt', 'README',
+                       'docs/README.md', '.github/README.md']
+
+        for readme_path in readme_files:
+            try:
+                content = self.repo.get_contents(readme_path)
+                if content:
+                    self.extracted_data['readme'] = content.decoded_content.decode('utf-8')
+                    logger.info(f"README found: {readme_path}")
+                    return
+            except GithubException:
+                continue
+
+        logger.warning("No README found in repository")
+
+    def _extract_code_structure(self):
+        """
+        C1.3-C1.6: Extract code structure, languages, signatures, and test examples.
+        Surface layer only - no full implementation code.
+        """
+        logger.info("Extracting code structure...")
+
+        # C1.4: Get language breakdown
+        self._extract_languages()
+
+        # Get file tree
+        self._extract_file_tree()
+
+        # Extract signatures and test examples
+        if self.include_code:
+            self._extract_signatures_and_tests()
+
+    def _extract_languages(self):
+        """C1.4: Detect programming languages in repository."""
+        logger.info("Detecting programming languages...")
+
+        try:
+            languages = self.repo.get_languages()
+            total_bytes = sum(languages.values())
+
+            self.extracted_data['languages'] = {
+                lang: {
+                    'bytes': bytes_count,
+                    'percentage': round((bytes_count / total_bytes) * 100, 2) if total_bytes > 0 else 0
+                }
+                for lang, bytes_count in languages.items()
+            }
+
+            logger.info(f"Languages detected: {', '.join(languages.keys())}")
+
+        except GithubException as e:
+            logger.warning(f"Could not fetch languages: {e}")
+
+    def should_exclude_dir(self, dir_name: str) -> bool:
+        """Check if directory should be excluded from analysis."""
+        # '*.egg-info' in EXCLUDED_DIRS is a glob that exact set membership misses; match it explicitly
+        return dir_name in self.excluded_dirs or dir_name.startswith('.') or dir_name.endswith('.egg-info')
+
+    def _extract_file_tree(self):
+        """Extract repository file tree structure (dual-mode: GitHub API or local filesystem)."""
+        logger.info("Building file tree...")
+
+        if self.local_repo_path:
+            # Local filesystem mode - unlimited files
+            self._extract_file_tree_local()
+        else:
+            # GitHub API mode - limited by API rate limits
+            self._extract_file_tree_github()
+
+    def _extract_file_tree_local(self):
+        """Extract file tree from local filesystem (unlimited files)."""
+        if not os.path.exists(self.local_repo_path):
+            logger.error(f"Local repository path not found: {self.local_repo_path}")
+            return
+
+        file_tree = []
+        for root, dirs, files in os.walk(self.local_repo_path):
+            # Exclude directories in-place to prevent os.walk from descending into them
+            dirs[:] = [d for d in dirs if not self.should_exclude_dir(d)]
+
+            # Calculate relative path from repo root
+            rel_root = os.path.relpath(root, self.local_repo_path)
+            if rel_root == '.':
+                rel_root = ''
+
+            # Add directories
+            for dir_name in dirs:
+                dir_path = os.path.join(rel_root, dir_name) if rel_root else dir_name
+                file_tree.append({
+                    'path': dir_path,
+                    'type': 'dir',
+                    'size': None
+                })
+
+            # Add files
+            for file_name in files:
+                file_path = os.path.join(rel_root, file_name) if rel_root else file_name
+                full_path = os.path.join(root, file_name)
+                try:
+                    file_size = os.path.getsize(full_path)
+                except OSError:
+                    file_size = None
+
+                file_tree.append({
+                    'path': file_path,
+                    'type': 'file',
+                    'size': file_size
+                })
+
+        self.extracted_data['file_tree'] = file_tree
+        logger.info(f"File tree built (local mode): {len(file_tree)} items")
+
+    def _extract_file_tree_github(self):
+        """Extract file tree from GitHub API (rate-limited)."""
+        try:
+            contents = self.repo.get_contents("")
+            file_tree = []
+
+            while contents:
+                file_content = contents.pop(0)
+
+                file_info = {
+                    'path': file_content.path,
+                    'type': file_content.type,
+                    'size': file_content.size if file_content.type == 'file' else None
+                }
+                file_tree.append(file_info)
+
+                if file_content.type == "dir":
+                    contents.extend(self.repo.get_contents(file_content.path))
+
+            self.extracted_data['file_tree'] = file_tree
+            logger.info(f"File tree built (GitHub API mode): {len(file_tree)} items")
+
+        except GithubException as e:
+            logger.warning(f"Could not build file tree: {e}")
+
+    def _extract_signatures_and_tests(self):
+        """
+        C1.3, C1.5, C1.6: Extract signatures, docstrings, and test examples.
+
+        Extraction depth depends on code_analysis_depth setting:
+        - surface: File tree only (minimal)
+        - deep: Parse files for signatures, parameters, types
+        - full: Complete AST analysis (future enhancement)
+        """
+        if self.code_analysis_depth == 'surface':
+            logger.info("Code extraction: Surface level (file tree only)")
+            return
+
+        if not self.code_analyzer:
+            logger.warning("Code analyzer not available - skipping deep analysis")
+            return
+
+        logger.info(f"Extracting code signatures ({self.code_analysis_depth} analysis)...")
+
+        # Get primary language for the repository
+        languages = self.extracted_data.get('languages', {})
+        if not languages:
+            logger.warning("No languages detected - skipping code analysis")
+            return
+
+        # Determine primary language
+        primary_language = max(languages.items(), key=lambda x: x[1]['bytes'])[0]
+        logger.info(f"Primary language: {primary_language}")
+
+        # Determine file extensions to analyze
+        extension_map = {
+            'Python': ['.py'],
+            'JavaScript': ['.js', '.jsx'],
+            'TypeScript': ['.ts', '.tsx'],
+            'C': ['.c', '.h'],
+            'C++': ['.cpp', '.hpp', '.cc', '.hh', '.cxx']
+        }
+
+        extensions = extension_map.get(primary_language, [])
+        if not extensions:
+            logger.warning(f"No file extensions mapped for {primary_language}")
+            return
+
+        # Analyze files matching patterns and extensions
+        analyzed_files = []
+        file_tree = self.extracted_data.get('file_tree', [])
+
+        for file_info in file_tree:
+            file_path = file_info['path']
+
+            # Check if file matches extension
+            if not any(file_path.endswith(ext) for ext in extensions):
+                continue
+
+            # Check if file matches patterns (if specified)
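+            # e.g. "file_patterns": ["src/*.py", "*/tests/*.py"] in the config
+            # (fnmatch-style globs matched against the repo-relative path; illustrative values)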
+            if self.file_patterns:
+                import fnmatch
+                if not any(fnmatch.fnmatch(file_path, pattern) for pattern in self.file_patterns):
+                    continue
+
+            # Analyze this file
+            try:
+                # Read file content based on mode
+                if self.local_repo_path:
+                    # Local mode - read from filesystem
+                    full_path = os.path.join(self.local_repo_path, file_path)
+                    with open(full_path, 'r', encoding='utf-8') as f:
+                        content = f.read()
+                else:
+                    # GitHub API mode - fetch from API
+                    file_content = self.repo.get_contents(file_path)
+                    content = file_content.decoded_content.decode('utf-8')
+
+                analysis_result = self.code_analyzer.analyze_file(
+                    file_path,
+                    content,
+                    primary_language
+                )
+
+                if analysis_result and (analysis_result.get('classes') or analysis_result.get('functions')):
+                    analyzed_files.append({
+                        'file': file_path,
+                        'language': primary_language,
+                        **analysis_result
+                    })
+
+                    logger.debug(f"Analyzed {file_path}: "
+                               f"{len(analysis_result.get('classes', []))} classes, "
+                               f"{len(analysis_result.get('functions', []))} functions")
+
+            except Exception as e:
+                logger.debug(f"Could not analyze {file_path}: {e}")
+                continue
+
+            # Limit number of files analyzed to avoid rate limits (GitHub API mode only)
+            if not self.local_repo_path and len(analyzed_files) >= 50:
+                logger.info(f"Reached analysis limit (50 files, GitHub API mode)")
+                break
+
+        self.extracted_data['code_analysis'] = {
+            'depth': self.code_analysis_depth,
+            'language': primary_language,
+            'files_analyzed': len(analyzed_files),
+            'files': analyzed_files
+        }
+
+        # Calculate totals
+        total_classes = sum(len(f.get('classes', [])) for f in analyzed_files)
+        total_functions = sum(len(f.get('functions', [])) for f in analyzed_files)
+
+        logger.info(f"Code analysis complete: {len(analyzed_files)} files, "
+                   f"{total_classes} classes, {total_functions} functions")
+
+    def _extract_issues(self):
+        """C1.7: Extract GitHub Issues (open/closed, labels, milestones)."""
+        logger.info(f"Extracting GitHub Issues (max {self.max_issues})...")
+
+        try:
+            # Fetch recent issues (open + closed)
+            issues = self.repo.get_issues(state='all', sort='updated', direction='desc')
+
+            issue_list = []
+            for issue in issues[:self.max_issues]:
+                # Skip pull requests (they appear in issues)
+                if issue.pull_request:
+                    continue
+
+                issue_data = {
+                    'number': issue.number,
+                    'title': issue.title,
+                    'state': issue.state,
+                    'labels': [label.name for label in issue.labels],
+                    'milestone': issue.milestone.title if issue.milestone else None,
+                    'created_at': issue.created_at.isoformat() if issue.created_at else None,
+                    'updated_at': issue.updated_at.isoformat() if issue.updated_at else None,
+                    'closed_at': issue.closed_at.isoformat() if issue.closed_at else None,
+                    'url': issue.html_url,
+                    'body': issue.body[:500] if issue.body else None  # First 500 chars
+                }
+                issue_list.append(issue_data)
+
+            self.extracted_data['issues'] = issue_list
+            logger.info(f"Extracted {len(issue_list)} issues")
+
+        except GithubException as e:
+            logger.warning(f"Could not fetch issues: {e}")
+
+    def _extract_changelog(self):
+        """C1.8: Extract CHANGELOG.md and release notes."""
+        logger.info("Extracting CHANGELOG...")
+
+        # Try common changelog locations
+        changelog_files = ['CHANGELOG.md', 'CHANGES.md', 'HISTORY.md',
+                          'CHANGELOG.rst', 'CHANGELOG.txt', 'CHANGELOG',
+                          'docs/CHANGELOG.md', '.github/CHANGELOG.md']
+
+        for changelog_path in changelog_files:
+            try:
+                content = self.repo.get_contents(changelog_path)
+                if content:
+                    self.extracted_data['changelog'] = content.decoded_content.decode('utf-8')
+                    logger.info(f"CHANGELOG found: {changelog_path}")
+                    return
+            except GithubException:
+                continue
+
+        logger.warning("No CHANGELOG found in repository")
+
+    def _extract_releases(self):
+        """C1.9: Extract GitHub Releases with version history."""
+        logger.info("Extracting GitHub Releases...")
+
+        try:
+            releases = self.repo.get_releases()
+
+            release_list = []
+            for release in releases:
+                release_data = {
+                    'tag_name': release.tag_name,
+                    'name': release.title,
+                    'body': release.body,
+                    'draft': release.draft,
+                    'prerelease': release.prerelease,
+                    'created_at': release.created_at.isoformat() if release.created_at else None,
+                    'published_at': release.published_at.isoformat() if release.published_at else None,
+                    'url': release.html_url,
+                    'tarball_url': release.tarball_url,
+                    'zipball_url': release.zipball_url
+                }
+                release_list.append(release_data)
+
+            self.extracted_data['releases'] = release_list
+            logger.info(f"Extracted {len(release_list)} releases")
+
+        except GithubException as e:
+            logger.warning(f"Could not fetch releases: {e}")
+
+    def _save_data(self):
+        """Save extracted data to JSON file."""
+        os.makedirs('output', exist_ok=True)
+
+        with open(self.data_file, 'w', encoding='utf-8') as f:
+            json.dump(self.extracted_data, f, indent=2, ensure_ascii=False)
+
+        logger.info(f"Data saved to: {self.data_file}")
+
+
+class GitHubToSkillConverter:
+    """
+    Convert extracted GitHub data to Claude skill format (C1.10).
+    """
+
+    def __init__(self, config: Dict[str, Any]):
+        """Initialize converter with configuration."""
+        self.config = config
+        self.name = config.get('name', config['repo'].split('/')[-1])
+        self.description = config.get('description', f'Skill for {config["repo"]}')
+
+        # Paths
+        self.data_file = f"output/{self.name}_github_data.json"
+        self.skill_dir = f"output/{self.name}"
+
+        # Load extracted data
+        self.data = self._load_data()
+
+    def _load_data(self) -> Dict[str, Any]:
+        """Load extracted GitHub data from JSON."""
+        if not os.path.exists(self.data_file):
+            raise FileNotFoundError(f"Data file not found: {self.data_file}")
+
+        with open(self.data_file, 'r', encoding='utf-8') as f:
+            return json.load(f)
+
+    def build_skill(self):
+        """Build complete skill structure."""
+        logger.info(f"Building skill for: {self.name}")
+
+        # Create directories
+        os.makedirs(self.skill_dir, exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
+
+        # Generate SKILL.md
+        self._generate_skill_md()
+
+        # Generate reference files
+        self._generate_references()
+
+        logger.info(f"✅ Skill built successfully: {self.skill_dir}/")
+
+    def _generate_skill_md(self):
+        """Generate main SKILL.md file."""
+        repo_info = self.data.get('repo_info', {})
+
+        # Generate skill name (lowercase, hyphens only, max 64 chars)
+        skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
+
+        # Truncate description to 1024 chars if needed
+        desc = self.description[:1024] if len(self.description) > 1024 else self.description
+
+        skill_content = f"""---
+name: {skill_name}
+description: {desc}
+---
+
+# {repo_info.get('name', self.name)}
+
+{self.description}
+
+## Description
+
+{repo_info.get('description', 'GitHub repository skill')}
+
+**Repository:** [{repo_info.get('full_name', 'N/A')}]({repo_info.get('url', '#')})
+**Language:** {repo_info.get('language', 'N/A')}
+**Stars:** {repo_info.get('stars', 0):,}
+**License:** {repo_info.get('license', 'N/A')}
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Understand how to use {self.name}
+- Look up API documentation
+- Find usage examples
+- Check for known issues or recent changes
+- Review release history
+
+## Quick Reference
+
+### Repository Info
+- **Homepage:** {repo_info.get('homepage', 'N/A')}
+- **Topics:** {', '.join(repo_info.get('topics', []))}
+- **Open Issues:** {repo_info.get('open_issues', 0)}
+- **Last Updated:** {(repo_info.get('updated_at') or 'N/A')[:10]}
+
+### Languages
+{self._format_languages()}
+
+### Recent Releases
+{self._format_recent_releases()}
+
+## Available References
+
+- `references/README.md` - Complete README documentation
+- `references/CHANGELOG.md` - Version history and changes
+- `references/issues.md` - Recent GitHub issues
+- `references/releases.md` - Release notes
+- `references/file_structure.md` - Repository structure
+
+## Usage
+
+See README.md for complete usage instructions and examples.
+
+---
+
+**Generated by Skill Seeker** | GitHub Repository Scraper
+"""
+
+        skill_path = f"{self.skill_dir}/SKILL.md"
+        with open(skill_path, 'w', encoding='utf-8') as f:
+            f.write(skill_content)
+
+        logger.info(f"Generated: {skill_path}")
+
+    def _format_languages(self) -> str:
+        """Format language breakdown."""
+        languages = self.data.get('languages', {})
+        if not languages:
+            return "No language data available"
+
+        lines = []
+        for lang, info in sorted(languages.items(), key=lambda x: x[1]['bytes'], reverse=True):
+            lines.append(f"- **{lang}:** {info['percentage']:.1f}%")
+
+        return '\n'.join(lines)
+
+    def _format_recent_releases(self) -> str:
+        """Format recent releases (top 3)."""
+        releases = self.data.get('releases', [])
+        if not releases:
+            return "No releases available"
+
+        lines = []
+        for release in releases[:3]:
+            lines.append(f"- **{release['tag_name']}** ({release['published_at'][:10]}): {release['name']}")
+
+        return '\n'.join(lines)
+
+    def _generate_references(self):
+        """Generate all reference files."""
+        # README
+        if self.data.get('readme'):
+            readme_path = f"{self.skill_dir}/references/README.md"
+            with open(readme_path, 'w', encoding='utf-8') as f:
+                f.write(self.data['readme'])
+            logger.info(f"Generated: {readme_path}")
+
+        # CHANGELOG
+        if self.data.get('changelog'):
+            changelog_path = f"{self.skill_dir}/references/CHANGELOG.md"
+            with open(changelog_path, 'w', encoding='utf-8') as f:
+                f.write(self.data['changelog'])
+            logger.info(f"Generated: {changelog_path}")
+
+        # Issues
+        if self.data.get('issues'):
+            self._generate_issues_reference()
+
+        # Releases
+        if self.data.get('releases'):
+            self._generate_releases_reference()
+
+        # File structure
+        if self.data.get('file_tree'):
+            self._generate_file_structure_reference()
+
+    def _generate_issues_reference(self):
+        """Generate issues.md reference file."""
+        issues = self.data['issues']
+
+        content = f"# GitHub Issues\n\nRecent issues from the repository ({len(issues)} total).\n\n"
+
+        # Group by state
+        open_issues = [i for i in issues if i['state'] == 'open']
+        closed_issues = [i for i in issues if i['state'] == 'closed']
+
+        content += f"## Open Issues ({len(open_issues)})\n\n"
+        for issue in open_issues[:20]:
+            labels = ', '.join(issue['labels']) if issue['labels'] else 'No labels'
+            content += f"### #{issue['number']}: {issue['title']}\n"
+            content += f"**Labels:** {labels} | **Created:** {issue['created_at'][:10]}\n"
+            content += f"[View on GitHub]({issue['url']})\n\n"
+
+        content += f"\n## Recently Closed Issues ({len(closed_issues)})\n\n"
+        for issue in closed_issues[:10]:
+            labels = ', '.join(issue['labels']) if issue['labels'] else 'No labels'
+            content += f"### #{issue['number']}: {issue['title']}\n"
+            content += f"**Labels:** {labels} | **Closed:** {issue['closed_at'][:10]}\n"
+            content += f"[View on GitHub]({issue['url']})\n\n"
+
+        issues_path = f"{self.skill_dir}/references/issues.md"
+        with open(issues_path, 'w', encoding='utf-8') as f:
+            f.write(content)
+        logger.info(f"Generated: {issues_path}")
+
+    def _generate_releases_reference(self):
+        """Generate releases.md reference file."""
+        releases = self.data['releases']
+
+        content = f"# Releases\n\nVersion history for this repository ({len(releases)} releases).\n\n"
+
+        for release in releases:
+            content += f"## {release['tag_name']}: {release['name']}\n"
+            content += f"**Published:** {release['published_at'][:10]}\n"
+            if release['prerelease']:
+                content += f"**Pre-release**\n"
+            content += f"\n{release['body']}\n\n"
+            content += f"[View on GitHub]({release['url']})\n\n---\n\n"
+
+        releases_path = f"{self.skill_dir}/references/releases.md"
+        with open(releases_path, 'w', encoding='utf-8') as f:
+            f.write(content)
+        logger.info(f"Generated: {releases_path}")
+
+    def _generate_file_structure_reference(self):
+        """Generate file_structure.md reference file."""
+        file_tree = self.data['file_tree']
+
+        content = f"# Repository File Structure\n\n"
+        content += f"Total items: {len(file_tree)}\n\n"
+        content += "```\n"
+
+        # Build tree structure
+        for item in file_tree:
+            indent = "  " * item['path'].count('/')
+            icon = "📁" if item['type'] == 'dir' else "📄"
+            content += f"{indent}{icon} {os.path.basename(item['path'])}\n"
+
+        content += "```\n"
+
+        structure_path = f"{self.skill_dir}/references/file_structure.md"
+        with open(structure_path, 'w', encoding='utf-8') as f:
+            f.write(content)
+        logger.info(f"Generated: {structure_path}")
+
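+# Illustrative two-phase use from Python (a sketch; the repo and name are example values):
+#
+#     config = {"repo": "facebook/react", "name": "react"}
+#     GitHubScraper(config).scrape()                 # writes output/react_github_data.json
+#     GitHubToSkillConverter(config).build_skill()   # reads that JSON, writes output/react/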
+
+def main():
+    """C1.10: CLI tool entry point."""
+    parser = argparse.ArgumentParser(
+        description='GitHub Repository to Claude Skill Converter',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  skill-seekers github --repo facebook/react
+  skill-seekers github --config configs/react_github.json
+  skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
+        """
+    )
+
+    parser.add_argument('--repo', help='GitHub repository (owner/repo)')
+    parser.add_argument('--config', help='Path to config JSON file')
+    parser.add_argument('--token', help='GitHub personal access token')
+    parser.add_argument('--name', help='Skill name (default: repo name)')
+    parser.add_argument('--description', help='Skill description')
+    parser.add_argument('--no-issues', action='store_true', help='Skip GitHub issues')
+    parser.add_argument('--no-changelog', action='store_true', help='Skip CHANGELOG')
+    parser.add_argument('--no-releases', action='store_true', help='Skip releases')
+    parser.add_argument('--max-issues', type=int, default=100, help='Max issues to fetch')
+    parser.add_argument('--scrape-only', action='store_true', help='Only scrape, don\'t build skill')
+
+    args = parser.parse_args()
+
+    # Build config from args or file
+    if args.config:
+        with open(args.config, 'r') as f:
+            config = json.load(f)
+    elif args.repo:
+        config = {
+            'repo': args.repo,
+            'name': args.name or args.repo.split('/')[-1],
+            'description': args.description or f'GitHub repository skill for {args.repo}',
+            'github_token': args.token,
+            'include_issues': not args.no_issues,
+            'include_changelog': not args.no_changelog,
+            'include_releases': not args.no_releases,
+            'max_issues': args.max_issues
+        }
+    else:
+        parser.error('Either --repo or --config is required')
+
+    try:
+        # Phase 1: Scrape GitHub repository
+        scraper = GitHubScraper(config)
+        scraper.scrape()
+
+        if args.scrape_only:
+            logger.info("Scrape complete (--scrape-only mode)")
+            return
+
+        # Phase 2: Build skill
+        converter = GitHubToSkillConverter(config)
+        converter.build_skill()
+
+        logger.info(f"\n✅ Success! Skill created at: output/{config.get('name', config['repo'].split('/')[-1])}/")
+        logger.info(f"Next step: skill-seekers-package output/{config.get('name', config['repo'].split('/')[-1])}/")
+
+    except Exception as e:
+        logger.error(f"Error: {e}")
+        sys.exit(1)
+
+
+if __name__ == '__main__':
+    main()

+ 66 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_detector.py

@@ -0,0 +1,66 @@
+# ABOUTME: Detects and validates llms.txt file availability at documentation URLs
+# ABOUTME: Supports llms-full.txt, llms.txt, and llms-small.txt variants
+
+import requests
+from typing import Optional, Dict, List
+from urllib.parse import urlparse
+
+class LlmsTxtDetector:
+    """Detect llms.txt files at documentation URLs"""
+
+    VARIANTS = [
+        ('llms-full.txt', 'full'),
+        ('llms.txt', 'standard'),
+        ('llms-small.txt', 'small')
+    ]
+
+    def __init__(self, base_url: str):
+        self.base_url = base_url.rstrip('/')
+
+    def detect(self) -> Optional[Dict[str, str]]:
+        """
+        Detect available llms.txt variant.
+
+        Returns:
+            Dict with 'url' and 'variant' keys, or None if not found
+        """
+        parsed = urlparse(self.base_url)
+        root_url = f"{parsed.scheme}://{parsed.netloc}"
+
+        for filename, variant in self.VARIANTS:
+            url = f"{root_url}/{filename}"
+
+            if self._check_url_exists(url):
+                return {'url': url, 'variant': variant}
+
+        return None
+
+    def detect_all(self) -> List[Dict[str, str]]:
+        """
+        Detect all available llms.txt variants.
+
+        Returns:
+            List of dicts with 'url' and 'variant' keys for each found variant
+        """
+        found_variants = []
+
+        for filename, variant in self.VARIANTS:
+            parsed = urlparse(self.base_url)
+            root_url = f"{parsed.scheme}://{parsed.netloc}"
+            url = f"{root_url}/{filename}"
+
+            if self._check_url_exists(url):
+                found_variants.append({
+                    'url': url,
+                    'variant': variant
+                })
+
+        return found_variants
+
+    def _check_url_exists(self, url: str) -> bool:
+        """Check if URL returns 200 status"""
+        try:
+            response = requests.head(url, timeout=5, allow_redirects=True)
+            return response.status_code == 200
+        except requests.RequestException:
+            return False
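+
+# Illustrative use (a sketch; the URL is an example site and may or may not publish llms.txt):
+#
+#     detector = LlmsTxtDetector("https://hono.dev/docs")
+#     found = detector.detect()
+#     if found:
+#         print(f"{found['variant']} variant available at {found['url']}")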

+ 94 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_downloader.py

@@ -0,0 +1,94 @@
+"""ABOUTME: Downloads llms.txt files from documentation URLs with retry logic"""
+"""ABOUTME: Validates markdown content and handles timeouts with exponential backoff"""
+
+import requests
+import time
+from typing import Optional
+
+class LlmsTxtDownloader:
+    """Download llms.txt content from URLs with retry logic"""
+
+    def __init__(self, url: str, timeout: int = 30, max_retries: int = 3):
+        self.url = url
+        self.timeout = timeout
+        self.max_retries = max_retries
+
+    def get_proper_filename(self) -> str:
+        """
+        Extract filename from URL and convert .txt to .md
+
+        Returns:
+            Proper filename with .md extension
+
+        Examples:
+            https://hono.dev/llms-full.txt -> llms-full.md
+            https://hono.dev/llms.txt -> llms.md
+            https://hono.dev/llms-small.txt -> llms-small.md
+        """
+        # Extract filename from URL
+        from urllib.parse import urlparse
+        parsed = urlparse(self.url)
+        filename = parsed.path.split('/')[-1]
+
+        # Replace .txt with .md
+        if filename.endswith('.txt'):
+            filename = filename[:-4] + '.md'
+
+        return filename
+
+    def _is_markdown(self, content: str) -> bool:
+        """
+        Check if content looks like markdown.
+
+        Returns:
+            True if content contains markdown patterns
+        """
+        markdown_patterns = ['# ', '## ', '```', '- ', '* ', '`']
+        return any(pattern in content for pattern in markdown_patterns)
+
+    def download(self) -> Optional[str]:
+        """
+        Download llms.txt content with retry logic.
+
+        Returns:
+            String content or None if download fails
+        """
+        headers = {
+            'User-Agent': 'Skill-Seekers-llms.txt-Reader/1.0'
+        }
+
+        for attempt in range(self.max_retries):
+            try:
+                response = requests.get(
+                    self.url,
+                    headers=headers,
+                    timeout=self.timeout
+                )
+                response.raise_for_status()
+
+                content = response.text
+
+                # Validate content is not empty
+                if len(content) < 100:
+                    print(f"⚠️  Content too short ({len(content)} chars), rejecting")
+                    return None
+
+                # Validate content looks like markdown
+                if not self._is_markdown(content):
+                    print(f"⚠️  Content doesn't look like markdown")
+                    return None
+
+                return content
+
+            except requests.RequestException as e:
+                if attempt < self.max_retries - 1:
+                    # Calculate exponential backoff delay: 1s, 2s, 4s, etc.
+                    delay = 2 ** attempt
+                    print(f"⚠️  Attempt {attempt + 1}/{self.max_retries} failed: {e}")
+                    print(f"   Retrying in {delay}s...")
+                    time.sleep(delay)
+                else:
+                    print(f"❌ Failed to download {self.url} after {self.max_retries} attempts: {e}")
+                    return None
+
+        return None
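+
+# Illustrative use (a sketch; URL taken from the examples above):
+#
+#     downloader = LlmsTxtDownloader("https://hono.dev/llms-full.txt")
+#     content = downloader.download()
+#     if content:
+#         with open(downloader.get_proper_filename(), "w") as f:   # -> llms-full.md
+#             f.write(content)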

+ 74 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/llms_txt_parser.py

@@ -0,0 +1,74 @@
+"""ABOUTME: Parses llms.txt markdown content into structured page data"""
+"""ABOUTME: Extracts titles, content, code samples, and headings from markdown"""
+
+import re
+from typing import List, Dict
+
+class LlmsTxtParser:
+    """Parse llms.txt markdown content into page structures"""
+
+    def __init__(self, content: str):
+        self.content = content
+
+    def parse(self) -> List[Dict]:
+        """
+        Parse markdown content into page structures.
+
+        Returns:
+            List of page dicts with title, content, code_samples, headings
+        """
+        pages = []
+
+        # Split by h1 headers (# Title)
+        sections = re.split(r'\n# ', self.content)
+
+        for section in sections:
+            if not section.strip():
+                continue
+
+            # First line is title
+            lines = section.split('\n')
+            title = lines[0].strip('#').strip()
+
+            # Parse content
+            page = self._parse_section('\n'.join(lines[1:]), title)
+            pages.append(page)
+
+        return pages
+
+    def _parse_section(self, content: str, title: str) -> Dict:
+        """Parse a single section into page structure"""
+        page = {
+            'title': title,
+            'content': '',
+            'code_samples': [],
+            'headings': [],
+            'url': f'llms-txt#{title.lower().replace(" ", "-")}',
+            'links': []
+        }
+
+        # Extract code blocks
+        code_blocks = re.findall(r'```(\w+)?\n(.*?)```', content, re.DOTALL)
+        for lang, code in code_blocks:
+            page['code_samples'].append({
+                'code': code.strip(),
+                'language': lang or 'unknown'
+            })
+
+        # Extract h2/h3 headings
+        headings = re.findall(r'^(#{2,3})\s+(.+)$', content, re.MULTILINE)
+        for level_markers, text in headings:
+            page['headings'].append({
+                'level': f'h{len(level_markers)}',
+                'text': text.strip(),
+                'id': text.lower().replace(' ', '-')
+            })
+
+        # Remove code blocks from content for plain text
+        content_no_code = re.sub(r'```.*?```', '', content, flags=re.DOTALL)
+
+        # Extract paragraphs
+        paragraphs = [p.strip() for p in content_no_code.split('\n\n') if len(p.strip()) > 20]
+        page['content'] = '\n\n'.join(paragraphs)
+
+        return page
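+
+# Illustrative use (a sketch; `content` would come from LlmsTxtDownloader above):
+#
+#     pages = LlmsTxtParser(content).parse()
+#     for page in pages:
+#         print(page['title'], len(page['code_samples']), 'code samples')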

+ 285 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/main.py

@@ -0,0 +1,285 @@
+#!/usr/bin/env python3
+"""
+Skill Seekers - Unified CLI Entry Point
+
+Provides a git-style unified command-line interface for all Skill Seekers tools.
+
+Usage:
+    skill-seekers <command> [options]
+
+Commands:
+    scrape      Scrape documentation website
+    github      Scrape GitHub repository
+    pdf         Extract from PDF file
+    unified     Multi-source scraping (docs + GitHub + PDF)
+    enhance     AI-powered enhancement (local, no API key)
+    package     Package skill into .zip file
+    upload      Upload skill to Claude
+    estimate    Estimate page count before scraping
+
+Examples:
+    skill-seekers scrape --config configs/react.json
+    skill-seekers github --repo microsoft/TypeScript
+    skill-seekers unified --config configs/react_unified.json
+    skill-seekers package output/react/
+"""
+
+import sys
+import argparse
+from typing import List, Optional
+
+
+def create_parser() -> argparse.ArgumentParser:
+    """Create the main argument parser with subcommands."""
+    parser = argparse.ArgumentParser(
+        prog="skill-seekers",
+        description="Convert documentation, GitHub repos, and PDFs into Claude AI skills",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Scrape documentation
+  skill-seekers scrape --config configs/react.json
+
+  # Scrape GitHub repository
+  skill-seekers github --repo microsoft/TypeScript --name typescript
+
+  # Multi-source scraping (unified)
+  skill-seekers unified --config configs/react_unified.json
+
+  # AI-powered enhancement
+  skill-seekers enhance output/react/
+
+  # Package and upload
+  skill-seekers package output/react/
+  skill-seekers upload output/react.zip
+
+For more information: https://github.com/yusufkaraaslan/Skill_Seekers
+        """
+    )
+
+    parser.add_argument(
+        "--version",
+        action="version",
+        version="%(prog)s 2.1.1"
+    )
+
+    subparsers = parser.add_subparsers(
+        dest="command",
+        title="commands",
+        description="Available Skill Seekers commands",
+        help="Command to run"
+    )
+
+    # === scrape subcommand ===
+    scrape_parser = subparsers.add_parser(
+        "scrape",
+        help="Scrape documentation website",
+        description="Scrape documentation website and generate skill"
+    )
+    scrape_parser.add_argument("--config", help="Config JSON file")
+    scrape_parser.add_argument("--name", help="Skill name")
+    scrape_parser.add_argument("--url", help="Documentation URL")
+    scrape_parser.add_argument("--description", help="Skill description")
+    scrape_parser.add_argument("--skip-scrape", action="store_true", help="Skip scraping, use cached data")
+    scrape_parser.add_argument("--enhance", action="store_true", help="AI enhancement (API)")
+    scrape_parser.add_argument("--enhance-local", action="store_true", help="AI enhancement (local)")
+    scrape_parser.add_argument("--dry-run", action="store_true", help="Dry run mode")
+    scrape_parser.add_argument("--async", dest="async_mode", action="store_true", help="Use async scraping")
+    scrape_parser.add_argument("--workers", type=int, help="Number of async workers")
+
+    # === github subcommand ===
+    github_parser = subparsers.add_parser(
+        "github",
+        help="Scrape GitHub repository",
+        description="Scrape GitHub repository and generate skill"
+    )
+    github_parser.add_argument("--config", help="Config JSON file")
+    github_parser.add_argument("--repo", help="GitHub repo (owner/repo)")
+    github_parser.add_argument("--name", help="Skill name")
+    github_parser.add_argument("--description", help="Skill description")
+
+    # === pdf subcommand ===
+    pdf_parser = subparsers.add_parser(
+        "pdf",
+        help="Extract from PDF file",
+        description="Extract content from PDF and generate skill"
+    )
+    pdf_parser.add_argument("--config", help="Config JSON file")
+    pdf_parser.add_argument("--pdf", help="PDF file path")
+    pdf_parser.add_argument("--name", help="Skill name")
+    pdf_parser.add_argument("--description", help="Skill description")
+    pdf_parser.add_argument("--from-json", help="Build from extracted JSON")
+
+    # === unified subcommand ===
+    unified_parser = subparsers.add_parser(
+        "unified",
+        help="Multi-source scraping (docs + GitHub + PDF)",
+        description="Combine multiple sources into one skill"
+    )
+    unified_parser.add_argument("--config", required=True, help="Unified config JSON file")
+    unified_parser.add_argument("--merge-mode", help="Merge mode (rule-based, claude-enhanced)")
+    unified_parser.add_argument("--dry-run", action="store_true", help="Dry run mode")
+
+    # === enhance subcommand ===
+    enhance_parser = subparsers.add_parser(
+        "enhance",
+        help="AI-powered enhancement (local, no API key)",
+        description="Enhance SKILL.md using Claude Code (local)"
+    )
+    enhance_parser.add_argument("skill_directory", help="Skill directory path")
+
+    # === package subcommand ===
+    package_parser = subparsers.add_parser(
+        "package",
+        help="Package skill into .zip file",
+        description="Package skill directory into uploadable .zip"
+    )
+    package_parser.add_argument("skill_directory", help="Skill directory path")
+    package_parser.add_argument("--no-open", action="store_true", help="Don't open output folder")
+    package_parser.add_argument("--upload", action="store_true", help="Auto-upload after packaging")
+
+    # === upload subcommand ===
+    upload_parser = subparsers.add_parser(
+        "upload",
+        help="Upload skill to Claude",
+        description="Upload .zip file to Claude via Anthropic API"
+    )
+    upload_parser.add_argument("zip_file", help=".zip file to upload")
+    upload_parser.add_argument("--api-key", help="Anthropic API key")
+
+    # === estimate subcommand ===
+    estimate_parser = subparsers.add_parser(
+        "estimate",
+        help="Estimate page count before scraping",
+        description="Estimate total pages for documentation scraping"
+    )
+    estimate_parser.add_argument("config", help="Config JSON file")
+    estimate_parser.add_argument("--max-discovery", type=int, help="Max pages to discover")
+
+    return parser
+
+
+def main(argv: Optional[List[str]] = None) -> int:
+    """Main entry point for the unified CLI.
+
+    Args:
+        argv: Command-line arguments (defaults to sys.argv)
+
+    Returns:
+        Exit code (0 for success, non-zero for error)
+    """
+    parser = create_parser()
+    args = parser.parse_args(argv)
+
+    if not args.command:
+        parser.print_help()
+        return 1
+
+    # Delegate to the appropriate tool
+    try:
+        if args.command == "scrape":
+            from skill_seekers.cli.doc_scraper import main as scrape_main
+            # Convert args namespace to sys.argv format for doc_scraper
+            sys.argv = ["doc_scraper.py"]
+            if args.config:
+                sys.argv.extend(["--config", args.config])
+            if args.name:
+                sys.argv.extend(["--name", args.name])
+            if args.url:
+                sys.argv.extend(["--url", args.url])
+            if args.description:
+                sys.argv.extend(["--description", args.description])
+            if args.skip_scrape:
+                sys.argv.append("--skip-scrape")
+            if args.enhance:
+                sys.argv.append("--enhance")
+            if args.enhance_local:
+                sys.argv.append("--enhance-local")
+            if args.dry_run:
+                sys.argv.append("--dry-run")
+            if args.async_mode:
+                sys.argv.append("--async")
+            if args.workers:
+                sys.argv.extend(["--workers", str(args.workers)])
+            return scrape_main() or 0
+
+        elif args.command == "github":
+            from skill_seekers.cli.github_scraper import main as github_main
+            sys.argv = ["github_scraper.py"]
+            if args.config:
+                sys.argv.extend(["--config", args.config])
+            if args.repo:
+                sys.argv.extend(["--repo", args.repo])
+            if args.name:
+                sys.argv.extend(["--name", args.name])
+            if args.description:
+                sys.argv.extend(["--description", args.description])
+            return github_main() or 0
+
+        elif args.command == "pdf":
+            from skill_seekers.cli.pdf_scraper import main as pdf_main
+            sys.argv = ["pdf_scraper.py"]
+            if args.config:
+                sys.argv.extend(["--config", args.config])
+            if args.pdf:
+                sys.argv.extend(["--pdf", args.pdf])
+            if args.name:
+                sys.argv.extend(["--name", args.name])
+            if args.description:
+                sys.argv.extend(["--description", args.description])
+            if args.from_json:
+                sys.argv.extend(["--from-json", args.from_json])
+            return pdf_main() or 0
+
+        elif args.command == "unified":
+            from skill_seekers.cli.unified_scraper import main as unified_main
+            sys.argv = ["unified_scraper.py", "--config", args.config]
+            if args.merge_mode:
+                sys.argv.extend(["--merge-mode", args.merge_mode])
+            if args.dry_run:
+                sys.argv.append("--dry-run")
+            return unified_main() or 0
+
+        elif args.command == "enhance":
+            from skill_seekers.cli.enhance_skill_local import main as enhance_main
+            sys.argv = ["enhance_skill_local.py", args.skill_directory]
+            return enhance_main() or 0
+
+        elif args.command == "package":
+            from skill_seekers.cli.package_skill import main as package_main
+            sys.argv = ["package_skill.py", args.skill_directory]
+            if args.no_open:
+                sys.argv.append("--no-open")
+            if args.upload:
+                sys.argv.append("--upload")
+            return package_main() or 0
+
+        elif args.command == "upload":
+            from skill_seekers.cli.upload_skill import main as upload_main
+            sys.argv = ["upload_skill.py", args.zip_file]
+            if args.api_key:
+                sys.argv.extend(["--api-key", args.api_key])
+            return upload_main() or 0
+
+        elif args.command == "estimate":
+            from skill_seekers.cli.estimate_pages import main as estimate_main
+            sys.argv = ["estimate_pages.py", args.config]
+            if args.max_discovery:
+                sys.argv.extend(["--max-discovery", str(args.max_discovery)])
+            return estimate_main() or 0
+
+        else:
+            print(f"Error: Unknown command '{args.command}'", file=sys.stderr)
+            parser.print_help()
+            return 1
+
+    except KeyboardInterrupt:
+        print("\n\nInterrupted by user", file=sys.stderr)
+        return 130
+    except Exception as e:
+        print(f"Error: {e}", file=sys.stderr)
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
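A minimal sketch of how the unified entry point above can be driven programmatically; the import path below is an assumption, since this file's diff header is not shown here:

```python
# Sketch only: the import path is assumed, not confirmed by this diff.
from skill_seekers.cli.main import main

# Equivalent to: skill-seekers scrape --config configs/react.json --async --workers 5
exit_code = main(["scrape", "--config", "configs/react.json", "--async", "--workers", "5"])

# main() rewrites sys.argv to
#   ["doc_scraper.py", "--config", "configs/react.json", "--async", "--workers", "5"]
# and returns doc_scraper.main() or 0.
print(exit_code)
```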

+ 513 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/merge_sources.py

@@ -0,0 +1,513 @@
+#!/usr/bin/env python3
+"""
+Source Merger for Multi-Source Skills
+
+Merges documentation and code data intelligently:
+- Rule-based merge: Fast, deterministic rules
+- Claude-enhanced merge: AI-powered reconciliation
+
+Handles conflicts and creates unified API reference.
+"""
+
+import json
+import logging
+import subprocess
+import tempfile
+import os
+from pathlib import Path
+from typing import Dict, List, Any, Optional
+from .conflict_detector import Conflict, ConflictDetector
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+class RuleBasedMerger:
+    """
+    Rule-based API merger using deterministic rules.
+
+    Rules:
+    1. If API only in docs → Include with [DOCS_ONLY] tag
+    2. If API only in code → Include with [UNDOCUMENTED] tag
+    3. If both match perfectly → Include normally
+    4. If conflict → Include both versions with [CONFLICT] tag, prefer code signature
+    """
+
+    def __init__(self, docs_data: Dict, github_data: Dict, conflicts: List[Conflict]):
+        """
+        Initialize rule-based merger.
+
+        Args:
+            docs_data: Documentation scraper data
+            github_data: GitHub scraper data
+            conflicts: List of detected conflicts
+        """
+        self.docs_data = docs_data
+        self.github_data = github_data
+        self.conflicts = conflicts
+
+        # Build conflict index for fast lookup
+        self.conflict_index = {c.api_name: c for c in conflicts}
+
+        # Extract APIs from both sources
+        detector = ConflictDetector(docs_data, github_data)
+        self.docs_apis = detector.docs_apis
+        self.code_apis = detector.code_apis
+
+    def merge_all(self) -> Dict[str, Any]:
+        """
+        Merge all APIs using rule-based logic.
+
+        Returns:
+            Dict containing merged API data
+        """
+        logger.info("Starting rule-based merge...")
+
+        merged_apis = {}
+
+        # Get all unique API names
+        all_api_names = set(self.docs_apis.keys()) | set(self.code_apis.keys())
+
+        for api_name in sorted(all_api_names):
+            merged_api = self._merge_single_api(api_name)
+            merged_apis[api_name] = merged_api
+
+        logger.info(f"Merged {len(merged_apis)} APIs")
+
+        return {
+            'merge_mode': 'rule-based',
+            'apis': merged_apis,
+            'summary': {
+                'total_apis': len(merged_apis),
+                'docs_only': sum(1 for api in merged_apis.values() if api['status'] == 'docs_only'),
+                'code_only': sum(1 for api in merged_apis.values() if api['status'] == 'code_only'),
+                'matched': sum(1 for api in merged_apis.values() if api['status'] == 'matched'),
+                'conflict': sum(1 for api in merged_apis.values() if api['status'] == 'conflict')
+            }
+        }
+
+    def _merge_single_api(self, api_name: str) -> Dict[str, Any]:
+        """
+        Merge a single API using rules.
+
+        Args:
+            api_name: Name of the API to merge
+
+        Returns:
+            Merged API dict
+        """
+        in_docs = api_name in self.docs_apis
+        in_code = api_name in self.code_apis
+        has_conflict = api_name in self.conflict_index
+
+        # Rule 1: Only in docs
+        if in_docs and not in_code:
+            conflict = self.conflict_index.get(api_name)
+            return {
+                'name': api_name,
+                'status': 'docs_only',
+                'source': 'documentation',
+                'data': self.docs_apis[api_name],
+                'warning': 'This API is documented but not found in the codebase',
+                'conflict': conflict.__dict__ if conflict else None
+            }
+
+        # Rule 2: Only in code
+        if in_code and not in_docs:
+            is_private = api_name.startswith('_')
+            conflict = self.conflict_index.get(api_name)
+            return {
+                'name': api_name,
+                'status': 'code_only',
+                'source': 'code',
+                'data': self.code_apis[api_name],
+                'warning': 'This API exists in code but is not documented' if not is_private else 'Internal/private API',
+                'conflict': conflict.__dict__ if conflict else None
+            }
+
+        # Both exist - check for conflicts
+        docs_info = self.docs_apis[api_name]
+        code_info = self.code_apis[api_name]
+
+        # Rule 3: Both match perfectly (no conflict)
+        if not has_conflict:
+            return {
+                'name': api_name,
+                'status': 'matched',
+                'source': 'both',
+                'docs_data': docs_info,
+                'code_data': code_info,
+                'merged_signature': self._create_merged_signature(code_info, docs_info),
+                'merged_description': docs_info.get('docstring') or code_info.get('docstring')
+            }
+
+        # Rule 4: Conflict exists - prefer code signature, keep docs description
+        conflict = self.conflict_index[api_name]
+
+        return {
+            'name': api_name,
+            'status': 'conflict',
+            'source': 'both',
+            'docs_data': docs_info,
+            'code_data': code_info,
+            'conflict': conflict.__dict__,
+            'resolution': 'prefer_code_signature',
+            'merged_signature': self._create_merged_signature(code_info, docs_info),
+            'merged_description': docs_info.get('docstring') or code_info.get('docstring'),
+            'warning': conflict.difference
+        }
+
+    def _create_merged_signature(self, code_info: Dict, docs_info: Dict) -> str:
+        """
+        Create merged signature preferring code data.
+
+        Args:
+            code_info: API info from code
+            docs_info: API info from docs
+
+        Returns:
+            Merged signature string
+        """
+        name = code_info.get('name', docs_info.get('name'))
+        params = code_info.get('parameters', docs_info.get('parameters', []))
+        return_type = code_info.get('return_type', docs_info.get('return_type'))
+
+        # Build parameter string
+        param_strs = []
+        for param in params:
+            param_str = param['name']
+            if param.get('type_hint'):
+                param_str += f": {param['type_hint']}"
+            if param.get('default'):
+                param_str += f" = {param['default']}"
+            param_strs.append(param_str)
+
+        signature = f"{name}({', '.join(param_strs)})"
+
+        if return_type:
+            signature += f" -> {return_type}"
+
+        return signature
+
+
+class ClaudeEnhancedMerger:
+    """
+    Claude-enhanced API merger using local Claude Code.
+
+    Opens Claude Code in a new terminal to intelligently reconcile conflicts.
+    Uses the same approach as enhance_skill_local.py.
+    """
+
+    def __init__(self, docs_data: Dict, github_data: Dict, conflicts: List[Conflict]):
+        """
+        Initialize Claude-enhanced merger.
+
+        Args:
+            docs_data: Documentation scraper data
+            github_data: GitHub scraper data
+            conflicts: List of detected conflicts
+        """
+        self.docs_data = docs_data
+        self.github_data = github_data
+        self.conflicts = conflicts
+
+        # First do rule-based merge as baseline
+        self.rule_merger = RuleBasedMerger(docs_data, github_data, conflicts)
+
+    def merge_all(self) -> Dict[str, Any]:
+        """
+        Merge all APIs using Claude enhancement.
+
+        Returns:
+            Dict containing merged API data
+        """
+        logger.info("Starting Claude-enhanced merge...")
+
+        # Create temporary workspace
+        workspace_dir = self._create_workspace()
+
+        # Launch Claude Code for enhancement
+        logger.info("Launching Claude Code for intelligent merging...")
+        logger.info("Claude will analyze conflicts and create reconciled API reference")
+
+        try:
+            self._launch_claude_merge(workspace_dir)
+
+            # Read enhanced results
+            merged_data = self._read_merged_results(workspace_dir)
+
+            logger.info("Claude-enhanced merge complete")
+            return merged_data
+
+        except Exception as e:
+            logger.error(f"Claude enhancement failed: {e}")
+            logger.info("Falling back to rule-based merge")
+            return self.rule_merger.merge_all()
+
+    def _create_workspace(self) -> str:
+        """
+        Create temporary workspace with merge context.
+
+        Returns:
+            Path to workspace directory
+        """
+        workspace = tempfile.mkdtemp(prefix='skill_merge_')
+        logger.info(f"Created merge workspace: {workspace}")
+
+        # Write context files for Claude
+        self._write_context_files(workspace)
+
+        return workspace
+
+    def _write_context_files(self, workspace: str):
+        """Write context files for Claude to analyze."""
+
+        # 1. Write conflicts summary
+        conflicts_file = os.path.join(workspace, 'conflicts.json')
+        with open(conflicts_file, 'w') as f:
+            json.dump({
+                'conflicts': [c.__dict__ for c in self.conflicts],
+                'summary': {
+                    'total': len(self.conflicts),
+                    'by_type': self._count_by_field('type'),
+                    'by_severity': self._count_by_field('severity')
+                }
+            }, f, indent=2)
+
+        # 2. Write documentation APIs
+        docs_apis_file = os.path.join(workspace, 'docs_apis.json')
+        detector = ConflictDetector(self.docs_data, self.github_data)
+        with open(docs_apis_file, 'w') as f:
+            json.dump(detector.docs_apis, f, indent=2)
+
+        # 3. Write code APIs
+        code_apis_file = os.path.join(workspace, 'code_apis.json')
+        with open(code_apis_file, 'w') as f:
+            json.dump(detector.code_apis, f, indent=2)
+
+        # 4. Write merge instructions for Claude
+        instructions = """# API Merge Task
+
+You are merging API documentation from two sources:
+1. Official documentation (user-facing)
+2. Source code analysis (implementation reality)
+
+## Context Files:
+- `conflicts.json` - All detected conflicts between sources
+- `docs_apis.json` - APIs from documentation
+- `code_apis.json` - APIs from source code
+
+## Your Task:
+For each conflict, reconcile the differences intelligently:
+
+1. **Prefer code signatures as source of truth**
+   - Use actual parameter names, types, defaults from code
+   - Code is what actually runs; docs may be outdated
+
+2. **Keep documentation descriptions**
+   - Docs are user-friendly; code comments may be overly technical
+   - Keep the docs' explanation of what the API does
+
+3. **Add implementation notes for discrepancies**
+   - If docs differ from code, explain the difference
+   - Example: "⚠️ The `snap` parameter exists in code but is not documented"
+
+4. **Flag missing APIs clearly**
+   - Missing in docs → Add [UNDOCUMENTED] tag
+   - Missing in code → Add [REMOVED] or [DOCS_ERROR] tag
+
+5. **Create unified API reference**
+   - One definitive signature per API
+   - Clear warnings about conflicts
+   - Implementation notes where helpful
+
+## Output Format:
+Create `merged_apis.json` with this structure:
+
+```json
+{
+  "apis": {
+    "API.name": {
+      "signature": "final_signature_here",
+      "parameters": [...],
+      "return_type": "type",
+      "description": "user-friendly description",
+      "implementation_notes": "Any discrepancies or warnings",
+      "source": "both|docs_only|code_only",
+      "confidence": "high|medium|low"
+    }
+  }
+}
+```
+
+Take your time to analyze each conflict carefully. The goal is to create the most accurate and helpful API reference possible.
+"""
+
+        instructions_file = os.path.join(workspace, 'MERGE_INSTRUCTIONS.md')
+        with open(instructions_file, 'w') as f:
+            f.write(instructions)
+
+        logger.info(f"Wrote context files to {workspace}")
+
+    def _count_by_field(self, field: str) -> Dict[str, int]:
+        """Count conflicts by a specific field."""
+        counts = {}
+        for conflict in self.conflicts:
+            value = getattr(conflict, field)
+            counts[value] = counts.get(value, 0) + 1
+        return counts
+
+    def _launch_claude_merge(self, workspace: str):
+        """
+        Launch Claude Code to perform merge.
+
+        Similar to enhance_skill_local.py approach.
+        """
+        # Create a script that Claude will execute
+        script_path = os.path.join(workspace, 'merge_script.sh')
+
+        script_content = f"""#!/bin/bash
+# Automatic merge script for Claude Code
+
+cd "{workspace}"
+
+echo "📊 Analyzing conflicts..."
+cat conflicts.json | head -20
+
+echo ""
+echo "📖 Documentation APIs: $(cat docs_apis.json | grep -c '\"name\"')"
+echo "💻 Code APIs: $(cat code_apis.json | grep -c '\"name\"')"
+echo ""
+echo "Please review the conflicts and create merged_apis.json"
+echo "Follow the instructions in MERGE_INSTRUCTIONS.md"
+echo ""
+echo "When done, save merged_apis.json and close this terminal."
+
+# Wait for user to complete merge
+read -p "Press Enter when merge is complete..."
+"""
+
+        with open(script_path, 'w') as f:
+            f.write(script_content)
+
+        os.chmod(script_path, 0o755)
+
+        # Open new terminal with Claude Code
+        # Try different terminal emulators
+        terminals = [
+            ['x-terminal-emulator', '-e'],
+            ['gnome-terminal', '--'],
+            ['xterm', '-e'],
+            ['konsole', '-e']
+        ]
+
+        for terminal_cmd in terminals:
+            try:
+                cmd = terminal_cmd + ['bash', script_path]
+                subprocess.Popen(cmd)
+                logger.info(f"Opened terminal with {terminal_cmd[0]}")
+                break
+            except FileNotFoundError:
+                continue
+        else:
+            # No terminal emulator found: fail fast so merge_all() can fall
+            # back to the rule-based merge instead of polling for an hour.
+            raise RuntimeError(
+                f"No supported terminal emulator found (tried: {', '.join(t[0] for t in terminals)})"
+            )
+
+        # Wait for merge to complete
+        merged_file = os.path.join(workspace, 'merged_apis.json')
+        logger.info(f"Waiting for merged results at: {merged_file}")
+        logger.info("Close the terminal when done to continue...")
+
+        # Poll for file existence
+        import time
+        timeout = 3600  # 1 hour max
+        elapsed = 0
+        while not os.path.exists(merged_file) and elapsed < timeout:
+            time.sleep(5)
+            elapsed += 5
+
+        if not os.path.exists(merged_file):
+            raise TimeoutError("Claude merge timed out after 1 hour")
+
+    def _read_merged_results(self, workspace: str) -> Dict[str, Any]:
+        """Read merged results from workspace."""
+        merged_file = os.path.join(workspace, 'merged_apis.json')
+
+        if not os.path.exists(merged_file):
+            raise FileNotFoundError(f"Merged results not found: {merged_file}")
+
+        with open(merged_file, 'r') as f:
+            merged_data = json.load(f)
+
+        return {
+            'merge_mode': 'claude-enhanced',
+            **merged_data
+        }
+
+
+def merge_sources(docs_data_path: str,
+                  github_data_path: str,
+                  output_path: str,
+                  mode: str = 'rule-based') -> Dict[str, Any]:
+    """
+    Merge documentation and GitHub data.
+
+    Args:
+        docs_data_path: Path to documentation data JSON
+        github_data_path: Path to GitHub data JSON
+        output_path: Path to save merged output
+        mode: 'rule-based' or 'claude-enhanced'
+
+    Returns:
+        Merged data dict
+    """
+    # Load data
+    with open(docs_data_path, 'r') as f:
+        docs_data = json.load(f)
+
+    with open(github_data_path, 'r') as f:
+        github_data = json.load(f)
+
+    # Detect conflicts
+    detector = ConflictDetector(docs_data, github_data)
+    conflicts = detector.detect_all_conflicts()
+
+    logger.info(f"Detected {len(conflicts)} conflicts")
+
+    # Merge based on mode
+    if mode == 'claude-enhanced':
+        merger = ClaudeEnhancedMerger(docs_data, github_data, conflicts)
+    else:
+        merger = RuleBasedMerger(docs_data, github_data, conflicts)
+
+    merged_data = merger.merge_all()
+
+    # Save merged data
+    with open(output_path, 'w') as f:
+        json.dump(merged_data, f, indent=2, ensure_ascii=False)
+
+    logger.info(f"Merged data saved to: {output_path}")
+
+    return merged_data
+
+
+if __name__ == '__main__':
+    import argparse
+
+    parser = argparse.ArgumentParser(description='Merge documentation and code sources')
+    parser.add_argument('docs_data', help='Path to documentation data JSON')
+    parser.add_argument('github_data', help='Path to GitHub data JSON')
+    parser.add_argument('--output', '-o', default='merged_data.json', help='Output file path')
+    parser.add_argument('--mode', '-m', choices=['rule-based', 'claude-enhanced'],
+                       default='rule-based', help='Merge mode')
+
+    args = parser.parse_args()
+
+    merged = merge_sources(args.docs_data, args.github_data, args.output, args.mode)
+
+    # Print summary
+    summary = merged.get('summary', {})
+    print(f"\n✅ Merge complete ({merged.get('merge_mode')})")
+    print(f"   Total APIs: {summary.get('total_apis', 0)}")
+    print(f"   Matched: {summary.get('matched', 0)}")
+    print(f"   Docs only: {summary.get('docs_only', 0)}")
+    print(f"   Code only: {summary.get('code_only', 0)}")
+    print(f"   Conflicts: {summary.get('conflict', 0)}")
+    print(f"\n📄 Saved to: {args.output}")

+ 81 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/package_multi.py

@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+"""
+Multi-Skill Packager
+
+Package multiple skills at once. Useful for packaging router + sub-skills together.
+"""
+
+import sys
+import argparse
+from pathlib import Path
+import subprocess
+
+
+def package_skill(skill_dir: Path) -> bool:
+    """Package a single skill"""
+    try:
+        result = subprocess.run(
+            [sys.executable, str(Path(__file__).parent / "package_skill.py"), str(skill_dir)],
+            capture_output=True,
+            text=True
+        )
+        return result.returncode == 0
+    except Exception as e:
+        print(f"❌ Error packaging {skill_dir}: {e}")
+        return False
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Package multiple skills at once",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Package all godot skills
+  python3 package_multi.py output/godot*/
+
+  # Package specific skills
+  python3 package_multi.py output/godot-2d/ output/godot-3d/ output/godot-scripting/
+        """
+    )
+
+    parser.add_argument(
+        'skill_dirs',
+        nargs='+',
+        help='Skill directories to package'
+    )
+
+    args = parser.parse_args()
+
+    print(f"\n{'='*60}")
+    print(f"MULTI-SKILL PACKAGER")
+    print(f"{'='*60}\n")
+
+    skill_dirs = [Path(d) for d in args.skill_dirs]
+    success_count = 0
+    total_count = len(skill_dirs)
+
+    for skill_dir in skill_dirs:
+        if not skill_dir.exists():
+            print(f"⚠️  Skipping (not found): {skill_dir}")
+            continue
+
+        if not (skill_dir / "SKILL.md").exists():
+            print(f"⚠️  Skipping (no SKILL.md): {skill_dir}")
+            continue
+
+        print(f"📦 Packaging: {skill_dir.name}")
+        if package_skill(skill_dir):
+            success_count += 1
+            print(f"   ✅ Success")
+        else:
+            print(f"   ❌ Failed")
+        print("")
+
+    print(f"{'='*60}")
+    print(f"SUMMARY: {success_count}/{total_count} skills packaged")
+    print(f"{'='*60}\n")
+
+
+if __name__ == "__main__":
+    main()
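The `package_skill()` helper above can also be reused directly; a small sketch with hypothetical output directories:

```python
from pathlib import Path

from skill_seekers.cli.package_multi import package_skill

# Directory names are hypothetical examples.
for skill_dir in (Path("output/godot-2d"), Path("output/godot-3d")):
    ok = package_skill(skill_dir)  # shells out to package_skill.py
    print(f"{skill_dir.name}: {'packaged' if ok else 'failed'}")
```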

+ 220 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/package_skill.py

@@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+"""
+Simple Skill Packager
+Packages a skill directory into a .zip file for Claude.
+
+Usage:
+    skill-seekers package output/steam-inventory/
+    skill-seekers package output/react/
+    skill-seekers package output/react/ --no-open  # Don't open folder
+"""
+
+import os
+import sys
+import zipfile
+import argparse
+from pathlib import Path
+
+# Import utilities
+try:
+    from utils import (
+        open_folder,
+        print_upload_instructions,
+        format_file_size,
+        validate_skill_directory
+    )
+    from quality_checker import SkillQualityChecker, print_report
+except ImportError:
+    # If running from a different directory, add the cli directory to the path
+    sys.path.insert(0, str(Path(__file__).parent))
+    from utils import (
+        open_folder,
+        print_upload_instructions,
+        format_file_size,
+        validate_skill_directory
+    )
+    from quality_checker import SkillQualityChecker, print_report
+
+
+def package_skill(skill_dir, open_folder_after=True, skip_quality_check=False):
+    """
+    Package a skill directory into a .zip file
+
+    Args:
+        skill_dir: Path to skill directory
+        open_folder_after: Whether to open the output folder after packaging
+        skip_quality_check: Skip quality checks before packaging
+
+    Returns:
+        tuple: (success, zip_path) where success is bool and zip_path is Path or None
+    """
+    skill_path = Path(skill_dir)
+
+    # Validate skill directory
+    is_valid, error_msg = validate_skill_directory(skill_path)
+    if not is_valid:
+        print(f"❌ Error: {error_msg}")
+        return False, None
+
+    # Run quality checks (unless skipped)
+    if not skip_quality_check:
+        print("\n" + "=" * 60)
+        print("QUALITY CHECK")
+        print("=" * 60)
+
+        checker = SkillQualityChecker(skill_path)
+        report = checker.check_all()
+
+        # Print report
+        print_report(report, verbose=False)
+
+        # If there are errors or warnings, ask user to confirm
+        if report.has_errors or report.has_warnings:
+            print("=" * 60)
+            response = input("\nContinue with packaging? (y/n): ").strip().lower()
+            if response != 'y':
+                print("\n❌ Packaging cancelled by user")
+                return False, None
+            print()
+        else:
+            print("=" * 60)
+            print()
+
+    # Create zip filename
+    skill_name = skill_path.name
+    zip_path = skill_path.parent / f"{skill_name}.zip"
+
+    print(f"📦 Packaging skill: {skill_name}")
+    print(f"   Source: {skill_path}")
+    print(f"   Output: {zip_path}")
+
+    # Create zip file
+    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
+        for root, dirs, files in os.walk(skill_path):
+            # Skip backup files
+            files = [f for f in files if not f.endswith('.backup')]
+
+            for file in files:
+                file_path = Path(root) / file
+                arcname = file_path.relative_to(skill_path)
+                zf.write(file_path, arcname)
+                print(f"   + {arcname}")
+
+    # Get zip size
+    zip_size = zip_path.stat().st_size
+    print(f"\n✅ Package created: {zip_path}")
+    print(f"   Size: {zip_size:,} bytes ({format_file_size(zip_size)})")
+
+    # Open folder in file browser
+    if open_folder_after:
+        print(f"\n📂 Opening folder: {zip_path.parent}")
+        open_folder(zip_path.parent)
+
+    # Print upload instructions
+    print_upload_instructions(zip_path)
+
+    return True, zip_path
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Package a skill directory into a .zip file for Claude",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Package skill with quality checks (recommended)
+  skill-seekers package output/react/
+
+  # Package skill without opening folder
+  skill-seekers package output/react/ --no-open
+
+  # Skip quality checks (faster, but not recommended)
+  skill-seekers package output/react/ --skip-quality-check
+
+  # Package and auto-upload to Claude
+  skill-seekers package output/react/ --upload
+
+  # Get help
+  skill-seekers package --help
+        """
+    )
+
+    parser.add_argument(
+        'skill_dir',
+        help='Path to skill directory (e.g., output/react/)'
+    )
+
+    parser.add_argument(
+        '--no-open',
+        action='store_true',
+        help='Do not open the output folder after packaging'
+    )
+
+    parser.add_argument(
+        '--skip-quality-check',
+        action='store_true',
+        help='Skip quality checks before packaging'
+    )
+
+    parser.add_argument(
+        '--upload',
+        action='store_true',
+        help='Automatically upload to Claude after packaging (requires ANTHROPIC_API_KEY)'
+    )
+
+    args = parser.parse_args()
+
+    success, zip_path = package_skill(
+        args.skill_dir,
+        open_folder_after=not args.no_open,
+        skip_quality_check=args.skip_quality_check
+    )
+
+    if not success:
+        sys.exit(1)
+
+    # Auto-upload if requested
+    if args.upload:
+        # Check if API key is set BEFORE attempting upload
+        api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
+
+        if not api_key:
+            # No API key - show helpful message but DON'T fail
+            print("\n" + "="*60)
+            print("💡 Automatic Upload")
+            print("="*60)
+            print()
+            print("To enable automatic upload:")
+            print("  1. Get API key from https://console.anthropic.com/")
+            print("  2. Set: export ANTHROPIC_API_KEY=sk-ant-...")
+            print("  3. Run package_skill.py with --upload flag")
+            print()
+            print("For now, use manual upload (instructions above) ☝️")
+            print("="*60)
+            # Exit successfully - packaging worked!
+            sys.exit(0)
+
+        # API key exists - try upload
+        try:
+            from upload_skill import upload_skill_api
+            print("\n" + "="*60)
+            upload_success, message = upload_skill_api(zip_path)
+            if not upload_success:
+                print(f"❌ Upload failed: {message}")
+                print()
+                print("💡 Try manual upload instead (instructions above) ☝️")
+                print("="*60)
+                # Exit successfully - packaging worked even if upload failed
+                sys.exit(0)
+            else:
+                print("="*60)
+                sys.exit(0)
+        except ImportError:
+            print("\n❌ Error: upload_skill.py not found")
+            sys.exit(1)
+
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
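Because `package_skill()` returns a `(success, zip_path)` tuple, other tooling can call it directly; a minimal sketch with a hypothetical skill directory:

```python
from skill_seekers.cli.package_skill import package_skill

# Skip the interactive quality-check prompt and the folder-open step.
success, zip_path = package_skill(
    "output/react",              # hypothetical path
    open_folder_after=False,
    skip_quality_check=True,
)
if success:
    print(f"Ready to upload: {zip_path}")
```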

+ 1222 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/pdf_extractor_poc.py

@@ -0,0 +1,1222 @@
+#!/usr/bin/env python3
+"""
+PDF Text Extractor - Complete Feature Set (Tasks B1.2 + B1.3 + B1.4 + B1.5 + Priority 2 & 3)
+
+Extracts text, code blocks, and images from PDF documentation files.
+Uses PyMuPDF (fitz) for fast, high-quality extraction.
+
+Features:
+    - Text and markdown extraction
+    - Code block detection (font, indent, pattern)
+    - Language detection with confidence scoring (19+ languages) (B1.4)
+    - Syntax validation and quality scoring (B1.4)
+    - Quality statistics and filtering (B1.4)
+    - Image extraction to files (B1.5)
+    - Image filtering by size (B1.5)
+    - Page chunking and chapter detection (B1.3)
+    - Code block merging across pages (B1.3)
+
+Advanced Features (Priority 2 & 3):
+    - OCR support for scanned PDFs (requires pytesseract) (Priority 2)
+    - Password-protected PDF support (Priority 2)
+    - Table extraction (Priority 2)
+    - Parallel page processing (Priority 3)
+    - Caching of expensive operations (Priority 3)
+
+Usage:
+    # Basic extraction
+    python3 pdf_extractor_poc.py input.pdf
+    python3 pdf_extractor_poc.py input.pdf --output output.json
+    python3 pdf_extractor_poc.py input.pdf --verbose
+
+    # Quality filtering
+    python3 pdf_extractor_poc.py input.pdf --min-quality 5.0
+
+    # Image extraction
+    python3 pdf_extractor_poc.py input.pdf --extract-images
+    python3 pdf_extractor_poc.py input.pdf --extract-images --image-dir images/
+
+    # Advanced features
+    python3 pdf_extractor_poc.py scanned.pdf --ocr
+    python3 pdf_extractor_poc.py encrypted.pdf --password mypassword
+    python3 pdf_extractor_poc.py input.pdf --extract-tables
+    python3 pdf_extractor_poc.py large.pdf --parallel --workers 8
+
+Example:
+    python3 pdf_extractor_poc.py docs/manual.pdf -o output.json -v \
+        --chunk-size 15 --min-quality 6.0 --extract-images \
+        --extract-tables --parallel
+"""
+
+import os
+import sys
+import json
+import re
+import argparse
+from pathlib import Path
+
+# Check if PyMuPDF is installed
+try:
+    import fitz  # PyMuPDF
+except ImportError:
+    print("ERROR: PyMuPDF not installed")
+    print("Install with: pip install PyMuPDF")
+    sys.exit(1)
+
+# Optional dependencies for advanced features
+try:
+    import pytesseract
+    from PIL import Image
+    TESSERACT_AVAILABLE = True
+except ImportError:
+    TESSERACT_AVAILABLE = False
+
+try:
+    import concurrent.futures
+    CONCURRENT_AVAILABLE = True
+except ImportError:
+    CONCURRENT_AVAILABLE = False
+
+
+class PDFExtractor:
+    """Extract text and code from PDF documentation"""
+
+    def __init__(self, pdf_path, verbose=False, chunk_size=10, min_quality=0.0,
+                 extract_images=False, image_dir=None, min_image_size=100,
+                 use_ocr=False, password=None, extract_tables=False,
+                 parallel=False, max_workers=None, use_cache=True):
+        self.pdf_path = pdf_path
+        self.verbose = verbose
+        self.chunk_size = chunk_size  # Pages per chunk (0 = no chunking)
+        self.min_quality = min_quality  # Minimum quality score (0-10)
+        self.extract_images = extract_images  # Extract images to files (NEW in B1.5)
+        self.image_dir = image_dir  # Directory to save images (NEW in B1.5)
+        self.min_image_size = min_image_size  # Minimum image dimension (NEW in B1.5)
+
+        # Advanced features (Priority 2 & 3)
+        self.use_ocr = use_ocr  # OCR for scanned PDFs (Priority 2)
+        self.password = password  # Password for encrypted PDFs (Priority 2)
+        self.extract_tables = extract_tables  # Extract tables (Priority 2)
+        self.parallel = parallel  # Parallel processing (Priority 3)
+        self.max_workers = max_workers or os.cpu_count()  # Worker threads (Priority 3)
+        self.use_cache = use_cache  # Cache expensive operations (Priority 3)
+
+        self.doc = None
+        self.pages = []
+        self.chapters = []  # Detected chapters/sections
+        self.extracted_images = []  # List of extracted image info (NEW in B1.5)
+        self._cache = {}  # Cache for expensive operations (Priority 3)
+
+    def log(self, message):
+        """Print message if verbose mode enabled"""
+        if self.verbose:
+            print(message)
+
+    def extract_text_with_ocr(self, page):
+        """
+        Extract text from scanned PDF page using OCR (Priority 2).
+        Falls back to regular text extraction if OCR is not available.
+
+        Args:
+            page: PyMuPDF page object
+
+        Returns:
+            str: Extracted text
+        """
+        # Try regular text extraction first
+        text = page.get_text("text").strip()
+
+        # If page has very little text, it might be scanned
+        if len(text) < 50 and self.use_ocr:
+            if not TESSERACT_AVAILABLE:
+                self.log("⚠️  OCR requested but pytesseract not installed")
+                self.log("   Install with: pip install pytesseract Pillow")
+                return text
+
+            try:
+                # Render page as image
+                pix = page.get_pixmap()
+                img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+
+                # Run OCR
+                ocr_text = pytesseract.image_to_string(img)
+                self.log(f"   OCR extracted {len(ocr_text)} chars (was {len(text)})")
+                return ocr_text if len(ocr_text) > len(text) else text
+
+            except Exception as e:
+                self.log(f"   OCR failed: {e}")
+                return text
+
+        return text
+
+    def extract_tables_from_page(self, page):
+        """
+        Extract tables from PDF page (Priority 2).
+        Uses PyMuPDF's table detection.
+
+        Args:
+            page: PyMuPDF page object
+
+        Returns:
+            list: List of extracted tables as dicts
+        """
+        if not self.extract_tables:
+            return []
+
+        tables = []
+        try:
+            # PyMuPDF table extraction
+            tabs = page.find_tables()
+            for idx, tab in enumerate(tabs.tables):
+                rows = tab.extract()
+                table_data = {
+                    'table_index': idx,
+                    'rows': rows,
+                    'bbox': tab.bbox,
+                    'row_count': len(rows),
+                    'col_count': len(rows[0]) if rows else 0
+                }
+                tables.append(table_data)
+                self.log(f"   Found table {idx}: {table_data['row_count']}x{table_data['col_count']}")
+
+        except Exception as e:
+            self.log(f"   Table extraction failed: {e}")
+
+        return tables
+
+    def get_cached(self, key):
+        """
+        Get cached value (Priority 3).
+
+        Args:
+            key: Cache key
+
+        Returns:
+            Cached value or None
+        """
+        if not self.use_cache:
+            return None
+        return self._cache.get(key)
+
+    def set_cached(self, key, value):
+        """
+        Set cached value (Priority 3).
+
+        Args:
+            key: Cache key
+            value: Value to cache
+        """
+        if self.use_cache:
+            self._cache[key] = value
+
+    def detect_language_from_code(self, code):
+        """
+        Detect programming language from code content using patterns.
+        Enhanced in B1.4 with confidence scoring.
+
+        Returns (language, confidence) tuple
+        """
+        code_lower = code.lower()
+
+        # Language detection patterns with weights
+        patterns = {
+            'python': [
+                (r'\bdef\s+\w+\s*\(', 3),
+                (r'\bimport\s+\w+', 2),
+                (r'\bclass\s+\w+:', 3),
+                (r'\bfrom\s+\w+\s+import', 2),
+                (r':\s*$', 1),  # Lines ending with :
+                (r'^\s{4}|\t', 1),  # Indentation
+            ],
+            'javascript': [
+                (r'\bfunction\s+\w+\s*\(', 3),
+                (r'\bconst\s+\w+\s*=', 2),
+                (r'\blet\s+\w+\s*=', 2),
+                (r'=>', 2),
+                (r'\bconsole\.log', 2),
+                (r'\bvar\s+\w+\s*=', 1),
+            ],
+            'java': [
+                (r'\bpublic\s+class\s+\w+', 4),
+                (r'\bprivate\s+\w+\s+\w+', 2),
+                (r'\bSystem\.out\.println', 3),
+                (r'\bpublic\s+static\s+void', 3),
+            ],
+            'cpp': [
+                (r'#include\s*<', 3),
+                (r'\bstd::', 3),
+                (r'\bnamespace\s+\w+', 2),
+                (r'cout\s*<<', 3),
+                (r'\bvoid\s+\w+\s*\(', 1),
+            ],
+            'c': [
+                (r'#include\s+<\w+\.h>', 4),
+                (r'\bprintf\s*\(', 3),
+                (r'\bmain\s*\(', 2),
+                (r'\bstruct\s+\w+', 2),
+            ],
+            'csharp': [
+                (r'\bnamespace\s+\w+', 3),
+                (r'\bpublic\s+class\s+\w+', 3),
+                (r'\busing\s+System', 3),
+            ],
+            'go': [
+                (r'\bfunc\s+\w+\s*\(', 3),
+                (r'\bpackage\s+\w+', 4),
+                (r':=', 2),
+                (r'\bfmt\.Print', 2),
+            ],
+            'rust': [
+                (r'\bfn\s+\w+\s*\(', 4),
+                (r'\blet\s+mut\s+\w+', 3),
+                (r'\bprintln!', 3),
+                (r'\bimpl\s+\w+', 2),
+            ],
+            'php': [
+                (r'<\?php', 5),
+                (r'\$\w+\s*=', 2),
+                (r'\bfunction\s+\w+\s*\(', 1),
+            ],
+            'ruby': [
+                (r'\bdef\s+\w+', 3),
+                (r'\bend\b', 2),
+                (r'\brequire\s+[\'"]', 2),
+            ],
+            'swift': [
+                (r'\bfunc\s+\w+\s*\(', 3),
+                (r'\bvar\s+\w+:', 2),
+                (r'\blet\s+\w+:', 2),
+            ],
+            'kotlin': [
+                (r'\bfun\s+\w+\s*\(', 4),
+                (r'\bval\s+\w+\s*=', 2),
+                (r'\bvar\s+\w+\s*=', 2),
+            ],
+            'shell': [
+                (r'#!/bin/bash', 5),
+                (r'#!/bin/sh', 5),
+                (r'\becho\s+', 1),
+                (r'\$\{?\w+\}?', 1),
+            ],
+            'sql': [
+                (r'\bSELECT\s+', 4),
+                (r'\bFROM\s+', 3),
+                (r'\bWHERE\s+', 2),
+                (r'\bINSERT\s+INTO', 4),
+                (r'\bCREATE\s+TABLE', 4),
+            ],
+            'html': [
+                (r'<html', 4),
+                (r'<div', 2),
+                (r'<span', 2),
+                (r'<script', 2),
+            ],
+            'css': [
+                (r'\{\s*[\w-]+\s*:', 3),
+                (r'@media', 3),
+                (r'\.[\w-]+\s*\{', 2),
+            ],
+            'json': [
+                (r'^\s*\{', 2),
+                (r'^\s*\[', 2),
+                (r'"\w+"\s*:', 3),
+            ],
+            'yaml': [
+                (r'^\w+:', 2),
+                (r'^\s+-\s+\w+', 2),
+            ],
+            'xml': [
+                (r'<\?xml', 5),
+                (r'<\w+>', 1),
+            ],
+        }
+
+        # Calculate confidence scores for each language
+        scores = {}
+        for lang, lang_patterns in patterns.items():
+            score = 0
+            for pattern, weight in lang_patterns:
+                if re.search(pattern, code, re.IGNORECASE | re.MULTILINE):
+                    score += weight
+            if score > 0:
+                scores[lang] = score
+
+        if not scores:
+            return 'unknown', 0
+
+        # Get language with highest score
+        best_lang = max(scores, key=scores.get)
+        confidence = min(scores[best_lang] / 10.0, 1.0)  # Normalize to 0-1
+
+        return best_lang, confidence
+
+    def validate_code_syntax(self, code, language):
+        """
+        Validate code syntax (basic checks).
+        Enhanced in B1.4 with syntax validation.
+
+        Returns (is_valid, issues) tuple
+        """
+        issues = []
+
+        # Common syntax checks
+        if not code.strip():
+            return False, ['Empty code block']
+
+        # Language-specific validation
+        if language == 'python':
+            # Check indentation consistency
+            lines = code.split('\n')
+            indent_chars = set()
+            for line in lines:
+                if line.startswith(' '):
+                    indent_chars.add('space')
+                elif line.startswith('\t'):
+                    indent_chars.add('tab')
+
+            if len(indent_chars) > 1:
+                issues.append('Mixed tabs and spaces')
+
+            # Check for unclosed brackets/parens
+            open_count = code.count('(') + code.count('[') + code.count('{')
+            close_count = code.count(')') + code.count(']') + code.count('}')
+            if abs(open_count - close_count) > 2:  # Allow small mismatch
+                issues.append('Unbalanced brackets')
+
+        elif language in ['javascript', 'java', 'cpp', 'c', 'csharp', 'go']:
+            # Check for balanced braces
+            open_braces = code.count('{')
+            close_braces = code.count('}')
+            if abs(open_braces - close_braces) > 1:
+                issues.append('Unbalanced braces')
+
+        elif language == 'json':
+            # Try to parse JSON
+            try:
+                json.loads(code)
+            except (json.JSONDecodeError, ValueError) as e:
+                issues.append(f'Invalid JSON syntax: {str(e)[:50]}')
+
+        # General checks
+        # Check if code looks like natural language (too many common words)
+        common_words = ['the', 'and', 'for', 'with', 'this', 'that', 'have', 'from']
+        word_count = sum(1 for word in common_words if word in code.lower())
+        if word_count > 5 and len(code.split()) < 50:
+            issues.append('May be natural language, not code')
+
+        # Check code/comment ratio
+        comment_lines = sum(1 for line in code.split('\n') if line.strip().startswith(('#', '//', '/*', '*', '--')))
+        total_lines = len([l for l in code.split('\n') if l.strip()])
+        if total_lines > 0 and comment_lines / total_lines > 0.7:
+            issues.append('Mostly comments')
+
+        return len(issues) == 0, issues
+
+    def score_code_quality(self, code, language, confidence):
+        """
+        Score the quality/usefulness of detected code block.
+        New in B1.4.
+
+        Returns quality score (0-10)
+        """
+        score = 5.0  # Start with neutral score
+
+        # Factor 1: Language detection confidence
+        score += confidence * 2.0
+
+        # Factor 2: Code length (not too short, not too long)
+        code_length = len(code.strip())
+        if 20 <= code_length <= 500:
+            score += 1.0
+        elif 500 < code_length <= 2000:
+            score += 0.5
+        elif code_length < 10:
+            score -= 2.0
+
+        # Factor 3: Number of lines
+        lines = [l for l in code.split('\n') if l.strip()]
+        if 2 <= len(lines) <= 50:
+            score += 1.0
+        elif len(lines) > 100:
+            score -= 1.0
+
+        # Factor 4: Has function/class definitions
+        if re.search(r'\b(def|function|class|func|fn|public class)\b', code):
+            score += 1.5
+
+        # Factor 5: Has meaningful variable names (not just x, y, i)
+        meaningful_vars = re.findall(r'\b[a-z_][a-z0-9_]{3,}\b', code.lower())
+        if len(meaningful_vars) >= 2:
+            score += 1.0
+
+        # Factor 6: Syntax validation
+        is_valid, issues = self.validate_code_syntax(code, language)
+        if is_valid:
+            score += 1.0
+        else:
+            score -= len(issues) * 0.5
+
+        # Clamp score to 0-10 range
+        return max(0, min(10, score))
+
+    def detect_code_blocks_by_font(self, page):
+        """
+        Detect code blocks by analyzing font properties.
+        Monospace fonts typically indicate code.
+
+        Returns list of detected code blocks with metadata.
+        """
+        code_blocks = []
+        blocks = page.get_text("dict")["blocks"]
+
+        monospace_fonts = ['courier', 'mono', 'consolas', 'menlo', 'monaco', 'dejavu']
+
+        current_code = []
+        current_font = None
+
+        for block in blocks:
+            if 'lines' not in block:
+                continue
+
+            for line in block['lines']:
+                for span in line['spans']:
+                    font = span['font'].lower()
+                    text = span['text']
+
+                    # Check if font is monospace
+                    is_monospace = any(mf in font for mf in monospace_fonts)
+
+                    if is_monospace:
+                        # Accumulate code text
+                        current_code.append(text)
+                        current_font = span['font']
+                    else:
+                        # End of code block
+                        if current_code:
+                            code_text = ''.join(current_code).strip()
+                            if len(code_text) > 10:  # Minimum code length
+                                lang, confidence = self.detect_language_from_code(code_text)
+                                quality = self.score_code_quality(code_text, lang, confidence)
+                                is_valid, issues = self.validate_code_syntax(code_text, lang)
+
+                                code_blocks.append({
+                                    'code': code_text,
+                                    'language': lang,
+                                    'confidence': confidence,
+                                    'quality_score': quality,
+                                    'is_valid': is_valid,
+                                    'validation_issues': issues if not is_valid else [],
+                                    'font': current_font,
+                                    'detection_method': 'font'
+                                })
+                            current_code = []
+                            current_font = None
+
+        # Handle final code block
+        if current_code:
+            code_text = ''.join(current_code).strip()
+            if len(code_text) > 10:
+                lang, confidence = self.detect_language_from_code(code_text)
+                quality = self.score_code_quality(code_text, lang, confidence)
+                is_valid, issues = self.validate_code_syntax(code_text, lang)
+
+                code_blocks.append({
+                    'code': code_text,
+                    'language': lang,
+                    'confidence': confidence,
+                    'quality_score': quality,
+                    'is_valid': is_valid,
+                    'validation_issues': issues if not is_valid else [],
+                    'font': current_font,
+                    'detection_method': 'font'
+                })
+
+        return code_blocks
+
+    def detect_code_blocks_by_indent(self, text):
+        """
+        Detect code blocks by indentation patterns.
+        Code often has consistent indentation.
+
+        Returns list of detected code blocks.
+        """
+        code_blocks = []
+        lines = text.split('\n')
+        current_block = []
+        indent_pattern = None
+
+        for line in lines:
+            # Check for indentation (4 spaces or tab)
+            if line.startswith('    ') or line.startswith('\t'):
+                # Start or continue code block
+                if not indent_pattern:
+                    indent_pattern = line[:4] if line.startswith('    ') else '\t'
+                current_block.append(line)
+            else:
+                # End of code block
+                if current_block and len(current_block) >= 2:  # At least 2 lines
+                    code_text = '\n'.join(current_block).strip()
+                    if len(code_text) > 20:  # Minimum code length
+                        lang, confidence = self.detect_language_from_code(code_text)
+                        quality = self.score_code_quality(code_text, lang, confidence)
+                        is_valid, issues = self.validate_code_syntax(code_text, lang)
+
+                        code_blocks.append({
+                            'code': code_text,
+                            'language': lang,
+                            'confidence': confidence,
+                            'quality_score': quality,
+                            'is_valid': is_valid,
+                            'validation_issues': issues if not is_valid else [],
+                            'detection_method': 'indent'
+                        })
+                current_block = []
+                indent_pattern = None
+
+        # Handle final block
+        if current_block and len(current_block) >= 2:
+            code_text = '\n'.join(current_block).strip()
+            if len(code_text) > 20:
+                lang, confidence = self.detect_language_from_code(code_text)
+                quality = self.score_code_quality(code_text, lang, confidence)
+                is_valid, issues = self.validate_code_syntax(code_text, lang)
+
+                code_blocks.append({
+                    'code': code_text,
+                    'language': lang,
+                    'confidence': confidence,
+                    'quality_score': quality,
+                    'is_valid': is_valid,
+                    'validation_issues': issues if not is_valid else [],
+                    'detection_method': 'indent'
+                })
+
+        return code_blocks
+
+    def detect_code_blocks_by_pattern(self, text):
+        """
+        Detect code blocks by common code patterns (keywords, syntax).
+
+        Returns list of detected code snippets.
+        """
+        code_blocks = []
+
+        # Common code patterns that span multiple lines
+        patterns = [
+            # Function definitions
+            (r'((?:def|function|func|fn|public|private)\s+\w+\s*\([^)]*\)\s*[{:]?[^}]*[}]?)', 'function'),
+            # Class definitions
+            (r'(class\s+\w+[^{]*\{[^}]*\})', 'class'),
+            # Import statements block
+            (r'((?:import|require|use|include)[^\n]+(?:\n(?:import|require|use|include)[^\n]+)*)', 'imports'),
+        ]
+
+        for pattern, block_type in patterns:
+            matches = re.finditer(pattern, text, re.MULTILINE | re.DOTALL)
+            for match in matches:
+                code_text = match.group(1).strip()
+                if len(code_text) > 15:
+                    lang, confidence = self.detect_language_from_code(code_text)
+                    quality = self.score_code_quality(code_text, lang, confidence)
+                    is_valid, issues = self.validate_code_syntax(code_text, lang)
+
+                    code_blocks.append({
+                        'code': code_text,
+                        'language': lang,
+                        'confidence': confidence,
+                        'quality_score': quality,
+                        'is_valid': is_valid,
+                        'validation_issues': issues if not is_valid else [],
+                        'detection_method': 'pattern',
+                        'pattern_type': block_type
+                    })
+
+        return code_blocks
+
+    def detect_chapter_start(self, page_data):
+        """
+        Detect if a page starts a new chapter/section.
+
+        Returns (is_chapter_start, chapter_title) tuple.
+        """
+        headings = page_data.get('headings', [])
+
+        # Check for h1 or h2 at start of page
+        if headings:
+            first_heading = headings[0]
+            # H1/H2 headings are strong indicators of chapter starts
+            if first_heading['level'] in ['h1', 'h2']:
+                return True, first_heading['text']
+
+        # Check for specific chapter markers in text
+        text = page_data.get('text', '')
+        first_line = text.split('\n')[0] if text else ''
+
+        chapter_patterns = [
+            r'^Chapter\s+\d+',
+            r'^Part\s+\d+',
+            r'^Section\s+\d+',
+            r'^\d+\.\s+[A-Z]',  # "1. Introduction"
+        ]
+
+        for pattern in chapter_patterns:
+            if re.match(pattern, first_line, re.IGNORECASE):
+                return True, first_line.strip()
+
+        return False, None
+
+    def merge_continued_code_blocks(self, pages):
+        """
+        Merge code blocks that are split across pages.
+
+        Detects when a code block at the end of one page continues
+        on the next page.
+        """
+        for i in range(len(pages) - 1):
+            current_page = pages[i]
+            next_page = pages[i + 1]
+
+            # Check if current page has code blocks
+            if not current_page['code_samples']:
+                continue
+
+            # Get last code block of current page
+            last_code = current_page['code_samples'][-1]
+
+            # Check if next page starts with code
+            if not next_page['code_samples']:
+                continue
+
+            first_next_code = next_page['code_samples'][0]
+
+            # Same language and detection method = likely continuation
+            if (last_code['language'] == first_next_code['language'] and
+                last_code['detection_method'] == first_next_code['detection_method']):
+
+                # Check if last code block looks incomplete (doesn't end with closing brace/etc)
+                last_code_text = last_code['code'].rstrip()
+                continuation_indicators = [
+                    not last_code_text.endswith('}'),
+                    not last_code_text.endswith(';'),
+                    last_code_text.endswith(','),
+                    last_code_text.endswith('\\'),
+                ]
+
+                if any(continuation_indicators):
+                    # Merge the code blocks
+                    merged_code = last_code['code'] + '\n' + first_next_code['code']
+                    last_code['code'] = merged_code
+                    last_code['merged_from_next_page'] = True
+
+                    # Remove the first code block from next page
+                    next_page['code_samples'].pop(0)
+                    next_page['code_blocks_count'] -= 1
+
+                    self.log(f"  Merged code block from page {i+1} to {i+2}")
+
+        return pages
+
+    def create_chunks(self, pages):
+        """
+        Create chunks of pages for better organization.
+
+        Returns array of chunks, each containing:
+        - chunk_number
+        - start_page, end_page
+        - pages (array)
+        - chapter_title (if detected)
+        """
+        if self.chunk_size == 0:
+            # No chunking - return all pages as one chunk
+            return [{
+                'chunk_number': 1,
+                'start_page': 1,
+                'end_page': len(pages),
+                'pages': pages,
+                'chapter_title': None
+            }]
+
+        chunks = []
+        current_chunk = []
+        chunk_start = 0
+        current_chapter = None
+
+        for i, page in enumerate(pages):
+            # Check if this page starts a new chapter
+            is_chapter, chapter_title = self.detect_chapter_start(page)
+
+            if is_chapter and current_chunk:
+                # Save current chunk before starting new one
+                chunks.append({
+                    'chunk_number': len(chunks) + 1,
+                    'start_page': chunk_start + 1,
+                    'end_page': i,
+                    'pages': current_chunk,
+                    'chapter_title': current_chapter
+                })
+                current_chunk = []
+                chunk_start = i
+                current_chapter = chapter_title
+
+            if not current_chapter and is_chapter:
+                current_chapter = chapter_title
+
+            current_chunk.append(page)
+
+            # Check if chunk size reached (but don't break chapters)
+            if not is_chapter and len(current_chunk) >= self.chunk_size:
+                chunks.append({
+                    'chunk_number': len(chunks) + 1,
+                    'start_page': chunk_start + 1,
+                    'end_page': i + 1,
+                    'pages': current_chunk,
+                    'chapter_title': current_chapter
+                })
+                current_chunk = []
+                chunk_start = i + 1
+                current_chapter = None
+
+        # Add remaining pages as final chunk
+        if current_chunk:
+            chunks.append({
+                'chunk_number': len(chunks) + 1,
+                'start_page': chunk_start + 1,
+                'end_page': len(pages),
+                'pages': current_chunk,
+                'chapter_title': current_chapter
+            })
+
+        return chunks
+
+    def extract_images_from_page(self, page, page_num):
+        """
+        Extract images from a PDF page and save to disk (NEW in B1.5).
+
+        Returns list of extracted image metadata.
+        """
+        if not self.extract_images:
+            # Image extraction disabled; extract_page() still records the per-page
+            # image count via page.get_images()
+            return []
+
+        extracted = []
+        image_list = page.get_images()
+
+        for img_index, img in enumerate(image_list):
+            try:
+                xref = img[0]  # Image XREF number
+                base_image = self.doc.extract_image(xref)
+
+                if not base_image:
+                    continue
+
+                image_bytes = base_image["image"]
+                image_ext = base_image["ext"]  # png, jpeg, etc.
+                width = base_image.get("width", 0)
+                height = base_image.get("height", 0)
+
+                # Filter out small images (icons, bullets, etc.)
+                if width < self.min_image_size or height < self.min_image_size:
+                    self.log(f"    Skipping small image: {width}x{height}")
+                    continue
+
+                # Generate filename
+                pdf_basename = Path(self.pdf_path).stem
+                image_filename = f"{pdf_basename}_page{page_num+1}_img{img_index+1}.{image_ext}"
+
+                # Save image
+                image_path = Path(self.image_dir) / image_filename
+                image_path.parent.mkdir(parents=True, exist_ok=True)
+
+                with open(image_path, "wb") as f:
+                    f.write(image_bytes)
+
+                # Store metadata
+                image_info = {
+                    'filename': image_filename,
+                    'path': str(image_path),
+                    'page_number': page_num + 1,
+                    'width': width,
+                    'height': height,
+                    'format': image_ext,
+                    'size_bytes': len(image_bytes),
+                    'xref': xref
+                }
+
+                extracted.append(image_info)
+                self.extracted_images.append(image_info)
+                self.log(f"    Extracted image: {image_filename} ({width}x{height})")
+
+            except Exception as e:
+                self.log(f"    Error extracting image {img_index}: {e}")
+                continue
+
+        return extracted
+
+    def extract_page(self, page_num):
+        """
+        Extract content from a single PDF page.
+
+        Returns dict with page content, code blocks, and metadata.
+        """
+        # Check cache first (Priority 3)
+        cache_key = f"page_{page_num}"
+        cached = self.get_cached(cache_key)
+        if cached is not None:
+            self.log(f"  Page {page_num + 1}: Using cached data")
+            return cached
+
+        page = self.doc.load_page(page_num)
+
+        # Extract plain text (with OCR if enabled - Priority 2)
+        if self.use_ocr:
+            text = self.extract_text_with_ocr(page)
+        else:
+            text = page.get_text("text")
+
+        # Extract markdown (better structure preservation)
+        markdown = page.get_text("markdown")
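+        # Note (assumption): if the installed PyMuPDF build does not recognise the
+        # "markdown" option, get_text() falls back to plain text and the heading
+        # detection below will find few or no "#" lines.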
+
+        # Extract tables (Priority 2)
+        tables = self.extract_tables_from_page(page)
+
+        # Get page images (for diagrams)
+        images = page.get_images()
+
+        # Extract images to files (NEW in B1.5)
+        extracted_images = self.extract_images_from_page(page, page_num)
+
+        # Detect code blocks using multiple methods
+        font_code_blocks = self.detect_code_blocks_by_font(page)
+        indent_code_blocks = self.detect_code_blocks_by_indent(text)
+        pattern_code_blocks = self.detect_code_blocks_by_pattern(text)
+
+        # Merge and deduplicate code blocks
+        all_code_blocks = font_code_blocks + indent_code_blocks + pattern_code_blocks
+
+        # Simple deduplication by code content
+        unique_code = {}
+        for block in all_code_blocks:
+            code_hash = hash(block['code'])
+            if code_hash not in unique_code:
+                unique_code[code_hash] = block
+            else:
+                # Keep the one with higher quality score
+                if block['quality_score'] > unique_code[code_hash]['quality_score']:
+                    unique_code[code_hash] = block
+
+        code_samples = list(unique_code.values())
+
+        # Filter by minimum quality (NEW in B1.4)
+        if self.min_quality > 0:
+            code_samples_before = len(code_samples)
+            code_samples = [c for c in code_samples if c['quality_score'] >= self.min_quality]
+            filtered_count = code_samples_before - len(code_samples)
+            if filtered_count > 0:
+                self.log(f"  Filtered out {filtered_count} low-quality code blocks (min_quality={self.min_quality})")
+
+        # Sort by quality score (highest first)
+        code_samples.sort(key=lambda x: x['quality_score'], reverse=True)
+
+        # Extract headings from markdown
+        headings = []
+        for line in markdown.split('\n'):
+            if line.startswith('#'):
+                level = len(line) - len(line.lstrip('#'))
+                # Use a distinct name so the page-level `text` variable is not clobbered
+                heading_text = line.lstrip('#').strip()
+                if heading_text:
+                    headings.append({
+                        'level': f'h{level}',
+                        'text': heading_text
+                    })
+
+        page_data = {
+            'page_number': page_num + 1,  # 1-indexed for humans
+            'text': text.strip(),
+            'markdown': markdown.strip(),
+            'headings': headings,
+            'code_samples': code_samples,
+            'images_count': len(images),
+            'extracted_images': extracted_images,  # NEW in B1.5
+            'tables': tables,  # NEW in Priority 2
+            'char_count': len(text),
+            'code_blocks_count': len(code_samples),
+            'tables_count': len(tables)  # NEW in Priority 2
+        }
+
+        # Cache the result (Priority 3)
+        self.set_cached(cache_key, page_data)
+
+        self.log(f"  Page {page_num + 1}: {len(text)} chars, {len(code_samples)} code blocks, {len(headings)} headings, {len(extracted_images)} images, {len(tables)} tables")
+
+        return page_data
+
+    def extract_all(self):
+        """
+        Extract content from all pages of the PDF.
+        Enhanced with password support and parallel processing.
+
+        Returns dict with metadata and pages array.
+        """
+        print(f"\n📄 Extracting from: {self.pdf_path}")
+
+        # Open PDF (with password support - Priority 2)
+        try:
+            self.doc = fitz.open(self.pdf_path)
+
+            # Handle encrypted PDFs (Priority 2)
+            if self.doc.is_encrypted:
+                if self.password:
+                    print(f"   🔐 PDF is encrypted, trying password...")
+                    if self.doc.authenticate(self.password):
+                        print(f"   ✅ Password accepted")
+                    else:
+                        print(f"   ❌ Invalid password")
+                        return None
+                else:
+                    print(f"   ❌ PDF is encrypted but no password provided")
+                    print(f"   Use --password option to provide password")
+                    return None
+
+        except Exception as e:
+            print(f"❌ Error opening PDF: {e}")
+            return None
+
+        print(f"   Pages: {len(self.doc)}")
+        print(f"   Metadata: {self.doc.metadata}")
+
+        # Set up image directory (NEW in B1.5)
+        if self.extract_images and not self.image_dir:
+            pdf_basename = Path(self.pdf_path).stem
+            self.image_dir = f"output/{pdf_basename}_images"
+            print(f"   Image directory: {self.image_dir}")
+
+        # Show feature status
+        if self.use_ocr:
+            status = "✅ enabled" if TESSERACT_AVAILABLE else "⚠️  not available (install pytesseract)"
+            print(f"   OCR: {status}")
+        if self.extract_tables:
+            print(f"   Table extraction: ✅ enabled")
+        if self.parallel:
+            status = "✅ enabled" if CONCURRENT_AVAILABLE else "⚠️  not available"
+            print(f"   Parallel processing: {status} ({self.max_workers} workers)")
+        if self.use_cache:
+            print(f"   Caching: ✅ enabled")
+
+        print("")
+
+        # Extract each page (with parallel processing - Priority 3)
+        if self.parallel and CONCURRENT_AVAILABLE and len(self.doc) > 5:
+            print(f"🚀 Extracting {len(self.doc)} pages in parallel ({self.max_workers} workers)...")
+            with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
+                page_numbers = list(range(len(self.doc)))
+                self.pages = list(executor.map(self.extract_page, page_numbers))
+        else:
+            # Sequential extraction
+            for page_num in range(len(self.doc)):
+                page_data = self.extract_page(page_num)
+                self.pages.append(page_data)
+
+        # Merge code blocks that span across pages
+        self.log("\n🔗 Merging code blocks across pages...")
+        self.pages = self.merge_continued_code_blocks(self.pages)
+
+        # Create chunks
+        self.log(f"\n📦 Creating chunks (chunk_size={self.chunk_size})...")
+        chunks = self.create_chunks(self.pages)
+
+        # Build summary
+        total_chars = sum(p['char_count'] for p in self.pages)
+        total_code_blocks = sum(p['code_blocks_count'] for p in self.pages)
+        total_headings = sum(len(p['headings']) for p in self.pages)
+        total_images = sum(p['images_count'] for p in self.pages)
+        total_tables = sum(p['tables_count'] for p in self.pages)  # NEW in Priority 2
+
+        # Detect languages used
+        languages = {}
+        all_code_blocks_list = []
+        for page in self.pages:
+            for code in page['code_samples']:
+                lang = code['language']
+                languages[lang] = languages.get(lang, 0) + 1
+                all_code_blocks_list.append(code)
+
+        # Calculate quality statistics (NEW in B1.4)
+        quality_stats = {}
+        if all_code_blocks_list:
+            quality_scores = [c['quality_score'] for c in all_code_blocks_list]
+            confidences = [c['confidence'] for c in all_code_blocks_list]
+            valid_count = sum(1 for c in all_code_blocks_list if c['is_valid'])
+
+            quality_stats = {
+                'average_quality': sum(quality_scores) / len(quality_scores),
+                'average_confidence': sum(confidences) / len(confidences),
+                'valid_code_blocks': valid_count,
+                'invalid_code_blocks': total_code_blocks - valid_count,
+                'validation_rate': valid_count / total_code_blocks if total_code_blocks > 0 else 0,
+                'high_quality_blocks': sum(1 for s in quality_scores if s >= 7.0),
+                'medium_quality_blocks': sum(1 for s in quality_scores if 4.0 <= s < 7.0),
+                'low_quality_blocks': sum(1 for s in quality_scores if s < 4.0),
+            }
+
+        # Extract chapter information
+        chapters = []
+        for chunk in chunks:
+            if chunk['chapter_title']:
+                chapters.append({
+                    'title': chunk['chapter_title'],
+                    'start_page': chunk['start_page'],
+                    'end_page': chunk['end_page']
+                })
+
+        result = {
+            'source_file': self.pdf_path,
+            'metadata': self.doc.metadata,
+            'total_pages': len(self.doc),
+            'total_chars': total_chars,
+            'total_code_blocks': total_code_blocks,
+            'total_headings': total_headings,
+            'total_images': total_images,
+            'total_extracted_images': len(self.extracted_images),  # NEW in B1.5
+            'total_tables': total_tables,  # NEW in Priority 2
+            'image_directory': self.image_dir if self.extract_images else None,  # NEW in B1.5
+            'extracted_images': self.extracted_images,  # NEW in B1.5
+            'total_chunks': len(chunks),
+            'chapters': chapters,
+            'languages_detected': languages,
+            'quality_statistics': quality_stats,  # NEW in B1.4
+            'chunks': chunks,
+            'pages': self.pages  # Still include all pages for compatibility
+        }
+
+        # Close document
+        self.doc.close()
+
+        print(f"\n✅ Extraction complete:")
+        print(f"   Total characters: {total_chars:,}")
+        print(f"   Code blocks found: {total_code_blocks}")
+        print(f"   Headings found: {total_headings}")
+        print(f"   Images found: {total_images}")
+        if self.extract_images:
+            print(f"   Images extracted: {len(self.extracted_images)}")
+            if self.image_dir:
+                print(f"   Image directory: {self.image_dir}")
+        if self.extract_tables:
+            print(f"   Tables found: {total_tables}")
+        print(f"   Chunks created: {len(chunks)}")
+        print(f"   Chapters detected: {len(chapters)}")
+        print(f"   Languages detected: {', '.join(languages.keys())}")
+
+        # Print quality statistics (NEW in B1.4)
+        if quality_stats:
+            print(f"\n📊 Code Quality Statistics:")
+            print(f"   Average quality: {quality_stats['average_quality']:.1f}/10")
+            print(f"   Average confidence: {quality_stats['average_confidence']:.1%}")
+            print(f"   Valid code blocks: {quality_stats['valid_code_blocks']}/{total_code_blocks} ({quality_stats['validation_rate']:.1%})")
+            print(f"   High quality (7+): {quality_stats['high_quality_blocks']}")
+            print(f"   Medium quality (4-7): {quality_stats['medium_quality_blocks']}")
+            print(f"   Low quality (<4): {quality_stats['low_quality_blocks']}")
+
+        return result
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Extract text and code blocks from PDF documentation',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Extract from PDF
+  python3 pdf_extractor_poc.py input.pdf
+
+  # Save to JSON file
+  python3 pdf_extractor_poc.py input.pdf --output result.json
+
+  # Verbose mode
+  python3 pdf_extractor_poc.py input.pdf --verbose
+
+  # Extract and save
+  python3 pdf_extractor_poc.py docs/python.pdf -o python_extracted.json -v
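+
+  # Image/table extraction with quality filtering (flags defined below; file names are placeholders)
+  python3 pdf_extractor_poc.py manual.pdf --extract-images --extract-tables --min-quality 5 -o manual.json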
+        """
+    )
+
+    parser.add_argument('pdf_file', help='Path to PDF file to extract')
+    parser.add_argument('-o', '--output', help='Output JSON file path (default: print to stdout)')
+    parser.add_argument('-v', '--verbose', action='store_true', help='Verbose output')
+    parser.add_argument('--pretty', action='store_true', help='Pretty-print JSON output')
+    parser.add_argument('--chunk-size', type=int, default=10,
+                        help='Pages per chunk (0 = no chunking, default: 10)')
+    parser.add_argument('--no-merge', action='store_true',
+                        help='Disable merging code blocks across pages')
+    parser.add_argument('--min-quality', type=float, default=0.0,
+                        help='Minimum code quality score (0-10, default: 0 = no filtering)')
+    parser.add_argument('--extract-images', action='store_true',
+                        help='Extract images to files (NEW in B1.5)')
+    parser.add_argument('--image-dir', type=str, default=None,
+                        help='Directory to save extracted images (default: output/{pdf_name}_images)')
+    parser.add_argument('--min-image-size', type=int, default=100,
+                        help='Minimum image dimension in pixels (filters icons, default: 100)')
+
+    # Advanced features (Priority 2 & 3)
+    parser.add_argument('--ocr', action='store_true',
+                        help='Use OCR for scanned PDFs (requires pytesseract)')
+    parser.add_argument('--password', type=str, default=None,
+                        help='Password for encrypted PDF')
+    parser.add_argument('--extract-tables', action='store_true',
+                        help='Extract tables from PDF (Priority 2)')
+    parser.add_argument('--parallel', action='store_true',
+                        help='Process pages in parallel (Priority 3)')
+    parser.add_argument('--workers', type=int, default=None,
+                        help='Number of parallel workers (default: CPU count)')
+    parser.add_argument('--no-cache', action='store_true',
+                        help='Disable caching of expensive operations')
+
+    args = parser.parse_args()
+
+    # Validate input file
+    if not os.path.exists(args.pdf_file):
+        print(f"❌ Error: File not found: {args.pdf_file}")
+        sys.exit(1)
+
+    if not args.pdf_file.lower().endswith('.pdf'):
+        print(f"⚠️  Warning: File does not have .pdf extension")
+
+    # Extract
+    extractor = PDFExtractor(
+        args.pdf_file,
+        verbose=args.verbose,
+        chunk_size=args.chunk_size,
+        min_quality=args.min_quality,
+        extract_images=args.extract_images,
+        image_dir=args.image_dir,
+        min_image_size=args.min_image_size,
+        # Advanced features (Priority 2 & 3)
+        use_ocr=args.ocr,
+        password=args.password,
+        extract_tables=args.extract_tables,
+        parallel=args.parallel,
+        max_workers=args.workers,
+        use_cache=not args.no_cache
+    )
+    result = extractor.extract_all()
+
+    if result is None:
+        sys.exit(1)
+
+    # Output
+    if args.output:
+        # Save to file
+        with open(args.output, 'w', encoding='utf-8') as f:
+            if args.pretty:
+                json.dump(result, f, indent=2, ensure_ascii=False)
+            else:
+                json.dump(result, f, ensure_ascii=False)
+        print(f"\n💾 Saved to: {args.output}")
+    else:
+        # Print to stdout
+        if args.pretty:
+            print("\n" + json.dumps(result, indent=2, ensure_ascii=False))
+        else:
+            print(json.dumps(result, ensure_ascii=False))
+
+
+if __name__ == '__main__':
+    main()

+ 401 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/pdf_scraper.py

@@ -0,0 +1,401 @@
+#!/usr/bin/env python3
+"""
+PDF Documentation to Claude Skill Converter (Task B1.6)
+
+Converts PDF documentation into Claude AI skills.
+Uses pdf_extractor_poc.py for extraction, builds skill structure.
+
+Usage:
+    python3 pdf_scraper.py --config configs/manual_pdf.json
+    python3 pdf_scraper.py --pdf manual.pdf --name myskill
+    python3 pdf_scraper.py --from-json manual_extracted.json
+"""
+
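+# Example config shape (illustrative only; the keys mirror what PDFToSkillConverter
+# reads below, and the option values match the defaults built by main() in --pdf mode):
+#
+#   {
+#     "name": "manual",
+#     "pdf_path": "docs/manual.pdf",
+#     "description": "Documentation skill for manual",
+#     "extract_options": {"chunk_size": 10, "min_quality": 5.0,
+#                         "extract_images": true, "min_image_size": 100},
+#     "categories": {"getting_started": ["install", "setup"],
+#                    "api_reference": ["class", "method", "parameter"]}
+#   }
+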
+import os
+import sys
+import json
+import re
+import argparse
+from pathlib import Path
+
+# Import the PDF extractor
+from .pdf_extractor_poc import PDFExtractor
+
+
+class PDFToSkillConverter:
+    """Convert PDF documentation to Claude skill"""
+
+    def __init__(self, config):
+        self.config = config
+        self.name = config['name']
+        self.pdf_path = config.get('pdf_path', '')
+        self.description = config.get('description', f'Documentation skill for {self.name}')
+
+        # Paths
+        self.skill_dir = f"output/{self.name}"
+        self.data_file = f"output/{self.name}_extracted.json"
+
+        # Extraction options
+        self.extract_options = config.get('extract_options', {})
+
+        # Categories
+        self.categories = config.get('categories', {})
+
+        # Extracted data
+        self.extracted_data = None
+
+    def extract_pdf(self):
+        """Extract content from PDF using pdf_extractor_poc.py"""
+        print(f"\n🔍 Extracting from PDF: {self.pdf_path}")
+
+        # Create extractor with options
+        extractor = PDFExtractor(
+            self.pdf_path,
+            verbose=True,
+            chunk_size=self.extract_options.get('chunk_size', 10),
+            min_quality=self.extract_options.get('min_quality', 5.0),
+            extract_images=self.extract_options.get('extract_images', True),
+            image_dir=f"{self.skill_dir}/assets/images",
+            min_image_size=self.extract_options.get('min_image_size', 100)
+        )
+
+        # Extract
+        result = extractor.extract_all()
+
+        if not result:
+            print("❌ Extraction failed")
+            raise RuntimeError(f"Failed to extract PDF: {self.pdf_path}")
+
+        # Save extracted data
+        with open(self.data_file, 'w', encoding='utf-8') as f:
+            json.dump(result, f, indent=2, ensure_ascii=False)
+
+        print(f"\n💾 Saved extracted data to: {self.data_file}")
+        self.extracted_data = result
+        return True
+
+    def load_extracted_data(self, json_path):
+        """Load previously extracted data from JSON"""
+        print(f"\n📂 Loading extracted data from: {json_path}")
+
+        with open(json_path, 'r', encoding='utf-8') as f:
+            self.extracted_data = json.load(f)
+
+        print(f"✅ Loaded {self.extracted_data['total_pages']} pages")
+        return True
+
+    def categorize_content(self):
+        """Categorize pages based on chapters or keywords"""
+        print(f"\n📋 Categorizing content...")
+
+        categorized = {}
+
+        # Use chapters if available
+        if self.extracted_data.get('chapters'):
+            for chapter in self.extracted_data['chapters']:
+                category_key = self._sanitize_filename(chapter['title'])
+                categorized[category_key] = {
+                    'title': chapter['title'],
+                    'pages': []
+                }
+
+            # Assign pages to chapters
+            for page in self.extracted_data['pages']:
+                page_num = page['page_number']
+
+                # Find which chapter this page belongs to
+                for chapter in self.extracted_data['chapters']:
+                    if chapter['start_page'] <= page_num <= chapter['end_page']:
+                        category_key = self._sanitize_filename(chapter['title'])
+                        categorized[category_key]['pages'].append(page)
+                        break
+
+        # Fall back to keyword-based categorization
+        elif self.categories:
+            # Check if categories is already in the right format (for tests)
+            # If first value is a list of dicts (pages), use as-is
+            first_value = next(iter(self.categories.values()))
+            if isinstance(first_value, list) and first_value and isinstance(first_value[0], dict):
+                # Already categorized - convert to expected format
+                for cat_key, pages in self.categories.items():
+                    categorized[cat_key] = {
+                        'title': cat_key.replace('_', ' ').title(),
+                        'pages': pages
+                    }
+            else:
+                # Keyword-based categorization
+                # Initialize categories
+                for cat_key, keywords in self.categories.items():
+                    categorized[cat_key] = {
+                        'title': cat_key.replace('_', ' ').title(),
+                        'pages': []
+                    }
+
+                # Categorize by keywords
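+                # Scoring example (illustrative): with categories like
+                # {"api": ["class", "method"]}, a page whose text contains both
+                # words scores 2 for "api" (one point per matched keyword, not
+                # per occurrence).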
+                for page in self.extracted_data['pages']:
+                    text = page.get('text', '').lower()
+                    headings_text = ' '.join([h['text'] for h in page.get('headings', [])]).lower()
+
+                    # Score against each category
+                    scores = {}
+                    for cat_key, keywords in self.categories.items():
+                        # Handle both string keywords and dict keywords (shouldn't happen, but be safe)
+                        if isinstance(keywords, list):
+                            score = sum(1 for kw in keywords
+                                      if isinstance(kw, str) and (kw.lower() in text or kw.lower() in headings_text))
+                        else:
+                            score = 0
+                        if score > 0:
+                            scores[cat_key] = score
+
+                    # Assign to highest scoring category
+                    if scores:
+                        best_cat = max(scores, key=scores.get)
+                        categorized[best_cat]['pages'].append(page)
+                    else:
+                        # Default category
+                        if 'other' not in categorized:
+                            categorized['other'] = {'title': 'Other', 'pages': []}
+                        categorized['other']['pages'].append(page)
+
+        else:
+            # No categorization - use single category
+            categorized['content'] = {
+                'title': 'Content',
+                'pages': self.extracted_data['pages']
+            }
+
+        print(f"✅ Created {len(categorized)} categories")
+        for cat_key, cat_data in categorized.items():
+            print(f"   - {cat_data['title']}: {len(cat_data['pages'])} pages")
+
+        return categorized
+
+    def build_skill(self):
+        """Build complete skill structure"""
+        print(f"\n🏗️  Building skill: {self.name}")
+
+        # Create directories
+        os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
+
+        # Categorize content
+        categorized = self.categorize_content()
+
+        # Generate reference files
+        print(f"\n📝 Generating reference files...")
+        for cat_key, cat_data in categorized.items():
+            self._generate_reference_file(cat_key, cat_data)
+
+        # Generate index
+        self._generate_index(categorized)
+
+        # Generate SKILL.md
+        self._generate_skill_md(categorized)
+
+        print(f"\n✅ Skill built successfully: {self.skill_dir}/")
+        print(f"\n📦 Next step: Package with: skill-seekers package {self.skill_dir}/")
+
+    def _generate_reference_file(self, cat_key, cat_data):
+        """Generate a reference markdown file for a category"""
+        filename = f"{self.skill_dir}/references/{cat_key}.md"
+
+        with open(filename, 'w', encoding='utf-8') as f:
+            f.write(f"# {cat_data['title']}\n\n")
+
+            for page in cat_data['pages']:
+                # Add headings as section markers
+                if page.get('headings'):
+                    f.write(f"## {page['headings'][0]['text']}\n\n")
+
+                # Add text content
+                if page.get('text'):
+                    # Limit to first 1000 chars per page to avoid huge files
+                    text = page['text'][:1000]
+                    f.write(f"{text}\n\n")
+
+                # Add code samples (check both 'code_samples' and 'code_blocks' for compatibility)
+                code_list = page.get('code_samples') or page.get('code_blocks')
+                if code_list:
+                    f.write("### Code Examples\n\n")
+                    for code in code_list[:3]:  # Limit to top 3
+                        lang = code.get('language', '')
+                        f.write(f"```{lang}\n{code['code']}\n```\n\n")
+
+                # Add images
+                if page.get('images'):
+                    # Create assets directory if needed
+                    assets_dir = os.path.join(self.skill_dir, 'assets')
+                    os.makedirs(assets_dir, exist_ok=True)
+
+                    f.write("### Images\n\n")
+                    for img in page['images']:
+                        # Save image to assets
+                        img_filename = f"page_{page['page_number']}_img_{img['index']}.png"
+                        img_path = os.path.join(assets_dir, img_filename)
+
+                        with open(img_path, 'wb') as img_file:
+                            img_file.write(img['data'])
+
+                        # Add markdown image reference
+                        f.write(f"![Image {img['index']}](../assets/{img_filename})\n\n")
+
+                f.write("---\n\n")
+
+        print(f"   Generated: {filename}")
+
+    def _generate_index(self, categorized):
+        """Generate reference index"""
+        filename = f"{self.skill_dir}/references/index.md"
+
+        with open(filename, 'w', encoding='utf-8') as f:
+            f.write(f"# {self.name.title()} Documentation Reference\n\n")
+            f.write("## Categories\n\n")
+
+            for cat_key, cat_data in categorized.items():
+                page_count = len(cat_data['pages'])
+                f.write(f"- [{cat_data['title']}]({cat_key}.md) ({page_count} pages)\n")
+
+            f.write("\n## Statistics\n\n")
+            stats = self.extracted_data.get('quality_statistics', {})
+            f.write(f"- Total pages: {self.extracted_data.get('total_pages', 0)}\n")
+            f.write(f"- Code blocks: {self.extracted_data.get('total_code_blocks', 0)}\n")
+            f.write(f"- Images: {self.extracted_data.get('total_images', 0)}\n")
+            if stats:
+                f.write(f"- Average code quality: {stats.get('average_quality', 0):.1f}/10\n")
+                f.write(f"- Valid code blocks: {stats.get('valid_code_blocks', 0)}\n")
+
+        print(f"   Generated: {filename}")
+
+    def _generate_skill_md(self, categorized):
+        """Generate main SKILL.md file"""
+        filename = f"{self.skill_dir}/SKILL.md"
+
+        # Generate skill name (lowercase, hyphens only, max 64 chars)
+        skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
+
+        # Truncate description to 1024 chars if needed
+        desc = self.description[:1024] if len(self.description) > 1024 else self.description
+
+        with open(filename, 'w', encoding='utf-8') as f:
+            # Write YAML frontmatter
+            f.write(f"---\n")
+            f.write(f"name: {skill_name}\n")
+            f.write(f"description: {desc}\n")
+            f.write(f"---\n\n")
+
+            f.write(f"# {self.name.title()} Documentation Skill\n\n")
+            f.write(f"{self.description}\n\n")
+
+            f.write("## When to use this skill\n\n")
+            f.write(f"Use this skill when the user asks about {self.name} documentation, ")
+            f.write("including API references, tutorials, examples, and best practices.\n\n")
+
+            f.write("## What's included\n\n")
+            f.write("This skill contains:\n\n")
+            for cat_key, cat_data in categorized.items():
+                f.write(f"- **{cat_data['title']}**: {len(cat_data['pages'])} pages\n")
+
+            f.write("\n## Quick Reference\n\n")
+
+            # Get high-quality code samples
+            all_code = []
+            for page in self.extracted_data['pages']:
+                all_code.extend(page.get('code_samples', []))
+
+            # Sort by quality and get top 5
+            all_code.sort(key=lambda x: x.get('quality_score', 0), reverse=True)
+            top_code = all_code[:5]
+
+            if top_code:
+                f.write("### Top Code Examples\n\n")
+                for i, code in enumerate(top_code, 1):
+                    lang = code['language']
+                    quality = code.get('quality_score', 0)
+                    f.write(f"**Example {i}** (Quality: {quality:.1f}/10):\n\n")
+                    f.write(f"```{lang}\n{code['code'][:300]}...\n```\n\n")
+
+            f.write("## Navigation\n\n")
+            f.write("See `references/index.md` for complete documentation structure.\n\n")
+
+            # Add language statistics
+            langs = self.extracted_data.get('languages_detected', {})
+            if langs:
+                f.write("## Languages Covered\n\n")
+                for lang, count in sorted(langs.items(), key=lambda x: x[1], reverse=True):
+                    f.write(f"- {lang}: {count} examples\n")
+
+        print(f"   Generated: {filename}")
+
+    def _sanitize_filename(self, name):
+        """Convert string to safe filename"""
+        # Remove special chars, replace spaces with underscores
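+        # e.g. "Getting Started: Part 1" -> "getting_started_part_1" (illustrative)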
+        safe = re.sub(r'[^\w\s-]', '', name.lower())
+        safe = re.sub(r'[-\s]+', '_', safe)
+        return safe
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Convert PDF documentation to Claude skill',
+        formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+
+    parser.add_argument('--config', help='PDF config JSON file')
+    parser.add_argument('--pdf', help='Direct PDF file path')
+    parser.add_argument('--name', help='Skill name (with --pdf)')
+    parser.add_argument('--from-json', help='Build skill from extracted JSON')
+    parser.add_argument('--description', help='Skill description')
+
+    args = parser.parse_args()
+
+    # Validate inputs
+    if not (args.config or args.pdf or args.from_json):
+        parser.error("Must specify --config, --pdf, or --from-json")
+
+    # Load or create config
+    if args.config:
+        with open(args.config, 'r') as f:
+            config = json.load(f)
+    elif args.from_json:
+        # Build from extracted JSON
+        name = Path(args.from_json).stem.replace('_extracted', '')
+        config = {
+            'name': name,
+            'description': args.description or f'Documentation skill for {name}'
+        }
+        converter = PDFToSkillConverter(config)
+        converter.load_extracted_data(args.from_json)
+        converter.build_skill()
+        return
+    else:
+        # Direct PDF mode
+        if not args.name:
+            parser.error("Must specify --name with --pdf")
+        config = {
+            'name': args.name,
+            'pdf_path': args.pdf,
+            'description': args.description or f'Documentation skill for {args.name}',
+            'extract_options': {
+                'chunk_size': 10,
+                'min_quality': 5.0,
+                'extract_images': True,
+                'min_image_size': 100
+            }
+        }
+
+    # Create converter
+    converter = PDFToSkillConverter(config)
+
+    # Extract if needed
+    if config.get('pdf_path'):
+        if not converter.extract_pdf():
+            sys.exit(1)
+
+    # Build skill
+    converter.build_skill()
+
+
+if __name__ == '__main__':
+    main()

+ 480 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/quality_checker.py

@@ -0,0 +1,480 @@
+#!/usr/bin/env python3
+"""
+Quality Checker for Claude Skills
+Validates skill quality, checks links, and generates quality reports.
+
+Usage:
+    python3 quality_checker.py output/react/
+    python3 quality_checker.py output/godot/ --verbose
+"""
+
+import os
+import re
+import sys
+from pathlib import Path
+from typing import Dict, List, Tuple, Optional
+from dataclasses import dataclass, field
+
+
+@dataclass
+class QualityIssue:
+    """Represents a quality issue found during validation."""
+    level: str  # 'error', 'warning', 'info'
+    category: str  # 'enhancement', 'content', 'links', 'structure'
+    message: str
+    file: Optional[str] = None
+    line: Optional[int] = None
+
+
+@dataclass
+class QualityReport:
+    """Complete quality report for a skill."""
+    skill_name: str
+    skill_path: Path
+    errors: List[QualityIssue] = field(default_factory=list)
+    warnings: List[QualityIssue] = field(default_factory=list)
+    info: List[QualityIssue] = field(default_factory=list)
+
+    def add_error(self, category: str, message: str, file: Optional[str] = None, line: Optional[int] = None):
+        """Add an error to the report."""
+        self.errors.append(QualityIssue('error', category, message, file, line))
+
+    def add_warning(self, category: str, message: str, file: Optional[str] = None, line: Optional[int] = None):
+        """Add a warning to the report."""
+        self.warnings.append(QualityIssue('warning', category, message, file, line))
+
+    def add_info(self, category: str, message: str, file: Optional[str] = None, line: Optional[int] = None):
+        """Add info to the report."""
+        self.info.append(QualityIssue('info', category, message, file, line))
+
+    @property
+    def has_errors(self) -> bool:
+        """Check if there are any errors."""
+        return len(self.errors) > 0
+
+    @property
+    def has_warnings(self) -> bool:
+        """Check if there are any warnings."""
+        return len(self.warnings) > 0
+
+    @property
+    def is_excellent(self) -> bool:
+        """Check if quality is excellent (no errors, no warnings)."""
+        return not self.has_errors and not self.has_warnings
+
+    @property
+    def quality_score(self) -> float:
+        """Calculate quality score (0-100)."""
+        # Start with perfect score
+        score = 100.0
+
+        # Deduct points for issues
+        score -= len(self.errors) * 15  # -15 per error
+        score -= len(self.warnings) * 5  # -5 per warning
+
+        # Never go below 0
+        return max(0.0, score)
+
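+    # Worked example (illustrative): 2 errors and 3 warnings give
+    # 100 - (2 * 15) - (3 * 5) = 55.0, which the grading below maps to 'F'.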
+    @property
+    def quality_grade(self) -> str:
+        """Get quality grade (A-F)."""
+        score = self.quality_score
+        if score >= 90:
+            return 'A'
+        elif score >= 80:
+            return 'B'
+        elif score >= 70:
+            return 'C'
+        elif score >= 60:
+            return 'D'
+        else:
+            return 'F'
+
+
+class SkillQualityChecker:
+    """Validates skill quality and generates reports."""
+
+    def __init__(self, skill_dir: Path):
+        """Initialize quality checker.
+
+        Args:
+            skill_dir: Path to skill directory
+        """
+        self.skill_dir = Path(skill_dir)
+        self.skill_md_path = self.skill_dir / "SKILL.md"
+        self.references_dir = self.skill_dir / "references"
+        self.report = QualityReport(
+            skill_name=self.skill_dir.name,
+            skill_path=self.skill_dir
+        )
+
+    def check_all(self) -> QualityReport:
+        """Run all quality checks and return report.
+
+        Returns:
+            QualityReport: Complete quality report
+        """
+        # Basic structure checks
+        self._check_skill_structure()
+
+        # Enhancement verification
+        self._check_enhancement_quality()
+
+        # Content quality checks
+        self._check_content_quality()
+
+        # Link validation
+        self._check_links()
+
+        return self.report
+
+    def _check_skill_structure(self):
+        """Check basic skill structure."""
+        # Check SKILL.md exists
+        if not self.skill_md_path.exists():
+            self.report.add_error(
+                'structure',
+                'SKILL.md file not found',
+                str(self.skill_md_path)
+            )
+            return
+
+        # Check references directory exists
+        if not self.references_dir.exists():
+            self.report.add_warning(
+                'structure',
+                'references/ directory not found - skill may be incomplete',
+                str(self.references_dir)
+            )
+        elif not list(self.references_dir.glob('*.md')):
+            self.report.add_warning(
+                'structure',
+                'references/ directory is empty - no reference documentation found',
+                str(self.references_dir)
+            )
+
+    def _check_enhancement_quality(self):
+        """Check if SKILL.md was properly enhanced."""
+        if not self.skill_md_path.exists():
+            return
+
+        content = self.skill_md_path.read_text(encoding='utf-8')
+
+        # Check for template indicators (signs it wasn't enhanced)
+        template_indicators = [
+            "TODO:",
+            "[Add description]",
+            "[Framework specific tips]",
+            "coming soon",
+        ]
+
+        for indicator in template_indicators:
+            if indicator.lower() in content.lower():
+                self.report.add_warning(
+                    'enhancement',
+                    f'Found template placeholder: "{indicator}" - SKILL.md may not be enhanced',
+                    'SKILL.md'
+                )
+
+        # Check for good signs of enhancement
+        enhancement_indicators = {
+            'code_examples': re.compile(r'```[\w-]+\n', re.MULTILINE),
+            'real_examples': re.compile(r'Example:', re.IGNORECASE),
+            'sections': re.compile(r'^## .+', re.MULTILINE),
+        }
+
+        code_blocks = len(enhancement_indicators['code_examples'].findall(content))
+        real_examples = len(enhancement_indicators['real_examples'].findall(content))
+        sections = len(enhancement_indicators['sections'].findall(content))
+
+        # Quality thresholds
+        if code_blocks == 0:
+            self.report.add_warning(
+                'enhancement',
+                'No code examples found in SKILL.md - consider enhancing',
+                'SKILL.md'
+            )
+        elif code_blocks < 3:
+            self.report.add_info(
+                'enhancement',
+                f'Only {code_blocks} code examples found - more examples would improve quality',
+                'SKILL.md'
+            )
+        else:
+            self.report.add_info(
+                'enhancement',
+                f'✓ Found {code_blocks} code examples',
+                'SKILL.md'
+            )
+
+        if sections < 4:
+            self.report.add_warning(
+                'enhancement',
+                f'Only {sections} sections found - SKILL.md may be too basic',
+                'SKILL.md'
+            )
+        else:
+            self.report.add_info(
+                'enhancement',
+                f'✓ Found {sections} sections',
+                'SKILL.md'
+            )
+
+    def _check_content_quality(self):
+        """Check content quality."""
+        if not self.skill_md_path.exists():
+            return
+
+        content = self.skill_md_path.read_text(encoding='utf-8')
+
+        # Check YAML frontmatter
+        if not content.startswith('---'):
+            self.report.add_error(
+                'content',
+                'Missing YAML frontmatter - SKILL.md must start with ---',
+                'SKILL.md',
+                1
+            )
+        else:
+            # Extract frontmatter
+            try:
+                frontmatter_match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
+                if frontmatter_match:
+                    frontmatter = frontmatter_match.group(1)
+
+                    # Check for required fields
+                    if 'name:' not in frontmatter:
+                        self.report.add_error(
+                            'content',
+                            'Missing "name:" field in YAML frontmatter',
+                            'SKILL.md',
+                            2
+                        )
+
+                    # Check for description
+                    if 'description:' in frontmatter:
+                        self.report.add_info(
+                            'content',
+                            '✓ YAML frontmatter includes description',
+                            'SKILL.md'
+                        )
+                else:
+                    self.report.add_error(
+                        'content',
+                        'Invalid YAML frontmatter format',
+                        'SKILL.md',
+                        1
+                    )
+            except Exception as e:
+                self.report.add_error(
+                    'content',
+                    f'Error parsing YAML frontmatter: {e}',
+                    'SKILL.md',
+                    1
+                )
+
+        # Check code block language tags
+        code_blocks_without_lang = re.findall(r'```\n[^`]', content)
+        if code_blocks_without_lang:
+            self.report.add_warning(
+                'content',
+                f'Found {len(code_blocks_without_lang)} code blocks without language tags',
+                'SKILL.md'
+            )
+
+        # Check for "When to Use" section
+        if 'when to use' not in content.lower():
+            self.report.add_warning(
+                'content',
+                'Missing "When to Use This Skill" section',
+                'SKILL.md'
+            )
+        else:
+            self.report.add_info(
+                'content',
+                '✓ Found "When to Use" section',
+                'SKILL.md'
+            )
+
+        # Check reference files
+        if self.references_dir.exists():
+            ref_files = list(self.references_dir.glob('*.md'))
+            if ref_files:
+                self.report.add_info(
+                    'content',
+                    f'✓ Found {len(ref_files)} reference files',
+                    'references/'
+                )
+
+                # Check if references are mentioned in SKILL.md
+                mentioned_refs = 0
+                for ref_file in ref_files:
+                    if ref_file.name in content:
+                        mentioned_refs += 1
+
+                if mentioned_refs == 0:
+                    self.report.add_warning(
+                        'content',
+                        'Reference files exist but none are mentioned in SKILL.md',
+                        'SKILL.md'
+                    )
+
+    def _check_links(self):
+        """Check internal markdown links."""
+        if not self.skill_md_path.exists():
+            return
+
+        content = self.skill_md_path.read_text(encoding='utf-8')
+
+        # Find all markdown links [text](path)
+        link_pattern = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
+        links = link_pattern.findall(content)
+
+        broken_links = []
+
+        for text, link in links:
+            # Skip external links (http/https)
+            if link.startswith('http://') or link.startswith('https://'):
+                continue
+
+            # Skip anchor links
+            if link.startswith('#'):
+                continue
+
+            # Check if file exists (relative to SKILL.md)
+            link_path = self.skill_dir / link
+            if not link_path.exists():
+                broken_links.append((text, link))
+
+        if broken_links:
+            for text, link in broken_links:
+                self.report.add_warning(
+                    'links',
+                    f'Broken link: [{text}]({link})',
+                    'SKILL.md'
+                )
+        else:
+            if links:
+                internal_links = [l for t, l in links if not l.startswith('http')]
+                if internal_links:
+                    self.report.add_info(
+                        'links',
+                        f'✓ All {len(internal_links)} internal links are valid',
+                        'SKILL.md'
+                    )
+
+
+def print_report(report: QualityReport, verbose: bool = False):
+    """Print quality report to console.
+
+    Args:
+        report: Quality report to print
+        verbose: Show all info messages
+    """
+    print("\n" + "=" * 60)
+    print(f"QUALITY REPORT: {report.skill_name}")
+    print("=" * 60)
+    print()
+
+    # Quality score
+    print(f"Quality Score: {report.quality_score:.1f}/100 (Grade: {report.quality_grade})")
+    print()
+
+    # Errors
+    if report.errors:
+        print(f"❌ ERRORS ({len(report.errors)}):")
+        for issue in report.errors:
+            location = f" ({issue.file}:{issue.line})" if issue.file and issue.line else f" ({issue.file})" if issue.file else ""
+            print(f"   [{issue.category}] {issue.message}{location}")
+        print()
+
+    # Warnings
+    if report.warnings:
+        print(f"⚠️  WARNINGS ({len(report.warnings)}):")
+        for issue in report.warnings:
+            location = f" ({issue.file}:{issue.line})" if issue.file and issue.line else f" ({issue.file})" if issue.file else ""
+            print(f"   [{issue.category}] {issue.message}{location}")
+        print()
+
+    # Info (only in verbose mode)
+    if verbose and report.info:
+        print(f"ℹ️  INFO ({len(report.info)}):")
+        for issue in report.info:
+            location = f" ({issue.file})" if issue.file else ""
+            print(f"   [{issue.category}] {issue.message}{location}")
+        print()
+
+    # Summary
+    if report.is_excellent:
+        print("✅ EXCELLENT! No issues found.")
+    elif not report.has_errors:
+        print("✓ GOOD! No errors, but some warnings to review.")
+    else:
+        print("❌ NEEDS IMPROVEMENT! Please fix errors before packaging.")
+
+    print()
+
+
+def main():
+    """Main entry point."""
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description="Check skill quality and generate report",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Basic quality check
+  python3 quality_checker.py output/react/
+
+  # Verbose mode (show all info)
+  python3 quality_checker.py output/godot/ --verbose
+
+  # Exit with error code if issues found
+  python3 quality_checker.py output/django/ --strict
+"""
+    )
+
+    parser.add_argument(
+        'skill_directory',
+        help='Path to skill directory (e.g., output/react/)'
+    )
+
+    parser.add_argument(
+        '--verbose', '-v',
+        action='store_true',
+        help='Show all info messages'
+    )
+
+    parser.add_argument(
+        '--strict',
+        action='store_true',
+        help='Exit with error code if any warnings or errors found'
+    )
+
+    args = parser.parse_args()
+
+    # Check if directory exists
+    skill_dir = Path(args.skill_directory)
+    if not skill_dir.exists():
+        print(f"❌ Directory not found: {skill_dir}")
+        sys.exit(1)
+
+    # Run quality checks
+    checker = SkillQualityChecker(skill_dir)
+    report = checker.check_all()
+
+    # Print report
+    print_report(report, verbose=args.verbose)
+
+    # Exit code
+    if args.strict and (report.has_errors or report.has_warnings):
+        sys.exit(1)
+    elif report.has_errors:
+        sys.exit(1)
+    else:
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()

+ 228 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/run_tests.py

@@ -0,0 +1,228 @@
+#!/usr/bin/env python3
+"""
+Test Runner for Skill Seeker
+Runs all test suites and generates a comprehensive test report
+"""
+
+import sys
+import unittest
+import os
+from io import StringIO
+from pathlib import Path
+
+
+class ColoredTextTestResult(unittest.TextTestResult):
+    """Custom test result class with colored output"""
+
+    # ANSI color codes
+    GREEN = '\033[92m'
+    RED = '\033[91m'
+    YELLOW = '\033[93m'
+    BLUE = '\033[94m'
+    RESET = '\033[0m'
+    BOLD = '\033[1m'
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.test_results = []
+
+    def addSuccess(self, test):
+        super().addSuccess(test)
+        self.test_results.append(('PASS', test))
+        if self.showAll:
+            self.stream.write(f"{self.GREEN}✓ PASS{self.RESET}\n")
+        elif self.dots:
+            self.stream.write(f"{self.GREEN}.{self.RESET}")
+            self.stream.flush()
+
+    def addError(self, test, err):
+        super().addError(test, err)
+        self.test_results.append(('ERROR', test))
+        if self.showAll:
+            self.stream.write(f"{self.RED}✗ ERROR{self.RESET}\n")
+        elif self.dots:
+            self.stream.write(f"{self.RED}E{self.RESET}")
+            self.stream.flush()
+
+    def addFailure(self, test, err):
+        super().addFailure(test, err)
+        self.test_results.append(('FAIL', test))
+        if self.showAll:
+            self.stream.write(f"{self.RED}✗ FAIL{self.RESET}\n")
+        elif self.dots:
+            self.stream.write(f"{self.RED}F{self.RESET}")
+            self.stream.flush()
+
+    def addSkip(self, test, reason):
+        super().addSkip(test, reason)
+        self.test_results.append(('SKIP', test))
+        if self.showAll:
+            self.stream.write(f"{self.YELLOW}⊘ SKIP{self.RESET}\n")
+        elif self.dots:
+            self.stream.write(f"{self.YELLOW}s{self.RESET}")
+            self.stream.flush()
+
+
+class ColoredTextTestRunner(unittest.TextTestRunner):
+    """Custom test runner with colored output"""
+    resultclass = ColoredTextTestResult
+
+
+def discover_tests(test_dir='tests'):
+    """Discover all test files in the tests directory"""
+    loader = unittest.TestLoader()
+    start_dir = test_dir
+    pattern = 'test_*.py'
+
+    suite = loader.discover(start_dir, pattern=pattern)
+    return suite
+
+
+def run_specific_suite(suite_name):
+    """Run a specific test suite"""
+    loader = unittest.TestLoader()
+
+    suite_map = {
+        'config': 'tests.test_config_validation',
+        'features': 'tests.test_scraper_features',
+        'integration': 'tests.test_integration'
+    }
+
+    if suite_name not in suite_map:
+        print(f"Unknown test suite: {suite_name}")
+        print(f"Available suites: {', '.join(suite_map.keys())}")
+        return None
+
+    module_name = suite_map[suite_name]
+    try:
+        suite = loader.loadTestsFromName(module_name)
+        return suite
+    except Exception as e:
+        print(f"Error loading test suite '{suite_name}': {e}")
+        return None
+
+
+def print_summary(result):
+    """Print a detailed test summary"""
+    total = result.testsRun
+    passed = total - len(result.failures) - len(result.errors) - len(result.skipped)
+    failed = len(result.failures)
+    errors = len(result.errors)
+    skipped = len(result.skipped)
+
+    print("\n" + "="*70)
+    print("TEST SUMMARY")
+    print("="*70)
+
+    # Overall stats
+    print(f"\n{ColoredTextTestResult.BOLD}Total Tests:{ColoredTextTestResult.RESET} {total}")
+    print(f"{ColoredTextTestResult.GREEN}✓ Passed:{ColoredTextTestResult.RESET} {passed}")
+    if failed > 0:
+        print(f"{ColoredTextTestResult.RED}✗ Failed:{ColoredTextTestResult.RESET} {failed}")
+    if errors > 0:
+        print(f"{ColoredTextTestResult.RED}✗ Errors:{ColoredTextTestResult.RESET} {errors}")
+    if skipped > 0:
+        print(f"{ColoredTextTestResult.YELLOW}⊘ Skipped:{ColoredTextTestResult.RESET} {skipped}")
+
+    # Success rate
+    if total > 0:
+        success_rate = (passed / total) * 100
+        color = ColoredTextTestResult.GREEN if success_rate == 100 else \
+                ColoredTextTestResult.YELLOW if success_rate >= 80 else \
+                ColoredTextTestResult.RED
+        print(f"\n{color}Success Rate: {success_rate:.1f}%{ColoredTextTestResult.RESET}")
+
+    # Category breakdown
+    if hasattr(result, 'test_results'):
+        print(f"\n{ColoredTextTestResult.BOLD}Test Breakdown by Category:{ColoredTextTestResult.RESET}")
+
+        categories = {}
+        for status, test in result.test_results:
+            # Group tests by their TestCase class; parsing str(test) is unreliable
+            # across Python versions, so use the test object's type name directly
+            class_name = type(test).__name__
+            if class_name not in categories:
+                categories[class_name] = {'PASS': 0, 'FAIL': 0, 'ERROR': 0, 'SKIP': 0}
+            categories[class_name][status] += 1
+
+        for category, stats in sorted(categories.items()):
+            total_cat = sum(stats.values())
+            passed_cat = stats['PASS']
+            print(f"  {category}: {passed_cat}/{total_cat} passed")
+
+    print("\n" + "="*70)
+
+    # Return status
+    return failed == 0 and errors == 0
+
+
+def main():
+    """Main test runner"""
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description='Run tests for Skill Seeker',
+        formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+
+    parser.add_argument('--suite', '-s', type=str,
+                       help='Run specific test suite (config, features, integration)')
+    parser.add_argument('--verbose', '-v', action='store_true',
+                       help='Verbose output (show each test)')
+    parser.add_argument('--quiet', '-q', action='store_true',
+                       help='Quiet output (minimal output)')
+    parser.add_argument('--failfast', '-f', action='store_true',
+                       help='Stop on first failure')
+    parser.add_argument('--list', '-l', action='store_true',
+                       help='List all available tests')
+
+    args = parser.parse_args()
+
+    # Set verbosity
+    verbosity = 1
+    if args.verbose:
+        verbosity = 2
+    elif args.quiet:
+        verbosity = 0
+
+    print(f"\n{ColoredTextTestResult.BOLD}{'='*70}{ColoredTextTestResult.RESET}")
+    print(f"{ColoredTextTestResult.BOLD}SKILL SEEKER TEST SUITE{ColoredTextTestResult.RESET}")
+    print(f"{ColoredTextTestResult.BOLD}{'='*70}{ColoredTextTestResult.RESET}\n")
+
+    # Discover or load specific suite
+    if args.suite:
+        print(f"Running test suite: {ColoredTextTestResult.BLUE}{args.suite}{ColoredTextTestResult.RESET}\n")
+        suite = run_specific_suite(args.suite)
+        if suite is None:
+            return 1
+    else:
+        print(f"Running {ColoredTextTestResult.BLUE}all tests{ColoredTextTestResult.RESET}\n")
+        suite = discover_tests()
+
+    # List tests (flatten nested TestSuites so individual tests are shown)
+    if args.list:
+        print("\nAvailable tests:\n")
+
+        def iter_tests(test_suite):
+            for item in test_suite:
+                if isinstance(item, unittest.TestSuite):
+                    yield from iter_tests(item)
+                else:
+                    yield item
+
+        for test in iter_tests(suite):
+            print(f"  - {test}")
+        print()
+        return 0
+
+    # Run tests
+    runner = ColoredTextTestRunner(
+        verbosity=verbosity,
+        failfast=args.failfast
+    )
+
+    result = runner.run(suite)
+
+    # Print summary
+    success = print_summary(result)
+
+    # Return appropriate exit code
+    return 0 if success else 1
+
+
+if __name__ == '__main__':
+    sys.exit(main())

+ 320 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/split_config.py

@@ -0,0 +1,320 @@
+#!/usr/bin/env python3
+"""
+Config Splitter for Large Documentation Sites
+
+Splits large documentation configs into multiple smaller, focused skill configs.
+Supports multiple splitting strategies: category-based, size-based, and automatic.
+"""
+
+import copy
+import json
+import sys
+import argparse
+from pathlib import Path
+from typing import Dict, List, Any, Tuple
+from collections import defaultdict
+
+
+class ConfigSplitter:
+    """Splits large documentation configs into multiple focused configs"""
+
+    def __init__(self, config_path: str, strategy: str = "auto", target_pages: int = 5000):
+        self.config_path = Path(config_path)
+        self.strategy = strategy
+        self.target_pages = target_pages
+        self.config = self.load_config()
+        self.base_name = self.config['name']
+
+    def load_config(self) -> Dict[str, Any]:
+        """Load configuration from file"""
+        try:
+            with open(self.config_path, 'r') as f:
+                return json.load(f)
+        except FileNotFoundError:
+            print(f"❌ Error: Config file not found: {self.config_path}")
+            sys.exit(1)
+        except json.JSONDecodeError as e:
+            print(f"❌ Error: Invalid JSON in config file: {e}")
+            sys.exit(1)
+
+    def get_split_strategy(self) -> str:
+        """Determine split strategy"""
+        # Check if strategy is defined in config
+        if 'split_strategy' in self.config:
+            config_strategy = self.config['split_strategy']
+            if config_strategy != "none":
+                return config_strategy
+
+        # Use provided strategy or auto-detect
+        if self.strategy == "auto":
+            max_pages = self.config.get('max_pages', 500)
+
+            if max_pages < 5000:
+                print(f"ℹ️  Small documentation ({max_pages} pages) - no splitting needed")
+                return "none"
+            elif max_pages < 10000 and 'categories' in self.config:
+                print(f"ℹ️  Medium documentation ({max_pages} pages) - category split recommended")
+                return "category"
+            elif 'categories' in self.config and len(self.config['categories']) >= 3:
+                print(f"ℹ️  Large documentation ({max_pages} pages) - router + categories recommended")
+                return "router"
+            else:
+                print(f"ℹ️  Large documentation ({max_pages} pages) - size-based split")
+                return "size"
+
+        return self.strategy
+
+    def split_by_category(self, create_router: bool = False) -> List[Dict[str, Any]]:
+        """Split config by categories"""
+        if 'categories' not in self.config:
+            print("❌ Error: No categories defined in config")
+            sys.exit(1)
+
+        categories = self.config['categories']
+        split_categories = self.config.get('split_config', {}).get('split_by_categories')
+
+        # If specific categories specified, use only those
+        if split_categories:
+            categories = {k: v for k, v in categories.items() if k in split_categories}
+
+        configs = []
+
+        for category_name, keywords in categories.items():
+            # Create new config for this category; deep-copy so nested structures
+            # (url_patterns, categories) are not shared and mutated across children
+            new_config = copy.deepcopy(self.config)
+            new_config['name'] = f"{self.base_name}-{category_name}"
+            new_config['description'] = f"{self.base_name.capitalize()} - {category_name.replace('_', ' ').title()}. {self.config.get('description', '')}"
+
+            # Update URL patterns to focus on this category
+            url_patterns = new_config.get('url_patterns', {})
+
+            # Add category keywords to includes
+            includes = url_patterns.get('include', [])
+            for keyword in keywords:
+                if keyword.startswith('/'):
+                    includes.append(keyword)
+
+            if includes:
+                url_patterns['include'] = list(set(includes))
+                new_config['url_patterns'] = url_patterns
+
+            # Keep only this category
+            new_config['categories'] = {category_name: keywords}
+
+            # Remove split config from child
+            if 'split_strategy' in new_config:
+                del new_config['split_strategy']
+            if 'split_config' in new_config:
+                del new_config['split_config']
+
+            # Adjust max_pages estimate
+            if 'max_pages' in new_config:
+                new_config['max_pages'] = self.target_pages
+
+            configs.append(new_config)
+
+        print(f"✅ Created {len(configs)} category-based configs")
+
+        # Optionally create router config
+        if create_router:
+            router_config = self.create_router_config(configs)
+            configs.insert(0, router_config)
+            print(f"✅ Created router config: {router_config['name']}")
+
+        return configs
+
+    def split_by_size(self) -> List[Dict[str, Any]]:
+        """Split config by size (page count)"""
+        max_pages = self.config.get('max_pages', 500)
+        num_splits = (max_pages + self.target_pages - 1) // self.target_pages
+
+        configs = []
+
+        for i in range(num_splits):
+            new_config = self.config.copy()
+            part_num = i + 1
+            new_config['name'] = f"{self.base_name}-part{part_num}"
+            new_config['description'] = f"{self.base_name.capitalize()} - Part {part_num}. {self.config.get('description', '')}"
+            new_config['max_pages'] = self.target_pages
+
+            # Remove split config from child
+            if 'split_strategy' in new_config:
+                del new_config['split_strategy']
+            if 'split_config' in new_config:
+                del new_config['split_config']
+
+            configs.append(new_config)
+
+        print(f"✅ Created {len(configs)} size-based configs ({self.target_pages} pages each)")
+        return configs
+
+    def create_router_config(self, sub_configs: List[Dict[str, Any]]) -> Dict[str, Any]:
+        """Create a router config that references sub-skills"""
+        router_name = self.config.get('split_config', {}).get('router_name', self.base_name)
+
+        router_config = {
+            "name": router_name,
+            "description": self.config.get('description', ''),
+            "base_url": self.config['base_url'],
+            "selectors": self.config['selectors'],
+            "url_patterns": self.config.get('url_patterns', {}),
+            "rate_limit": self.config.get('rate_limit', 0.5),
+            "max_pages": 500,  # Router only needs overview pages
+            "_router": True,
+            "_sub_skills": [cfg['name'] for cfg in sub_configs],
+            "_routing_keywords": {
+                cfg['name']: list(cfg.get('categories', {}).keys())
+                for cfg in sub_configs
+            }
+        }
+
+        return router_config
+
+    def split(self) -> List[Dict[str, Any]]:
+        """Execute split based on strategy"""
+        strategy = self.get_split_strategy()
+
+        print(f"\n{'='*60}")
+        print(f"CONFIG SPLITTER: {self.base_name}")
+        print(f"{'='*60}")
+        print(f"Strategy: {strategy}")
+        print(f"Target pages per skill: {self.target_pages}")
+        print("")
+
+        if strategy == "none":
+            print("ℹ️  No splitting required")
+            return [self.config]
+
+        elif strategy == "category":
+            return self.split_by_category(create_router=False)
+
+        elif strategy == "router":
+            create_router = self.config.get('split_config', {}).get('create_router', True)
+            return self.split_by_category(create_router=create_router)
+
+        elif strategy == "size":
+            return self.split_by_size()
+
+        else:
+            print(f"❌ Error: Unknown strategy: {strategy}")
+            sys.exit(1)
+
+    def save_configs(self, configs: List[Dict[str, Any]], output_dir: Path = None) -> List[Path]:
+        """Save configs to files"""
+        if output_dir is None:
+            output_dir = self.config_path.parent
+
+        output_dir = Path(output_dir)
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        saved_files = []
+
+        for config in configs:
+            filename = f"{config['name']}.json"
+            filepath = output_dir / filename
+
+            with open(filepath, 'w') as f:
+                json.dump(config, f, indent=2)
+
+            saved_files.append(filepath)
+            print(f"  💾 Saved: {filepath}")
+
+        return saved_files
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Split large documentation configs into multiple focused skills",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Auto-detect strategy
+  python3 split_config.py configs/godot.json
+
+  # Use category-based split
+  python3 split_config.py configs/godot.json --strategy category
+
+  # Use router + categories
+  python3 split_config.py configs/godot.json --strategy router
+
+  # Custom target size
+  python3 split_config.py configs/godot.json --target-pages 3000
+
+  # Dry run (don't save files)
+  python3 split_config.py configs/godot.json --dry-run
+
+Split Strategies:
+  none     - No splitting (single skill)
+  auto     - Automatically choose best strategy
+  category - Split by categories defined in config
+  router   - Create router + category-based sub-skills
+  size     - Split by page count
+        """
+    )
+
+    parser.add_argument(
+        'config',
+        help='Path to config file (e.g., configs/godot.json)'
+    )
+
+    parser.add_argument(
+        '--strategy',
+        choices=['auto', 'none', 'category', 'router', 'size'],
+        default='auto',
+        help='Splitting strategy (default: auto)'
+    )
+
+    parser.add_argument(
+        '--target-pages',
+        type=int,
+        default=5000,
+        help='Target pages per skill (default: 5000)'
+    )
+
+    parser.add_argument(
+        '--output-dir',
+        help='Output directory for configs (default: same as input)'
+    )
+
+    parser.add_argument(
+        '--dry-run',
+        action='store_true',
+        help='Show what would be created without saving files'
+    )
+
+    args = parser.parse_args()
+
+    # Create splitter
+    splitter = ConfigSplitter(args.config, args.strategy, args.target_pages)
+
+    # Split config
+    configs = splitter.split()
+
+    if args.dry_run:
+        print(f"\n{'='*60}")
+        print("DRY RUN - No files saved")
+        print(f"{'='*60}")
+        print(f"Would create {len(configs)} config files:")
+        for cfg in configs:
+            is_router = cfg.get('_router', False)
+            router_marker = " (ROUTER)" if is_router else ""
+            print(f"  📄 {cfg['name']}.json{router_marker}")
+    else:
+        print(f"\n{'='*60}")
+        print("SAVING CONFIGS")
+        print(f"{'='*60}")
+        saved_files = splitter.save_configs(configs, args.output_dir)
+
+        print(f"\n{'='*60}")
+        print("NEXT STEPS")
+        print(f"{'='*60}")
+        print("1. Review generated configs")
+        print("2. Scrape each config:")
+        for filepath in saved_files:
+            print(f"     skill-seekers scrape --config {filepath}")
+        print("3. Package skills:")
+        print("     skill-seekers-package-multi configs/<name>-*.json")
+        print("")
+
+
+if __name__ == "__main__":
+    main()

+ 192 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/test_unified_simple.py

@@ -0,0 +1,192 @@
+#!/usr/bin/env python3
+"""
+Simple Integration Tests for Unified Multi-Source Scraper
+
+Focuses on real-world usage patterns rather than unit tests.
+"""
+
+import os
+import sys
+import json
+import tempfile
+from pathlib import Path
+
+# Add CLI directory to path so config_validator resolves when run as a script
+sys.path.insert(0, str(Path(__file__).parent))
+
+try:
+    from .config_validator import validate_config  # imported as part of the package
+except ImportError:
+    from config_validator import validate_config   # executed directly as a script
+
+def test_validate_existing_unified_configs():
+    """Test that all existing unified configs are valid"""
+    configs_dir = Path(__file__).parent.parent / 'configs'
+
+    unified_configs = [
+        'godot_unified.json',
+        'react_unified.json',
+        'django_unified.json',
+        'fastapi_unified.json'
+    ]
+
+    for config_name in unified_configs:
+        config_path = configs_dir / config_name
+        if config_path.exists():
+            print(f"\n✓ Validating {config_name}...")
+            validator = validate_config(str(config_path))
+            assert validator.is_unified, f"{config_name} should be unified format"
+            assert validator.needs_api_merge(), f"{config_name} should need API merging"
+            print(f"  Sources: {len(validator.config['sources'])}")
+            print(f"  Merge mode: {validator.config.get('merge_mode')}")
+
+
+def test_backward_compatibility():
+    """Test that legacy configs still work"""
+    configs_dir = Path(__file__).parent.parent / 'configs'
+
+    legacy_configs = [
+        'react.json',
+        'godot.json',
+        'django.json'
+    ]
+
+    for config_name in legacy_configs:
+        config_path = configs_dir / config_name
+        if config_path.exists():
+            print(f"\n✓ Validating legacy {config_name}...")
+            validator = validate_config(str(config_path))
+            assert not validator.is_unified, f"{config_name} should be legacy format"
+            print(f"  Format: Legacy")
+
+
+def test_create_temp_unified_config():
+    """Test creating a unified config from scratch"""
+    config = {
+        "name": "test_unified",
+        "description": "Test unified config",
+        "merge_mode": "rule-based",
+        "sources": [
+            {
+                "type": "documentation",
+                "base_url": "https://example.com/docs",
+                "extract_api": True,
+                "max_pages": 50
+            },
+            {
+                "type": "github",
+                "repo": "test/repo",
+                "include_code": True,
+                "code_analysis_depth": "surface"
+            }
+        ]
+    }
+
+    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
+        json.dump(config, f)
+        config_path = f.name
+
+    try:
+        print("\n✓ Validating temp unified config...")
+        validator = validate_config(config_path)
+        assert validator.is_unified
+        assert validator.needs_api_merge()
+        assert len(validator.config['sources']) == 2
+        print("  ✓ Config is valid unified format")
+        print(f"  Sources: {len(validator.config['sources'])}")
+    finally:
+        os.unlink(config_path)
+
+
+def test_mixed_source_types():
+    """Test config with documentation, GitHub, and PDF sources"""
+    config = {
+        "name": "test_mixed",
+        "description": "Test mixed sources",
+        "merge_mode": "rule-based",
+        "sources": [
+            {
+                "type": "documentation",
+                "base_url": "https://example.com"
+            },
+            {
+                "type": "github",
+                "repo": "test/repo"
+            },
+            {
+                "type": "pdf",
+                "path": "/path/to/manual.pdf"
+            }
+        ]
+    }
+
+    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
+        json.dump(config, f)
+        config_path = f.name
+
+    try:
+        print("\n✓ Validating mixed source types...")
+        validator = validate_config(config_path)
+        assert validator.is_unified
+        assert len(validator.config['sources']) == 3
+
+        # Check each source type
+        source_types = [s['type'] for s in validator.config['sources']]
+        assert 'documentation' in source_types
+        assert 'github' in source_types
+        assert 'pdf' in source_types
+        print("  ✓ All 3 source types validated")
+    finally:
+        os.unlink(config_path)
+
+
+def test_config_validation_errors():
+    """Test that invalid configs are rejected"""
+    # Invalid source type
+    config = {
+        "name": "test",
+        "description": "Test",
+        "sources": [
+            {"type": "invalid_type", "url": "https://example.com"}
+        ]
+    }
+
+    with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
+        json.dump(config, f)
+        config_path = f.name
+
+    try:
+        print("\n✓ Testing invalid source type...")
+        try:
+            # validate_config() calls .validate() automatically
+            validator = validate_config(config_path)
+            assert False, "Should have raised error for invalid source type"
+        except ValueError as e:
+            assert "Invalid" in str(e) or "invalid" in str(e)
+            print("  ✓ Invalid source type correctly rejected")
+    finally:
+        os.unlink(config_path)
+
+
+# Run tests
+if __name__ == '__main__':
+    print("=" * 60)
+    print("Running Unified Scraper Integration Tests")
+    print("=" * 60)
+
+    try:
+        test_validate_existing_unified_configs()
+        test_backward_compatibility()
+        test_create_temp_unified_config()
+        test_mixed_source_types()
+        test_config_validation_errors()
+
+        print("\n" + "=" * 60)
+        print("✅ All integration tests passed!")
+        print("=" * 60)
+
+    except AssertionError as e:
+        print(f"\n❌ Test failed: {e}")
+        sys.exit(1)
+    except Exception as e:
+        print(f"\n❌ Unexpected error: {e}")
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)

+ 450 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/unified_scraper.py

@@ -0,0 +1,450 @@
+#!/usr/bin/env python3
+"""
+Unified Multi-Source Scraper
+
+Orchestrates scraping from multiple sources (documentation, GitHub, PDF),
+detects conflicts, merges intelligently, and builds unified skills.
+
+This is the main entry point for unified config workflow.
+
+Usage:
+    skill-seekers unified --config configs/godot_unified.json
+    skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
+"""
+
+import os
+import sys
+import json
+import logging
+import argparse
+import subprocess
+from pathlib import Path
+from typing import Dict, List, Any, Optional
+
+# Import validators and scrapers (sibling modules in this cli/ directory)
+sys.path.insert(0, str(Path(__file__).parent))
+try:
+    from config_validator import ConfigValidator, validate_config
+    from conflict_detector import ConflictDetector
+    from merge_sources import RuleBasedMerger, ClaudeEnhancedMerger
+    from unified_skill_builder import UnifiedSkillBuilder
+except ImportError as e:
+    print(f"Error importing modules: {e}")
+    print("Make sure you're running from the project root directory")
+    sys.exit(1)
+
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+
+class UnifiedScraper:
+    """
+    Orchestrates multi-source scraping and merging.
+
+    Main workflow:
+    1. Load and validate unified config
+    2. Scrape all sources (docs, GitHub, PDF)
+    3. Detect conflicts between sources
+    4. Merge intelligently (rule-based or Claude-enhanced)
+    5. Build unified skill
+    """
+
+    def __init__(self, config_path: str, merge_mode: Optional[str] = None):
+        """
+        Initialize unified scraper.
+
+        Args:
+            config_path: Path to unified config JSON
+            merge_mode: Override config merge_mode ('rule-based' or 'claude-enhanced')
+        """
+        self.config_path = config_path
+
+        # Validate and load config
+        logger.info(f"Loading config: {config_path}")
+        self.validator = validate_config(config_path)
+        self.config = self.validator.config
+
+        # Determine merge mode
+        self.merge_mode = merge_mode or self.config.get('merge_mode', 'rule-based')
+        logger.info(f"Merge mode: {self.merge_mode}")
+
+        # Storage for scraped data
+        self.scraped_data = {}
+
+        # Output paths
+        self.name = self.config['name']
+        self.output_dir = f"output/{self.name}"
+        self.data_dir = f"output/{self.name}_unified_data"
+
+        os.makedirs(self.output_dir, exist_ok=True)
+        os.makedirs(self.data_dir, exist_ok=True)
+
+    def scrape_all_sources(self):
+        """
+        Scrape all configured sources.
+
+        Routes to appropriate scraper based on source type.
+        """
+        logger.info("=" * 60)
+        logger.info("PHASE 1: Scraping all sources")
+        logger.info("=" * 60)
+
+        if not self.validator.is_unified:
+            logger.warning("Config is not unified format, converting...")
+            self.config = self.validator.convert_legacy_to_unified()
+
+        sources = self.config.get('sources', [])
+
+        for i, source in enumerate(sources):
+            source_type = source['type']
+            logger.info(f"\n[{i+1}/{len(sources)}] Scraping {source_type} source...")
+
+            try:
+                if source_type == 'documentation':
+                    self._scrape_documentation(source)
+                elif source_type == 'github':
+                    self._scrape_github(source)
+                elif source_type == 'pdf':
+                    self._scrape_pdf(source)
+                else:
+                    logger.warning(f"Unknown source type: {source_type}")
+            except Exception as e:
+                logger.error(f"Error scraping {source_type}: {e}")
+                logger.info("Continuing with other sources...")
+
+        logger.info(f"\n✅ Scraped {len(self.scraped_data)} sources successfully")
+
+    def _scrape_documentation(self, source: Dict[str, Any]):
+        """Scrape documentation website."""
+        # Create temporary config for doc scraper
+        doc_config = {
+            'name': f"{self.name}_docs",
+            'base_url': source['base_url'],
+            'selectors': source.get('selectors', {}),
+            'url_patterns': source.get('url_patterns', {}),
+            'categories': source.get('categories', {}),
+            'rate_limit': source.get('rate_limit', 0.5),
+            'max_pages': source.get('max_pages', 100)
+        }
+
+        # Write temporary config
+        temp_config_path = os.path.join(self.data_dir, 'temp_docs_config.json')
+        with open(temp_config_path, 'w') as f:
+            json.dump(doc_config, f, indent=2)
+
+        # Run doc_scraper as subprocess
+        logger.info(f"Scraping documentation from {source['base_url']}")
+
+        doc_scraper_path = Path(__file__).parent / "doc_scraper.py"
+        cmd = [sys.executable, str(doc_scraper_path), '--config', temp_config_path]
+
+        result = subprocess.run(cmd, capture_output=True, text=True)
+
+        if result.returncode != 0:
+            logger.error(f"Documentation scraping failed: {result.stderr}")
+            return
+
+        # Load scraped data
+        docs_data_file = f"output/{doc_config['name']}_data/summary.json"
+
+        if os.path.exists(docs_data_file):
+            with open(docs_data_file, 'r') as f:
+                summary = json.load(f)
+
+            self.scraped_data['documentation'] = {
+                'pages': summary.get('pages', []),
+                'data_file': docs_data_file
+            }
+
+            logger.info(f"✅ Documentation: {summary.get('total_pages', 0)} pages scraped")
+        else:
+            logger.warning("Documentation data file not found")
+
+        # Clean up temp config
+        if os.path.exists(temp_config_path):
+            os.remove(temp_config_path)
+
+    def _scrape_github(self, source: Dict[str, Any]):
+        """Scrape GitHub repository."""
+        sys.path.insert(0, str(Path(__file__).parent))
+
+        try:
+            from github_scraper import GitHubScraper
+        except ImportError:
+            logger.error("github_scraper.py not found")
+            return
+
+        # Create config for GitHub scraper
+        github_config = {
+            'repo': source['repo'],
+            'name': f"{self.name}_github",
+            'github_token': source.get('github_token'),
+            'include_issues': source.get('include_issues', True),
+            'max_issues': source.get('max_issues', 100),
+            'include_changelog': source.get('include_changelog', True),
+            'include_releases': source.get('include_releases', True),
+            'include_code': source.get('include_code', True),
+            'code_analysis_depth': source.get('code_analysis_depth', 'surface'),
+            'file_patterns': source.get('file_patterns', []),
+            'local_repo_path': source.get('local_repo_path')  # Pass local_repo_path from config
+        }
+
+        # Scrape
+        logger.info(f"Scraping GitHub repository: {source['repo']}")
+        scraper = GitHubScraper(github_config)
+        github_data = scraper.scrape()
+
+        # Save data
+        github_data_file = os.path.join(self.data_dir, 'github_data.json')
+        with open(github_data_file, 'w') as f:
+            json.dump(github_data, f, indent=2, ensure_ascii=False)
+
+        self.scraped_data['github'] = {
+            'data': github_data,
+            'data_file': github_data_file
+        }
+
+        logger.info(f"✅ GitHub: Repository scraped successfully")
+
+    def _scrape_pdf(self, source: Dict[str, Any]):
+        """Scrape PDF document."""
+        sys.path.insert(0, str(Path(__file__).parent))
+
+        try:
+            from pdf_scraper import PDFToSkillConverter
+        except ImportError:
+            logger.error("pdf_scraper.py not found")
+            return
+
+        # Create config for PDF scraper
+        pdf_config = {
+            'name': f"{self.name}_pdf",
+            'pdf': source['path'],
+            'extract_tables': source.get('extract_tables', False),
+            'ocr': source.get('ocr', False),
+            'password': source.get('password')
+        }
+
+        # Scrape
+        logger.info(f"Scraping PDF: {source['path']}")
+        converter = PDFToSkillConverter(pdf_config)
+        pdf_data = converter.extract_all()
+
+        # Save data
+        pdf_data_file = os.path.join(self.data_dir, 'pdf_data.json')
+        with open(pdf_data_file, 'w') as f:
+            json.dump(pdf_data, f, indent=2, ensure_ascii=False)
+
+        self.scraped_data['pdf'] = {
+            'data': pdf_data,
+            'data_file': pdf_data_file
+        }
+
+        logger.info(f"✅ PDF: {len(pdf_data.get('pages', []))} pages extracted")
+
+    def detect_conflicts(self) -> List:
+        """
+        Detect conflicts between documentation and code.
+
+        Only applicable if both documentation and GitHub sources exist.
+
+        Returns:
+            List of conflicts
+        """
+        logger.info("\n" + "=" * 60)
+        logger.info("PHASE 2: Detecting conflicts")
+        logger.info("=" * 60)
+
+        if not self.validator.needs_api_merge():
+            logger.info("No API merge needed (only one API source)")
+            return []
+
+        # Get documentation and GitHub data
+        docs_data = self.scraped_data.get('documentation', {})
+        github_data = self.scraped_data.get('github', {})
+
+        if not docs_data or not github_data:
+            logger.warning("Missing documentation or GitHub data for conflict detection")
+            return []
+
+        # Load data files
+        with open(docs_data['data_file'], 'r') as f:
+            docs_json = json.load(f)
+
+        with open(github_data['data_file'], 'r') as f:
+            github_json = json.load(f)
+
+        # Detect conflicts
+        detector = ConflictDetector(docs_json, github_json)
+        conflicts = detector.detect_all_conflicts()
+
+        # Save conflicts
+        conflicts_file = os.path.join(self.data_dir, 'conflicts.json')
+        detector.save_conflicts(conflicts, conflicts_file)
+
+        # Print summary
+        summary = detector.generate_summary(conflicts)
+        logger.info(f"\n📊 Conflict Summary:")
+        logger.info(f"   Total: {summary['total']}")
+        logger.info(f"   By Type:")
+        for ctype, count in summary['by_type'].items():
+            if count > 0:
+                logger.info(f"     - {ctype}: {count}")
+        logger.info(f"   By Severity:")
+        for severity, count in summary['by_severity'].items():
+            if count > 0:
+                emoji = '🔴' if severity == 'high' else '🟡' if severity == 'medium' else '🟢'
+                logger.info(f"     {emoji} {severity}: {count}")
+
+        return conflicts
+
+    def merge_sources(self, conflicts: List):
+        """
+        Merge data from multiple sources.
+
+        Args:
+            conflicts: List of detected conflicts
+        """
+        logger.info("\n" + "=" * 60)
+        logger.info(f"PHASE 3: Merging sources ({self.merge_mode})")
+        logger.info("=" * 60)
+
+        if not conflicts:
+            logger.info("No conflicts to merge")
+            return None
+
+        # Get data files
+        docs_data = self.scraped_data.get('documentation', {})
+        github_data = self.scraped_data.get('github', {})
+
+        # Load data
+        with open(docs_data['data_file'], 'r') as f:
+            docs_json = json.load(f)
+
+        with open(github_data['data_file'], 'r') as f:
+            github_json = json.load(f)
+
+        # Choose merger
+        if self.merge_mode == 'claude-enhanced':
+            merger = ClaudeEnhancedMerger(docs_json, github_json, conflicts)
+        else:
+            merger = RuleBasedMerger(docs_json, github_json, conflicts)
+
+        # Merge
+        merged_data = merger.merge_all()
+
+        # Save merged data
+        merged_file = os.path.join(self.data_dir, 'merged_data.json')
+        with open(merged_file, 'w') as f:
+            json.dump(merged_data, f, indent=2, ensure_ascii=False)
+
+        logger.info(f"✅ Merged data saved: {merged_file}")
+
+        return merged_data
+
+    def build_skill(self, merged_data: Optional[Dict] = None):
+        """
+        Build final unified skill.
+
+        Args:
+            merged_data: Merged API data (if conflicts were resolved)
+        """
+        logger.info("\n" + "=" * 60)
+        logger.info("PHASE 4: Building unified skill")
+        logger.info("=" * 60)
+
+        # Load conflicts if they exist
+        conflicts = []
+        conflicts_file = os.path.join(self.data_dir, 'conflicts.json')
+        if os.path.exists(conflicts_file):
+            with open(conflicts_file, 'r') as f:
+                conflicts_data = json.load(f)
+                conflicts = conflicts_data.get('conflicts', [])
+
+        # Build skill
+        builder = UnifiedSkillBuilder(
+            self.config,
+            self.scraped_data,
+            merged_data,
+            conflicts
+        )
+
+        builder.build()
+
+        logger.info(f"✅ Unified skill built: {self.output_dir}/")
+
+    def run(self):
+        """
+        Execute complete unified scraping workflow.
+        """
+        logger.info("\n" + "🚀 " * 20)
+        logger.info(f"Unified Scraper: {self.config['name']}")
+        logger.info("🚀 " * 20 + "\n")
+
+        try:
+            # Phase 1: Scrape all sources
+            self.scrape_all_sources()
+
+            # Phase 2: Detect conflicts (if applicable)
+            conflicts = self.detect_conflicts()
+
+            # Phase 3: Merge sources (if conflicts exist)
+            merged_data = None
+            if conflicts:
+                merged_data = self.merge_sources(conflicts)
+
+            # Phase 4: Build skill
+            self.build_skill(merged_data)
+
+            logger.info("\n" + "✅ " * 20)
+            logger.info("Unified scraping complete!")
+            logger.info("✅ " * 20 + "\n")
+
+            logger.info(f"📁 Output: {self.output_dir}/")
+            logger.info(f"📁 Data: {self.data_dir}/")
+
+        except KeyboardInterrupt:
+            logger.info("\n\n⚠️  Scraping interrupted by user")
+            sys.exit(1)
+        except Exception as e:
+            logger.error(f"\n\n❌ Error during scraping: {e}")
+            import traceback
+            traceback.print_exc()
+            sys.exit(1)
+
+
+def main():
+    """Main entry point."""
+    parser = argparse.ArgumentParser(
+        description='Unified multi-source scraper',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Basic usage with unified config
+  skill-seekers unified --config configs/godot_unified.json
+
+  # Override merge mode
+  skill-seekers unified --config configs/react_unified.json --merge-mode claude-enhanced
+
+  # Backward compatible with legacy configs
+  skill-seekers unified --config configs/react.json
+        """
+    )
+
+    parser.add_argument('--config', '-c', required=True,
+                       help='Path to unified config JSON file')
+    parser.add_argument('--merge-mode', '-m',
+                       choices=['rule-based', 'claude-enhanced'],
+                       help='Override config merge mode')
+
+    args = parser.parse_args()
+
+    # Create and run scraper
+    scraper = UnifiedScraper(args.config, args.merge_mode)
+    scraper.run()
+
+
+if __name__ == '__main__':
+    main()

+ 444 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/unified_skill_builder.py

@@ -0,0 +1,444 @@
+#!/usr/bin/env python3
+"""
+Unified Skill Builder
+
+Generates final skill structure from merged multi-source data:
+- SKILL.md with merged APIs and conflict warnings
+- references/ with organized content by source
+- Inline conflict markers (⚠️)
+- Separate conflicts summary section
+
+Supports mixed sources (documentation, GitHub, PDF) and highlights
+discrepancies transparently.
+"""
+
+import os
+import json
+import logging
+from pathlib import Path
+from typing import Dict, List, Any, Optional
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+class UnifiedSkillBuilder:
+    """
+    Builds unified skill from multi-source data.
+    """
+
+    def __init__(self, config: Dict, scraped_data: Dict,
+                 merged_data: Optional[Dict] = None, conflicts: Optional[List] = None):
+        """
+        Initialize skill builder.
+
+        Args:
+            config: Unified config dict
+            scraped_data: Dict of scraped data by source type
+            merged_data: Merged API data (if conflicts were resolved)
+            conflicts: List of detected conflicts
+        """
+        self.config = config
+        self.scraped_data = scraped_data
+        self.merged_data = merged_data
+        self.conflicts = conflicts or []
+
+        self.name = config['name']
+        self.description = config['description']
+        self.skill_dir = f"output/{self.name}"
+
+        # Create directories
+        os.makedirs(self.skill_dir, exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/references", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/scripts", exist_ok=True)
+        os.makedirs(f"{self.skill_dir}/assets", exist_ok=True)
+
+    def build(self):
+        """Build complete skill structure."""
+        logger.info(f"Building unified skill: {self.name}")
+
+        # Generate main SKILL.md
+        self._generate_skill_md()
+
+        # Generate reference files by source
+        self._generate_references()
+
+        # Generate conflicts report (if any)
+        if self.conflicts:
+            self._generate_conflicts_report()
+
+        logger.info(f"✅ Unified skill built: {self.skill_dir}/")
+
+    def _generate_skill_md(self):
+        """Generate main SKILL.md file."""
+        skill_path = os.path.join(self.skill_dir, 'SKILL.md')
+
+        # Generate skill name (lowercase, hyphens only, max 64 chars)
+        skill_name = self.name.lower().replace('_', '-').replace(' ', '-')[:64]
+
+        # Truncate description to 1024 chars if needed
+        desc = self.description[:1024] if len(self.description) > 1024 else self.description
+
+        content = f"""---
+name: {skill_name}
+description: {desc}
+---
+
+# {self.name.title()}
+
+{self.description}
+
+## 📚 Sources
+
+This skill combines knowledge from multiple sources:
+
+"""
+
+        # List sources
+        for source in self.config.get('sources', []):
+            source_type = source['type']
+            if source_type == 'documentation':
+                content += f"- ✅ **Documentation**: {source.get('base_url', 'N/A')}\n"
+                content += f"  - Pages: {source.get('max_pages', 'unlimited')}\n"
+            elif source_type == 'github':
+                content += f"- ✅ **GitHub Repository**: {source.get('repo', 'N/A')}\n"
+                content += f"  - Code Analysis: {source.get('code_analysis_depth', 'surface')}\n"
+                content += f"  - Issues: {source.get('max_issues', 0)}\n"
+            elif source_type == 'pdf':
+                content += f"- ✅ **PDF Document**: {source.get('path', 'N/A')}\n"
+
+        # Data quality section
+        if self.conflicts:
+            content += f"\n## ⚠️ Data Quality\n\n"
+            content += f"**{len(self.conflicts)} conflicts detected** between sources.\n\n"
+
+            # Count by type
+            by_type = {}
+            for conflict in self.conflicts:
+                ctype = conflict.type if hasattr(conflict, 'type') else conflict.get('type', 'unknown')
+                by_type[ctype] = by_type.get(ctype, 0) + 1
+
+            content += "**Conflict Breakdown:**\n"
+            for ctype, count in by_type.items():
+                content += f"- {ctype}: {count}\n"
+
+            content += f"\nSee `references/conflicts.md` for detailed conflict information.\n"
+
+        # Merged API section (if available)
+        if self.merged_data:
+            content += self._format_merged_apis()
+
+        # Quick reference from each source
+        content += "\n## 📖 Reference Documentation\n\n"
+        content += "Organized by source:\n\n"
+
+        for source in self.config.get('sources', []):
+            source_type = source['type']
+            content += f"- [{source_type.title()}](references/{source_type}/)\n"
+
+        # When to use this skill
+        content += f"\n## 💡 When to Use This Skill\n\n"
+        content += f"Use this skill when you need to:\n"
+        content += f"- Understand how to use {self.name}\n"
+        content += f"- Look up API documentation\n"
+        content += f"- Find usage examples\n"
+
+        if 'github' in self.scraped_data:
+            content += f"- Check for known issues or recent changes\n"
+            content += f"- Review release history\n"
+
+        content += "\n---\n\n"
+        content += "*Generated by Skill Seeker's unified multi-source scraper*\n"
+
+        with open(skill_path, 'w', encoding='utf-8') as f:
+            f.write(content)
+
+        logger.info(f"Created SKILL.md")
+
+    def _format_merged_apis(self) -> str:
+        """Format merged APIs section with inline conflict warnings."""
+        if not self.merged_data:
+            return ""
+
+        content = "\n## 🔧 API Reference\n\n"
+        content += "*Merged from documentation and code analysis*\n\n"
+
+        apis = self.merged_data.get('apis', {})
+
+        if not apis:
+            return content + "*No APIs to display*\n"
+
+        # Group APIs by status
+        matched = {k: v for k, v in apis.items() if v.get('status') == 'matched'}
+        conflicts = {k: v for k, v in apis.items() if v.get('status') == 'conflict'}
+        docs_only = {k: v for k, v in apis.items() if v.get('status') == 'docs_only'}
+        code_only = {k: v for k, v in apis.items() if v.get('status') == 'code_only'}
+
+        # Show matched APIs first
+        if matched:
+            content += "### ✅ Verified APIs\n\n"
+            content += "*Documentation and code agree*\n\n"
+            for api_name, api_data in list(matched.items())[:10]:  # Limit to first 10
+                content += self._format_api_entry(api_data, inline_conflict=False)
+
+        # Show conflicting APIs with warnings
+        if conflicts:
+            content += "\n### ⚠️ APIs with Conflicts\n\n"
+            content += "*Documentation and code differ*\n\n"
+            for api_name, api_data in list(conflicts.items())[:10]:
+                content += self._format_api_entry(api_data, inline_conflict=True)
+
+        # Show undocumented APIs
+        if code_only:
+            content += f"\n### 💻 Undocumented APIs\n\n"
+            content += f"*Found in code but not in documentation ({len(code_only)} total)*\n\n"
+            for api_name, api_data in list(code_only.items())[:5]:
+                content += self._format_api_entry(api_data, inline_conflict=False)
+
+        # Show removed/missing APIs
+        if docs_only:
+            content += f"\n### 📖 Documentation-Only APIs\n\n"
+            content += f"*Documented but not found in code ({len(docs_only)} total)*\n\n"
+            for api_name, api_data in list(docs_only.items())[:5]:
+                content += self._format_api_entry(api_data, inline_conflict=False)
+
+        content += f"\n*See references/api/ for complete API documentation*\n"
+
+        return content
+
+    def _format_api_entry(self, api_data: Dict, inline_conflict: bool = False) -> str:
+        """Format a single API entry."""
+        name = api_data.get('name', 'Unknown')
+        signature = api_data.get('merged_signature', name)
+        description = api_data.get('merged_description', '')
+        warning = api_data.get('warning', '')
+
+        entry = f"#### `{signature}`\n\n"
+
+        if description:
+            entry += f"{description}\n\n"
+
+        # Add inline conflict warning
+        if inline_conflict and warning:
+            entry += f"⚠️ **Conflict**: {warning}\n\n"
+
+            # Show both versions if available
+            conflict = api_data.get('conflict', {})
+            if conflict:
+                docs_info = conflict.get('docs_info')
+                code_info = conflict.get('code_info')
+
+                if docs_info and code_info:
+                    entry += "**Documentation says:**\n"
+                    entry += f"```\n{docs_info.get('raw_signature', 'N/A')}\n```\n\n"
+                    entry += "**Code implementation:**\n"
+                    entry += f"```\n{self._format_code_signature(code_info)}\n```\n\n"
+
+        # Add source info
+        source = api_data.get('source', 'unknown')
+        entry += f"*Source: {source}*\n\n"
+
+        entry += "---\n\n"
+
+        return entry
+
+    def _format_code_signature(self, code_info: Dict) -> str:
+        """Format code signature for display."""
+        name = code_info.get('name', '')
+        params = code_info.get('parameters', [])
+        return_type = code_info.get('return_type')
+
+        param_strs = []
+        for param in params:
+            param_str = param.get('name', '')
+            if param.get('type_hint'):
+                param_str += f": {param['type_hint']}"
+            if param.get('default'):
+                param_str += f" = {param['default']}"
+            param_strs.append(param_str)
+
+        sig = f"{name}({', '.join(param_strs)})"
+        if return_type:
+            sig += f" -> {return_type}"
+
+        return sig
+
+    def _generate_references(self):
+        """Generate reference files organized by source."""
+        logger.info("Generating reference files...")
+
+        # Generate references for each source type
+        if 'documentation' in self.scraped_data:
+            self._generate_docs_references()
+
+        if 'github' in self.scraped_data:
+            self._generate_github_references()
+
+        if 'pdf' in self.scraped_data:
+            self._generate_pdf_references()
+
+        # Generate merged API reference if available
+        if self.merged_data:
+            self._generate_merged_api_reference()
+
+    def _generate_docs_references(self):
+        """Generate references from documentation source."""
+        docs_dir = os.path.join(self.skill_dir, 'references', 'documentation')
+        os.makedirs(docs_dir, exist_ok=True)
+
+        # Create index
+        index_path = os.path.join(docs_dir, 'index.md')
+        with open(index_path, 'w') as f:
+            f.write("# Documentation\n\n")
+            f.write("Reference from official documentation.\n\n")
+
+        logger.info("Created documentation references")
+
+    def _generate_github_references(self):
+        """Generate references from GitHub source."""
+        github_dir = os.path.join(self.skill_dir, 'references', 'github')
+        os.makedirs(github_dir, exist_ok=True)
+
+        github_data = self.scraped_data['github']['data']
+
+        # Create README reference
+        if github_data.get('readme'):
+            readme_path = os.path.join(github_dir, 'README.md')
+            with open(readme_path, 'w') as f:
+                f.write("# Repository README\n\n")
+                f.write(github_data['readme'])
+
+        # Create issues reference
+        if github_data.get('issues'):
+            issues_path = os.path.join(github_dir, 'issues.md')
+            with open(issues_path, 'w') as f:
+                f.write("# GitHub Issues\n\n")
+                f.write(f"{len(github_data['issues'])} recent issues.\n\n")
+
+                for issue in github_data['issues'][:20]:
+                    f.write(f"## #{issue['number']}: {issue['title']}\n\n")
+                    f.write(f"**State**: {issue['state']}\n")
+                    if issue.get('labels'):
+                        f.write(f"**Labels**: {', '.join(issue['labels'])}\n")
+                    f.write(f"**URL**: {issue.get('url', 'N/A')}\n\n")
+
+        # Create releases reference
+        if github_data.get('releases'):
+            releases_path = os.path.join(github_dir, 'releases.md')
+            with open(releases_path, 'w') as f:
+                f.write("# Releases\n\n")
+
+                for release in github_data['releases'][:10]:
+                    f.write(f"## {release['tag_name']}: {release.get('name', 'N/A')}\n\n")
+                    f.write(f"**Published**: {release.get('published_at', 'N/A')[:10]}\n\n")
+                    if release.get('body'):
+                        f.write(release['body'][:500])
+                        f.write("\n\n")
+
+        logger.info("Created GitHub references")
+
+    def _generate_pdf_references(self):
+        """Generate references from PDF source."""
+        pdf_dir = os.path.join(self.skill_dir, 'references', 'pdf')
+        os.makedirs(pdf_dir, exist_ok=True)
+
+        # Create index
+        index_path = os.path.join(pdf_dir, 'index.md')
+        with open(index_path, 'w') as f:
+            f.write("# PDF Documentation\n\n")
+            f.write("Reference from PDF document.\n\n")
+
+        logger.info("Created PDF references")
+
+    def _generate_merged_api_reference(self):
+        """Generate merged API reference file."""
+        api_dir = os.path.join(self.skill_dir, 'references', 'api')
+        os.makedirs(api_dir, exist_ok=True)
+
+        api_path = os.path.join(api_dir, 'merged_api.md')
+
+        with open(api_path, 'w') as f:
+            f.write("# Merged API Reference\n\n")
+            f.write("*Combined from documentation and code analysis*\n\n")
+
+            apis = self.merged_data.get('apis', {})
+
+            for api_name in sorted(apis.keys()):
+                api_data = apis[api_name]
+                entry = self._format_api_entry(api_data, inline_conflict=True)
+                f.write(entry)
+
+        logger.info(f"Created merged API reference ({len(apis)} APIs)")
+
+    def _generate_conflicts_report(self):
+        """Generate detailed conflicts report."""
+        conflicts_path = os.path.join(self.skill_dir, 'references', 'conflicts.md')
+
+        with open(conflicts_path, 'w') as f:
+            f.write("# Conflict Report\n\n")
+            f.write(f"Found **{len(self.conflicts)}** conflicts between sources.\n\n")
+
+            # Group by severity; conflicts may be objects or plain dicts, so read
+            # the severity via attribute first and fall back to the dict key
+            def severity_of(c):
+                return c.severity if hasattr(c, 'severity') else c.get('severity')
+
+            high = [c for c in self.conflicts if severity_of(c) == 'high']
+            medium = [c for c in self.conflicts if severity_of(c) == 'medium']
+            low = [c for c in self.conflicts if severity_of(c) == 'low']
+
+            f.write("## Severity Breakdown\n\n")
+            f.write(f"- 🔴 **High**: {len(high)} (action required)\n")
+            f.write(f"- 🟡 **Medium**: {len(medium)} (review recommended)\n")
+            f.write(f"- 🟢 **Low**: {len(low)} (informational)\n\n")
+
+            # List high severity conflicts
+            if high:
+                f.write("## 🔴 High Severity\n\n")
+                f.write("*These conflicts require immediate attention*\n\n")
+
+                for conflict in high:
+                    api_name = conflict.api_name if hasattr(conflict, 'api_name') else conflict.get('api_name', 'Unknown')
+                    diff = conflict.difference if hasattr(conflict, 'difference') else conflict.get('difference', 'N/A')
+
+                    f.write(f"### {api_name}\n\n")
+                    f.write(f"**Issue**: {diff}\n\n")
+
+            # List medium severity
+            if medium:
+                f.write("## 🟡 Medium Severity\n\n")
+
+                for conflict in medium[:20]:  # Limit to 20
+                    api_name = conflict.api_name if hasattr(conflict, 'api_name') else conflict.get('api_name', 'Unknown')
+                    diff = conflict.difference if hasattr(conflict, 'difference') else conflict.get('difference', 'N/A')
+
+                    f.write(f"### {api_name}\n\n")
+                    f.write(f"{diff}\n\n")
+
+        logger.info(f"Created conflicts report")
+
+
+if __name__ == '__main__':
+    # Test with mock data
+    import sys
+
+    if len(sys.argv) < 2:
+        print("Usage: python unified_skill_builder.py <config.json>")
+        sys.exit(1)
+
+    config_path = sys.argv[1]
+
+    with open(config_path, 'r') as f:
+        config = json.load(f)
+
+    # Mock scraped data
+    scraped_data = {
+        'github': {
+            'data': {
+                'readme': '# Test Repository',
+                'issues': [],
+                'releases': []
+            }
+        }
+    }
+
+    builder = UnifiedSkillBuilder(config, scraped_data)
+    builder.build()
+
+    print(f"\n✅ Test skill built in: output/{config['name']}/")

+ 175 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/upload_skill.py

@@ -0,0 +1,175 @@
+#!/usr/bin/env python3
+"""
+Automatic Skill Uploader
+Uploads a skill .zip file to Claude using the Anthropic API
+
+Usage:
+    # Set API key (one-time)
+    export ANTHROPIC_API_KEY=sk-ant-...
+
+    # Upload skill
+    python3 upload_skill.py output/react.zip
+    python3 upload_skill.py output/godot.zip
+"""
+
+import os
+import sys
+import json
+import argparse
+from pathlib import Path
+
+# Import utilities
+try:
+    from utils import (
+        get_api_key,
+        get_upload_url,
+        print_upload_instructions,
+        validate_zip_file
+    )
+except ImportError:
+    sys.path.insert(0, str(Path(__file__).parent))
+    from utils import (
+        get_api_key,
+        get_upload_url,
+        print_upload_instructions,
+        validate_zip_file
+    )
+
+
+def upload_skill_api(zip_path):
+    """
+    Upload skill to Claude via Anthropic API
+
+    Args:
+        zip_path: Path to skill .zip file
+
+    Returns:
+        tuple: (success, message)
+    """
+    # Check for requests library
+    try:
+        import requests
+    except ImportError:
+        return False, "requests library not installed. Run: pip install requests"
+
+    # Validate zip file
+    is_valid, error_msg = validate_zip_file(zip_path)
+    if not is_valid:
+        return False, error_msg
+
+    # Get API key
+    api_key = get_api_key()
+    if not api_key:
+        return False, "ANTHROPIC_API_KEY not set. Run: export ANTHROPIC_API_KEY=sk-ant-..."
+
+    zip_path = Path(zip_path)
+    skill_name = zip_path.stem
+
+    print(f"📤 Uploading skill: {skill_name}")
+    print(f"   Source: {zip_path}")
+    print(f"   Size: {zip_path.stat().st_size:,} bytes")
+    print()
+
+    # Prepare API request
+    api_url = "https://api.anthropic.com/v1/skills"
+    headers = {
+        "x-api-key": api_key,
+        "anthropic-version": "2023-06-01",
+        "anthropic-beta": "skills-2025-10-02"
+    }
+
+    try:
+        # Read zip file
+        with open(zip_path, 'rb') as f:
+            zip_data = f.read()
+
+        # Upload skill
+        print("⏳ Uploading to Anthropic API...")
+
+        files = {
+            'files[]': (zip_path.name, zip_data, 'application/zip')
+        }
+
+        response = requests.post(
+            api_url,
+            headers=headers,
+            files=files,
+            timeout=60
+        )
+
+        # Check response
+        if response.status_code == 200:
+            print()
+            print("✅ Skill uploaded successfully!")
+            print()
+            print("Your skill is now available in Claude at:")
+            print(f"   {get_upload_url()}")
+            print()
+            return True, "Upload successful"
+
+        elif response.status_code == 401:
+            return False, "Authentication failed. Check your ANTHROPIC_API_KEY"
+
+        elif response.status_code == 400:
+            error_msg = response.json().get('error', {}).get('message', 'Unknown error')
+            return False, f"Invalid skill format: {error_msg}"
+
+        else:
+            error_msg = response.json().get('error', {}).get('message', 'Unknown error')
+            return False, f"Upload failed ({response.status_code}): {error_msg}"
+
+    except requests.exceptions.Timeout:
+        return False, "Upload timed out. Try again or use manual upload"
+
+    except requests.exceptions.ConnectionError:
+        return False, "Connection error. Check your internet connection"
+
+    except Exception as e:
+        return False, f"Unexpected error: {str(e)}"
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Upload a skill .zip file to Claude via Anthropic API",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Setup:
+  1. Get your Anthropic API key from https://console.anthropic.com/
+  2. Set the API key:
+     export ANTHROPIC_API_KEY=sk-ant-...
+
+Examples:
+  # Upload skill
+  python3 upload_skill.py output/react.zip
+
+  # Upload with explicit path
+  python3 upload_skill.py /path/to/skill.zip
+
+Requirements:
+  - ANTHROPIC_API_KEY environment variable must be set
+  - requests library (pip install requests)
+        """
+    )
+
+    parser.add_argument(
+        'zip_file',
+        help='Path to skill .zip file (e.g., output/react.zip)'
+    )
+
+    args = parser.parse_args()
+
+    # Upload skill
+    success, message = upload_skill_api(args.zip_file)
+
+    if success:
+        sys.exit(0)
+    else:
+        print(f"\n❌ Upload failed: {message}")
+        print()
+        print("📝 Manual upload instructions:")
+        print_upload_instructions(args.zip_file)
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()

+ 224 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/cli/utils.py

@@ -0,0 +1,224 @@
+#!/usr/bin/env python3
+"""
+Utility functions for Skill Seeker CLI tools
+"""
+
+import os
+import sys
+import subprocess
+import platform
+from pathlib import Path
+from typing import Optional, Tuple, Dict, Union
+
+
+def open_folder(folder_path: Union[str, Path]) -> bool:
+    """
+    Open a folder in the system file browser
+
+    Args:
+        folder_path: Path to folder to open
+
+    Returns:
+        bool: True if successful, False otherwise
+    """
+    folder_path = Path(folder_path).resolve()
+
+    if not folder_path.exists():
+        print(f"⚠️  Folder not found: {folder_path}")
+        return False
+
+    system = platform.system()
+
+    try:
+        if system == "Linux":
+            # xdg-open is the standard opener on most Linux desktops
+            subprocess.run(["xdg-open", str(folder_path)], check=True)
+        elif system == "Darwin":  # macOS
+            subprocess.run(["open", str(folder_path)], check=True)
+        elif system == "Windows":
+            subprocess.run(["explorer", str(folder_path)], check=True)
+        else:
+            print(f"⚠️  Unknown operating system: {system}")
+            return False
+
+        return True
+
+    except subprocess.CalledProcessError:
+        print(f"⚠️  Could not open folder automatically")
+        return False
+    except FileNotFoundError:
+        print(f"⚠️  File browser not found on system")
+        return False
+
+
+def has_api_key() -> bool:
+    """
+    Check if ANTHROPIC_API_KEY is set in environment
+
+    Returns:
+        bool: True if API key is set, False otherwise
+    """
+    api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
+    return len(api_key) > 0
+
+
+def get_api_key() -> Optional[str]:
+    """
+    Get ANTHROPIC_API_KEY from environment
+
+    Returns:
+        str: API key or None if not set
+    """
+    api_key = os.environ.get('ANTHROPIC_API_KEY', '').strip()
+    return api_key if api_key else None
+
+
+def get_upload_url() -> str:
+    """
+    Get the Claude skills upload URL
+
+    Returns:
+        str: Claude skills upload URL
+    """
+    return "https://claude.ai/skills"
+
+
+def print_upload_instructions(zip_path: Union[str, Path]) -> None:
+    """
+    Print clear upload instructions for manual upload
+
+    Args:
+        zip_path: Path to the .zip file to upload
+    """
+    zip_path = Path(zip_path)
+
+    print()
+    print("╔══════════════════════════════════════════════════════════╗")
+    print("║                     NEXT STEP                            ║")
+    print("╚══════════════════════════════════════════════════════════╝")
+    print()
+    print(f"📤 Upload to Claude: {get_upload_url()}")
+    print()
+    print(f"1. Go to {get_upload_url()}")
+    print("2. Click \"Upload Skill\"")
+    print(f"3. Select: {zip_path}")
+    print("4. Done! ✅")
+    print()
+
+
+def format_file_size(size_bytes: int) -> str:
+    """
+    Format file size in human-readable format
+
+    Args:
+        size_bytes: Size in bytes
+
+    Returns:
+        str: Formatted size (e.g., "45.3 KB")
+    """
+    if size_bytes < 1024:
+        return f"{size_bytes} bytes"
+    elif size_bytes < 1024 * 1024:
+        return f"{size_bytes / 1024:.1f} KB"
+    else:
+        return f"{size_bytes / (1024 * 1024):.1f} MB"
+
+
+def validate_skill_directory(skill_dir: Union[str, Path]) -> Tuple[bool, Optional[str]]:
+    """
+    Validate that a directory is a valid skill directory
+
+    Args:
+        skill_dir: Path to skill directory
+
+    Returns:
+        tuple: (is_valid, error_message)
+    """
+    skill_path = Path(skill_dir)
+
+    if not skill_path.exists():
+        return False, f"Directory not found: {skill_dir}"
+
+    if not skill_path.is_dir():
+        return False, f"Not a directory: {skill_dir}"
+
+    skill_md = skill_path / "SKILL.md"
+    if not skill_md.exists():
+        return False, f"SKILL.md not found in {skill_dir}"
+
+    return True, None
+
+
+def validate_zip_file(zip_path: Union[str, Path]) -> Tuple[bool, Optional[str]]:
+    """
+    Validate that a file is a valid skill .zip file
+
+    Args:
+        zip_path: Path to .zip file
+
+    Returns:
+        tuple: (is_valid, error_message)
+    """
+    zip_path = Path(zip_path)
+
+    if not zip_path.exists():
+        return False, f"File not found: {zip_path}"
+
+    if not zip_path.is_file():
+        return False, f"Not a file: {zip_path}"
+
+    if zip_path.suffix != '.zip':
+        return False, f"Not a .zip file: {zip_path}"
+
+    return True, None
+
+
+def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, preview_limit: int = 40000) -> Dict[str, str]:
+    """Read reference files from a skill directory with size limits.
+
+    This function reads markdown files from the references/ subdirectory
+    of a skill, applying both per-file and total content limits.
+
+    Args:
+        skill_dir (str or Path): Path to skill directory
+        max_chars (int): Maximum total characters to read (default: 100000)
+        preview_limit (int): Maximum characters per file (default: 40000)
+
+    Returns:
+        dict: Dictionary mapping filename to content
+
+    Example:
+        >>> refs = read_reference_files('output/react/', max_chars=50000)
+        >>> len(refs)
+        5
+    """
+
+    skill_path = Path(skill_dir)
+    references_dir = skill_path / "references"
+    references: Dict[str, str] = {}
+
+    if not references_dir.exists():
+        print(f"⚠ No references directory found at {references_dir}")
+        return references
+
+    total_chars = 0
+    for ref_file in sorted(references_dir.glob("*.md")):
+        if ref_file.name == "index.md":
+            continue
+
+        content = ref_file.read_text(encoding='utf-8')
+
+        # Limit size per file
+        if len(content) > preview_limit:
+            content = content[:preview_limit] + "\n\n[Content truncated...]"
+
+        references[ref_file.name] = content
+        total_chars += len(content)
+
+        # Stop if we've read enough
+        if total_chars > max_chars:
+            print(f"  ℹ Limiting input to {max_chars:,} characters")
+            break
+
+    return references

+ 596 - 0
libs/external/Skill_Seekers-development/src/skill_seekers/mcp/README.md

@@ -0,0 +1,596 @@
+# Skill Seeker MCP Server
+
+Model Context Protocol (MCP) server for Skill Seeker - enables Claude Code to generate documentation skills directly.
+
+## What is This?
+
+This MCP server allows Claude Code to use Skill Seeker's tools directly through natural language commands. Instead of running CLI commands manually, you can ask Claude Code to:
+
+- Generate config files for any documentation site
+- Estimate page counts before scraping
+- Scrape documentation and build skills
+- Package skills into `.zip` files
+- List and validate configurations
+- Split large documentation (10K-40K+ pages) into focused sub-skills
+- Generate intelligent router/hub skills for split documentation
+- **NEW:** Scrape PDF documentation and extract code/images
+
+## Quick Start
+
+### 1. Install Dependencies
+
+```bash
+# From repository root
+pip3 install -r mcp/requirements.txt
+pip3 install requests beautifulsoup4
+```
+
+### 2. Quick Setup (Automated)
+
+```bash
+# Run the setup script
+./setup_mcp.sh
+
+# Follow the prompts - it will:
+# - Install dependencies
+# - Test the server
+# - Generate configuration
+# - Guide you through Claude Code setup
+```
+
+### 3. Manual Setup
+
+Add to `~/.config/claude-code/mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": [
+        "/path/to/Skill_Seekers/mcp/server.py"
+      ],
+      "cwd": "/path/to/Skill_Seekers"
+    }
+  }
+}
+```
+
+**Replace `/path/to/Skill_Seekers`** with your actual repository path!
+
+### 4. Restart Claude Code
+
+Quit and reopen Claude Code (don't just close the window).
+
+### 5. Test
+
+In Claude Code, type:
+```
+List all available configs
+```
+
+You should see a list of preset configurations (Godot, React, Vue, etc.).
+
+## Available Tools
+
+The MCP server exposes 10 tools:
+
+### 1. `generate_config`
+Create a new configuration file for any documentation website.
+
+**Parameters:**
+- `name` (required): Skill name (e.g., "tailwind")
+- `url` (required): Documentation URL (e.g., "https://tailwindcss.com/docs")
+- `description` (required): When to use this skill
+- `max_pages` (optional): Maximum pages to scrape (default: 100)
+- `rate_limit` (optional): Delay between requests in seconds (default: 0.5)
+
+**Example:**
+```
+Generate config for Tailwind CSS at https://tailwindcss.com/docs
+```
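+
+The result is a small JSON file saved under `configs/`. A rough sketch of what it contains is shown below (the keys here are an approximation; compare against the presets in `configs/` for the real schema):
+
+```json
+{
+  "name": "tailwind",
+  "description": "Use when working with Tailwind CSS utilities and configuration",
+  "base_url": "https://tailwindcss.com/docs",
+  "max_pages": 100,
+  "rate_limit": 0.5
+}
+```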
+
+### 2. `estimate_pages`
+Estimate how many pages will be scraped from a config (fast, no data downloaded).
+
+**Parameters:**
+- `config_path` (required): Path to config file (e.g., "configs/react.json")
+- `max_discovery` (optional): Maximum pages to discover (default: 1000)
+
+**Example:**
+```
+Estimate pages for configs/react.json
+```
+
+### 3. `scrape_docs`
+Scrape documentation and build a Claude skill.
+
+**Parameters:**
+- `config_path` (required): Path to config file
+- `enhance_local` (optional): Open terminal for local enhancement (default: false)
+- `skip_scrape` (optional): Use cached data (default: false)
+- `dry_run` (optional): Preview without saving (default: false)
+
+**Example:**
+```
+Scrape docs using configs/react.json
+```
+
+### 4. `package_skill`
+Package a skill directory into a `.zip` file ready for Claude upload. Automatically uploads if ANTHROPIC_API_KEY is set.
+
+**Parameters:**
+- `skill_dir` (required): Path to skill directory (e.g., "output/react/")
+- `auto_upload` (optional): Try to upload automatically if API key is available (default: true)
+
+**Example:**
+```
+Package skill at output/react/
+```
+
+### 5. `upload_skill`
+Upload a skill .zip file to Claude automatically (requires ANTHROPIC_API_KEY).
+
+**Parameters:**
+- `skill_zip` (required): Path to skill .zip file (e.g., "output/react.zip")
+
+**Example:**
+```
+Upload output/react.zip using upload_skill
+```
+
+### 6. `list_configs`
+List all available preset configurations.
+
+**Parameters:** None
+
+**Example:**
+```
+List all available configs
+```
+
+### 7. `validate_config`
+Validate a config file for errors.
+
+**Parameters:**
+- `config_path` (required): Path to config file
+
+**Example:**
+```
+Validate configs/godot.json
+```
+
+### 8. `split_config`
+Split large documentation config into multiple focused skills. For 10K+ page documentation.
+
+**Parameters:**
+- `config_path` (required): Path to config JSON file (e.g., "configs/godot.json")
+- `strategy` (optional): Split strategy - "auto", "none", "category", "router", "size" (default: "auto")
+- `target_pages` (optional): Target pages per skill (default: 5000)
+- `dry_run` (optional): Preview without saving files (default: false)
+
+**Example:**
+```
+Split configs/godot.json using router strategy with 5000 pages per skill
+```
+
+**Strategies:**
+- **auto** - Intelligently detects best strategy based on page count and config
+- **category** - Split by documentation categories (creates focused sub-skills)
+- **router** - Create router/hub skill + specialized sub-skills (RECOMMENDED for 10K+ pages)
+- **size** - Split every N pages (for docs without clear categories)
+
+### 9. `generate_router`
+Generate router/hub skill for split documentation. Creates intelligent routing to sub-skills.
+
+**Parameters:**
+- `config_pattern` (required): Config pattern for sub-skills (e.g., "configs/godot-*.json")
+- `router_name` (optional): Router skill name (inferred from configs if not provided)
+
+**Example:**
+```
+Generate router for configs/godot-*.json
+```
+
+**What it does:**
+- Analyzes all sub-skill configs
+- Extracts routing keywords from categories and names
+- Creates router SKILL.md with intelligent routing logic
+- Users can ask questions naturally, router directs to appropriate sub-skill
+
+### 10. `scrape_pdf`
+Scrape PDF documentation and build a Claude skill. Extracts text, code blocks, images, and tables from PDF files, with advanced features.
+
+**Parameters:**
+- `config_path` (optional): Path to PDF config JSON file (e.g., "configs/manual_pdf.json")
+- `pdf_path` (optional): Direct PDF path (alternative to config_path)
+- `name` (optional): Skill name (required with pdf_path)
+- `description` (optional): Skill description
+- `from_json` (optional): Build from extracted JSON file (e.g., "output/manual_extracted.json")
+- `use_ocr` (optional): Use OCR for scanned PDFs (requires pytesseract)
+- `password` (optional): Password for encrypted PDFs
+- `extract_tables` (optional): Extract tables from PDF
+- `parallel` (optional): Process pages in parallel for faster extraction
+- `max_workers` (optional): Number of parallel workers (default: CPU count)
+
+**Examples:**
+```
+Scrape PDF at docs/manual.pdf and create skill named api-docs
+Create skill from configs/example_pdf.json
+Build skill from output/manual_extracted.json
+Scrape scanned PDF with OCR: --pdf docs/scanned.pdf --ocr
+Scrape encrypted PDF: --pdf docs/manual.pdf --password mypassword
+Extract tables: --pdf docs/data.pdf --extract-tables
+Fast parallel processing: --pdf docs/large.pdf --parallel --workers 8
+```
+
+**What it does:**
+- Extracts text and markdown from PDF pages
+- Detects code blocks using 3 methods (font, indent, pattern)
+- Detects programming language with confidence scoring (19+ languages)
+- Validates syntax and scores code quality (0-10 scale)
+- Extracts images with size filtering
+- **NEW:** Extracts tables from PDFs (Priority 2)
+- **NEW:** OCR support for scanned PDFs (Priority 2, requires pytesseract + Pillow)
+- **NEW:** Password-protected PDF support (Priority 2)
+- **NEW:** Parallel page processing for faster extraction (Priority 3)
+- **NEW:** Intelligent caching of expensive operations (Priority 3)
+- Detects chapters and creates page chunks
+- Categorizes content automatically
+- Generates complete skill structure (SKILL.md + references)
+
+**Performance:**
+- Sequential: ~30-60 seconds per 100 pages
+- Parallel (8 workers): ~10-20 seconds per 100 pages (3x faster)
+
+**See:** `docs/PDF_SCRAPER.md` for the complete PDF documentation guide
+
+## Example Workflows
+
+### Generate a New Skill from Scratch
+
+```
+User: Generate config for Svelte at https://svelte.dev/docs
+
+Claude: ✅ Config created: configs/svelte.json
+
+User: Estimate pages for configs/svelte.json
+
+Claude: 📊 Estimated pages: 150
+
+User: Scrape docs using configs/svelte.json
+
+Claude: ✅ Skill created at output/svelte/
+
+User: Package skill at output/svelte/
+
+Claude: ✅ Created: output/svelte.zip
+      Ready to upload to Claude!
+```
+
+### Use Existing Preset
+
+```
+User: List all available configs
+
+Claude: [Shows all configs: godot, react, vue, django, fastapi, etc.]
+
+User: Scrape docs using configs/react.json
+
+Claude: ✅ Skill created at output/react/
+
+User: Package skill at output/react/
+
+Claude: ✅ Created: output/react.zip
+```
+
+### Validate Before Scraping
+
+```
+User: Validate configs/godot.json
+
+Claude: ✅ Config is valid!
+        Name: godot
+        Base URL: https://docs.godotengine.org/en/stable/
+        Max pages: 500
+        Rate limit: 0.5s
+
+User: Scrape docs using configs/godot.json
+
+Claude: [Starts scraping...]
+```
+
+### PDF Documentation - NEW
+
+```
+User: Scrape PDF at docs/api-manual.pdf and create skill named api-docs
+
+Claude: 📄 Scraping PDF documentation...
+        ✅ Extracted 120 pages
+        ✅ Found 45 code blocks (Python, JavaScript, C++)
+        ✅ Extracted 12 images
+        ✅ Created skill at output/api-docs/
+        📦 Package with: python3 cli/package_skill.py output/api-docs/
+
+User: Package skill at output/api-docs/
+
+Claude: ✅ Created: output/api-docs.zip
+        Ready to upload to Claude!
+```
+
+### Large Documentation (40K Pages)
+
+```
+User: Estimate pages for configs/godot.json
+
+Claude: 📊 Estimated pages: 40,000
+        ⚠️  Large documentation detected!
+        💡 Recommend splitting into multiple skills
+
+User: Split configs/godot.json using router strategy
+
+Claude: ✅ Split complete!
+        Created 5 sub-skills:
+        - godot-scripting.json (5,000 pages)
+        - godot-2d.json (8,000 pages)
+        - godot-3d.json (10,000 pages)
+        - godot-physics.json (6,000 pages)
+        - godot-shaders.json (11,000 pages)
+
+User: Scrape all godot sub-skills in parallel
+
+Claude: [Starts scraping all 5 configs in parallel...]
+        ✅ All skills created in 4-8 hours instead of 20-40!
+
+User: Generate router for configs/godot-*.json
+
+Claude: ✅ Router skill created at output/godot/
+        Routing logic:
+        - "scripting", "gdscript" → godot-scripting
+        - "2d", "sprites", "tilemap" → godot-2d
+        - "3d", "meshes", "camera" → godot-3d
+        - "physics", "collision" → godot-physics
+        - "shaders", "visual shader" → godot-shaders
+
+User: Package all godot skills
+
+Claude: ✅ 6 skills packaged:
+        - godot.zip (router)
+        - godot-scripting.zip
+        - godot-2d.zip
+        - godot-3d.zip
+        - godot-physics.zip
+        - godot-shaders.zip
+
+        Upload all to Claude!
+        Users just ask questions naturally - router handles routing!
+```
+
+## Architecture
+
+### Server Structure
+
+```
+mcp/
+├── server.py           # Main MCP server
+├── requirements.txt    # MCP dependencies
+└── README.md          # This file
+```
+
+### How It Works
+
+1. **Claude Code** sends MCP requests to the server
+2. **Server** routes requests to appropriate tool functions
+3. **Tools** call CLI scripts (`doc_scraper.py`, `estimate_pages.py`, etc.)
+4. **CLI scripts** perform actual work (scraping, packaging, etc.)
+5. **Results** returned to Claude Code via MCP protocol
+
+### Tool Implementation
+
+Each tool is implemented as an async function:
+
+```python
+async def generate_config_tool(args: dict) -> list[TextContent]:
+    """Generate a config file"""
+    # Create config JSON
+    # Save to configs/
+    # Return success message
+```
+
+Tools use `subprocess.run()` to call CLI scripts:
+
+```python
+result = subprocess.run([
+    sys.executable,
+    str(CLI_DIR / "doc_scraper.py"),
+    "--config", config_path
+], capture_output=True, text=True)
+```
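+
+Putting the two snippets together, a complete tool function might look roughly like the sketch below. This is illustrative only, not the actual `server.py`; in particular, the `CLI_DIR` resolution and the returned message format are assumptions:
+
+```python
+import sys
+import subprocess
+from pathlib import Path
+
+from mcp.types import TextContent
+
+# Assumed layout: this file lives in mcp/, CLI scripts live in cli/ at the repo root
+CLI_DIR = Path(__file__).resolve().parent.parent / "cli"
+
+
+async def scrape_docs_tool(args: dict) -> list[TextContent]:
+    """Scrape documentation by delegating to the doc_scraper CLI script."""
+    result = subprocess.run(
+        [sys.executable, str(CLI_DIR / "doc_scraper.py"), "--config", args["config_path"]],
+        capture_output=True,
+        text=True,
+    )
+    if result.returncode == 0:
+        message = result.stdout or "✅ Skill created"
+    else:
+        message = f"❌ Scrape failed:\n{result.stderr}"
+    return [TextContent(type="text", text=message)]
+```
+
+Note that a blocking `subprocess.run()` inside an async handler ties up the event loop for the duration of the scrape; the real server may handle long-running operations differently.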
+
+## Testing
+
+The MCP server has comprehensive test coverage:
+
+```bash
+# Run MCP server tests (34 tests)
+python3 -m pytest tests/test_mcp_server.py -v
+
+# Expected output: 34 passed in ~0.3s
+```
+
+### Test Coverage
+
+- **Server initialization** (2 tests)
+- **Tool listing** (2 tests)
+- **generate_config** (3 tests)
+- **estimate_pages** (3 tests)
+- **scrape_docs** (4 tests)
+- **package_skill** (3 tests)
+- **upload_skill** (2 tests)
+- **list_configs** (3 tests)
+- **validate_config** (3 tests)
+- **split_config** (3 tests)
+- **generate_router** (3 tests)
+- **Tool routing** (2 tests)
+- **Integration** (1 test)
+
+**Total: 34 tests | Pass rate: 100%**
+
+## Troubleshooting
+
+### MCP Server Not Loading
+
+**Symptoms:**
+- Tools don't appear in Claude Code
+- No response to skill-seeker commands
+
+**Solutions:**
+
+1. Check configuration:
+   ```bash
+   cat ~/.config/claude-code/mcp.json
+   ```
+
+2. Verify server can start:
+   ```bash
+   python3 mcp/server.py
+   # Should start without errors (Ctrl+C to exit)
+   ```
+
+3. Check dependencies:
+   ```bash
+   pip3 install -r mcp/requirements.txt
+   ```
+
+4. Completely restart Claude Code (quit and reopen)
+
+5. Check Claude Code logs:
+   - macOS: `~/Library/Logs/Claude Code/`
+   - Linux: `~/.config/claude-code/logs/`
+
+### "ModuleNotFoundError: No module named 'mcp'"
+
+```bash
+pip3 install -r mcp/requirements.txt
+```
+
+### Tools Appear But Don't Work
+
+**Solutions:**
+
+1. Verify `cwd` in config points to repository root
+2. Check CLI tools exist:
+   ```bash
+   ls cli/doc_scraper.py
+   ls cli/estimate_pages.py
+   ls cli/package_skill.py
+   ```
+
+3. Test CLI tools directly:
+   ```bash
+   python3 cli/doc_scraper.py --help
+   ```
+
+### Slow Operations
+
+1. Check `rate_limit` in configs (a lower delay between requests scrapes faster; keep it high enough to avoid overloading the site)
+2. Use smaller `max_pages` for testing
+3. Use `skip_scrape` to avoid re-downloading data
+
+## Advanced Configuration
+
+### Using Virtual Environment
+
+```bash
+# Create venv
+python3 -m venv venv
+source venv/bin/activate
+pip install -r mcp/requirements.txt
+pip install requests beautifulsoup4
+which python3  # Copy this path
+```
+
+Configure Claude Code to use venv Python:
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "/path/to/Skill_Seekers/venv/bin/python3",
+      "args": ["/path/to/Skill_Seekers/mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers"
+    }
+  }
+}
+```
+
+### Debug Mode
+
+Enable verbose logging:
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": ["-u", "/path/to/Skill_Seekers/mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers",
+      "env": {
+        "DEBUG": "1"
+      }
+    }
+  }
+}
+```
+
+### With API Enhancement
+
+For API-based enhancement (requires Anthropic API key):
+
+```json
+{
+  "mcpServers": {
+    "skill-seeker": {
+      "command": "python3",
+      "args": ["/path/to/Skill_Seekers/mcp/server.py"],
+      "cwd": "/path/to/Skill_Seekers",
+      "env": {
+        "ANTHROPIC_API_KEY": "sk-ant-your-key-here"
+      }
+    }
+  }
+}
+```
+
+## Performance
+
+| Operation | Time | Notes |
+|-----------|------|-------|
+| List configs | <1s | Instant |
+| Generate config | <1s | Creates JSON file |
+| Validate config | <1s | Quick validation |
+| Estimate pages | 1-2min | Fast, no data download |
+| Split config | 1-3min | Analyzes and creates sub-configs |
+| Generate router | 10-30s | Creates router SKILL.md |
+| Scrape docs | 15-45min | First time only |
+| Scrape docs (40K pages) | 20-40hrs | Sequential |
+| Scrape docs (40K pages, parallel) | 4-8hrs | 5 skills in parallel |
+| Scrape (cached) | <1min | With `skip_scrape` |
+| Package skill | 5-10s | Creates .zip |
+| Package multi | 30-60s | Packages 5-10 skills |
+
+## Documentation
+
+- **Full Setup Guide**: [docs/MCP_SETUP.md](../docs/MCP_SETUP.md)
+- **Main README**: [README.md](../README.md)
+- **Usage Guide**: [docs/USAGE.md](../docs/USAGE.md)
+- **Testing Guide**: [docs/TESTING.md](../docs/TESTING.md)
+
+## Support
+
+- **Issues**: [GitHub Issues](https://github.com/yusufkaraaslan/Skill_Seekers/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)
+
+## License
+
+MIT License - See [LICENSE](../LICENSE) for details

Some files were not shown because too many files changed in this diff