For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Fix fundamental issues in llms.txt handling: rename .txt→.md, download all 3 variants, remove truncation.
Architecture: Modify existing llms.txt download/parse/build workflow to handle multiple variants correctly, store with proper extensions, and preserve complete content without truncation.
Tech Stack: Python 3.10+, requests, BeautifulSoup4, existing Skill_Seekers architecture
Files:
cli/llms_txt_detector.pytests/test_llms_txt_detector.pyStep 1: Write failing test for detect_all() method
# tests/test_llms_txt_detector.py (add new test)
def test_detect_all_variants():
"""Test detecting all llms.txt variants"""
from unittest.mock import patch, Mock
detector = LlmsTxtDetector("https://hono.dev/docs")
with patch('cli.llms_txt_detector.requests.head') as mock_head:
# Mock responses for different variants
def mock_response(url, **kwargs):
response = Mock()
# All 3 variants exist for Hono
if 'llms-full.txt' in url or 'llms.txt' in url or 'llms-small.txt' in url:
response.status_code = 200
else:
response.status_code = 404
return response
mock_head.side_effect = mock_response
variants = detector.detect_all()
assert len(variants) == 3
assert any(v['variant'] == 'full' for v in variants)
assert any(v['variant'] == 'standard' for v in variants)
assert any(v['variant'] == 'small' for v in variants)
assert all('url' in v for v in variants)
Step 2: Run test to verify it fails
Run: source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v
Expected: FAIL with "AttributeError: 'LlmsTxtDetector' object has no attribute 'detect_all'"
Step 3: Implement detect_all() method
# cli/llms_txt_detector.py (add new method)
def detect_all(self) -> List[Dict[str, str]]:
"""
Detect all available llms.txt variants.
Returns:
List of dicts with 'url' and 'variant' keys for each found variant
"""
found_variants = []
for filename, variant in self.VARIANTS:
parsed = urlparse(self.base_url)
root_url = f"{parsed.scheme}://{parsed.netloc}"
url = f"{root_url}/{filename}"
if self._check_url_exists(url):
found_variants.append({
'url': url,
'variant': variant
})
return found_variants
Step 4: Add import for List and Dict at top of file
# cli/llms_txt_detector.py (add to imports)
from typing import Optional, Dict, List
Step 5: Run test to verify it passes
Run: source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v
Expected: PASS
Step 6: Commit
git add cli/llms_txt_detector.py tests/test_llms_txt_detector.py
git commit -m "feat: add detect_all() for multi-variant detection"
Files:
cli/llms_txt_downloader.pytests/test_llms_txt_downloader.pyStep 1: Write failing test for get_proper_filename() method
# tests/test_llms_txt_downloader.py (add new test)
def test_get_proper_filename():
"""Test filename conversion from .txt to .md"""
downloader = LlmsTxtDownloader("https://hono.dev/llms-full.txt")
filename = downloader.get_proper_filename()
assert filename == "llms-full.md"
assert not filename.endswith('.txt')
def test_get_proper_filename_standard():
"""Test standard variant naming"""
downloader = LlmsTxtDownloader("https://hono.dev/llms.txt")
filename = downloader.get_proper_filename()
assert filename == "llms.md"
def test_get_proper_filename_small():
"""Test small variant naming"""
downloader = LlmsTxtDownloader("https://hono.dev/llms-small.txt")
filename = downloader.get_proper_filename()
assert filename == "llms-small.md"
Step 2: Run test to verify it fails
Run: source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v
Expected: FAIL with "AttributeError: 'LlmsTxtDownloader' object has no attribute 'get_proper_filename'"
Step 3: Implement get_proper_filename() method
# cli/llms_txt_downloader.py (add new method)
def get_proper_filename(self) -> str:
"""
Extract filename from URL and convert .txt to .md
Returns:
Proper filename with .md extension
Examples:
https://hono.dev/llms-full.txt -> llms-full.md
https://hono.dev/llms.txt -> llms.md
https://hono.dev/llms-small.txt -> llms-small.md
"""
# Extract filename from URL
from urllib.parse import urlparse
parsed = urlparse(self.url)
filename = parsed.path.split('/')[-1]
# Replace .txt with .md
if filename.endswith('.txt'):
filename = filename[:-4] + '.md'
return filename
Step 4: Run test to verify it passes
Run: source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v
Expected: PASS (all 3 tests)
Step 5: Commit
git add cli/llms_txt_downloader.py tests/test_llms_txt_downloader.py
git commit -m "feat: add get_proper_filename() for .txt to .md conversion"
Files:
cli/doc_scraper.py:337-384 (_try_llms_txt method)tests/test_integration.pyStep 1: Write failing test for multi-variant download
# tests/test_integration.py (add to TestFullLlmsTxtWorkflow class)
def test_multi_variant_download(self):
"""Test downloading all 3 llms.txt variants"""
from unittest.mock import patch, Mock
import tempfile
import os
config = {
'name': 'test-multi-variant',
'base_url': 'https://hono.dev/docs'
}
# Mock all 3 variants
sample_full = "# Full\n" + "x" * 1000
sample_standard = "# Standard\n" + "x" * 200
sample_small = "# Small\n" + "x" * 500
with tempfile.TemporaryDirectory() as tmpdir:
with patch('cli.llms_txt_detector.requests.head') as mock_head, \
patch('cli.llms_txt_downloader.requests.get') as mock_get:
# Mock detection (all exist)
mock_head_response = Mock()
mock_head_response.status_code = 200
mock_head.return_value = mock_head_response
# Mock downloads
def mock_download(url, **kwargs):
response = Mock()
response.status_code = 200
if 'llms-full.txt' in url:
response.text = sample_full
elif 'llms-small.txt' in url:
response.text = sample_small
else: # llms.txt
response.text = sample_standard
return response
mock_get.side_effect = mock_download
# Run scraper
scraper = DocumentationScraper(config, dry_run=False)
result = scraper._try_llms_txt()
# Verify all 3 files created
refs_dir = os.path.join(scraper.skill_dir, 'references')
assert os.path.exists(os.path.join(refs_dir, 'llms-full.md'))
assert os.path.exists(os.path.join(refs_dir, 'llms.md'))
assert os.path.exists(os.path.join(refs_dir, 'llms-small.md'))
# Verify content not truncated
with open(os.path.join(refs_dir, 'llms-full.md')) as f:
content = f.read()
assert len(content) == len(sample_full)
Step 2: Run test to verify it fails
Run: source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v
Expected: FAIL - only one file created, not all 3
Step 3: Modify _try_llms_txt() to use detect_all()
# cli/doc_scraper.py (replace _try_llms_txt method, lines 337-384)
def _try_llms_txt(self) -> bool:
"""
Try to use llms.txt instead of HTML scraping.
Downloads ALL available variants and stores with .md extension.
Returns:
True if llms.txt was found and processed successfully
"""
print(f"\n🔍 Checking for llms.txt at {self.base_url}...")
# Check for explicit config URL first
explicit_url = self.config.get('llms_txt_url')
if explicit_url:
print(f"\n📌 Using explicit llms_txt_url from config: {explicit_url}")
downloader = LlmsTxtDownloader(explicit_url)
content = downloader.download()
if content:
# Save with proper .md extension
filename = downloader.get_proper_filename()
filepath = os.path.join(self.skill_dir, "references", filename)
os.makedirs(os.path.dirname(filepath), exist_ok=True)
with open(filepath, 'w', encoding='utf-8') as f:
f.write(content)
print(f" 💾 Saved {filename} ({len(content)} chars)")
# Parse and save pages
parser = LlmsTxtParser(content)
pages = parser.parse()
if pages:
for page in pages:
self.save_page(page)
self.pages.append(page)
self.llms_txt_detected = True
self.llms_txt_variant = 'explicit'
return True
# Auto-detection: Find ALL variants
detector = LlmsTxtDetector(self.base_url)
variants = detector.detect_all()
if not variants:
print("ℹ️ No llms.txt found, using HTML scraping")
return False
print(f"✅ Found {len(variants)} llms.txt variant(s)")
# Download ALL variants
downloaded = {}
for variant_info in variants:
url = variant_info['url']
variant = variant_info['variant']
print(f" 📥 Downloading {variant}...")
downloader = LlmsTxtDownloader(url)
content = downloader.download()
if content:
filename = downloader.get_proper_filename()
downloaded[variant] = {
'content': content,
'filename': filename,
'size': len(content)
}
print(f" ✓ {filename} ({len(content)} chars)")
if not downloaded:
print("⚠️ Failed to download any variants, falling back to HTML scraping")
return False
# Save ALL variants to references/
os.makedirs(os.path.join(self.skill_dir, "references"), exist_ok=True)
for variant, data in downloaded.items():
filepath = os.path.join(self.skill_dir, "references", data['filename'])
with open(filepath, 'w', encoding='utf-8') as f:
f.write(data['content'])
print(f" 💾 Saved {data['filename']}")
# Parse LARGEST variant for skill building
largest = max(downloaded.items(), key=lambda x: x[1]['size'])
print(f"\n📄 Parsing {largest[1]['filename']} for skill building...")
parser = LlmsTxtParser(largest[1]['content'])
pages = parser.parse()
if not pages:
print("⚠️ Failed to parse llms.txt, falling back to HTML scraping")
return False
print(f" ✓ Parsed {len(pages)} sections")
# Save pages for skill building
for page in pages:
self.save_page(page)
self.pages.append(page)
self.llms_txt_detected = True
self.llms_txt_variants = list(downloaded.keys())
return True
Step 4: Add llms_txt_variants attribute to init
# cli/doc_scraper.py (in __init__ method, after llms_txt_variant line)
self.llms_txt_variants = [] # Track all downloaded variants
Step 5: Run test to verify it passes
Run: source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v
Expected: PASS
Step 6: Commit
git add cli/doc_scraper.py tests/test_integration.py
git commit -m "feat: download all llms.txt variants with proper .md extension"
Files:
cli/doc_scraper.py:714-730 (create_reference_file method)Step 1: Write failing test for no truncation
# tests/test_integration.py (add new test)
def test_no_content_truncation():
"""Test that content is NOT truncated in reference files"""
from unittest.mock import Mock
import tempfile
import os
config = {
'name': 'test-no-truncate',
'base_url': 'https://example.com/docs'
}
# Create scraper with long content
scraper = DocumentationScraper(config, dry_run=False)
# Create page with content > 2500 chars
long_content = "x" * 5000
long_code = "y" * 1000
pages = [{
'title': 'Long Page',
'url': 'https://example.com/long',
'content': long_content,
'code_samples': [
{'code': long_code, 'language': 'python'}
],
'headings': []
}]
# Create reference file
scraper.create_reference_file('test', pages)
# Verify no truncation
ref_file = os.path.join(scraper.skill_dir, 'references', 'test.md')
with open(ref_file, 'r') as f:
content = f.read()
assert long_content in content # Full content included
assert long_code in content # Full code included
assert '[Content truncated]' not in content
assert '...' not in content or content.count('...') == 0
Step 2: Run test to verify it fails
Run: source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v
Expected: FAIL - content contains "[Content truncated]" or "..."
Step 3: Remove truncation from create_reference_file()
# cli/doc_scraper.py (modify create_reference_file method, lines 712-731)
# OLD (line 714-716):
# if page.get('content'):
# content = page['content'][:2500]
# if len(page['content']) > 2500:
# content += "\n\n*[Content truncated]*"
# NEW (replace with):
if page.get('content'):
content = page['content'] # NO TRUNCATION
lines.append(content)
lines.append("")
# OLD (line 728-730):
# lines.append(code[:600])
# if len(code) > 600:
# lines.append("...")
# NEW (replace with):
lines.append(code) # NO TRUNCATION
# No "..." suffix
Complete replacement of lines 712-731:
# cli/doc_scraper.py:712-731 (complete replacement)
# Content (NO TRUNCATION)
if page.get('content'):
lines.append(page['content'])
lines.append("")
# Code examples with language (NO TRUNCATION)
if page.get('code_samples'):
lines.append("**Examples:**\n")
for i, sample in enumerate(page['code_samples'][:4], 1):
lang = sample.get('language', 'unknown')
code = sample.get('code', sample if isinstance(sample, str) else '')
lines.append(f"Example {i} ({lang}):")
lines.append(f"```{lang}")
lines.append(code) # Full code, no truncation
lines.append("```\n")
Step 4: Run test to verify it passes
Run: source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v
Expected: PASS
Step 5: Run full test suite to check for regressions
Run: source .venv/bin/activate && pytest tests/ -v
Expected: All 201+ tests pass
Step 6: Commit
git add cli/doc_scraper.py tests/test_integration.py
git commit -m "feat: remove content truncation in reference files"
Files:
docs/plans/2025-10-24-active-skills-design.mdCHANGELOG.mdStep 1: Update design doc status
# docs/plans/2025-10-24-active-skills-design.md (update header)
**Status:** Phase 1 Implemented ✅
Step 2: Add CHANGELOG entry
# CHANGELOG.md (add new section at top)
## [Unreleased]
### Added - Phase 1: Active Skills Foundation
- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
- Automatic .txt → .md file extension conversion
- No content truncation: preserves complete documentation
- `detect_all()` method for finding all llms.txt variants
- `get_proper_filename()` for correct .md naming
### Changed
- `_try_llms_txt()` now downloads all available variants instead of just one
- Reference files now contain complete content (no 2500 char limit)
- Code samples now include full code (no 600 char limit)
### Fixed
- File extension bug: llms.txt files now saved as .md
- Content loss: 0% truncation (was 36%)
Step 3: Commit
git add docs/plans/2025-10-24-active-skills-design.md CHANGELOG.md
git commit -m "docs: update status for Phase 1 completion"
Files:
Step 1: Test with Hono config
Run: source .venv/bin/activate && python3 cli/doc_scraper.py --config configs/hono.json
Expected output:
🔍 Checking for llms.txt at https://hono.dev/docs...
📌 Using explicit llms_txt_url from config: https://hono.dev/llms-full.txt
💾 Saved llms-full.md (319000 chars)
📄 Parsing llms-full.md for skill building...
✓ Parsed 93 sections
✅ Used llms.txt (explicit) - skipping HTML scraping
Step 2: Verify all 3 files exist with correct extensions
Run: ls -lah output/hono/references/llms*.md
Expected:
llms-full.md 319k
llms.md 5.4k
llms-small.md 176k
Step 3: Verify no truncation in reference files
Run: grep -c "Content truncated" output/hono/references/*.md
Expected: 0 matches (no truncation messages)
Step 4: Check file sizes are correct
Run: wc -c output/hono/references/llms-full.md
Expected: Should match original download size (~319k), not reduced to 203k
Step 5: Verify all tests still pass
Run: source .venv/bin/activate && pytest tests/ -v
Expected: All tests pass (201+)
Technical:
User Experience:
docs/plans/2025-10-24-active-skills-design.mdTotal: ~1.5 hours