# Active Skills Phase 1: Foundation Implementation Plan > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. **Goal:** Fix fundamental issues in llms.txt handling: rename .txt→.md, download all 3 variants, remove truncation. **Architecture:** Modify existing llms.txt download/parse/build workflow to handle multiple variants correctly, store with proper extensions, and preserve complete content without truncation. **Tech Stack:** Python 3.10+, requests, BeautifulSoup4, existing Skill_Seekers architecture --- ## Task 1: Add Multi-Variant Detection **Files:** - Modify: `cli/llms_txt_detector.py` - Test: `tests/test_llms_txt_detector.py` **Step 1: Write failing test for detect_all() method** ```python # tests/test_llms_txt_detector.py (add new test) def test_detect_all_variants(): """Test detecting all llms.txt variants""" from unittest.mock import patch, Mock detector = LlmsTxtDetector("https://hono.dev/docs") with patch('cli.llms_txt_detector.requests.head') as mock_head: # Mock responses for different variants def mock_response(url, **kwargs): response = Mock() # All 3 variants exist for Hono if 'llms-full.txt' in url or 'llms.txt' in url or 'llms-small.txt' in url: response.status_code = 200 else: response.status_code = 404 return response mock_head.side_effect = mock_response variants = detector.detect_all() assert len(variants) == 3 assert any(v['variant'] == 'full' for v in variants) assert any(v['variant'] == 'standard' for v in variants) assert any(v['variant'] == 'small' for v in variants) assert all('url' in v for v in variants) ``` **Step 2: Run test to verify it fails** Run: `source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v` Expected: FAIL with "AttributeError: 'LlmsTxtDetector' object has no attribute 'detect_all'" **Step 3: Implement detect_all() method** ```python # cli/llms_txt_detector.py (add new method) def detect_all(self) -> List[Dict[str, str]]: """ Detect all available llms.txt variants. Returns: List of dicts with 'url' and 'variant' keys for each found variant """ found_variants = [] for filename, variant in self.VARIANTS: parsed = urlparse(self.base_url) root_url = f"{parsed.scheme}://{parsed.netloc}" url = f"{root_url}/{filename}" if self._check_url_exists(url): found_variants.append({ 'url': url, 'variant': variant }) return found_variants ``` **Step 4: Add import for List and Dict at top of file** ```python # cli/llms_txt_detector.py (add to imports) from typing import Optional, Dict, List ``` **Step 5: Run test to verify it passes** Run: `source .venv/bin/activate && pytest tests/test_llms_txt_detector.py::test_detect_all_variants -v` Expected: PASS **Step 6: Commit** ```bash git add cli/llms_txt_detector.py tests/test_llms_txt_detector.py git commit -m "feat: add detect_all() for multi-variant detection" ``` --- ## Task 2: Add File Extension Renaming to Downloader **Files:** - Modify: `cli/llms_txt_downloader.py` - Test: `tests/test_llms_txt_downloader.py` **Step 1: Write failing test for get_proper_filename() method** ```python # tests/test_llms_txt_downloader.py (add new test) def test_get_proper_filename(): """Test filename conversion from .txt to .md""" downloader = LlmsTxtDownloader("https://hono.dev/llms-full.txt") filename = downloader.get_proper_filename() assert filename == "llms-full.md" assert not filename.endswith('.txt') def test_get_proper_filename_standard(): """Test standard variant naming""" downloader = LlmsTxtDownloader("https://hono.dev/llms.txt") filename = downloader.get_proper_filename() assert filename == "llms.md" def test_get_proper_filename_small(): """Test small variant naming""" downloader = LlmsTxtDownloader("https://hono.dev/llms-small.txt") filename = downloader.get_proper_filename() assert filename == "llms-small.md" ``` **Step 2: Run test to verify it fails** Run: `source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v` Expected: FAIL with "AttributeError: 'LlmsTxtDownloader' object has no attribute 'get_proper_filename'" **Step 3: Implement get_proper_filename() method** ```python # cli/llms_txt_downloader.py (add new method) def get_proper_filename(self) -> str: """ Extract filename from URL and convert .txt to .md Returns: Proper filename with .md extension Examples: https://hono.dev/llms-full.txt -> llms-full.md https://hono.dev/llms.txt -> llms.md https://hono.dev/llms-small.txt -> llms-small.md """ # Extract filename from URL from urllib.parse import urlparse parsed = urlparse(self.url) filename = parsed.path.split('/')[-1] # Replace .txt with .md if filename.endswith('.txt'): filename = filename[:-4] + '.md' return filename ``` **Step 4: Run test to verify it passes** Run: `source .venv/bin/activate && pytest tests/test_llms_txt_downloader.py::test_get_proper_filename -v` Expected: PASS (all 3 tests) **Step 5: Commit** ```bash git add cli/llms_txt_downloader.py tests/test_llms_txt_downloader.py git commit -m "feat: add get_proper_filename() for .txt to .md conversion" ``` --- ## Task 3: Update _try_llms_txt() to Download All Variants **Files:** - Modify: `cli/doc_scraper.py:337-384` (_try_llms_txt method) - Test: `tests/test_integration.py` **Step 1: Write failing test for multi-variant download** ```python # tests/test_integration.py (add to TestFullLlmsTxtWorkflow class) def test_multi_variant_download(self): """Test downloading all 3 llms.txt variants""" from unittest.mock import patch, Mock import tempfile import os config = { 'name': 'test-multi-variant', 'base_url': 'https://hono.dev/docs' } # Mock all 3 variants sample_full = "# Full\n" + "x" * 1000 sample_standard = "# Standard\n" + "x" * 200 sample_small = "# Small\n" + "x" * 500 with tempfile.TemporaryDirectory() as tmpdir: with patch('cli.llms_txt_detector.requests.head') as mock_head, \ patch('cli.llms_txt_downloader.requests.get') as mock_get: # Mock detection (all exist) mock_head_response = Mock() mock_head_response.status_code = 200 mock_head.return_value = mock_head_response # Mock downloads def mock_download(url, **kwargs): response = Mock() response.status_code = 200 if 'llms-full.txt' in url: response.text = sample_full elif 'llms-small.txt' in url: response.text = sample_small else: # llms.txt response.text = sample_standard return response mock_get.side_effect = mock_download # Run scraper scraper = DocumentationScraper(config, dry_run=False) result = scraper._try_llms_txt() # Verify all 3 files created refs_dir = os.path.join(scraper.skill_dir, 'references') assert os.path.exists(os.path.join(refs_dir, 'llms-full.md')) assert os.path.exists(os.path.join(refs_dir, 'llms.md')) assert os.path.exists(os.path.join(refs_dir, 'llms-small.md')) # Verify content not truncated with open(os.path.join(refs_dir, 'llms-full.md')) as f: content = f.read() assert len(content) == len(sample_full) ``` **Step 2: Run test to verify it fails** Run: `source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v` Expected: FAIL - only one file created, not all 3 **Step 3: Modify _try_llms_txt() to use detect_all()** ```python # cli/doc_scraper.py (replace _try_llms_txt method, lines 337-384) def _try_llms_txt(self) -> bool: """ Try to use llms.txt instead of HTML scraping. Downloads ALL available variants and stores with .md extension. Returns: True if llms.txt was found and processed successfully """ print(f"\n🔍 Checking for llms.txt at {self.base_url}...") # Check for explicit config URL first explicit_url = self.config.get('llms_txt_url') if explicit_url: print(f"\n📌 Using explicit llms_txt_url from config: {explicit_url}") downloader = LlmsTxtDownloader(explicit_url) content = downloader.download() if content: # Save with proper .md extension filename = downloader.get_proper_filename() filepath = os.path.join(self.skill_dir, "references", filename) os.makedirs(os.path.dirname(filepath), exist_ok=True) with open(filepath, 'w', encoding='utf-8') as f: f.write(content) print(f" 💾 Saved {filename} ({len(content)} chars)") # Parse and save pages parser = LlmsTxtParser(content) pages = parser.parse() if pages: for page in pages: self.save_page(page) self.pages.append(page) self.llms_txt_detected = True self.llms_txt_variant = 'explicit' return True # Auto-detection: Find ALL variants detector = LlmsTxtDetector(self.base_url) variants = detector.detect_all() if not variants: print("ℹ️ No llms.txt found, using HTML scraping") return False print(f"✅ Found {len(variants)} llms.txt variant(s)") # Download ALL variants downloaded = {} for variant_info in variants: url = variant_info['url'] variant = variant_info['variant'] print(f" 📥 Downloading {variant}...") downloader = LlmsTxtDownloader(url) content = downloader.download() if content: filename = downloader.get_proper_filename() downloaded[variant] = { 'content': content, 'filename': filename, 'size': len(content) } print(f" ✓ {filename} ({len(content)} chars)") if not downloaded: print("⚠️ Failed to download any variants, falling back to HTML scraping") return False # Save ALL variants to references/ os.makedirs(os.path.join(self.skill_dir, "references"), exist_ok=True) for variant, data in downloaded.items(): filepath = os.path.join(self.skill_dir, "references", data['filename']) with open(filepath, 'w', encoding='utf-8') as f: f.write(data['content']) print(f" 💾 Saved {data['filename']}") # Parse LARGEST variant for skill building largest = max(downloaded.items(), key=lambda x: x[1]['size']) print(f"\n📄 Parsing {largest[1]['filename']} for skill building...") parser = LlmsTxtParser(largest[1]['content']) pages = parser.parse() if not pages: print("⚠️ Failed to parse llms.txt, falling back to HTML scraping") return False print(f" ✓ Parsed {len(pages)} sections") # Save pages for skill building for page in pages: self.save_page(page) self.pages.append(page) self.llms_txt_detected = True self.llms_txt_variants = list(downloaded.keys()) return True ``` **Step 4: Add llms_txt_variants attribute to __init__** ```python # cli/doc_scraper.py (in __init__ method, after llms_txt_variant line) self.llms_txt_variants = [] # Track all downloaded variants ``` **Step 5: Run test to verify it passes** Run: `source .venv/bin/activate && pytest tests/test_integration.py::TestFullLlmsTxtWorkflow::test_multi_variant_download -v` Expected: PASS **Step 6: Commit** ```bash git add cli/doc_scraper.py tests/test_integration.py git commit -m "feat: download all llms.txt variants with proper .md extension" ``` --- ## Task 4: Remove Content Truncation **Files:** - Modify: `cli/doc_scraper.py:714-730` (create_reference_file method) **Step 1: Write failing test for no truncation** ```python # tests/test_integration.py (add new test) def test_no_content_truncation(): """Test that content is NOT truncated in reference files""" from unittest.mock import Mock import tempfile import os config = { 'name': 'test-no-truncate', 'base_url': 'https://example.com/docs' } # Create scraper with long content scraper = DocumentationScraper(config, dry_run=False) # Create page with content > 2500 chars long_content = "x" * 5000 long_code = "y" * 1000 pages = [{ 'title': 'Long Page', 'url': 'https://example.com/long', 'content': long_content, 'code_samples': [ {'code': long_code, 'language': 'python'} ], 'headings': [] }] # Create reference file scraper.create_reference_file('test', pages) # Verify no truncation ref_file = os.path.join(scraper.skill_dir, 'references', 'test.md') with open(ref_file, 'r') as f: content = f.read() assert long_content in content # Full content included assert long_code in content # Full code included assert '[Content truncated]' not in content assert '...' not in content or content.count('...') == 0 ``` **Step 2: Run test to verify it fails** Run: `source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v` Expected: FAIL - content contains "[Content truncated]" or "..." **Step 3: Remove truncation from create_reference_file()** ```python # cli/doc_scraper.py (modify create_reference_file method, lines 712-731) # OLD (line 714-716): # if page.get('content'): # content = page['content'][:2500] # if len(page['content']) > 2500: # content += "\n\n*[Content truncated]*" # NEW (replace with): if page.get('content'): content = page['content'] # NO TRUNCATION lines.append(content) lines.append("") # OLD (line 728-730): # lines.append(code[:600]) # if len(code) > 600: # lines.append("...") # NEW (replace with): lines.append(code) # NO TRUNCATION # No "..." suffix ``` **Complete replacement of lines 712-731:** ```python # cli/doc_scraper.py:712-731 (complete replacement) # Content (NO TRUNCATION) if page.get('content'): lines.append(page['content']) lines.append("") # Code examples with language (NO TRUNCATION) if page.get('code_samples'): lines.append("**Examples:**\n") for i, sample in enumerate(page['code_samples'][:4], 1): lang = sample.get('language', 'unknown') code = sample.get('code', sample if isinstance(sample, str) else '') lines.append(f"Example {i} ({lang}):") lines.append(f"```{lang}") lines.append(code) # Full code, no truncation lines.append("```\n") ``` **Step 4: Run test to verify it passes** Run: `source .venv/bin/activate && pytest tests/test_integration.py::test_no_content_truncation -v` Expected: PASS **Step 5: Run full test suite to check for regressions** Run: `source .venv/bin/activate && pytest tests/ -v` Expected: All 201+ tests pass **Step 6: Commit** ```bash git add cli/doc_scraper.py tests/test_integration.py git commit -m "feat: remove content truncation in reference files" ``` --- ## Task 5: Update Documentation **Files:** - Modify: `docs/plans/2025-10-24-active-skills-design.md` - Modify: `CHANGELOG.md` **Step 1: Update design doc status** ```markdown # docs/plans/2025-10-24-active-skills-design.md (update header) **Status:** Phase 1 Implemented ✅ ``` **Step 2: Add CHANGELOG entry** ```markdown # CHANGELOG.md (add new section at top) ## [Unreleased] ### Added - Phase 1: Active Skills Foundation - Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small) - Automatic .txt → .md file extension conversion - No content truncation: preserves complete documentation - `detect_all()` method for finding all llms.txt variants - `get_proper_filename()` for correct .md naming ### Changed - `_try_llms_txt()` now downloads all available variants instead of just one - Reference files now contain complete content (no 2500 char limit) - Code samples now include full code (no 600 char limit) ### Fixed - File extension bug: llms.txt files now saved as .md - Content loss: 0% truncation (was 36%) ``` **Step 3: Commit** ```bash git add docs/plans/2025-10-24-active-skills-design.md CHANGELOG.md git commit -m "docs: update status for Phase 1 completion" ``` --- ## Task 6: Manual Verification **Files:** - None (manual testing) **Step 1: Test with Hono config** Run: `source .venv/bin/activate && python3 cli/doc_scraper.py --config configs/hono.json` **Expected output:** ``` 🔍 Checking for llms.txt at https://hono.dev/docs... 📌 Using explicit llms_txt_url from config: https://hono.dev/llms-full.txt 💾 Saved llms-full.md (319000 chars) 📄 Parsing llms-full.md for skill building... ✓ Parsed 93 sections ✅ Used llms.txt (explicit) - skipping HTML scraping ``` **Step 2: Verify all 3 files exist with correct extensions** Run: `ls -lah output/hono/references/llms*.md` Expected: ``` llms-full.md 319k llms.md 5.4k llms-small.md 176k ``` **Step 3: Verify no truncation in reference files** Run: `grep -c "Content truncated" output/hono/references/*.md` Expected: 0 matches (no truncation messages) **Step 4: Check file sizes are correct** Run: `wc -c output/hono/references/llms-full.md` Expected: Should match original download size (~319k), not reduced to 203k **Step 5: Verify all tests still pass** Run: `source .venv/bin/activate && pytest tests/ -v` Expected: All tests pass (201+) --- ## Completion Checklist - [ ] Task 1: Multi-variant detection (detect_all) - [ ] Task 2: File extension renaming (get_proper_filename) - [ ] Task 3: Download all variants (_try_llms_txt) - [ ] Task 4: Remove truncation (create_reference_file) - [ ] Task 5: Update documentation - [ ] Task 6: Manual verification - [ ] All tests passing - [ ] No regressions in existing functionality --- ## Success Criteria **Technical:** - ✅ All 3 variants downloaded when available - ✅ Files saved with .md extension (not .txt) - ✅ 0% content truncation (was 36%) - ✅ All existing tests pass - ✅ New tests cover all changes **User Experience:** - ✅ Hono skill has all 3 files: llms-full.md, llms.md, llms-small.md - ✅ Reference files contain complete documentation - ✅ No "[Content truncated]" messages in output --- ## Related Skills - @superpowers:test-driven-development - Used throughout for TDD approach - @superpowers:verification-before-completion - Used in Task 6 for manual verification --- ## Notes - This plan implements Phase 1 from `docs/plans/2025-10-24-active-skills-design.md` - Phase 2 (Catalog System) and Phase 3 (Active Scripts) will be separate plans - All changes maintain backward compatibility with existing HTML scraping - File extension fix (.txt → .md) is critical for proper skill functionality --- ## Estimated Time - Task 1: 15 minutes - Task 2: 15 minutes - Task 3: 30 minutes - Task 4: 20 minutes - Task 5: 10 minutes - Task 6: 15 minutes **Total: ~1.5 hours**