🔍 Comprehensive Test Audit Report

docs-engine | Generated: 2025-11-11

═══════════════════════════════════════════════════════════════════════════════

📊 Executive Summary

Current Status: 🔄 Below Coverage Threshold

Total Test Files:       16
Total Test Cases:       328
Test Lines of Code:     3,596
Source Files:           70
Coverage:               23.38% ❌ (Target: 40%)
All Tests Passing:      ✅ 327/328 passing

Coverage Breakdown:
  Lines:                23.38% ❌ (Target: 40%)
  Functions:            29.85% ❌ (Target: 40%)
  Branches:             18.11% ❌ (Target: 40%)
  Statements:           23.33% ❌ (Target: 40%)

Key Findings: • 40/70 source files (57%) have NO test coverage • 96 lines of duplicate test code identified (can be reduced to ~15 lines) • Tests that exist are generally high quality with good edge case coverage • No integration tests, E2E tests, or CI/CD pipeline detected • Security tests present (XSS, sanitization) - strong foundation

Overall Grade: C+ (Good foundation, needs expansion)

🎯 Critical Issues Requiring Immediate Action

1. Coverage Below Threshold ⚠️

Impact: HIGH | Effort: HIGH

Current:  23.38%
Target:   40%
Gap:      -16.62%

Recommendation: Prioritize adding tests for high-risk untested modules

2. Massive Test Duplication 🔴

Impact: MEDIUM | Effort: LOW

Location:  src/lib/plugins/callouts.test.ts (lines 49-145)
Issue:     9 nearly identical tests (96 lines)
Solution:  Parameterize with test.each() (reduce to ~15 lines)
Savings:   ~80 lines of code

3. Missing Critical Module Tests ⚠️

Impact: HIGH | Effort: HIGH

Untested High-Risk Modules:
• reference.ts         (287 lines) - Symbol reference plugin
• screenshot-service.ts (514 lines) - Screenshot capture service
• image-processor.ts   (318 lines) - Image processing pipeline
• circuit-breaker.ts   (217 lines) - Fault tolerance
• collapse.ts          (190 lines) - Collapsible sections
• toc.ts              (184 lines) - Table of contents generation

4. No CI/CD Integration ⚠️

Impact: MEDIUM | Effort: LOW

Issue:  No automated testing on push/PR
Risk:   Breaking changes can reach main branch undetected
Action: Add GitHub Actions workflow (see recommendations)

📋 Test Quality Assessment by Module

✅ EXCELLENT QUALITY (Keep as-is)

sanitize.test.ts

Rating:     ⭐⭐⭐⭐⭐ (5/5)
Coverage:   100%
Tests:      24
Strengths:  • Comprehensive XSS attack scenarios
            • Tests all sanitization contexts (HTML/SVG/Error)
            • Good security edge cases (protocol injection, event handlers)
            • Clear, focused tests
Gaps:       None - exemplary test suite

markdown.test.ts

Rating:     ⭐⭐⭐⭐ (4/5)
Coverage:   100%
Tests:      38
Strengths:  • Thorough edge case coverage
            • Tests all markdown utility functions
            • Good special character handling
Issues:     • Tests implementation details (timestamps, blank lines)
            • Some redundant tests (empty headers + empty rows)
Action:     Remove 5 redundant tests (lines 101-110, 112-115, 196-199, 220-224)

frontmatter.test.ts

Rating:     ⭐⭐⭐⭐ (4/5)
Coverage:   100%
Tests:      21
Strengths:  • Good YAML parsing edge cases
            • Tests title extraction hierarchy
            • Malformed input handling
Issues:     • Weak assertions (toBeDefined() instead of exact values)
            • Missing tests for multiple delimiters
Action:     Strengthen assertions, add 3 edge case tests

⚠️ GOOD QUALITY (Minor improvements needed)

symbol-resolver.test.ts

Rating:     ⭐⭐⭐⭐ (4/5)
Coverage:   84.94%
Tests:      21
Strengths:  • Tests all 6 symbol kinds
            • Ambiguity detection with suggestions
            • Path hint filtering
Gaps:       • Missing case sensitivity tests
            • Missing glob pattern tests
Action:     Add 3-5 tests for missing edge cases

rate-limiter.test.ts

Rating:     ⭐⭐⭐ (3/5)
Coverage:   66.66%
Tests:      45
Strengths:  • Good use of fake timers
            • Tests sliding windows
            • Per-user isolation
Issues:     • Redundant tests (concurrent + exceeding limit test same thing)
            • Tests implementation details (counting mechanism)
            • Unrealistic edge cases (100ms windows, empty identifiers)
Action:     Remove 4 redundant tests, add input validation tests

image-optimization.test.ts

Rating:     ⭐⭐⭐⭐ (4/5)
Coverage:   100%
Tests:      8
Strengths:  • Tests external URL handling
            • Configuration options tested
            • Base64 encoding verified
Gaps:       • Missing error handling tests (corrupt images)
            • Missing performance tests (very large images)
Action:     Add 5-7 error handling tests

🔴 NEEDS IMPROVEMENT (Refactoring required)

callouts.test.ts

Rating:     ⭐⭐ (2/5)
Coverage:   32.53%
Tests:      13
Critical:   🔴 MASSIVE CODE DUPLICATION
Issue:      Lines 49-145 contain 9 identical tests (96 lines)
            Only difference is callout type (NOTE, TIP, WARNING, etc.)

Before:
  test('should transform NOTE callout', () => { /* 10 lines */ });
  test('should transform TIP callout', () => { /* 10 lines */ });
  test('should transform WARNING callout', () => { /* 10 lines */ });
  ... 6 more identical tests ...

After (recommended):
  const types = [
    { type: 'NOTE', class: 'blue', title: 'Note' },
    { type: 'TIP', class: 'green', title: 'Tip' },
    // ...
  ];
  test.each(types)('should transform $type callout', ({ type, class, title }) => {
    // Single test implementation
  });

Savings:    96 lines → ~15 lines (81 lines saved)
Gaps:       • No tests for nested blockquotes
            • No tests for markdown in callout content
            • No case sensitivity tests
Action:     Refactor to parameterized tests, add 5 edge case tests

links.test.ts

Rating:     ⭐⭐ (2/5)
Coverage:   89.79%
Tests:      23
Critical:   🔴 DUPLICATE TOP-LEVEL FILE TESTS
Issue:      Lines 65-133 contain 7 tests that follow identical pattern
            (README, LICENSE, CONTRIBUTING, CHANGELOG, etc.)

Action:     Parameterize with test.each(), add query param tests
Savings:    ~40 lines of code
Gaps:       • No tests for URLs with query parameters
            • No tests for .MD vs .md case sensitivity
            • No tests for malformed URLs

code-highlight.test.ts

Rating:     ⭐⭐ (2/5)
Coverage:   83.33%
Tests:      13
Issues:     • Tests TypeScript interfaces at runtime (useless)
            • Tests plugin API shape (TypeScript's job)
            • Long justification comments (code smell)
            • Type casting everywhere (unsafe)
Action:     Remove 2 useless tests, improve assertions, add HTML injection tests
Savings:    ~30 lines of code

file-io.test.ts

Rating:     ⭐⭐ (2/5)
Coverage:   90%
Tests:      32
Issues:     • Duplicate UTF-8 tests (lines 54-62 and 106-114)
            • Circular dependencies (use writeFile to test readFile)
            • Over-granular countLines testing (8 tests for simple utility)
            • Tests implementation details (JSON indentation spaces)
Gaps:       • No permission error tests
            • No disk space error tests
            • No large file tests
            • No concurrent access tests
Action:     Remove 6 redundant tests, add 4 error handling tests
Savings:    ~80 lines of code

🚫 Tests to Remove or Consolidate

High Priority Removals (No Value)

1. TypeScript Type Tests (Remove entirely)

Location: src/lib/plugins/code-highlight.test.ts:256-265
Reason:   Tests TypeScript interface at runtime - TypeScript already validates this
Impact:   No loss of test coverage

2. API Shape Tests (Remove entirely)

Location: src/lib/plugins/code-highlight.test.ts:18-31
Reason:   Tests plugin API structure - no runtime value
Impact:   No loss of test coverage

3. Duplicate UTF-8 Tests (Consolidate to 1)

Location: src/lib/utils/file-io.test.ts:54-62 and 106-114
Reason:   Both test identical UTF-8 functionality
Action:   Keep one, remove the other
Savings:  8 lines

4. Circular Dependency Tests (Refactor)

Location: src/lib/utils/file-io.test.ts (multiple)
Reason:   Use writeFile to test readFile and vice versa
Action:   Use test fixtures instead
Risk:     Current approach could hide bugs in both functions

Medium Priority Consolidations

5. Callouts Type Tests (Parameterize)

Location: src/lib/plugins/callouts.test.ts:49-145
Before:   9 identical tests (96 lines)
After:    1 parameterized test (~15 lines)
Savings:  81 lines (84% reduction)

6. Top-Level File Links (Parameterize)

Location: src/lib/plugins/links.test.ts:65-133
Before:   7 similar tests (~40 lines)
After:    1 parameterized test (~8 lines)
Savings:  32 lines (80% reduction)

7. External Link Protocol Tests (Parameterize)

Location: src/lib/plugins/links.test.ts (multiple)
Before:   5 tests for different protocols (http, https, ftp, mailto, data)
After:    1 parameterized test
Savings:  ~20 lines

8. Redundant Rate Limiter Tests (Remove)

Location: src/lib/server/rate-limiter.test.ts
Remove:   • Lines 89-101: "concurrent requests" (duplicate of "block exceeding")
          • Lines 117-122: "empty identifier" (unrealistic edge case)
          • Lines 124-132: "100ms windows" (unrealistic edge case)
Savings:  ~30 lines

9. Over-Granular countLines Tests (Consolidate)

Location: src/lib/utils/file-io.test.ts:266-318
Before:   8 tests for simple line counting utility
After:    3-4 meaningful tests (empty, single, multiple, edge cases)
Savings:  ~30 lines

Total Potential Savings: ~273 lines of test code Total Potential Reduction: ~7.6% of test suite size

🔴 Critical Coverage Gaps (Priority Order)

Tier 1: High-Risk Untested Modules (Add immediately)

1. reference.ts (287 lines) - 0% coverage

Risk Level:   🔴 CRITICAL
Functionality: Symbol reference resolution and transformation
Why Critical:  • Core feature for documentation linking
               • Complex path resolution logic
               • Error-prone symbol matching
Tests Needed:  ~15-20 tests
Estimated LOC: ~300 lines
Priority:      1

2. screenshot-service.ts (514 lines) - 0% coverage

Risk Level:   🔴 CRITICAL
Functionality: Screenshot capture with Playwright
Why Critical:  • External process management
               • File I/O operations
               • Resource cleanup critical
               • Timeout handling
Tests Needed:  ~20-25 tests (mostly integration tests)
Estimated LOC: ~400 lines
Priority:      2
Note:          Consider separating unit tests from integration tests

3. circuit-breaker.ts (217 lines) - 0% coverage

Risk Level:   🔴 CRITICAL
Functionality: Fault tolerance and service protection
Why Critical:  • Complex state management
               • Time-based logic
               • Critical for production resilience
Tests Needed:  ~15-18 tests
Estimated LOC: ~250 lines
Priority:      3

4. image-processor.ts (318 lines) - 0% coverage

Risk Level:   🔴 CRITICAL
Functionality: Image optimization pipeline (Sharp)
Why Critical:  • File system operations
               • External library integration (Sharp)
               • Performance-sensitive
               • Error handling for corrupt images
Tests Needed:  ~18-22 tests
Estimated LOC: ~350 lines
Priority:      4

Tier 2: Important Feature Gaps (Add within 2 weeks)

5. collapse.ts (190 lines) - 0% coverage

Risk Level:   🟡 HIGH
Functionality: Collapsible section transformation
Tests Needed:  ~12-15 tests
Estimated LOC: ~200 lines
Priority:      5

6. toc.ts (184 lines) - 0% coverage

Risk Level:   🟡 HIGH
Functionality: Table of contents generation
Tests Needed:  ~12-15 tests
Estimated LOC: ~180 lines
Priority:      6

7. tabs.ts (135 lines) - 0% coverage

Risk Level:   🟡 HIGH
Functionality: Tab container transformation
Tests Needed:  ~10-12 tests
Estimated LOC: ~150 lines
Priority:      7

8. mermaid.ts (33 lines) - 0% coverage

Risk Level:   🟡 MEDIUM
Functionality: Mermaid diagram integration
Tests Needed:  ~6-8 tests
Estimated LOC: ~80 lines
Priority:      8

9. filetree.ts (45 lines) - 0% coverage

Risk Level:   🟡 MEDIUM
Functionality: File tree visualization
Tests Needed:  ~6-8 tests
Estimated LOC: ~70 lines
Priority:      9

Tier 3: Utility and Support Functions

10. symbol-generation.ts (851 lines) - 0% coverage

Risk Level:   🟡 HIGH (due to size)
Functionality: Symbol generation from source code
Tests Needed:  ~25-30 tests
Estimated LOC: ~500 lines
Priority:      10

11. tree-parser.ts (208 lines) - 0% coverage

Risk Level:   🟢 MEDIUM
Functionality: AST tree parsing utilities
Tests Needed:  ~12-15 tests
Estimated LOC: ~180 lines
Priority:      11

12. navigation-builder.ts + navigation-scanner.ts (374 lines combined) - 0% coverage

Risk Level:   🟢 MEDIUM
Functionality: Navigation structure building
Tests Needed:  ~15-18 tests combined
Estimated LOC: ~250 lines
Priority:      12

13. search.ts (163 lines) - 0% coverage

Risk Level:   🟢 MEDIUM
Functionality: Search implementation
Tests Needed:  ~10-12 tests
Estimated LOC: ~150 lines
Priority:      13
Note:          search-index.ts has good coverage, but actual search.ts doesn't

Tier 4: Low-Risk Utilities (Add as time permits)

• cli-executor.ts (80 lines) - 0% coverage
• logger.ts (64 lines) - 0% coverage
• highlighter.ts (74 lines) - 0% coverage
• base64.ts (95 lines) - 12% coverage (needs more)
• openapi-formatter.ts (363 lines) - 0% coverage
• symbol-renderer.ts (277 lines) - 0% coverage
• version.ts (20 lines) - 0% coverage
• html.ts (87 lines) - 42.85% coverage (needs more)
• date.ts - 100% coverage ✅

Estimated Total Effort:

New Tests Needed:     ~220-270 tests
New Test LOC:         ~4,000-5,000 lines
Time Estimate:        3-4 weeks (1 developer)
Coverage Increase:    23% → 60-70%

🧪 Missing Test Types

1. Integration Tests ❌

Status: Not present

Current:  Only unit tests exist
Need:     • Plugin pipeline integration
          • File I/O + Markdown processing flow
          • Symbol resolution + generation flow
          • Image optimization pipeline
          • Search indexing + querying flow

Example Integration Test:
  test('full markdown processing pipeline', async () => {
    const markdown = '# Title\n\n[!NOTE] Callout\n\n```js\ncode\n```';
    const result = await processMarkdown(markdown, allPlugins);
    expect(result.html).toMatchSnapshot();
    expect(result.metadata).toEqual({ ... });
  });

Estimated:    15-20 integration tests needed
LOC:          ~800-1000 lines
Priority:     HIGH

2. End-to-End Tests ❌

Status: Not present

Current:  No E2E tests
Need:     • Full documentation site generation
          • Search functionality E2E
          • Navigation generation E2E
          • Image optimization E2E

Tools:    Playwright (already in devDependencies ✅)
Example:  Generate full site → verify pages → test navigation → test search

Estimated:    10-15 E2E tests needed
LOC:          ~600-800 lines
Priority:     MEDIUM
Note:         Playwright already installed, just needs test setup

3. Performance Tests ❌

Status: Not present

Current:  No performance benchmarks
Need:     • Large file processing (10MB+ markdown files)
          • Thousands of symbols performance
          • Search index with 10k+ documents
          • Image batch processing (100+ images)

Example:
  test('should process large markdown file within 5s', async () => {
    const largeMarkdown = generateMarkdown(10000); // 10k lines
    const start = Date.now();
    await processMarkdown(largeMarkdown);
    expect(Date.now() - start).toBeLessThan(5000);
  });

Estimated:    8-12 performance tests
LOC:          ~300-400 lines
Priority:     LOW (add after coverage is above 60%)

4. Visual Regression Tests ❌

Status: Not present

Current:  No visual testing
Need:     • Callout rendering consistency
          • Code block styling
          • Table rendering
          • Symbol reference rendering

Tools:    Percy, Chromatic, or Playwright screenshots
Priority: LOW (nice-to-have)

5. Component Tests ❌

Status: Not present

Current:  No component tests (all Svelte components untested)
Files:    • All files in lib/components/ have 0% coverage
Need:     • Svelte component unit tests
          • @testing-library/svelte already installed ✅

Priority: MEDIUM (if components contain complex logic)

🎯 Test Best Practices Issues

❌ Current Issues

1. Testing Implementation Details

Location:  Multiple files
Examples:  • Checking for specific CSS class names
           • Testing JSON indentation (2 spaces vs 4)
           • Testing exact timestamp formats
           • Testing blank lines in output

Problem:   Tests break when implementation changes, even if behavior is correct
Fix:       Test behavior and outcomes, not internal implementation

2. Weak Assertions

Location:  Multiple files
Examples:  • expect(result).toBeDefined()
           • expect(result).toContain('shiki')
           • expect(count).toBeGreaterThan(0)

Problem:   Assertions are too vague to catch real bugs
Fix:       Use specific assertions:
           • expect(result).toEqual({ specific: 'value' })
           • expect(html).toMatch(/<pre class="shiki">/)
           • expect(count).toBe(expectedCount)

3. Magic Numbers and Strings

Location:  All test files
Examples:  • 60000 (milliseconds - what is this?)
           • 10 (rate limit - why 10?)
           • '/docs/' (hardcoded path)
           • 'shiki' (what if class name changes?)

Fix:       Extract to named constants:
           const ONE_MINUTE = 60_000;
           const RATE_LIMIT = 10;
           const DOCS_PATH_PREFIX = '/docs/';

4. Circular Test Dependencies

Location:  src/lib/utils/file-io.test.ts
Example:   Using writeFile() to test readFile()
           Using readFile() to test writeFile()

Problem:   If both functions have the same bug, tests will pass
Fix:       Use independent test fixtures or known good data

5. No Test Helpers or Fixtures

Current:   All mock data is inline in test files
Problem:   • Code duplication
           • Hard to maintain
           • Test data scattered everywhere

Fix:       Create test utilities:
           // tests/helpers/fixtures.ts
           export const mockMarkdownTree = () => ({ /* ... */ });
           export const mockFrontmatter = () => ({ /* ... */ });

6. Inconsistent Test Organization

Current:   Tests co-located with source files (*.test.ts)
Pros:      ✅ Easy to find related tests
           ✅ Encourages testing
Cons:      ❌ Harder to run only tests
           ❌ Clutters source directories

Recommendation: Keep current approach (co-location is good for this project size)

7. No Test Coverage Enforcement on CI

Current:   Coverage threshold set to 40% in config
           But no CI to enforce it ❌

Problem:   Coverage can drop without anyone noticing
Fix:       Add CI workflow with coverage checks (see recommendations)

✅ Things Done Well

1. Good Error Handling Tests

Examples:  • Malformed YAML gracefully handled
           • Invalid language in code blocks
           • Missing files in file-io
           • XSS attack prevention in sanitize

Keep:      Continue this pattern for all new tests

2. Security Testing

Tests:     • XSS prevention (sanitize.test.ts)
           • HTML injection
           • Protocol injection (javascript:, data:)
           • SVG attack vectors

Excellent: This is exemplary - expand to other modules

3. Good Use of Test Lifecycle Hooks

Pattern:   beforeEach() for setup
           afterEach() for cleanup
           Temporary directories properly cleaned up

Keep:      Continue this pattern consistently

4. Good Test Naming

Pattern:   "should [expected behavior]" convention
Examples:  • "should transform NOTE callout"
           • "should handle malformed YAML gracefully"
           • "should escape special characters"

Keep:      Maintain this clear, descriptive naming

🚀 CI/CD Recommendations

Current State: No CI/CD ❌

Missing: • No GitHub Actions workflows • No automated test runs on PR/push • No coverage reporting • No lint checks • No type checking in CI

Recommended GitHub Actions Workflow

Create: .github/workflows/ci.yml

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Test & Coverage
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v2
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Run tests with coverage
        run: pnpm run test:coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage/lcov.info
          fail_ci_if_error: true

      - name: Check coverage threshold
        run: |
          if [ $(jq '.total.lines.pct' coverage/coverage-summary.json | cut -d. -f1) -lt 40 ]; then
            echo "Coverage is below 40% threshold"
            exit 1
          fi

  lint:
    name: Lint
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 10
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - run: pnpm install --frozen-lockfile
      - run: pnpm run lint

  typecheck:
    name: Type Check
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
        with:
          version: 10
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - run: pnpm install --frozen-lockfile
      - run: pnpm exec tsc --noEmit

Additional Workflows:

# .github/workflows/nightly.yml
# Run comprehensive tests nightly (including slow E2E tests)

name: Nightly Tests
on:
  schedule:
    - cron: '0 0 * * *'  # Midnight UTC
  workflow_dispatch:

jobs:
  e2e:
    name: E2E Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps
      - run: pnpm run test:e2e  # When E2E tests exist

  performance:
    name: Performance Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm run test:perf  # When perf tests exist

🔧 Test Configuration Optimization

Current Vitest Config Analysis

File: vitest.config.ts

Current Settings:

{
  globals: true,
  environment: 'happy-dom',
  include: ['src/**/*.{test,spec}.{js,ts}'],
  coverage: {
    provider: 'v8',
    reporter: ['text', 'json', 'html', 'lcov'],
    include: ['src/lib/**/*.ts'],
    exclude: [/* standard exclusions */],
    thresholds: {
      lines: 40,
      functions: 40,
      branches: 40,
      statements: 40
    },
    all: true
  }
}

✅ Good Configurations: • Environment: happy-dom (lightweight, fast) ✅ • Coverage provider: v8 (fast, accurate) ✅ • Multiple reporters including lcov for CI ✅ • Threshold enforcement ✅ • all: true (measures all files, not just tested ones) ✅

🔧 Recommended Improvements:

// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'happy-dom',
    include: ['src/**/*.{test,spec}.{js,ts}'],

    // NEW: Separate test types
    exclude: [
      '**/node_modules/**',
      '**/dist/**',
      '**/.{idea,git,cache,output,temp}/**',
      '**/{e2e,perf}/**'  // Exclude from default run
    ],

    // NEW: Faster test runs
    pool: 'threads',
    poolOptions: {
      threads: {
        singleThread: false,
        useAtomics: true
      }
    },

    // NEW: Better error messages
    reporters: ['default', 'html'],
    outputFile: {
      html: './coverage/test-report.html'
    },

    // NEW: Fail fast for CI
    bail: process.env.CI ? 1 : undefined,

    // NEW: Timeouts
    testTimeout: 10000,
    hookTimeout: 10000,

    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html', 'lcov'],
      include: ['src/lib/**/*.ts'],
      exclude: [
        '**/node_modules/**',
        '**/dist/**',
        '**/*.d.ts',
        '**/*.config.*',
        '**/types.ts',
        '**/*.test.ts',
        '**/*.spec.ts',
        '**/*.svelte'  // Keep for now, may change later
      ],

      // CHANGED: Increase thresholds gradually
      thresholds: {
        lines: 40,      // Current: 23.38% → Target: 40%
        functions: 40,  // Current: 29.85% → Target: 40%
        branches: 40,   // Current: 18.11% → Target: 40%
        statements: 40  // Current: 23.33% → Target: 40%

        // Phase 2 targets (after adding high-risk tests):
        // lines: 60,
        // functions: 60,
        // branches: 50,
        // statements: 60
      },

      all: true,

      // NEW: Per-directory thresholds (optional)
      perFile: true,
      watermarks: {
        lines: [50, 80],
        functions: [50, 80],
        branches: [50, 80],
        statements: [50, 80]
      }
    }
  }
});

Add New Test Scripts to package.json:

{
  "scripts": {
    "test": "vitest",
    "test:ui": "vitest --ui",
    "test:run": "vitest run",
    "test:coverage": "vitest run --coverage",
    "test:watch": "vitest watch",

    // NEW: Separate test types
    "test:unit": "vitest run --config vitest.config.ts",
    "test:integration": "vitest run --config vitest.integration.config.ts",
    "test:e2e": "playwright test",
    "test:perf": "vitest run --config vitest.perf.config.ts",

    // NEW: CI-specific
    "test:ci": "vitest run --coverage --reporter=default --reporter=json",

    // NEW: Coverage helpers
    "coverage": "vitest run --coverage",
    "coverage:open": "open coverage/index.html",

    // NEW: Watch specific files
    "test:related": "vitest related"
  }
}

📊 Actionable Improvement Roadmap

Phase 1: Quick Wins (1-2 days)

Goal: Reduce technical debt, improve maintainability

✅ Tasks:
  ⬜ Refactor callouts.test.ts to use test.each() (1 hour)
     Impact: -81 lines, better maintainability

  ⬜ Refactor links.test.ts to parameterize tests (1 hour)
     Impact: -52 lines

  ⬜ Remove useless tests (TypeScript types, API shapes) (30 min)
     Impact: -20 lines, cleaner test suite

  ⬜ Add GitHub Actions CI workflow (2 hours)
     Impact: Automated testing on every PR

  ⬜ Set up Codecov or similar for coverage tracking (1 hour)
     Impact: Visualize coverage trends

Total Time: 1-2 days
Coverage Change: 0% (just cleanup)
Maintainability: ↑↑↑ High improvement

Phase 2: Critical Coverage (1-2 weeks)

Goal: Test high-risk untested modules

✅ Tasks:
  ⬜ Add tests for reference.ts (2 days)
     • 15-20 tests for symbol reference resolution
     • Edge cases for ambiguous references
     • Error handling tests
     Impact: +300 LOC, +~5% coverage

  ⬜ Add tests for circuit-breaker.ts (1 day)
     • State transition tests
     • Timeout handling
     • Failure threshold tests
     Impact: +250 LOC, +~3% coverage

  ⬜ Add tests for collapse.ts + toc.ts + tabs.ts (2 days)
     • Plugin transformation tests
     • Edge cases for each
     Impact: +530 LOC, +~6% coverage

  ⬜ Add tests for mermaid.ts + filetree.ts (1 day)
     • Basic transformation tests
     • Error handling
     Impact: +150 LOC, +~1% coverage

  ⬜ Add integration tests for plugins (2 days)
     • Full pipeline integration
     • Plugin interaction tests
     Impact: +800 LOC, +~2% coverage

Total Time: 1-2 weeks
Coverage Change: 23% → ~40% ✅
Risk Reduction: ↑↑↑ High

Phase 3: Comprehensive Coverage (2-3 weeks)

Goal: Reach 60%+ coverage

✅ Tasks:
  ⬜ Add tests for screenshot-service.ts (3 days)
     • Integration tests with Playwright
     • Resource cleanup tests
     • Timeout handling
     Impact: +400 LOC, +~6% coverage
     Note: Requires Playwright setup

  ⬜ Add tests for image-processor.ts (2 days)
     • Sharp integration tests
     • Error handling (corrupt images)
     • Performance tests
     Impact: +350 LOC, +~4% coverage

  ⬜ Add tests for symbol-generation.ts (3 days)
     • Source code parsing tests
     • Symbol extraction tests
     • Edge cases for complex types
     Impact: +500 LOC, +~10% coverage

  ⬜ Add tests for navigation modules (2 days)
     • navigation-builder.ts
     • navigation-scanner.ts
     • Integration tests
     Impact: +250 LOC, +~4% coverage

  ⬜ Add tests for search.ts (2 days)
     • Search algorithm tests
     • Ranking tests
     • Performance tests
     Impact: +150 LOC, +~2% coverage

Total Time: 2-3 weeks
Coverage Change: 40% → ~66% ✅
Risk Reduction: ↑↑↑ Very High

Phase 4: Advanced Testing (2-3 weeks)

Goal: E2E, performance, visual regression

✅ Tasks:
  ⬜ Set up Playwright E2E tests (3 days)
     • Full site generation E2E
     • Navigation E2E
     • Search E2E
     Impact: +800 LOC E2E tests

  ⬜ Add performance benchmarks (2 days)
     • Large file processing
     • Batch image optimization
     • Search indexing performance
     Impact: +400 LOC perf tests

  ⬜ Add visual regression tests (2 days)
     • Snapshot tests for rendering
     • CSS regression detection
     Impact: +300 LOC visual tests

  ⬜ Add component tests (1 week)
     • Test all Svelte components
     • Interaction tests
     Impact: +600 LOC, +~3% coverage

Total Time: 2-3 weeks
Coverage Change: 66% → ~70%+
Quality: ↑↑↑ Production-ready

Total Effort Summary

Phase 1 (Quick Wins):        1-2 days
Phase 2 (Critical):          1-2 weeks
Phase 3 (Comprehensive):     2-3 weeks
Phase 4 (Advanced):          2-3 weeks
───────────────────────────────────────
Total:                       6-9 weeks (1 developer)

Coverage Progression:
  Start:   23.38%
  Phase 1: 23%      (cleanup, no change)
  Phase 2: ~40%     (+16.62%) ✅ Threshold met
  Phase 3: ~66%     (+26%)
  Phase 4: ~70%+    (+4%)

Risk Reduction:
  Phase 1: ↑ Low (maintainability)
  Phase 2: ↑↑↑ Very High (critical modules)
  Phase 3: ↑↑ High (comprehensive coverage)
  Phase 4: ↑ Medium (quality of life)

💡 Specific Code Examples

Example 1: Refactoring Callouts Tests

Before (96 lines):

test('should transform NOTE callout', () => {
  const tree = createTree('[!NOTE] Note content');
  const result = transformCallout(tree);
  expect(result.html).toContain('callout-blue');
  expect(result.html).toContain('Note');
});

test('should transform TIP callout', () => {
  const tree = createTree('[!TIP] Tip content');
  const result = transformCallout(tree);
  expect(result.html).toContain('callout-green');
  expect(result.html).toContain('Tip');
});

// ... 7 more identical tests

After (15 lines):

const calloutTypes = [
  { type: 'NOTE', class: 'blue', title: 'Note', role: 'note' },
  { type: 'TIP', class: 'green', title: 'Tip', role: 'note' },
  { type: 'WARNING', class: 'yellow', title: 'Warning', role: 'alert' },
  { type: 'IMPORTANT', class: 'purple', title: 'Important', role: 'alert' },
  { type: 'CAUTION', class: 'red', title: 'Caution', role: 'alert' },
  { type: 'SUCCESS', class: 'success', title: 'Success', role: 'status' },
  { type: 'DANGER', class: 'danger', title: 'Danger', role: 'alert' },
  { type: 'INFO', class: 'info', title: 'Info', role: 'note' },
  { type: 'QUESTION', class: 'question', title: 'Question', role: 'note' }
];

test.each(calloutTypes)(
  'should transform $type callout',
  ({ type, class: cssClass, title, role }) => {
    const tree = createTree(`[!${type}] ${type} content`);
    const result = transformCallout(tree);

    expect(result.html).toContain(`callout-${cssClass}`);
    expect(result.html).toContain(title);
    expect(result.html).toContain(`role="${role}"`);
    expect(result.html).toContain(`aria-label="${title}"`);
  }
);

Benefits: • 81 lines removed (84% reduction) • Easier to add new callout types (add one line to array) • Single place to update test logic • Clear data-driven approach

Example 2: Adding Error Handling Tests

Missing (should add):

// src/lib/utils/file-io.test.ts

describe('error handling', () => {
  test('should throw on permission denied', async () => {
    const readOnlyFile = path.join(tmpDir, 'readonly.txt');
    await writeFile(readOnlyFile, 'content');
    fs.chmodSync(readOnlyFile, 0o000); // Remove all permissions

    await expect(readFile(readOnlyFile)).rejects.toThrow(/permission denied/i);

    // Cleanup
    fs.chmodSync(readOnlyFile, 0o644);
  });

  test('should throw on disk space exhausted', async () => {
    // Mock fs.writeFile to simulate ENOSPC error
    const mockWrite = vi.spyOn(fs.promises, 'writeFile');
    mockWrite.mockRejectedValueOnce(
      Object.assign(new Error('ENOSPC: no space left on device'), {
        code: 'ENOSPC'
      })
    );

    const file = path.join(tmpDir, 'test.txt');
    await expect(writeFile(file, 'content')).rejects.toThrow(/no space left/i);

    mockWrite.mockRestore();
  });

  test('should handle very large files', async () => {
    const largeContent = 'x'.repeat(10 * 1024 * 1024); // 10MB
    const file = path.join(tmpDir, 'large.txt');

    await writeFile(file, largeContent);
    const result = await readFile(file);

    expect(result.length).toBe(largeContent.length);
  }, 30000); // 30s timeout for large file
});

Example 3: Integration Test Example

Should add:

// src/lib/integration/markdown-pipeline.test.ts

import { describe, it, expect } from 'vitest';
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkDirective from 'remark-directive';
import { calloutsPlugin } from '../plugins/callouts';
import { codeHighlightPlugin } from '../plugins/code-highlight';
import { linksPlugin } from '../plugins/links';
import { katexPlugin } from '../plugins/katex';

describe('markdown processing pipeline', () => {
  const processor = unified()
    .use(remarkParse)
    .use(remarkDirective)
    .use(calloutsPlugin)
    .use(codeHighlightPlugin)
    .use(linksPlugin)
    .use(katexPlugin);

  it('should process complex markdown with all plugins', async () => {
    const markdown = `
# Documentation

[!NOTE] This is a note callout

Check out [guide.md](./guide.md) for more info.

\`\`\`typescript title="example.ts" showLineNumbers
function hello() {
  console.log('Hello World');
}
\`\`\`

Inline math: $E = mc^2$

Display math:

$$
\\int_{-\\infty}^{\\infty} e^{-x^2} dx = \\sqrt{\\pi}
$$
    `.trim();

    const result = await processor.process(markdown);
    const html = result.toString();

    // Verify all plugins ran
    expect(html).toContain('callout-blue');        // Callouts
    expect(html).toContain('/docs/guide');         // Links
    expect(html).toContain('class="shiki"');       // Code highlight
    expect(html).toContain('code-block-title');    // Code metadata
    expect(html).toContain('katex');               // Math rendering

    // Verify no plugin corrupted another's output
    expect(html).not.toContain('[!NOTE]');         // Callout processed
    expect(html).not.toContain('./guide.md');      // Link processed
    expect(html).not.toContain('```typescript');   // Code fence processed
    expect(html).not.toContain('$E = mc^2$');      // Inline math processed
  });

  it('should handle plugin errors gracefully', async () => {
    const markdown = `
[!NOTE] Valid callout

\`\`\`invalidlang123
code
\`\`\`

$\\invalid{latex}$
    `.trim();

    // Should not throw, but handle errors gracefully
    await expect(processor.process(markdown)).resolves.toBeDefined();
  });

  it('should maintain correct order of transformations', async () => {
    // Links should be processed before callouts
    // So callouts can contain links
    const markdown = `
[!NOTE] Check [guide.md](./guide.md)
    `.trim();

    const result = await processor.process(markdown);
    const html = result.toString();

    expect(html).toContain('callout');
    expect(html).toContain('/docs/guide');
  });
});

🎓 Testing Best Practices Guide

Test Naming Convention

❌ Bad:

test('test1', () => { ... });
test('file-io', () => { ... });
test('check read', () => { ... });

✅ Good:

test('should read UTF-8 file with special characters', () => { ... });
test('should throw error when file does not exist', () => { ... });
test('should handle empty files gracefully', () => { ... });

Pattern: should [expected behavior] [given context]

Assertion Quality

❌ Weak Assertions:

expect(result).toBeDefined();
expect(html).toContain('class');
expect(count).toBeGreaterThan(0);

✅ Strong Assertions:

expect(result).toEqual({ title: 'Title', content: 'Content' });
expect(html).toMatch(/<div class="callout-blue" role="note">/);
expect(count).toBe(5);

Test Independence

❌ Dependent Tests:

let sharedState;

test('test 1', () => {
  sharedState = someFunction();
});

test('test 2', () => {
  // Depends on test 1 running first
  expect(sharedState).toBe(expected);
});

✅ Independent Tests:

test('test 1', () => {
  const state = someFunction();
  expect(state).toBe(expected);
});

test('test 2', () => {
  const state = someFunction();
  expect(state).toBe(expected);
});

Avoid Magic Values

❌ Magic Values:

test('rate limiter', () => {
  for (let i = 0; i < 10; i++) {
    checkRateLimit('user');
  }
  vi.advanceTimersByTime(60000);
});

✅ Named Constants:

const RATE_LIMIT = 10;
const ONE_MINUTE_MS = 60_000;

test('rate limiter should allow up to limit', () => {
  for (let i = 0; i < RATE_LIMIT; i++) {
    checkRateLimit('user');
  }
  vi.advanceTimersByTime(ONE_MINUTE_MS);
});

Test Helpers and Fixtures

❌ Inline Mock Data Everywhere:

test('test 1', () => {
  const tree = { type: 'root', children: [{ type: 'paragraph', ... }] };
  // ...
});

test('test 2', () => {
  const tree = { type: 'root', children: [{ type: 'paragraph', ... }] };
  // ...
});

✅ Shared Test Helpers:

// tests/helpers/fixtures.ts
export function createMockTree(content: string) {
  return { type: 'root', children: [{ type: 'paragraph', ... }] };
}

// test file
import { createMockTree } from '../helpers/fixtures';

test('test 1', () => {
  const tree = createMockTree('content');
  // ...
});

📈 Coverage Improvement Tracking

Recommended Coverage Targets

Phase    Timeline    Lines    Functions  Branches  Statements  Status
───────  ──────────  ───────  ─────────  ────────  ──────────  ──────────
Current  Today       23.38%   29.85%     18.11%    23.33%      🔴 Below
Phase 1  Week 1      23%      30%        18%       23%         🔄 Cleanup
Phase 2  Week 3      40%      45%        35%       40%         ✅ Threshold
Phase 3  Week 6      60%      65%        50%       60%         ✅ Good
Phase 4  Week 9      70%+     75%+       60%+      70%+        ✅ Excellent

Module Priority for Coverage

Tier 1 - Critical (Add First):

Module                   Current   Target    Priority
─────────────────────    ───────   ──────    ────────
reference.ts             0%        90%       ⭐⭐⭐⭐⭐
circuit-breaker.ts       0%        85%       ⭐⭐⭐⭐⭐
screenshot-service.ts    0%        60%       ⭐⭐⭐⭐⭐
image-processor.ts       0%        80%       ⭐⭐⭐⭐⭐

Tier 2 - Important (Add Second):

Module                   Current   Target    Priority
─────────────────────    ───────   ──────    ────────
collapse.ts              0%        85%       ⭐⭐⭐⭐
toc.ts                   0%        90%       ⭐⭐⭐⭐
tabs.ts                  0%        85%       ⭐⭐⭐⭐
symbol-generation.ts     0%        75%       ⭐⭐⭐⭐

Tier 3 - Utilities (Add Third):

Module                   Current   Target    Priority
─────────────────────    ───────   ──────    ────────
navigation-builder.ts    0%        80%       ⭐⭐⭐
tree-parser.ts           0%        85%       ⭐⭐⭐
search.ts                0%        85%       ⭐⭐⭐
base64.ts                12%       70%       ⭐⭐⭐

Weekly Progress Tracking Template

Week of: [DATE]

Coverage:
  Lines:      [%] (target: [%]) [↑↓] [delta]
  Functions:  [%] (target: [%]) [↑↓] [delta]
  Branches:   [%] (target: [%]) [↑↓] [delta]
  Statements: [%] (target: [%]) [↑↓] [delta]

Tests Added: [N] tests ([M] LOC)
Modules Covered:
  • [module] - [N] tests - [%] coverage
  • [module] - [N] tests - [%] coverage

Issues Found:
  • [issue description]
  • [issue description]

Next Week Goals:
  • [goal]
  • [goal]

🚀 Conclusion

Summary of Findings

Strengths: ✅ Solid test foundation where tests exist ✅ Good edge case coverage in tested modules ✅ Excellent security testing (XSS, sanitization) ✅ Good test organization and naming conventions ✅ Proper cleanup and lifecycle management

Critical Issues: ❌ Coverage well below 40% threshold (currently 23.38%) ❌ 57% of source files completely untested ❌ 96 lines of duplicate test code identified ❌ No CI/CD pipeline for automated testing ❌ Missing integration and E2E tests

Risk Level: 🟡 MEDIUM-HIGH

High-risk untested modules exist (reference.ts, screenshot-service.ts,
circuit-breaker.ts, image-processor.ts) but core functionality has tests.

Immediate Action Items (This Week)

Refactor duplicate tests (callouts, links) - 2 hours
Set up GitHub Actions CI - 2 hours
Remove useless tests (TypeScript types) - 30 minutes
Start Phase 2: Add reference.ts tests - Ongoing

Long-term Roadmap (9 weeks)

Weeks 1-2:   Quick wins + cleanup
Weeks 3-4:   Critical coverage (reach 40%)
Weeks 5-7:   Comprehensive coverage (reach 60-70%)
Weeks 8-9:   Advanced testing (E2E, performance)

Success Metrics

Phase 2 Complete (Week 4): • Coverage ≥ 40% ✅ • All Tier 1 modules have tests • CI/CD running on all PRs

Phase 3 Complete (Week 7): • Coverage ≥ 60% • All high and medium risk modules tested • Integration tests in place

Phase 4 Complete (Week 9): • Coverage ≥ 70% • E2E tests running nightly • Performance benchmarks established • Visual regression testing (optional)

📚 Additional Resources

Vitest Documentation: • https://vitest.dev/guide/ • https://vitest.dev/guide/coverage.html • https://vitest.dev/api/

Testing Best Practices: • Kent C. Dodds - Testing Library principles • Martin Fowler - Test Pyramid • Google Testing Blog - Effective testing strategies

Tools to Consider: • Codecov - Coverage tracking and visualization • Playwright - E2E testing (already installed ✅) • Percy/Chromatic - Visual regression testing • Benchmarkjs - Performance benchmarking

Report Generated: 2025-11-11 docs-engine version: 2.0.0 Vitest version: 4.0.7

FilesExpand file tree

TEST_AUDIT_REPORT.md

Latest commit

History

TEST_AUDIT_REPORT.md

File metadata and controls

🔍 Comprehensive Test Audit Report

📊 Executive Summary

🎯 Critical Issues Requiring Immediate Action

1. Coverage Below Threshold ⚠️

2. Massive Test Duplication 🔴

3. Missing Critical Module Tests ⚠️

4. No CI/CD Integration ⚠️

📋 Test Quality Assessment by Module

✅ EXCELLENT QUALITY (Keep as-is)

⚠️ GOOD QUALITY (Minor improvements needed)

🔴 NEEDS IMPROVEMENT (Refactoring required)

🚫 Tests to Remove or Consolidate

High Priority Removals (No Value)

Medium Priority Consolidations

🔴 Critical Coverage Gaps (Priority Order)

Tier 1: High-Risk Untested Modules (Add immediately)

Tier 2: Important Feature Gaps (Add within 2 weeks)

Tier 3: Utility and Support Functions

Tier 4: Low-Risk Utilities (Add as time permits)

🧪 Missing Test Types

1. Integration Tests ❌

2. End-to-End Tests ❌

3. Performance Tests ❌

4. Visual Regression Tests ❌

5. Component Tests ❌

🎯 Test Best Practices Issues

❌ Current Issues

✅ Things Done Well

🚀 CI/CD Recommendations

Current State: No CI/CD ❌

Recommended GitHub Actions Workflow

🔧 Test Configuration Optimization

Current Vitest Config Analysis

📊 Actionable Improvement Roadmap

Phase 1: Quick Wins (1-2 days)

Phase 2: Critical Coverage (1-2 weeks)

Phase 3: Comprehensive Coverage (2-3 weeks)

Phase 4: Advanced Testing (2-3 weeks)

Total Effort Summary

💡 Specific Code Examples

Example 1: Refactoring Callouts Tests

Example 2: Adding Error Handling Tests

Example 3: Integration Test Example

🎓 Testing Best Practices Guide

Test Naming Convention

Assertion Quality

Test Independence

Avoid Magic Values

Test Helpers and Fixtures

📈 Coverage Improvement Tracking

Recommended Coverage Targets

Module Priority for Coverage

Weekly Progress Tracking Template

🚀 Conclusion

Summary of Findings

Immediate Action Items (This Week)

Long-term Roadmap (9 weeks)

Success Metrics

📚 Additional Resources