Use Case

AI Test Generation at Scale

The Problem

Test coverage tends to fall behind. Writing tests is time-consuming, and developers understandably prioritize shipping features. Over time, untested code accumulates and regressions become harder to catch.

How Stoneforge solves it

AI test generation with Stoneforge creates tests across your codebase in parallel. Instead of one developer writing tests file by file, multiple automated testing agents work simultaneously, each targeting a different module to raise your coverage quickly.

Coverage audit, then parallel generation

Start by identifying the gaps. Stoneforge agents can scan your codebase, identify untested modules, and create a plan to cover them systematically.

# Create a test generation plan
sf plan create --title "Increase test coverage to 80%" \
  --description "Generate unit and integration tests for all modules below 50% coverage. Follow existing test patterns in __tests__/ directories."

# The Director creates scoped tasks:
# 1. Generate tests for src/services/auth/ (12% coverage)
# 2. Generate tests for src/services/billing/ (8% coverage)
# 3. Generate tests for src/api/routes/ (34% coverage)
# 4. Generate tests for src/utils/ (45% coverage)
# 5. Generate integration tests for API endpoints

Pattern-aware AI test generation

Agents study your existing tests before writing new ones. They pick up your testing conventions: which framework you use, how you structure test files, what mocking patterns you prefer. The goal is tests that look like your team wrote them.

# Configure workspace-level testing conventions
sf init

# In .stoneforge/prompts/worker.md, add:
# "Follow existing test patterns. Use Vitest with vi.mock().
#  Place test files adjacent to source files as *.test.ts.
#  Use factory functions for test data, not raw objects."

Every test runs before merge

Each generated test suite is executed in the agent's worktree to verify everything passes. Before merging, the Steward agent runs the full test suite again, catching conflicts between newly generated tests. This is the same quality gate used in automated code review.
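
As a rough sketch, the per-suite check amounts to running your test runner inside the agent's isolated worktree. The worktree path and the Vitest runner below are illustrative assumptions, not Stoneforge internals:

```shell
# Illustrative only: what the pre-merge quality gate amounts to.
# The worktree path is hypothetical; substitute your project's test runner.
cd .stoneforge/worktrees/auth-tests   # assumed agent worktree location
npx vitest run                        # fails the task if any generated test breaks
```

Because each agent works in its own worktree, a failing generated test blocks only that task, not the other agents running in parallel.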

Target the riskiest code first

Focus AI test generation where it matters most. Prioritize modules that handle payments, authentication, or data integrity. Stoneforge’s priority system lets you ensure high-risk modules get tested first, reducing the chance of regressions in critical paths.

# High-priority: test the critical paths first
sf task create --title "Generate tests for payment processing" --priority 1
sf task create --title "Generate tests for auth middleware" --priority 1
sf task create --title "Generate tests for data export" --priority 2
sf task create --title "Generate tests for UI components" --priority 3

Frequently asked questions

What types of tests can AI test generation produce?
Agents can generate unit tests, integration tests, and end-to-end tests. They analyze your existing test patterns (framework, assertion style, mocking approach) and generate tests that match your conventions. Supported frameworks include Jest, Vitest, Pytest, Go's built-in testing package, and more.
How does automated testing AI ensure generated tests are meaningful?
Agents analyze code paths, edge cases, and error handling to create tests that catch real bugs, not just tests that pass. Each agent runs the test suite to verify tests pass, and the Steward reviews test quality before merging.
Can I target specific directories for AI test generation?
Yes. You can scope test generation tasks to specific directories, file patterns, or modules. For example, generate tests only for the /api/routes/ directory, or target all files with less than 50% coverage.
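For example, a directory-scoped task can be created with the same flags shown earlier on this page. Note that passing --description to sf task create is an assumption here, mirroring its use with sf plan create:

```shell
# Scope generation to a single directory and a coverage threshold.
# Assumes `sf task create` accepts --description like `sf plan create` does.
sf task create --title "Generate tests for src/api/routes/" \
  --description "Only cover files under src/api/routes/ currently below 50% coverage. Follow existing test patterns in __tests__/ directories."
```

The description acts as the agent's scope: directories, file patterns, or coverage thresholds you name there bound what it will touch.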
Will AI-generated tests conflict with my existing test suite?
No. Agents analyze your existing tests before generating new ones to avoid duplication. They follow your naming conventions, use the same test utilities, and organize tests in the same directory structure you already use.
How much test coverage improvement can automated testing AI deliver?
Results vary depending on your starting coverage, codebase complexity, and language. Agents focus on untested code paths, edge cases, and error handlers. You'll get the biggest gains in codebases with large gaps in coverage.

Ready to get started?

Set up Stoneforge in under 30 seconds and start orchestrating AI agents in parallel.