Everyone has opinions about how tests should look. Naming conventions, structure patterns, which mocking library to use. The problem with AI coding assistants is that they've collected opinions too, and they're usually not yours.
Ask an AI to write a test and you’ll get something that works. It’ll compile, it’ll pass, it’ll cover the method. But it won’t follow your naming convention. It’ll use a different assertion style. It’ll structure the test differently from the 200 tests already in your project. You fix it, move on, and tomorrow the AI writes tests its way again.
This is the testing version of the context problem covered in part 1 of this series. The AI doesn’t remember your preferences. It defaults to the most common patterns from its training data, not the patterns your team agreed on.
The /dev-tdd skill in the agentic dev workflow solves this by encoding your testing practices into a reusable workflow. You teach it once how your team writes tests. From then on, it follows your conventions in every session, across every repository.
Note: The skills shown here reflect my stack and conventions at the time of writing. They improve over time as the workflow learns from daily use. Your project will have different tools, different security concerns, different quality bars. These are examples of what’s possible, not prescriptions. Fork them, adjust them, or use them as inspiration for your own.
One command, two stacks
Type /dev-tdd and the skill detects your stack automatically. A .csproj file means .NET with xUnit and NSubstitute. An angular.json means Angular with Vitest and Testing Library. You never specify which framework to use.
/dev-tdd "profile email verification"
The skill routes to the right sub-skill, loads the conventions for that stack, and starts the TDD loop. Same command whether you’re working on a C# API or an Angular component.
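Under the hood, detection can be as simple as checking for marker files in the repository. A minimal sketch of that idea (the function name and return values here are hypothetical, not the skill's actual implementation):

```typescript
// Hypothetical sketch: detect the stack from marker files.
// The real skill's detection logic may differ.
function detectStack(files: string[]): "dotnet" | "angular" | "unknown" {
  // A .csproj anywhere means .NET with xUnit and NSubstitute
  if (files.some((f) => f.endsWith(".csproj"))) return "dotnet";
  // An angular.json means Angular with Vitest and Testing Library
  if (files.includes("angular.json")) return "angular";
  return "unknown";
}
```

The point is that the caller never names the framework; the repository's own files decide which sub-skill and conventions load.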
It starts with a plan, not code
Before writing anything, the skill proposes test cases and waits for your input:
Feature: Profile email verification
Test Cases
──────────
VerifyEmail/
1. With_correct_code_Should_succeed_and_set_email
2. With_expired_code_Should_fail
3. With_wrong_code_Should_fail
4. When_already_verified_Should_return_already_verified
5. With_null_code_Should_throw_validation
───────────────────────────────────────────────
Proceed? [y]es [e]dit list [a]dd more tests
This is where you shape the direction. Remove a test case that doesn’t apply. Add edge cases the AI didn’t think of. The plan aligns you and the AI before any code is written.
The “but it’s testing its own code” question
Let’s address this directly. If the AI writes both the implementation and the tests, isn’t it just testing assumptions it already made? If the logic is wrong, the tests will be wrong too.
Yes, that’s a real concern. And it’s important to understand what TDD does and doesn’t solve here.
When you use /dev-tdd, the test is written first. The implementation doesn’t exist yet. The test encodes your specification of what should happen: “when the code is expired, verification should fail.” That’s a business requirement, not an implementation detail. The AI writes a test for that requirement, then writes the minimum code to make it pass.
If the requirement itself is wrong, the test will encode the wrong behavior. But that’s true regardless of who writes it. A human developer can misunderstand a requirement just as easily.
The real value lies in the structure. The RED→GREEN→REFACTOR loop forces the AI to write only enough code to pass each test. No over-engineering. No unused abstractions. And because the tests exist, you can refactor the implementation later with confidence.
The skill is not a substitute for reviewing the tests. Read them. Are they testing what you expect? Do the test names describe real scenarios? That review takes minutes, and it’s the most valuable quality gate you have.
Your conventions, not the internet’s
Here’s where the skill really earns its keep. These are the practices I’ve encoded into it. Yours will be different, and that’s the point.
Naming: With_condition_Should_outcome
Tests live inside nested classes named after the operation (see below), so the test name doesn’t repeat the operation. Not TestMethod1. Not ShouldWork. A name that tells you exactly what broke when it fails.
// Inside class VerifyEmail
With_correct_code_Should_succeed_and_set_email()
With_expired_code_Should_fail()
// Inside class PlaceOrder
With_empty_items_Should_throw_validation_exception()
// Inside class Toggle
When_archived_Should_fail()
When a test fails in CI, you read the class name plus the test name and know the scenario without opening the file.
AAA: Arrange, Act, Assert
Every test uses explicit AAA comments. Not because the AI needs them, but because the next person reading the test does. Scanning a test file with clear // Arrange, // Act, // Assert markers is faster than parsing the intent from code alone.
// Inside class PlaceOrder
[Fact]
public async Task With_valid_items_Should_create_order()
{
    // Arrange
    var items = new[] { new OrderItem("SKU-001", 2, 9.99m) };

    // Act
    var result = await sut.PlaceOrderAsync(items);

    // Assert
    Assert.NotNull(result);
    Assert.Equal(OrderStatus.Placed, result.Status);
}
System under test: always sut
The thing you’re testing is always called sut. Dependencies are set up in the constructor, not in each test. This makes every test file instantly scannable: find sut, you know what’s being tested. Find the constructor, you know what’s mocked.
public class OrderServiceTests
{
    private readonly IOrderRepository _repository;
    private readonly OrderService sut;

    public OrderServiceTests()
    {
        _repository = Substitute.For<IOrderRepository>();
        sut = new OrderService(_repository);
    }
}
Nested classes mirror the source
An aggregate with Create, Toggle, and Archive operations gets a test class with nested Create, Toggle, and Archive classes. Large test files stay navigable. It’s immediately obvious which tests cover which operation.
public class FeatureFlagTests
{
    public class Create
    {
        [Fact]
        public void With_valid_name_Should_set_name_and_disabled_state()
        {
            // Arrange & Act & Assert
        }
    }

    public class Toggle
    {
        [Fact]
        public void When_disabled_Should_enable()
        {
            // Arrange & Act & Assert
        }
    }
}
These conventions aren’t revolutionary. They’re choices. But they’re consistent across every test in every repository, because the skill enforces them without being reminded.
The .NET side
The conventions above apply to any .NET test. But the real value of encoding your testing practices into a skill shows up when your stack has patterns that the AI wouldn’t know from its training data. The backend TDD skill knows xUnit and NSubstitute, but it also knows the specific libraries and testing APIs your projects depend on.
Stack-specific patterns: an example
This is where the depth of skills becomes concrete. I have some projects that use my ErikLieben.FA.ES event sourcing library. Without the skill, the AI would try to test aggregates with mocks and direct property assignment.
With the skill, it uses the library’s AggregateTestBuilder and Given-When-Then pattern, which I prefer.
public class OrderTests
{
    // Static so the nested operation classes can reference it
    private static readonly TestContext context = TestSetup.GetContext(
        new ServiceCollection().BuildServiceProvider());

    public class Ship
    {
        [Fact]
        public async Task With_valid_address_Should_append_shipped_event()
        {
            // Arrange
            var builder = AggregateTestBuilder
                .For<Order>("order-123", context)
                .Given(new OrderCreated("order-123", "customer-1"));

            // Act & Assert
            await builder
                .When(async order => await order.Ship("123 Main St"))
                .Then(result =>
                {
                    result.ShouldHaveAppended<OrderShipped>();
                    result.ShouldHaveProperty(o => o.Status, OrderStatus.Shipped);
                });
        }
    }
}
The library provides AggregateTestBuilder, ProjectionTestBuilder, and a TestClock for deterministic time testing. None of this is generic xUnit knowledge. The AI needs to know these APIs, the Given-When-Then pattern, and how aggregates rebuild state from events. That’s what the skill encodes. Without it, you’d explain these patterns every session.
Aspire runtime verification
After all tests pass, if .NET Aspire is running, the skill optionally checks for runtime errors that unit tests can’t catch. DI resolution failures, startup crashes, event store serialization issues after adding new event types, projection catch-up errors. These are the things that compile fine, pass all tests, and blow up the moment the app starts.
The Angular side
The frontend TDD skill targets modern Angular: standalone components, signals, zoneless change detection, @if/@for control flow.
Component testing with Vitest
describe('OrderListComponent', () => {
  let component: OrderListComponent;
  let fixture: ComponentFixture<OrderListComponent>;
  let orderService: { getOrders: ReturnType<typeof vi.fn> };

  beforeEach(async () => {
    orderService = { getOrders: vi.fn().mockReturnValue(of([])) };

    await TestBed.configureTestingModule({
      imports: [OrderListComponent],
      providers: [
        { provide: OrderService, useValue: orderService }
      ]
    }).compileComponents();

    fixture = TestBed.createComponent(OrderListComponent);
    component = fixture.componentInstance;
  });

  it('should show empty state when no orders', () => {
    // Arrange
    orderService.getOrders.mockReturnValue(of([]));

    // Act
    fixture.detectChanges();

    // Assert
    const empty = fixture.nativeElement
      .querySelector('[data-testid="empty-state"]');
    expect(empty).toBeTruthy();
  });
});
Testing Library for behavior-focused tests
it('should render orders', async () => {
  // Arrange
  const orderService = {
    getOrders: vi.fn().mockReturnValue(of([
      { id: '1', status: 'Placed', total: 19.98 }
    ]))
  };

  // Act
  await render(OrderListComponent, {
    providers: [
      { provide: OrderService, useValue: orderService }
    ]
  });

  // Assert
  expect(screen.getByTestId('order-row'))
    .toHaveTextContent('19.98');
});
Functional guards
Modern Angular uses functional guards (CanActivateFn) instead of class-based guards. The skill knows to test them with TestBed.runInInjectionContext():
it('should redirect to login when not authenticated', () => {
  // Arrange
  TestBed.configureTestingModule({
    providers: [
      { provide: AuthService,
        useValue: { isAuthenticated: () => false } },
      { provide: Router,
        useValue: { createUrlTree: vi.fn()
          .mockReturnValue('/login') } }
    ]
  });

  // Act
  const result = TestBed.runInInjectionContext(() =>
    authGuard(
      {} as ActivatedRouteSnapshot,
      {} as RouterStateSnapshot
    )
  );

  // Assert
  expect(result).toBe('/login');
});
E2E with Playwright
For end-to-end tests, the skill uses Playwright:
test('user can place an order', async ({ page }) => {
  await page.goto('/orders/new');
  await page.getByTestId('sku-input').fill('SKU-001');
  await page.getByTestId('quantity-input').fill('2');
  await page.getByTestId('add-item-button').click();
  await page.getByTestId('place-order-button').click();

  await expect(page.getByTestId('order-confirmation'))
    .toContainText('Order placed');
});
The RED→GREEN→REFACTOR loop
The skill doesn’t just generate tests. It drives the TDD loop interactively, showing progress after each test case:
Progress
────────
✓ 1. With_correct_code_Should_succeed_and_set_email
✓ 2. With_expired_code_Should_fail
▸ 3. With_wrong_code_Should_fail ← next
· 4. When_already_verified_Should_return_already_verified
· 5. With_null_code_Should_throw_validation
Tests: 2/5 done | All passing
──────────────────────────────────────
[c]ontinue [s]kip to coverage check [q]uit
After all test cases pass, it checks coverage. If you’re below 80%, it identifies uncovered code paths and proposes additional tests:
Coverage: 74% → target: 80%
Uncovered areas:
- ProfileAggregate line 42-48: error path when stream is empty
- VerifyEndpoint line 28-35: authorization failure path
Add tests for uncovered areas? [y]es [p]ick [s]kip
You pick which gaps to fill. It writes more tests, loops RED→GREEN→REFACTOR, and re-checks coverage until you hit the target or decide to stop.
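The gate itself is simple logic: compare measured coverage against the target and surface the gaps for you to pick from. A rough sketch of that decision (all names here are hypothetical, not the skill's actual code):

```typescript
// Hypothetical sketch of the coverage gate: given measured coverage
// and a target, decide whether to propose additional tests.
interface CoverageGap {
  file: string;
  lines: string;
  reason: string;
}

function coverageGate(
  coverage: number,
  target: number,
  gaps: CoverageGap[]
): { done: boolean; proposals: CoverageGap[] } {
  if (coverage >= target) return { done: true, proposals: [] };
  // Below target: surface the uncovered areas; the user picks which to fill
  return { done: false, proposals: gaps };
}
```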
The conventions improve over time
As covered in part 1, the workflow has a self-learning loop. This applies to testing too.
Maybe you notice the skill keeps generating tests with fixture.detectChanges() when your project uses zoneless change detection and signals handle reactivity. You correct it. That correction gets picked up by /meta-continuous-learning and encoded into the skill. Next time, it uses signal-based assertions from the start.
Or maybe you add a new convention: all integration tests should use Testcontainers instead of in-memory databases. You update the skill file. Every project that pulls the updated skill follows the new convention automatically.
The skill file is just markdown. Editing it is as easy as editing a README. But the effect is that your testing standards evolve with your team, and every AI session follows the latest version.
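For illustration, a convention entry in such a skill file might look like this (a hypothetical excerpt, not the actual skill file):

```markdown
## Test conventions (.NET)

- Name tests `With_condition_Should_outcome`; the operation lives in the nested class name.
- Call the system under test `sut`; set up dependencies in the constructor.
- Use explicit `// Arrange`, `// Act`, `// Assert` comments in every test.
- Integration tests use Testcontainers, never in-memory databases.
```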
Getting started
The /dev-tdd skill is part of the agentic-dev-workflow. If you’ve set up the workflow from part 1, it’s ready to use.
/dev-tdd "profile email verification" # Auto-detect stack
/dev-tdd dotnet "payment processing" # Force .NET
/dev-tdd angular "order list component" # Force Angular
/dev-tdd dotnet "retry policy" --target 90 # Override coverage target
Start with a feature you need to build. Let the skill propose test cases. Edit the list until it matches what you want to test. Then let it drive the loop while you review each test as it’s written.
Your conventions. Your naming. Your structure. Consistently applied, without explaining them every session.