
Don't let your AI agent delegate the debug work to you: manage, monitor, and test your app with Aspire 13.2's CLI overhaul and new agent skills

Part 8 of 8 in the Agentic (.NET) developer workflow series
  1. Turning your AI tool into your pair programming companion
  2. Dependency updates that understand your code
  3. Teaching your AI how to write tests with you
  4. Quality gates that actually run: verification and security in the agentic workflow
  5. Documentation as a first-class concern in your agentic workflow
  6. AI-driven usability testing: a think-aloud study with a team of AI testers
  7. Building and evolving your own AI development skills
  8. Don't let your AI agent delegate the debug work to you: manage, monitor, and test your app with Aspire 13.2's CLI overhaul and new agent skills

You’ve discussed the feature with your AI agent, and it wrote the code. But then what?

You start the app, open the browser, click around, look for bugs, find one, describe it back to the agent. You’re doing all the boring manual labor of verifying that what was built actually works.

With Aspire 13.2’s CLI overhaul and new agent skills, combined with the Playwright CLI and its skill files, the agent can manage and monitor your distributed app, open the browser, test the feature, and debug it. The tedious verify-and-fix loop becomes the agent’s job, not yours.

New to this series? This is part 8 of the Agentic (.NET) developer workflow series. Part 1 covers the persistent memory foundation, a good read to get started.

Part 4 introduced Aspire health checks as a verification phase in /dev-verify. That phase tells you something is wrong. It doesn’t tell you why.

This post picks up where part 4 ended and extends the skills of your agent: handing the entire verify-and-debug loop to the agent, with the Aspire CLI managing the backend and Playwright handling the browser.

What was missing from the pairing session

The agent could pair with you on everything up to “let’s try it.” After that, you were on your own. Two capabilities were missing: starting a distributed app with health-gated resources, and using the frontend like an actual user.

Think about the flow:

```mermaid
graph LR
  A[Discuss] --> B[Write code] --> C[Build] --> D[Try it] --> E[Start app] --> F[Click around] --> G[Works?]
  style A fill:#89b4fa,stroke:#585b70,color:#1e1e2e
  style B fill:#89b4fa,stroke:#585b70,color:#1e1e2e
  style C fill:#89b4fa,stroke:#585b70,color:#1e1e2e
  style D fill:#f9e2af,stroke:#585b70,color:#1e1e2e
  style E fill:#f38ba8,stroke:#585b70,color:#1e1e2e
  style F fill:#f38ba8,stroke:#585b70,color:#1e1e2e
  style G fill:#f38ba8,stroke:#585b70,color:#1e1e2e
```

Blue: the agent pairs with you. Yellow: the handoff. Red: the agent went blind.

The gap was at “let’s try it”: starting an AppHost with health-gated resources and using the frontend like an actual user. The Aspire MCP server and Playwright’s MCP integration could already do this, but they required configuring servers and agent-specific setup. Aspire 13.2 brings the same capabilities through plain shell commands, no server dependency needed.

Without this, you become the tester. With it, the agent catches integration bugs itself: URL mismatches, CORS errors, Angular runtime failures, cross-service issues that only show when you check both the browser console and the backend telemetry.

The two CLI tools: Aspire CLI and Playwright CLI

Aspire CLI 13.2 manages the backend. Playwright CLI drives the browser. Both support --format Json for structured agent output. The key commands you’ll see in the demo:

| Aspire CLI | Playwright CLI |
| --- | --- |
| `aspire start` / `aspire wait` | `playwright-cli open <url>` |
| `aspire describe --format Json` | `playwright-cli snapshot` |
| `aspire otel logs <resource>` | `playwright-cli fill` / `playwright-cli click` |
| `aspire resource <r> restart` | `playwright-cli console` |

If you’re using the agentic-dev-workflow, both skills are already included via /meta-bootstrap. From scratch, it’s aspire agent init + playwright-cli install --skills to generate the skill files.

Why CLI over MCP? Beyond simpler setup, there’s a context window cost. MCP tool schemas are injected at session start and included in every turn, whether you use them or not. For Playwright alone that’s ~3,500 tokens riding along with every message. A CLI skill loads once and doesn’t repeat. Over a long debugging session, the difference can be 4-32x in total token cost.
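The 4-32x range falls straight out of the arithmetic. A minimal sketch, using the post’s ~3,500-token figure and assuming a CLI skill file of comparable size; the turn counts are illustrative, not measured:

```typescript
// Illustrative context-cost arithmetic for MCP tool schemas vs a CLI skill.
// The 3,500-token figure is from the post; turn counts are assumptions
// picked to show the range.
const schemaTokens = 3500; // MCP: schemas ride along with every turn
const skillTokens = 3500;  // CLI skill: loaded once, assumed comparable size

const mcpCost = (turns: number): number => schemaTokens * turns;
const cliCost = (_turns: number): number => skillTokens;

for (const turns of [4, 32]) {
  const ratio = mcpCost(turns) / cliCost(turns);
  console.log(`${turns} turns: MCP ${mcpCost(turns)} vs CLI ${cliCost(turns)} tokens (${ratio}x)`);
}
```

A short session barely notices the difference; a long debugging session pays the schema tax on every single turn.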

The demo: pairing on a search feature

Here’s a real pairing session. I asked the agent to add a search feature to an Angular + .NET Aspire sample app. What follows is the actual transcript: real commands, real outputs, real screenshots.

Act 1: “Add search to the items page”

The agent reads the existing code, adds a GET /api/items/search?q={query} endpoint to the .NET API, and adds a search input with results display to the Angular component.

But it makes a mistake. The frontend calls /api/item/search (missing the s in items).

This is realistic. The agent wrote both sides, but a one-character typo in a URL string breaks the integration. Unit tests don’t catch this. The build succeeds. TypeScript compiles. The .NET project builds. Everything looks green. (I know that this could be fixed with zod, but I needed a simple example case :-))
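One lightweight guard against this class of typo is defining the route string once and sharing it, rather than retyping it in the Angular service. A hypothetical sketch; the sample app does not actually have this module:

```typescript
// Hypothetical shared-routes module: the URL is written exactly once.
// (Not part of the sample app — shown only to illustrate the guard.)
const ApiRoutes = {
  searchItems: (q: string): string =>
    `/api/items/search?q=${encodeURIComponent(q)}`,
} as const;

// The component would call ApiRoutes.searchItems(this.searchQuery);
// a typo in the property name now fails at compile time, not as a 404.
console.log(ApiRoutes.searchItems("Item 3")); // /api/items/search?q=Item%203
```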

Act 2: “Let me start it and try it out”

Before, this is where the agent would hand it back to you: “Done, try it and let me know.” Now it does what any pair partner would do: start the app and test the feature.

Note: What follows is a real transcript from an actual pairing session. The commands, outputs, and screenshots are from a live run. Feel free to scan the prompts; no need to read them fully.

# Let me build, then start the AppHost and make sure everything comes up.

$ dotnet build
# Build succeeded. 0 Warning(s) 0 Error(s)

$ aspire start
# ✔ Apphost started successfully.

$ aspire wait api --status healthy
# ✔ Resource 'api' is healthy.

$ aspire wait web --status healthy
# ✔ Resource 'web' is healthy.

Aspire dashboard showing all resources healthy

Both resources are up. The agent discovers the frontend URL from aspire describe --format Json and opens the browser.

# Good, both resources are up. Let me find the frontend URL
# so I can open it and test the search feature.

$ aspire describe --format Json
# ... parses JSON ...
# Found: api at https://localhost:7239, web at http://localhost:46727
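The “parses JSON” step can be sketched as follows. The property names in the `aspire describe --format Json` payload are an assumption here; check your Aspire version’s actual output before relying on them:

```typescript
// Assumed (simplified) shape of `aspire describe --format Json` output.
interface DescribedResource {
  name: string;
  endpoints?: { url: string }[];
}

function firstEndpoint(describeJson: string, resource: string): string | undefined {
  const resources: DescribedResource[] = JSON.parse(describeJson).resources ?? [];
  return resources.find(r => r.name === resource)?.endpoints?.[0]?.url;
}

// Made-up payload using the demo's values:
const sample = JSON.stringify({
  resources: [
    { name: "api", endpoints: [{ url: "https://localhost:7239" }] },
    { name: "web", endpoints: [{ url: "http://localhost:46727" }] },
  ],
});
console.log(firstEndpoint(sample, "web")); // http://localhost:46727
```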

# Opening the frontend in a browser.

$ playwright-cli open http://localhost:46727 --browser=msedge --headed
$ playwright-cli snapshot

The snapshot returns the page structure as an element tree:

# The snapshot shows me the page structure:
- textbox "Search items by name..." [ref=e10]
- button "Search" [ref=e11] [cursor=pointer]
- button "Load Items (N+1 pattern)" [ref=e12] [cursor=pointer]
- button "Load Items (Projection)" [ref=e13] [cursor=pointer]
- button "Slow Downstream Call" [ref=e14] [cursor=pointer]

The app with the new search input

The search input rendered correctly. Time to try it.

Act 3: The bug surfaces

# The search input rendered correctly. Let me try searching for "Item 3".

$ playwright-cli fill e10 "Item 3"
$ playwright-cli click e11
# Console: 1 errors, 0 warnings

# That's not good — there's already a console error.
# Let me take a screenshot to see what the user would see.

$ playwright-cli screenshot

Search failed: 404 Not Found

404 Not Found. The search endpoint isn’t being reached. A human pair partner would lean over and say “that URL looks wrong.” The agent does the same thing, but by checking the browser console and the backend telemetry.

Act 4: The investigation

# 404 Not Found. The search endpoint isn't being reached.
# Let me check the browser console for the full error.

$ playwright-cli console
# Total messages: 4 (Errors: 1, Warnings: 0)
#
# [LOG] Angular is running in development mode.
# [ERROR] Failed to load resource: the server responded with a status of
#         404 (Not Found) @ http://localhost:46727/api/item/search?q=Item%203

# The URL is /api/item/search — wait, that doesn't look right.
# Let me check what the backend has registered.

$ aspire otel logs api --format Json --severity Warning -n 5
# No error log for this request — it never reached a handler.
# A 404 at the routing level. The endpoint path doesn't match.

# Let me search the codebase for the search URL to compare
# what the backend registered vs what the frontend is calling.

$ grep -rn "/api/item.*search" .
# AspireSample.Api/Program.cs:189:  app.MapGet("/api/items/search", ...
# AspireSample.Web/.../items.component.ts:49:  `/api/item/search?q=...`

# Found it. The backend has /api/items/search (with 's')
# but I wrote /api/item/search (without 's') in the Angular service.
# A one-character typo. Build passed, TypeScript compiled,
# but the integration is broken.

# (The agent uses standard shell tools like grep alongside
# the Aspire and Playwright CLIs — whatever gets the answer fastest.)

This is the moment. The agent sees both sides of the mismatch simultaneously: the browser console showing the wrong URL, the backend showing no request ever arrived, and the codebase search confirming the paths don’t match. A human would alt-tab between DevTools and the terminal. The agent checks both in the same flow.

Act 5: The fix and verify

# Fixing the typo in items.component.ts:
- this.http.get<Item[]>(`/api/item/search?q=${encodeURIComponent(this.searchQuery)}`)
+ this.http.get<Item[]>(`/api/items/search?q=${encodeURIComponent(this.searchQuery)}`)
# Fixed. Let me restart the frontend and try again.

$ aspire resource web restart
# ✔ Resource 'web' restarted successfully.

$ aspire wait web --status healthy
# ✔ Resource 'web' is healthy.

$ playwright-cli goto http://localhost:46727/items
$ playwright-cli fill e10 "Item 3"
$ playwright-cli click e11
$ playwright-cli screenshot

Search results: Item 3 working correctly

# Search is working. Let me verify there are no errors
# on either side before wrapping up.

$ playwright-cli console
# Total messages: 2 (Errors: 0, Warnings: 0)

$ aspire otel logs api --format Json --severity Error -n 5
# No errors.

And the agent’s summary for memory:

Added search feature to the items page. I added a GET /api/items/search?q= endpoint to the API and a search input to the Angular component. While testing, I found a typo in the frontend URL — /api/item/search instead of /api/items/search — which caused a 404. Fixed it. Search is working: tested with “Item 3” and got the correct result.

That’s the shift. The agent didn’t hand it back to you with “done, try it and let me know.” It tested its own work, found a bug, fixed it, and verified the fix. You can go straight to reviewing the finished feature, or open the app yourself and try edge cases the agent didn’t think of.

Either way, you’re not the one doing the first pass anymore.

What your AI pair partner catches that tests might miss

Not every bug needs a full debug & test session. Unit tests, type checking, and linting catch most issues. But there’s a category of bugs that only appear when you actually use the app. The kind where a pair of eyes on the running app immediately shows the problem.

Contract mismatches. “You’re calling /api/item/search but the endpoint is /api/items/search.” Build succeeds, types match, but the integration is broken. This is exactly what happened in the demo. No type system catches a string typo in a URL.

Performance regressions. The page loads, no errors. But your pair partner checks the traces: “Why does this have 16 spans? The optimized version only has 2.” The N+1 query is invisible to the user but obvious in the trace waterfall.

N+1 trace waterfall showing 16 database spans
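The span counts fall out of the query pattern: one query for the list plus one per item, versus a single round trip. A minimal sketch with a fake counting “database” (the real sample app uses EF Core on the .NET side; in the real trace the projection’s HTTP span makes it 2 spans, not 1):

```typescript
// Fake in-memory "database" that counts round trips, for illustration only.
class CountingDb {
  queries = 0;
  private items = Array.from({ length: 15 }, (_, i) => ({ id: i + 1, name: `Item ${i + 1}` }));

  listItems() { this.queries++; return this.items; }
  descriptionFor(id: number) { this.queries++; return `Description for item ${id}`; }
  listItemsWithDescriptions() { // the projection: one joined query
    this.queries++;
    return this.items.map(i => ({ ...i, description: `Description for item ${i.id}` }));
  }
}

const nPlusOne = new CountingDb();
for (const item of nPlusOne.listItems()) nPlusOne.descriptionFor(item.id);
console.log(`N+1: ${nPlusOne.queries} queries`); // 16 — one span each in the waterfall

const projected = new CountingDb();
projected.listItemsWithDescriptions();
console.log(`Projection: ${projected.queries} query`); // 1
```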

Silent failures. The app “works” but shows stale data. Your pair partner checks the structured logs: “The projection is behind, there’s a catch-up warning.” No error, no HTTP failure. Only telemetry reveals it.

Structured logs showing error severity entries

Frontend-only errors. TypeScript compiles but Angular throws NullInjectorError at runtime. Your pair partner reads the browser console: “Missing provider for SearchService, did you add it to the component?”

Cross-service errors. Browser shows “Failed to fetch.” API shows nothing. Your pair partner checks both sides: “The CORS header is missing, the request never made it to the API.” You need browser console AND server logs to see the full picture.

The agent as pair partner has an advantage humans don’t: it sees both sides simultaneously. When you’re debugging, you switch between browser DevTools and the terminal. The agent checks playwright-cli console and aspire otel logs in the same flow, correlating the browser error with the backend telemetry.
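That correlation step can be sketched as a simple set difference: which URLs the browser reported as errors that the backend never served. The input shapes below are simplified stand-ins for `playwright-cli console` and `aspire otel logs` output, not the tools’ real formats:

```typescript
// Which browser-reported error URLs never matched a backend route?
function pathOf(url: string): string {
  return new URL(url).pathname;
}

function neverReachedBackend(consoleErrorUrls: string[], servedPaths: string[]): string[] {
  const served = new Set(servedPaths);
  return consoleErrorUrls.map(pathOf).filter(p => !served.has(p));
}

// The demo's data: the browser 404 vs the routes the API actually handles.
const browserErrors = ["http://localhost:46727/api/item/search?q=Item%203"];
const backendPaths = ["/api/items", "/api/items/search"];
console.log(neverReachedBackend(browserErrors, backendPaths)); // [ '/api/item/search' ]
```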

Setting it up

If you’re already using the agentic-dev-workflow, run /meta-upgrade and you’re done. The Aspire and Playwright CLI skills are included in the latest version, along with /dev-watch v2.0.0 which encodes the full start/discover/test sequence from the demo above.

For a fresh setup:

/meta-bootstrap                      # installs all skills including Aspire + Playwright

Or standalone without the workflow:

dotnet tool install -g aspire.cli    # Aspire CLI 13.2+
npm install -g @playwright/cli       # Playwright CLI
aspire agent init                    # generates skill file for your AI agent
playwright-cli install --skills      # generates playwright skill file

The /dev-watch skill ties it all together: it provides the runbook so the agent doesn’t have to figure out the sequence each time. Start the AppHost, discover resources, wait for health, open the frontend, interact, check for errors. It also tracks what the agent tested and what it found, which feeds into something I didn’t expect when I first built this.

Every testing session is a future E2E test

When the agent tests a feature manually, it’s doing exactly what you’d write a Playwright E2E test for. Navigate to a page. Fill a form. Click a button. Assert the result. It already knows the steps, the selectors, the expected outcomes. It just did them.

The workflow:

  1. Agent builds the feature
  2. Agent starts the app, opens the browser, tests manually (the pairing session from the demo)
  3. Agent writes a user story describing what it did and what it saw, saved to persistent memory
  4. Later, when you’re ready for E2E tests, the agent pulls that user story and translates it into a Playwright test

The user story the agent writes after testing might look like this:

## Search feature — manual test (2026-03-23)
- Navigated to /items
- Filled search input with "Item 3"
- Clicked Search button
- Expected: search results showing "Item 3 — Description for item 3"
- Actual: 404 error (fixed: URL typo /api/item/search → /api/items/search)
- After fix: correct results displayed, no console errors, no API errors

That’s enough context for a Playwright E2E test:

test('search filters items by name', async ({ page }) => {
  await page.goto('/items');
  await page.getByPlaceholder('Search items by name...').fill('Item 3');
  await page.getByRole('button', { name: 'Search' }).click();
  await expect(page.getByText('Item 3 — Description for item 3')).toBeVisible();
});

The agent doesn’t write the E2E test during the pairing session. That would slow down the flow. It writes the story. The story persists in memory (part 1’s memory system). When the time comes to add E2E coverage, all the raw material is already there: the steps, the selectors, the assertions, even the edge cases it discovered along the way.

This turns every manual testing session into E2E test documentation, for free. The agent was already doing the work. It just needs to write down what it did.

What the agent can and can’t catch

The agent won’t notice a subtle layout shift. It won’t tell you the button color doesn’t match the design spec, or that the UX flow feels awkward. It won’t catch that the search results should be sorted by relevance instead of alphabetically, or that the empty state message feels cold. Those are judgment calls that need a human eye.

It also won’t catch multi-user scenarios: race conditions when two users edit the same record, or that the search results leak data from another tenant. It reads what one browser and one set of logs show it. Concurrency and authorization edge cases still need your test suite and your thinking.

But the mechanical part: starting the app, trying the feature, checking for errors, reading the logs, correlating browser errors with backend traces — that’s exactly what a pair partner does during the testing phase. And it’s exactly what the agent can do now.

Summary

Aspire 13.2’s CLI overhaul and Playwright CLI give your agent the ability to manage, monitor, and test your distributed app without you doing the manual work. The context window cost is lower than MCP, setup is a single /meta-upgrade, and every testing session doubles as future E2E test material through persistent memory.

The next time your agent finishes a feature, don’t reach for the browser yourself. Tell it: “Start it, test it, and show me when it works.”
