Skill Guide: Turn Repeated Work Into Reusable Skills
Thomas Meli & Agent Team
7. Turn every correction into a permanent instruction
The Skill is live, and you are already finding mistakes
You installed your Skill, ran it a few times, and the output was close. Close, but you corrected the same thing twice. Maybe the AI invented an action item that nobody mentioned. Maybe it used the wrong tone for the follow-up email. Maybe it included internal notes in a client-facing summary.
Every one of those repeated corrections is a permanent instruction your Skill does not have yet. When you notice a repeated correction, tell the AI: 'That mistake keeps happening. Update the Skill to prevent it.' The AI adds it to the Gotchas section, where you collect the specific rules that prevent mistakes. Over time, the Gotchas section is what separates usable output from output you have to fix.
The Gotchas section writes itself from real use
You do not sit down and brainstorm gotchas. You collect them. Every time the AI produces output that needs correction, tell the AI to add that correction as a permanent instruction. After three real sessions, your Gotchas section will contain the specific judgment calls that make your Skill produce output you trust.
```markdown
## Gotchas
v1 (initial):
- Do not invent action items that were not explicitly stated.
v2 (after second use):
- Do not invent action items that were not explicitly stated.
- If a decision was discussed but not confirmed, mark it
"tentative" rather than listing it as final.
- Speaker labels from Fathom are often wrong for the first
30 seconds. Cross-reference with the attendee list.
v3 (after client used the recap):
- Do not invent action items that were not explicitly stated.
- If a decision was discussed but not confirmed, mark it
"tentative" rather than listing it as final.
- Speaker labels from Fathom are often wrong for the first
30 seconds. Cross-reference with the attendee list.
- Never include internal pricing discussions in the
client-facing follow-up email.
- If the transcript mentions a deadline, verify it against
the project tracker before listing it as confirmed.
```

The loop is simple: run the Skill, correct the miss, update the Skill, then test it.
Drift means the whole Skill has changed
Sometimes the problem is not one missing gotcha. The whole Skill has drifted from how you work. The trigger is too broad, the procedure skips a step you now consider essential, or the output format no longer matches what you need. When corrections pile up and none of them are simple gotchas, it is time for a full review.
Match the failure to the fix
The same bad output can come from different parts of the Skill, so the fix should target the real cause.
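As a sketch, assuming a Skill file with a trigger description, a procedure, a Gotchas section, and an output format:

```markdown
| Failure you see                         | Part of the Skill to fix |
|-----------------------------------------|--------------------------|
| Activates on the wrong requests         | Trigger description      |
| Skips or reorders an essential step     | Procedure                |
| Repeats a mistake you already corrected | Gotchas                  |
| Right content, wrong shape              | Output format            |
```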
Ask AI to track what changed
When you ask AI to update a Skill, tell it to note what changed and why. This lets you understand the Skill's evolution and roll back if a change makes output worse. You can say: 'Update the Skill with this correction and add a changelog note about what you changed.'
AI keeps the version history inside the file. Over time, this changelog becomes a readable record of how the Skill evolved from real use.
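A minimal sketch of what that record might look like inside the Skill file, assuming a dedicated Changelog section; the format here is one possible convention, not a requirement:

```markdown
## Changelog
- v3: Added gotcha: never include internal pricing in the
  client-facing follow-up. Why: pricing surfaced in a recap
  a client received.
- v2: Mark discussed-but-unconfirmed decisions "tentative."
  Why: a tentative decision was listed as final.
- v1: Initial version.
```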
Evaluation scenarios catch regressions
Ask AI to create two or three evaluation scenarios for each important Skill. One should be a normal case where the Skill should clearly help. One should be an edge case where the Skill must preserve a boundary. One should be a should-not-trigger case where a nearby request belongs somewhere else. Say: 'Create normal, edge, and should-not-trigger test scenarios for this Skill with sample input, expected output, and pass/fail criteria.' The AI drafts them; you review.
A useful evaluation set includes ordinary use, boundary cases, and nearby requests that should not trigger the Skill.
```markdown
# Test: Basic Meeting Recap
## Input
A 15-minute transcript between two speakers discussing
a product launch timeline. One decision is confirmed:
launch date is March 15. One decision is discussed but
not confirmed: whether to include beta testers.
## Expected Output
- Decision table has exactly two rows.
- Launch date row shows status "confirmed."
- Beta tester row shows status "tentative."
- No fabricated action items beyond what the transcript
explicitly states.
- Follow-up email does not mention internal pricing.
## Pass/Fail
PASS if all five criteria are met.
FAIL if any action item is fabricated or any status is wrong.
```
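A companion should-not-trigger scenario in the same format; the input here is invented for illustration:

```markdown
# Test: Should Not Trigger - Casual Standup Notes

## Input
A five-line chat log where teammates share what they worked on
yesterday. No decisions, no action items, no client.

## Expected Output
- The meeting-recap Skill does not activate.
- No recap, decision table, or follow-up email is produced.

## Pass/Fail
PASS if the request is handled without invoking the Skill.
FAIL if a recap or follow-up email is generated.
```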
The evaluation bar is straightforward: does the Skill reduce your editing time to touch-ups, or are you still rewriting the output? If you run the Skill and consistently need to reshape, restructure, or heavily correct the result, the Skill is not done. It needs more procedure, more gotchas, or better examples in its references folder.
Capability Skills and preference Skills need different tests
Some Skills teach the AI something it cannot do reliably on its own. A review that checks client emails against your company's required legal disclaimers, a format conversion that follows your organization's internal template, a data-entry procedure that enforces naming rules unique to your team: these are capability Skills. Without the instructions, the model guesses or gets it wrong.
Other Skills capture how you prefer standard work done. The model can already write a meeting recap or draft a follow-up email. Your Skill specifies your format, your level of detail, your tone, and your boundaries. These are preference Skills. The model could produce something without them; it just would not match your way of working.
Capability Skills need correctness tests; preference Skills need fit and voice tests.
Whether your Skill is a capability Skill or a preference Skill determines how you test corrections. When you correct a capability Skill, check whether the output got the facts right and applied the rules correctly. When you correct a preference Skill, check whether the output matches your samples and whether you would send it without editing. Capability Skills may become unnecessary as models improve at the underlying task. Preference Skills persist because your taste does not change with model updates.
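For a preference Skill, the pass bar shifts from facts to fit. A sketch in the same test format, assuming a follow-up-email Skill with sample emails in its references folder:

```markdown
# Test: Voice Match - Follow-up Email

## Input
The product-launch transcript from the basic recap test.

## Expected Output
- The draft opens and closes the way the reference samples do.
- Detail level matches the samples: decisions and owners,
  not a play-by-play.
- You would send it without editing.

## Pass/Fail
PASS if the draft needs only touch-ups.
FAIL if you rewrite its structure, tone, or length.
```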
How thorough your testing needs to be depends on what happens when the output is wrong. A personal morning-briefing Skill can get away with one test scenario and a quick scan. A client-facing compliance Skill needs five or more test scenarios, covering edge cases where the consequences of failure are real: missing disclaimers, unsupported claims, confidential data surfacing in the wrong section.
If a Skill references a specific project, client, or regulatory framework, its evaluation scenarios should include a test where the Skill runs outside that context. Does it produce inappropriate output when applied to unrelated work? If so, it belongs in a project folder, not in your global library.
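One way to sketch that check, again with invented input:

```markdown
# Test: Out of Context - Unrelated Work

## Input
A transcript from a different client with no connection to
the project this Skill was written for.

## Expected Output
- No project-specific names, deadlines, or regulatory
  references leak into the output.

## Pass/Fail
PASS if the output stays generic or the Skill declines to apply.
FAIL if project-specific details appear.
```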
Security essentials for every Skill
Skills are instruction files, not data stores. Keeping this boundary clear prevents most security problems.