Skill Guide: Turn Repeated Work Into Reusable Skills
Thomas Meli & Agent Team
7. Turn every correction into a permanent instruction
The Skill is live, and you are already finding mistakes
You installed your Skill, ran it a few times, and the output was close. Close, but you corrected the same thing twice. Maybe the AI invented an action item that nobody mentioned. Maybe it used the wrong tone for the follow-up email. Maybe it included internal notes in a client-facing summary.
Every one of those repeated corrections is a permanent instruction your Skill does not have yet. When you notice a repeated correction, tell the AI: 'That mistake keeps happening. Update the Skill to prevent it.' The AI adds it to the Gotchas section, where you collect the specific rules that prevent mistakes. Over time, the Gotchas section is what separates usable output from output you have to fix.
The Gotchas section writes itself from real use
You do not sit down and brainstorm gotchas. You collect them. Every time the AI produces output that needs correction, tell the AI to add that correction as a permanent instruction. After three real sessions, your Gotchas section will contain the specific judgment calls that make your Skill produce output you trust.
```markdown
## Gotchas
v1 (initial):
- Do not invent action items that were not explicitly stated.
v2 (after second use):
- Do not invent action items that were not explicitly stated.
- If a decision was discussed but not confirmed, mark it
"tentative" rather than listing it as final.
- Speaker labels from Fathom are often wrong for the first
30 seconds. Cross-reference with the attendee list.
v3 (after client used the recap):
- Do not invent action items that were not explicitly stated.
- If a decision was discussed but not confirmed, mark it
"tentative" rather than listing it as final.
- Speaker labels from Fathom are often wrong for the first
30 seconds. Cross-reference with the attendee list.
- Never include internal pricing discussions in the
client-facing follow-up email.
- If the transcript mentions a deadline, verify it against
the project tracker before listing it as confirmed.
```

The loop is simple: run the Skill, correct the miss, update the Skill, then test it.
Drift means the whole Skill has changed
Sometimes the problem is not one missing gotcha. The whole Skill has drifted from how you work. The trigger is too broad, the procedure skips a step you now consider essential, or the output format no longer matches what you need. When corrections pile up and none of them are simple gotchas, it is time for a full review.
Match the failure to the fix
The same bad output can come from different parts of the Skill, so the fix should target the real cause.
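As a sketch, assuming a Skill file with a trigger description, a procedure, a Gotchas section, and an output format:

```markdown
| Failure you see                         | Part of the Skill to fix |
|-----------------------------------------|--------------------------|
| Activates on the wrong requests         | Trigger description      |
| Skips or reorders an essential step     | Procedure                |
| Repeats a mistake you already corrected | Gotchas                  |
| Right content, wrong shape              | Output format            |
```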
Ask AI to track what changed
When you ask AI to update a Skill, tell it to note what changed and why. This lets you understand the Skill's evolution and roll back if a change makes output worse. You can say: 'Update the Skill with this correction and add a changelog note about what you changed.'
AI keeps the version history inside the file. Over time, this changelog becomes a readable record of how the Skill evolved from real use.
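A minimal sketch of what that record might look like inside the Skill file, assuming a dedicated Changelog section; the format here is one possible convention, not a requirement:

```markdown
## Changelog
- v3: Added gotcha: never include internal pricing in the
  client-facing follow-up. Why: pricing surfaced in a recap
  a client received.
- v2: Mark discussed-but-unconfirmed decisions "tentative."
  Why: a tentative decision was listed as final.
- v1: Initial version.
```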
Evaluation scenarios catch regressions
Ask AI to create two or three evaluation scenarios for each important Skill. One should be a normal case where the Skill should clearly help. One should be an edge case where the Skill must preserve a boundary. One should be a should-not-trigger case where a nearby request belongs somewhere else. Say: 'Create normal, edge, and should-not-trigger test scenarios for this Skill with sample input, expected output, and pass/fail criteria.' The AI drafts them; you review.
A useful evaluation set includes ordinary use, boundary cases, and nearby requests that should not trigger the Skill.
```markdown
# Test: Basic Meeting Recap
## Input
A 15-minute transcript between two speakers discussing
a product launch timeline. One decision is confirmed:
launch date is March 15. One decision is discussed but
not confirmed: whether to include beta testers.
## Expected Output
- Decision table has exactly two rows.
- Launch date row shows status "confirmed."
- Beta tester row shows status "tentative."
- No fabricated action items beyond what the transcript
explicitly states.
- Follow-up email does not mention internal pricing.
## Pass/Fail
PASS if all five criteria are met.
FAIL if any action item is fabricated or any status is wrong.
```
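A companion should-not-trigger scenario in the same format; the input here is invented for illustration:

```markdown
# Test: Should Not Trigger - Casual Standup Notes

## Input
A five-line chat log where teammates share what they worked on
yesterday. No decisions, no action items, no client.

## Expected Output
- The meeting-recap Skill does not activate.
- No recap, decision table, or follow-up email is produced.

## Pass/Fail
PASS if the request is handled without invoking the Skill.
FAIL if a recap or follow-up email is generated.
```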
The evaluation bar is straightforward: does the Skill reduce your editing time to touch-ups, or are you still rewriting the output? If you run the Skill and consistently need to reshape, restructure, or heavily correct the result, the Skill is not done. It needs more procedure, more gotchas, or better examples in its references folder.
Capability Skills and preference Skills need different tests
Some Skills teach the AI something it cannot do reliably on its own. A review that checks client emails against your company's required legal disclaimers, a format conversion that follows your organization's internal template, a data-entry procedure that enforces naming rules unique to your team: these are capability Skills. Without the instructions, the model guesses or gets it wrong.
Other Skills capture how you prefer standard work done. The model can already write a meeting recap or draft a follow-up email. Your Skill specifies your format, your level of detail, your tone, and your boundaries. These are preference Skills. The model could produce something without them; it just would not match your way of working.
Capability Skills need correctness tests; preference Skills need fit and voice tests.
Whether your Skill is a capability Skill or a preference Skill determines how you test corrections. When you correct a capability Skill, check whether the output got the facts right and applied the rules correctly. When you correct a preference Skill, check whether the output matches your samples and whether you would send it without editing. Capability Skills may become unnecessary as models improve at the underlying task. Preference Skills persist because your taste does not change with model updates.
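For a preference Skill, the pass bar shifts from facts to fit. A sketch in the same test format, assuming a follow-up-email Skill with sample emails in its references folder:

```markdown
# Test: Voice Match - Follow-up Email

## Input
The product-launch transcript from the basic recap test.

## Expected Output
- The draft opens and closes the way the reference samples do.
- Detail level matches the samples: decisions and owners,
  not a play-by-play.
- You would send it without editing.

## Pass/Fail
PASS if the draft needs only touch-ups.
FAIL if you rewrite its structure, tone, or length.
```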
How thorough your testing needs to be depends on what happens when the output is wrong. A personal morning-briefing Skill can get away with one test scenario and a quick scan. A client-facing compliance Skill needs five or more test scenarios, covering edge cases where the consequences of failure are real: missing disclaimers, unsupported claims, confidential data surfacing in the wrong section.
If a Skill references a specific project, client, or regulatory framework, its evaluation scenarios should include a test where the Skill runs outside that context. Does it produce inappropriate output when applied to unrelated work? If so, it belongs in a project folder, not in your global library.
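One way to sketch that check, again with invented input:

```markdown
# Test: Out of Context - Unrelated Work

## Input
A transcript from a different client with no connection to
the project this Skill was written for.

## Expected Output
- No project-specific names, deadlines, or regulatory
  references leak into the output.

## Pass/Fail
PASS if the output stays generic or the Skill declines to apply.
FAIL if project-specific details appear.
```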
Security essentials for every Skill
Skills are instruction files, not data stores. Keeping this boundary clear prevents most security problems.