
Build Your Personal Assistant Operating System

Thomas Meli

Extract Structure From Anything the Assistant Reads

Raw text hides actionable information behind paragraphs of prose

A meeting transcript contains three action items, two decisions, one risk, and four follow-up commitments. All of them are buried in forty minutes of conversation. A customer feedback email contains a feature request, a usability complaint, and a compliment. All of them are mixed into three paragraphs of narrative.

Extraction is the process of pulling structured records out of unstructured text. You define the fields you want (action items, decisions, risks, deadlines), and the assistant reads the source and fills in those fields. The result is a record you can route, search, and act on, instead of a document you have to reread every time you need to find something in it.

This pattern shows up everywhere in the system. The email module extracts deadlines and commitments. The relationship module extracts contact details. The task module extracts action items. This chapter teaches the underlying pattern so you can build extraction templates for any source.

A stylized teaching image showing raw text on the left being parsed into structured fields in a record on the right
An extraction template defines which fields to pull from raw text and how to validate them.

The output schema defines exactly what the extraction produces

An extraction template starts with an output schema: the list of fields you want, what each field means, and what a valid value looks like.

The schema is the contract. When you hand raw meeting notes to the assistant and say 'extract using this template,' the output should match these fields. If a field cannot be filled, it appears in the record as missing with a note explaining why. The assistant never invents content to fill an empty field.
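As a concrete sketch, a schema for meeting-note action items might look like the following. The field names and the `missing_note` convention are illustrative assumptions, not a fixed standard:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical output schema for one record type: an action item
# extracted from meeting notes. Each field is either filled from the
# source or left as None with a note explaining why.
@dataclass
class ActionItem:
    owner: str                          # who committed to the work
    task: str                           # what they committed to do
    due: Optional[str] = None           # resolved date (ISO), or None
    missing_note: Optional[str] = None  # why a field could not be filled

record = ActionItem(owner="Sarah", task="send report", due="2024-06-07")
```

The point of the dataclass is that the output shape is fixed up front: the assistant fills slots, it does not decide what the slots are.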

Confidence flags separate what the source said from what the assistant guessed

Every extracted field gets a confidence level:

  • High: the source text explicitly states this information. 'Sarah will send the report by Friday' maps directly to action_item: Sarah, send report, Friday.
  • Medium: the information is strongly implied. 'We should have the numbers by end of week' implies a deadline of Friday, but the exact date is not stated.
  • Inferred: the assistant connected information across passages to produce this field. 'The project lead mentioned needing more data' combined with 'Sarah handles all data requests' produces an inferred action for Sarah.

Inferred fields are where most extraction errors happen. The assistant is good at making plausible connections, which means it is also good at making plausible errors. When you review extracted records, check inferred fields first. A high-confidence field that traces to a specific quote is almost always correct. An inferred field that combines information from multiple passages may be wrong.
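One way to make this review practical is to store each field with its confidence level and the evidence behind it. This is a minimal sketch under assumed names (`Field`, `needs_review`), not a prescribed format:

```python
from dataclasses import dataclass

# A confidence-tagged field: the extracted value plus how it was derived.
# "evidence" holds a direct quote for a high-confidence field, or the
# combined passages for an inferred one.
@dataclass
class Field:
    value: str
    confidence: str  # "high" | "medium" | "inferred"
    evidence: str

deadline = Field(
    value="Friday",
    confidence="medium",
    evidence="We should have the numbers by end of week",
)

# Surface anything below high confidence for human review first,
# since inferred fields are the likeliest to be wrong.
def needs_review(f: Field) -> bool:
    return f.confidence != "high"
```

Keeping the evidence string alongside the value means a reviewer can check a field against its source quote without rereading the whole transcript.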

Validation checks catch extraction errors before they route to other modules

After extraction, the template runs validation checks. Common checks include:

  • Completeness: are all required fields filled? If 'date' is required and missing, the record is flagged.
  • Consistency: do the fields agree with each other? An action item assigned to someone not on the attendee list is suspicious.
  • Duplicate detection: does this record duplicate something already in the system? A follow-up that matches an existing task should be flagged rather than creating a duplicate.
  • Format validation: do dates, names, and numbers match expected patterns? 'Next Tuesday' should be resolved to a specific date.
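The checks above can be sketched as a single validation pass. The field names (`owner`, `due`) and the rule that dates must already be resolved to `YYYY-MM-DD` are assumptions for illustration:

```python
import re

# Minimal validation sketch: returns a list of flags describing problems
# found in one extracted record. A real template would carry more checks
# and richer error reporting.
def validate(record: dict, required: list, attendees: set) -> list:
    flags = []
    # Completeness: every required field must be present and non-empty.
    for f in required:
        if not record.get(f):
            flags.append(f"missing required field: {f}")
    # Consistency: the assignee should appear on the attendee list.
    owner = record.get("owner")
    if owner and owner not in attendees:
        flags.append(f"owner not an attendee: {owner}")
    # Format: dates must be resolved (YYYY-MM-DD), not "next Tuesday".
    due = record.get("due")
    if due and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", due):
        flags.append(f"unresolved date: {due}")
    return flags

flags = validate(
    {"owner": "Sarah", "task": "send report", "due": "next Tuesday"},
    required=["owner", "task", "due"],
    attendees={"Sarah", "Tom"},
)
# flags → ["unresolved date: next Tuesday"]
```

A record with any flags is held for review instead of being routed onward, which is what makes the automation safe.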

Validation is what makes extraction safe to automate. Without it, a misextracted deadline routes to your task module and creates a fake commitment. With validation, the record is checked before it enters the system.