
Build Your Personal Assistant Operating System

Thomas Meli

Extract Structure From Anything the Assistant Reads

Raw text hides actionable information behind paragraphs of prose

A meeting transcript contains three action items, two decisions, one risk, and four follow-up commitments. All of them are buried in forty minutes of conversation. A customer feedback email contains a feature request, a usability complaint, and a compliment. All of them are mixed into three paragraphs of narrative.

Extraction is the process of pulling structured records out of unstructured text. You define the fields you want (action items, decisions, risks, deadlines), and the assistant reads the source and fills in those fields. The result is a record you can route, search, and act on, instead of a document you have to reread every time you need to find something in it.

This pattern shows up everywhere in the system. The email module extracts deadlines and commitments. The relationship module extracts contact details. The task module extracts action items. This chapter teaches the underlying pattern so you can build extraction templates for any source.

A stylized teaching image showing raw text on the left being parsed into structured fields in a record on the right
An extraction template defines which fields to pull from raw text and how to validate them.

The output schema defines exactly what the extraction produces

An extraction template starts with an output schema: the list of fields you want, what each field means, and what a valid value looks like.

The schema is the contract. When you hand raw meeting notes to the assistant and say 'extract using this template,' the output should match these fields. If a field cannot be filled, it appears in the record as missing with a note explaining why. The assistant never invents content to fill an empty field.
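As a concrete sketch, a schema for meeting-note action items might look like the following. The field names and the `missing_note` convention are illustrative assumptions, not a fixed standard:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical output schema for one record type: an action item
# extracted from meeting notes. Each field is either filled from the
# source or left as None with a note explaining why.
@dataclass
class ActionItem:
    owner: str                          # who committed to the work
    task: str                           # what they committed to do
    due: Optional[str] = None           # resolved date (ISO), or None
    missing_note: Optional[str] = None  # why a field could not be filled

record = ActionItem(owner="Sarah", task="send report", due="2024-06-07")
```

The point of the dataclass is that the output shape is fixed up front: the assistant fills slots, it does not decide what the slots are.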

Confidence flags separate what the source said from what the assistant guessed

Every extracted field gets a confidence level:

  • High: the source text explicitly states this information. 'Sarah will send the report by Friday' maps directly to action_item: Sarah, send report, Friday.
  • Medium: the information is strongly implied. 'We should have the numbers by end of week' implies a deadline of Friday, but the exact date is not stated.
  • Inferred: the assistant connected information across passages to produce this field. 'The project lead mentioned needing more data' combined with 'Sarah handles all data requests' produces an inferred action for Sarah.

Inferred fields are where most extraction errors happen. The assistant is good at making plausible connections, which means it is also good at making plausible errors. When you review extracted records, check inferred fields first. A high-confidence field that traces to a specific quote is almost always correct. An inferred field that combines information from multiple passages may be wrong.
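One way to make this review practical is to store each field with its confidence level and the evidence behind it. This is a minimal sketch under assumed names (`Field`, `needs_review`), not a prescribed format:

```python
from dataclasses import dataclass

# A confidence-tagged field: the extracted value plus how it was derived.
# "evidence" holds a direct quote for a high-confidence field, or the
# combined passages for an inferred one.
@dataclass
class Field:
    value: str
    confidence: str  # "high" | "medium" | "inferred"
    evidence: str

deadline = Field(
    value="Friday",
    confidence="medium",
    evidence="We should have the numbers by end of week",
)

# Surface anything below high confidence for human review first,
# since inferred fields are the likeliest to be wrong.
def needs_review(f: Field) -> bool:
    return f.confidence != "high"
```

Keeping the evidence string alongside the value means a reviewer can check a field against its source quote without rereading the whole transcript.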

Validation checks catch extraction errors before they route to other modules

After extraction, the template runs validation checks. Common checks include:

  • Completeness: are all required fields filled? If 'date' is required and missing, the record is flagged.
  • Consistency: do the fields agree with each other? An action item assigned to someone not on the attendee list is suspicious.
  • Duplicate detection: does this record duplicate something already in the system? A follow-up that matches an existing task should be flagged rather than creating a duplicate.
  • Format validation: do dates, names, and numbers match expected patterns? 'Next Tuesday' should be resolved to a specific date.
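The checks above can be sketched as a single validation pass. The field names (`owner`, `due`) and the rule that dates must already be resolved to `YYYY-MM-DD` are assumptions for illustration:

```python
import re

# Minimal validation sketch: returns a list of flags describing problems
# found in one extracted record. A real template would carry more checks
# and richer error reporting.
def validate(record: dict, required: list, attendees: set) -> list:
    flags = []
    # Completeness: every required field must be present and non-empty.
    for f in required:
        if not record.get(f):
            flags.append(f"missing required field: {f}")
    # Consistency: the assignee should appear on the attendee list.
    owner = record.get("owner")
    if owner and owner not in attendees:
        flags.append(f"owner not an attendee: {owner}")
    # Format: dates must be resolved (YYYY-MM-DD), not "next Tuesday".
    due = record.get("due")
    if due and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", due):
        flags.append(f"unresolved date: {due}")
    return flags

flags = validate(
    {"owner": "Sarah", "task": "send report", "due": "next Tuesday"},
    required=["owner", "task", "due"],
    attendees={"Sarah", "Tom"},
)
# flags → ["unresolved date: next Tuesday"]
```

A record with any flags is held for review instead of being routed onward, which is what makes the automation safe.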

Validation is what makes extraction safe to automate. Without it, a misextracted deadline routes to your task module and creates a fake commitment. With validation, the record is checked before it enters the system.