Document AST

The AST is the canonical representation shared by the reader, the exporters, and the RTF writer. It is intentionally independent of the original RTF token stream and of any one output format.

Core Nodes

Documents contain block nodes such as paragraphs, lists, and tables. Paragraphs contain inline nodes such as text runs, fields, links, references, tabs, line breaks, and images.
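The containment described above can be pictured with a small stand-alone sketch. The classes below (DocumentNode, ParagraphNode, TextNode, BreakNode) are hypothetical stand-ins written for illustration only; they are not the rtfstruct classes and carry far fewer fields. The plain_text helper is likewise an assumption, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    """Hypothetical stand-in for an inline text run."""
    text: str


@dataclass
class BreakNode:
    """Hypothetical stand-in for an inline line break."""


@dataclass
class ParagraphNode:
    """Hypothetical stand-in for a paragraph block holding inline children."""
    children: List[Union[TextNode, BreakNode]] = field(default_factory=list)


@dataclass
class DocumentNode:
    """Hypothetical stand-in for a document holding blocks in reading order."""
    blocks: List[ParagraphNode] = field(default_factory=list)


def plain_text(document):
    """Walk blocks, then inline children, and join the visible text."""
    paragraphs = []
    for block in document.blocks:
        parts = []
        for child in block.children:
            # A line break contributes a newline; a text node contributes its text.
            parts.append("\n" if isinstance(child, BreakNode) else child.text)
        paragraphs.append("".join(parts))
    return "\n\n".join(paragraphs)


doc = DocumentNode([
    ParagraphNode([TextNode("Title"), BreakNode(), TextNode("Subtitle")]),
    ParagraphNode([TextNode("Body text.")]),
])
print(plain_text(doc))
```

The walk never looks at source offsets or at an output format, which is the property the AST is meant to provide.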

Semantic Equality

Document.semantic_equals() compares document meaning rather than incidental source details. It coalesces adjacent compatible text runs and ignores source spans, while still comparing structure, styles, metadata, notes, images, tables, lists, and paragraph formatting.
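A minimal sketch of the normalization idea, assuming a run is reduced to its text, a single bold flag, and a span. Run, coalesce, and runs_semantically_equal are hypothetical names used for illustration; the real comparison covers far more than this (styles, metadata, notes, images, tables, lists, and paragraph formatting).

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a text run: text, one style flag, a source span."""
    text: str
    bold: bool = False
    span: Optional[Tuple[int, int]] = None


def coalesce(runs):
    """Merge adjacent runs that share a style and drop source spans."""
    merged = []
    for run in runs:
        if merged and merged[-1].bold == run.bold:
            # Same style as the previous run: extend it instead of appending.
            merged[-1] = replace(merged[-1], text=merged[-1].text + run.text)
        else:
            # New style: start a fresh run with the span cleared.
            merged.append(Run(run.text, run.bold, span=None))
    return merged


def runs_semantically_equal(left, right):
    """Compare two run lists after normalization."""
    return coalesce(left) == coalesce(right)


a = [Run("Hello, ", span=(0, 7)), Run("world", span=(7, 12))]
b = [Run("Hello, world", span=(40, 52))]
c = [Run("Hello, "), Run("world", bold=True)]
print(runs_semantically_equal(a, b))  # True
print(runs_semantically_equal(a, c))  # False
```

Lists a and b differ only in how the text is split and where it came from, so they compare equal; c differs in style, so it does not.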

Document AST for parsed RTF.

The AST is the canonical internal representation of an RTF document. It is independent of Markdown, Qt, and the original source byte layout. Reader modules produce these nodes; JSON, Markdown, and RTF writer modules consume them.

class rtfstruct.ast.Annotation(id, blocks=<factory>, author=None)[source]

Annotation or comment content stored separately from inline flow.

class rtfstruct.ast.AnnotationRef(id, label=None, span=None)[source]

Inline reference to an annotation stored on the document.

Parameters:
  • id (str)

  • label (str | None)

  • span (SourceSpan | None)

id

Stable internal annotation identifier.

Type:

str

label

Optional visible label.

Type:

str | None

span

Optional source span.

Type:

rtfstruct.ast.SourceSpan | None

class rtfstruct.ast.Color(red, green, blue, alpha=None)[source]

RGB color value.

Parameters:
  • red (int)

  • green (int)

  • blue (int)

  • alpha (int | None)

red

Red channel from 0 to 255.

Type:

int

green

Green channel from 0 to 255.

Type:

int

blue

Blue channel from 0 to 255.

Type:

int

alpha

Optional alpha channel from 0 to 255.

Type:

int | None
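As an illustration of the documented channel ranges, the sketch below validates each channel and formats the RGB part as an RTF color table entry. RgbColor and color_table_entry are hypothetical names; whether rtfstruct validates ranges or how its writer emits colors is not stated here, so treat this as an assumption. The alpha channel is simply ignored in this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RgbColor:
    """Hypothetical stand-in mirroring the documented Color fields."""
    red: int
    green: int
    blue: int
    alpha: Optional[int] = None

    def __post_init__(self):
        # Reject channels outside the documented 0 to 255 range.
        for name in ("red", "green", "blue", "alpha"):
            value = getattr(self, name)
            if value is not None and not 0 <= value <= 255:
                raise ValueError(f"{name} must be in 0..255, got {value}")


def color_table_entry(color):
    """Format the RGB channels as one RTF color table entry (alpha is ignored)."""
    return f"\\red{color.red}\\green{color.green}\\blue{color.blue};"


print(color_table_entry(RgbColor(255, 128, 0)))  # \red255\green128\blue0;
```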

class rtfstruct.ast.Document(blocks=<factory>, metadata=<factory>, styles=<factory>, footnotes=<factory>, annotations=<factory>, images=<factory>, diagnostics=<factory>)[source]

Structured representation of a parsed RTF document.

blocks

Top-level document blocks in reading order.

Type:

list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]

metadata

Document metadata extracted from RTF info groups.

Type:

rtfstruct.ast.Metadata

styles

Style definitions detected or inferred during parsing.

Type:

rtfstruct.ast.StyleSheet

footnotes

Footnotes keyed by stable internal identifier.

Type:

dict[str, rtfstruct.ast.Footnote]

annotations

Comments or annotations keyed by stable internal identifier.

Type:

dict[str, rtfstruct.ast.Annotation]

images

Extracted or represented images keyed by stable internal identifier.

Type:

dict[str, rtfstruct.ast.Image]

diagnostics

Recoverable parser warnings and errors.

Type:

list[rtfstruct.diagnostics.Diagnostic]

semantic_equals(other)[source]

Return whether another document is semantically equivalent.

The initial implementation normalizes adjacent same-style text runs and compares supported Milestone 1 structure.

Parameters:

other (Document)

Return type:

bool

to_json(options=None)[source]

Export this document to deterministic JSON-compatible data.

Parameters:

options (JsonOptions | None)

Return type:

dict[str, object]
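One common way to turn JSON-compatible data into byte-stable text is to fix the key order and separators at serialization time. The sketch below uses only the standard json module; to_deterministic_json and the dictionary shape are hypothetical and do not describe how to_json() builds its output.

```python
import json


def to_deterministic_json(data):
    """Serialize JSON-compatible data with a stable key order and separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Two dictionaries with the same content but different insertion order.
first = {"blocks": [], "metadata": {"title": "Report", "author": "A. Writer"}}
second = {"metadata": {"author": "A. Writer", "title": "Report"}, "blocks": []}
print(to_deterministic_json(first) == to_deterministic_json(second))  # True
```

Stable text output makes exported documents easy to diff and to use as golden files in tests.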

to_markdown(options=None)[source]

Export this document to Markdown.

Markdown export is intentionally minimal until JSON and AST behavior are stable.

Parameters:

options (MarkdownOptions | None)

Return type:

str

to_rtf()[source]

Export this document to RTF.

Return type:

str

class rtfstruct.ast.Field(instruction, children=<factory>, span=None)[source]

Unknown or unsupported RTF field inline node.

instruction

Raw field instruction.

Type:

str

children

Visible field result content.

Type:

list[rtfstruct.ast.TextRun | rtfstruct.ast.Link | rtfstruct.ast.Field | rtfstruct.ast.FootnoteRef | rtfstruct.ast.AnnotationRef | rtfstruct.ast.ImageInline | rtfstruct.ast.LineBreak | rtfstruct.ast.Tab]

span

Optional source span.

Type:

rtfstruct.ast.SourceSpan | None

class rtfstruct.ast.Footnote(id, blocks=<factory>)[source]

Footnote content stored separately from inline flow.

class rtfstruct.ast.FootnoteRef(id, label=None, span=None)[source]

Inline reference to a footnote stored on the document.

Parameters:
  • id (str)

  • label (str | None)

  • span (SourceSpan | None)

id

Stable internal footnote identifier.

Type:

str

label

Optional visible label.

Type:

str | None

span

Optional source span.

Type:

rtfstruct.ast.SourceSpan | None
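Because footnote content lives on the document and the inline flow holds only an identifier, following a reference is a dictionary lookup. The sketch below uses hypothetical stand-ins (Note, NoteRef, resolve) with simplified string blocks; it is not the library's code, and how rtfstruct treats a dangling reference is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Note:
    """Hypothetical stand-in for Footnote: an id plus simplified text blocks."""
    id: str
    blocks: List[str] = field(default_factory=list)


@dataclass
class NoteRef:
    """Hypothetical stand-in for FootnoteRef."""
    id: str
    label: Optional[str] = None


def resolve(ref: NoteRef, footnotes: Dict[str, Note]) -> Optional[Note]:
    """Return the footnote a reference points at, or None if it is dangling."""
    return footnotes.get(ref.id)


footnotes = {"fn1": Note("fn1", ["See the appendix."])}
hit = resolve(NoteRef("fn1", label="1"), footnotes)
miss = resolve(NoteRef("fn9"), footnotes)
print(hit.blocks if hit else None)
print(miss)
```

The same lookup pattern would apply to AnnotationRef and ImageInline, which also carry only an id.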

class rtfstruct.ast.Image(id, content_type=None, data=None, path=None, alt_text=None, width_twips=None, height_twips=None, goal_width_twips=None, goal_height_twips=None, scale_x=None, scale_y=None)[source]

Image metadata and optional payload extracted from an RTF picture.

Parameters:
  • id (str)

  • content_type (str | None)

  • data (bytes | None)

  • path (str | None)

  • alt_text (str | None)

  • width_twips (int | None)

  • height_twips (int | None)

  • goal_width_twips (int | None)

  • goal_height_twips (int | None)

  • scale_x (int | None)

  • scale_y (int | None)
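The twips fields use the standard RTF unit: one twip is 1/20 of a point, so 1440 twips make one inch. A small conversion sketch follows; the helper names and the 96 DPI default are assumptions chosen for illustration, not part of rtfstruct.

```python
TWIPS_PER_POINT = 20
TWIPS_PER_INCH = 1440


def twips_to_points(twips):
    """Convert twips to points, passing through None for unknown sizes."""
    return None if twips is None else twips / TWIPS_PER_POINT


def twips_to_pixels(twips, dpi=96):
    """Convert twips to whole pixels at the given resolution."""
    return None if twips is None else round(twips * dpi / TWIPS_PER_INCH)


print(twips_to_points(720))    # 36.0
print(twips_to_pixels(1440))   # 96
print(twips_to_pixels(None))   # None
```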

class rtfstruct.ast.ImageInline(id, alt_text=None, span=None)[source]

Inline reference to an image stored on the document.

Parameters:
  • id (str)

  • alt_text (str | None)

  • span (SourceSpan | None)

class rtfstruct.ast.LineBreak(span=None)[source]

Inline line break.

Parameters:

span (SourceSpan | None)

class rtfstruct.ast.Link

Hyperlink inline node.

target

Link target URL.

Type:

str

children

Visible link content.

Type:

list[rtfstruct.ast.TextRun | rtfstruct.ast.Link | rtfstruct.ast.Field | rtfstruct.ast.FootnoteRef | rtfstruct.ast.AnnotationRef | rtfstruct.ast.ImageInline | rtfstruct.ast.LineBreak | rtfstruct.ast.Tab]

instruction

Raw RTF field instruction where available.

Type:

str | None

span

Optional source span.

Type:

rtfstruct.ast.SourceSpan | None

class rtfstruct.ast.ListBlock(ordered, items=<factory>, list_id=None)[source]

Resolved list block assembled after parsing paragraphs.

Parameters:
  • ordered (bool)

  • items (list[ListItem])

  • list_id (int | None)

class rtfstruct.ast.ListItem(blocks=<factory>, level=0, marker=None)[source]

A list item containing one or more block nodes.

blocks

Blocks belonging to this list item.

Type:

list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]

level

Zero-based nesting level.

Type:

int

marker

Optional visible marker text where known.

Type:

str | None

class rtfstruct.ast.Metadata(title=None, subject=None, author=None, keywords=None, comment=None, company=None)[source]

Document metadata extracted from RTF info groups.

Parameters:
  • title (str | None)

  • subject (str | None)

  • author (str | None)

  • keywords (str | None)

  • comment (str | None)

  • company (str | None)

class rtfstruct.ast.Paragraph(children=<factory>, style=<factory>, span=None)[source]

Paragraph block containing inline children.

Parameters:
class rtfstruct.ast.ParagraphStyle(alignment=None, left_indent_twips=None, right_indent_twips=None, first_line_indent_twips=None, space_before_twips=None, space_after_twips=None, style_name=None, list_identity=None, list_level=None)[source]

Paragraph-level formatting and structural metadata.

Parameters:
  • alignment (str | None)

  • left_indent_twips (int | None)

  • right_indent_twips (int | None)

  • first_line_indent_twips (int | None)

  • space_before_twips (int | None)

  • space_after_twips (int | None)

  • style_name (str | None)

  • list_identity (int | None)

  • list_level (int | None)

class rtfstruct.ast.SourceSpan(start, end)[source]

Source range in the original input.

Parameters:
  • start (int)

  • end (int)

start

Inclusive source offset.

Type:

int

end

Exclusive source offset.

Type:

int
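An inclusive start with an exclusive end matches Python slice semantics, so a span can be applied to the source directly and its length is end minus start. In the sketch below, Span and slice_source are hypothetical stand-ins, and the offsets are assumed to index into a Python string; whether rtfstruct counts bytes or characters is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for SourceSpan: [start, end) offsets."""
    start: int
    end: int


def slice_source(source, span):
    """Return the part of the source covered by the half-open span."""
    return source[span.start:span.end]


source = r"{\rtf1 Hello}"
span = Span(7, 12)
print(slice_source(source, span))  # Hello
print(span.end - span.start)       # 5
```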

class rtfstruct.ast.StyleSheet(paragraph_styles=<factory>, text_styles=<factory>)[source]

Document stylesheet placeholder for parsed styles.

class rtfstruct.ast.Tab(span=None)[source]

Inline tab.

Parameters:

span (SourceSpan | None)

class rtfstruct.ast.Table(cells=<factory>, row_count=0, col_count=0)[source]

Resolved coordinate-based table.

Parameters:
  • cells (list[TableCell])

  • row_count (int)

  • col_count (int)

class rtfstruct.ast.TableCell(row, col, blocks=<factory>, rowspan=1, colspan=1, width_twips=None)[source]

Resolved table cell with explicit coordinates.

Parameters:
  • row (int)

  • col (int)

  • blocks (list[Paragraph | ListBlock | Table])

  • rowspan (int)

  • colspan (int)

  • width_twips (int | None)

row

Zero-based row coordinate.

Type:

int

col

Zero-based column coordinate.

Type:

int

blocks

Cell content.

Type:

list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]

rowspan

Number of rows spanned by the cell.

Type:

int

colspan

Number of columns spanned by the cell.

Type:

int

width_twips

Cell width where known.

Type:

int | None
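With explicit coordinates and spans, a consumer can rebuild a row-by-column grid without tracking row boundaries. The sketch below marks which cell covers each grid position; Cell and occupancy_grid are hypothetical names, and the overlap check is an assumption about what a consumer might want, not documented rtfstruct behavior.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for TableCell, keeping only coordinates and spans."""
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def occupancy_grid(cells, row_count, col_count):
    """Map each grid position to the index of the cell that covers it."""
    grid = [[None] * col_count for _ in range(row_count)]
    for index, cell in enumerate(cells):
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                if grid[r][c] is not None:
                    raise ValueError(
                        f"cells {grid[r][c]} and {index} overlap at ({r}, {c})"
                    )
                grid[r][c] = index
    return grid


# A 2x2 table whose first row is a single cell spanning both columns.
cells = [Cell(0, 0, colspan=2), Cell(1, 0), Cell(1, 1)]
grid = occupancy_grid(cells, row_count=2, col_count=2)
print(grid)  # [[0, 0], [1, 2]]
```

Positions left as None after the loop would indicate gaps that no cell covers.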

class rtfstruct.ast.TextRun(text, style=<factory>, span=None)[source]

Styled text segment.

text

Text content.

Type:

str

style

Inline style applied to the text.

Type:

rtfstruct.ast.TextStyle

span

Optional source span.

Type:

rtfstruct.ast.SourceSpan | None

class rtfstruct.ast.TextStyle(bold=False, italic=False, underline=False, strikethrough=False, superscript=False, subscript=False, font_family=None, font_size_half_points=None, foreground=None, background=None)[source]

Inline text style.

TextStyle objects are immutable and are intended to be interned by parser state so repeated style combinations share identity in large documents.

Parameters:
  • bold (bool)

  • italic (bool)

  • underline (bool)

  • strikethrough (bool)

  • superscript (bool)

  • subscript (bool)

  • font_family (str | None)

  • font_size_half_points (int | None)

  • foreground (Color | None)

  • background (Color | None)

class rtfstruct.ast.TextStyleInterner[source]

Intern immutable TextStyle values by field tuple.

intern(style)[source]

Return a shared instance equivalent to style.

Parameters:

style (TextStyle) – Style value to intern.

Returns:

The canonical style instance.

Return type:

TextStyle

with_changes(style, **changes)[source]

Return an interned copy of style with dataclass field changes.

Return type:

TextStyle
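Interning by field tuple can be sketched with a frozen dataclass and a dictionary keyed on the field values. Style and StyleInterner below are hypothetical stand-ins with only two fields; they illustrate the technique under that assumption and are not the rtfstruct implementation.

```python
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class Style:
    """Hypothetical stand-in for TextStyle with two of its boolean fields."""
    bold: bool = False
    italic: bool = False


class StyleInterner:
    """Keep one shared instance per distinct combination of field values."""

    def __init__(self):
        self._cache = {}

    def intern(self, style):
        # The first instance seen for a field tuple becomes the canonical one.
        return self._cache.setdefault(astuple(style), style)

    def with_changes(self, style, **changes):
        # Build the modified copy, then return its canonical instance.
        return self.intern(replace(style, **changes))


interner = StyleInterner()
plain = interner.intern(Style())
bold_a = interner.with_changes(plain, bold=True)
bold_b = interner.intern(Style(bold=True))
print(bold_a is bold_b)  # True
print(bold_a is plain)   # False
```

Sharing instances this way means a large document with few distinct style combinations holds few style objects, and equal styles can be compared by identity.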