Document AST
The AST is the canonical representation shared by the reader, the exporters, and the RTF writer. It is intentionally independent of the original RTF token stream and of any single output format.
Core Nodes
Documents contain block nodes such as paragraphs, lists, and tables. Paragraphs contain inline nodes such as text runs, fields, links, references, tabs, line breaks, and images.
Semantic Equality
Document.semantic_equals() compares document meaning rather than incidental
source details. It coalesces adjacent compatible text runs and ignores source
spans, while still comparing structure, styles, metadata, notes, images, tables,
lists, and paragraph formatting.
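The normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not rtfstruct's implementation: `Style`, `Run`, `normalize`, and `runs_semantically_equal` are hypothetical stand-ins, and "compatible" is assumed here to mean "equal style".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Style:
    """Hypothetical stand-in for an inline style (two flags only)."""
    bold: bool = False
    italic: bool = False


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a text run with an optional source span."""
    text: str
    style: Style = Style()
    span: Optional[Tuple[int, int]] = None


def normalize(runs):
    """Merge adjacent runs that share a style and drop their source spans."""
    merged = []
    for run in runs:
        if merged and merged[-1].style == run.style:
            # Same style as the previous run: concatenate the text.
            merged[-1] = Run(merged[-1].text + run.text, run.style)
        else:
            # Style changed (or first run): start a new span-free run.
            merged.append(Run(run.text, run.style))
    return merged


def runs_semantically_equal(left, right):
    """Compare two run sequences after normalization."""
    return normalize(left) == normalize(right)


a = [Run("Hello, ", span=(0, 7)), Run("world", span=(7, 12))]
b = [Run("Hello, world", span=(3, 15))]
c = [Run("Hello, "), Run("world", Style(bold=True))]

print(runs_semantically_equal(a, b))  # True: split point and spans are ignored
print(runs_semantically_equal(a, c))  # False: the bold run changes the meaning
```

Normalizing both sides before comparing keeps the comparison independent of how the reader happened to split the text.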
Document AST for parsed RTF.
The AST is the canonical internal representation of an RTF document. It is independent of Markdown, Qt, and the original source byte layout. Reader modules produce these nodes; JSON, Markdown, and RTF writer modules consume them.
- class rtfstruct.ast.Annotation(id, blocks=<factory>, author=None)
Annotation or comment content stored separately from inline flow.
- class rtfstruct.ast.AnnotationRef(id, label=None, span=None)
Inline reference to an annotation stored on the document.
- Parameters:
id (str)
label (str | None)
span (SourceSpan | None)
- id
Stable internal annotation identifier.
- Type:
str
- label
Optional visible label.
- Type:
str | None
- span
Optional source span.
- Type:
rtfstruct.ast.SourceSpan | None
- class rtfstruct.ast.Color(red, green, blue, alpha=None)
RGB color value.
- Parameters:
red (int)
green (int)
blue (int)
alpha (int | None)
- red
Red channel from 0 to 255.
- Type:
int
- green
Green channel from 0 to 255.
- Type:
int
- blue
Blue channel from 0 to 255.
- Type:
int
- alpha
Optional alpha channel from 0 to 255.
- Type:
int | None
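As an illustration of the channel ranges above, the sketch below validates each channel and formats the value as a hex string. `Rgb` and `to_hex` are hypothetical names for this example; the documented Color class is not stated to perform validation or provide such a method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rgb:
    """Hypothetical stand-in mirroring the documented Color fields."""
    red: int
    green: int
    blue: int
    alpha: Optional[int] = None

    def __post_init__(self):
        # Every channel that is present must fall in the 0..255 range.
        for name in ("red", "green", "blue", "alpha"):
            value = getattr(self, name)
            if value is not None and not 0 <= value <= 255:
                raise ValueError(f"{name} must be between 0 and 255, got {value}")

    def to_hex(self):
        """Format as #rrggbb, or #rrggbbaa when alpha is set."""
        base = f"#{self.red:02x}{self.green:02x}{self.blue:02x}"
        return base if self.alpha is None else f"{base}{self.alpha:02x}"


print(Rgb(255, 128, 0).to_hex())         # #ff8000
print(Rgb(0, 0, 0, alpha=255).to_hex())  # #000000ff
```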
- class rtfstruct.ast.Document(blocks=<factory>, metadata=<factory>, styles=<factory>, footnotes=<factory>, annotations=<factory>, images=<factory>, diagnostics=<factory>)
Structured representation of a parsed RTF document.
- Parameters:
blocks (list[Paragraph | ListBlock | Table])
metadata (Metadata)
styles (StyleSheet)
footnotes (dict[str, Footnote])
annotations (dict[str, Annotation])
images (dict[str, Image])
diagnostics (list[Diagnostic])
- blocks
Top-level document blocks in reading order.
- Type:
list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]
- metadata
Document metadata extracted from RTF info groups.
- Type:
rtfstruct.ast.Metadata
- styles
Style definitions detected or inferred during parsing.
- Type:
rtfstruct.ast.StyleSheet
- footnotes
Footnotes keyed by stable internal identifier.
- Type:
dict[str, rtfstruct.ast.Footnote]
- annotations
Comments or annotations keyed by stable internal identifier.
- Type:
dict[str, rtfstruct.ast.Annotation]
- images
Extracted or represented images keyed by stable internal identifier.
- Type:
dict[str, rtfstruct.ast.Image]
- diagnostics
Recoverable parser warnings and errors.
- Type:
list[Diagnostic]
- semantic_equals(other)
Return whether another document is semantically equivalent.
The initial implementation normalizes adjacent same-style text runs and compares supported Milestone 1 structure.
- Parameters:
other (Document)
- Return type:
bool
- to_json(options=None)
Export this document to deterministic JSON-compatible data.
- Parameters:
options (JsonOptions | None)
- Return type:
dict[str, object]
- to_markdown(options=None)
Export this document to Markdown.
Markdown export is intentionally minimal until JSON and AST behavior are stable.
- Parameters:
options (MarkdownOptions | None)
- Return type:
str
- class rtfstruct.ast.Field(instruction, children=<factory>, span=None)
Unknown or unsupported RTF field inline node.
- Parameters:
instruction (str)
children (list[TextRun | Link | Field | FootnoteRef | AnnotationRef | ImageInline | LineBreak | Tab])
span (SourceSpan | None)
- instruction
Raw field instruction.
- Type:
str
- children
Visible field result content.
- span
Optional source span.
- Type:
rtfstruct.ast.SourceSpan | None
- class rtfstruct.ast.Footnote(id, blocks=<factory>)
Footnote content stored separately from inline flow.
- class rtfstruct.ast.FootnoteRef(id, label=None, span=None)
Inline reference to a footnote stored on the document.
- Parameters:
id (str)
label (str | None)
span (SourceSpan | None)
- id
Stable internal footnote identifier.
- Type:
str
- label
Optional visible label.
- Type:
str | None
- span
Optional source span.
- Type:
rtfstruct.ast.SourceSpan | None
- class rtfstruct.ast.Image(id, content_type=None, data=None, path=None, alt_text=None, width_twips=None, height_twips=None, goal_width_twips=None, goal_height_twips=None, scale_x=None, scale_y=None)
Image metadata and optional payload extracted from an RTF picture.
- Parameters:
id (str)
content_type (str | None)
data (bytes | None)
path (str | None)
alt_text (str | None)
width_twips (int | None)
height_twips (int | None)
goal_width_twips (int | None)
goal_height_twips (int | None)
scale_x (int | None)
scale_y (int | None)
- class rtfstruct.ast.ImageInline(id, alt_text=None, span=None)
Inline reference to an image stored on the document.
- Parameters:
id (str)
alt_text (str | None)
span (SourceSpan | None)
- class rtfstruct.ast.LineBreak(span=None)
Inline line break.
- Parameters:
span (SourceSpan | None)
- class rtfstruct.ast.Link(target, children=<factory>, instruction=None, span=None)
Hyperlink inline node.
- Parameters:
target (str)
children (list[TextRun | Link | Field | FootnoteRef | AnnotationRef | ImageInline | LineBreak | Tab])
instruction (str | None)
span (SourceSpan | None)
- target
Link target URL.
- Type:
str
- children
Visible link content.
- instruction
Raw RTF field instruction where available.
- Type:
str | None
- span
Optional source span.
- Type:
rtfstruct.ast.SourceSpan | None
- class rtfstruct.ast.ListBlock(ordered, items=<factory>, list_id=None)
Resolved list block assembled after parsing paragraphs.
- Parameters:
ordered (bool)
items (list[ListItem])
list_id (int | None)
- class rtfstruct.ast.ListItem(blocks=<factory>, level=0, marker=None)
A list item containing one or more block nodes.
- blocks
Blocks belonging to this list item.
- Type:
list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]
- level
Zero-based nesting level.
- Type:
int
- marker
Optional visible marker text where known.
- Type:
str | None
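To illustrate how the zero-based level and the optional marker might be consumed, here is a small plain-text renderer. It is a sketch only: `Item` and `render_outline` are hypothetical, each item is reduced to a single string instead of block nodes, and the fallback markers are an arbitrary choice for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a list item, reduced to one line of text."""
    text: str
    level: int = 0
    marker: Optional[str] = None


def render_outline(items, ordered=False, indent="  "):
    """Indent each item by its level; fall back to a default marker."""
    default_marker = "1." if ordered else "-"
    lines = []
    for item in items:
        marker = item.marker if item.marker is not None else default_marker
        lines.append(f"{indent * item.level}{marker} {item.text}")
    return lines


items = [
    Item("Fruit"),
    Item("Apple", level=1),
    Item("Pear", level=1, marker="*"),
]
for line in render_outline(items):
    print(line)
# - Fruit
#   - Apple
#   * Pear
```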
- class rtfstruct.ast.Metadata(title=None, subject=None, author=None, keywords=None, comment=None, company=None)
Document metadata extracted from RTF info groups.
- Parameters:
title (str | None)
subject (str | None)
author (str | None)
keywords (str | None)
comment (str | None)
company (str | None)
- class rtfstruct.ast.Paragraph(children=<factory>, style=<factory>, span=None)
Paragraph block containing inline children.
- Parameters:
children (list[TextRun | Link | Field | FootnoteRef | AnnotationRef | ImageInline | LineBreak | Tab])
style (ParagraphStyle)
span (SourceSpan | None)
- class rtfstruct.ast.ParagraphStyle(alignment=None, left_indent_twips=None, right_indent_twips=None, first_line_indent_twips=None, space_before_twips=None, space_after_twips=None, style_name=None, list_identity=None, list_level=None)
Paragraph-level formatting and structural metadata.
- Parameters:
alignment (str | None)
left_indent_twips (int | None)
right_indent_twips (int | None)
first_line_indent_twips (int | None)
space_before_twips (int | None)
space_after_twips (int | None)
style_name (str | None)
list_identity (int | None)
list_level (int | None)
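Several fields here are measured in twips. A twip is one twentieth of a point, so 1440 twips make one inch. The helpers below are a hypothetical convenience for consumers of these values and are not part of the documented API; they pass None through unchanged because every twips field is optional.

```python
from typing import Optional

TWIPS_PER_POINT = 20
TWIPS_PER_INCH = 1440


def twips_to_points(twips: Optional[int]) -> Optional[float]:
    """Convert twips to typographic points, preserving None."""
    return None if twips is None else twips / TWIPS_PER_POINT


def twips_to_inches(twips: Optional[int]) -> Optional[float]:
    """Convert twips to inches, preserving None."""
    return None if twips is None else twips / TWIPS_PER_INCH


left_indent_twips = 720
print(twips_to_points(left_indent_twips))  # 36.0
print(twips_to_inches(left_indent_twips))  # 0.5
print(twips_to_points(None))               # None
```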
- class rtfstruct.ast.SourceSpan(start, end)
Source range in the original input.
- Parameters:
start (int)
end (int)
- start
Inclusive source offset.
- Type:
int
- end
Exclusive source offset.
- Type:
int
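Because start is inclusive and end is exclusive, a span maps directly onto a Python slice. The sketch below uses a hypothetical `Span` stand-in and assumes the offsets index into the raw input bytes; the page does not say whether offsets count bytes or characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in: start is inclusive, end is exclusive."""
    start: int
    end: int

    def length(self) -> int:
        # Half-open ranges make the length a plain subtraction.
        return self.end - self.start

    def slice(self, source: bytes) -> bytes:
        # Assumption: offsets index into the raw input bytes.
        return source[self.start:self.end]


source = rb"{\rtf1 Hello}"
span = Span(start=7, end=12)
print(span.slice(source))  # b'Hello'
print(span.length())       # 5
```

With half-open ranges, two adjacent spans such as (0, 7) and (7, 12) abut without overlapping.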
- class rtfstruct.ast.StyleSheet(paragraph_styles=<factory>, text_styles=<factory>)
Document stylesheet placeholder for parsed styles.
- Parameters:
paragraph_styles (dict[str, ParagraphStyle])
text_styles (dict[str, TextStyle])
- class rtfstruct.ast.Tab(span=None)
Inline tab.
- Parameters:
span (SourceSpan | None)
- class rtfstruct.ast.Table(cells=<factory>, row_count=0, col_count=0)
Resolved coordinate-based table.
- Parameters:
cells (list[TableCell])
row_count (int)
col_count (int)
- class rtfstruct.ast.TableCell(row, col, blocks=<factory>, rowspan=1, colspan=1, width_twips=None)
Resolved table cell with explicit coordinates.
- Parameters:
row (int)
col (int)
blocks (list[Paragraph | ListBlock | Table])
rowspan (int)
colspan (int)
width_twips (int | None)
- row
Zero-based row coordinate.
- Type:
int
- col
Zero-based column coordinate.
- Type:
int
- blocks
Cell content.
- Type:
list[rtfstruct.ast.Paragraph | rtfstruct.ast.ListBlock | rtfstruct.ast.Table]
- rowspan
Number of rows spanned by the cell.
- Type:
int
- colspan
Number of columns spanned by the cell.
- Type:
int
- width_twips
Cell width where known.
- Type:
int | None
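A flat list of cells with explicit coordinates and spans can be expanded into a row-major grid when a consumer needs one. The following is a minimal sketch under stated assumptions: `Cell` and `to_grid` are hypothetical, cell content is reduced to a string, and a spanning cell is assumed to occupy every slot it covers.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a cell with zero-based coordinates."""
    row: int
    col: int
    text: str
    rowspan: int = 1
    colspan: int = 1


def to_grid(cells, row_count, col_count):
    """Return a row-major grid; each slot holds its covering cell or None."""
    grid = [[None] * col_count for _ in range(row_count)]
    for cell in cells:
        # Fill every slot covered by the cell, including spanned ones.
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                if grid[r][c] is not None:
                    raise ValueError(f"cells overlap at ({r}, {c})")
                grid[r][c] = cell
    return grid


cells = [
    Cell(0, 0, "Header", colspan=2),
    Cell(1, 0, "left"),
    Cell(1, 1, "right"),
]
grid = to_grid(cells, row_count=2, col_count=2)
for row in grid:
    print([cell.text if cell else "" for cell in row])
# ['Header', 'Header']
# ['left', 'right']
```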
- class rtfstruct.ast.TextRun(text, style=<factory>, span=None)
Styled text segment.
- Parameters:
text (str)
style (TextStyle)
span (SourceSpan | None)
- text
Text content.
- Type:
str
- style
Inline style applied to the text.
- Type:
rtfstruct.ast.TextStyle
- span
Optional source span.
- Type:
rtfstruct.ast.SourceSpan | None
- class rtfstruct.ast.TextStyle(bold=False, italic=False, underline=False, strikethrough=False, superscript=False, subscript=False, font_family=None, font_size_half_points=None, foreground=None, background=None)
Inline text style.
TextStyle objects are immutable and are intended to be interned by parser state so repeated style combinations share identity in large documents.
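Interning of immutable styles can be sketched as below. This is an illustration, not the parser's actual mechanism: `InternedStyle` and `StylePool` are hypothetical names, and only three of the documented fields are mirrored. A frozen dataclass is hashable, so equal style values can share a single instance through a dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternedStyle:
    """Hypothetical immutable style; frozen dataclasses are hashable."""
    bold: bool = False
    italic: bool = False
    font_size_half_points: Optional[int] = None


class StylePool:
    """Hypothetical pool that returns one shared instance per style value."""

    def __init__(self):
        self._pool = {}

    def intern(self, style):
        # setdefault stores the first instance seen and returns it afterwards.
        return self._pool.setdefault(style, style)

    def __len__(self):
        return len(self._pool)


pool = StylePool()
first = pool.intern(InternedStyle(bold=True, font_size_half_points=24))
second = pool.intern(InternedStyle(bold=True, font_size_half_points=24))
print(first is second)  # True: both names refer to the same object
print(len(pool))        # 1
```

Sharing instances this way trades a dictionary lookup per style change for fewer style objects held in memory across a large document.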