Performance
rtfstruct uses streaming tokenization, list-backed text buffering, capped
diagnostics, and parser safety limits to avoid common malformed-input blowups.
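As a rough illustration of what list-backed text buffering with a character cap can look like, here is a minimal sketch. The `TextBuffer` class and its methods are hypothetical names made up for this example and are not part of rtfstruct; the idea is only that fragments are collected in a list and joined once, instead of growing a string with repeated concatenation.

```python
class TextBuffer:
    """Hypothetical buffer: collects fragments in a list and joins them once."""

    def __init__(self, max_chars: int) -> None:
        self._parts: list[str] = []
        self._length = 0
        self._max_chars = max_chars

    def append(self, fragment: str) -> bool:
        """Store as much of the fragment as fits; return False if any was cut."""
        remaining = self._max_chars - self._length
        if remaining <= 0:
            return False
        piece = fragment[:remaining]
        self._parts.append(piece)
        self._length += len(piece)
        return len(piece) == len(fragment)

    def text(self) -> str:
        # A single join avoids the quadratic cost of repeated string concatenation.
        return "".join(self._parts)
```

A caller could treat a `False` return as the point to record one diagnostic and stop emitting text.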
Guardrails
- ParserOptions.max_group_depth caps nested RTF groups.
- ParserOptions.max_document_chars caps emitted document text.
- ParserOptions.max_diagnostics caps retained diagnostics.
- Image payloads are captured only through explicit parser state and can be omitted from JSON output.
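The sketch below shows how a depth cap and a diagnostics cap can bound the work done on malformed input. It is a standalone toy scanner, not rtfstruct's parser: `Limits`, `ScanResult`, and `scan_groups` are hypothetical names, and the default values are illustrative assumptions. Only the field names `max_group_depth` and `max_diagnostics` mirror the options listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Limits:
    # Hypothetical stand-in for ParserOptions; defaults are illustrative only.
    max_group_depth: int = 100
    max_diagnostics: int = 50


@dataclass
class ScanResult:
    max_depth_seen: int = 0
    diagnostics: list[str] = field(default_factory=list)
    dropped_diagnostics: int = 0
    aborted: bool = False


def scan_groups(rtf: str, limits: Limits) -> ScanResult:
    """Track brace nesting, stopping once the depth cap is exceeded."""
    result = ScanResult()
    depth = 0

    def report(message: str) -> None:
        # Retain at most max_diagnostics messages; count the rest.
        if len(result.diagnostics) < limits.max_diagnostics:
            result.diagnostics.append(message)
        else:
            result.dropped_diagnostics += 1

    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
            if depth > limits.max_group_depth:
                report(f"group depth exceeds {limits.max_group_depth} at offset {i}")
                result.aborted = True
                break
            result.max_depth_seen = max(result.max_depth_seen, depth)
        elif ch == "}":
            if depth == 0:
                report(f"unbalanced closing brace at offset {i}")
            else:
                depth -= 1
        i += 1
    return result
```

With these caps, a document made of thousands of opening braces stops after `max_group_depth + 1` characters, and a document full of stray closing braces retains a fixed number of messages regardless of input size.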
Benchmark
A dependency-free generated corpus benchmark is available:
python benchmarks/parse_generated.py --paragraphs 10000
The benchmark is intentionally simple. It is meant for local regression checks before adopting heavier benchmark tooling.
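For readers who want a feel for this kind of check, the following is a minimal sketch of a generated-corpus timing loop using only the standard library. It is not the contents of benchmarks/parse_generated.py: `generate_rtf` and `time_parse` are hypothetical helpers, and the workload timed at the bottom is a stand-in that should be swapped for the real parser entry point.

```python
import time


def generate_rtf(paragraphs: int) -> str:
    """Build a simple synthetic RTF document with the given paragraph count."""
    body = "".join(
        f"Paragraph {n} of generated text.\\par\n" for n in range(paragraphs)
    )
    return "{\\rtf1\\ansi " + body + "}"


def time_parse(parse, document: str, repeats: int = 3) -> float:
    """Return the best wall-clock time, in seconds, over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(document)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    doc = generate_rtf(10_000)
    # Stand-in workload (assumption); replace with the actual parse call.
    elapsed = time_parse(lambda text: text.count("\\par"), doc)
    print(f"{len(doc)} chars, best of 3 runs: {elapsed:.4f}s")
```

Taking the best of a few runs reduces noise from other processes, which is usually enough for spotting local regressions.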