Module parse

Parsing support for the network document meta-format

The meta-format used by Tor network documents evolved over time from a legacy line-oriented format. It’s described more fully in Tor’s dir-spec.txt.

In brief, a network document is a sequence of tokenize::Items. Each Item starts with a keyword::Keyword, takes a number of arguments on the same line, and is optionally followed by a PEM-like base64-encoded object.
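For orientation, two consecutive items in this format might look like the following (keywords taken from dir-spec; the key material is truncated and illustrative). The first item has a keyword and five arguments; the second has a keyword, no arguments, and a PEM-like object:

```text
router moria1 128.31.0.34 9101 0 9131
signing-key
-----BEGIN RSA PUBLIC KEY-----
...
-----END RSA PUBLIC KEY-----
```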

Individual document types define further restrictions on the Items. They may require Items with a particular keyword to have a certain number of arguments, to have (or not have) a particular kind of object, to appear a certain number of times, and so on.

More complex documents can be divided into parser::Sections. A Section might correspond to the header or footer of a longer document, or to a single stanza in a longer document.

To parse a document into a Section, the programmer defines a type of keyword that the document will use, using the decl_keyword! macro. The programmer then defines a parser::SectionRules object, containing a rules::TokenFmt describing the rules for each allowed keyword in the section. Finally, the programmer uses a tokenize::NetDocReader to tokenize the document, passing the stream of tokens to the SectionRules object to validate and parse it into a Section.

For multiple-section documents, this crate uses Itertools::peeking_take_while (via a NetDocReader::pause_at convenience method) and a batching_split_before module, which can split a document's item iterator into sections.

Modules

keyword 🔒
Declaration for the Keyword trait.
macros 🔒
Declares macros to help implementing parsers.
parser 🔒
Based on a set of rules, validate a token stream and collect the tokens by type.
rules 🔒
Keywords for interpreting items and rules for validating them.
tokenize 🔒
Break a string into a set of directory-object Items.