Parsing support for the network document meta-format
The meta-format used by Tor network documents evolved over time from a legacy line-oriented format. It’s described more fully in Tor’s dir-spec.txt.
In brief, a network document is a sequence of tokenize::Items. Each Item starts with a keyword::Keyword, takes a number of arguments on the same line, and is optionally followed by a PEM-like base64-encoded object.
Individual document types define further restrictions on the Items. They may require Items with a particular keyword to have a certain number of arguments, to have (or not have) a particular kind of object, to appear a certain number of times, and so on.
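As a simplified illustration of the item structure described above, the following toy model shows a keyword line being split into a keyword and its arguments. This is a sketch only: the `Item` type and `parse_item_line` function here are hypothetical, not the crate's actual `tokenize` types, and the `valid-until` line is an invented sample.

```rust
/// A simplified model of a network-document Item: a keyword, its
/// same-line arguments, and an optional PEM-like base64 object.
/// (Illustrative only; not the types used by the tokenize module.)
#[derive(Debug, PartialEq)]
struct Item<'a> {
    keyword: &'a str,
    args: Vec<&'a str>,
    object: Option<&'a str>, // base64 body between BEGIN/END lines, if any
}

/// Parse a single keyword line into an Item (object handling omitted).
fn parse_item_line(line: &str) -> Item<'_> {
    let mut words = line.split_ascii_whitespace();
    Item {
        keyword: words.next().unwrap_or(""),
        args: words.collect(),
        object: None,
    }
}

fn main() {
    // A hypothetical item line in the meta-format described above.
    let item = parse_item_line("valid-until 2024-01-01 00:00:00");
    assert_eq!(item.keyword, "valid-until");
    assert_eq!(item.args, vec!["2024-01-01", "00:00:00"]);
    println!("{:?}", item);
}
```

A document-type parser would then check restrictions on such items, e.g. that `valid-until` takes exactly two arguments and appears exactly once.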
More complex documents can be divided into parser::Sections. A Section might correspond to the header or footer of a longer document, or to a single stanza in a longer document.
To parse a document into a Section, the programmer defines a type of keyword that the document will use, using the decl_keyword! macro. The programmer then defines a parser::SectionRules object, containing a rules::TokenFmt describing the rules for each allowed keyword in the section. Finally, the programmer uses a tokenize::NetDocReader to tokenize the document, passing the stream of tokens to the SectionRules object to validate and parse it into a Section.
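In outline, that workflow looks roughly like the sketch below. This is illustrative pseudocode: the keyword names, rule-builder methods, and signatures are assumptions, and these modules are private to the crate, so the sketch will not compile as written.

```rust
// Illustrative sketch only; names and method signatures are assumed.

// 1. Declare the keywords this document type recognizes.
decl_keyword! {
    ExampleKwd {
        "example-version" => EXAMPLE_VERSION,
        "signing-key" => SIGNING_KEY,
    }
}

// 2. Build a SectionRules object: one TokenFmt rule per allowed keyword.
let mut rules = SectionRules::builder();
rules.add(EXAMPLE_VERSION.rule().required().args(1..=1));
rules.add(SIGNING_KEY.rule().required().obj_required());
let rules = rules.build();

// 3. Tokenize the document and validate the token stream into a Section.
let mut reader = NetDocReader::new(document_text);
let section = rules.parse(&mut reader)?;
let version = section.required(EXAMPLE_VERSION)?.arg(0);
```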
For multiple-section documents, this crate uses Itertools::peeking_take_while (via a pause_at convenience method on NetDocReader) and a batching_split_before module, which can split a document item iterator into sections.
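The idea behind that split can be shown with a self-contained toy (not the crate's batching_split_before implementation): start a new section whenever a designated section-start keyword appears. The keywords below are invented for illustration.

```rust
/// Toy illustration of splitting a flat stream of item keywords into
/// sections, starting a new section before each section-start keyword.
/// (Not the crate's batching_split_before implementation.)
fn split_before<'a>(items: &[&'a str], is_start: fn(&str) -> bool) -> Vec<Vec<&'a str>> {
    let mut sections: Vec<Vec<&'a str>> = Vec::new();
    for &item in items {
        if sections.is_empty() || is_start(item) {
            sections.push(Vec::new());
        }
        sections.last_mut().unwrap().push(item);
    }
    sections
}

fn main() {
    // Hypothetical keywords: "r" begins each per-router section.
    let items = ["network-status-version", "r", "s", "w", "r", "s"];
    let sections = split_before(&items, |kw| kw == "r");
    assert_eq!(sections.len(), 3);
    assert_eq!(sections[1], vec!["r", "s", "w"]);
    println!("{} sections", sections.len());
}
```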
Modules
- keyword 🔒: Declaration for the Keyword trait.
- macros 🔒: Declares macros to help with implementing parsers.
- parser 🔒: Based on a set of rules, validate a token stream and collect the tokens by type.
- rules 🔒: Keywords for interpreting items, and rules for validating them.
- tokenize 🔒: Break a string into a set of directory-object Items.