[section {PEG serialization format}]

Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.

[para]

We distinguish between [term regular] and [term canonical]
serializations. While a PEG may have more than one regular
serialization, exactly one of them is [term canonical].

[list_begin definitions][comment {-- serializations --}]
[def {regular serialization}]

[list_begin enumerated][comment {-- regular points --}]
[enum]
The serialization of any PEG is a nested Tcl dictionary.

[enum]
This dictionary holds a single key, [const pt::grammar::peg], and its
value. This value holds the contents of the grammar.

[enum]
The contents of the grammar are a Tcl dictionary holding the set of
nonterminal symbols and the starting expression. The relevant keys and
their values are

[list_begin definitions][comment {-- grammar keywords --}]
[def [const rules]]

The value is a Tcl dictionary whose keys are the names of the
nonterminal symbols known to the grammar.

[list_begin enumerated][comment {-- nonterminals --}]
[enum]
Each nonterminal symbol may occur only once.

[enum]
The empty string is not a legal nonterminal symbol.

[enum]
The value for each symbol is itself a Tcl dictionary. The relevant
keys and their values in this dictionary are

[list_begin definitions][comment {-- nonterminal keywords --}]
[def [const is]]

The value is the serialization of the parsing expression describing
the symbol's sentential structure, as specified in the section
[sectref {PE serialization format}].

[def [const mode]]

The value can be one of three values specifying how a parser should
handle the semantic value produced by the symbol.

[include ../modes.inc]
[list_end][comment {-- nonterminal keywords --}]
[list_end][comment {-- nonterminals --}]

[def [const start]]

The value is the serialization of the start parsing expression of the
grammar, as specified in the section [sectref {PE serialization format}].
[list_end][comment {-- grammar keywords --}]

[enum]
The terminal symbols of the grammar are specified implicitly as the
set of all terminal symbols used in the start expression and on the
right-hand side (RHS) of the grammar rules.

[list_end][comment {-- regular points --}]

[def {canonical serialization}]

The canonical serialization of a grammar has the format specified in
the previous item, and additionally satisfies the constraints below,
which make it unique among all the possible serializations of this
grammar.

[list_begin enumerated][comment {-- canonical points --}]
[enum]
The keys found in all the nested Tcl dictionaries are sorted in
ascending dictionary order, as generated by Tcl's builtin command
[cmd {lsort -increasing -dict}].

[enum]
The string representation of the value is the canonical representation
of a Tcl dictionary, i.e. it does not contain superfluous whitespace.

[list_end][comment {-- canonical points --}]
[list_end][comment {-- serializations --}]

[subsection Example]

Assuming the following PEG for simple mathematical expressions

[para]
[include ../example/expr_peg.inc]
[para]

then its canonical serialization (except for whitespace) is

[para]
[include ../example/expr_serial.inc]
[para]
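The structural rules for regular and canonical serializations given
above can be sketched in plain Tcl. This is an illustrative sketch
only, not the implementation used by the Parser Tools. The command
names [cmd peg_is_regular] and [cmd peg_sort_keys] are hypothetical
helpers introduced here. The sketch assumes that the three modes are
[const value], [const leaf], and [const void], and it does not inspect
or normalize the parsing expressions stored under [const is] and
[const start], so the result of [cmd peg_sort_keys] is canonical only
if those expressions already are.

[example {
# Hypothetical helper, not part of the Parser Tools API.
# Returns 1 if $serial satisfies the structural rules for a regular
# serialization, 0 otherwise. Parsing expressions are not inspected.
proc peg_is_regular {serial} {
    # A dictionary holding the single key pt::grammar::peg.
    if {[catch {dict size $serial} n] || $n != 1} { return 0 }
    if {![dict exists $serial pt::grammar::peg]} { return 0 }
    set g [dict get $serial pt::grammar::peg]

    # The contents provide the keys 'rules' and 'start'.
    if {[catch {dict size $g}]} { return 0 }
    if {![dict exists $g rules] || ![dict exists $g start]} { return 0 }
    set rules [dict get $g rules]
    if {[catch {dict size $rules} n]} { return 0 }

    # Each symbol may occur only once. A dict silently merges
    # duplicate keys, so compare against the raw list length.
    if {[llength $rules] != 2 * $n} { return 0 }

    dict for {sym def} $rules {
        # The empty string is not a legal nonterminal symbol.
        if {$sym eq ""} { return 0 }
        if {[catch {dict size $def}]} { return 0 }
        if {![dict exists $def is] || ![dict exists $def mode]} { return 0 }
        # Assumed mode names, see the lead-in.
        if {[dict get $def mode] ni {value leaf void}} { return 0 }
    }
    return 1
}

# Hypothetical helper, not part of the Parser Tools API.
# Rebuilds a regular serialization with all dictionary keys in the
# order produced by 'lsort -increasing -dict'. The freshly built
# dictionaries carry no superfluous whitespace of their own.
proc peg_sort_keys {serial} {
    set g [dict get $serial pt::grammar::peg]
    set rules {}
    foreach sym [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $sym]
        # Sorted key order inside a rule: is, mode.
        dict set rules $sym [dict create is [dict get $def is] mode [dict get $def mode]]
    }
    # Sorted key order inside the grammar: rules, start.
    set sorted [dict create rules $rules start [dict get $g start]]
    return [dict create pt::grammar::peg $sorted]
}
}]

[para]

For instance, a serialization whose keys are listed as [const start]
before [const rules], and [const mode] before [const is], passes
[cmd peg_is_regular] unchanged, while [cmd peg_sort_keys] returns an
equivalent value with the keys reordered to [const rules],
[const start] and [const is], [const mode].

[para]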