Annotated Network and Tree grammar

Tim Vaughan edited this page Mar 16, 2015 · 7 revisions

The following grammar precisely describes the annotated tree/network format used in the NEXUS files produced by MASTER. The format is largely based on the NEXUS meta-comment format used by BEAST and FigTree.

In addition, this grammar allows for hybrid nodes as required by the Extended Newick format. Unlike that specification, however, we explicitly allow for multiple root nodes, which is important for the representation of general inheritance graphs.
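As a rough illustration of these two features, the Python sketch below splits a network string into its root nodes and counts how often each hybrid index occurs. The helper names `split_roots` and `hybrid_counts` and the example string are hypothetical, not part of MASTER. The sketch also assumes that the string has no leading type prefix and no quoted strings containing delimiters.

```python
import re
from collections import Counter

# Hybrid marker as in the HYBRID rule: "#", optional type, then an INTEGER.
HYBRID = re.compile(r"#(?:H|R|LGT)?(0|[1-9][0-9]*)")


def split_roots(network):
    """Split a network string into its top-level (root) node substrings."""
    body = network.strip().rstrip(";")
    roots, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch in "([{":
            depth += 1          # entering a subtree, annotation or vector
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            roots.append(body[start:i].strip())  # comma between two roots
            start = i + 1
    roots.append(body[start:].strip())
    return roots


def hybrid_counts(network):
    """Count occurrences of each hybrid index (assumes no '#' in quoted strings)."""
    return Counter(int(m.group(1)) for m in HYBRID.finditer(network))


# Hypothetical network: two roots R1 and R2 that share hybrid node X#H1.
example = "((A:1,X#H1:1)B:1)R1,(X#H1:1,C:1)R2;"
roots = split_roots(example)
counts = hybrid_counts(example)
```

Here a hybrid index that occurs more than once marks a node reachable along several paths, which is how the Extended Newick convention encodes reticulation.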

The grammar is written in EBNF, extended with regexp-style character sets such as [0-9] and [a-zA-Z]. Terminals are enclosed in quotation marks.

NETWORK ::= ("[&" TYPE "]")? (NODE ("," NODE)*)? ";"

TYPE ::= [RU]

NODE ::= INTERNAL | LEAF
INTERNAL ::= "(" NODE ("," NODE)* ")" POST
LEAF ::= POST
POST ::= LABEL? HYBRID? ANNOT? (":" NUMBER)?

HYBRID ::= "#" ("H" | "R" | "LGT")? INTEGER
ANNOT ::= "[&" ATTRIB ("," ATTRIB)* "]"
ATTRIB ::= STRING "=" VALUE
VALUE ::= NUMBER | STRING | VECTOR
VECTOR ::= "{" (NUMBER | VECTOR) ("," (NUMBER | VECTOR))* "}"

DIGIT  ::= [0-9]
INTEGER ::= "0" | [1-9] DIGIT*
NUMBER ::= DIGIT+ ("." DIGIT+)? ([eE]DIGIT+)?
STRING ::= [a-zA-Z0-9|*%/.+-]+
       | "\"" [^"]+ "\""
       | "'" [^']+ "'"

Whitespace that is not matched by one of the rules above (that is, any whitespace outside a quoted STRING) is ignored.