Lexical Analysis

mmaness edited this page Oct 5, 2011 · 15 revisions

A Survenity program, henceforth called a Survenity survey, is read by a parser. The Survenity language is described using a parsing expression grammar (see the section Grammar Specification). The parser is based on Citrus and Treetop, which are recursive descent parsers written in Ruby. Citrus is used for most parsing while Treetop is used as a backup for providing useful error messages.

This page describes how the Survenity parser breaks up a file into tokens, which represent parts of a survey.

Line Structure

Survenity is a line-oriented language, so an end-of-line character or sequence marks the end of a statement. A Survenity survey is a sequence of logical lines. Currently, a logical line consists of exactly one physical line, so a Survenity survey may also be seen as a sequence of physical lines.

Physical Line

A physical line is a character sequence terminated by an end-of-line. The end-of-line must be denoted by a linefeed character (ASCII LF: \n). Optionally, a carriage return character (ASCII CR: \r) may precede the linefeed character.
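The rule above can be sketched in Python. This is only an illustration, not the Citrus/Treetop implementation; the names END_OF_LINE and split_physical_lines are hypothetical. Treating an unterminated final fragment as a line is an assumption of this sketch.

```python
import re

# An end-of-line is LF, optionally preceded by CR, as described above.
END_OF_LINE = re.compile(r"\r?\n")

def split_physical_lines(source):
    """Split source text into physical lines, dropping end-of-line sequences."""
    lines = END_OF_LINE.split(source)
    # A trailing end-of-line leaves an empty final piece, which is not a line.
    if lines and lines[-1] == "":
        lines.pop()
    # Assumption: an unterminated final fragment is still returned as a line.
    return lines
```

A carriage return that is not followed by a linefeed is not an end-of-line here, so it stays inside the line.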

Logical Line

Currently, a logical line is a single physical line. This specification uses the term "logical line" wherever a future change to the definition of a logical line would make the term "physical line" inappropriate or incorrect.

Comment

A comment is a statement used purely to aid human comprehension of a Survenity survey. It is not used in any calculations. A comment begins with a hash symbol (#) and continues to the end of the physical line. Comments are ignored by the syntax and are not considered to be tokens.
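A minimal Python sketch of comment removal follows. The function name strip_comment is hypothetical, and the sketch makes a simplifying assumption: every hash symbol starts a comment. It does not account for a hash symbol that appears inside a string literal.

```python
def strip_comment(line):
    """Return the line with any trailing comment removed."""
    # Simplifying assumption: a '#' always starts a comment, even if it
    # happens to sit inside a string literal.
    index = line.find("#")
    return line if index == -1 else line[:index]
```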

Line Joining

Line joining is not implemented in Survenity.

Blank Lines

Blank lines are ignored by Survenity. A blank line is a logical line which contains only whitespace (space characters and tab characters) and/or a comment. Blank lines are allowed inside and outside of blocks. It is suggested, but not required, to use blank lines for readability purposes.
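The definition of a blank line can be expressed as a single pattern. The sketch below is illustrative only; BLANK_LINE and is_blank_line are hypothetical names, and the line is assumed to have had its end-of-line sequence removed already.

```python
import re

# Optional spaces and tabs, then an optional comment running to the end of the line.
BLANK_LINE = re.compile(r"[ \t]*(?:#.*)?")

def is_blank_line(line):
    """Return True if the line holds only whitespace and/or a comment."""
    # Assumption: the end-of-line sequence has already been stripped from `line`.
    return BLANK_LINE.fullmatch(line) is not None
```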

Indentation

Indentation is not required in Survenity, but it is suggested for nested blocks because many blocks do not require an explicit symbol to mark their end. Indentation may improve the readability of the survey.

Whitespace between Tokens

Space and tab characters are used to separate tokens in Survenity surveys. Any number of spaces and tabs can be used between tokens. Some tokens may also include whitespace in their specification. Note that an end-of-line character or sequence cannot be used to separate tokens within the same statement.

Tokens

A token is a sequence of characters that represents an abstract unit of a program. For example, the plus sign, '+', is a token specifying that the expression before it should be added to the expression after it. Another example is the sequence '123.7', a token that represents the number one hundred twenty-three and seven tenths. The tokens used in Survenity are described in the following sections.

Question Name (Constant)

A question name, or constant, is a token of the following form:

name ::= uppercase (letter | digit | "_")*
letter ::= uppercase | lowercase
uppercase ::= 'A' | 'B' | ... | 'Z'
lowercase ::= 'a' | 'b' | ... | 'z'
digit ::= '0' | '1' | ... | '9'

A question name can have unlimited length and is case-sensitive.

Identifier

An identifier is a token of the following form:

name ::= lowercase (letter | digit | "_")*
letter ::= uppercase | lowercase
uppercase ::= 'A' | 'B' | ... | 'Z'
lowercase ::= 'a' | 'b' | ... | 'z'
digit ::= '0' | '1' | ... | '9'

An identifier can have unlimited length and is case-sensitive.
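Question names and identifiers differ only in the case of their first letter. The Python sketch below shows this; the names QUESTION_NAME, IDENTIFIER, and classify_name are hypothetical, and the sketch does not check for keywords.

```python
import re

# A question name starts with an uppercase letter, an identifier with a
# lowercase letter. Both continue with letters, digits, or underscores.
QUESTION_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")
IDENTIFIER = re.compile(r"[a-z][A-Za-z0-9_]*")

def classify_name(text):
    """Return 'question name', 'identifier', or None if text is neither."""
    if QUESTION_NAME.fullmatch(text):
        return "question name"
    if IDENTIFIER.fullmatch(text):
        return "identifier"
    return None
```

Because both forms are case-sensitive, 'Age_2' and 'age_2' are different tokens of different kinds.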

Keywords

A keyword, or reserved word, is an identifier (or, in one case, a question name) that cannot be used as an ordinary identifier. Some keywords are used by the language for a variety of tasks; others are reserved to prevent confusion or for future use. The following keywords are used in Survenity:

class  do      each      EndSurvey  else   end   except
false  for     function  goto       if     load  method
null   return  true      when       while
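A keyword check can be sketched as a set lookup. KEYWORDS and is_keyword are hypothetical names. The comparison is assumed to be case-sensitive, because identifiers and question names are.

```python
# The keywords listed above. 'EndSurvey' is the one keyword that has the
# form of a question name rather than an identifier.
KEYWORDS = frozenset({
    "class", "do", "each", "EndSurvey", "else", "end", "except",
    "false", "for", "function", "goto", "if", "load", "method",
    "null", "return", "true", "when", "while",
})

def is_keyword(word):
    """Return True if word is reserved (assumed case-sensitive comparison)."""
    return word in KEYWORDS
```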

Literals

Literals are notations for values of some built-in types.

Integer Literal

Integers may only be represented in decimal notation, with the following expression grammar:

integer ::= ('-')?  digit+
digit ::= ('1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0')

Negative integers are written with a leading dash character (-). Note that the first digit must not be separated from the dash by whitespace.
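The integer grammar translates directly into a regular expression. This is a sketch only; INTEGER and is_integer_literal are hypothetical names.

```python
import re

# An optional dash followed by one or more ASCII digits. [0-9] is used
# instead of \d so that non-ASCII digits are rejected.
INTEGER = re.compile(r"-?[0-9]+")

def is_integer_literal(text):
    """Return True if text is an integer literal under the grammar above."""
    return INTEGER.fullmatch(text) is not None
```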

Decimal Literal

Decimals may only be represented in decimal notation. Decimals must have at least one digit after the decimal point. The following grammar describes the decimal format:

decimal ::= ('-')? digit* '.' digit+
digit ::= ('1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0')

As with the integer literal, negative decimal numbers are allowed.
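The decimal grammar can be sketched the same way. DECIMAL and is_decimal_literal are hypothetical names. Digits before the decimal point are optional, while at least one digit after it is required.

```python
import re

# Optional dash, optional leading digits, a decimal point, then at least one digit.
DECIMAL = re.compile(r"-?[0-9]*\.[0-9]+")

def is_decimal_literal(text):
    """Return True if text is a decimal literal under the grammar above."""
    return DECIMAL.fullmatch(text) is not None
```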

String Literal

String literals can take two forms: double quote strings and single quote strings. The names refer to the character used to enclose the string (a sequence of characters). String literals have the following grammar:

string ::= single_quote_string | double_quote_string
single_quote_string ::= "'" (character | '"')* "'"
double_quote_string ::= '"' (character | ('\' (character | '"')))* '"'
character ::= <any character other than an escape character '\', double quote, or single quote>
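The sketch below is a literal Python transcription of the string grammar, not the parser's actual implementation. It assumes that the escape alternative in double_quote_string is a backslash followed by a character or a double quote. The names CHARACTER, STRING, and is_string_literal are hypothetical.

```python
import re

# character: anything except a backslash, a double quote, or a single quote.
CHARACTER = r"""[^\\"']"""

# single_quote_string: characters or unescaped double quotes between single quotes.
SINGLE_QUOTE_STRING = r"""'(?:%s|")*'""" % CHARACTER

# double_quote_string: characters, or a backslash followed by a character or
# a double quote (assumed reading of the grammar), between double quotes.
DOUBLE_QUOTE_STRING = r'''"(?:%s|\\(?:%s|"))*"''' % (CHARACTER, CHARACTER)

STRING = re.compile("(?:%s|%s)" % (SINGLE_QUOTE_STRING, DOUBLE_QUOTE_STRING))

def is_string_literal(text):
    """Return True if text is a string literal under the grammar as written."""
    return STRING.fullmatch(text) is not None
```

Read this literally, the grammar permits a double quote inside a single quote string without escaping, but it does not permit a single quote inside a double quote string.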

Operators

These tokens are operators:

+   -   *   /   //   ^   %   ..

Delimiters

The following tokens are used as delimiters in Survenity:

(   )   [   ]   <<   <-   =   ,   .

The following tokens have additional meaning in relation to other tokens:

 '   "   #
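Some operators and delimiters share a prefix, for example '/' and '//', or '.' and '..'. The Python sketch below splits a run of symbols by trying two-character symbols first. The longest-match rule is an assumption made for this illustration; the page does not state how the parser resolves these cases. SYMBOLS and split_symbols are hypothetical names, and the sketch handles only operators and delimiters, not names or literals.

```python
# Two-character symbols are listed first so that they are tried before
# the one-character symbols they begin with (assumed longest-match rule).
SYMBOLS = [
    "//", "..", "<<", "<-",
    "+", "-", "*", "/", "^", "%",
    "(", ")", "[", "]", "=", ",", ".",
]

def split_symbols(text):
    """Split text made of operators, delimiters, spaces, and tabs into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " \t":
            i += 1  # spaces and tabs only separate tokens
            continue
        for symbol in SYMBOLS:
            if text.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError("unexpected character %r at position %d" % (text[i], i))
    return tokens
```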
