GitHub - EmilGedda/Leonardo: A parser/lexer/compiler for the Logo language. Written in C++ using Bison & Flex

EmilGedda / Leonardo Public

Notifications You must be signed in to change notification settings
Fork 1
Star 2

A parser/lexer/compiler for the Logo language. Written in C++ using Bison & Flex

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
custom_test		custom_test
scripts		scripts
src		src
.gitignore		.gitignore
AUTHORS		AUTHORS
COPYING		COPYING
ChangeLog		ChangeLog
INSTALL		INSTALL
Makefile.am		Makefile.am
NEWS		NEWS
README		README
configure.ac		configure.ac

Repository files navigation

===PROGP S4===

Emil Gedda (egedda@kth.se)

DESCRIPTION

My implementation of the Logo++ lab S4.
Written in C++ using bison as a parser generator (the grammar stuff),
and Flex as a tokenizer/lexical analyzer.

The core program is Flex tokenize:ing the input, and passing said tokens
to bison, which in turns construct our abstract syntax tree (AST) using
a specific set of grammar. Our AST is then executed using a Driver class.
An AST will be executed under a given context, which defines variables
and a turtle to operate upon.

FILES

Every code-related file resides in the src/ directory.

expression/: The expression folder contains all statements and
expressions used by the parser.

turtle/: The turtle folder contains the turtle specific logic

writers/: Contains the code related to outputting the images in
different formats such as SVG.

writer.hpp: An abstract interface for all the writers.

test/: The sample leonardo++ code provided.

ast.hpp A simple container class for an abstract syntax tree

context.hpp Runtime context which stores expressions and statements
inside a AST.

driver.{h,c}pp Connect the scanner and the parser together with streams

main.cpp The main-file parsing arguments and such

parser.yy This Bison related file contains the grammar for the language

scanner.ll Flex related code. The regex's for tokenization of string
to tokens which Bison then collects and parses.

BISON AUTO GENERATED FILES:
location.hh
parser.cc The actual parser code generated from parser.yy
parser.hpp
position.hh
stack.hh
y.tab.h

FLEX AUTO GENERATED FILES:
FlexLexer.h An example lexer provided by Flex
scanner.cc A case insensitive lexer/scanner

THE CODE

The code consists of 4 major parts
- The bison code
- The flex code
- The expressions and AST
- And the rest (Driver, Turtle, and Writers)

The user defined Bison code is located in src/parser.yy.
Bison allows of defining context-free grammar, and the ability to inline
code inside the grammar.

Lets take the grammar for a multiplication/division expression as an example

mulexpr: unaryexpr
{
$$ = $1;
}
| mulexpr MUL unaryexpr
{
std::unique_ptr<ArithNode> node1($1);
std::unique_ptr<ArithNode> node2($3);
$$ = new NMultiply(std::move(node1), std::move(node2));
}
| mulexpr DIV unaryexpr
{
std::unique_ptr<ArithNode> node1($1);
std::unique_ptr<ArithNode> node2($3);
$$ = new NDivide(std::move(node1), std::move(node2));
}

The bison grammar resembles BNF-grammar a fair bit, and the above grammar
may be rewritten into BNF as such:

mulexpr ::= unaryexpr | mulexpr MUL unaryexpr | mulexpr DIV unaryexpr

MUL and DIV are terminals (tokens) and mulexpr and unaryexpr are non-terminals.

The block of curly braces { } denotes some code to be run after Bison has
successfully parsed a given grammar.

A code block may be placed anywhere in the grammar. For example, in the REP
grammar a codeblock is placed before the parsing of the logo statements
inside the REP, which adds a new AST on top of our AST stack, which then after
the REP is done being parsed is popped back and inserted into the STRep
instance. This allows for nesting of reps without any hassle.

Inside the code block, Bison exposes the parsed non-terminals as variables
such as $1, $2, $3 ... and so on. The number after the $ corresponds to the
position of the non-terminal in the grammar. Thus, $1 and $3 in the above
example refers earlier parsed ArithNodes, infront and behind our MUL/DIV
token ($2 would simply refer to the string defined for the token MUL or DIV).
The $$ variable defines the "return" value of the grammar. Assigning a
ArithNode to the value of the current parsed grammar, which bison then uses
in other grammars as $1, $2 ...

The priority rules of the arithmetic operators isn't defined anywhere
in the code. Our well defined grammar handles that automatically.

NOTES

I was lazy and used std::unique_ptr everywhere, which takes care of
deleting new:ed variabled automatically, so I don't have to worry
about memory leaks.

COMPILING

Prerequesites:
A ISO-compliant C++11 compiler (finns på skoldatorerna, gcc 4.9)
Some other stuff which every Linux-dist should have.
Also works under msys and bash on windows (if I remember correctly).

Otherwise, use the Ubuntu computers @ KTH.

Steps:
Make sure you are in the correct directory (the root of the project),
and not inside the src dir, and then run:

./configure
make

In bash or your favourite POSIX-shell.
This will create a executable called leonardo in the src dir.

USAGE

Leonardo currently defaults to outputting SVG on stdout.
It does not do any PDF-converting (Austrin sa att det var lugnt att skippa SVG -> PDF)
Brackets denote optional parameters.

leonardo [-s] [-p] [-text] [FILE]

-s Enables debugging in the Scanner (Flex)
-p Enabled debugging in the Parser (Bison)
Note: Debugging information will be printed on the stdout
together with the output of the execute logo code.

-text Sets the output format as defined by
the S2 lab, not really useful for anything.
Without this flag, SVG is used instead.

FILE The file containing logo-code which should be parsed.
If FILE is not present or none was given, leonardo simply
reads the standard input (stdin) until an EOF is recieved.

For example:

leonardo custom_test/custom1.txt > myfile.svg

This parses custom1.txt and then pipes the resulting output to
a file called myfile.svg which may then be viewed in any recent browser.