full-parser

A flexible, customizable CLR parser driven by your own configuration files, written in Java with the JavaFX library.

Documentation

This project consists of two primary sections:

lexical analysis
syntax analysis

Note That

This project is implemented in Java and JavaFX using the NetBeans IDE.
A demo of this project is available in EXE and JAR formats. You can download it here.
This compiler takes two custom configuration files:
- one for lexemes
- the other for grammars

So you must first write one config file for the lexical analyzer and one for the syntax analyzer, in the formats below:

lexeme.conf format

lexeme_name_upper_case := lexeme_regex

For example: DIGIT := [0-9]

You can define the lexemes using regexes with these operators:

* : repeats a phrase zero or more times

a* : {ε, a, aa, aaa, ...}

+ : repeats a phrase one or more times

a+ : {a, aa, aaa, ...}

| : alternation (either the left or the right phrase)

(a|b) : {a, b}

() : groups a phrase so it is treated as a single unit

(ab)+ : {ab, abab, ababab, ...}

[] : defines a character range

[0-9] : {0,1,2,3,4,5,6,7,8,9}
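The operators above can be tried out quickly with Java's own java.util.regex package. This is only a sketch: it assumes the project's operators behave like the standard regex operators of the same name, which this README does not confirm. The class name OperatorDemo and the helper matches are made up for illustration.

```java
import java.util.regex.Pattern;

public class OperatorDemo {
    // Returns true when the WHOLE input matches the regex.
    // Assumption: the lexeme operators behave like java.util.regex operators.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", ""));        // true: zero repetitions are allowed
        System.out.println(matches("a+", ""));        // false: at least one repetition is required
        System.out.println(matches("(a|b)", "b"));    // true: either alternative matches
        System.out.println(matches("(ab)+", "abab")); // true: the group repeats as a unit
        System.out.println(matches("[0-9]", "7"));    // true: any character in the range
    }
}
```

The practical difference between * and + is the empty string: a* accepts it, a+ does not.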

\LEXEME_NAME : refers to another, previously defined lexeme

DIGIT := [0-9]
LETTER := [a-z]|[A-Z]
ID := \LETTER(\LETTER|_|\DIGIT)+
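To make the \LEXEME_NAME mechanism concrete, here is a minimal sketch of how such a file could be read and how references could be expanded into plain regexes. It is not the project's actual loader: the class LexemeConfSketch, the method parse, and the choice to wrap each expanded reference in a non-capturing group are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexemeConfSketch {
    // Matches a reference such as \DIGIT or \LETTER (hypothetical naming rule).
    private static final Pattern REF = Pattern.compile("\\\\([A-Z_]+)");

    // Parses lines of the form NAME := regex and expands \NAME references
    // to lexemes defined on earlier lines.
    static Map<String, String> parse(String[] lines) {
        Map<String, String> lexemes = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(":=", 2);
            if (parts.length != 2) {
                continue; // skip lines that are not definitions
            }
            String name = parts[0].trim();
            String body = parts[1].trim();
            Matcher m = REF.matcher(body);
            StringBuilder expanded = new StringBuilder();
            int last = 0;
            while (m.find()) {
                String ref = lexemes.get(m.group(1));
                if (ref == null) {
                    throw new IllegalArgumentException("Undefined lexeme: " + m.group(1));
                }
                // Copy the text before the reference, then the referenced regex in a group.
                expanded.append(body, last, m.start()).append("(?:").append(ref).append(")");
                last = m.end();
            }
            expanded.append(body.substring(last));
            lexemes.put(name, expanded.toString());
        }
        return lexemes;
    }

    public static void main(String[] args) {
        Map<String, String> lexemes = parse(new String[] {
            "DIGIT := [0-9]",
            "LETTER := [a-z]|[A-Z]",
            "ID := \\LETTER(\\LETTER|_|\\DIGIT)+"
        });
        System.out.println(lexemes.get("ID"));
        System.out.println(Pattern.matches(lexemes.get("ID"), "x_1"));
    }
}
```

Wrapping each reference in a group keeps an alternation such as [a-z]|[A-Z] from leaking into the surrounding expression. Note that, as written, the ID definition requires at least two characters, because + demands one or more symbols after the leading letter.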

grammar.conf format

grammar_name_upper_case := grammar_definition

For example: S := if ([E])

Some rules for grammar configuration file:

Use [GRAMMAR_RULE] to reference another rule inside the current rule:

S := if ([E]) {[F]}
E := id < 3 id
F := string id = [R]
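As an illustration of the bracket notation, the following sketch lists which rules a grammar line refers to. It is an assumption about how such lines could be scanned, not the project's own code; GrammarRefSketch and referencedRules are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrammarRefSketch {
    // Matches a rule reference such as [E] or [F] (hypothetical naming rule).
    private static final Pattern RULE_REF = Pattern.compile("\\[([A-Z_]+)\\]");

    // Returns the names of the rules referenced on the right-hand side of a rule line.
    static List<String> referencedRules(String ruleLine) {
        String[] parts = ruleLine.split(":=", 2);
        String body = parts.length == 2 ? parts[1] : ruleLine;
        List<String> refs = new ArrayList<>();
        Matcher m = RULE_REF.matcher(body);
        while (m.find()) {
            refs.add(m.group(1));
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(referencedRules("S := if ([E]) {[F]}")); // prints [E, F]
    }
}
```

Everything outside the brackets, such as if, the parentheses, and the braces, is left as terminal text of the rule.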

IMPORTANT NOTICE

In the current code in this repository, you can use these phrases in grammars to denote the type of an expression: id and literal

For example, E := id is a grammar rule in which E derives the id type. Here, id is not just the single string "id"; it stands for every string matched by the ID lexeme.

To recognize additional lexeme types, add the other defined lexemes to this line of code in ASCK_COMPILER.java (line 371):

if ("ID".equals(tkArray.get(i).getKey()) || "LITERAL".equals(tkArray.get(i).getKey())) {
    inputs.add(tkArray.get(i).getKey().toLowerCase());
}

As a result, each token in the set of tokens to be parsed is checked: if it is an ID or LITERAL lexeme, the parser's current input becomes the lowercase lexeme name rather than the original string.
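One way to make this check easier to extend is to keep the lexeme names in a set, so adding a new type means adding one name. The sketch below is not the code from ASCK_COMPILER.java: the class TokenMappingSketch, the set TYPED_LEXEMES, the KEYWORD lexeme, and the use of Map.Entry for tokens are assumptions, as is the else branch that passes other tokens through with their original text.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TokenMappingSketch {
    // Lexemes the grammar refers to by type; add further lexeme names here.
    static final Set<String> TYPED_LEXEMES = new HashSet<>(Arrays.asList("ID", "LITERAL"));

    // Each token is a (lexeme name, matched text) pair.
    static List<String> toParserInputs(List<Map.Entry<String, String>> tokens) {
        List<String> inputs = new ArrayList<>();
        for (Map.Entry<String, String> token : tokens) {
            if (TYPED_LEXEMES.contains(token.getKey())) {
                // Typed lexeme: the parser sees the lowercase lexeme name.
                inputs.add(token.getKey().toLowerCase());
            } else {
                // Assumption: every other token is passed through as its original text.
                inputs.add(token.getValue());
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> tokens = new ArrayList<>();
        tokens.add(new SimpleEntry<>("KEYWORD", "if")); // KEYWORD is a hypothetical lexeme
        tokens.add(new SimpleEntry<>("ID", "count"));
        tokens.add(new SimpleEntry<>("LITERAL", "3"));
        System.out.println(toParserInputs(tokens)); // prints [if, id, literal]
    }
}
```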

PAY ATTENTION THAT

The project may have some bugs, and you might need to manipulate some rules in config files.

If you want more information or have requests, you can contact us by email.

Running the tests

Here is an example of a piece of code (a fancy Stranger Things themed snippet) that is first tokenized and then parsed.

The output is

the set of tokenized lexemes (or errors)
a verdict on whether the code is correct (according to the grammar) or not

Authors

Soheil Changizi ( @cocolico14 )
Alireza Kavian ( @alirezakay )

Org.

Brilacasck

Team

ASCK TEAM

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details
