rudilexanalyzer: Rudimentary Lexical Analyzer in Python

rudilexanalyzer is a simple lexical analyzer that reads a grammar from a .txt file, tokenizes it, and stores the tokens in a Python list. The token list can then serve as input to a later syntax-analysis stage, such as an LR(1) or LL(1) parser.
The grammars provided in this repository are based on the Java language grammar.

Grammar notation conventions

For readability and ease of analysis, the input grammar must meet the following requirements:

Each line of the text file should contain exactly one production rule.
Each non-terminal token should start with the % character.
Each terminal token should start with the ~ character.
Non-terminals should contain only ASCII letters, digits, or underscores (_).
Terminals can consist of any characters except %, ~, and the space character.
The left-hand side of each rule must be followed directly by the := symbol.
The analyzer ignores any characters that do not match the rules above.
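
The conventions above can be sketched as a small regular-expression tokenizer. This is a minimal illustration, not the code in rudilexanalyzer.py: the names TOKEN_RE and tokenize_rule, the token kind labels, and the (kind, text) tuple layout are all assumptions made for the example. The sketch also treats every whitespace character, not only the space, as a separator.

```python
import re

# Hypothetical token pattern built from the conventions listed above.
# Assumption: any whitespace (not only a space) ends a terminal.
TOKEN_RE = re.compile(
    r"""
    (?P<NONTERMINAL>%[A-Za-z0-9_]+)   # % followed by ASCII letters, digits, _
  | (?P<TERMINAL>~[^%~\s]+)           # ~ followed by anything but %, ~, whitespace
  | (?P<DEFINE>:=)                    # separator after the left-hand side
    """,
    re.VERBOSE,
)


def tokenize_rule(line):
    """Return (kind, text) pairs for one rule; unmatched characters are skipped."""
    return [(match.lastgroup, match.group()) for match in TOKEN_RE.finditer(line)]


tokens = tokenize_rule("%BreakStatement := ~break %Identifier ~;")
for kind, text in tokens:
    print(kind, text)
```

Because finditer only yields matches, any character that fits none of the three patterns is dropped silently, which mirrors the "ignore everything else" convention.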

How to Write Grammar

The following simple grammar is part of the Java language grammar:

BreakStatement:
   break ;
   break Identifier ;

The terminals are the keyword break and the semicolon ;.
The non-terminals are BreakStatement and Identifier.
Because the notation described above is required, this grammar should be written like this in the .txt file:

%BreakStatement := ~break ~;
%BreakStatement := ~break %Identifier ~;
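
The manual rewrite above can also be sketched as a small helper. The function to_rudilex_notation is hypothetical and not part of this repository; it assumes the caller supplies the set of non-terminal names and that the symbols of each alternative are separated by whitespace.

```python
def to_rudilex_notation(head, alternatives, nonterminals):
    """Hypothetical helper: emit one '%Head := ...' line per alternative."""
    rules = []
    for alternative in alternatives:
        symbols = [
            # Known non-terminals get the % prefix, everything else gets ~.
            ("%" if symbol in nonterminals else "~") + symbol
            for symbol in alternative.split()
        ]
        rules.append(f"%{head} := " + " ".join(symbols))
    return rules


converted = to_rudilex_notation(
    "BreakStatement",
    ["break ;", "break Identifier ;"],
    {"BreakStatement", "Identifier"},
)
print("\n".join(converted))
```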

How to Run Code Snippet

The project is a single Python script. Put the grammar rules in a text file in the same directory, then call the analyze_grammar function to tokenize the grammar; its argument is the path to the text file. An auxiliary function, print_grammar, prints the grammar line by line. To examine the results, pass the object returned by analyze_grammar to print_grammar.
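
The workflow described above might look roughly like the following. The two functions here are hypothetical stand-ins written only for this illustration (hence the _sketch suffix); the real analyze_grammar and print_grammar in rudilexanalyzer.py may return and print a different structure. The file name grammar_break.txt and the list-of-lists result are likewise assumptions.

```python
import os
import re
import tempfile

# Assumed symbol pattern: non-terminal, terminal, or the := separator.
SYMBOL_RE = re.compile(r"%[A-Za-z0-9_]+|~[^%~\s]+|:=")


def analyze_grammar_sketch(path):
    """Stand-in for analyze_grammar: one token list per line that has tokens."""
    with open(path, encoding="utf-8") as grammar_file:
        rules = [SYMBOL_RE.findall(line) for line in grammar_file]
    return [tokens for tokens in rules if tokens]


def print_grammar_sketch(rules):
    """Stand-in for print_grammar: print each tokenized rule on its own line."""
    for number, tokens in enumerate(rules, start=1):
        print(number, " ".join(tokens))


# Write a small grammar file, tokenize it, then inspect the result.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "grammar_break.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as grammar_file:
        grammar_file.write("%BreakStatement := ~break ~;\n")
        grammar_file.write("%BreakStatement := ~break %Identifier ~;\n")
    rules = analyze_grammar_sketch(path)

print_grammar_sketch(rules)
```

With the actual script, the equivalent steps would be a call to analyze_grammar with the path to the grammar file, followed by a call to print_grammar with the returned object.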

This project is part of the Compiler Design course at Kharazmi University.
