Skip to content
This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

antedeguemon/earleyparser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

earleyparser

This is a Python implementation of the Earley parser algorithm.

It determines if a given word belongs to a specified context-free language. If the word belongs to the specified language, it also mounts the parsing derivation tree.

I made this library as a college homework. It is pretty crude and should be considered an experiment.

Limitations

Strings are not supported in grammars - instead of using strings, you should use lists of characters:

import earleyparser

grammar = earleyparser.Grammar('S')

# Sentence is composed of `<adjective> <noun>`
grammar.add('S', ['A', ' ', 'N'])

# Adjective is `happy`, `yellow` or `red`
grammar.add('A', ['h', 'a', 'p', 'p', 'y'])
grammar.add('A', ['y', 'e', 'l', 'l', 'o', 'w'])
grammar.add('A', ['r', 'e', 'd'])

# Noun is `dog` or `cat`
grammar.add('N', ['d', 'o', 'g'])
grammar.add('N', ['c', 'a', 't'])

parser = earleyparser.Parser(grammar)
parser.print_derivation_tree("red cat")

# Due to some bug, the parser doesn't reset its state after it is run, so we
# need to build it again.
parser = earleyparser.Parser(grammar)
parser.print_derivation_tree("yellow cat")

Installation

pip install earleyparser

Example

import earleyparser

#
# Builds the following grammar:
# S -> 0S | 1S | 0 | 1
#
# Meaning it only accepts binary numbers
#
grammar = earleyparser.Grammar('S')
grammar.add('S', ['0', 'S'])
grammar.add('S', ['1', 'S'])
grammar.add('S', ['0'])
grammar.add('S', ['1'])

parser = earleyparser.Parser(grammar)
parser.run('101011')

#
# Checks if a word is accepted by a grammar
#
derivations = parser.get_completes()
if len(derivations) == 0:
    print('Not accepted')
else:
    print('Accepted!')

#
# Prints a derivation tree
#
parser = earleyparser.Parser(grammar)
parser.print_derivation_tree("0101011")

The resulting derivation from the above snippet is:

GAMMA
..S
....0
....S
......1
......S
........0
........S
..........1
..........S
............0
............S
..............1
..............S
................1

References