tokenyze

tokenyze is a Python tokenizer.

Overview

It uses generators to tokenize an input string with look-ahead.

Tokens are defined as names or strings, and can be nested using brackets. Names are made up of sequential non-whitespace characters. Brackets are special single-character tokens. Strings are delimited by either single or double quotes.

Backslashes can escape these characters.
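The look-ahead idea can be illustrated with a small generator that hands out each character together with the one that follows it. This is only a sketch: `with_lookahead` is a hypothetical helper, not a function from tokenyze.py, and the real module may organise its look-ahead differently.

```python
def with_lookahead(text):
    """Yield (current, upcoming) pairs; upcoming is None at the end of input.

    Hypothetical helper for illustration, not part of tokenyze.py.
    """
    chars = iter(text)
    current = next(chars, None)
    while current is not None:
        upcoming = next(chars, None)
        yield current, upcoming
        current = upcoming


for current, upcoming in with_lookahead("a(b"):
    print(current, upcoming)
```

Knowing the upcoming character lets a consumer decide that a name has ended (for example, because a bracket comes next) without swallowing that bracket.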

Example:

The text

    "fr33(the p1zza c@t)n0w_",

will result in the following (generated) token list:

    ['fr33', '(', 'the', 'p1zza', 'c@t', ')', 'n0w_']

Implementation:

The code uses a generator getchars to deliver characters from the text to the gettokens consumer. The consumer passes responsibility for parsing the text to either a whitespace consumer eatwhitespace or a token consumer, which in turn defers to a name consumer eatname or a string consumer eatstring.

The gettokens consumer itself is a generator, which will yield each found token in turn until there are no more tokens left.
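The division of labour described above might look roughly like the following. This is a sketch under assumptions, not the code in tokenyze.py: the `Lookahead` class, the `eat_*` helpers and `sketch_tokens` are hypothetical names, the set of bracket characters is a guess, and so is the choice to strip the quotes from string tokens.

```python
BRACKETS = "()[]{}"  # assumption: which characters count as brackets
QUOTES = "'\""


class Lookahead:
    """Character source that exposes the next character without consuming it."""

    def __init__(self, text):
        self._chars = iter(text)
        self.peek = next(self._chars, None)  # None marks end of input

    def advance(self):
        """Return the current character and step to the next one."""
        current = self.peek
        self.peek = next(self._chars, None)
        return current


def is_name_char(ch):
    return ch is not None and not ch.isspace() and ch not in BRACKETS and ch not in QUOTES


def eat_whitespace(chars):
    while chars.peek is not None and chars.peek.isspace():
        chars.advance()


def eat_name(chars):
    """Collect characters until whitespace, a bracket, a quote or end of input."""
    out = []
    while is_name_char(chars.peek):
        ch = chars.advance()
        if ch == "\\" and chars.peek is not None:
            ch = chars.advance()  # an escaped character is taken literally
        out.append(ch)
    return "".join(out)


def eat_string(chars):
    """Collect characters between matching quotes (quotes dropped: an assumption)."""
    quote = chars.advance()  # opening quote
    out = []
    while chars.peek is not None and chars.peek != quote:
        ch = chars.advance()
        if ch == "\\" and chars.peek is not None:
            ch = chars.advance()
        out.append(ch)
    chars.advance()  # closing quote; harmless at end of input
    return "".join(out)


def sketch_tokens(text):
    """Yield tokens one at a time until the input is exhausted."""
    chars = Lookahead(text)
    while True:
        eat_whitespace(chars)
        if chars.peek is None:
            return
        if chars.peek in BRACKETS:
            yield chars.advance()
        elif chars.peek in QUOTES:
            yield eat_string(chars)
        else:
            yield eat_name(chars)


print(list(sketch_tokens("fr33(the p1zza c@t)n0w_")))
```

Each consumer only looks at `peek` to decide whether to continue, so the character that ends a name is left in place for the next consumer.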

Usage:

    $ python
    >>> import tokenyze
    >>> for token in tokenyze.gettokens("fr33(the p1zza c@t)n0w_"):
    ...     print(token)
    ...
    fr33
    (
    the
    p1zza
    c@t
    )
    n0w_
    >>>

Why?

I have been using Python's shlex for a bit, but while it is fine when parsing a text into names and strings, it is lacking once brackets are added to the mix.

I needed something with a bit more lookahead, and writing generators in Python is always fun.
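As one illustration of the shlex limitation, `shlex.split` breaks the example text on whitespace only, so the brackets stay glued to the neighbouring names. This may not be the exact case that prompted tokenyze; it is just the simplest way to see the difference.

```python
import shlex

text = "fr33(the p1zza c@t)n0w_"

# shlex.split separates on whitespace only, so "(" and ")" are not
# returned as tokens of their own.
tokens = shlex.split(text)
print(tokens)  # ['fr33(the', 'p1zza', 'c@t)n0w_']
```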

vigilantesculpting/tokenyze

Folders and files

Latest commit

History

Repository files navigation

tokenyze

Overview

Example:

Implementation:

Usage:

Why?

About

Resources

Stars

Watchers

Forks

Languages