Jeffail/tokesies

Tokesies

A string tokenizer library for Rust in which the characters that separate tokens can also be conditionally emitted as tokens themselves.

There are filter implementations provided for a few basic use cases:

use tokesies::*;

let line = "hello!world, this is some_text";
let tokens = FilteredTokenizer::new(filters::DefaultFilter{}, line).collect::<Vec<Token>>();

// tokens: ["hello", "!", "world", ",", "this", "is", "some", "_", "text"]

assert_eq!(tokens.get(0).unwrap().term(), "hello");

You can alternatively provide a custom implementation:

use tokesies::*;

pub struct MyFilter;

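impl filters::Filter for MyFilter {
    // Returns (is_separator, keep_as_token) for each character,
    // as the example output below implies.
    fn on_char(&self, c: &char) -> (bool, bool) {
        match *c {
            ' ' => (true, false),  // split here and drop the space
            ',' => (true, true),   // split here and emit "," as a token
            _ => (false, false),   // ordinary character, part of the current token
        }
    }
}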

let line = "hello!world, this is some_text";
let tokens = FilteredTokenizer::new(MyFilter{}, line).collect::<Vec<Token>>();

// tokens: ["hello!world", ",", "this", "is", "some_text"]

assert_eq!(tokens.get(0).unwrap().term(), "hello!world");
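
To make the meaning of the `(bool, bool)` pair concrete, here is a minimal standalone sketch that does not use the crate at all. It assumes, judging only from the example outputs above, that the first flag marks a character as a separator and the second decides whether that separator is also emitted as a token. The names `tokenize`, `my_filter` and `punctuation_filter` are hypothetical, and `punctuation_filter` is only a guess that reproduces the first example's output; it is not `DefaultFilter`'s actual implementation.

```rust
/// Splits `input` by asking `on_char` about every character.
/// `on_char` returns (is_separator, keep_as_token); this reading of the
/// tuple is an assumption inferred from the README examples.
fn tokenize<F>(input: &str, on_char: F) -> Vec<String>
where
    F: Fn(char) -> (bool, bool),
{
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in input.chars() {
        let (is_separator, keep_as_token) = on_char(c);
        if is_separator {
            // A separator ends the token being built, if there is one.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            // Optionally emit the separator itself as a one-character token.
            if keep_as_token {
                tokens.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    // Flush the trailing token.
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Same rules as `MyFilter` above, written as a plain function.
fn my_filter(c: char) -> (bool, bool) {
    match c {
        ' ' => (true, false),
        ',' => (true, true),
        _ => (false, false),
    }
}

/// Hypothetical filter: whitespace is dropped, ASCII punctuation is kept.
fn punctuation_filter(c: char) -> (bool, bool) {
    if c.is_whitespace() {
        (true, false)
    } else if c.is_ascii_punctuation() {
        (true, true)
    } else {
        (false, false)
    }
}

fn main() {
    let line = "hello!world, this is some_text";
    println!("{:?}", tokenize(line, my_filter));
    println!("{:?}", tokenize(line, punctuation_filter));
}
```

With these rules, `my_filter` leaves `!` and `_` inside their tokens, while `punctuation_filter` splits on them and keeps each as its own token.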

The implementation is derived largely from this blog post by @daschl.

Contributing and customizing

Contributions are very welcome: just fork the repository and submit a pull request.
