Update README
zaibacu committed Dec 7, 2019
1 parent 471eee8 commit 21a3a66
Showing 2 changed files with 40 additions and 21 deletions.
51 changes: 30 additions & 21 deletions README.md
@@ -37,27 +37,7 @@ Now you can compile these rules `rita -f <your-file>.rita output.jsonl`

# Using compiled rules

## spaCy backend

```python
import spacy
@@ -102,3 +82,32 @@ patterns = rita.compile("examples/color-car.rita")
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
```

If you don't want to store rules in a file, you can compile them directly from a string:

```python
patterns = rita.compile_string("""
{WORD("Hello"), WORD("World")}->MARK("GREETING")
""")
```


## Standalone Version

While using RITA with spaCy as a base is highly recommended, there may be cases where pure Python regex is the only option.

You can pass the rule compilation function explicitly. This particular function builds regular expressions and creates an executor that accepts raw text and returns a list of results.

Here's a test covering this case:

```python
import rita


def test_standalone_simple():
    # Standalone compiler: builds regular expressions and returns an executor
    from rita.engine.translate_standalone import compile_rules
    patterns = rita.compile("examples/simple-match.rita", compile_fn=compile_rules)
    results = list(patterns.execute("Donald Trump was elected President in 2016 defeating Hilary Clinton."))
    assert len(results) == 2
    # Each result exposes the matched text and the label assigned by the rule
    entities = [(r["text"], r["label"]) for r in results]

    assert entities[0] == ("Donald Trump was elected", "WON_ELECTION")
    assert entities[1] == ("defeating Hilary Clinton", "LOST_ELECTION")
```
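
To make the idea of a regex-based executor more concrete, here is a minimal sketch using only Python's `re` module. It is not RITA's implementation: the `RULES` table, the hand-written patterns, and the `RegexExecutor` class are all assumptions made for illustration. It only mirrors the shape described above, where an executor accepts raw text and returns results carrying `text` and `label`.

```python
import re

# Hypothetical rule table: label -> regular expression.
# These patterns are hand-written for illustration; they are not RITA's compiled output.
RULES = {
    "WON_ELECTION": r"[A-Z]\w+ [A-Z]\w+ was elected",
    "LOST_ELECTION": r"defeating [A-Z]\w+ [A-Z]\w+",
}


class RegexExecutor:
    """Toy executor: holds compiled patterns and scans raw text for matches."""

    def __init__(self, rules):
        self.patterns = [(label, re.compile(regex)) for label, regex in rules.items()]

    def execute(self, text):
        matches = []
        for label, pattern in self.patterns:
            for m in pattern.finditer(text):
                matches.append((m.start(), {"text": m.group(0), "label": label}))
        # Yield results in the order they appear in the text
        for _, result in sorted(matches, key=lambda pair: pair[0]):
            yield result


executor = RegexExecutor(RULES)
results = list(executor.execute(
    "Donald Trump was elected President in 2016 defeating Hilary Clinton."
))
entities = [(r["text"], r["label"]) for r in results]
```

Sorting by match position keeps the results in reading order regardless of the order in which the rules were declared.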
10 changes: 10 additions & 0 deletions docs/syntax.md
@@ -27,6 +27,7 @@ Also, macro can have modifier (if it supports it)
WORD+ # Declare that you'll have 1..N words
WORD* # Declare that you'll have 0..N words
WORD? # Declare that you'll have 1 or no words
WORD! # Declare that you want to ignore this word
```
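
The `+`, `*` and `?` modifiers read much like regular-expression quantifiers. The sketch below illustrates that analogy in plain Python. The `WORD` pattern, the `MODIFIERS` table and the `consume` helper are hypothetical names, and the mapping is an assumption about the intended semantics rather than a description of how RITA compiles these modifiers. `WORD!` is left out because it has no direct quantifier counterpart.

```python
import re

# One "word" followed by optional whitespace; an illustrative stand-in for WORD.
WORD = r"(?:\w+\s*)"

# Assumed analogy between the modifiers and regex quantifiers.
MODIFIERS = {
    "WORD+": WORD + "+",  # 1..N words
    "WORD*": WORD + "*",  # 0..N words
    "WORD?": WORD + "?",  # 1 or no words
}


def consume(modifier, text):
    """Return the words matched at the start of text, or None if nothing can match."""
    match = re.match(MODIFIERS[modifier], text)
    return None if match is None else match.group(0).split()
```

For instance, `consume("WORD+", "")` finds no match because at least one word is required, while `consume("WORD*", "")` succeeds with an empty list of words.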

More examples
@@ -78,3 +79,12 @@ When building a rule, you may want to combine several rules into one, you can us
```

Here we're saying: `If any of these color words is present in the text and is followed by the word "car", we assume this part can be labeled as "CAR_COLOR"`

## Logical variants

You can state that your rule expects either `word1` or `word2`. This can usually be achieved by writing two separate rules, but there's an easier way:
```
{WORD("word1")|WORD("word2")}
```

The pipe character (`|`) marks a logical `OR`, meaning that either the left or the right side can match. It works only at the surface level; if you want nested logic, write separate rules.
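
As a rough analogy, the same trade-off exists in plain regular expressions: two separate patterns versus one pattern with alternation. The sketch below uses only Python's `re` module; the pattern names and helper functions are hypothetical, and it says nothing about how RITA itself translates `|`.

```python
import re

# Two separate single-word patterns...
rule_a = re.compile(r"\bword1\b")
rule_b = re.compile(r"\bword2\b")

# ...and one combined pattern, loosely analogous to {WORD("word1")|WORD("word2")}
combined = re.compile(r"\b(?:word1|word2)\b")


def matches_separately(text):
    """Collect matches from both rules, ordered by position in the text."""
    found = [
        (m.start(), m.group(0))
        for rule in (rule_a, rule_b)
        for m in rule.finditer(text)
    ]
    return [word for _, word in sorted(found)]


def matches_combined(text):
    """Collect matches from the single alternation pattern."""
    return combined.findall(text)
```

Both helpers return the same words for the same input; the combined form simply avoids maintaining two near-identical rules.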
