ProbabilisticGrammarMiner does not produce correct probabilistic grammar #153

Open
martineberlein opened this issue Feb 6, 2023 · 0 comments

Describe the bug
When the ProbabilisticGrammarMiner learns a probability distribution from a set of given inputs, it does not return the correct probabilistic grammar. The bug occurs whenever the grammar contains a production rule with an "empty" alternative, for example ⟨maybe_minus⟩ ::= "" | "-". The miner does not count derivations that use the empty ("") expansion, so that alternative always ends up with probability 0.0.

To Reproduce
You can reproduce the failure with the following simple grammar and code:

from fuzzingbook.ProbabilisticGrammarFuzzer import ProbabilisticGrammarMiner
from fuzzingbook.Grammars import Grammar, is_valid_grammar
from fuzzingbook.Parser import EarleyParser


if __name__ == "__main__":
    grammar: Grammar = {
        "<start>": ["<maybe_minus><number>"],
        "<maybe_minus>": ["", "-"],
        "<number>": ["1", "2", "3"],
    }

    initial_inputs = ["-1", "3", "2", "1"]
    assert is_valid_grammar(grammar=grammar)

    probabilistic_grammar_miner = ProbabilisticGrammarMiner(EarleyParser(grammar))
    probabilistic_grammar = probabilistic_grammar_miner.mine_probabilistic_grammar(
        initial_inputs
    )

    print(probabilistic_grammar)

The output is the following learned probabilistic grammar:

# learned from ["-1", "3", "2", "1"]
{'<start>': [('<maybe_minus><number>', {'prob': None})],
 '<maybe_minus>': [('', {'prob': 0.0}), ('-', {'prob': 1.0})],  # Not correct: only one of the four inputs ("-1") is negative
 '<number>': [('1', {'prob': 0.5}),
              ('2', {'prob': 0.25}),
              ('3', {'prob': 0.25})]}

Expected behavior

With the given inputs, three of the four derivations use the empty alternative and one uses "-", so we expect the probabilities of the "<maybe_minus>" production rule to be:

# with ["-1", "3", "2", "1"]
'<maybe_minus>': [('', {'prob': 0.75}), ('-', {'prob': 0.25})]

Correct probabilistic grammar:

# learned from ["-1", "3", "2", "1"]
{'<start>': [('<maybe_minus><number>', {'prob': None})],
 '<maybe_minus>': [('', {'prob': 0.75}), ('-', {'prob': 0.25})],
 '<number>': [('1', {'prob': 0.5}),
              ('2', {'prob': 0.25}),
              ('3', {'prob': 0.25})]}
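
The expected values are plain relative frequencies. A minimal sketch of that count, independent of fuzzingbook, is shown here. The helper name expected_maybe_minus_probs is hypothetical and only covers this issue's grammar, where an input that starts with "-" used the "-" alternative and every other input used the empty one:

```python
from collections import Counter
from typing import Dict, List


def expected_maybe_minus_probs(inputs: List[str]) -> Dict[str, float]:
    """Relative frequency of each <maybe_minus> alternative.

    Hypothetical helper for this issue's grammar only: an input that
    starts with "-" used the "-" alternative, any other input used "".
    Assumes a non-empty list of inputs.
    """
    counts = Counter("-" if s.startswith("-") else "" for s in inputs)
    total = sum(counts.values())
    return {alt: counts[alt] / total for alt in ("", "-")}


probs = expected_maybe_minus_probs(["-1", "3", "2", "1"])
print(probs)  # {'': 0.75, '-': 0.25}
```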

Potential Fix

I tracked the bug down to the function expansion_key(...), which the ProbabilisticGrammarMiner uses in set_expansion_probabilities(...). expansion_key does not handle the empty expansion of the production rule "<maybe_minus>": ["", "-"].
Adding the following case to expansion_key solved the issue for me:

# Check for empty list expansion
if isinstance(expansion, list) and not expansion:
    expansion = ""

The complete function with this case added:

def expansion_key(symbol: str,
                  expansion: Union[Expansion,
                                   DerivationTree, 
                                   List[DerivationTree]]) -> str:
    """Convert (symbol, `expansion`) into a key "SYMBOL -> EXPRESSION".
    `expansion` can be an expansion string, a derivation tree,
    or a list of derivation trees."""

    if isinstance(expansion, tuple):
        # Expansion or single derivation tree
        expansion, _ = expansion

    # Check for empty list expansion
    if isinstance(expansion, list) and not expansion:
        expansion = ""

    if not isinstance(expansion, str):
        # Derivation tree
        children = expansion
        expansion = all_terminals((symbol, children))

    assert isinstance(expansion, str)

    return symbol + " -> " + expansion
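
To illustrate why the extra case matters, here is a self-contained sketch that does not import fuzzingbook. It rests on two assumptions: that the derivation tree for the empty expansion carries an empty child list, and that all_terminals returns the symbol itself for such a node. The simplified all_terminals and the two key functions below (key_without_fix, key_with_fix) are stand-ins written for this issue, not the library code:

```python
from typing import List, Optional, Tuple, Union

# Simplified stand-in for a derivation tree: (symbol, children).
Tree = Tuple[str, Optional[List["Tree"]]]


def all_terminals(tree: Tree) -> str:
    """Simplified stand-in (assumption): a node without children
    contributes its own symbol."""
    symbol, children = tree
    if not children:
        return symbol
    return "".join(all_terminals(child) for child in children)


def key_without_fix(symbol: str, expansion: Union[str, List[Tree]]) -> str:
    # Mirrors the original logic: anything that is not a string is
    # flattened via all_terminals.
    if not isinstance(expansion, str):
        expansion = all_terminals((symbol, expansion))
    return symbol + " -> " + expansion


def key_with_fix(symbol: str, expansion: Union[str, List[Tree]]) -> str:
    # Proposed case: an empty child list stands for the "" alternative.
    if isinstance(expansion, list) and not expansion:
        expansion = ""
    return key_without_fix(symbol, expansion)


grammar_key = key_without_fix("<maybe_minus>", "")   # key built from the grammar
tree_key_old = key_without_fix("<maybe_minus>", [])  # key built from a parse tree
tree_key_new = key_with_fix("<maybe_minus>", [])

print(repr(grammar_key))   # '<maybe_minus> -> '
print(repr(tree_key_old))  # '<maybe_minus> -> <maybe_minus>'
print(repr(tree_key_new))  # '<maybe_minus> -> '
```

Under these assumptions, the key derived from the parse tree only matches the key derived from the grammar once the empty-list case is handled, which would explain why the empty alternative was never counted.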

Desktop:

  • OS: macOS Ventura 13.0.1
  • Python version: 3.10.9