open20q

Bayes network with entropy maximiser for Twenty-Questions, Identification Key and similar games/utilities

This repository contains an implementation of a naïve 20q Bayesian classifier, and the rough outlines (including literature) of the tools, methods and technology I intend to use to build it for questions that may have dependencies, using a Tree-Augmented Naive (TAN) Bayesian classifier.

Notes

People stumble over this repository with some regularity. Having come back to the idea time and again, I think that the dense representation of prior knowledge, which the approach descibed below needs, is really naïve. After all, I need to track about $N×Q$ matrix entries to compute the best decision tree on the fly. That's far too big. So, please use this to understand the idea given below, see it implemented, and tell me if you have a solution that is less intensive on computation and memory. I would probably consider an actual tree-based method these days, but I have not found a good one to base a whole open20q on yet, because decision trees tend to have a different structure from this kind of expert system.

License

All software in this repository, where not stated otherwise, should be considered under a 2-clause BSD license:

Copyright (c) 2012, 2015 Gereon Kaiping anaphory@yahoo.de All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Methodology

Following [8]: (This is for building a fixed decision tree from the probability distribution represented by the Bayesian classifier. Depending on the problem – eg. while the network is still being trained, which for an online game could be always, that's why part of the bibliography below is concerned with that part – it can be useful to not build a tree, but to use a very similar algorithm to iterate through questions.)

Start with the prior probabilities at the root node.
To choose the question for a node:
1. For each question Q that has not been asked:
  1. Calculate the probability that the answer to the question will be "yes", given the prior probabilities at that node and the item-question pair probabilities: the sum of P(X) * P(yes | Q, X) for that question Q for all items X.
  2. Calculate the new probability distributions given a yes answer to that question and given a no answer to that question.
  3. Calculate the remaining bit rates of those probability distributions, and then multiply by the answer probabilities in order to come up with an expected bit rate after the question has been answered.
2. Choose the question Q with the lowest expected remaining bit rate. Create two new nodes with the probability distributions given previously.
3. Choose the questions for those two new nodes using the same method, unless the question was of the form "Is it X?", in which case the "yes" node is terminal, or unless the tree has gotten too deep.

The idea is to improve prior probabilities by using a Tree Augmented Naive Bayes classifier (TAN).

This has some problems:

Applicable information on TAN seem generally difficult to find.
There is no obvious implementation of TAN in python. For Naive Bayes, the implementation reduces to a matrix multiplication and similar things, as implemented in my code. For TAN it seems like a fully-blown Bayesian network application is necessary, and given [9], I would probably pick libpgm.
While full learning of TANs is in theory solved (although I do not know a good implementation for it in python), this application would require online-learning (because new data will be added for every game, and you do not want to store the full history of all games) from incomplete data (because in no game will players answer all possible questions about their chosen entity). On-line learning of TANs has been researched [1,2,3], so has learning from incomplete data [6], but last time I looked, I could not find an algorithm for doing both.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
README.md		README.md
TODO		TODO
core.py		core.py
main.py		main.py
open20q_augmented.py		open20q_augmented.py
structured.py		structured.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TODO

TODO

core.py

core.py

main.py

main.py

open20q_augmented.py

open20q_augmented.py

structured.py

structured.py

Repository files navigation

open20q

Notes

License

Methodology

Further Reading

About

Releases

Packages

Languages

Anaphory/open20q

Folders and files

Latest commit

History

Repository files navigation

open20q

Notes

License

Methodology

Further Reading

About

Resources

Stars

Watchers

Forks

Languages