
Feature ideas #53

Open
4 tasks
LasseBlaauwbroek opened this issue Jan 15, 2022 · 5 comments

Comments

LasseBlaauwbroek (Member) commented Jan 15, 2022

@Zhang-Liao I'd like to gather some ideas for modifications of the features that we could try. Below are some initial ideas:

  • Use de Bruijn indices for bound variables instead of placeholders
  • Anonymous features: for local variables, we can recursively replace their identifiers with the features of the variable's type
  • Limit the size of the feature vector for large proof states
  • Don't append feature counts directly to the features, but instead give them as weights to the models

If you have other ideas, please list them here.
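
A minimal Python sketch of the de Bruijn idea above, assuming a toy nested-tuple term representation (hypothetical, not the project's actual data structures). The point is that alpha-equivalent terms produce identical features once bound-variable names are replaced by indices:

```python
# Sketch: replace bound-variable names with de Bruijn indices during
# feature extraction, so renaming a binder does not change the features.
# Terms are toy tuples: ('lam', name, body) / ('app', f, x) / ('var', name).

def debruijn_features(term, bound=()):
    """Emit a flat feature list where bound variables become de Bruijn indices."""
    kind = term[0]
    if kind == 'var':
        name = term[1]
        if name in bound:
            # Distance to the binder; the innermost binder gets index 0.
            return [f"bvar:{len(bound) - 1 - bound.index(name)}"]
        return [f"const:{name}"]
    if kind == 'lam':
        _, name, body = term
        return ["lam"] + debruijn_features(body, bound + (name,))
    if kind == 'app':
        _, f, x = term
        return ["app"] + debruijn_features(f, bound) + debruijn_features(x, bound)
    raise ValueError(f"unknown term kind: {kind}")

# \x. \y. x y  and  \a. \b. a b  yield the same feature list.
t1 = ('lam', 'x', ('lam', 'y', ('app', ('var', 'x'), ('var', 'y'))))
t2 = ('lam', 'a', ('lam', 'b', ('app', ('var', 'a'), ('var', 'b'))))
assert debruijn_features(t1) == debruijn_features(t2)
```

With placeholder features the two terms above would already coincide, but the de Bruijn variant additionally distinguishes *which* binder a variable refers to, so `\x.\y. x` and `\x.\y. y` still get different features.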

Zhang-Liao (Contributor)

Features that indicate the proof state is wrong, e.g., the size of the state is too large

Zhang-Liao (Contributor)

Different features for _k_NN and online random forests

LasseBlaauwbroek (Member, Author)

Can you be more specific?

Zhang-Liao (Contributor)

Statistics of the proof states, like the number of nodes
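
A minimal Python sketch of such statistics, reusing the same toy nested-tuple term representation as an assumption (the actual proof-state format is not shown in this thread). Node count and depth could be appended as scalar features or used to flag oversized states:

```python
# Sketch: simple size statistics over a proof-state term tree.
# Terms are toy tuples such as ('app', ('lam', 'x', ('var', 'x')), ('var', 'y')).

def state_stats(term):
    """Return (node_count, max_depth) for a nested-tuple term."""
    children = [c for c in term[1:] if isinstance(c, tuple)]
    if not children:
        return 1, 1
    counts, depths = zip(*(state_stats(c) for c in children))
    return 1 + sum(counts), 1 + max(depths)

t = ('app', ('lam', 'x', ('var', 'x')), ('var', 'y'))
nodes, depth = state_stats(t)
assert (nodes, depth) == (4, 3)
```

A "state too large" feature would then just be a threshold on `nodes` or `depth`; the right cutoff would presumably come out of the ablation study mentioned below.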

Zhang-Liao (Contributor)

> Can you be more specific?

I am not sure now. I think we need to run an ablation study on random forests.
I mention it here because, although my feature code was wrong, online random forests performed better with the wrong features than with the correct ones. Meanwhile, the results in "offline-simulator" show that the top-1 accuracy of online forests is 3% better than that of kNN. However, with complex features, random forests are slightly worse than kNN. These observations suggest that the complex features may suit kNN well while being less appropriate for random forests.
