
Anki Implementation of Ebisu algorithm #33

Open
TommasoBendinelli opened this issue Jul 8, 2020 · 10 comments
TommasoBendinelli commented Jul 8, 2020

Hello,

I just went quickly through your notes, and it seems like an excellent, math-based approach. Great job!
I was wondering whether I could implement your scheduler in the Anki app.
How would you quantify its effectiveness?
Correct me if I am wrong, but the primary benefit I see is that it handles over-studying and under-studying better. If one follows the card schedule diligently, how much review time can be saved (without reducing recall)?

Have a nice day

TommasoBendinelli changed the title from "implementation of this approach in Anki App" to "Anki Implementation of Ebisu algorithm" on Jul 8, 2020
fasiha (Owner) commented Jul 8, 2020

Jacob Puthipiroj wrote this Anki addon for Ebisu: https://github.com/thetruejacob/Anki-Ebisu and may have some thoughts comparing Ebisu to Anki's algorithm. I don't use Anki so I'm not sure.

One reason it's hard to answer your question about how much time you save with Ebisu over Anki's SRS algorithm is that you can customize Anki so that, when you fail a card, instead of resetting you to one-day intervals, it just reduces the interval by 20% or so. Then it behaves (qualitatively) like Ebisu, where each quiz just increases or decreases the halflife.

The more qualitative advantage I see in using Ebisu over Anki's built-in SRS algorithm is that you no longer have the one-day granularity of intervals.
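For concreteness, here is a minimal sketch of that behavior, assuming the ebisu Python package's v2 API (a model is an (alpha, beta, t) tuple, and updateRecall takes binomial successes/total); the prior and the 7.5-hour quiz time are made up for illustration:

```python
import ebisu

model = (3.0, 3.0, 24.0)  # prior belief: halflife of roughly 24 hours

# Quiz 7.5 hours after study: no one-day granularity, any elapsed time works.
elapsed = 7.5
print(ebisu.predictRecall(model, elapsed, exact=True))  # current recall probability

passed = ebisu.updateRecall(model, 1, 1, elapsed)  # one success out of one attempt
failed = ebisu.updateRecall(model, 0, 1, elapsed)  # one failure out of one attempt

# The halflife nudges up on success and down on failure, instead of resetting to one day.
print(ebisu.modelToPercentileDecay(passed))
print(ebisu.modelToPercentileDecay(failed))
```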

fasiha (Owner) commented Jul 8, 2020

Also, there's this topic from the Anki discussion board that might be relevant: http://anki.tenderapp.com/discussions/add-ons/40926-refactoring-scheduler-code-in-light-of-new-style-hooks I'm not sure how far that person got though.

Also see #22 for some more discussion about quantitatively evaluating Ebisu versus other SRS algorithms like Duolingo's or Anki's. I'm personally not confident we can find a meaningful way to compare these algorithms on what matters: student performance. At least, not without a lot of data (lots of participants, lots of cards, lots of quizzes, and preferably external exams that test students' actual grasp of the topics they're studying).

Thanks for looking at Ebisu! I am happy to answer any questions you have, let me know if any of the above is unclear.

TommasoBendinelli (Author) commented:

Thank you for the resources, I really appreciate it. I will let you know if I have any questions or ideas.

TommasoBendinelli (Author) commented:

I have read #22 with interest, and I think that a proper evaluation of algorithm performance is essential. I do not know if you are aware of this paper: https://people.eecs.berkeley.edu/~reddy/files/DRL_Tutor_NIPS17_MT_Workshop.pdf. To my knowledge, its authors are the only ones who try to compare different spaced repetition learning algorithms. It would be nice to somehow extend their approach.

fasiha (Owner) commented Jul 16, 2020

Thanks for the link, I hadn't seen this and it's interesting! Hmmmm—while it's not at all surprising, it is a bit disappointing that a deep learning network is only competitive with SuperMemo (hand-tuned rules circa late 1990s 😄!). In fact, looking at their Figure 1, it looks like SuperMemo outperforms their algorithm under some student models. (That said, I wouldn't be surprised if someday we find that we can do much better, either with deep learning or with simpler methods!)

In the paper they say,

> For scheduler training and evaluation, we implemented three student simulators based on the EFC, HLR, and GPL memory models described in Section 3.1 as OpenAI Gym [5] environments.

It sounds like it might be very easy to add Ebisu to their benchmarks if they've published the code—have you seen a repo? If not, would you consider reaching out to Sid Reddy, the first author, and asking for their code and seeing how to incorporate Ebisu? I'm happy to help.
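To sketch what such a benchmark might look like (this is not the paper's code, just a toy simulator with made-up constants, again assuming ebisu's v2 API): a simulated student follows an exponential forgetting curve whose true halflife doubles after each successful review, and the scheduler quizzes whenever Ebisu's predicted recall drops below a threshold.

```python
import math
import random
import ebisu

def simulate(days=60.0, threshold=0.7, growth=2.0, seed=0):
    rng = random.Random(seed)
    true_halflife = 12.0      # hours; hidden from the scheduler
    model = (3.0, 3.0, 24.0)  # Ebisu's prior about the same card
    t = last_review = 0.0
    reviews = successes = 0
    while t < days * 24.0:
        t += 1.0              # check the card once an hour
        elapsed = t - last_review
        if ebisu.predictRecall(model, elapsed, exact=True) < threshold:
            p_true = math.exp(-math.log(2) * elapsed / true_halflife)
            success = rng.random() < p_true
            model = ebisu.updateRecall(model, int(success), 1, elapsed)
            if success:
                true_halflife *= growth
            reviews += 1
            successes += success
            last_review = t
    return reviews, successes / max(reviews, 1)

print(simulate())  # (number of reviews, empirical recall rate at review time)
```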

Some random thoughts follow. Compute-intensive training algorithms like neural nets and stochastic gradient descent (Duolingo's HLR) are out of reach of the mobile platforms most students will use, leaving lightweight algorithms like SuperMemo, Anki, Leitner, Ebisu (which some might call heavyweight), etc. as the only contenders. If we had large-scale experiments with real students showing the superiority of one algorithm over another, that'd be one thing, but simulating students is only going to give you very blurry confidence in the relative performance of algorithms 😞. And my hypothesis, which I mentioned in #22, is that other factors (motivation, time, availability of materials, experience with other languages, etc.) are going to matter much more for student performance than which SRS algorithm they used.

I'm super-happy you're interested in this topic and I look forward to what we can do here!

TommasoBendinelli (Author) commented:

Thank you for the great answer and congrats again on this super repo.
The code of the paper is available here: https://github.com/rddy/deeptutor, although it is just a large Jupyter notebook.

I agree with you that simulating students only provides blurry confidence in algorithm performance, but short of a real experiment, it is the best evaluation method we can get. Also, although all the factors you mention are important for improving the learning experience, I believe that a "great" SRS algorithm can make a difference, especially for "mature" cards (i.e., cards with a long expected halflife).

My end goal would be to create an SRS algorithm based on reinforcement learning (it sounds a bit fancy), not only to predict recall but also to automatically schedule card reviews.
This would be, correct me if I am wrong, a bit different from Ebisu, where the recall threshold for triggering a review is fixed.
Ideally, the algorithm should find the optimal trade-off between maximizing recall probability and reducing the number of reviews.
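Purely as an illustration of that trade-off (not anything from Ebisu or the paper), such an objective could be written as a reward with a made-up per-review cost:

```python
def reward(recall_probabilities, reviews_today, review_cost=0.05):
    """Higher mean predicted recall is good; every review costs a little."""
    mean_recall = sum(recall_probabilities) / len(recall_probabilities)
    return mean_recall - review_cost * reviews_today
```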

Currently, I am exploring the field, although the literature is a bit scarce.
Besides your repo and the mentioned paper, I am looking at Duolingo's half-life regression algorithm and this blog post: https://papousek.github.io/modeling-prior-knowlegde-using-duolingo-data-set.html. You are probably already aware of these resources.

I plan to start coding in the next few days, creating a reliable benchmark for evaluating the different algorithms and approaches (definitely including Ebisu), similar to what the mentioned paper did.

I would be delighted if you wanted to work jointly on this idea. If you are up for it, we can even arrange a call to discuss the details.

fasiha (Owner) commented Jul 21, 2020

Ahh, I see, yes: if you're going to make a new SRS algorithm as part of an academic effort, then I see why you absolutely need a way to benchmark these algorithms :) Let me know if you need any help integrating Ebisu with Sid's Jupyter notebook; it's quite long and I didn't spend much time looking at how it works.

> the optimal trade-off between maximizing recall probability and reducing the number of reviews

This might be somewhat useful: http://learning.mpi-sws.org/memorize/ It sits on top of something like Ebisu or HLR (or anything that gives probability of recall). I have an implementation of their algorithm: https://github.com/fasiha/memorize-py/ but because it's stochastic, I don't personally use it.
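The core idea, roughly (this is a sketch of the concept, not the memorize-py code): sample the next review time from a point process whose intensity is proportional to one minus the recall probability, here via standard thinning. recall_prob below is any function from elapsed hours to a probability, such as a wrapper around ebisu.predictRecall.

```python
import math
import random

def sample_next_review(recall_prob, rate=1.0, rng=None):
    """Stochastically pick the next review time (hours from now)."""
    rng = rng or random.Random()
    t = 0.0
    while True:
        t += rng.expovariate(rate)               # candidate from the maximum intensity
        if rng.random() < 1.0 - recall_prob(t):  # thin: keep with probability (1 - recall)
            return t

# Toy example with an exponential forgetting curve and a 24-hour halflife:
print(sample_next_review(lambda t: math.exp(-math.log(2) * t / 24.0)))
```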

> Ebisu, where the recall threshold for performing the review is fixed

Ebisu is even lazier than this: it doesn't concern itself with when to perform a review at all. Yes, some quiz apps that use Ebisu use a fixed probability threshold. (I personally hate "piles of reviews that are due", in part because I'm busy and want my quiz app to work around my schedule instead of treating me like its slave, so my quiz apps just randomly select a card in the lowest decile of recall probability for me to review, repeatedly, until I'm tired of reviewing or have to stop and do something else. I guess part of this is because I use SRS for learning languages, and performance on SRS quizzes is only loosely related to reading or listening comprehension in the wild, so I don't take reviews too seriously; this could change if you invent an amazing review scheduler!)
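A tiny sketch of that "lowest decile" selection, assuming ebisu's v2 API and a hypothetical list of (card_id, model, hours_since_last_review) tuples:

```python
import random
import ebisu

def pick_card(cards):
    """cards: list of (card_id, ebisu_model, hours_since_last_review)."""
    # predictRecall returns a log-probability by default, but it's monotonic,
    # so sorting by it ranks cards from least to most likely to be recalled.
    ranked = sorted(cards, key=lambda c: ebisu.predictRecall(c[1], c[2]))
    decile = max(1, len(ranked) // 10)  # keep at least one candidate
    return random.choice(ranked[:decile])[0]
```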


I'm interested in detecting and exploiting relationships between cards: some cards mutually strengthen each other, others might confuse each other, etc. If we had a correlation matrix, or a Bayesian network, between cards, how could we improve quiz scheduling? And, also very interestingly, could we intelligently suggest which cards to learn next?

In a lot of domains, we can use existing databases to create such networks of relationships—for example, kanji. (In other domains, like rare-disease facts, you might have to infer such relationships from tons of real-world data; and maybe you find that no relationships exist after all.)

I have a lot of hope that, by explicitly modeling such relationships statistically, we can find understandable & easy-to-implement algorithms, and so that's been my focus. However, it may be that reinforcement learning can infer these relationships implicitly, and thereby improve review scheduling and new-card learning.


I'd also invite you to spend a little bit of time thinking about whether we can rethink how SRS works. Right now, when we think of flashcards, we think of reviews in an app. Can we better integrate real-world experience? For example, when learning a language, we spend a little time reviewing cards but potentially a lot of time reading or listening. So what if an app let you read free text (or lightly pre-processed text), highlighted words/constructs you hadn't yet learned (i.e., that weren't in your list of flashcards), and left words that were in your deck unhighlighted but clickable? If you clicked on a word you already "knew" (i.e., a word you had a learned flashcard for in your deck), that counts as a failure; whereas if you read past that word, that counts as a noisy success (Ebisu can handle noisy quiz results in a branch I haven't published yet).

I'm interested in things like this because I wonder if, by using the flashcard model, we've incorrectly tied together several different domains of learning. Flashcards for language vocabulary versus medical facts versus linear algebra theorems: we can make flashcards for all of these in Anki, etc., but I wonder if they are so different that we should find better ways of practicing these facts than SRS reviews. I say this because it might turn out that reinforcement learning is very successful when applied to one specific domain but doesn't work across all sets of flashcards, because the entire abstraction of "flashcards" inappropriately ties together very different domains.

I hope some of this rambling is useful!

TommasoBendinelli (Author) commented:

Thank you for the resources, I did not know about Memorize!

grooveboxunited commented:

> Jacob Puthipiroj wrote this Anki addon for Ebisu: https://github.com/thetruejacob/Anki-Ebisu and may have some thoughts comparing Ebisu to Anki's algorithm. I don't use Anki so I'm not sure.

I'm curious, do you use a different SRS, or is Ebisu just academic for you?

fasiha (Owner) commented Dec 4, 2020

@grooveboxunited I use Ebisu in all my flashcard apps (see Meguro and Kanda for current/recent iterations), with the usage described above and in #35: I don't quiz when the predicted recall probability dips below some threshold; instead, I review when I have time, so usually the predicted recall is very low, which gives a big jump in halflife when I get a card right and barely changes the halflife when I get it wrong.

If by "SRS" you mean, a full-blown app that's battletested and ready for others to use, I don't think anyone's yet made an app using Ebisu.

Edit: Or, if you mean whether I personally use Anki with Ebisu: no, I don't use Anki; I just mentioned Jacob's addon in case it was useful.
