Dev diary: split 3-atom model #66

fasiha · 2024-02-05T07:42:46Z

This dev diary is the third open proposal for Ebisu v3:

there is a v3 release candidate using an ensemble of Ebisu v2-style Beta distributions on logarithmically-spaced halflives (see Request for comment: a new Ebisu v3 API, the Gamma ensemble #62)
the previous dev diary discusses a single-atom Beta distribution approach with a power law decay (instead of exponential, see Dev diary: single-atom Beta power law #64 which also introduced using focal loss as an excellent alternative to cross-entropy (i.e., log-likelihood) to benchmark competing probabilistic models)

Both the above techniques share two nice desiderata:

realistic predictions on recall probability (Ebisu v2 was always too conservative, often predicting <1% probability of a successful quiz)
realistic strengthening of a memory model after successful quiz (Ebisu v2's post-quiz halflife update was also very conservative)

But there's another desideratum:

Repeatedly quizzing at the same quiz interval shouldn't dramatically increase the halflife. I.e., from Mozer et al.'s (DOI), taking the same quiz one week apart over and over again should not result in a memory halflife of months or years. The student has only demonstrated (very clearly!) that they can remember the same fact a week apart, and their memory for longer periods hasn't yet been tested.

Unfortunately, both the ensemble and the Beta-power-law approaches mentioned above fail miserably on this third requirement.

Code to generate the table below

Starting with https://github.com/fasiha/ebisu/tree/v3-release-candidate run this in the top-level directory:

import ebisu

m1 = ebisu.initModel(100)
for i in range(20):
  print(i, ',', ebisu.modelToPercentileDecay(m1 := ebisu.updateRecall(m1, 1, 1, 100)))

and this in the scripts/ directory to access the betapowerlaw.py script:

import betapowerlaw as bp

m2 = [1.25, 1.25, 100]

for i in range(20):
  print(bp.modelToPercentileDecay(m2 := bp.updateRecall(m2, 1, 1, 100)))

index	Ensemble halflife (hours)	Beta powerlaw halflife (hours)
0	185	236
1	267	463
2	359	842
3	458	1476
4	562	2535
5	668	4307
6	773	7269
7	877	12220
8	980	20497
9	1081	34332
10	1180	57459
11	1278	96119
12	1375	160741
13	1470	268762
14	1565	449324
15	1660	751140
16	1753	1255635
17	1847	2098912
18	1941	3508465
19	2034	5864550

I can explain why both models have this flaw:

the ensemble approach sees halflife grow aggressively because while the shortest-term atom has the most weight, its weight steadily drops with each such quiz via the likelihood reweighting. This is unavoidable: the short-term atom predicts the observed probability of success as < 100% whereas the longer-term atoms predict the successful quiz at ~100%. After normalizing the weights, the ensemble's halflife shifts to the right. After 20 such quizzes, the halflife is far beyond the 100 time units we originally started with.
The Beta power-law approach has even more aggressive growth. This is because each successful quiz keeps incrementing the Beta distribution's α parameter (which conventionally represents the number of successes of a binary/Bernoulli coin flip). Therefore, the probability of recall at the halflife (100 time units) keeps getting more and more mass at ~100%. This Beta distribution goes through the nonlinear power-law and ends up with a lot of probability of recall even at distant time horizons.

These two failure modes are independent and made me think about ways to circumvent both while keeping the other two desiderata listed above.

Here's where I ended up.

Consider a simple 3-atom ensemble with fixed weights (i.e., the weights don't change, so it's quite a stretch to call it an "ensemble"):

a primary Ebisu v2 atom (a Beta distribution on recall at some halflife, with exponential decay)
a strengthening atom, also Ebisu v2, with a halflife 2× (or N×) the primary; this model is never updated
a long-term atom, a power-law that for now can be the betapowerlaw model proposed in the previous dev diary; this model is also never updated

Here's the idea: the primary atom is just an Ebisu v2 atom, so it's conservative: it evolves slowly and therefore is less vulnerable to the halflife growing dramatically after repeated quizzes on the same time interval. The second atom allows this model to circumvent the conservativeness of Ebisu v2: it explicitly posits that memory can strengthen organically and its halflife is pegged to twice (or N×) the first atom's halflife: this meets our second desideratum of realistic halflife growth after quizzes, and that's why it never needs updating. Finally, the third atom (the power law) makes explicit the chance that exponential decay is just wrong for this memory and captures the odds that without study the student will remember this fact for a year. This achieves the first desideratum of respectable predicted recall probabilities, and similarly doesn't need updating: it just exists to prop up the recall probability at long intervals.

Here are the halflives for the three proposals after twenty successful quizzes each 100 hours apart, as well as how much bigger this halflife is than the starting halflife: the last column, the split approach, shows unbounded growth of the halflife but much slower. After twenty iterations, it's still 7× the starting halflife, versus 17 (ensemble) and 600 (Beta power-law):

Ensemble halflife	Powerlaw halflife	Split halflife
185.8 (1.59x)	168.4 (1.68x)	426.8 (1.38x)
267.2 (2.28x)	259.1 (2.59x)	538.8 (1.75x)
359.2 (3.07x)	379.7 (3.80x)	645.5 (2.09x)
458.8 (3.92x)	540.3 (5.40x)	747.5 (2.43x)
562.6 (4.80x)	754.3 (7.54x)	845.7 (2.74x)
668.0 (5.70x)	1039.6 (10.40x)	940.4 (3.05x)
773.3 (6.60x)	1419.7 (14.20x)	1032.0 (3.35x)
877.5 (7.49x)	1926.4 (19.26x)	1121.0 (3.64x)
980.2 (8.37x)	2601.6 (26.02x)	1207.5 (3.92x)
1081.1 (9.23x)	3501.7 (35.02x)	1291.8 (4.19x)
1180.5 (10.08x)	4701.2 (47.01x)	1374.0 (4.46x)
1278.4 (10.92x)	6300.0 (63.00x)	1454.4 (4.72x)
1375.1 (11.74x)	8430.8 (84.31x)	1533.1 (4.98x)
1470.8 (12.56x)	11270.6 (112.71x)	1610.3 (5.23x)
1565.8 (13.37x)	15055.4 (150.55x)	1685.9 (5.47x)
1660.1 (14.18x)	20099.5 (200.99x)	1760.2 (5.71x)
1754.0 (14.98x)	26821.8 (268.22x)	1833.2 (5.95x)
1847.6 (15.78x)	35780.7 (357.81x)	1905.1 (6.18x)
1941.1 (16.58x)	47720.3 (477.20x)	1975.7 (6.41x)
2034.6 (17.37x)	63632.1 (636.32x)	2045.4 (6.64x)

(The absolute value of the third column appears to be similar to the values in the first column but that's because the split-3-atom model started out at a higher halflife: the primary atom of that model has halflife of 100, so between the strengthening and the long-term atoms, the overall halflife is much higher than 100. That's why you want to pay attention to the parenthetical number, how much bigger this halflife is from the starting halflife.)

After some tweaking of the parameters of this model, we find that it's very competitive with the ensemble and the Beta-power-law approaches:

*Dev instructions to generate this plot*

To obtain this plot,

create a venv or Conda env,
install dependencies: python -m pip install numpy scipy pandas matplotlib tqdm ipython "git+https://github.com/fasiha/ebisu@v3-release-candidate",
then clone this repo and check out the release candidate branch: git clone https://github.com/fasiha/ebisu.git && cd ebisu && git fetch -a && git checkout v3-release-candidate,
download my Anki reviews database: collection-no-fields.anki2.zip, unzip it, and place collection-no-fields.anki2 in the scripts folder so the script can find it
start ipython: ipython
run the script: %run scripts/split3.py. This will produce some text/figures.

Compare to the ensemble approach:

and the Beta-power-law results:

Indeed, for the first half of the graphs above (the flashcards for which I had a lot of failed quizzes), this "split-3-atom" model outperforms the two alternatives.

When I initially sketched this split-3-atom model, I thought the first atom would have a lot of weight, like 80%, and the next two atoms would have 10% each. Turns out that an equal split works the best, one-third weight for each. There also appears to be some advantage to scaling the second atom to 5x the first atom's halflife instead of 2x in terms of focal loss performance, but we'll have to see if that's "real" or just the loss function being weird.

As usual, I'm going to stew over this and poke around the text file generated by the script above that delves into the predictions made for each model for individual quizzes per flashcard. But I'm tentatively excited about this model. It's lacks the mathematical elegance of the Beta power-law model and needs more parameters (specifically, the weights and the halflife-scalar for the second atom), but so far I like its behavior a lot.

The text was updated successfully, but these errors were encountered:

fasiha · 2024-02-05T08:01:47Z

As in the previous dev diary for the Beta power-law model, the script has a GRID_MODE flag that iterates over initial α=β as well as initial halflife, and for each tuple, sums the focal loss over all quizzes, all flashcards. That's what suggested the 24 hour halflife for the equal-weighted case:

zxl777 · 2024-02-13T01:56:11Z

@fasiha
I have developed a free online version of Flashcards available at https://itoytoy.com/anki
I plan to use ebisu 3.0 and will regularly sync the review data from users' cards with you for further optimization of ebisu.

I have previously used ebisu 2.1 in my product, but feel that its potential has not been fully utilized in practical applications. After integrating ebisu 3.0, should any issues arise, I will consult with you for guidance.

Thank you.

fasiha mentioned this issue Feb 24, 2024

Would you be interested in benchmarking Ebisu against FSRS? fasiha/ebisu.js#23

Open

fasiha mentioned this issue Mar 11, 2024

Questions about v3 #63

Open

fasiha mentioned this issue Apr 12, 2024

Ebisu? open-spaced-repetition/srs-benchmark#85

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dev diary: split 3-atom model #66

Dev diary: split 3-atom model #66

fasiha commented Feb 5, 2024 •

edited

fasiha commented Feb 5, 2024 •

edited

zxl777 commented Feb 13, 2024

Dev diary: split 3-atom model #66

Dev diary: split 3-atom model #66

Comments

fasiha commented Feb 5, 2024 • edited

fasiha commented Feb 5, 2024 • edited

zxl777 commented Feb 13, 2024

fasiha commented Feb 5, 2024 •

edited

fasiha commented Feb 5, 2024 •

edited