
Approximate closed-form rebalancing for faster updates and more meaningful half-lives #41

Open
matomatical opened this issue Jan 23, 2021 · 2 comments

Comments

@matomatical

matomatical commented Jan 23, 2021

@fasiha, I dare say, I think you're going to like this!

Overview

I've been working on a pure-Python implementation of the ebisu algorithm, and I was not satisfied with the current approach to determining a new half-life during updates (namely: keep the old half-life, unless the Beta model gets too unbalanced, in which case 'rebalance' it using a coarse grid search over half-lives).

I came up with a different approach which, in my initial experiments, turns out to be simpler, to run faster, and to lead to more accurate/meaningful half-life updates than the current rebalancing strategy (see also #31).

Key idea

The key idea is as follows:

  • As you know (indeed as you invented), the half-life parameter is the time elapsed at which the recall probability is distributed according to a beta distribution with the other two parameters.

  • Thus this 'half-life' parameter is at its most meaningful when that beta distribution is roughly centered at p=0.5, where we will have something approximating exponential decay with half-life equal to the parameter we have been calling 'half-life'.

  • You have shown that in this case the time-decaying beta distribution's mean does not undergo exponential decay exactly, but that it's pretty close. If we were to (approximately) model the probability of recall's mean over time as exponential decay with some half-life, we would get an (approximate) closed form equation for that half-life in terms of the mean at any other given time.

    Let P_recall@t be the random variable representing the probability of recall at elapsed time t. Then:

    E[P_recall@t] ≈ 2^(-t / λ)

    where λ is the half-life of the exponential decay. We can invert this equation to get:

    λ ≈ -t / log_2 E[P_recall@t].

  • My approach is essentially to use such an equation on the analytic posteriors to determine a new value for the half-life parameter during every update.
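As a quick concrete check, the inversion above is only a couple of lines; here's a sketch (the function name `effective_halflife` is illustrative, not part of ebisu's API):

```python
import math

def effective_halflife(mean_recall, t):
    """Invert E[P_recall@t] ≈ 2^(-t/λ) to recover the effective half-life λ.

    mean_recall: (posterior) mean recall probability at elapsed time t.
    Illustrative helper; not part of ebisu's API.
    """
    return -t * math.log(2) / math.log(mean_recall)

# A mean of exactly 0.5 at elapsed time t recovers λ = t, and a mean of
# 0.25 (i.e., two half-lives have elapsed) recovers λ = t / 2.
print(effective_halflife(0.5, 7.0))   # → 7.0
print(effective_halflife(0.25, 1.0))  # → 0.5
```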

Algorithm

The resulting update algorithm works as follows. If my understanding of the ebisu algorithm is correct, the only change from the current algorithm is in step 4.

  1. We have the prior over recall probability at time λ_old, distributed according to a Beta distribution with parameters α_old, β_old. I'll denote this prior P_recall@λ_old ~ Beta(α_old, β_old).
  2. Because the quiz takes place at time t, we shift this prior through time before the update, to prior P_recall@t ~ GB1(...).
  3. We perform the update, computing posterior P_recall@t ~ (that complex distribution with analytical moments).
  4. We want to find a suitable half-life for this posterior:
    • Move the posterior back to time λ_old, giving posterior P_recall@λ_old ~ (another complex distribution with analytical moments).
    • Compute the first moment of this posterior at λ_old, giving us E[postr P_recall@λ_old].
    • The equation mentioned above approximately governs the approximately exponential decay of this posterior's mean. In particular, E[P_recall@λ_old] ≈ 2^(-λ_old / λ_new), where λ_new can be interpreted as the 'effective half-life' of the posterior and will be our choice for the new λ parameter.
    • Plug the posterior's first moment and λ_old into this equation to calculate λ_new = -λ_old / log_2 E[postr P_recall@λ_old].
  5. With our suitable new half-life chosen, move the posterior to time λ_new, giving posterior P_recall@λ_new ~ (another complex distribution with analytical moments), which should be approximately central.
  6. Moment match a Beta distribution to give us posterior P_recall@λ_new ~approx. Beta(α_new, β_new), which we take as our new memory model.

Note: some of those steps are conceptual rather than computational; pseudocode illustrating the computational steps is below (all in log-space, of course):

def updateRecall(r, t, θ):  # r: quiz result, t: elapsed time, θ: prior model parameters
    # Steps 1--2 are captured by assumption within the parameters
    _, _, λ_old = θ

    # Steps 3 and 4:
    # calculate the posterior after the update at time t, shifted back to time λ_old
    postr_λ_old = analytic_posterior(r, t, λ_old, θ)
    # use this posterior mean and the exponential-decay approximation as
    # a heuristic to determine a new half-life
    ln_μ_λ_old = postr_λ_old.ln_moment(1)  # ln here denotes natural logarithm
    λ_new = -λ_old * ln(2) / ln_μ_λ_old    # i.e. -λ_old / log_2 E[postr P_recall@λ_old]

    # Step 5. Compute the moments of the posterior at time λ_new
    postr_λ_new = analytic_posterior(r, t, λ_new, θ)
    ln_m1 = postr_λ_new.ln_moment(1)
    ln_m2 = postr_λ_new.ln_moment(2)
    mean  = exp(ln_m1)
    var   = exp(ln_m2) - exp(2 * ln_m1)
    # Step 6. Moment-match a Beta distribution to fit our new model
    α_new, β_new = beta_match_moments(mean, var)
    return (α_new, β_new, λ_new)
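For completeness, the `beta_match_moments` helper referenced in the pseudocode can be realised with the standard Beta moment-matching formulas. This is a sketch following the pseudocode's naming; ebisu's own internal helper may differ in name and details:

```python
def beta_match_moments(mean, var):
    """Return (α, β) of the Beta distribution with the given mean and variance.

    Standard moment matching: with k = mean*(1-mean)/var - 1, take
    α = mean*k and β = (1-mean)*k. Valid when 0 < mean < 1 and
    0 < var < mean*(1-mean).
    """
    k = mean * (1.0 - mean) / var - 1.0
    return (mean * k, (1.0 - mean) * k)

# Round trip: Beta(2, 2) has mean 0.5 and variance 4 / (16 * 5) = 0.05.
print(beta_match_moments(0.5, 0.05))  # → (2.0, 2.0)
```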

Results

I have experimented by performing these Bernoulli updates at 0.0001, 0.01, 1, 100, and 10000 half-lives, for both failed (successes=0) and passed (successes=1) trials. The initial half-life was exactly 1 (so t = δ).

I tried the following four update schemes for comparison:

  1. λ_new = λ_old (no rebalance): Do not attempt to rebalance, just force fit a Beta distribution to the posterior, no matter how unequal alpha and beta are.

    λ_new = λ_old (no rebalance)
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.665871    3.093668      1
    fail    1e-02        1.670583    3.093355      1
    fail    1e+00        2.000000    3.000000      1
    fail    1e+02        2.004093    2.006296      1
    fail    1e+04        2.000000    2.000001      1
    pass    1e-04        2.000100    2.000000      1
    pass    1e-02        2.010000    2.000000      1
    pass    1e+00        3.000000    2.000000      1
    pass    1e+02      102.000000    2.000000      1
    pass    1e+04    10012.410073    2.002081      1
    

    OK, no question, this strategy is Really Bad, obviously. The half-life parameter loses all interpretability and, as I think you mentioned elsewhere, moment-matching would probably be pretty inaccurate for such extremely skewed posteriors.

  2. ebisu (AUTO rebalance): The current implementation: do (1) unless alpha and beta differ by a factor of two or more, in which case use a coarse grid search to find a new half-life.

    ebisu (AUTO rebalance)
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.665871    3.093668      1
    fail    1e-02        1.670583    3.093355      1
    fail    1e+00        2.000000    3.000000      1
    fail    1e+02        2.004093    2.006296      1
    fail    1e+04        2.000000    2.000001      1
    pass    1e-04        2.000100    2.000000      1
    pass    1e-02        2.010000    2.000000      1
    pass    1e+00        3.000000    2.000000      1
    pass    1e+02        1.328794    2.075719     61.566292
    pass    1e+04        2.589229    2.032667   3361.405627
    

    Behaving as expected. Interestingly, there is never a need to rebalance for fail trials even for extremely quick fails.

  3. ALWAYS ebisu-rebalance: Always use coarse grid-search to find a new half-life.

    ALWAYS ebisu-rebalance
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        1.430187    3.132588      1.127626
    fail    1e-02        1.434299    3.132170      1.127626
    fail    1e+00        1.725811    3.034261      1.127626
    fail    1e+02        1.749764    2.017428      1.127626
    fail    1e+04        1.745926    2.010629      1.127626
    pass    1e-04        1.746014    2.010628      1.127626
    pass    1e-02        1.754725    2.010569      1.127626
    pass    1e+00        2.627134    2.006456      1.127626
    pass    1e+02        1.328794    2.075719     61.566292
    pass    1e+04        2.589229    2.032667   3361.405627
    

    Also behaving as expected. Interestingly, the grid search finds an increased half-life parameter as the best fit even for failed trials. Of course, the effective 'half-life' determining scheduling probably still goes down, and this is probably captured through appropriate changes in alpha and beta.

  4. always APPROX. rebalance: Always use the approximation discussed above to find a new half-life.

    always APPROX. rebalance
    result  delta    new alpha       new beta      new half-life
    fail    1e-04        2.763985    2.994630      0.660264
    fail    1e-02        2.765495    2.994940      0.661462
    fail    1e+00        2.790686    2.936193      0.756471
    fail    1e+02        2.005878    2.006227      0.999208
    fail    1e+04        2.000001    2.000001      1.000000
    pass    1e-04        2.000019    2.000003      1.000036
    pass    1e-02        2.001878    2.000299      1.003606
    pass    1e+00        2.135892    2.018352      1.356915
    pass    1e+02        2.487732    2.034483     35.695958
    pass    1e+04        2.501217    2.034258   3466.775981
    

    This behaviour is, I think, really neat! So we see that for the cases where the existing ebisu algorithm (above) decided to rescale the half-life, this approximate method got a similar value (e.g. 3361 v. 3466, 61 v. 35, about 1 v. about 1). In a few notable cases, I think it did a 'better' job, especially in that the half-life actually goes down for failed trials. Perhaps most importantly, the changes to the alpha and beta parameters at the new half-life are pretty contained, i.e. the beta distribution has stayed very well balanced. More on this in the following plot.

Finally, here's a plot of some of the posterior Beta distributions at their new half-lives after these updates, for the various algorithms. Of course, the no-rebalancing scheme is useless, and the ebisu rebalancing scheme works pretty well. But here you can see visually how the new approximate balancing scheme looks even a little more centered at its half-life than the current ebisu rebalancing scheme.

[plot: posterior Beta distributions at their new half-lives, for each rebalancing scheme]

Remaining issues

  • So far, I have only implemented this for Bernoulli quizzes, because that's all my pure-Python port aims to implement (because my memory app doesn't offer any Binomial trials with total > 1). However, I think the math would go through regardless and so this approach could easily be incorporated into ebisu proper in principle.

  • I haven't extensively tested the approach; the exponential approximation may be dramatically off outside the range of values I tested, where it could lead to bad updates. I was actually hoping that, with your experience testing ebisu algorithms, you might be able to (help) confirm whether this algorithm works over a broader range of situations.

  • The result of the algorithm is still not completely centered, so perhaps over many updates it would become unbalanced and need a more sophisticated rebalancing operation. Or, for reasons deeper than what I have considered, this approach might be somewhat self-stabilising. Either way, to be safe, this approach could be used as the default strategy, followed by a more sophisticated balancing strategy if the alpha and beta get too out of kilter.

  • In case you are interested, I do have code to back this up and I am in principle willing to share it all in the public domain, but it's not quite ready for sharing yet. Anyway you can probably find some of it if you look hard enough around my own github.
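On the safety point above, a cheap guard could reuse the same kind of balance criterion ebisu already applies, falling back to a fuller rebalancing search only when the fitted Beta drifts too far from balanced. A hypothetical sketch (`needs_rebalance` is not an existing ebisu function; the default ratio of 2 mirrors the current check):

```python
def needs_rebalance(alpha, beta, ratio=2.0):
    """Hypothetical guard: True when alpha/beta falls outside [1/ratio, ratio],
    i.e. the fitted Beta is skewed enough to warrant a fuller rebalancing."""
    return not (1.0 / ratio < alpha / beta < ratio)

print(needs_rebalance(2.0, 2.0))   # → False (well balanced)
print(needs_rebalance(10.0, 2.0))  # → True (heavily skewed)
```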

@fasiha
Owner

fasiha commented Jan 24, 2021

First, thank you so much for spending so much time and attention on Ebisu, on coming up with this approach, and for writing it up in such detail! I am very grateful!!

Second—yes! I like this!!! I am confident you have found a much better alternative to the rebalancing that we currently do—we can replace the very ad hoc check for rebalance (0.5 < α / β < 2) and the coarse grid search with a very reasonable and self-explanatory alternative.

And as you note, it will give us a cheap and good approximation to start the fine search for the true halflife (that gives us posterior mean of 0.5) to properly do #31.

I'm also confident it will be straightforward to incorporate this approach with the binomial quiz case.

Are you interested in a version of Ebisu that does #31 (always rebalances), using this approach? If so, I can try to focus on that plus some other outstanding issues that are mostly done and release a new minor version—alas due to life commitments, I'm not sure when I'll be able to give this the time it needs. But it sounds like you have a version that is working for you, and while, yes, it may be possible that we need a fallback in case this algebraic approach to approximating the halflife fails, I think barring pathological cases it should work just fine.

Again, many, many thanks for your hard work!!!

@matomatical
Author

Right, I have my own implementation, which I will continue to use for my memory exercises. So there's no pressure from me to get this into a new minor version. Please consider this issue a 'feature suggestion' rather than an 'issue' or urgent feature request.

In the meantime, I'll continue to use (my implementation of) the ebisu algorithm for my memory exercises, and I can update you once I have more data on how this heuristic updating scheme performs, or if it destroys my memory models, or if I come across any other suggestions.
