Questions about v3 #63

LazerJesus opened this issue Sep 12, 2023 · 17 comments


LazerJesus commented Sep 12, 2023

Hi,
I recently found your project and have started to build a learning app using Ebisu to time the repetitions. I have a few questions, specifically concerning version 3 of Ebisu.

I'll just list them

  1. What kind of delay do you expect between the release of the Python version and the JS version?
    I ask because I would love to stay on the edge but can't stand building complex systems in Python. I'd love to switch to JS.
  2. I see there are multiple ports already. Are you aware of any Rust port or is that on the Roadmap?
  3. Can the Ensemble be used to model card stages as Anki does?
    Anki groups its cards into New, Learning, Review, Young, Mature, and Relearning.
    From the way I understood the Ensemble, it should be possible to model an (Ebisu) fact progressing from New -> Learning -> Learning -> Young -> Mature <-> Relearning by shifting the ensemble. Is that correct, and is that what you had in mind?

Thanks for building this. I find it conceptually quite appealing for its simplicity and elegance. I'll probably contribute some code examples soon if I am allowed. I had some trouble wrapping my head around the intended usage, which a few dedicated examples would ease.
Best,
Finn


fasiha commented Sep 13, 2023

Thanks for writing!

  1. I'll probably release the Python and JS versions simultaneously.
  2. As far as I know, no Rust port of v2 has been created. However, I can definitely try to create one—the JavaScript and Java ports both use a simple implementation of substack's Gamma function, and a Rust port of one of those should be straightforward. When v3 is finished, it shouldn't be hard to port that to Rust as well.
  3. No, the Ebisu model isn't based on Anki's stages and doesn't try to imitate them. If I recall, Anki's stages like "new" and "young" etc. are needed because Anki applies a different interval increase to a flashcard at different points in its life, and it has a simple state machine to move a card between these stages. Ebisu v3 continues Ebisu's overall goal of minimizing such ad hoc rules and instead moving all the subjective choices into a single place: the prior (the initial model). Further testing may wind up finding flaws in it, but so far v3 takes care of shifting the ensemble through a very standard Bayesian update (with some thresholding, see Request for comment: a new Ebisu v3 API, the Gamma ensemble #62 → bullet 6, "What minimum weight must an atom have before you apply a quiz update?"). Said another way, the v3 update function is a pure function of the previous model and the quiz result, and that model is just an ensemble of weighted Beta distributions (no conception of how long it's been in memory, how many successes or failures it has, etc.).
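To make the "pure function" point concrete, here's a minimal sketch using the v2 API (the only inputs are the previous model, the quiz result, and the elapsed time; the v3 ensemble update has the same shape, though its function names may differ):

import ebisu  # v2 API

oldModel = (3.0, 3.0, 4.0)  # (alpha, beta, halflife in hours)
elapsedHours = 10.0         # time since this card was last reviewed

# Pure function: previous model + quiz result + elapsed time in, new model out.
# No hidden card "stage", review count, or other state is consulted.
newModel = ebisu.updateRecall(oldModel, 1, 1, elapsedHours)  # binary quiz, passed
print(newModel)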

Happy to receive your feedback and contributions to usage! I definitely appreciate that I am too close to the library to understand where it's confusing.

Hope this helps! Let me know if I can elaborate more on anything. I am so lazy and busy with life that I haven't made time to finish Ebisu v3 but it's quite close!


LazerJesus commented Sep 13, 2023

Hey, thanks for coming back to me so quickly!
First off, anyone who builds a library like this isn't lazy ;) I appreciate the effort.

  1. great
  2. great
  3. I see that the function itself is pure and absolutely should never have these states internally.
    However, the categorization of facts is useful for the user and their experience. So I am asking more from the perspective of whether it is possible to build a categorization on top of the algorithm. How would you recommend I go about that?
    For example, if I want to categorize any fact into three stages for the user: Unknown, Learning, and Known.
    Unknown is when the user first encounters the fact and gets multiple repetitions per session - seconds to minutes.
    Learning / Known where repetitions happen on a scale of days to weeks and months.
  • The transition from Learning to Known is pretty straightforward from an Ebisu perspective. But for my application, I need to be able to label any fact as Known/Unknown, without recalculating the decay. I want to add a label, such that I can assume this fact as known and move on to other items in the set. Should I maybe base it on something like: if it takes more than 24*7 hours to reach 90% decay, I add the label? This doesn't feel too elegant. I like the idea of adding the label once a specific model in the ensemble of models becomes dominant, but maybe I am misreading your design here.
  • How would you recommend I elegantly switch timescales from unknown (minutes) to learning (days, weeks, months)? Or is that just a concept I am trying to bring over from Anki that I should free myself from?

Once I am confident that I am using the tool correctly, I'll propose some additions to the readme.
Best,
Finn


fasiha commented Sep 14, 2023

Interesting question! I haven't thought about this before, let me see if we can noodle through to something meaningful—

Unknown is when the user first encounters the fact and gets multiple repetitions per session - seconds to minutes. Learning / Known where repetitions happen on a scale of days to weeks and months

So note that each Ebisu model has a time-to-80%-recall, a floating point number. For young cards this will be minutes, and for more mature cards it will be days. This is a continuous variable and it's not obvious to me how to discretize it into bins like "unknown", "learning", "known" unless you do something ad hoc like

  • unknown = time-to-80%-recall < 1 day
  • learning = time-to-80%-recall < 1 week
  • otherwise known

These limits (1 day, 1 week) are magic numbers and it's not clear to me that you need these when the time-to-80%-recall will tell you more exactly when to review.
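If you did want bins anyway, a minimal sketch of that ad hoc mapping with the v2 API might look like this (the 80% comes from modelToPercentileDecay's percentile argument; the 1-day and 1-week cutoffs are the magic numbers above, and I'm assuming the model's time unit is hours):

import ebisu

def stage(model) -> str:
    # Time (in the model's units, hours here) until recall probability decays to 80%.
    hoursTo80 = ebisu.modelToPercentileDecay(model, percentile=0.8)
    if hoursTo80 < 24:
        return "unknown"
    elif hoursTo80 < 24 * 7:
        return "learning"
    return "known"

print(stage((3.0, 3.0, 4.0)))     # young card, 4-hour halflife -> "unknown"
print(stage((3.0, 3.0, 1000.0)))  # very mature card -> "known"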

From a user visualization/motivation perspective, one idea could be, like you suggest, to use the dominant atom of the ensemble as some kind of feedback. The default v3 model will have 5 atoms, so you could label flashcards as one-star through five-star?
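For example (a hypothetical sketch; the exact shape of the v3 model object may differ, so I'm just assuming you can get at a list of per-atom weights):

# Hypothetical: `weights` is the ensemble's list of atom weights, summing to 1.
weights = [0.05, 0.10, 0.20, 0.40, 0.25]

# 0-based index of the dominant (highest-weight) atom, shown as 1..5 stars.
stars = max(range(len(weights)), key=lambda i: weights[i]) + 1
print(f"{stars} / {len(weights)} stars")  # -> "4 / 5 stars"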

for my application, I need to be able to label any fact as Known/Unknown … I want to add a label, such that I can assume this fact as known and move on to other items in the set. Should I maybe base it on something like: if it takes more than 24*7 hours to reach 90% decay, I add the label? This doesn't feel too elegant

Ah, so, if the goal is to avoid overloading the learner with too many new facts, then maybe your rule can be something like, "don't introduce new flashcards until no flashcard has time-to-80%-recall < 2 days".
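A sketch of that gate, again leaning on the v2 API and assuming hours as the time unit (the 2-day threshold is a magic number you'd tune per app):

import ebisu

def okToIntroduceNewCards(models, thresholdHours=48.0) -> bool:
    # Only introduce new material once every existing card's time-to-80%-recall
    # exceeds the threshold.
    return all(ebisu.modelToPercentileDecay(m, percentile=0.8) > thresholdHours
               for m in models)

print(okToIntroduceNewCards([(3.0, 3.0, 4.0), (3.0, 3.0, 1000.0)]))  # False: the 4-hour card is still weak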


So those are a couple of ideas for mapping an Ebisu v3 model to a visualization for the user (find the atom with the highest weight) as well as how to pick when to introduce new flashcards (ensure there are no quickly-decaying flashcards). Both are a little ad hoc but that's fine, since these are app-level decisions that you make; Ebisu doesn't make these decisions.

But from the perspective of categorizing cards you've already learned for scheduling purposes, it's still not clear to me whether this is useful? Because, as described above, Ebisu gives you a real number for exactly what time the recall probability drops below a threshold (or equivalently, what the recall probability is right now), and it seems like that should be enough to decide whether or not to ask the student about this flashcard.

(I've also mentioned this in other places but I personally hate apps with "due dates". I have bad memories of using Anki, today I'd have five reviews, tomorrow I'd have fifty 😱, very unpleasant user experience. Apps I create allow the user to drive the experience: we keep picking the weakest flashcard and show that to the user for as long as they want to review. When they're tired of reviewing and want to learn new cards, we teach them new cards. No due dates. If the student doesn't study for a week, and only has ten minutes, we don't tell them "YOU HAVE FIVE MILLION CARDS DUE!!!", we just quiz them the weakest card.

Other than just being a nicer user experience, another reason to avoid due dates is: what happens when you review a flashcard that's related to another flashcard? Suppose you have just reviewed the pronunciation of "階段". That has obviously affected your memory of the meaning of "階段". Ebisu doesn't yet have a public API to handle this kind of correlation between models, but you can imagine using noisy-binary quizzes to update the meaning flashcard in Ebisu v2+. You can imagine having dense webs of correlations between flashcards, where reviewing one flashcard tweaks due dates of potentially dozens of cards. In this situation, it's not obvious why you'd keep recomputing due dates when you could just compute the recall probability of all your flashcards right now.)
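The "keep picking the weakest flashcard" approach is simple with the v2 API; here's a sketch, assuming each card stores its model and the hours elapsed since its last review:

import ebisu

def weakestCard(cards):
    # cards: list of dicts with a v2 `model` and `elapsedHours` since last review.
    # predictRecall with exact=True returns the actual recall probability right now.
    return min(cards,
               key=lambda c: ebisu.predictRecall(c["model"], c["elapsedHours"], exact=True))

cards = [
    {"id": "kaidan-meaning", "model": (3.0, 3.0, 4.0), "elapsedHours": 30.0},
    {"id": "kaidan-reading", "model": (3.0, 3.0, 24.0), "elapsedHours": 10.0},
]
print(weakestCard(cards)["id"])  # the meaning card is weaker here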

Anyway, hope this helps! Forgive me if I've misunderstood you, please feel free to tell me where I'm getting confused!


LazerJesus commented Sep 14, 2023

Ok, this is turning into a nice noodle salad. I did do my best to keep it sorted.

Relations Between Cards

You really tickled my fancy with the concept of relationships between cards, because that's essentially what I want to build across scales, and a lot of my thinking follows from that, so let's start here.

You can imagine using noisy-binary quizzes to update the meaning flashcard

  1. How would you implement this? Can you maybe draw me up a pseudocode example? I don't do too well with pure mathematical explanations; I need to see the noodle in action to understand it.

Ebisu doesn't yet have a public API

  1. Is there a private API? Are you planning one?

Ah, so, if the goal is to avoid overloading the learner with too many new facts

It is not about controlling the load for the learner (I agree with you that Anki is too rigid), it is about enabling the modeling of interdependencies between facts. I think about knowledge as interconnected and codependent. I need to know arithmetic before I do calculus. I need to know how to write functions in Python before I can build a Flask API. So when I want to know how to mark a factoid as known, it's because I want to unlock the dependents of that fact for the user.
Think of my application like a skill tree. I have a corpus of knowledge where each fact is known/unknown and has relationships to the facts that depend on it. The user starts at the bottom of the tree and works their way up by learning facts, which unlocks further facts to learn, etc.

  1. Do you think it's possible to have shared atoms?
    Like your 階段 example - can pronunciation and meaning share an atom? (when you have a dependency graph, a lot more relations like this become available. But this is the simplest example I can think of and a good place to start exploring the concept I think.)

One more thing about the multiple atoms of v3.

  1. How does the weighting shift between atoms?
  2. How much can I control the weight distribution?
  3. How much should I control the weight distribution?

I ask because exerting control here seems like a nice way to accomplish all kinds of goals. For example, the shared-atoms idea from above. Or the idea of moving a fact from being reviewed multiple times each session to being reviewed at most once per session - this could also be accomplished by intentionally shifting the weighting. But of course, I don't want to fuck with the learning of the user.

ps.

you could just compute the recall probability of all your flashcards right now

  1. How do you implement this without computation becoming a concern? If I have a million users, each with 10,000 cards, and I have to recompute everything on every review, that's a lot of compute. Even with some heuristics in place, that approach seems unscalable to me.

Ok. I am so looking forward to your response.


fasiha commented Sep 18, 2023

How to update a meaning flashcard if you just reviewed the pronunciation flashcard

There are two ways this can happen:

Passive review

When you quizzed the first flashcard (pronunciation of a word), the student answered and you showed them the meaning. This is an active (normal) review for the pronunciation flashcard and a passive review for the meaning: you didn't actively test recall on the meaning, and you have no evidence that that memory was strengthened or weakened or whatever.

The very simple way I handle passive reviews is to keep the same model and just overwrite the "last seen" timestamp for the flashcard with the current timestamp.

Equivalently you can run updateRecall(prior=model, successes=0.5, total=1, tnow=elapsedTime) using the v2 API, i.e., a totally uninformative review using the noisy-binary quiz with probability of getting this right at 50%. This will return the same model as the input, so it's equivalent to just updating the "last seen" timestamp.
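A quick check of that equivalence with the v2 API (hours as the unit; the halflife should come back essentially unchanged):

import ebisu

model = (3.0, 3.0, 4.0)  # alpha, beta, halflife in hours
elapsedHours = 12.0

# successes=0.5 with total=1 is a maximally uninformative noisy-binary quiz:
# the observation carries no information, so the posterior matches the prior.
passive = ebisu.updateRecall(model, 0.5, 1, elapsedHours)

print(ebisu.modelToPercentileDecay(model))    # halflife before
print(ebisu.modelToPercentileDecay(passive))  # same halflife after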

Correlation using noisy-binary

The more interesting approach is if you don't show the meaning after quizzing for the pronunciation. This isn't a passive quiz for meaning—it's not a quiz at all, and you have to use math. I don't have a good mathematical way to do this 😢 but here's an approach I've experimented with: per https://fasiha.github.io/ebisu/#bonus-soft-binary-quizzes you can call updateRecall for the meaning flashcard with

  • q1 = probability of getting the pronunciation right assuming you truly know the meaning
  • q0 = probability of getting the pronunciation right assuming you've truly forgotten the meaning

These are hard to guess at, and maybe a future version of Ebisu will help you find these numbers given a lot of quiz history. But you can guess: suppose

  • q1 = 0.8 and
  • q0 = 0.4?

So you could do something like this: assume the student passed the pronunciation flashcard:

import ebisu

# v2 model: (alpha, beta, halflife); here a one-hour halflife
meaningModel = (2, 2, 1)
elapsedTime = 2  # it's been two hours since you last saw the meaning

# updateRecall(prior, successes, total, tnow, q0=...): a soft-binary quiz with successes=0.8 and q0=0.4
newMeaningModel = ebisu.updateRecall(meaningModel, 0.8, 1, elapsedTime, q0=0.4)
print(ebisu.modelToPercentileDecay(newMeaningModel))
# prints 1.145123576480906

So our model for the meaning went from halflife of 1 hour to 1.15 hours because we guess that there's this link between passing the pronunciation flashcard and knowing the meaning (80% chances of knowing pronunciation assuming you truly knew the meaning, 40% of knowing the pronunciation assuming you truly forgot the meaning).

If you failed the pronunciation quiz but wanted to keep the same numbers, it'd be

ebisu.modelToPercentileDecay(ebisu.updateRecall(meaningModel, 1 - 0.8, 1, elapsedTime, q0=0.4))
# 0.9393476052593827

i.e., the halflife for the meaning flashcard dropped from 1 hour to 0.9 hours because you failed the pronunciation card.

You'd then overwrite the meaning flashcard's model with this new model (keeping its "last seen" timestamp the same).

I don't love this technique. It's totally ad hoc. I want to spend some time thinking about other ways to model (and measure) correlations between flashcards (there's an issue about this #27) but haven't gotten to it. So there's no API planned for this, just some experimentation.

Skill tree

Ahh nice, thanks for this explanation! So it's like https://www.executeprogram.com (Gary Bernhardt is also a huge fan of spaced repetition 🙌).

Honestly I don't know if you want to or need to use Ebisu for modeling the skill tree? Like, I don't know if for example Execute Program requires you to achieve some level of mastery in step 1 before allowing you to study step 2; I think it's perfectly reasonable to let the student see flashcard 1, ask them to commit it to memory, and then click "next" to go on to flashcard 2 that depends on 1.

If you wanted to use Ebisu to prevent students from rushing too fast, then we have the various ad hoc techniques we discussed above (like, ensure flashcard 1 has a time-to-80%-recall > 1 week, etc.) but the skill tree itself is something that I think makes sense to keep at your app's level.

Can different facets of the same concept share models/atoms?

I don't think so. I haven't thought about this a lot, but it's never made sense to me to try and use a single memory model to capture pronunciation vs meaning vs writing—i.e., the three facets of the word are 階段 (written form), "kaidan" (pronunciation in Japanese), and "stairs" (meaning in English).

It's always made sense to keep these as separate models. One reason is just the learning experience. Some learners who know Chinese will already know the written form and they'll be really good at guessing the pronunciation in Japanese (階段 is pronounced jie1duan4, which maybe to English speakers sounds quite different from "kaidan" but these are actually quite related), but the meaning is quite different in Chinese—so maybe the meaning card might actually be harder for Chinese speakers learning this than for English speakers.

But also from a statistics perspective, it seems harder to make a hierarchical model where there's a "base" model that represents the overall fact and then some transformed model for written form vs pronunciation vs meaning. It seems easier to keep these separate models and enforce some kind of correlation when updating one without the other?

So I don't have any clear idea of how I'd combine atoms or ensembles tracking one of these vs the other in Ebisu.

Now, your app should definitely keep track of the fact that these three flashcards are interrelated! Back when I used it (years ago) Anki didn't do this and it really annoyed me to get asked about the written form today when it asked me about the pronunciation yesterday. So even if you didn't track any kind of correlation between these different flashcards, you should at least try and space them out a bit so your users don't get annoyed.

Or you could quiz users on all 3 sub-facts at the same time, maybe the rule is "when one is due, review all 3".

Or you could review one sub-fact (written form to pronunciation) when it's due, and then after reviewing the answer, you could have a followup screen where the user can click to see the meaning. That's the perfect use case for a noisy-binary quiz: if they click the meaning, that means they probably forgot it, and if they didn't click it, that means they probably remember it, but it's not 100%, and you can tweak the numbers to decide how much that signal means.
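A sketch of that follow-up screen as a v2 noisy-binary update; the 0.2/0.9/0.3 numbers below are made-up guesses for how strongly clicking (or not clicking) correlates with having forgotten (or remembered) the meaning:

import ebisu

meaningModel = (2.0, 2.0, 1.0)  # alpha, beta, halflife in hours
elapsedHours = 2.0

def updateMeaningAfterReveal(clickedToSeeMeaning: bool):
    # Clicking to reveal the meaning suggests they forgot it; not clicking suggests they remember it.
    # successes in (0, 1) makes this a noisy-binary quiz; q0 is the chance of the
    # "remembered-looking" observation even if the memory is truly gone.
    successes = 0.2 if clickedToSeeMeaning else 0.9
    return ebisu.updateRecall(meaningModel, successes, 1, elapsedHours, q0=0.3)

print(ebisu.modelToPercentileDecay(updateMeaningAfterReveal(True)))   # halflife shrinks a bit
print(ebisu.modelToPercentileDecay(updateMeaningAfterReveal(False)))  # halflife grows a bit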

So in summary, Ebisu for now doesn't know how to handle the different facets of the same card but your app definitely should.

Updating the atom weights in the ensemble

How does the weighting shift between atoms?

It's a very standard statistical update: each atom's weight is multiplied by Probability(quiz result | that atom), i.e., if the atom is very surprised to see this quiz result, it is deweighted; if it's not at all surprised by this quiz result, it's not deweighted (and then all weights are normalized to sum to 1). The actual amount of reweighting is just the probability the atom assigned to the observation (this is called the "likelihood" of observing the result, so this approach to reweighting is called the likelihood update).
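In illustrative pseudo-Python (not the actual v3 code; I'm assuming each atom can report the probability it assigned to the observed quiz result):

def reweight(weights, likelihoods):
    # Likelihood update: scale each atom's weight by the probability that atom
    # assigned to the observed quiz result, then renormalize to sum to 1.
    unnormalized = [w * lik for w, lik in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# An atom that found the result unsurprising (0.9) gains weight relative to one
# that found it surprising (0.2):
print(reweight([0.5, 0.5], [0.9, 0.2]))  # -> [0.818..., 0.181...]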

how much can I control the weight distribution?

You can fully control this: you can set each atom's model (an Ebisu v2 model, a 3-tuple of alpha, beta, and time) and its weight, or you can let the initialization do that by giving it

  • a single model,
  • an initial halflife,
  • a final halflife,
  • and an initial weight,

and it creates atoms whose halflives are logarithmically spaced between the initial and final halflife, with weights that decay from the initial weight.
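Illustratively (this is not the actual initModel internals, just what logarithmically spaced halflives with decaying weights could look like for 5 atoms):

import numpy as np

numAtoms = 5
firstHalflife, finalHalflife = 24.0, 24.0 * 30  # hours
firstWeight = 0.5

halflives = np.geomspace(firstHalflife, finalHalflife, numAtoms)
raw = firstWeight ** np.arange(1, numAtoms + 1)  # geometric decay, then normalize
weights = raw / raw.sum()

print(halflives)  # 24, ~56, ~132, ~308, 720 hours
print(weights)    # ~0.52, 0.26, 0.13, 0.065, 0.032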

how much should I control the weight distribution?

I haven't thought of any obvious reasons you'd tweak the weights/models of individual atoms but am happy to be surprised 😅

Performance of recall predictions at scale

Yeah I've always worried about running predictRecall on every flashcard every time you want to know what to quiz on, especially in v2 where this function is quite expensive (multiple Gamma function evaluations). I think different people have come up with different tricks as you can imagine:

  • throttle. Only rerun predictRecall every 10 minutes or whatever.
  • compute predictRecall for some set of future times and store those, and linearly interpolate between them (a sketch follows this list).
  • v3 makes the predictRecall step much faster by settling for an approximation of the exact Bayesian prediction. No more Gamma function calls, just arithmetic.
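Here's a sketch of that second trick with the v2 API and numpy (the grid and its spacing are arbitrary choices):

import numpy as np
import ebisu

model = (3.0, 3.0, 4.0)  # alpha, beta, halflife in hours

# Precompute recall probability on a grid of future elapsed times (hours)...
grid = np.geomspace(0.5, 24 * 30, 50)
recall = np.array([ebisu.predictRecall(model, t, exact=True) for t in grid])

# ...then answer "what's the recall right now?" by interpolating instead of
# re-running the full prediction each time.
def cachedPredictRecall(elapsedHours: float) -> float:
    return float(np.interp(elapsedHours, grid, recall))

print(cachedPredictRecall(12.0), ebisu.predictRecall(model, 12.0, exact=True))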

There's no great solution. It hasn't been a problem so far, since most apps seem to run this locally on users' devices, so they're only running it over hundreds to thousands of flashcards a few times a day, but yes, I always worry this will become a deal-breaker for someone at some point.

Hope this helps and is clear!

@LazerJesus

Sorry for not responding yet. I am building my app using Ebisu and want to get more hands-on experience. So far I quite like it.

@LazerJesus

I am building and building. Ebisu is pretty deeply embedded in my system by now. Do you have a prediction for when multiple models, i.e. v3, will drop? I've reached the first use case where that specific feature would be killer.
Best, Finn


fasiha commented Nov 12, 2023

@LazerJesus thanks for checking in! Sorry to make you wait for so long! I just published a release candidate 3.0.0rc1 to PyPI with the Beta ensemble extension 🥳! The README is https://github.com/fasiha/ebisu/tree/v3-release-candidate#readme and you can install it like this:

python -m pip install "git+https://github.com/fasiha/ebisu@v3-release-candidate"

(Note, the above installs the latest version from GitHub; things have changed a tiny bit since I published rc1 to PyPI, but if you want that release, you'd do python -m pip install "ebisu>=3rc")

The README has an example script and some verbal explanation for changes to the API. If you're able to beta-test for me, I would be SO grateful 🙇🙏🥹!

Please let me know any and all questions, comments, feedback 😁!


LazerJesus commented Nov 14, 2023 via email


fasiha commented Nov 15, 2023

@LazerJesus Ebisu.js now has a 3.0.0-rc.1 release candidate up at https://github.com/fasiha/ebisu.js/tree/v3#readme with instructions on how to install it (npm i "https://github.com/fasiha/ebisu.js#v3") and how to use it. It's been tested to ensure it produces the same numbers as the Python v3-release candidate as well. Please check it out!


LazerJesus commented Nov 15, 2023 via email


LazerJesus commented Dec 4, 2023

"Postponed is not abandoned," as we say in Germany. I am finally getting to it.
I've been playing around with the parameters and am unable to initialize a model with numAtoms: 1

var model = ebisu.initModel({ firstHalflife: 24, numAtoms: 1 });
579 |       sum2 += firstWeight * d ** i;
580 |     }
581 |     return Math.abs(sum2 - 1);
582 |   }, { lowerBound: 1e-3, guess: 0.5, tolerance: 1e-10, maxIterations: 1e3 }, fminStatus);
583 |   if (!(fminStatus.converged && isFinite(solution) && 0 < solution && solution < 1)) {
584 |     throw new Error("unable to initialize: " + fminStatus);
              ^
error: unable to initialize: [object Object]
      at initModel (/Users/finn/vivalence/code/spanish/app/backend/node_modules/ebisu-js/dist/ebisu.cjs:584:10)
      at /Users/finn/vivalence/code/spanish/app/backend/src/games/flashcards/logic/ebisu.js:3:12
      at processTicksAndRejections (:61:76)


fasiha commented Dec 23, 2023

unable to initialize a model with numAtoms: 1

Sorry @LazerJesus, the single-atom model is just an Ebisu v2 model 😕 (var ebisu = require('ebisu-js'); ebisu.ebisu2.defaultModel(24)). V3 doesn't support this case, but I guess I could add special support for this in initModel? Everything we want from v3 (more reliable predictions, etc.) relies on numAtoms > 1, so I hadn't ever thought of this case. Let me know if you are able to try with more typical numAtoms or if you want me to add support for the single-atom case to v3.

Sorry also for the delay! I work on quiz apps for some days and then return to Ebisu and then go back to working on apps 😅

@LazerJesus

Hey @fasiha,
no problem, I have that same routine. What are you working on?

So, there are a few thoughts about the atoms.
First off, allowing numAtoms = 1 would make it easier to get started with migrations.
I have ~5000 atoms in my app and haven't yet had the time to try to move them. Do you have a migration pattern in mind from v2 to v3?

@LazerJesus

@fasiha, any updates? Specifically on the migration.


fasiha commented Mar 11, 2024

@LazerJesus sorry for the delay! After some more testing (see #66) I'm no longer confident in the v3 release candidate. My apologies 😢🙇!

If you wanted to continue using the v3-rc we discussed above in this thread, one approach might be, for each model to port, initialize a v3-rc model and replay all the quizzes, assuming you have that data somewhere in your database.

I have a JavaScript implementation of the newer algorithm from #66 (tentatively called split3) that I'm testing out in another app: https://github.com/fasiha/tabito/blob/main/src/ebisu (it does assume you have ebisu-js and minimize-golden-section-1d installed). I am just starting to test this in that quiz app, so I can't recommend it yet, but if you wanted to try it, you could port your old Ebisu v2 models quite easily:

import * as ebisu3split from './split3.ts';
const oldModel = [5.5, 5.5, 4]; // first and second elements should be very close, since ebisu v2 rebalances models
// Also I'm assuming the third element is in HOURS
const newModel = ebisu3split.initModel({alphaBeta: oldModel[0], halflifeHours: oldModel[2]})

This split3 model has performance comparable to v3-rc when run on old data, and is a very simple twist on v2. I'm hoping to start testing this in a real app in a few days, so I'll report back either way.

If you want me to release this as a branch that you can npm install let me know, I'll be happy to do so.

I haven't entirely abandoned v3-rc, so I'm really sorry for sending you in so many different directions. These kinds of open-ended research problems are hard, and I'm really bad at them 😓 so thank you so much for bearing with me and your continued encouragement 🙇

@LazerJesus

For now, my needs are met by v2. I am under no pressure to change the SR algorithm. If you have something you deem worth testing, I'll play around with it.
