An Anki plugin to sort your new cards.

Rct567/FrequencyMan

FrequencyMan (Anki Plugin)

Overview

FrequencyMan allows you to sort your new cards by word frequency and other useful factors.

Tested on Anki 2.1.60 (Qt6) and 23.12.1 (Qt6).

Features

  • More than 50 default word frequency lists.
  • Define multiple sorting targets for different decks or selection of cards.
  • Customize the ranking factors for each target.
  • Use multiple fields (such as 'front' and 'back') and multiple languages to influence the ranking of a card.
  • Multiple 'word frequency' lists can be used per language.

Basic usage

  1. Open the "FrequencyMan" menu option in the "Tools" menu of the main Anki window.
  2. This will open FrequencyMan's main window where you can define your sorting targets.
  3. Define the targets using a JSON array of objects. Each object represents a target to sort (a target can be a deck or a defined selection of cards).
  4. Click the "Reorder Cards" button to apply the sorting.

Configuration examples

Example 1

Reorders a single deck. This will only match cards with note type Basic located in deck Spanish. It will also use the default ranking factors.

The content of the cards and all ranking metrics are analyzed per 'language'. The results are then combined to determine the final ranking of all new cards in the defined target.

[
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Front": "EN",
                    "Back": "ES"
                },
                "name": "Basic"
            }
        ]
    }
]

Example 2

Reorders the same deck twice. The first target excludes cards whose card name (card template) matches "Speaking" from being repositioned, while the second target sorts only those excluded cards.

The first target changes only a single ranking factor, while the second target limits the ranking to just 2 factors.

Note: Both targets use the same 'main scope', which is the selection of cards used to create the data to calculate the ranking. This scope is reduced for each target by reorder_scope_query to limit which cards get repositioned.

[
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Meaning": "EN",
                    "Sentence": "ES"
                },
                "name": "Basic (customized note type)"
            }
        ],
        "reorder_scope_query": "-card:*Speaking*",
        "ranking_familiarity": 8
    },
    {
        "deck": "Spanish",
        "notes": [
            {
                "fields": {
                    "Meaning": "EN",
                    "Sentence": "ES"
                },
                "name": "Basic (customized note type)"
            }
        ],
        "reorder_scope_query": "card:*Speaking*",
        "ranking_factors": {
            "familiarity": 1,
            "word_frequency": 1
        }
    }
]
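The split between the main scope and reorder_scope_query can be illustrated with a toy Python sketch. It is not FrequencyMan's implementation. Cards are plain dicts, a hypothetical `in_reorder_scope` predicate stands in for the Anki search query, and a hypothetical `score` function stands in for the real ranking factors. The sketch shows that scores may draw on every card in the main scope, while only cards inside the reorder scope get repositioned.

```python
from typing import Callable

Card = dict  # hypothetical card record: {"id": ..., "template": ..., "words": [...]}


def reorder(
    main_scope: list[Card],
    in_reorder_scope: Callable[[Card], bool],
    score: Callable[[Card, list[Card]], float],
) -> list[int]:
    """Return the ids of the cards to reposition, best-ranked first."""
    # Only cards inside the reorder scope are repositioned...
    candidates = [card for card in main_scope if in_reorder_scope(card)]
    # ...but each score may use data gathered from the whole main scope.
    ranked = sorted(candidates, key=lambda card: score(card, main_scope), reverse=True)
    return [card["id"] for card in ranked]


def shared_word_score(card: Card, scope: list[Card]) -> float:
    # Toy metric: number of other cards in the main scope sharing a word with this card.
    words = set(card["words"])
    return sum(1 for other in scope if other is not card and words & set(other["words"]))


cards = [
    {"id": 1, "template": "Reading", "words": ["hola"]},
    {"id": 2, "template": "Speaking", "words": ["hola", "amigo"]},
    {"id": 3, "template": "Reading", "words": ["gato"]},
]
print(reorder(cards, lambda card: "Speaking" not in card["template"], shared_word_score))  # [1, 3]
```

Card 1 outranks card 3 only because it shares a word with card 2. Card 2 belongs to the main scope but is not itself repositioned.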

Example 3

Reorders based only on word frequency (using the word frequency of both front and back):

[
    {
        "deck": "Spanish::Essential Spanish Vocabulary Top 5000",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Front": "ES",
                    "Back": "EN"
                }
            }
        ],
        "ranking_factors": {
            "word_frequency": 1
        }
    }
]

Tokenizers

Custom tokenizers can be defined in user_files\tokenizers.

To use a custom tokenizer, or to see how one is defined, you can download a working copy of Jieba (ZH) or a version of Janome (JA).

If you download Janome (JA), you can place it in a directory like user_files\tokenizers\janome, which then should contain the file fm_init_janome.py and the subdirectory janome.

Automatic support

FrequencyMan will use tokenizers from other plugins, if there is no custom tokenizer for a given language:

Ranking factors

Default ranking factors

"ranking_factors" : {
    "word_frequency": 1.0,
    "familiarity": 2.5,
    "familiarity_sweetspot": 1.0,
    "lexical_underexposure": 0.25,
    "ideal_focus_word_count": 1.0,
    "ideal_word_count": 1.0,
    "reinforce_focus_words": 0.25,
    "most_obscure_word": 0.5,
    "lowest_fr_least_familiar_word": 0.5,
    "lowest_word_frequency": 0.25,
    "ideal_unseen_word_count": 0.0
}
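Example 2 suggests two ways a target can change these weights. A ranking_factors object seems to replace the defaults, since the second target ends up using only 2 factors. A key like ranking_familiarity changes a single factor. The Python sketch below encodes that reading of the examples. Both rules and the function name `effective_ranking_factors` are assumptions, not FrequencyMan's code.

```python
# Default weights as listed above.
DEFAULT_RANKING_FACTORS = {
    "word_frequency": 1.0,
    "familiarity": 2.5,
    "familiarity_sweetspot": 1.0,
    "lexical_underexposure": 0.25,
    "ideal_focus_word_count": 1.0,
    "ideal_word_count": 1.0,
    "reinforce_focus_words": 0.25,
    "most_obscure_word": 0.5,
    "lowest_fr_least_familiar_word": 0.5,
    "lowest_word_frequency": 0.25,
    "ideal_unseen_word_count": 0.0,
}


def effective_ranking_factors(target: dict) -> dict[str, float]:
    """Return the ranking factor weights a target would use (hypothetical helper)."""
    # Assumption: an explicit "ranking_factors" object replaces the defaults entirely.
    base = target.get("ranking_factors", DEFAULT_RANKING_FACTORS)
    factors = {name: float(weight) for name, weight in base.items()}
    # Assumption: a "ranking_<factor>" key overrides the weight of that one factor.
    for key, value in target.items():
        if key.startswith("ranking_") and key != "ranking_factors":
            factors[key[len("ranking_"):]] = float(value)
    return factors


first = {"deck": "Spanish", "reorder_scope_query": "-card:*Speaking*", "ranking_familiarity": 8}
second = {"deck": "Spanish", "ranking_factors": {"familiarity": 1, "word_frequency": 1}}
print(effective_ranking_factors(first)["familiarity"])  # 8.0
print(effective_ranking_factors(second))  # {'familiarity': 1.0, 'word_frequency': 1.0}
```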

Description

  • word_frequency: Represents the word frequency of the words in the content, with a bias toward the lowest value. The word frequency values come from the provided word frequency lists.
  • familiarity: Represents how familiar you are with the words in the content. Like word_frequency, it has a bias toward the lowest value. How familiar you are with a word depends on how many times you have seen the word and in what context that specific word was present (the interval and ease of the card, the number of words in the content, etc.).
  • familiarity_sweetspot: Promotes cards with words close to a specific 'sweetspot' of familiarity. This can be used to promote cards with words that have already been introduced to you by reviewed cards, but might benefit from 'reinforcement'. These can be recently introduced words, or words that are 'hidden' (non-prominent) in older cards. Use target setting familiarity_sweetspot_point to customize the sweetspot value.
  • lexical_underexposure: Promotes cards with high-frequency words that you are not yet proportionally familiar with. Basically, lexical_underexposure = (word_frequency-word_familiarity). Increasing this value means you will be 'pushed' forward more in your language learning journey (and the word frequency list). Increase the value slightly if you experience too much overlap and not enough new words.
  • ideal_focus_word_count: Promotes cards with only a single 'focus word'. See also N+1: https://en.wikipedia.org/wiki/Input_hypothesis#Input_hypothesis. A focus word is a word you are not yet appropriately familiar with. Use target setting focus_words_max_familiarity to customize the maximum familiarity of the focus words.
  • ideal_word_count: Represents how close the word count of the content is to the defined ideal range. By default this is 1 to 5, but you can customize it per target with:
    "ideal_word_count": [2, 8]
  • reinforce_focus_words: Promotes cards with one or more already seen 'focus words', but only if there are no new words.
  • most_obscure_word: Represents the most obscure word. The non-obscurity of a word is defined by either word_frequency or word_familiarity (depending on which is higher, and thus less 'obscure').
  • lowest_fr_least_familiar_word: Represents the lowest word frequency among the words with the lowest familiarity score.
  • lowest_word_frequency: Represents the lowest word frequency found in the content of any targeted field. This is different from word_frequency, which reflects the average word frequency of all targeted fields.
  • lowest_familiarity: Represents the lowest familiarity found in the content of any targeted field. This is different from familiarity, which reflects the average familiarity of all targeted fields.
  • ideal_unseen_word_count: Like ideal_focus_word_count, but promotes cards with only a single 'new word' (a word not found in any reviewed card).
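Two of the descriptions above are concrete enough to sketch in Python. The sketch assumes per-word word_frequency and familiarity values normalized to the range 0 to 1, which is not documented here. It computes lexical_underexposure as the simplified difference given above. It also counts focus words as words whose familiarity is at or below focus_words_max_familiarity (default 0.28); that boundary is an assumption. All function names are hypothetical.

```python
FOCUS_WORDS_MAX_FAMILIARITY = 0.28  # default from the target settings


def lexical_underexposure(word_frequency: float, word_familiarity: float) -> float:
    # Simplified form from the description: frequent but unfamiliar words score high.
    return word_frequency - word_familiarity


def focus_words(familiarity: dict[str, float], max_familiarity: float = FOCUS_WORDS_MAX_FAMILIARITY) -> list[str]:
    # Assumption: a word counts as a focus word when its familiarity is at or below the maximum.
    return [word for word, value in familiarity.items() if value <= max_familiarity]


def has_ideal_focus_word_count(familiarity: dict[str, float]) -> bool:
    # ideal_focus_word_count promotes content with exactly one focus word (N+1).
    return len(focus_words(familiarity)) == 1


sentence = {"el": 0.9, "gato": 0.6, "duerme": 0.1}  # hypothetical familiarity per word
print(focus_words(sentence))  # ['duerme']
print(has_ideal_focus_word_count(sentence))  # True
print(lexical_underexposure(0.75, 0.25))  # 0.5
```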

Custom fields

The following fields will be automatically populated when you reorder your cards:

  • fm_focus_words: A list of focus words for each field. (recommended!)
  • fm_unseen_words: A list of unseen words (words not found in reviewed cards) for each field.
  • fm_seen_words: A list of seen words (words found in reviewed cards) for each field.

Dynamic field names (the number at the end can be replaced with the index number of any field defined in the target):

  • fm_main_focus_word_0: The focus word with the lowest familiarity for field 0.
  • fm_main_focus_word_static_0: The focus word with the lowest familiarity for field 0. This field will not be updated once set.
  • fm_lowest_fr_word_0: The word with the lowest word frequency for field 0.
  • fm_lowest_familiarity_word_0: The word with the lowest familiarity for field 0.
  • fm_lowest_familiarity_word_static_0: The word with the lowest familiarity for field 0. This field will not be updated once set.
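The difference between fm_main_focus_word_0 and its static variant can be shown with a minimal Python sketch. It assumes a note is a plain dict of field values and that the familiarity of each focus word in field 0 is already known. The function `update_main_focus_word` and this representation are hypothetical. The behavior follows the descriptions above: the focus word with the lowest familiarity is chosen, and the static field is left alone once it has a value.

```python
def update_main_focus_word(note: dict[str, str], focus_word_familiarity: dict[str, float], index: int = 0) -> None:
    """Fill fm_main_focus_word_<index> and, while it is empty, its static variant."""
    # The main focus word is the focus word with the lowest familiarity ("" if there is none).
    if focus_word_familiarity:
        main_word = min(focus_word_familiarity, key=lambda word: focus_word_familiarity[word])
    else:
        main_word = ""
    note[f"fm_main_focus_word_{index}"] = main_word
    static_field = f"fm_main_focus_word_static_{index}"
    # The static field is not updated once it has been set.
    if not note.get(static_field):
        note[static_field] = main_word


note: dict[str, str] = {}
update_main_focus_word(note, {"duerme": 0.1, "gato": 0.2})
update_main_focus_word(note, {"gato": 0.2})  # a later run: "duerme" is no longer a focus word
print(note)  # {'fm_main_focus_word_0': 'gato', 'fm_main_focus_word_static_0': 'duerme'}
```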

For debug purposes:

  • fm_debug_info: Different metrics and data points for each field.
  • fm_debug_ranking_info: The resulting score per ranking factor for the note.
  • fm_debug_words_info: The scores for each word for 'word frequency', 'lexical underexposure' and 'familiarity sweetspot'.

Display focus words on the back of your cards (html example)

{{#fm_focus_words}}
  <p> <span style="opacity:0.65;">Focus:</span> {{fm_focus_words}} </p>
{{/fm_focus_words}}

Target settings

For each defined target, the following settings are available:

| Setting | Type | Description | Default value |
| --- | --- | --- | --- |
| deck | string | Name of a single deck as main scope. | - |
| decks | array of strings | An array of deck names as main scope. | - |
| scope_query | string | Search query as main scope. | - |
| notes | array of objects | Note types (by name) whose fields, each mapped to a language data id, are used for the ranking. | - |
| reorder_scope_query | string | Search query to reduce which cards get repositioned. | Main scope as defined by deck, decks or scope_query |
| ranking_factors | object | Weights of the ranking factors. | See 'Ranking factors' |
| familiarity_sweetspot_point | string or float | Defines a specific 'sweetspot' of familiarity for ranking factor familiarity_sweetspot. | "~0.5" (= 50% of focus_words_max_familiarity) |
| suspended_card_value | float | | 0.5 |
| suspended_leech_card_value | float | | 0.0 |
| ideal_word_count | array of two integers | Ideal word count range for ranking factor ideal_word_count. | [1, 5] |
| focus_words_max_familiarity | float | Maximum familiarity of a focus word. | 0.28 |
| corpus_segmentation_strategy | string | Corpus data is joined by language data id by default, but can also stay 'per note field' by setting it to "by_note_model_id_and_field_name". | "by_lang_data_id" |

Notes:

  • familiarity_sweetspot_point accepts a string starting with ~, such as "~0.5", which makes the number relative to focus_words_max_familiarity. With the default focus_words_max_familiarity of 0.28, "~0.5" results in a value of 0.14. A string starting with ^ makes the number relative to the median word familiarity value.
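Resolving a familiarity_sweetspot_point setting to an absolute value could look like the Python sketch below. The ~ case follows the note above. For ^, the sketch assumes that "relative to the median" means the number multiplies the median word familiarity; that reading, the function name and its parameters are assumptions.

```python
from __future__ import annotations

import statistics


def resolve_sweetspot_point(
    value: str | float,
    focus_words_max_familiarity: float = 0.28,
    word_familiarities: list[float] | None = None,
) -> float:
    """Turn a familiarity_sweetspot_point setting into an absolute familiarity value."""
    if isinstance(value, str) and value.startswith("~"):
        # Relative to focus_words_max_familiarity: "~0.5" -> 0.5 * 0.28.
        return float(value[1:]) * focus_words_max_familiarity
    if isinstance(value, str) and value.startswith("^"):
        # Assumption: "relative to the median" is read as a multiplier of the median.
        if not word_familiarities:
            raise ValueError("a '^' value needs word familiarity data")
        return float(value[1:]) * statistics.median(word_familiarities)
    # Plain numbers (or numeric strings) are used as-is.
    return float(value)


print(resolve_sweetspot_point("~0.5"))  # 0.14
```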

Language data id

For each field a language_data_id must be defined. In most cases this should just be a two-letter (ISO 639-1) language code, such as EN or ES:

[
    {
        "deck": "Spanish::Essential Spanish Vocabulary Top 5000",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Spanish": "ES",
                    "English": "EN"
                }
            }
        ]
    }
]

Alternatively, a language_data_id can also be an 'extended two-letter language code':

[
    {
        "deck": "Medical",
        "notes": [
            {
                "name": "Basic-f4e28",
                "fields": {
                    "Front": "EN_MEDICAL",
                    "Back": "EN_MEDICAL"
                }
            }
        ]
    }
]

For every language data id defined, a directory should exist (although it could be empty). In the example above, \user_files\lang_data\en_medical should exist. If it does not exist, you will be prompted to automatically create one with a default word frequency file shipped with FrequencyMan.
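The mapping from a language data id to its directory can be sketched in Python with pathlib. Lower-casing the id follows the example above, where EN_MEDICAL maps to en_medical. The helper names and the `addon_dir` argument are hypothetical. The prompt to create a missing directory is reduced here to returning the missing paths.

```python
from pathlib import Path


def lang_data_dir(addon_dir: Path, language_data_id: str) -> Path:
    # EN_MEDICAL -> <addon_dir>/user_files/lang_data/en_medical
    return addon_dir / "user_files" / "lang_data" / language_data_id.lower()


def missing_lang_data_dirs(addon_dir: Path, targets: list[dict]) -> set[Path]:
    """Return the language data directories the targets need but that do not exist yet."""
    missing = set()
    for target in targets:
        for note in target.get("notes", []):
            for lang_id in note.get("fields", {}).values():
                path = lang_data_dir(addon_dir, lang_id)
                if not path.is_dir():
                    missing.add(path)
    return missing
```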

Two different types of files can be placed in a language data id directory:

  • word frequency lists: A text or csv file with words sorted to reflect the word frequency (in descending order). Only the position is used, not the (optional) word frequency value.
  • ignore lists: A text file with words that will not be used to calculate the rankings. The file name should start with "ignore".
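Reading these files could look like the Python sketch below. As described above, only a word's position is kept, and files whose name starts with "ignore" are treated as ignore lists. Several details are assumptions, not FrequencyMan's loader: UTF-8 files, one entry per line, the word in the first column or first whitespace-separated token, lower-casing, and keeping the best position when several lists contain the same word. The helper names are hypothetical.

```python
import csv
import io
from pathlib import Path


def read_words(path: Path) -> list[str]:
    """Return the words of a list file in file order (first column / first token)."""
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".csv":
        words = [row[0] for row in csv.reader(io.StringIO(text)) if row]
    else:
        words = [line.split()[0] for line in text.splitlines() if line.strip()]
    return [word.strip().lower() for word in words]


def load_lang_data(directory: Path) -> tuple[dict[str, int], set[str]]:
    """Return (word -> position, ignored words) for one language data directory."""
    positions: dict[str, int] = {}
    ignored: set[str] = set()
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix.lower() not in (".txt", ".csv"):
            continue
        if path.name.startswith("ignore"):
            ignored.update(read_words(path))
            continue
        for position, word in enumerate(read_words(path), start=1):
            # Only the position is used; assumption: the best position across lists wins.
            positions[word] = min(position, positions.get(word, position))
    return positions, ignored
```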

Target Corpus data

A 'corpus data set' contains all the information related to the content of a note that is used to calculate the ranking of a card (such as the "familiarity" of a word).

Every target has one or more 'corpus data' sets, depending on how many fields are defined in the target and how the corpus_segmentation_strategy is set.

By default, corpus_segmentation_strategy is set to "by_lang_data_id", which means that a corpus data set will be created for every unique language_data_id:

{"Front": "EN", "Back": "EN"} // <- A single corpus data set
{"Front": "EN", "Back": "EN", "Extra": "ES"} // <- Two corpus data sets

To create separate corpus data sets for each field, you can set corpus_segmentation_strategy to "by_note_model_id_and_field_name". This will create a corpus data set for each field in the target:

{"Front": "EN", "Back": "EN"} // <- Two corpus data sets
{"Front": "EN", "Back": "EN", "Extra": "ES"} // <- Three corpus data sets

Things to note:

  • Using "by_note_model_id_and_field_name" also means that fields from different notes in the same target will not be 'joined' together.
  • Using "by_note_model_id_and_field_name" can create multiple corpus data sets for the same language, which may not be desirable for language learning purposes.
  • Using "by_lang_data_id" will join fields from all notes defined within a target, if they have the same language_data_id.
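The grouping behind the two strategies can be sketched in Python. Each (note type, field) pair of a target is placed in a corpus data set keyed by its language data id, or by note type and field name. The sketch uses the note type name in place of the note model id, which is an assumption. The function name is hypothetical.

```python
from collections import defaultdict


def corpus_data_sets(notes: list[dict], strategy: str = "by_lang_data_id") -> dict[tuple, list[tuple[str, str]]]:
    """Group the (note type, field) pairs of one target into corpus data sets."""
    groups = defaultdict(list)
    for note in notes:
        for field_name, lang_id in note["fields"].items():
            if strategy == "by_lang_data_id":
                # Fields sharing a language data id are joined, even across note types.
                key = (lang_id,)
            elif strategy == "by_note_model_id_and_field_name":
                # Assumption: the note type name stands in for the note model id.
                key = (note["name"], field_name)
            else:
                raise ValueError(f"unknown corpus_segmentation_strategy: {strategy}")
            groups[key].append((note["name"], field_name))
    return dict(groups)


notes = [{"name": "Basic", "fields": {"Front": "EN", "Back": "EN", "Extra": "ES"}}]
print(len(corpus_data_sets(notes)))  # 2
print(len(corpus_data_sets(notes, "by_note_model_id_and_field_name")))  # 3
```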

Word frequency lists

FrequencyMan comes with 50+ default word frequency lists. These are generated using one of the following sources:

The default word frequency lists can be found in the \default_wf_lists directory. When prompted to create a new language data directory with a default word frequency list, the relevant file will be copied to the new language data directory, such as \user_files\lang_data\en.

The user_files directory

The user_files directory can be found inside FrequencyMan's plugin directory, which can be accessed via: Tools > Add-ons > (Select FrequencyMan) > View Files.

Any files placed in this folder will be preserved when the add-on is upgraded. All other files in the add-on folder are removed on upgrade.

Manual installation from GitHub

  1. Go to the Anki plugin folder, such as C:\Users\%USERNAME%\AppData\Roaming\Anki2\addons21.
  2. Create a new folder with the name FrequencyMan.
  3. Make sure you are still in the directory addons21.
  4. Run: git clone https://github.com/Rct567/FrequencyMan.git FrequencyMan
  5. Start Anki.