These notebooks visualize words using letter bigram data. The thickness, or weight, of the edge connecting two nodes represents either the frequency of that pair of letters in a word (molecularBigrams) or the probability that the second letter of the pair follows the first (markovBigrams). At the moment, the frequency notebook only registers one instance of self-looping bigrams ("oo", "aa", etc.). Node layout for frequency values is handled automatically with Fruchterman-Reingold; bigram probabilities are visualized with linear plots.
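As a rough illustration of the two kinds of edge weight, here is a minimal sketch in plain Python. It is not the notebooks' actual code: the function names `bigram_counts` and `transition_probabilities` and the three-word sample list are hypothetical, and it assumes words contain only letters. Unlike the frequency notebook's current behavior, it counts every occurrence of a self-loop such as "oo".

```python
from collections import Counter, defaultdict


def bigram_counts(words):
    """Count adjacent letter pairs across an iterable of words."""
    counts = Counter()
    for word in words:
        letters = word.lower()
        # zip pairs each letter with its successor: "moon" -> mo, oo, on
        counts.update(zip(letters, letters[1:]))
    return counts


def transition_probabilities(counts):
    """Normalize counts so each first letter's outgoing weights sum to 1."""
    totals = defaultdict(int)
    for (first, _second), n in counts.items():
        totals[first] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Hypothetical sample, not drawn from any of the corpora listed in this README.
sample_words = ["moon", "noon", "mono"]
counts = bigram_counts(sample_words)
probs = transition_probabilities(counts)
```

The raw counts correspond to frequency-style weights, and the normalized values to Markov-style weights: each is keyed by a `(first, second)` letter tuple, so either dictionary could be passed to a graph library as edge weights.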
Data available for:
- Google Books (from Peter Norvig's "English Letter Frequency Counts")
- English words from Allison Parrish's Gutenberg, dammit (includes words that appear more than 100 times)
- Wordfreq (sampling of 25,000 words)
A fully interactive version of molecularBigrams is also available on my website.
To do:
- Add bigram data for a different corpus
- Handle self-loops ("oo," "aa," etc.) (all but the Python Jupyter Notebook)
- Do a version with Markov sequences