Molecular Bigrams

These notebooks visualize words using letter bigram data. The thickness, or weight, of the edge connecting two letter nodes represents either the frequency of that pair of letters (molecularBigrams) or the probability that the second letter follows the first (markovBigrams). At the moment, the frequency notebook registers only one instance of self-looping bigrams ("oo", "aa", etc.). Node layout for frequency values is handled automatically with the Fruchterman-Reingold algorithm; bigram probabilities are visualized with linear plots.
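To make the two edge weights concrete, here is a minimal sketch that computes them for a single word using only the standard library. The function names `bigram_counts` and `transition_probabilities` are hypothetical and do not come from the notebooks, and the numbers are derived from the word itself rather than from the corpus data listed below, so treat it as an illustration of the idea, not the notebooks' implementation.

```python
from collections import Counter, defaultdict


def bigram_counts(word):
    """Count every adjacent letter pair in a word, self-loops included."""
    letters = word.lower()
    # zip pairs each letter with its successor: "abc" -> ("a","b"), ("b","c")
    return Counter(zip(letters, letters[1:]))


def transition_probabilities(counts):
    """Normalize pair counts into P(second letter | first letter)."""
    totals = defaultdict(int)
    for (first, _), n in counts.items():
        totals[first] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Hypothetical example word; the notebooks take weights from corpus data instead.
counts = bigram_counts("mississippi")
probs = transition_probabilities(counts)

for pair in sorted(counts):
    print("".join(pair), counts[pair], round(probs[pair], 3))
```

In this sketch the counts would serve as edge weights in a frequency graph, and the normalized values as weights in a Markov-style view. Note that the self-loop "ss" is counted twice here, unlike the single registration described above.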

Data available for:

- Google Books (from Peter Norvig's "English Letter Frequency Counts")
- English words from Allison Parrish's Gutenberg, dammit (includes words that appear more than 100 times)
- Wordfreq (sampling of 25,000 words)

A fully interactive version of molecularBigrams is also available on my website.

To do:

- ~~Add bigram data for a different corpus~~
- ~~Handle self-loops ("oo," "aa," etc.)~~ (all but the Python Jupyter Notebook)
- ~~Do a version with Markov sequences~~

t-shoemaker/molecular_bigrams
