
codebox/wordvis


This is a Python script to generate sunburst charts that visualise the structure of English words.

An example chart, generated using the 100,000 most common words in the Google Books English corpus, is shown below:

[Figure: Sunburst chart of common English words]

The charts consist of a series of concentric rings, with each ring divided into segments.

The rings represent letter positions within words: the innermost ring corresponds to first letters, the next ring to second letters, and so on.

Each segment within a ring represents a particular letter occurring at that position in a word, immediately after the letter in the adjacent segment of the previous ring. The size of each segment represents how often that letter appears in that position, following those preceding letters, within the corpus. For example, the innermost ring shows that the most common letter at the start of a word is 'T':

[Figure: Sunburst chart of common English words]
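As a rough illustration of the ring structure described above, the following Python sketch (not taken from wordvis.py) assigns every word prefix an angular span. The prefix length is the ring number, and each segment's width is proportional to the summed counts of all words starting with that prefix. The function names `prefix_weights` and `layout`, the alphabetical ordering of sibling segments, and the use of degrees are assumptions made for this example.

```python
from collections import defaultdict


def prefix_weights(word_counts):
    """Return {prefix: summed count of all words starting with that prefix}.

    The prefix length is the ring it belongs to: "T" is on ring 1,
    "TH" on ring 2, "THE" on ring 3.
    """
    weights = defaultdict(int)
    for word, count in word_counts.items():
        for i in range(1, len(word) + 1):
            weights[word[:i]] += count
    return dict(weights)


def layout(word_counts):
    """Map each prefix to an angular span (start, end) in degrees.

    A segment's width is 360 * (prefix weight / total weight), and each
    child is packed inside its parent's span, so reading outwards from
    the centre follows the letters of a word.
    """
    weights = prefix_weights(word_counts)
    total = sum(word_counts.values())
    spans = {"": (0.0, 360.0)}  # the empty prefix is the whole circle
    cursor = {}                 # next free angle inside each parent's span
    # Shorter prefixes first, so every parent is placed before its children;
    # siblings are ordered alphabetically (an assumption).
    for prefix in sorted(weights, key=lambda p: (len(p), p)):
        parent = prefix[:-1]
        start = cursor.get(parent, spans[parent][0])
        end = start + 360.0 * weights[prefix] / total
        spans[prefix] = (start, end)
        cursor[parent] = end
    return spans
```

With the hypothetical counts `{"THE": 6, "TO": 2, "A": 4}`, 'A' occupies 0 to 120 degrees and 'T' 120 to 360 degrees on the first ring, with 'TH' and 'TO' nested inside the span of 'T'. When a word ends at a given letter, its children do not fill the parent's whole span, leaving a gap on the next ring.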

Many of the common words found in the corpus can be seen on the chart by starting at the inner ring and reading radially outwards. For example, the word 'THE' can be seen in the diagram above, at the 10 o'clock position.

Usage

Before running the script you must prepare a correctly formatted text file containing a list of word frequency counts such as this one.
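The required layout of the word file is not spelled out here, so the following is only a sketch assuming one whitespace-separated word and integer count per line. `parse_word_counts` and `load_word_counts` are hypothetical names, not functions from the script. Words are upper-cased to match the capital letters shown on the chart.

```python
def parse_word_counts(lines):
    """Build {WORD: count} from lines of the form "<word> <count>".

    ASSUMPTION: whitespace-separated word and integer count, one per line.
    Blank or malformed lines are skipped; repeated words are summed.
    """
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        word = parts[0].upper()
        counts[word] = counts.get(word, 0) + int(parts[1])
    return counts


def load_word_counts(path):
    """Read a word-frequency file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return parse_word_counts(f)
```

Separating parsing from file access keeps the parser easy to test with an in-memory list of lines.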

Run the script with two command-line arguments: the path to the word file and the desired output file name. For example:

python wordvis.py google-books-common-words.txt words.svg

The charts are generated in SVG format, and the resulting files can be large: the SVG file generated from the Google Books data was around 42 MB.
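To show how a single segment might be drawn in SVG, the sketch below builds the path data for an annular sector (a slice of a ring) using the SVG elliptical arc command. The helper names, the convention that 0 degrees is 12 o'clock with angles increasing clockwise, and the two-decimal rounding are assumptions for illustration, not details of wordvis.py.

```python
import math


def _fmt(v):
    # Round to 2 decimals; adding 0.0 turns -0.0 into 0.0 for tidier output.
    return f"{round(v, 2) + 0.0:.2f}"


def polar(cx, cy, r, deg):
    """Point at `deg` degrees clockwise from 12 o'clock (SVG y axis points down)."""
    rad = math.radians(deg)
    return cx + r * math.sin(rad), cy - r * math.cos(rad)


def sector_path(cx, cy, r_inner, r_outer, start, end):
    """SVG path data for the ring slice between two radii and two angles."""
    span = end - start
    if not 0 < span < 360:
        # A single arc cannot draw a full circle (its endpoints would coincide).
        raise ValueError("span must be strictly between 0 and 360 degrees")
    large = 1 if span > 180 else 0
    ox1, oy1 = polar(cx, cy, r_outer, start)
    ox2, oy2 = polar(cx, cy, r_outer, end)
    ix2, iy2 = polar(cx, cy, r_inner, end)
    ix1, iy1 = polar(cx, cy, r_inner, start)
    ro, ri = _fmt(r_outer), _fmt(r_inner)
    return (
        f"M {_fmt(ox1)} {_fmt(oy1)} "
        f"A {ro} {ro} 0 {large} 1 {_fmt(ox2)} {_fmt(oy2)} "  # outer arc, clockwise
        f"L {_fmt(ix2)} {_fmt(iy2)} "                        # step in to inner radius
        f"A {ri} {ri} 0 {large} 0 {_fmt(ix1)} {_fmt(iy1)} Z"  # inner arc, back
    )
```

In a sketch like this every segment becomes its own path element, so a chart drawn from a large corpus contains a very large number of paths; rounding coordinates is one simple way to keep each of them short.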
