Frequency transforms of text #132

Open
danuep opened this issue Dec 1, 2017 · 4 comments
Labels
completed For completed novels!

Comments


danuep commented Dec 1, 2017

I didn't even get the idea until a couple of days ago, and mostly I'm hoping I can get this uploaded before midnight...

Reading @aparrish at #23 talk about hoping to get a meaningful average novel got me thinking about the scales of variation in play, which led to wavelet transforms, which led to

Haar of Darkness

which is unfortunately 2000 words short of the limit, so in honor of a brilliant woman of letters and a brilliant woman of numbers:

The Wavelets, a Daubechies transform of The Waves, by Virginia Woolf

[edit: now with correct link to The Wavelets]

Author

danuep commented Dec 1, 2017

(now that I've slept)

I'm grateful to @aparrish for sharing her word vectors generated from Project Gutenberg. I wouldn't have had the time to pull this together without that resource. If I had more time, I'd go back and be more content-aware about tokenizing the source texts -- I split on spaces and at each non-letter character, but the vector file contains entries for tokens like '--' and contractions, which that splitting never produces. Entertainingly enough, The Waves isn't in Project Gutenberg, so my lookup error log was a nice list of words that Woolf coined in that book. For those, I greedily matched valid sub-words starting from the beginning of the word.
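
For readers who want to see the tokenizing and the sub-word fallback concretely, here is a minimal Python sketch. It is not the author's script. The names `tokenize`, `greedy_subwords`, and the tiny `vocab` set are hypothetical, and two choices are assumptions: non-letter characters are kept as their own tokens, and "greedy" is read as "longest known prefix first". Only ASCII letters are treated as letters.

```python
import re

def tokenize(text):
    """Split on whitespace and at every non-letter character.

    Assumption: each non-letter, non-space character is kept as its own token.
    """
    return re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text)

def greedy_subwords(word, vocab):
    """Break an unknown word into known sub-words, scanning from the start.

    Assumption: at each position the longest prefix found in vocab wins.
    A character that starts no known sub-word is skipped.
    """
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            i += 1  # nothing matched here; drop this character
    return pieces

# Hypothetical stand-in for the set of words that have a vector.
vocab = {"moon", "light", "sea", "green"}

print(tokenize("sea-green moonlight"))      # ['sea', '-', 'green', 'moonlight']
print(greedy_subwords("moonlight", vocab))  # ['moon', 'light']
```

A longest-prefix rule can strand the tail of a word (a long first match may leave a remainder with no valid sub-words), which is the usual trade-off of matching greedily instead of searching all splits.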

I used JWave for the Haar and Daubechies transforms, and Annoy for the nearest-neighbor matching.
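
The thread does not spell out how the transforms and the nearest-neighbor step fit together, so the following is only a guess at the general shape, written in plain Python with no JWave or Annoy calls. It assumes the text is treated as a sequence of word vectors, that one Haar level is applied to each vector dimension along the word sequence, and that each resulting vector is mapped back to the closest word. `haar_step`, `haar_step_vectors`, `nearest_word`, and the two-dimensional `table` are hypothetical, and the brute-force cosine search merely stands in for an approximate index.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (approx, detail), each half as long as the input.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_step_vectors(vectors):
    """Apply haar_step to each dimension of a sequence of equal-length vectors."""
    per_dim = [haar_step(list(dim)) for dim in zip(*vectors)]
    approx = [list(v) for v in zip(*(a for a, _ in per_dim))]
    detail = [list(v) for v in zip(*(d for _, d in per_dim))]
    return approx, detail

def nearest_word(vec, table):
    """Brute-force cosine nearest neighbor over a word -> vector dict."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return max(table, key=lambda word: cosine(vec, table[word]))

# Hypothetical toy vectors; real word vectors have many more dimensions.
table = {"river": [1.0, 0.0], "night": [0.0, 1.0], "dark": [0.6, 0.8]}
words = ["river", "night", "dark", "dark"]

approx, detail = haar_step_vectors([table[w] for w in words])
print([nearest_word(v, table) for v in approx])
```

Each approximation vector is a scaled sum of two neighboring word vectors, so at this level the output is half as long as the input and reads like a blend of adjacent words.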

Member

hugovk commented Dec 1, 2017

🎈

Is the source available somewhere?

Author

danuep commented Dec 1, 2017

I'll put it up later today--was mostly rushing to meet the deadline (which I now see was UTC, not local, so oh well).

Author

danuep commented Dec 1, 2017

Scripts are up at https://github.com/danuep/nanogenmo2017

hugovk added the completed label Dec 1, 2017