
Sentence-level Markov model (or, reconstructing Moby-Dick using a neural network) #99

jeffbinder opened this issue Dec 1, 2018 · 4 comments


@jeffbinder

I split each chapter of Moby-Dick into sentences, then used a neural network to try to guess what order the sentences should appear in. I call the result Mboy-Dcki.

This is essentially a Markov chain model that works at the level of sentences rather than words or tokens. Such a model cannot be trained directly (nearly every sentence occurs only once, so the transition probabilities can't be estimated from counts), so I created an encoder-decoder-type recurrent neural network that takes in the last 25 characters of a sentence and tries to guess what the first 25 characters of the next sentence will be. I then used this network to compute the probabilities for each pair of sentences.
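
A minimal sketch of this pairwise-scoring idea (this is not the linked code; the PyTorch framework, the layer sizes, and the byte-level alphabet are all assumptions made for illustration):

```python
import torch
import torch.nn as nn

CONTEXT = 25          # characters kept from the end of the previous sentence
TARGET = 25           # characters predicted from the start of the next sentence
PAD, VOCAB = 0, 256   # byte-level "alphabet"; id 0 doubles as padding

def to_ids(text, length):
    """Clip or pad a string to a fixed-length tensor of byte ids."""
    ids = list(text.encode("utf-8", "ignore"))[:length]
    ids += [PAD] * (length - len(ids))
    return torch.tensor(ids)

class NextSentenceScorer(nn.Module):
    """Encoder-decoder GRU: encode the tail of sentence A, decode the head of B."""
    def __init__(self, emb=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)

    def forward(self, tail_a, head_b):
        # Encode the tail of sentence A into a single hidden state.
        _, h = self.encoder(self.embed(tail_a))
        # Decode the head of sentence B conditioned on that state,
        # with teacher forcing (targets shifted right by one step).
        shifted = torch.roll(head_b, 1, dims=1)
        shifted[:, 0] = PAD
        dec, _ = self.decoder(self.embed(shifted), h)
        return self.out(dec)  # logits over the next character at each step

def log_prob_pair(model, sent_a, sent_b):
    """Log-probability, under the model, that sent_b follows sent_a."""
    tail = to_ids(sent_a[-CONTEXT:], CONTEXT).unsqueeze(0)
    head = to_ids(sent_b[:TARGET], TARGET).unsqueeze(0)
    logp = torch.log_softmax(model(tail, head), dim=-1)
    return logp.gather(-1, head.unsqueeze(-1)).sum().item()
```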

It actually sort of works—at the very least, it picks the right sentence a little more often than chance would dictate. But the point, of course, is in the interesting ways it fails.
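
Pairwise scores like these can then drive the reconstruction. A minimal greedy sketch, assuming the hypothetical log_prob_pair function above and assuming each chapter keeps its true opening sentence:

```python
def reorder_chapter(model, sentences):
    """Greedily rebuild a chapter: keep the opening sentence, then repeatedly
    append whichever unused sentence the model scores as the most likely
    successor of the current last sentence."""
    chain, remaining = [sentences[0]], list(sentences[1:])
    while remaining:
        best = max(remaining, key=lambda s: log_prob_pair(model, chain[-1], s))
        chain.append(best)
        remaining.remove(best)
    return chain
```

A beam search over orderings might pick the right sentence more often than this single greedy pass, but the greedy version is enough to illustrate the idea.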

Code and a more detailed explanation are here.

@jeffbinder jeffbinder changed the title Sentence-level Markov model Sentence-level Markov model (or, selecting the order of sentences using a neural network) Dec 1, 2018
@jeffbinder jeffbinder reopened this Dec 1, 2018
@jeffbinder jeffbinder changed the title Sentence-level Markov model (or, selecting the order of sentences using a neural network) Sentence-level Markov model (or, reconstructing Moby-Dick using a neural network) Dec 1, 2018
@hugovk
Member

hugovk commented Dec 1, 2018

MBOY-DCKI;
OR, THE WHEAL.
BY
HERMAN MELVILLE,
AND A NEURAL NETWORK

CHAPTER 1. Loomings.

Call me Ishmael. Go visit the Prairies in June, when for scores on scores of miles you wade knee-deep among Tiger-lilies--what is the one charm wanting?--Water--there is not a drop of water there!

Chief among these motives was the overwhelming idea of the great whale himself. He thinks he breathes it first; but not so.

Circumambulate the city of a dreamy Sabbath afternoon. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. Strange! here come more crowds, pacing straight for the water, and seemingly bound for a dive.

@hugovk hugovk added the preview label Dec 1, 2018
@enkiv2

enkiv2 commented Dec 1, 2018 via email

@jeffbinder
Author

Thanks for sharing this!

The two approaches could possibly be combined by running the neural network on phrases rather than sentences. The training script should work without modification on any linguistic unit (phrases, clauses, paragraphs, etc.)—the corpus just has to be prepared differently. Doing it at the phrase level might make the strangeness more immediate because you would have to read fewer words on average before getting to something that differs from the original text. I'm not sure how well the particular model I used would do at assembling phrases into syntactically correct sentences, though.
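
A minimal sketch of that different corpus preparation, assuming plain punctuation-based splitting (the real preprocessing may well differ):

```python
import re

def split_phrases(text):
    # Cut after commas, semicolons, colons, and sentence-final punctuation,
    # keeping each delimiter attached to the phrase it closes.
    parts = re.split(r"(?<=[,;:.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

# e.g. split_phrases("Call me Ishmael. Some years ago, I thought I would sail about.")
# -> ['Call me Ishmael.', 'Some years ago,', 'I thought I would sail about.']
```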

@tra38

tra38 commented Dec 2, 2018

Just for reference, links to the two novels:

  • Original Method (trained on the Wright American Fiction corpus) - Mboydcki

  • Alternative Method (trained on the Moby Dick corpus) - Mobydcik
