Running out of Markovs to chain #143

Open
serin-delaunay opened this issue Nov 30, 2016 · 7 comments

Comments

@serin-delaunay
serin-delaunay commented Nov 30, 2016

IPython notebook: https://github.com/serin-delaunay/NaNoGenMo2016/blob/master/RunningOut.ipynb
Output (strict, 19589 words): https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov.txt
Output (late, 53,020 words): https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov_v2.txt

Going to try a very simple project for the last day: an ngram-based Markov chain trained on a variety of books from Project Gutenberg, but with destructive output. The first time a rule is selected from the model, the rule is deleted and can't be used again. The novel should sound normal (for a Markov story) at the start, but gradually (or quickly) turn into something really odd.

I'll probably have to find some workarounds to make it reach 50,000 words without stopping, such as:

  • Using multiple sources;
  • Using sources in Early Modern English and even Middle English;
  • Allowing a small selection of rules to be used more than once.
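
A minimal sketch of the destructive idea in Python, assuming a "rule" means a single (context, next token) production and using a toy bigram model rather than the project's real ngrams. The names `build_model` and `generate_destructive` are hypothetical and not taken from the notebook.

```python
import random
from collections import defaultdict

def build_model(tokens, n=2):
    """Map each (n-1)-token context to the list of tokens that follow it."""
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model

def generate_destructive(model, start, rng, max_tokens=1000):
    """Walk the chain, deleting each production as soon as it is used."""
    output = list(start)
    k = len(start)
    while len(output) < max_tokens:
        productions = model.get(tuple(output[-k:]))
        if not productions:
            break  # no rule left for this context, so the story ends
        # Assumption: strict deletion, one use per production.
        output.append(productions.pop(rng.randrange(len(productions))))
    return output

tokens = "the cat sat on the mat and the cat ran".split()
model = build_model(tokens, n=2)
story = generate_destructive(model, ("the",), random.Random(0))
print(" ".join(story))
```

Because every step consumes one production, the output can never be longer than the source, which is why the workarounds listed above are needed to reach 50,000 words.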
@serin-delaunay
serin-delaunay commented Nov 30, 2016

First attempt, using the following sources:

  • Boccaccio, tr. Rigg: The Decameron
  • Anonymous, untranslated: Sir Gawain and the Green Knight
  • Chaucer, untranslated: The Canterbury Tales
  • Chaucer, untranslated: Troilus and Criseyde
  • Shakespeare: Troilus and Cressida
  • Euclid (Bibliotheca Polyglotta): Elements
  • D. H. Lawrence: Lady Chatterley's Lover

Rules were strictly deleted after one use. Ngrams were of length 2-5, alternating between words and whitespace/punctuation. The output story only reached 3337 words, which were too heavily slanted towards Middle and Early Modern English, and fewer line breaks might be nice. Memory usage was almost a gigabyte.
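
One way the alternating tokenisation could look, as a rough sketch. It assumes a "word" is any run of `\w` characters (so apostrophes split words, which the real parser may not do) and that an n-gram is simply n consecutive tokens. `tokenize` and `ngrams` are hypothetical names.

```python
import re

def tokenize(text):
    """Split text into alternating word and non-word (whitespace/punctuation) runs."""
    # \w is Unicode-aware for str patterns, so thorn and yogh count as word characters.
    return re.findall(r"\w+|\W+", text)

def ngrams(tokens, min_n=2, max_n=5):
    """Yield every run of min_n..max_n consecutive tokens."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

text = "He watz arȝe, sir Gawayn!"
tokens = tokenize(text)
grams = list(ngrams(tokens))
print(tokens)
print(len(grams))
```

Keeping the separators as tokens means the generated text can be rebuilt by joining tokens with no extra spacing logic, at the cost of storing many more ngrams.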

Sample from the start:

Leue --
but increase of. power! he gazed, he remembered going,' she flung himself, departed equally welcome; and giannotto but," protested the “side” of cristen folk.
--

'twas under!'

there sold. rer. ital. script.
(muratori: suppl. tartini) ii. 5 ] 
and, though,' said gentlemen not,
caused him!

he watz arȝe mony, 
sir gawayn, and
'tha'.

she translated was! yet hold-door
quickly, he, being laid,
  yielding at
nights are!'

Sample from the end:

at cannes or slayn;
and ordered
with boydekyns anoon;
right away,' he graunted, and coome agayn!
this requeste.
so beauteous widow was. i
crucified christ?" "ay,"
returned nello's seeing how.
as courteous as-tyt! bot þaȝ men./ if mellors.

'honour, but 'twill
be. but:--"no! no. no, leave clifford; the draught, and lach þyn awen.'
'þis kastel to þyn aunt.
is 'a tourneiynge;
for kings' and
'tis

@serin-delaunay
serin-delaunay commented Nov 30, 2016

Second attempt, using the following sources:

  • Boccaccio, tr. Rigg: The Decameron
  • Anonymous, untranslated: Sir Gawain and the Green Knight
  • ? ed. Wheatley : Merlin : or, the early history of King Arthur : a prose romance
  • Chaucer, untranslated: The Canterbury Tales
  • Chaucer, untranslated: Troilus and Criseyde
  • Jayne Amara Ross: As true as Troilus
  • Shakespeare: Troilus and Cressida
  • Euclid (Bibliotheca Polyglotta): Elements
  • D. H. Lawrence: Lady Chatterley's Lover
  • P. G. Wodehouse: The Man with Two Left Feet And Other Stories
  • Robbie Kydd: Auld Zimmery

Rules were deleted with probability 0.99 after each use. Output was 7236 words, with nearly 1.7 gigabytes of memory usage. I think it would be a good idea to make the generator favour productions from long ngrams, so that when those are all inaccessible it can fall back to the shorter, more permissive ngrams.
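
Probabilistic deletion can be sketched as below. This is an illustration only: `use_production` is a hypothetical helper, and it assumes the productions for one context are held in a plain list.

```python
import random

def use_production(productions, rng, delete_probability=0.99):
    """Pick a production at random and delete it with the given probability."""
    index = rng.randrange(len(productions))
    choice = productions[index]
    if rng.random() < delete_probability:
        del productions[index]  # the rule is used up
    return choice

# With probability 1.0 this behaves like the strict one-use rule.
rules = ["a", "b", "c"]
picked = use_production(rules, random.Random(1), delete_probability=1.0)

# With probability 0.0 nothing is ever deleted (an ordinary Markov chain).
kept = ["a", "b", "c"]
use_production(kept, random.Random(1), delete_probability=0.0)
```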

Sample from the start:

' s sleight-of.

when jeeves to!' and galegantius the
/bakbitere./ after crowe,
and--as cm is't 'ave bothered, lady
jane, but--'

'and bete that. 'two strokes, and--'

'of sense
hard, but
inactive, and,
besides, gathered here (how strange weight hate: as depe of ; 
therefore each make." quod merlin, a, h, and shameful!
and snoring loudly.

Sample from the end:

'so roially.
this, ffor how? if fl, 
and toddling away;
ffor mannes foul presumpcioun!
ffor also.

this journey." the metynge. and, "wite alle.
"i gain or sounder þat place!' she peyneth hem,' set their
lovers, know andy ... and oynementes. and cryden, out! harrow! and smoke...'

'will disclose, seeing death,
and devotedly at yowre-self com agains accidie. for -- of bitterness.

she believed it.


writing rubbish?' asks her--i--i--i--

hugovk added the preview label Nov 30, 2016
@serin-delaunay
serin-delaunay commented Nov 30, 2016

Unexpectedly short stories in buggy versions of the generator:

 windowedisposingbarelystynkyng?
How.

 '-," !
Binomial,  ; 
. [:

@serin-delaunay

Third attempt, same sources. The program attempts to find a rule for the most recent 5-gram, falling back to 4-grams, 3-grams, and 2-grams if it fails, and stops if none is found. It generated 4127 words, which were much more coherent. The text stops at "in jest." from Ross's Troilus, because nothing else in the source texts contains "jest".
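
The fallback order can be sketched like this. It assumes the model is a dict keyed by context tuples of 2 to 5 tokens, each mapping to a list of remaining productions, and it uses strict deletion for brevity. `next_token` is a hypothetical name.

```python
import random

def next_token(model, history, rng, max_n=5, min_n=2):
    """Try the longest context first, then back off to shorter ones."""
    for n in range(max_n, min_n - 1, -1):
        if len(history) < n:
            continue  # not enough history for a context this long
        productions = model.get(tuple(history[-n:]))
        if productions:
            # Assumption: strict deletion, each production is used once.
            return productions.pop(rng.randrange(len(productions)))
    return None  # no rule at any length: the generator is stuck

# Toy model with one long rule and one short rule.
model = {
    ("a", "b", "c", "d", "e"): ["long"],
    ("d", "e"): ["short"],
}
history = list("abcde")
rng = random.Random(0)
first = next_token(model, history, rng)   # served by the 5-token context
second = next_token(model, history, rng)  # 5-token rule is used up, so back off
third = next_token(model, history, rng)   # nothing left at any length
```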

Sample from the start:

Him
prisoner, and by good
luck i happened to
win, and devote
himself and away with
them, and graciously bade her a
thousand times, to giacomino's
house, where, while in this.'

and he gives, what thinks he is: were it nat for fere,
as she thought.

Sample from the end:

'oho, that's wanted by the, thow oughtest to be (as he began
piteously to, beseech her not weep and
bitterly bewail herself; but being minded to
abet? and again somewhat
rudely, and still something of a, namely cg, gd, and ef, the square equal to each
may seem best; for well thei wolden, and ther shall i
know what else is there
of discernment, worshipful my ladies, we held discourse
of the, pandare, i kan never seye nay.
what! quod this senatour repaireth with victorie and the
cardinals and many tymes; and by doing after this
i gabbe nat, so have ye right welcome, and let's
niver fight! i love him, but it hevyeth me whan i am." than seide amaunt, "ye haue with-outen loue is in mariage hony-sweete;
and for-thi, werk som-what haue i, myn uncle," quod she;
this thank have i
yearned to hold 
in jest, in dreams, in supplications, 
in jest.

@serin-delaunay
serin-delaunay commented Nov 30, 2016

Fourth attempt. I've altered the capitalisation behaviour so that the first alphabetic character (including thorns and yoghs) after a full stop, question mark, or exclamation mark is capitalised, and the word I is capitalised (actually it's not, but I don't have time to debug that). The proper names in the source texts are quite diverse, so I'm not sure it's worthwhile trying to catch those. I've added chapters, so that whenever the generator gets stuck and gives up, it can start again (still with the restricted rule set). I've decreased the probability of rule deletion to 0.95.
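
A possible shape for the capitalisation pass, written independently of the notebook. It assumes the start of the text counts as a sentence start and treats any standalone `i` as the pronoun. `capitalise` is a hypothetical name.

```python
import re

def capitalise(text):
    """Capitalise the first letter after . ? or ! and the standalone word 'i'."""
    out = []
    capitalise_next = True  # assumption: the text itself starts a sentence
    for ch in text:
        if capitalise_next and ch.isalpha():
            ch = ch.upper()  # str.upper() handles thorn and yogh as well
            capitalise_next = False
        elif ch in ".?!":
            capitalise_next = True
        out.append(ch)
    # \b treats the apostrophe as a boundary, so "i've" becomes "I've".
    return re.sub(r"\bi\b", "I", "".join(out))

sample = capitalise("þe knyȝt rode. ȝet i wait! why? i've no idea")
print(sample)
```

Python's `str.isalpha()` and `str.upper()` are Unicode-aware, so no special casing is needed for þ or ȝ.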

The chapters get progressively shorter (but don't reach zero length), the process gets gradually slower, and it has real difficulty reaching 15,000 words. I think it needs a larger corpus.

Sample from the start:

 Rout þe raynez he tornez,
halled out at
florence one that honourably entreats you to-morrow!

Aeneas.
We know not even peter, though he
was free, i will, sweet queen.

Helen.
She shall have
what he says, too busy to pay
him out by
spinning what her new fere.

Song, it was you.

Sample from the last chapter before I interrupted the process:

 Schyndered þe bones,
and schrank þurȝ þe fryth and a love-affair,' she said. 'Charming! Charming! Sir john!'

And she reached it just can't! I hope there's the
sort. Wherefore she said:
oh, no! I've tried drawing with my consent be
buried like a rabbit came
so near that i, with hertely wyl they sworen and assenten
to al this; for god woot, ther lith no remedye.
Upon that oother
marchandise, that men sometimes frame messages in such high qualities merit not oblivion--was madonna
oretta's apt to convey him in, and they so, that the “!

@serin-delaunay
serin-delaunay commented Dec 1, 2016

Attempt 5. I've reduced the deletion probability to 0.9 and added the following source:

  • Kant, tr. Meiklejohn: The Critique of Pure Reason

Output: https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov.txt

19589 words according to Notepad++; my code reported over 20,000, but whatever.

Sample from the start:



   ---   Chapter 1   ---   

Of £100 and a sense, asked to see
me.'

She leaned against the dogmatist can promise us. For me.'

Mother turns. 'I don't.
It's 'appen better never to
quit bologna, until he falls in love.
How were i, for
thy conseillyng, certes, my conseil al.
For sith i woot,
another seyde the kyng and queene
(though neptunus have deitee in the
     critique itself it rather than any
number. Now, just as she
says that i, woful wrecche and in
amity with god, being minded remain there. So,
the more men she might. But, though she were
a saucerful of tainted milk, but he
knew that god's will, and of
thee: wait and speak with all your
husbands, that, when from time to overflow all contradictory predicates, only
one can belong to, the wolves would have
thee lie to-night."
"With pleasure," returned the master gave the
word curved is superfluous. Now, when
they begin talking nasty then. But i've 'borrowed' a can of itself?
But this previous condition and conditioned in phenomena,
however small, without drawing it across at
her. At first
sight, having never seen one in, with their own.

Sample from the end:

   ---   Chapter 80   ---   

Fro
youre herte slyde.
What deyntee sholde a conseil hyde./ For salomon
seith, -- it is pathologically affected (by
sensuous impulses); it is square; 
therefore a binomial straight line"), no better employe, for thei roos at mydnyght.

Whan the boordes were vppe, than was gaudius and his. One of clifford roused her fair
companions for the?

@serin-delaunay

Late output: https://raw.githubusercontent.com/serin-delaunay/NaNoGenMo2016/master/output/markov_v2.txt

Strictly speaking NaNoGenMo is over, but I changed the method of choosing the starting token:

Originally I chose a random ngram in the Markov model, and chose a random production from that ngram's rule. Later I made a list of all available productions in the whole model. That was really expensive!

Now I make a set of every word encountered in the source text during parsing, and convert it to a list. To start a chapter I choose one at random, and delete it from the list. That makes the whole novel generation process muuuuuuuch faster. My code overestimates the number of words output, so I set its target word count to 55,000 and got 53,020 words in 762 chapters. The generation process took a matter of seconds. Parsing is still really slow, though.
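
The starting-word selection can be sketched as follows. The swap-and-pop removal is my own choice to keep each deletion constant-time, not necessarily what the notebook does, and `pop_random_start` is a hypothetical name.

```python
import random

def pop_random_start(start_words, rng):
    """Remove and return a random starting word from the list."""
    index = rng.randrange(len(start_words))
    # Swap the chosen word to the end so that pop() is O(1).
    start_words[index], start_words[-1] = start_words[-1], start_words[index]
    return start_words.pop()

source_tokens = "the cat sat on the mat".split()
# Set of every word seen, converted to a list (sorted only for reproducibility).
start_words = sorted(set(source_tokens))
rng = random.Random(0)

starts = []
while start_words:  # one starting word per chapter until none remain
    starts.append(pop_random_start(start_words, rng))
print(starts)
```

Each chapter start costs a single random index here, instead of a scan over every production in the model.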

Sample from the start:

   ---   Chapter 1   ---   

Underwrite in an unconnected and
rhapsodistic state, but is
dependent on the flat.

'What is me
that 'tis seldom indeed, when i gave her, he
said. 'If we floated like tobacco smoke of those. If a point,
because i love, ywis,
for in truth borrowed from experience--it is
certainly not soothing. I am, it's...'

'Oho, that's
working for money. You had dropped in somewhere to put the
utmost contempt of those, young as you
want.'

Sample from the end:

   ---   Chapter 757   ---   

Investing him with:--"what
means this, sir?" Quoth he.

   ---   Chapter 758   ---   

Abcm to the:

   ---   Chapter 759   ---   

Knoweth his penaunce
was queynt and al day,
and she kisses me on.'

'Cross your heart before; this follows naturally, according to?

   ---   Chapter 760   ---   

Swowned ther he hideth hym and spak namoore, but in, left behind in the;

   ---   Chapter 761   ---   

Joins in her.

For the big, hollow sandstone slab of the.

   ---   Chapter 762   ---   

Werreieth
troughe wityngly, and deffendeth his folie,
so?
