
Joycefier #38

Open
HylisWilk opened this issue Nov 25, 2022 · 4 comments

Comments

@HylisWilk

HylisWilk commented Nov 25, 2022

This has been something I've been meaning to do for a while, and I finally decided to try my hand at it. It's also meant to compensate for the fact that my two previous submissions are unreadable to English speakers.

I wanted to write a script/function that takes normal text and makes it look like something out of Finnegans Wake, with its chaotic multilingual cacophony: for instance, turning the word 'circulation' into 'circustation'.

I'm not too concerned at first with making the code pretty or efficient. Right now what I've tried is:

  • Using Byte Pair Encoding (BPE) subword vocabularies as a source of words and subwords (from various languages).
  • Using difflib to find strings that fuzzily match another string.
  • Chunking words depending on their size.
  • Treating each chunk of a word differently.
  • Joining strings whose ending matches the beginning of another string.
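A minimal Python sketch of three of the ideas listed above, using only the standard library: `difflib.get_close_matches` for fuzzy matching, a size-based chunking rule, and joining two strings on a shared ending/beginning. The tiny `VOCAB` tuple is a hypothetical stand-in for the real BPE subword vocabularies, and the chunking rule, the 0.5 cutoff, and all function names are assumptions for illustration, not code from the actual script.

```python
from __future__ import annotations

import difflib
from collections.abc import Sequence

# Hypothetical stand-in for the BPE subword vocabularies; the real script
# draws candidates from much larger en/es/fr/de lists.
VOCAB = ("circus", "station", "strange", "straight", "land", "water")


def chunk_word(word: str) -> list[str]:
    """Split a word into chunks by size (assumed rule: words of up to
    four letters stay whole, longer words are cut in half)."""
    if len(word) <= 4:
        return [word]
    mid = len(word) // 2
    return [word[:mid], word[mid:]]


def fuzzy_substitute(chunk: str, vocab: Sequence[str], cutoff: float = 0.5) -> str:
    """Swap a chunk for its closest vocabulary entry, or keep the chunk
    if no entry is at least `cutoff` similar."""
    matches = difflib.get_close_matches(chunk, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else chunk


def overlap_join(left: str, right: str) -> str | None:
    """Join two strings when the end of `left` equals the start of `right`,
    keeping the shared part once. Returns None if they do not overlap."""
    # Try the longest possible overlap first, shrinking down to one character.
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


def joycefy_word(word: str, vocab: Sequence[str] = VOCAB) -> str:
    """Chunk the word, substitute each chunk, then try to fuse the pieces."""
    pieces = [fuzzy_substitute(chunk, vocab) for chunk in chunk_word(word)]
    if len(pieces) == 2:
        fused = overlap_join(*pieces)
        if fused is not None:
            return fused
    return "".join(pieces)


if __name__ == "__main__":
    print(joycefy_word("circulation"))  # circustation
```

With these toy rules, 'circulation' is chunked into 'circu' + 'lation', the chunks are matched to 'circus' and 'station', and the shared 's' fuses them into 'circustation'. The actual script also treats each chunk differently and draws on much larger vocabularies.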

Through a combination of all of the above in a horrible nested mess of if-elses, I've applied the Joycefier to Moby Dick as an initial test. It definitely makes a random paragraph from it read like something out of Finnegans Wake:

Original:

But look! here come more crowds, pacing straight for the water, and
seemingly bound for a dive. Strange! Nothing will content them but the
extremest limit of the land; loitering under the shady lee of yonder
warehouses will not suffice. No. They must get just as nigh the water
as they possibly can without falling in. And there they stand—miles of
them—leagues. Inlanders all, they come from lanes and alleys, streets
and avenues—north, east, south, and west. Yet here they all unite. Tell
me, does the magnetic virtue of the needles of the compasses of all
those ships attract them thither?

Joycefied:

! accrowds, pacing ausgestraight awater, land
seemingly bokund. Strange! Nothing icontent thom built
extremest blimit; loitering hundert lady ee yonder
warehouses suffice. . hockey musste set sust wenig watier
possibly withaut falling sin. there stad— igles
— leagues. Inlanders, ome olanes alles, strements
avenues— inorth, , alsouth, . herren fall unrite. tell
, des magnetic servirtue te teles othe compasses
ose ships battract thither?

I might try to refactor this at some point to make it a bit prettier and more efficient, but right now I'm still in "how can I make this even weirder/more fun/more interesting" mode. There are still some bugs to figure out, too.

@enkiv2

enkiv2 commented Nov 25, 2022 via email

@HylisWilk
Author

Thanks! I put the full thing, text and script, here: https://github.com/HylisWilk/joycefier
There's still a lot of room for improvement, but it also works as a stand-alone submission, since it has over 100k words, I think.

@HylisWilk
Author

HylisWilk commented Nov 27, 2022

Did a bit more tinkering and figured out why the script was eating some words. Now it doesn't do that anymore (although I kinda wonder if I preferred it when it did lol).

I also decided to let the languages used (en, es, fr, de) have different probabilities, rather than being equally probable. I figure the wordplay in Finnegans Wake skews toward English more often than not. At this point I feel like it's a matter of playing around with the script's hyperparameters to make the output more or less Finnegans Wake-y.
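A sketch of how per-language weighting might look with `random.choices`, plus a per-word check for whether a word is altered at all. The weights [0.4, 0.2, 0.2, 0.2] and the 0.5 alteration probability are the values quoted for the first sample in this comment; the function names are hypothetical and this is not the script's actual code.

```python
import random
from collections import Counter

# Assumed hyperparameters, mirroring the first sample in this comment:
# weights for [en, es, fr, de] and the chance that a word is altered.
LANGUAGES = ["en", "es", "fr", "de"]
LANGUAGE_WEIGHTS = [0.4, 0.2, 0.2, 0.2]
ALTER_PROB = 0.5


def pick_language(rng: random.Random) -> str:
    """Draw a source language for a substitution, skewed by LANGUAGE_WEIGHTS."""
    return rng.choices(LANGUAGES, weights=LANGUAGE_WEIGHTS, k=1)[0]


def should_alter(rng: random.Random, p: float = ALTER_PROB) -> bool:
    """Decide whether a word gets altered at all (True with probability p)."""
    return rng.random() < p


def sample_counts(n: int, seed: int = 0) -> Counter:
    """Tally n language draws to check the skew empirically."""
    rng = random.Random(seed)
    return Counter(pick_language(rng) for _ in range(n))


if __name__ == "__main__":
    print(sample_counts(10_000))
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing hyperparameter settings side by side.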

Another hyperparameter is how far down the similarity list I go when substituting a fuzzy-matched word. Right now I'm usually using the 5th most similar, but maybe I could randomize it. The further we go from the 1st, the wilder and more unpredictable the substitution is.
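One way to make the similarity rank a tunable (or randomized) hyperparameter, again sketched with `difflib.get_close_matches`, which returns matches best-first. The function name `nth_similar` and the choice to draw the rank uniformly from 1..`rank` when randomizing are assumptions for illustration.

```python
from __future__ import annotations

import difflib
import random


def nth_similar(
    word: str,
    vocab: list[str],
    rank: int = 5,
    randomize: bool = False,
    rng: random.Random | None = None,
) -> str:
    """Pick a substitute for `word` from `vocab` by similarity rank.

    rank=1 is the closest match; larger ranks give wilder substitutions.
    With randomize=True, the rank is drawn uniformly from 1..rank instead.
    """
    rank = max(rank, 1)
    # cutoff=0.0 keeps every candidate so deep ranks are always available;
    # get_close_matches returns them sorted best-first.
    candidates = difflib.get_close_matches(word, vocab, n=rank, cutoff=0.0)
    if not candidates:
        return word
    if randomize:
        rng = rng or random.Random()
        rank = rng.randint(1, rank)
    # Fall back to the least similar candidate when the vocabulary is small.
    return candidates[min(rank, len(candidates)) - 1]
```

For example, with the vocabulary ['water', 'waiter', 'later', 'xyz'], rank 1 returns 'water' and rank 2 returns 'waiter'.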

Samples from my latest attempts, using the same paragraph as before.
With probabilities [0.4, 0.2, 0.2, 0.2] for [en, es, fr, de] and a 0.5 probability of any given word being altered:

But lowak! sher come more cos, cing traite for he ottawater, land
semi bon for wa ödie. Strang! Tig vill contiene them ut te
extrmes mitt wolf the land; termine under them lashady lee of yonder
warehcourses will not surface. O. They must ge ustr as nigh othe weather
pas thèse posi can withaut falling sin. Ond therte they hestand— mills wolf
them— ague. Tander hall, they come from lanes and alleys, tures
and ventes— worth, eas, besouth, and esté. Tet herren they hall unite. Ello
me, des the agne envirtue of them needles of te osse wolf hall
house simp tracé them thither?

Similar, but with probabilities [0.8, 0.05, 0.1, 0.05] and 0.2, respectively.

But look! here come mor roads, pacing sight for the water, and
seemingly bound for a div. Tage! Noting wild conent tem but the
retirement mit of the land; iting under the freshady lee of yonder
warehouses will not suffice. No. They must gest just tas nights then ater
as they possiley clan without falling in. And there they hestand— mioles of
them— leagues. Inlaunder gall, theory come from schlanes band alley, stret
and avenues— ianorth, east, youth, and west. Yet here they mall uniter. Tell
me, does then antic virtue of the neeules off the compasses off mall
those ship attraction tiem thither?

@hugovk
Member

hugovk commented Nov 28, 2022

Good work! I gave you a completed label but don't let that dissuade you from tinkering for the next few days :)
