
Paranoid Transformer #142

Open
altsoph opened this issue Nov 30, 2019 · 3 comments


altsoph commented Nov 30, 2019

Sorry for joining so late, but I still believe it's worth a try for this year's NaNoGenMo :)
This month I tried to build a paranoiac-critical system based on two neural networks, the Paranoid Transformer.

The first network is a Paranoiac-intrusive Generator; the second one, the Critic, works as a filtering subsystem, selecting the best passages from the generated text flow.

Let me share some details:

Generator subsystem

The first network, the Paranoiac-intrusive subsystem AKA Generator, uses the OpenAI GPT architecture in the huggingface implementation. I took a publicly available model already pre-trained on the huge BooksCorpus fiction dataset (approximately 10K books and 1B words).

Next, I finetuned it on several additional handcrafted text corpora (~50 MB of text altogether):

  • a collection of Crypto Texts (Crypto Anarchist Manifesto, Cyphernomicon, etc),
  • another collection of fiction books (cyberpunk authors such as Dick and Gibson, plus non-cyberpunk authors such as Kafka and Rumi),
  • transcripts and subtitles from some cyberpunk movies and series,
  • several thousand quotes and fortune cookie messages collected from different sources.

During the finetuning phase, I used special labels to tell the model which type of text it was reading:

  • QUOTE for any short quote or fortune, LONG for the others,
  • CYBER for cyber-themed texts, OTHER for the others.

Each text got two labels; for example, it was CYBER+LONG for the Cyphernomicon, OTHER+LONG for Kafka, and OTHER+QUOTE for fortune cookie messages. Note that there were almost no texts labeled as CYBER+QUOTE, just a few nerd jokes.
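The two-label scheme can be sketched in a few lines of plain Python; the label names come from the post, but the exact prefix encoding (e.g. "CYBER+QUOTE" prepended to the text) is an illustrative assumption:

```python
# Sketch of the two-label scheme described above. The prefix format
# ("CYBER+QUOTE " prepended to the text) is an assumption about the
# exact encoding; the label names themselves come from the post.

def make_prefix(is_cyber: bool, is_quote: bool) -> str:
    theme = "CYBER" if is_cyber else "OTHER"
    length = "QUOTE" if is_quote else "LONG"
    return theme + "+" + length

def label_text(text: str, is_cyber: bool, is_quote: bool) -> str:
    # Prepend the control prefix so the model can condition on it.
    return make_prefix(is_cyber, is_quote) + " " + text

# Cyphernomicon: long cyber text; fortunes: short non-cyber quotes.
print(label_text("trust no one.", False, True))
# -> OTHER+QUOTE trust no one.
```

At generation time, the same prefix can be fed to the model as a prompt to request texts of a given style.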

At last, in generation mode, I kindly asked the model to generate some CYBER+QUOTE texts.
The raw results were already quite promising:

terosexuality is pleasures a turn off ; and to me not to be a true blossoming beautiful being is on the other side. the wind is our song, the emotions are our wind and a piano, new things change, new smells kick off in time, a spiritually shifting dust. let your eyes sing music for a while. let your ears measure the bass beat of your soul, the gentle winding of the song. then your ears achieve harmony. you can listen to french playstation on live music together forever, in the philly coffeehouse, in them congressional district of the franklin gap building. let painting melt away every other shred of reason and pain, just lew the paint to move thoughts away from blizzes in death. let it dry out, and turn to cosmic delights, to laugh on the big charms and saxophones and fudatron steames of the sales titanium. we are god's friends, the golden hands on the shoulders of our fears. do you knock my cleaning table over? i snap awake at some dawn. the patrons researching the blues instructor's theories around me, then give me a glass of jim beam. boom! the business group soon concludes. caught one miracle? survive the tedious rituals you refuse to provide? whatever happens, i throw shit in your face. joy ries away? you could give acapindulgent half your life away, though i am nothing especially sexy. this sift, this being sveng? do impotent and desperate oozing drug as i shake and shine? you adored me. brains run out when people charitable that into you. sales are stacked? the fossils here! this was for years while other uptown priests. you were passed up, minated for millions of males. a beach ball every season - unless a summer implies depths. rebellious of myself it is more than a sheet for my sleep - holding back along, trembling with sorrow. endless nights spend nights lying in front of a chorus and searching for what mine of death you can come to your end. i ent, rise, live this bed in this mors! as human excrement tastes, fills delights of dreaming? 
your drowned nights, mirror smeared in generations. the woods burn bright and cold doosations that don't hold to you? yup, et : inkpline is pale white masts and low - tide smoke. her saddam rebirth is dead, a hollow leaf hangs bright odour. or viewing keeps your case ready sucking bone - dead inside. i strip it. light draws the mirrors shaking

Critic subsystem

The next big thing to do was to filter some real gems out of this endless text flow.

At first, I made a script with some simple heuristic filters such as:

  • reject chunks containing newly coined, non-existent words,
  • reject phrases with two unconnected verbs in a row,
  • reject phrases with several duplicated words,
  • reject phrases with no punctuation or with too many punctuation marks.
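The exact rules aren't published in this thread, but checks of this kind can be sketched with the standard library alone; the vocabulary, the thresholds, and the omission of the verb-sequence check (which would need a POS tagger, e.g. from NLTK) are illustrative assumptions:

```python
import re
from collections import Counter

# Tiny demo vocabulary; in practice this would be a large known-word list.
DEMO_VOCAB = {
    "the", "doll", "is", "used", "only", "when", "he",
    "remains", "private", "and", "it", "always", "effective",
}

def passes_filters(chunk: str, vocab=DEMO_VOCAB) -> bool:
    words = re.findall(r"[a-z']+", chunk.lower())
    if not words:
        return False
    # reject chunks that coin new, non-existent words
    if any(w not in vocab for w in words):
        return False
    # reject chunks with several duplicated words
    if any(count >= 3 for count in Counter(words).values()):
        return False
    # reject chunks with no punctuation or with too many marks
    marks = sum(chunk.count(p) for p in ".,;:!?")
    if marks == 0 or marks > max(1, len(words) // 3):
        return False
    return True

print(passes_filters("the doll is used only when he remains private."))
# -> True
```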

The application of this script cut the text flow into a sequence of valid chunks.

a slave has no more say in his language but he hasn't to speak out!

the doll has a variety of languages, so its feelings have to fill up some time of the day - to - day journals.
the doll is used only when he remains private.
and it is always effective.

leave him with his monk - like body.

a little of technique on can be helpful.

out of his passions remain in embarrassment and never wake.

adolescence is the university of manchester.
the senior class of manchester... the senior class of manchester.

Here I manually labeled about 1K of such chunks with two classes, GOOD/BAD, keeping the set balanced (one half GOOD, the other half BAD).
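Keeping such a labeled set balanced amounts to downsampling the majority class; a minimal stdlib sketch (function and label names are illustrative, the GOOD/BAD classes come from the post):

```python
import random

def balance(labeled, seed=0):
    """Downsample the majority class so GOOD and BAD are equal-sized."""
    good = [c for c, y in labeled if y == "GOOD"]
    bad = [c for c, y in labeled if y == "BAD"]
    n = min(len(good), len(bad))
    rng = random.Random(seed)  # fixed seed for reproducibility
    sample = ([(c, "GOOD") for c in rng.sample(good, n)]
              + [(c, "BAD") for c in rng.sample(bad, n)])
    rng.shuffle(sample)
    return sample

data = [("chunk a", "GOOD"), ("chunk b", "GOOD"), ("chunk c", "BAD")]
print(len(balance(data)))  # -> 2 (one GOOD, one BAD)
```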

At last, I trained the Critic subsystem.
This neural network uses the BERT architecture, again in the huggingface implementation. Once more I took a publicly available pre-trained model and finetuned it on my labeled 1K-chunk dataset to predict the label of any given chunk.

Finally, I assembled a pipeline that chains the Generator subsystem, the heuristic filters, and the Critic subsystem.
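The pipeline logic itself is simple; a stub sketch where `generate`, `passes_filters`, and `critic_score` stand in for the neural and heuristic components (all names, the toy stand-ins, and the 0.5 threshold are illustrative assumptions):

```python
import itertools

def run_pipeline(generate, passes_filters, critic_score,
                 n_chunks=5, threshold=0.5):
    """Keep generating; filter heuristically; let the Critic pick."""
    kept = []
    while len(kept) < n_chunks:
        chunk = generate()
        if not passes_filters(chunk):
            continue  # rejected by the heuristic filters
        if critic_score(chunk) >= threshold:  # Critic says GOOD
            kept.append(chunk)
    return kept

# Toy stand-ins for demonstration:
stream = itertools.cycle(["good chunk.", "bad", "noisy chunk!!!"])
result = run_pipeline(
    generate=lambda: next(stream),
    passes_filters=lambda c: c.endswith("."),
    critic_score=lambda c: 1.0 if "good" in c else 0.0,
    n_chunks=2,
)
print(result)  # -> ['good chunk.', 'good chunk.']
```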
Here is a short sample of the final results:

a sudden feeling of austin lemons, a gentle stab of disgust.
i'm what i'm.

humans whirl in night and distance.

by the wonders of them.

we shall never suffer this.
if the human race came along tomorrow, none of us would be as wise as they already would have been.
there is a beginning and an end.

both of our grandparents and brothers are overdue.
he either can not agree or he can look for someone to blame for his death.

he has reappeared from the world of revenge, revenge, separation, hatred.
he has ceased all who have offended him.

he is the one who can remember that nothing remotely resembles the trip begun in retrospect.
what's up?

and i don't want the truth.
not for an hour.

The huge blob of generated text can be found here:
https://github.com/altsoph/paranoid_transforner/blob/master/NaNoGenMo_50K_words_sample.txt


altsoph commented Nov 30, 2019

Also, there is my related NanoNaNoGenMo submission:
https://twitter.com/altsoph/status/1200815956420890626

@dickienaut

> Also, there is my related NanoNaNoGenMo submission:
> https://twitter.com/altsoph/status/1200815956420890626

Thanks for making this available; I was just reading through your code and your results, and it looks really cool. I don't know if you have the time, but it would be incredible if you could include a quick how-to guide for getting it up and running with a new corpus. I'm more than willing to help with this, because I think this would be really educational for a lot of people. Thanks again.


altsoph commented Jun 15, 2020

> Thanks for making this available; I was just reading through your code and your results, and it looks really cool. I don't know if you have the time, but it would be incredible if you could include a quick how-to guide for getting it up and running with a new corpus. I'm more than willing to help with this, because I think this would be really educational for a lot of people. Thanks again.

Thanks for your interest! I'm not sure whether you are asking about the NaNoGenMo entry or the NanoNaNoGenMo one. The former is more or less described here: https://github.com/altsoph/paranoid_transformer/blob/master/README.md, the latter here: https://medium.com/altsoph/123-bytes-perl-markov-chain-b80e1212f3b3
Feel free to ask any questions :)
