Markov text with citations #18

Open
serin-delaunay opened this issue Oct 31, 2020 · 6 comments

Comments

@serin-delaunay

A common criticism of GPT language models is that they plagiarise text from the internet. As an experiment in smoothing over this issue, I will make a Markov chain language model that tags each n-gram observation with the location of the original in the source text.

This means that in the text generation stage, each output token can cite the n-gram it was drawn from in the source text. In the generated novel, I'll put this info in footnotes. This should make the resulting text much better sourced, and give the reader clarity about the true origin of any deep insights found in the novel.
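A minimal sketch of the idea, assuming the source text is already tokenised into `(token, location)` pairs, where `location` is whatever identifier the source provides (e.g. a Shakespeare line reference). The names `build_model` and `generate` are hypothetical, and the citation for each step is taken to be the location where the matched n-gram starts:

```python
import random
from collections import defaultdict


def build_model(source, n=3):
    """Map each (n-1)-token context to every observed continuation.

    `source` is a list of (token, location) pairs in reading order.
    Each observation stores the next token plus the location where the
    full n-gram begins, so generated tokens can be cited later.
    """
    model = defaultdict(list)
    for i in range(len(source) - n + 1):
        gram = source[i:i + n]
        context = tuple(tok for tok, _ in gram[:-1])
        next_token = gram[-1][0]
        citation = gram[0][1]  # where this n-gram starts in the source
        model[context].append((next_token, citation))
    return model


def generate(model, seed, length, rng=None):
    """Return up to `length` (token, citation) pairs.

    `seed` must contain n-1 tokens. Duplicate observations stay in the
    list, so sampling is proportional to n-gram frequency, as in a
    plain Markov chain.
    """
    rng = rng or random.Random()
    context = tuple(seed)
    output = []
    for _ in range(length):
        observations = model.get(context)  # .get avoids creating empty keys
        if not observations:
            break  # dead end: this context never continues in the source
        token, citation = rng.choice(observations)
        output.append((token, citation))
        context = context[1:] + (token,)
    return output
```

Here a token's index in the source stands in for a real identifier; with Shakespeare the location could be an act/scene/line tuple instead, with no change to the model.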

Haven't decided what source text to use. Maybe Shakespeare (all lines have a standard identifier), GPT research papers, Moby Dick...

Caveats:

  • I'll probably need to generate LaTeX to keep the footnotes organised.
  • The procedure would be difficult to port into GPT models.
  • Most of the 50,000 words would be in the footnotes.
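For the LaTeX caveat, one possible approach is to emit one `\footnote` per generated token. This is a sketch that assumes the generator yields `(token, citation)` pairs; `to_latex` and `escape_latex` are hypothetical helpers, and only LaTeX's ten special characters are escaped:

```python
# LaTeX's special characters and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape each special character independently (no double-escaping)."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def to_latex(pairs):
    """Render (token, citation) pairs as text with one footnote per token."""
    parts = []
    for token, citation in pairs:
        parts.append(f"{escape_latex(token)}\\footnote{{{escape_latex(str(citation))}}}")
    return " ".join(parts)


def to_latex_document(pairs):
    """Wrap the rendered body in a minimal standalone article."""
    return (
        "\\documentclass{article}\n\\begin{document}\n"
        + to_latex(pairs)
        + "\n\\end{document}\n"
    )
```

LaTeX numbers and places the footnotes automatically, which is why it helps once the footnotes outweigh the body text.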

@serin-delaunay (author)

If there's time, I might also do a slightly more serious separate entry that doesn't boil down to "YAMC" (yet another Markov chain).

@pjfpotter

Why not write an entire novel of footnotes? Each footnote would cite the n-gram that would have been in the novel but wasn't, because it was replaced by its own citation. Let's see how deep this rabbit hole goes.

@serin-delaunay (author)

There's one like that at NaNoGenMo/2019#68; I'd rather keep this one simple. The footnotes will have a pretty well-defined format, so they wouldn't need to be Markov-generated or nested.

@greg-kennedy

This is the one that comes to mind when I think of obsessive footnotes: NaNoGenMo/2019#127

@serin-delaunay (author)

Yeah, that's closer to what I'm going for here. Thanks for the link, I saw that one last year but it had slipped my mind.

@verachell

What a cool idea!


4 participants