Quotations from persons of repute or infamy are often deployed to lend weight to ideas and rhetoric. It's a perfectly respectable technique - albeit a logical fallacy - attested since the early days of persuasion. In a time when mass communication was all but unheard of and personal communication was slow, a quotation from an authority could, given time, be verified and the argument it supported disputed.
But on the internet, as a New Yorker cartoon once put it, nobody knows you're a dog. And while that may no longer be true for humans, it is still true of quotations. They can be deployed willy-nilly, with scarcely a thought given to accuracy or even validity. Yet by their mere presence and their unverified attribution, they lend unearned weight and substance to ideas and arguments.
There is a dedicated community of people out there who hunt down the sources of some of these spurious quotations and publish their findings. But it is a solitary and thankless task, apart from the joy of the hunt, and the creation of this false wisdom continues at industrial speed.
This project began life as a capstone for the Data Science Immersive course at General Assembly in Washington, DC. I set out to answer whether a computer can be trained to recognize a quotation as valid or invalid, that is, as correctly or falsely attributed, from the patterns latent in a published writer's works.
My initial version was, frankly, what you'd expect from a novice data scientist. In this repository, I will attempt to fix the errors of that first version and produce an effective validator.