MrPike/RSS2Ebook

RSS2Ebook

A Python script that converts the articles in an RSS feed into a single ebook (ePub, PDF or docx, powered by Pandoc). The original motivation for this project was to create an ePub containing Paul Graham's essays. The approach was then generalised to work with any RSS source, although the default configuration still targets Paul Graham's work.

The script depends heavily on some excellent libraries and tools, including FeedParser, Goose3 and Pandoc. FeedParser does the heavy lifting on the RSS side, providing an easy-to-use interface for extracting the URL and title of each linked article. Each URL is then fed into Goose, which extracts the primary article content of the linked page and returns 'clean' text.
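
The feed-to-text flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names `collect_articles` and `to_markdown` are hypothetical, and the `fetch_entries` and `extract_text` parameters are an assumption added so the flow can be tried without network access. The defaults assume `feedparser` and `goose3` are installed.

```python
def _feed_entries(feed_url):
    """Default entry source: (title, link) pairs read with feedparser."""
    import feedparser  # third-party; imported lazily so the sketch loads without it
    return [(entry.title, entry.link) for entry in feedparser.parse(feed_url).entries]


def _goose_text(url):
    """Default extractor: the main article text as returned by Goose3."""
    from goose3 import Goose  # third-party; imported lazily
    return Goose().extract(url=url).cleaned_text


def collect_articles(feed_url, fetch_entries=_feed_entries, extract_text=_goose_text):
    """Return a list of (title, text) pairs, one per feed entry (hypothetical helper)."""
    return [(title, extract_text(link)) for title, link in fetch_entries(feed_url)]


def to_markdown(articles):
    """Join the articles into one Markdown document, one top-level heading each."""
    body = "\n\n".join(f"# {title}\n\n{text.strip()}" for title, text in articles)
    return body + "\n"
```

Calling `to_markdown(collect_articles(FEED_URL))` would then yield a single text document that a converter such as Pandoc can turn into an ebook.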

## Installation and Running

  1. Optional, but recommended: set up a virtualenv:
pip install virtualenv
cd RSS2Ebook
virtualenv --python=/usr/bin/python3.6 RSS2Ebook
source RSS2Ebook/bin/activate
  1. Install the required packages:
pip install -r /path/to/requirements.txt

Also, your system needs to have Pandoc installed. If you're on a Mac, use the following instructions:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install pandoc
pandoc --version

If you are on another platform, or have issues installing Pandoc, please follow the official installation instructions provided by the Pandoc project.

  1. Set the appropriate variables in the Python script, specifically:
# The URL of the RSS feed you would like to parse
FEED_URL = 'http://www.aaronsw.com/2002/feeds/pgessays.rss'
# The name of the resulting ebook. Note - the extension is important,
# and will dictate the format of the output.
EBOOK_NAME = 'Paul_Graham_Essays.epub'
  1. Run the script:
python RSS2Ebook.py
  1. Check your ebook. It will be generated at the location specified by the EBOOK_NAME variable in the Python script.
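
To illustrate why the extension in `EBOOK_NAME` matters: Pandoc infers the output format from the extension of the output file. The sketch below shows one way such a call could look using the `pandoc` command line. The helper names and the intermediate Markdown file `essays.md` are assumptions made for illustration, and the script itself may invoke Pandoc differently.

```python
import subprocess
from pathlib import Path


def output_format(ebook_name):
    """Return the lowercase extension that decides the output format, e.g. 'epub'."""
    return Path(ebook_name).suffix.lstrip(".").lower()


def pandoc_command(source_path, ebook_name):
    """Build the pandoc argument list; pandoc picks the writer from the output extension."""
    return ["pandoc", str(source_path), "--from", "markdown", "--output", ebook_name]


def build_ebook(source_path, ebook_name):
    """Run pandoc (assumed to be on PATH) and raise if the conversion fails."""
    subprocess.run(pandoc_command(source_path, ebook_name), check=True)


if __name__ == "__main__":
    # 'essays.md' is a hypothetical intermediate file holding the collected articles.
    print(pandoc_command("essays.md", "Paul_Graham_Essays.epub"))
```

Changing the name to `Paul_Graham_Essays.docx` would produce a Word document instead. PDF output additionally requires a PDF engine, such as a LaTeX installation, to be available to Pandoc.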
