A simple link shortener with a built-in webpage archive.
Shortened links redirect to the original URL. If the original is down, a cached version of the webpage is displayed instead.
- In-memory data store
- Webpage scraping
- HTML templating
- Character encodings (bytes and strings)
`/` (`GET`, `POST`): The user enters a link on the page. All `<style>` and `<script>` elements are stripped from the linked webpage so that only plain HTML is kept, and hyperlinks are disabled too. The plain HTML is then encoded using base64. The resulting `b64_code` is searched for in the data store and, if it is found, the previously generated `shortid` is returned to the user. If this is the first time the link is being shortened, a `shortid` is generated (using the Hashids library) based on the current timestamp and added to the data store along with `link` and `b64_code`.
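The shortening flow described above can be sketched roughly as follows. This is a minimal, dependency-free sketch under stated assumptions, not the project's actual code: the standard library's `HTMLParser` stands in for BeautifulSoup, a base62 encoding of the timestamp stands in for Hashids, and a plain `dict` stands in for the data store. The names `PlainHTML`, `to_plain_html`, `encode_page`, `new_shortid`, and `shorten` are hypothetical.

```python
import base64
import html
import time
from html.parser import HTMLParser

STRIPPED = ("style", "script")
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


class PlainHTML(HTMLParser):
    """Re-emits HTML without <style>/<script> blocks or link targets.

    Comments and the doctype are dropped as well in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0  # > 0 while inside a stripped element

    def handle_starttag(self, tag, attrs):
        if tag in STRIPPED:
            self.skip += 1
        elif not self.skip:
            if tag == "a":  # disable hyperlinks by dropping href
                attrs = [(k, v) for k, v in attrs if k != "href"]
            rendered = "".join(
                f" {k}" if v is None else f' {k}="{html.escape(v)}"'
                for k, v in attrs
            )
            self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in STRIPPED:
            self.skip = max(0, self.skip - 1)
        elif not self.skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(html.escape(data, quote=False))


def to_plain_html(page):
    parser = PlainHTML()
    parser.feed(page)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


def encode_page(page):
    """Clean the HTML, then base64-encode it (str -> bytes -> ASCII str)."""
    return base64.b64encode(to_plain_html(page).encode("utf-8")).decode("ascii")


def new_shortid(timestamp_ms):
    """Base62-encode a timestamp (assumed stand-in for Hashids)."""
    digits = []
    while timestamp_ms:
        timestamp_ms, rem = divmod(timestamp_ms, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


def shorten(store, link, page):
    """Return the existing shortid for this page, or create a new one."""
    b64_code = encode_page(page)
    for shortid, entry in store.items():
        if entry["b64_code"] == b64_code:
            return shortid
    ts = time.time_ns() // 1_000_000
    while (shortid := new_shortid(ts)) in store:
        ts += 1  # avoid clashing ids within the same millisecond
    store[shortid] = {"link": link, "b64_code": b64_code}
    return shortid
```

Because this stand-in store is keyed by `shortid`, finding an existing `b64_code` is a linear scan; a reverse index from `b64_code` to `shortid` would be one way to avoid that.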
The `link`, `shortid`, and `b64_code` are stored in Redis following the given schema:
`/shortid` (`GET`): The `shortid` from the link is looked up in the data store and, if it is not found, an "Invalid shortlink!" message is shown to the user. If a valid `shortid` is found, the corresponding `b64_code` and `link` values are fetched. If the fetched link is up (returns a success response code 200), the user is redirected to it; otherwise the `b64_code` fetched from the store is decoded to display the cached version of the webpage.
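The lookup logic above could look something like the sketch below. It assumes the same hypothetical `dict`-based stand-in for the data store, keyed by `shortid` with `link` and `b64_code` fields; `resolve` and `link_is_up` are illustrative names, and the availability check uses the standard library's `urllib` rather than whatever HTTP client the project actually uses.

```python
import base64
import urllib.error
import urllib.request


def link_is_up(url, timeout=5.0):
    """True if the URL answers with status 200 (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, HTTP error status, timeout, or malformed URL.
        return False


def resolve(store, shortid, is_up=link_is_up):
    """Return an (action, payload) pair for a shortid lookup.

    `is_up` is injectable so the network check can be faked in tests.
    """
    entry = store.get(shortid)
    if entry is None:
        return ("error", "Invalid shortlink!")
    if is_up(entry["link"]):
        return ("redirect", entry["link"])
    # Original is down: decode the cached copy (ASCII str -> bytes -> str).
    cached = base64.b64decode(entry["b64_code"]).decode("utf-8")
    return ("cached", cached)
```

A web handler would then map `"redirect"` to an HTTP redirect, `"cached"` to rendering the decoded HTML, and `"error"` to the error message.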
- Hashes in Redis
- RedisLabs Doc on Redis
- Hashids Doc
- Guide to Parsing HTML with BeautifulSoup in Python
- Original idea of base64 encoding on HN
- URL shortener design and hashids usage
- Original bs4 HTML scraping method - bumpkin's answer
- A similar tool
- Save the page as an image or PDF (maybe)
- Add an option to view the cached page before redirecting