aashrafh/enwik8

About

An attempt to compress enwik8, the first 100 MB of an English Wikipedia dump, using LZW (Lempel–Ziv–Welch) and a BZip2-Like algorithm with variable-length encoding.
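
For readers unfamiliar with LZW, here is a minimal sketch of the textbook algorithm. It is not taken from `encoder.cpp`: the function names (`lzw_encode`, `lzw_decode`, `bits_needed`) are hypothetical, the dictionary is assumed to grow without bound, and codes are returned as integers instead of being packed into a bit stream. `bits_needed` only illustrates the idea behind variable-length codes, where the code width grows as the dictionary grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Encode a byte string into a sequence of LZW dictionary codes.
std::vector<std::uint32_t> lzw_encode(const std::string& input) {
    // Start with one dictionary entry per possible byte value.
    std::unordered_map<std::string, std::uint32_t> dict;
    for (int i = 0; i < 256; ++i) {
        dict[std::string(1, static_cast<char>(i))] = static_cast<std::uint32_t>(i);
    }
    std::uint32_t next_code = 256;
    std::vector<std::uint32_t> out;
    std::string current;
    for (char c : input) {
        std::string candidate = current + c;
        if (dict.count(candidate) != 0) {
            current = candidate;              // keep extending the match
        } else {
            out.push_back(dict[current]);     // emit the longest known match
            dict[candidate] = next_code++;    // learn the new phrase
            current = std::string(1, c);
        }
    }
    if (!current.empty()) out.push_back(dict[current]);
    return out;
}

// Rebuild the original bytes from the codes produced by lzw_encode.
std::string lzw_decode(const std::vector<std::uint32_t>& codes) {
    if (codes.empty()) return "";
    std::vector<std::string> dict(256);
    for (int i = 0; i < 256; ++i) dict[i] = std::string(1, static_cast<char>(i));
    assert(codes[0] < 256);  // the first code is always a single byte
    std::string previous = dict[codes[0]];
    std::string out = previous;
    for (std::size_t i = 1; i < codes.size(); ++i) {
        std::uint32_t code = codes[i];
        // A code equal to dict.size() refers to the phrase being defined right now.
        std::string entry = (code < dict.size()) ? dict[code] : previous + previous[0];
        out += entry;
        dict.push_back(previous + entry[0]);
        previous = entry;
    }
    return out;
}

// Smallest number of bits that can represent max_code (at least 1).
unsigned bits_needed(std::uint32_t max_code) {
    unsigned bits = 1;
    while (bits < 32 && (max_code >> bits) != 0) ++bits;
    return bits;
}
```

With these definitions, `lzw_decode(lzw_encode(s))` should return `s` for any byte string. A variable-length writer would typically emit each code using `bits_needed` of the largest code currently assigned, so early codes take 9 bits and later ones take more.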

Results

  • LZW:
    • Compression ratio: 2.905
    • Compressed file size: 32 MB
  • BZip2-Like:
    • Compression ratio: 3.855
    • Compressed file size: 24 MB
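
The README does not define the compression ratio, but the figures above are consistent with ratio = original size / compressed size. As a rough check, assuming enwik8 is 10^8 bytes and that "MB" here means MiB (2^20 bytes), 10^8 / 2.905 is about 32.8 MiB and 10^8 / 3.855 is about 24.7 MiB. The helper names below are hypothetical and only illustrate that arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of original size to compressed size (larger is better).
double compression_ratio(std::uint64_t original_bytes, std::uint64_t compressed_bytes) {
    assert(compressed_bytes > 0);
    return static_cast<double>(original_bytes) / static_cast<double>(compressed_bytes);
}

// Compressed size in MiB implied by an original size and a ratio.
double compressed_size_mib(std::uint64_t original_bytes, double ratio) {
    assert(ratio > 0.0);
    return static_cast<double>(original_bytes) / ratio / (1024.0 * 1024.0);
}
```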

How to run

  • Compression
    1. Open a terminal in the directory containing the code
    2. Build the binary: g++ -o encoder.exe encoder.cpp
    3. Run the binary: ./encoder.exe
  • Decompression
    1. Open a terminal in the directory containing the code
    2. Build the binary: g++ -o decoder.exe decoder.cpp
    3. Run the binary: ./decoder.exe
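
After running both steps, one way to confirm a lossless round trip is to compare the decoder's output with the original file byte by byte. The sketch below is not part of the repository, and `files_identical` is a hypothetical helper. The README does not state which file names the encoder and decoder read and write, so the paths are left to the caller.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Return true only if both files can be opened and hold exactly the same bytes.
bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // a missing or unreadable file counts as a mismatch
    std::istreambuf_iterator<char> a_it(a), b_it(b), end;
    while (a_it != end && b_it != end) {
        if (*a_it != *b_it) return false;
        ++a_it;
        ++b_it;
    }
    // Equal only if both streams ended at the same position.
    return a_it == end && b_it == end;
}
```

Calling `files_identical` with the path of the original input and the path of the decompressed output should return `true` if decompression reproduced the input exactly.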

To Do

  • A decoder for the BZip2-Like algorithm