File-Compressor-Decompressor

Introduction
Hufman Coding
Linear Block Codes
Decompression
Usage

Introduction

A python administered project which uses Huffman Encoding Technique to compress a text file into a binary file.
Utilizes Linear Block Codes to identify any errors committed while compression.

Huffman Coding

The thought process behind Huffman encoding is that a letter or a symbol which occurs frequently is represented by a shorter code, and the one that appears rarely is represented by a longer code. This leads to an efficient representation of characters that require less memory to be stored.
To compress a sequence of characters, we need a table that gives us the sequences of bits used for each character. This table creates an encoding tree that uses the root/leaf path to create a bit sequence to encodes the characters. Using this we can create a list of all characters with the maximum bit length of the encoded characters and the number of occurrences.
To construct an optimal tree, we use a greedy algorithm. Huffman encoding trees return the minimum length character encodings used in data compression. The nodes in the tree represent the frequency of a character’s occurrence. The root node represents the length of the string, and traversing the tree gives us the character-specific encodings. Once the tree is constructed, traversing the tree gives us the respective codes for each symbol.

Linear Block Codes

In coding theory, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword. Linear codes are traditionally partitioned into block codes and convolutional codes, although turbo codes can be seen as a hybrid of these two types. Linear codes allow for more efficient encoding and decoding algorithms than other codes.
Linear codes are used in forward error correction and are applied in methods for transmitting symbols (e.g., bits) on a communications channel so that, if errors occur in the communication, some errors can be corrected or detected by the recipient of a message block. The codewords in a linear block code are blocks of symbols that are encoded using more symbols than the original value to be sent.
A linear code of length n transmits blocks containing n symbols. For example, the [7,4] Hamming code is a linear binary code which represents 4-bit messages using 7-bit codewords. Two distinct codewords differ in at least three bits. As a consequence, the error detecting ability is 2 bits per codeword while the error correcting ability is 1.

Decompression

For the final and easiest step, we use a hashmap data structure.
We loop through the binary file and keep adding the binary digits to an empty string curr_code until it matches with one of the codewords in the hashmap whose corresponding symbol added to string decode.
As soon as we add the decoded character or symbol to the decode, the curr_code is again made empty and the same process is repeated from the next binary digit.

Usage

Run decompress.py

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.gitignore		.gitignore
Huffman.py		Huffman.py
README.md		README.md
decompress.py		decompress.py
lbc.py		lbc.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

Huffman.py

Huffman.py

README.md

README.md

decompress.py

decompress.py

lbc.py

lbc.py

Repository files navigation

File-Compressor-Decompressor

Introduction

Huffman Coding

Linear Block Codes

Decompression

Usage

About

Releases

Packages

Languages

shrenik-jain/File-Compressor-Decompressor

Folders and files

Latest commit

History

Repository files navigation

File-Compressor-Decompressor

Introduction

Huffman Coding

Linear Block Codes

Decompression

Usage

About

Topics

Resources

Stars

Watchers

Forks

Languages