etrizzo/Huffman
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Author: Emily Rizzo This is a command line huffman compressor/decompressor, which can read text files, generate Huffman trees for those files, encode files into compressed "huffman files", and decompress huffman files into text files. Contents of README: I. HUFFMAN.CPP - STRUCTURE //structure of the program II. HUFFMAN.CPP - CLASSES //classes contained in the program III. HUFFMAN.CPP - FUNCTIONS //important functions IV. COMPRESSED FILE FORMAT //format for the compressed huffman files generated by the program V. CHANGING DEFAULT EXTENSIONS/DIRECTORIES //variables to change the extensions and directories for the files read and outputted by this program =========================== I. HUFFMAN.CPP - STRUCTURE =========================== main() runs a while loop, which takes in and executes commands from cin. These commands are: run <filename> //runs encode and decode on <filename> (ex: "run file.txt" would run encode and decode on "TXT_DIR/file.txt") encode <filename> //only encodes <filename> decode <filename> //only decodes <filename> print //prints the huffman tree for the last file to be encoded help //prints a brief summary of all commands files //prints a list of all default files submitted with this assignment, and a brief description of each exit //exits the program After exiting, main() runs a cleanup function to free memory allocated by the tree. =========================== II. HUFFMAN.CPP - CLASSES =========================== == NODE CLASS == - name (string) : character or string stored in the node) - frequency (string) : number of times the node occurs in the file - code (string) : bit code generated by the huffman tree (path from root where L adds '0' and R adds '1') - left (Node) : left child of the node - right (Node) : right child of the node - isLeaf (bool) : true if the node is a leaf (leaves are single characters in huffman trees) = void toString() - outputs node in format " name || freq || code " = vector<Node *> postOrder() - recursively performs post order traversal with current node as the root. Returns vector of nodes in post order. Also generates/updates "code" for each node traversed. Called by tree root to generate codes, or retrieve vector of nodes in post order for encoding. == TREE CLASS == - vector<Node *> freqs : collection of leaves in the tree, with frequencies. Generated when reading a text file. - vector<Node *> huffList : temporary storage used to generate internal nodes and tree structure by calling Huffman() - Node * root : root node of the tree = Node* contains(string name) - searches tree for a leaf node with name "name", and returns it if it exists. if no such node is in the tree, creates a new node(name) and pushes it to freqs and huffList. = void addChar(string name) - increments the frequency of the node with name "name", either by retrieving and increment existing node, or creating a new node if "name" is new. = void Huffman() - constructs huffman tree using huffList. Called when all leaves have been added to huffList. constructs tree by repeatedly combining two lowest frequency nodes (a) and (b) in huffList into a new node (ab), with frequency a.frequency + b.frequency When huffList.size() == 1, the remaining node is the root of the tree. Then runs postOrder() on the root to determine the bit codes for all nodes in the tree. = string Encode() - returns an encoding of a post-order traveral of the huffman tree with the format outlined in COMPRESSED FILE FORMAT section below. EOF character is encoded as byte "11111111" or 255 as an unsigned char, which is not on the ASCII chart =========================== III. HUFFMAN.CPP - FUNCTIONS =========================== Important functions in huffman.cpp are as follows: === Tree makeTree(string fileName) === - Generates and returns a huffman tree for the text file "fileName" by: 1. adding a node to t for each character in the file, and finding the correct frequencies for each char by reading through the file once 2. calling t.Huffman() to generate the tree structure and bit codes for the nodes. The tree can then be used to encode/compress the characters in the text file. When all characters in the file have been read, adds an "EOF" node with frequency 1, to generate EOF bit code. === long encode(string fileRoot, Tree t) === - Encodes/compresses the file "TXT_DIR + fileRoot + TXT_EXT", using the tree t. The compressed file is written to the file "HUF_DIR + fileRoot + HUF_EXT" Compression is done by converting the string of characters in the file to a string representation of the bit codes determined by the tree, and then that string of bits are compressed into characters. This function returns the number of bytes written to the compressed file, for comparison with the original file. === long decode(string fileRoot) === - Decodes/decompresses the file "HUF_DIR + fileRoot + HUF_EXT" by first decoding an encoded huffman tree, then using that tree to decode the remainder of the file (see COMPRESSED FILE FORMAT). The decoded file is written to the file "OUT_DIR + fileROOT + OUT_EXT" Decompression is done by converting the string of characters in the file to a string representation of all bits in the file, then calling decodeBytes() on that string of bits. This function returns the number of bytes written to the decompressed file, for comparisson with the compressed file. === string decodeBytes(string bytes) === - Decodes a string of bytes by first calling decodeTree(bytes) to generate a huffman tree, and then using that tree to decode the remainder of the file. Because the final byte of the file may have been padded with 0's, decodeBytes stops reading when the EOF character is encountered === Tree decodeTree(string bytes) === - Decodes and returns a huffman tree from bytes by: 1. Pulling the number of leaves (numLeaves) in the tree from the first byte 2. Generate tree stucture by reading through bits and adding nodes to a stack as follows: - if a '1' is encountered, read the next 8 bits to get the char represented by that leaf. Push this leaf onto the stack. Increment a counter for the number of leaves read (leavesFound). - if a '0' is encountered, pop two nodes off the stack and combine to form an internal node. Push this node onto the stack. When the leavesFound = numLeaves (all leaves have been read), AND the stack contains only one node (the root), the tree has been fully decoded. 3. Call postOrder() on the root to generate the bit codes for each node in the tree before returning the tree. =========================== IV. COMPRESSED FILE FORMAT =========================== This program encodes huffman-compressed files in the following format: <numLeaves> <encoded tree> <compressed file> <EOF> where: <numLeaves> == a single byte representing the number of leaves in the huffman tree. <encoded tree> == a post-order traversal of the tree, encoded in the following way: - Leaves are represented as a '1' followed by the 8 bits representing the character in that leaf. - Internal nodes are represented as a '0', with no additional bits <compressed file> == the compressed text from the original file, where each character from the original file has been converted to the bit code determined by the huffman tree <EOF> == a single 'character' added to the end of the file to signify when to stop reading bits - the final byte is padded to 8 bits with trailing 0's, which could be read incorrectly without the EOF. =========================== V. CHANGING DEFAULT EXTENSIONS/DIRECTORIES =========================== Default directory locations and extensions can be modified in the file huffman.cpp, by changing the global variables: TXT_EXT -- the input file extension (by default ".txt") HUF_EXT -- the extension added to the root filename for compressed files (by default "-HUF.txt"); OUT_EXT -- the extension added to the root filename for decompressed files (by default "-o.txt"); TXT_DIR -- directory of source files (by default "TXT-files/") HUF_DIR -- directory for files compressed by the program (by default "HUF-files/") OUT_DIR -- directory for files decompressed by the program (by default "OUT-files/")
About
This is a command line huffman compressor/decompressor, which can read text files, generate Huffman trees for those files, encode files into compressed "huffman files", and decompress huffman files into text files. Written in C++
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published