
String dictionary with front coding compression


WikiBox/SD


SD

C++ String dictionary with front coding compression and serialization

sd is a C++ class implementing a compressed string dictionary. The class provides fast add, locate and extract operations using front coding compression. With front coding, each string is stored as the length of the prefix it shares with the preceding string followed by the remaining suffix, so a shared prefix is not stored again. This makes sd suitable for compact storage of, and fast access to, large numbers of strings such as URLs or file paths. With sd it may be possible to hold a complete dictionary in RAM instead of relying on slower disk-based storage.
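The idea of front coding can be illustrated with a minimal sketch, assuming the input strings are already sorted. The names `FrontCoded`, `shared_prefix`, `encode` and `decode` are hypothetical and are not part of the sd API; the sketch ignores buckets and serialization and uses C++17 structured bindings.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration of front coding, not the sd class API.
// Each entry is (length of prefix shared with the previous string, remaining suffix).
using FrontCoded = std::vector<std::pair<std::size_t, std::string>>;

// Length of the longest common prefix of a and b.
std::size_t shared_prefix(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) ++n;
    return n;
}

// Encode strings (expected in ascending order) as (prefix length, suffix) pairs.
FrontCoded encode(const std::vector<std::string>& sorted) {
    FrontCoded out;
    std::string prev;  // the first string shares nothing with the empty string
    for (const std::string& s : sorted) {
        std::size_t lcp = shared_prefix(prev, s);
        out.emplace_back(lcp, s.substr(lcp));
        prev = s;
    }
    return out;
}

// Rebuild each string from its predecessor's prefix plus the stored suffix.
std::vector<std::string> decode(const FrontCoded& coded) {
    std::vector<std::string> out;
    std::string prev;
    for (const auto& [lcp, suffix] : coded) {
        std::string s = prev.substr(0, lcp) + suffix;
        out.push_back(s);
        prev = s;
    }
    return out;
}
```

For example, encoding "http://a.com/x" followed by "http://a.com/y" stores the second string as (13, "y"), so only one new character is kept for it.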

Note:

  • Strings should be added in strictly ascending lexicographic order. Otherwise, compression will be poor and locate will not work.
  • Strings are stored as raw bytes, but '\0' can't be used in a string because that character is used as the string terminator inside the data structure.
  • BUCKET_SIZE may be changed to shift the trade-off between compression level and access speed. A larger BUCKET_SIZE gives higher compression but slower access. Sane values are between 16 and 2048.
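The three notes above (sorted input, '\0' as terminator, and the BUCKET_SIZE trade-off) can be illustrated with a hypothetical bucketed layout in the spirit of libCSD's plain front coding. This is a sketch under stated assumptions, not sd's actual implementation: the class `BucketedDict`, its members, and the one-byte prefix length (capped at 255) are illustrative choices. The first string of each bucket is stored in full; locate binary-searches those bucket headers and then scans a single bucket, and extract jumps to a bucket and replays the front coding.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of bucketed front coding; names are illustrative, not the sd API.
// Layout of data_: each bucket starts with its first string in full plus '\0';
// every further string is one byte holding the shared-prefix length, then the
// remaining suffix, then '\0'. This is why strings may not contain '\0'.
class BucketedDict {
public:
    explicit BucketedDict(std::size_t bucket_size) : bucket_size_(bucket_size) {}

    // Strings must be added in strictly ascending order and contain no '\0'.
    void add(const std::string& s) {
        if (count_ % bucket_size_ == 0) {
            bucket_start_.push_back(data_.size());  // new bucket: header stored in full
            data_.append(s);
        } else {
            // Shared prefix with the previous string, capped at 255 so it fits in
            // one byte (a simplification; a shorter prefix still decodes correctly).
            std::size_t lcp = 0;
            while (lcp < 255 && lcp < last_.size() && lcp < s.size() && last_[lcp] == s[lcp]) ++lcp;
            data_.push_back(static_cast<char>(lcp));
            data_.append(s, lcp, std::string::npos);
        }
        data_.push_back('\0');
        last_ = s;
        ++count_;
    }

    std::size_t size() const { return count_; }

    // Decode string number id: jump to its bucket, then replay the front coding.
    std::string extract(std::size_t id) const {
        std::size_t pos = bucket_start_[id / bucket_size_];
        std::string cur = read_cstr(pos);
        for (std::size_t i = 0; i < id % bucket_size_; ++i) cur = next(cur, pos);
        return cur;
    }

    // Binary search over bucket headers, then a linear scan inside one bucket.
    std::optional<std::size_t> locate(const std::string& s) const {
        if (count_ == 0) return std::nullopt;
        std::size_t lo = 0, hi = bucket_start_.size();
        while (hi - lo > 1) {  // find the last bucket whose header is <= s
            std::size_t mid = lo + (hi - lo) / 2;
            std::size_t p = bucket_start_[mid];
            if (read_cstr(p) <= s) lo = mid; else hi = mid;
        }
        std::size_t pos = bucket_start_[lo];
        std::string cur = read_cstr(pos);
        std::size_t id = lo * bucket_size_;
        while (true) {
            if (cur == s) return id;
            if (cur > s) return std::nullopt;  // sorted order: s cannot appear later
            ++id;
            if (id == count_ || id % bucket_size_ == 0) return std::nullopt;  // end of bucket
            cur = next(cur, pos);
        }
    }

private:
    // Read a '\0'-terminated string starting at pos and move pos past the terminator.
    std::string read_cstr(std::size_t& pos) const {
        std::size_t end = data_.find('\0', pos);
        std::string out = data_.substr(pos, end - pos);
        pos = end + 1;
        return out;
    }

    // Decode the entry at pos that follows prev in the same bucket.
    std::string next(const std::string& prev, std::size_t& pos) const {
        std::size_t lcp = static_cast<unsigned char>(data_[pos++]);
        return prev.substr(0, lcp) + read_cstr(pos);
    }

    std::size_t bucket_size_;
    std::size_t count_ = 0;
    std::string last_;
    std::string data_;                       // concatenated, front-coded buckets
    std::vector<std::size_t> bucket_start_;  // byte offset of each bucket header
};
```

In this sketch, a larger bucket size means fewer strings are stored in full (better compression), but extract and locate may have to replay up to `bucket_size - 1` entries, which is the same trade-off the BUCKET_SIZE note describes. Adding strings out of order would break the binary search over headers, which is why locate depends on sorted input.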

This software is a simplified version of StringDictionaryPFC from libCSD:

==========================================================================
  "Compressed String Dictionaries"
  Nieves R. Brisaboa, Rodrigo Cánovas, Francisco Claude, 
  Miguel A. Martínez-Prieto, and Gonzalo Navarro
  10th Symposium on Experimental Algorithms (SEA'2011), p.136-147, 2011.
==========================================================================

See: https://github.com/migumar2/libCSD
