Skip to content

AlasdairF/BinSearch

Repository files navigation

##BinSearch

BinSearch is a super-efficient, in-memory key/value data structure for Go. In future it could also expand to be disk-based easily enough.

##Features

  • Supports keys in the following types: []byte, []rune, int, uint64, uint32, uint16, uint8.
  • Supports the following data structures: Key/Index store, Key/Val store, Counter (Accumulator).
  • Key/Index store allows for any value structure to be used along with the key.
  • Includes Read and Write functions for reading and writing the structure to disk.
  • Backend is binary search with a great number of optimizations.
  • Written with focus on high speed and low memory footprint.

##Advantages

  • Incredibly memory efficient, max 5KB of memory overhead.
  • []byte and []rune keys are compacted and therefore use less memory than the original slice, while remaining lossless and retrievable.
  • Very, very fast. Much faster than the native map for almost all applications.

##Disadvantages

  • Slow for sequential: find, add key, find, add key... in those cases the native map structure is faster. However, this problem can usually be solved by using Counter and Key/Index stores together, which restores the superiority to BinSearch for both speed and memory efficiency.
  • Maximum key size for []byte and []rune types is 64 bytes.

##Installation

go get github.com/AlasdairF/BinSearch

##Usage

The different structure names are one of Key, KeyVal, Counter, followed by one of Bytes, Runes, Int, Uint64, Uint32, Uint16, Uint8. E.g. KeyValBytes, CounterUint32.

Key and KeyVal types should never have duplicate keys added. It is important to either use Find(key) to check if a key exists before adding it (see Example 4), or to use the Counter structure to remove duplicates first (see Example 10).

Most structures require the keys to be added first and then the Build() function executed before any Find(key) is performed. The exception to this is the Add(key) function from Key and KeyVal types, which does not require the use of Build() and which does allow for Find(key) to be performed at any time, but the insertion of the keys is considerably slower. In most cases this is not necessary and there is usually a way to avoid using Add(key).

###Key Type

The Key store records only the keys and its Find(key) function will return an index number, which is the position of the key. You can store any value in a separate slice under this index. The Key store is also useful if you don't need any values, for example you want to see only if a word exists in a dictionary.

###KeyVal Type

The KeyVal store is similar to the Key store but it also features an int value associated with each key. This value is always returned by the Find(key) function instead of the index. If you do not want an int value then use the Key store and implement your own value slice.

###Counter Type

The Counter type adds up all of the values associated with every identical key element. It is therefore useful for removing duplicates from a list, for tallying up scores, and for counting the number of occurances of each key. Values are int and so may be positive or negative; the value is irrelevant if Counter is being used to remove duplicates.

##Index

INDEX

KeyBytes, KeyRunes
	func (t *KeyBytes) Len() int
	func (t *KeyBytes) Find(thekey []byte) (int, bool)					Returns: index, exists.
	func (t *KeyBytes) Add(thekey []byte) (int, bool)					Returns: index, exists. Adds the key if it does not already exist and returns the new index, otherwise returns the current index of the existing key.
	func (t *KeyBytes) AddAt(thekey []byte, i int) error				Returns error if thekey > 64 bytes
	func (t *KeyBytes) AddUnsorted(thekey []byte) error					Returns error if thekey > 64 bytes
	func (t *KeyBytes) Build() ([]int, error)							Returns slice mapping old indexes to new indexes. Can only be used after AddUnsorted, otherwise returns an error.
	func (t *KeyBytes) Optimize()										Copies all the data to new slices with capacity equal to length.
	func (t *KeyBytes) Reset() bool										Returns false if the structure is empty (Len() == 0)
	func (t *KeyBytes) Next() ([]byte, bool)							Returns: original slice of bytes, EOF (true = EOF)
	func (t *KeyBytes) Keys() [][]byte									Returns slice containing all the keys in order
	func (t *KeyBytes) Write(w *custom.Writer)							Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *KeyBytes) Read(r *custom.Reader)							Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom)
	
KeyValBytes, KeyValRunes
	func (t *KeyValBytes) Len() int
	func (t *KeyValBytes) Find(thekey []byte) (int, bool)				Returns: value, exists
	func (t *KeyValBytes) Update(thekey []byte, fn func(int) int) bool	Returns boolean value for whether the key exists or not, if it exists the value is modified according to the fn function
	func (t *KeyValBytes) UpdateAll(fn func(int) int)					Modifies all values by the fn function
	func (t *KeyValBytes) Add(thekey []byte, theval int) bool			Returns whether it exists. Replaces old value with the new value if it exists, otherwise adds it in place.
	func (t *KeyValBytes) AddUnsorted(thekey []byte, theval int) error	Returns error if thekey > 64 bytes
	func (t *KeyValBytes) Build()										Only required to be called after AddUnsorted, otherwise it will shrink array capacity to length.
	func (t *KeyValBytes) Optimize()									Copies all the data to new slices with capacity equal to length.
	func (t *KeyValBytes) Reset() bool									Returns false if the structure is empty (Len() == 0)
	func (t *KeyValBytes) Next() ([]byte, int, bool)					Returns: original slice of bytes, value, EOF (true = EOF)
	func (t *KeyValBytes) Keys() [][]byte								Returns slice containing all the keys in order
	func (t *KeyValBytes) Write(w *custom.Writer)						Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *KeyValBytes) Read(r *custom.Reader)						Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom)
	
CounterBytes, CounterRunes
	func (t *CounterBytes) Len() int									Len() is only accurate after Build()
	func (t *CounterBytes) Find(thekey []byte) (int, bool)				Returns: frequency, exists. Will return nonsensical results if used before Build() is executed; only use after Build.
	func (t *CounterBytes) Update(thekey []byte, fn func(int) int) bool	Returns boolean value for whether the key exists or not, if it exists the value is modified according to the fn function
	func (t *CounterBytes) UpdateAll(fn func(int) int)					Modifies all values by the fn function
	func (t *CounterBytes) Add(thekey []byte, theval int) error			Returns an error if thekey > 64 bytes
	func (t *CounterBytes) Build()										Always required before Find.
	func (t *CounterBytes) Optimize()									Copies all the data to new slices with capacity equal to length.
	func (t *CounterBytes) Reset() bool									Returns false if the structure is empty (Len() == 0)
	func (t *CounterBytes) Next() ([]byte, int, bool)					Returns: original slice of bytes, value, EOF (true = EOF)
	func (t *CounterBytes) Keys() [][]byte								Returns slice containing all the keys in order
	func (t *CounterBytes) Write(w *custom.Writer)						Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *CounterBytes) Read(r *custom.Reader)						Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom)
	func (t *CounterBytes) KeyBytes() *KeyBytes							Copies keys to a KeyBytes structure
	func (t *CounterBytes) KeyValBytes() *KeyBytes						Copies keys and values to a KeyValBytes structure
	
KeyInt, KeyUint64, KeyUint32, KeyUint16, KeyUint8
	func (t *KeyInt) Len() int
	func (t *KeyInt) Find(thekey []byte) (int, bool)					Returns: index, exists.
	func (t *KeyInt) Add(thekey []byte) (int, bool)						Returns: index, exists.
	func (t *KeyInt) AddAt(thekey []byte, i int)
	func (t *KeyInt) AddUnsorted(thekey []byte)
	func (t *KeyInt) Build() []int										Returns slice mapping old indexes to new indexes. Only required if AddUnsorted was used, otherwise it will shrink array capacity to length.
	func (t *KeyInt) Optimize()											Copies all the data to new slices with capacity equal to length.
	func (t *KeyInt) Reset() bool										Returns false if the structure is empty (Len() == 0)
	func (t *KeyInt) Next() (uint64, bool)								Returns: key, EOF (true = EOF)
	func (t *KeyInt) Keys() []uint64									Returns slice containing all the keys in order
	func (t *KeyInt) Write(w *custom.Writer)							Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *KeyInt) Read(r *custom.Reader)								Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom)
	
KeyValInt, KeyValUint64, KeyValUint32, KeyValUint16, KeyValUint8
	func (t *KeyValInt) Len() int
	func (t *KeyValInt) Find(thekey uint64) (int, bool)					Returns: value, exists
	func (t *KeyValInt) Update(thekey uint64, fn func(int) int) bool	Returns boolean value for whether the key exists or not, if it exists the value is modified according to the fn function
	func (t *KeyValInt) UpdateAll(fn func(int) int)						Modifies all values by the fn function
	func (t *KeyValInt) Add(thekey uint64, theval int) bool				Returns whether it exists. Replaces old value with the new value if it exists, otherwise adds it in place.
	func (t *KeyValInt) AddUnsorted(thekey uint64, theval int)
	func (t *KeyValInt) Build()											Only required to be called after AddUnsorted, otherwise it will shrink array capacity to length.
	func (t *KeyValInt) Optimize()										Copies all the data to new slices with capacity equal to length.
	func (t *KeyValInt) Reset() bool									Returns false if the structure is empty (Len() == 0)
	func (t *KeyValInt) Next() ([]byte, int, bool)						Returns: original slice of bytes, value, EOF (true = EOF)
	func (t *KeyValInt) Keys() []uint64									Returns slice containing all the keys in order
	func (t *KeyValInt) Write(w *custom.Writer)							Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *KeyValInt) Read(r *custom.Reader)							Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom))
	
CounterInt, CounterUint64, CounterUint32, CounterUint16, CounterUint8
	func (t *CounterInt) Len() int										Len() is only accurate after Build()
	func (t *CounterInt) Find(thekey uint64) (int, bool)				Returns: frequency, exists. Will return nonsensical results if used before Build() is executed; only use after Build.
	func (t *CounterInt) Update(thekey uint64, fn func(int) int) bool	Returns boolean value for whether the key exists or not, if it exists the value is modified according to the fn function
	func (t *CounterInt) UpdateAll(fn func(int) int)					Modifies all values by the fn function
	func (t *CounterInt) Add(thekey uint64, theval int)
	func (t *CounterInt) Build()										Always required before Find.
	func (t *CounterInt) Optimize()										Copies all the data to new slices with capacity equal to length.
	func (t *CounterInt) Reset() bool									Returns false if the structure is empty (Len() == 0)
	func (t *CounterInt) Next() ([]byte, int, bool)						Returns: original slice of bytes, value, EOF (true = EOF)
	func (t *CounterInt) Keys() []uint64								Returns slice containing all the keys in order
	func (t *CounterInt) Write(w *custom.Writer)						Writes built structure out to custom.Writer (requires github.com/AlasdairF/Custom)
	func (t *CounterInt) Read(r *custom.Reader)							Reads structure in from custom.Reader (requires github.com/AlasdairF/Custom)
	func (t *CounterInt) Copy() *KeyInt									Copies keys to a KeyInt structure

##Examples

###1. Basic KeyVal usage with AddUnsorted & Build (fast)

obj := new(binsearch.KeyValBytes) // create BinSearch structure
obj.AddUnsorted([]byte("something"), 277) // add at the end, Find() will not yet work
obj.AddUnsorted([]byte("word"), 1000)
obj.AddUnsorted([]byte("hello"), 55)
obj.Build() // when using AddUnsorted, Build() must be executed before Find()
if val, exists := obj.Find([]byte("word")); exists {
	fmt.Println(`word is`, val) // word is 1000
}

###2. Basic KeyVal usage with Add (slow)

obj := new(binsearch.KeyValBytes) // create BinSearch structure
obj.Add([]byte("something"), 277) // Add() does the add in the correct position so Find() can be used already
obj.Add([]byte("word"), 1000)
if val, exists := obj.Find([]byte("word")); exists { // Find() can be used without Build() because we used Add()
	fmt.Println(`word is`, val) // word is 1000
}
obj.Add([]byte("hello"), 55)
if val, exists := obj.Find([]byte("hello")); exists {
	fmt.Println(`hello is`, val) // hello is 55
}

###3. Using a custom value with AddUnsorted & Build (fast)

obj := new(binsearch.KeyBytes)
valstore := make([]mystruct, 0, 100)
obj.AddUnsorted([]byte("first")) // Add the key to the end of the Key structure
valstore = append(valstore, mystruct{123}) // Add the value using append, to the end of the value structure
obj.AddUnsorted([]byte("second"))
valstore = append(valstore, mystruct{456})
newindexes, err := obj.Build() // only with Key stores, Build() returns a map of old indexes to new indexes to reorder the 
temp := make([]mystruct, len(valstore)) // make a new slice for reordering the values
for indx_new, indx_old := range newindexes { // loop through the mapping of old indexes to new indexes
	temp[indx_new] = valstore[indx_old] // copy the value from the old index location to the new index location on the new slice
}
valstore = temp // replace the old slice with the new sorted slice
if indx, exists := obj.Find([]byte("second")); exists {
	val := valstore[indx]
	fmt.Println(val) // mystruct{456}
}

###4. Using a custom value with Add (slow)

Note: this can be done much more efficiently using a Counter. See Example 10 below.

key := []byte("test")
obj := new(binsearch.KeyBytes)
val := make([]mystruct, 0, 100)
if indx, exists := obj.Add(key); exists {
	fmt.Println(`Value is`, val[indx])
} else {
	val = append(val, mystruct{}) // Enlarge by 1
	copy(val[indx+1:], val[indx:]) // Make space at indx
	val[indx] = mystruct{123} // Add your value into the correct position
}

###5. Using KeyBytes as a dictionary

dictionary := new(binsearch.KeyBytes)
dictionary.AddUnsorted([]byte("aardvark"))
dictionary.AddUnsorted([]byte("pineapple"))
dictionary.AddUnsorted([]byte("zulu"))
dictionary.Build()
if _, exists := dictionary.Find([]byte("pineapple")); exists {
	fmt.Println(`It's in the dictionary!`)
} else {
	fmt.Println(`It's not in the dictionary!`)
}

###6. Using CounterBytes to tally scores

obj := new(binsearch.CounterBytes)
obj.Add([]byte("john"), 5) // Counter structures only have the Add() function and it is always very fast
obj.Add([]byte("fred"), 2)
obj.Add([]byte("fred"), 4)
obj.Add([]byte("bill"), 1)
obj.Build()
if val, exists := obj.Find([]byte("fred")); exists {
	fmt.Println(`Fred scored`, val) // Fred scored 6
}

###7. Using CounterBytes to count occurances

obj := new(binsearch.CounterBytes)
obj.Add([]byte("the"), 1)
obj.Add([]byte("the"), 1)
obj.Add([]byte("the"), 1)
obj.Add([]byte("and"), 1)
obj.Build()
if val, exists := obj.Find([]byte("the")); exists {
	fmt.Println(`the appears`, val, `times`) // the appears 3 times
}

###8. Using CounterBytes to remove duplicates

obj := new(binsearch.CounterBytes)
obj.Add([]byte("the"), 0)
obj.Add([]byte("the"), 0)
obj.Add([]byte("the"), 0)
obj.Add([]byte("and"), 0)
obj.Build()
uniques := obj.Keys()

###9. Creating a dictionary from results which may contain duplicates

obj := new(binsearch.CounterBytes)
obj.Add([]byte("aardvark"), 0)
obj.Add([]byte("aardvark"), 0)
obj.Add([]byte("zulu"), 0)
obj.Add([]byte("zulu"), 0)
obj.Build()
dictionary := obj.Copy() // Copy() only exists for Counter structures, it copies the structure into a Key structure, this is only really necessary to save memory
if _, exists := dictionary.Find([]byte("pineapple")); exists {
	fmt.Println(`It's in the dictionary!`)
} else {
	fmt.Println(`It's not in the dictionary!`)
}

###10. Creating a custom Key/Value structure from results which may contain duplicates

Note: the result is equivalent to Example 4, but this is much faster.

origdata := [][]byte{[]byte("a"), []byte("b"), byte("c"), byte("c"), byte("d")}
origval := []mystruct{mystruct{10}, mystruct{20}, mystruct{30}, mystruct{40}}
obj := new(binsearch.CounterBytes)
for _, word := range origdata {
	obj.Add(word, 0) // add all of the keys to the counter, ignoring the values
}
obj.Build()
newobj := obj.Copy() // copy CounterBytes to KeyBytes
valstore := make([]mystruct, newobj.Len()) // we already know the number of unique keys, so the size of the value storage structure is known
var indx int
for i, word := range origdata { // loop through all keys again
	indx, _ = newobj.Find(word) // find the index position of this key
	valstore[indx] = origval[i] // add the value into the appropriate position
}

###11. Idiomatic way to range over the structure

obj := new(binsearch.KeyValBytes)
// ... pretend keys are added and Build() is executed here
if obj.Reset() { // Reset() must be called first and it must be checked as Next() will panic if called on an empty structure
	var key []byte
	var val int
	for eof := false; !eof; {
		key, val, eof = obj.Next() // Bytes and Runes keys are converted back to the original `[]byte` or `[]rune` from their compacted version when using Next()
	}
}

###12. Saving to file

import "github.com/AlasdairF/Custom"
func save(filename string, obj *binsearch.KeyValBytes) error {
	fi, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer fi.Close()
	w := custom.NewWriter(fi)
	defer w.Close()
	obj.Write(w)
	return nil
}

###12. Reading from file

import "github.com/AlasdairF/Custom"
func load(filename string) (*binsearch.KeyValBytes, error) {
	// Open file for reading
	fi, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer fi.Close()
	// Attach reader
	r := custom.NewReader(fi, 20480)
	// Load binsearch.KeyRunes
	obj := new(binsearch.KeyValBytes)
	obj.Read(r) // do the reading
	// Make sure we're at the end and the checksum is OK
	if r.EOF() != nil {
		return nil, errors.New(`Not a valid binsearch structure.`)
	}
	return obj, nil
}

About

Super-efficient, in-memory key/value data store

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages