ScanBytes/ScanBytes.cpp

ScanBytes Unlicensed work

ScanBytes is a tool and a library for quickly scanning files for occurrences of certain bytes.

It can be useful, for example, for building an index of a CSV/TSV file or of any line-based file. A specific line or record can then be fetched quickly by its index.
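A minimal sketch of that use case, assuming the whole file is already in memory: it records the offset of every `\n` and then slices out record *k* from two neighbouring offsets. The names `buildLineIndex` and `getLine` are hypothetical and not part of the ScanBytes API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical helper (not the ScanBytes API): returns the offset of every '\n'.
std::vector<std::uint64_t> buildLineIndex(std::string_view data) {
    std::vector<std::uint64_t> offsets;
    const char* begin = data.data();
    const char* end = begin + data.size();
    const char* p = begin;
    // memchr jumps straight to the next newline, so the buffer is walked once.
    while (p < end) {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (hit == nullptr) break;
        const char* nl = static_cast<const char*>(hit);
        offsets.push_back(static_cast<std::uint64_t>(nl - begin));
        p = nl + 1;
    }
    return offsets;
}

// Hypothetical helper: returns line k (0-based) without its trailing '\n'.
std::string_view getLine(std::string_view data, const std::vector<std::uint64_t>& index,
                         std::size_t k) {
    assert(k <= index.size() && "line number out of range");
    // Line k starts right after newline k-1 and ends at newline k (or at end of data).
    const std::size_t start = (k == 0) ? 0 : static_cast<std::size_t>(index[k - 1]) + 1;
    const std::size_t stop = (k < index.size()) ? static_cast<std::size_t>(index[k]) : data.size();
    return data.substr(start, stop - start);
}
```

For example, for the text `alpha\nbeta\ngamma` the index is {5, 10}, and `getLine(text, index, 2)` returns `gamma`.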

Features

  • Multithreading, which improves performance when the data fits in the disk cache.
  • Two generic backends: a JIT-compiled one and a non-JIT one. JIT is supported only on certain platforms, currently only x86_64.
  • A specialized hardcoded backend for common cases:
    • lines in a file
    • TSV
    • CSV
  • Automatic dispatching between backends.
  • Built-in benchmark.
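The idea behind a generic (non-JIT) backend combined with multithreading can be illustrated with a sketch; it is not ScanBytes' actual implementation. It assumes a 256-entry lookup table built from the alphabet, splits the buffer into contiguous chunks scanned by `std::thread`s, and concatenates the per-chunk results in order so that the offsets stay sorted. `ByteSet`, `makeByteSet`, `scanRange` and `scanParallel` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <thread>
#include <vector>

// Hypothetical type: a 256-entry membership table, one flag per byte value.
using ByteSet = std::array<bool, 256>;

ByteSet makeByteSet(std::string_view alphabet) {
    ByteSet set{};  // value-initialized: every byte starts as "not wanted"
    for (char c : alphabet) set[static_cast<unsigned char>(c)] = true;
    return set;
}

// Scans data[begin, end) and appends the absolute offset of every matching byte.
void scanRange(std::string_view data, const ByteSet& set, std::size_t begin, std::size_t end,
               std::vector<std::uint64_t>& out) {
    for (std::size_t i = begin; i < end; ++i)
        if (set[static_cast<unsigned char>(data[i])]) out.push_back(i);
}

// Hypothetical API: one contiguous chunk per thread, results merged in chunk order.
std::vector<std::uint64_t> scanParallel(std::string_view data, std::string_view alphabet,
                                        unsigned threads) {
    assert(threads > 0);
    const ByteSet set = makeByteSet(alphabet);
    const std::size_t chunk = (data.size() + threads - 1) / threads;
    std::vector<std::vector<std::uint64_t>> partial(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each thread writes only to its own vector, so no locking is needed.
        workers.emplace_back(scanRange, data, std::cref(set), begin, end, std::ref(partial[t]));
    }
    for (auto& w : workers) w.join();
    // Chunks are contiguous and processed in order, so concatenation keeps offsets sorted.
    std::vector<std::uint64_t> result;
    for (const auto& p : partial) result.insert(result.end(), p.begin(), p.end());
    return result;
}
```

Splitting at arbitrary byte offsets is safe here only because every byte is classified independently. It would not be safe for a stateful format such as CSV with quoted fields, where the meaning of a comma or newline depends on what came before it.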

Example

```
echo "The quick brown fox jumps over the lazy dog" > fox.txt
#     0123456789ABCDEF0123456789ABCDEF01234567
ScanBytes --alphabet " fh" s fox.txt | hd
00000000  01 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00000010  09 00 00 00 00 00 00 00  0f 00 00 00 00 00 00 00  |................|
00000020  10 00 00 00 00 00 00 00  13 00 00 00 00 00 00 00  |................|
00000030  19 00 00 00 00 00 00 00  1e 00 00 00 00 00 00 00  |................|
00000040  20 00 00 00 00 00 00 00  22 00 00 00 00 00 00 00  | .......".......|
00000050  27 00 00 00 00 00 00 00                           |'.......|
00000058
```
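Judging by the hex dump, the output is a flat array of 64-bit unsigned offsets in little-endian byte order (11 offsets, 0x58 = 88 bytes). Assuming that format, a consumer could decode it as sketched below; `decodeOffsets` and `readOffsets` are hypothetical helpers, and assembling each value byte by byte keeps the code independent of the host's endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes a buffer of little-endian uint64 values (format assumed from the hex dump above).
std::vector<std::uint64_t> decodeOffsets(const std::string& bytes) {
    if (bytes.size() % 8 != 0) throw std::runtime_error("truncated offset stream");
    std::vector<std::uint64_t> offsets;
    offsets.reserve(bytes.size() / 8);
    for (std::size_t i = 0; i < bytes.size(); i += 8) {
        std::uint64_t value = 0;
        // Byte j carries bits 8*j .. 8*j+7 (least significant byte first).
        for (std::size_t j = 0; j < 8; ++j)
            value |= static_cast<std::uint64_t>(static_cast<unsigned char>(bytes[i + j])) << (8 * j);
        offsets.push_back(value);
    }
    return offsets;
}

// Hypothetical helper: reads a whole file of raw offsets and decodes it.
std::vector<std::uint64_t> readOffsets(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::string bytes((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    return decodeOffsets(bytes);
}
```

If the command above is redirected into a file instead of being piped into `hd`, passing that file to `readOffsets` should give back the offsets 1, 3, 9, 15, 16, 19, 25, 30, 32, 34 and 39 shown in the dump.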

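As you can see, the output is highly redundant: every offset is stored as a full 64-bit little-endian integer, even though most of its bytes are zero. It could be compressed by encoding it into a more suitable data structure, but this is currently not implemented in C++.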
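One common way to remove that redundancy, shown here purely as an illustration and not as anything ScanBytes implements, is to store the gaps between consecutive offsets (small, because the offsets are sorted) as LEB128-style variable-length integers. `encodeDeltas` and `decodeDeltas` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Encodes sorted offsets as varint-coded gaps: 7 data bits per byte, high bit = "more bytes follow".
std::vector<std::uint8_t> encodeDeltas(const std::vector<std::uint64_t>& offsets) {
    std::vector<std::uint8_t> out;
    std::uint64_t previous = 0;
    for (std::uint64_t offset : offsets) {
        assert(offset >= previous && "offsets must be sorted");
        std::uint64_t gap = offset - previous;
        previous = offset;
        while (gap >= 0x80) {
            out.push_back(static_cast<std::uint8_t>(gap | 0x80));  // low 7 bits + continuation flag
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap));  // final byte, continuation flag clear
    }
    return out;
}

// Inverse of encodeDeltas: reassembles each gap and keeps a running sum.
std::vector<std::uint64_t> decodeDeltas(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint64_t> offsets;
    std::uint64_t current = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint64_t gap = 0;
        unsigned shift = 0;
        while (true) {
            // A 64-bit value needs at most 10 bytes (shifts 0, 7, ..., 63).
            if (i >= bytes.size() || shift > 63) throw std::runtime_error("malformed varint");
            const std::uint8_t b = bytes[i++];
            gap |= static_cast<std::uint64_t>(b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        current += gap;
        offsets.push_back(current);
    }
    return offsets;
}
```

For the fox example every gap is below 128, so the 88-byte output would shrink to 11 bytes. The trade-off is that reaching the k-th offset requires decoding everything before it, unless absolute offsets are also stored at regular intervals.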

Installation

Packaging with CPack is implemented, so you can generate installable packages for Debian- and RPM-based distros. All dependencies are assumed to be installed the same way.

Related projects

About

A library and a CLI tool for scanning files for occurrences of certain bytes and then writing the results to standard output.
