Digestif

An aid for creating hash digests of large files

Synopsis

Digestif lets you create fast checksums of large files by
skipping sections of the file. It was created with compressed media
files in mind, which generally have such a high information density
that we can get away with a checksum that doesn’t actually consider all
the bits. Someday I’d like to understand the likelihood-of-collision
implications for specific compression algorithms (mp3, h.264, xvid, et al.),
but right now I’m going to settle for guessing at where “good enough for me”
might lie.

One side effect of this approach is that the corruption-detecting nature of
digests is, of course, lost. This is really an inescapable artifact of the
problem we’re trying to solve: when hashing a really large file on modern
hardware, the biggest bottleneck is streaming 5-10 gigs off of the disk, not
the checksumming itself. By looking at less data, we speed up the hash process
immensely, at the cost of being blind to corruption in the sections we skip.
Because the purpose I have in mind for this tool is identity checking, not
corruption detection, that trade-off is fine with me.
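
To make the approach concrete, here is a minimal Ruby sketch of the sampling
idea: seek to regular offsets, read a fixed-size chunk at each, and feed only
those chunks (plus the file length) into a standard digest. The chunk size,
stride, and choice of MD5 below are illustrative assumptions, not digestif’s
actual defaults or implementation.

require 'digest'

# Hash only a sample of fixed-size chunks taken at regular offsets.
# CHUNK_SIZE and STRIDE are arbitrary illustrative values.
CHUNK_SIZE = 64 * 1024          # bytes read at each sample point
STRIDE     = 16 * 1024 * 1024   # distance between sample points

def sparse_digest(path)
  digest = Digest::MD5.new
  size   = File.size(path)
  File.open(path, 'rb') do |f|
    offset = 0
    while offset < size
      f.seek(offset)                 # jump to the next sample point
      chunk = f.read(CHUNK_SIZE)
      digest.update(chunk) if chunk
      offset += STRIDE
    end
  end
  digest.update(size.to_s)           # mix in the length so files that share samples but differ in size hash differently
  digest.hexdigest
end

puts sparse_digest(ARGV[0])

With these example numbers, reading 64 KiB out of every 16 MiB touches well
under one percent of the file, which is where the speedup comes from.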

Installation

gem install digestif

Usage

It works just like md5 on the command line, but only on files, not on
streaming data (you can’t seek a stream).

digestif some_large_file

Since this program is designed specifically to get around the cost of hashing
whole files, and it relies on seeking to do so, it didn’t make sense for me to
invest in making streams work.

For a detailed look at the options, see

digestif --help

Motivation

I wrote digestif to solve a problem for a media catalogue I was working on.
I wanted a filename-independent way to evaluate whether or not a file was in
the catalogue yet, but the files were so large that streaming the whole file
off of the hard drive was too slow for the response time I was hoping for.
(For the interested: I was getting 5 gigs hashed with md5 in about 2.4
minutes.)

Author

Copyright 2011 Andrew Roberts
