
split() vs. prepare()

magnum edited this page Nov 29, 2020 · 13 revisions

Should format-specific conversions and fix-ups be done in prepare() or in split()? What are the nuances and differences?

http://www.openwall.com/lists/john-dev/2014/06/24/1

http://www.openwall.com/lists/john-dev/2011/07/06/8

Some key differences (apart from the crucial fact that prepare() is the only function that can see input file fields other than "ciphertext"):

  • Pot file lines don't go through prepare(). However, they DO go through split(), but only with index 0, in order to allow case-correction for legacy pot file entries and things like that (actually, they will even be rejected before split() is called if valid() returns >1). Among other things, this means that if you add a tag to bare hashes in prepare(), untagged hashes from a pot file will not be recognized. If you do it in split(), they will.
  • John "proper" does not call split() for pot entries when doing --show. Jumbo does.
  • If things are converted in prepare(), valid() only needs to accept the canonical format.
  • If things are converted in split(), valid() needs to accept both/all formats.
  • prepare() and valid() may be called before setting a proper input encoding, while split() shouldn't have this problem. See #2252.
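To illustrate the tagging and valid() points above, here is a minimal sketch for a hypothetical format (32 hex digits, optionally prefixed with a "$demo$" tag). The names demo_valid() and demo_split() and the tag itself are made up for illustration, and the signatures are simplified (no index or struct fmt_main arguments), so this is not a drop-in format implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical format: 32 hex digits, optionally prefixed by "$demo$". */
#define DEMO_TAG      "$demo$"
#define DEMO_TAG_LEN  (sizeof(DEMO_TAG) - 1)
#define DEMO_HEX_LEN  32

/*
 * Simplified stand-in for valid(). Because the conversion is done in
 * split(), it has to accept both the bare and the tagged form.
 */
static int demo_valid(const char *ciphertext)
{
	size_t i;

	if (!strncmp(ciphertext, DEMO_TAG, DEMO_TAG_LEN))
		ciphertext += DEMO_TAG_LEN;
	/* isxdigit() is false for '\0', so this never reads past the end */
	for (i = 0; i < DEMO_HEX_LEN; i++)
		if (!isxdigit((unsigned char)ciphertext[i]))
			return 0;
	return ciphertext[i] == '\0';
}

/*
 * Simplified stand-in for split(). Returns the canonical form (tag
 * added, hex lowercased) in a static buffer. An untagged or upper-case
 * entry read back from a pot file ends up in the same canonical form.
 */
static char *demo_split(const char *ciphertext)
{
	static char out[DEMO_TAG_LEN + DEMO_HEX_LEN + 1];
	size_t i;

	/* split() is only reached for lines that valid() accepted */
	assert(demo_valid(ciphertext));

	if (!strncmp(ciphertext, DEMO_TAG, DEMO_TAG_LEN))
		ciphertext += DEMO_TAG_LEN;
	memcpy(out, DEMO_TAG, DEMO_TAG_LEN);
	for (i = 0; i < DEMO_HEX_LEN; i++)
		out[DEMO_TAG_LEN + i] = (char)tolower((unsigned char)ciphertext[i]);
	out[DEMO_TAG_LEN + DEMO_HEX_LEN] = '\0';
	return out;
}
```

Had the tag been added in a prepare()-style function instead, demo_valid() could insist on the tagged form only, but a bare hash coming from an old pot file would then never match.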

Basically, prepare() should be used for fetching stuff from remote fields (like the conversion of pwdump format in NT) or conversion to canonical format (like raw-sha1 accepting both {SHA}Base64 and hex, but always storing them as Base64 to pot file) and split() should be used for most everything else.

Another notable difference is that prepare() is very performance critical while split() is not. That's because prepare() is always called, while split() is only called if valid() accepted the line (of course, valid() is performance critical too). It gets worse: if you didn't specify a format, every format's prepare() will be called for every single line in your input file. Imagine that when loading 100 million hashes. For example, you shouldn't do a blind strlen() in prepare(). Also, if prepare() finds out that a hash will fail valid(), it should return NULL and nothing else; this signals to the loader not to even try.
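As a rough sketch of the "cheap rejection" idea, the function below bails out on a short prefix comparison instead of scanning the whole line with strlen(). It assumes a hypothetical format whose hashes always start with "$demo$"; the name demo_prepare(), the nfields parameter and the tag are illustrative only, and the real prepare() signature also takes a struct fmt_main pointer, which is left out here. It also assumes fields[1] is the ciphertext field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format whose hashes always carry a "$demo$" tag. */
#define DEMO_TAG      "$demo$"
#define DEMO_TAG_LEN  (sizeof(DEMO_TAG) - 1)

/*
 * Simplified stand-in for prepare(). fields[] holds the fields of one
 * input line; fields[1] is assumed to be the ciphertext field.
 */
static char *demo_prepare(char *fields[], int nfields)
{
	char *ciphertext;

	assert(nfields >= 2);
	ciphertext = fields[1];

	/*
	 * strncmp() stops at the first mismatching byte, so a line that
	 * belongs to some other format is rejected after looking at a
	 * handful of characters, however long the line is.
	 */
	if (strncmp(ciphertext, DEMO_TAG, DEMO_TAG_LEN))
		return NULL; /* will fail valid() anyway: tell the caller so */

	return ciphertext;
}
```

The point of the design is that the common case (a line meant for a different format) costs only a few byte comparisons, which matters when every format's prepare() sees every input line.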