blue-sky-r/word-forensic-correlator

What is W4C

W4C is a helper tool for forensic analysts. It links an MS Word document back to a specific MS Word installation by building a fingerprint from internal fields of the binary Word document format.

Stay tuned for more updates to W4C MS Word forensic fingerprinting ...

Name W4C comes from ...

W4C is short for Word Forensic Correlator -> wor-for-cor -> wor4cor -> w4c.

Forensic Question

W4C does its best to help answer these forensic questions:

  • Were two (or more) MS Word documents last edited/saved on the same MS Word installation?
  • Were these two (or more) MS Word documents created and saved by the same author/company/computer?

Note: sometimes the final answer 42 is not good enough :-)

How does it work

W4C correlates internal, poorly documented structures of the Microsoft Word binary document format (such as private, undocumented, and reserved fields of the FIB, the File Information Block). The match is calculated and shown as a correlation percentage. This result can support forensic evidence by helping to prove that the document(s) under investigation were edited/saved on the same MS Word installation as a known, validated reference document.

W4C does not use the well-known MS Word metadata (such as author, dates, version), which can be easily edited and spoofed. Document size, formatting, and contents should have no effect on the W4C correlation.
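
To make the idea of "internal FIB fields" more concrete, here is a minimal sketch that decodes the fixed 32-byte FIB header (FibBase in Microsoft's [MS-DOC] specification) and collects its unused/reserved slots. It is not W4C's code: the names `parse_fib_base` and `reserved_fields` are hypothetical, and the fields W4C really reads are known here only by the names printed in its usage help. The sketch also assumes the WordDocument stream has already been extracted from the OLE2 container, which is not shown.

```python
import struct
from collections import namedtuple

# FibBase layout per the [MS-DOC] specification: 32 bytes, little-endian.
FIB_BASE = struct.Struct("<HHHHHHHIBBHHII")
FibBase = namedtuple("FibBase", [
    "wIdent", "nFib", "unused", "lid", "pnNext", "flags", "nFibBack",
    "lKey", "envr", "flags2", "reserved3", "reserved4", "reserved5",
    "reserved6",
])

WORD_MAGIC = 0xA5EC  # wIdent of a Word binary file


def parse_fib_base(stream):
    """Decode the first 32 bytes of an already extracted WordDocument stream."""
    if len(stream) < FIB_BASE.size:
        raise ValueError("stream too short for a FibBase")
    fib = FibBase(*FIB_BASE.unpack_from(stream, 0))
    if fib.wIdent != WORD_MAGIC:
        raise ValueError("not a Word binary document (wIdent=0x%04X)" % fib.wIdent)
    return fib


def reserved_fields(fib):
    """Hypothetical helper: pick the slots the specification leaves unused
    or reserved. This illustrates the kind of value a fingerprint can use;
    it is an assumption, not W4C's actual field selection."""
    names = ("unused", "reserved3", "reserved4", "reserved5", "reserved6")
    return dict((name, getattr(fib, name)) for name in names)
```

Such values are written by Word itself and are not exposed in the document properties dialog, which is why they are harder to spoof than the author or date metadata.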

Requirements

CLI version: w4c.py ... python 2.x installed

GUI version: w4c-gui.py ... python 2.x, tkinter installed

Note: the self-contained py2exe packages bundle all dependencies. To download a Windows executable package, go to the RELEASES tab.

Files

w4c.py              ... CLI version of Word-Forensic-Correlator
w4c-gui.py          ... GUI version of Word-Forensic-Correlator
wordfile.py         ... module for OLE2
wordfingerprint.py  ... module for fingerprinting
py2exe/             ... directory for py2exe
py2exe/setup.py     ... setup to compile w4c.exe package
py2exe/setup-gui.py ... setup to compile w4c-gui.exe package

How to use CLI version: w4c.py

Model situation: you have one (or more) reference documents that were saved/edited on the specific MS Word installation under investigation. You can now use W4C to correlate other questionable documents and verify whether they were also saved/edited on that MS Word installation:

$ ./w4c.py -ref reference.doc investigated.doc

Note: in case of problems try:

$ python w4c.py

To get usage help, use -help or execute without any parameters:

    (c) 2015 W4C = MS Word Forensic Correlator [wor-for-cor] version 2.0.1 by robert

    W4C correlates some internal MS-word doc structures to calculate percentage of probability that
    document under test [test.doc] was edited with the same MS-word version as reference doc [ref.doc]

    usage: w4c.py [-help ][-verbosity int ][-fingerprint csv ] -ref ref.doc test.doc

    help            ... show this usage help
    verbosity int   ... optional - set level of verbosity to integer value [default 3]
    fingerprint csv ... optional - use csv fields to calculate fingerprint [see bellow for defaults]
    ref ref.doc     ... reference ms word document
    test.doc        ... documents under test will be correlated to reference one

    Default fingerprint definition:
    product.written.by-language.stamp-created.private-saved.private-created.build-saved.build-stylesheet.len^footref.off

    Fields available for fingerprint CSV:
    signature.magic, created.env, flags.env, fib.magic, fib.ver, product.written.by, language.stamp, autotext.offset,
    flags.doc, fib.min, created.magic, saved.magic, created.private, saved.private, charset.doc, charset.int, 
    saved.build, created.build, key.head.xor, text.offset, stylesheet0.off, stylesheet0.len, stylesheet.off, 
    stylesheet.len, footref.off, footref.len

    Supported fingerprint fields logical operators:
    ^ = xor, | = or, & = and

    Single letters instead of descriptive keyword could be used like:
    -v = -verbosity
    -f = -fingerprint
    -r = -ref
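
The fingerprint definition shown in the usage help can be read as a small expression language. The sketch below is one possible interpretation, not W4C's implementation: it assumes that `-` separates fingerprint components, that the `^`, `|`, `&` operators combine integer field values left to right, and that the correlation percentage is the share of matching components. The function names and the sample field values are hypothetical.

```python
import re

# Default definition copied from the usage help above.
DEFAULT_DEFINITION = (
    "product.written.by-language.stamp-created.private-saved.private-"
    "created.build-saved.build-stylesheet.len^footref.off"
)

OPERATORS = {
    "^": lambda a, b: a ^ b,  # xor
    "|": lambda a, b: a | b,  # or
    "&": lambda a, b: a & b,  # and
}


def evaluate_component(component, fields):
    """Combine the fields of one component left to right (assumed order)."""
    tokens = re.split(r"([\^|&])", component)
    value = fields[tokens[0]]
    for operator, name in zip(tokens[1::2], tokens[2::2]):
        value = OPERATORS[operator](value, fields[name])
    return value


def fingerprint(definition, fields):
    """Return one integer per '-'-separated component (assumed separator)."""
    return [evaluate_component(c, fields) for c in definition.split("-")]


def correlation_percent(reference, tested):
    """Share of matching components; an assumed scoring rule."""
    if len(reference) != len(tested):
        raise ValueError("fingerprints must have the same length")
    matches = sum(1 for a, b in zip(reference, tested) if a == b)
    return 100.0 * matches / len(reference)
```

With two dictionaries of field values (made up for illustration), the pieces fit together like this:

```python
ref_fields = {
    "product.written.by": 0x6027, "language.stamp": 0x0409,
    "created.private": 1, "saved.private": 2,
    "created.build": 3, "saved.build": 4,
    "stylesheet.len": 0x10, "footref.off": 0x01,
}
test_fields = dict(ref_fields, **{"saved.build": 5})

ref_print = fingerprint(DEFAULT_DEFINITION, ref_fields)
test_print = fingerprint(DEFAULT_DEFINITION, test_fields)
print(correlation_percent(ref_print, test_print))
```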

How to use GUI version: w4c-gui.py

Start the GUI version by executing w4c-gui.py, then select the reference and inspected files with the BROWSE button. After a file is selected, its validation result, MD5 hash, and forensic fingerprint are shown. Once both files are selected, the final correlation percentage is calculated and shown.

$ ./w4c-gui.py
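
For readers who want to reproduce the per-file information outside the GUI, the following sketch computes an MD5 hash and performs a basic container check using only the standard library. What the GUI's "validation result" covers exactly is not stated here, so the check below (comparing the first eight bytes with the OLE2 compound file signature) is an assumption, and the function names are hypothetical.

```python
import hashlib

# Signature at the start of every OLE2 compound file.
OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def looks_like_ole2(path):
    """Return True if the file starts with the OLE2 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE


def md5_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large documents need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the hash alongside the fingerprint documents exactly which file was examined.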

Note: in case of problems, make sure tkinter is installed and its themes are configured (Ubuntu/Kubuntu has broken tkinter themes). See TK-TclError BACKGROUND for details on how to fix broken tkinter themes.

Windows Executable

For your convenience, 32-bit Windows executables compiled with py2exe are provided in the RELEASES tab. Download the package and unpack it into a working directory.

w4c-exe.zip is CLI version in single executable file w4c.exe
w4c-gui.zip is GUI version with all the dependencies (w4c-gui.exe)

Note: you don't need to download or install anything else; everything is provided inside the package.

CLI version - execute W4C from the Windows command line interpreter cmd.exe (Run -> cmd). Cd to the working directory and execute W4C.EXE.

GUI version - execute W4C-GUI by double-clicking it in the working directory, or create a new shortcut to W4C-GUI.EXE.

Pros

Because W4C relies on little-known internal structures, it should be more tamper/forgery resistant than other known forensic tools. However, please read the section below to understand its limits.

Cons, Limits

As far as is known, MS Word does not store any clear unique identifier (such as a serial number) within the document itself. W4C tries its best to derive a unique fingerprint from the document file, but false positives and false negatives remain possible.

False positives - an identically configured MS Word installation (same language, service pack level, and settings) might produce a matching fingerprint.

False negatives - over time the installed service packs and settings might change, which results in a different fingerprint. It is therefore recommended to use a reference document dated as close as possible to the inspected ones.
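
When several validated reference documents are available, the recommendation above can be applied mechanically. The sketch below picks the reference whose date is nearest to the inspected document's date. The function name and the sample file names are hypothetical, and the dates are assumed to come from the investigation itself (for example case records), since dates stored in the document metadata can be edited.

```python
from datetime import date


def closest_reference(references, inspected_date):
    """Pick the (path, date) pair whose date is nearest to inspected_date."""
    if not references:
        raise ValueError("at least one reference document is required")
    return min(references, key=lambda item: abs((item[1] - inspected_date).days))


# Hypothetical reference set with dates established by the investigator.
references = [
    ("ref-2012.doc", date(2012, 3, 1)),
    ("ref-2013.doc", date(2013, 6, 15)),
    ("ref-2014.doc", date(2014, 11, 30)),
]
best_path, best_date = closest_reference(references, date(2013, 9, 1))
```

The selected path can then be passed to w4c.py as the -ref argument.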

History

W4C was originally implemented for my father (R.I.P.), who worked as a certified digital forensic analyst. I decided to release it to the public in 2015, as I was not able to find any similar tool out there.

Hope it helps ...

keywords: microsoft, word, doc, document, ms-word, ole2, digital, forensic, fingerprint, correlate, compare, signature