Mft2Csv

Introduction

This tool parses, decodes and logs information from the Master File Table ($MFT) to a csv. Logging as much data as possible has been the main purpose from the very start, since having all this data in a csv is convenient for further analysis. The $MFT can be read from a variety of sources.

Details

Input can be any of the following:

Raw/dd image of disk (MBR and GPT supported)
Raw/dd image of partition
$MFT extracted file
Reading of $MFT directly from a live system
Reading of $MFT by accessing \\.\PhysicalDriveN directly (no mount point needed).
Reading of $MFT directly from within shadow copies.
$MFT fragments extracted from memory dumps or unallocated (see MFTCarver).
Single records extracted. See MftRcrd with the -w switch.

There is an option to choose which UTC offset to decode timestamps for. For instance, if you have a disk image and the target system had a timezone configuration of UTC -9.30, you can configure that offset and get the timestamps directly in UTC 0.00. The default is UTC 0.00. When running on a live system there is no need to change anything, as timestamps are automatically set to UTC 0.00.

The format of output can be chosen. Currently it is possible to choose from:

All (will write to csv everything it can). Default set.
log2timeline: http://code.google.com/p/log2timeline/wiki/l2t_csv
bodyfile (v3.x): http://wiki.sleuthkit.org/index.php?title=Body_file
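
For readers who post-process the bodyfile output, here is a minimal Python sketch of reading one line. It assumes the Sleuth Kit 3.x field order (MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime) with whole-second Unix timestamps. The function name and the sample line are made up for illustration and are not taken from real mft2csv output.

```python
from datetime import datetime, timezone

FIELDS = ("md5", "name", "inode", "mode", "uid", "gid",
          "size", "atime", "mtime", "ctime", "crtime")

def parse_bodyfile_line(line: str) -> dict:
    """Split one body file line into named fields, tolerating '|' inside the name."""
    md5, rest = line.rstrip("\r\n").split("|", 1)
    # The nine fields after the name never contain '|', so split from the right.
    name, *tail = rest.rsplit("|", 9)
    record = dict(zip(FIELDS, [md5, name, *tail]))
    for key in ("atime", "mtime", "ctime", "crtime"):
        # Assumption: timestamps are integer seconds since 1970-01-01 UTC.
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record

# Hypothetical line for illustration only.
line = "0|/Users/demo/import.reg|4512-128-1|r/rrwxrwxrwx|0|0|684|0|86400|86400|0"
entry = parse_bodyfile_line(line)
print(entry["name"], entry["mtime"].isoformat())
```

Splitting the name from the right keeps the parse stable even if a filename itself contains the pipe character.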

It is possible to parse broken/partial $MFT by configuring "Broken $MFT". This setting is necessary if for instance index number 0 is missing (the record for $MFT itself).

It is also possible to configure the tool to skip fixups, which you may want when working on memory dumps. In that case, run MFTCarver on the memory dump first, then run mft2csv on the output file from MFTCarver. Both "Broken $MFT" and "Skip Fixups" must be configured in such a case.
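
To show what applying fixups means in general, here is a rough Python sketch. It is not the tool's own code (mft2csv is written in AutoIt). It assumes the usual NTFS record layout: the offset of the update sequence array at byte 4, its size in 2-byte words at byte 6, and 512-byte sectors. The function name is hypothetical.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte sectors

def apply_fixups(record: bytes) -> bytes:
    """Return a copy of an MFT record with the update sequence array applied."""
    # Header bytes 4..7: array offset and array size (in 2-byte words).
    usa_offset, usa_count = struct.unpack_from("<HH", record, 4)
    usn = record[usa_offset:usa_offset + 2]
    fixed = bytearray(record)
    # Entry 0 is the update sequence number itself; entries 1..n hold the
    # original last two bytes of each sector.
    for i in range(1, usa_count):
        end = i * SECTOR_SIZE
        if end > len(record):
            break
        if record[end - 2:end] != usn:
            raise ValueError(f"sector {i} end marker does not match the USN")
        entry = usa_offset + 2 * i
        fixed[end - 2:end] = record[entry:entry + 2]
    return bytes(fixed)
```

If a record has already had its fixups applied, its sector end markers no longer match the update sequence number and a routine like this raises, which is the kind of situation where skipping fixups makes sense.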

Explanation

Output names should be quite self-explanatory, but here are a few hints:

Those ending with ON just indicate whether the attribute was present in the record or not.
Those ending with 2, 3 or 4 indicate that they are number 2, 3 or 4 in the sequence of the same attribute on the same record.
Prefix of FN means $FILE_NAME.
Prefix of SI means $STANDARD_INFORMATION.
Prefix of HEADER means the record header.
CTime means File Create Time.
ATime means File Modified Time.
MTime means MFT Entry modified Time.
RTime means File Last Access Time.
USN stands for Update Sequence Number.
LSN stands for $LogFile Sequence Number.
RecordOffset refers to the hex offset inside the $MFT itself, and not on the physical disk. Only meant as a helper when quickly looking up abnormalities found in the csv.
Signature refers to whether the record signature equals 0x46494C45 or not.
IntegrityCheck refers to a record integrity check by comparing the Update Sequence number to the record page (sector) end markers (2 bytes for each sector).
Those starting with GUID belong to the $OBJECT_ID attribute.
Those starting with INVALID_FILENAME are used when illegal characters are found in the filename. For example, control characters such as line feeds sometimes exist in filenames.
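
The Signature and IntegrityCheck hints above can be illustrated with a short Python sketch. It assumes 512-byte sectors and the usual header layout (update sequence array offset at byte 4). The function name and the boolean return values are only for illustration; the values mft2csv actually writes to the csv may differ.

```python
import struct

FILE_SIGNATURE = 0x46494C45  # the ASCII bytes "FILE"

def check_record(record: bytes, sector_size: int = 512) -> dict:
    """Report whether the signature and the sector end markers look sane."""
    # Read the first four bytes big-endian so they compare against 0x46494C45.
    signature_ok = struct.unpack_from(">I", record, 0)[0] == FILE_SIGNATURE
    # The update sequence number is the first 2-byte entry of the array.
    (usa_offset,) = struct.unpack_from("<H", record, 4)
    usn = record[usa_offset:usa_offset + 2]
    # Before fixups are applied, the last 2 bytes of every sector equal the USN.
    markers = [record[end - 2:end]
               for end in range(sector_size, len(record) + 1, sector_size)]
    integrity_ok = bool(markers) and all(marker == usn for marker in markers)
    return {"Signature": signature_ok, "IntegrityCheck": integrity_ok}
```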

Alternate Data Streams

These streams can be identified in the second $DATA attribute, that is, under the fields starting with DATA and ending with either 2 or 3. If a second ADS is attached to a file, its details show up under the third $DATA attribute. As far as I know there is no limit on the number of ADSs tied to a file. Because only 3 $DATA attributes per file are supported, details will be missed for files with more than 2 ADSs (not that I have seen many such cases). To compensate, I added a variable that counts the number of $DATA attributes present; it is shown in the csv under Alternate_Data_Streams. Note that an ADS can only be attached to the $DATA attribute of a file, not to any of the other attributes. That should not be surprising, as it is named Alternate Data Stream and not Alternate File_Name Stream.

Timestamps

The timestamps are now at the precision of nanoseconds. Prior to processing, there is an option to set how timestamps will be presented in the csv, that is, which UTC offset the target system had. The configured setting is visible in the csv header as well as in the logfile. There are several options for the format of the timestamps as well as for the precision; the GUI displays examples of each. The generated timestamps are corrected for any time bias on the investigating machine, so the timezone configuration of the machine where mft2csv is run should not affect the output. There is also an option to adjust timestamps directly for a known timezone bias in images or $MFT files taken from a different system. For instance, timestamps from a disk image of a system configured at UTC -9.30 can be adjusted directly so that they are displayed in UTC 0.00.
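
As background, NTFS stores each timestamp as a 64-bit count of 100-nanosecond intervals since 1601-01-01 00:00 UTC. The Python sketch below decodes such a value and optionally shifts it by an offset in hours. The function name, the utc_offset_hours parameter and the output format are assumptions made for this illustration; they do not mirror the tool's GUI options.

```python
from datetime import datetime, timedelta, timezone

NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def format_filetime(filetime: int, utc_offset_hours: float = 0.0) -> str:
    """Render a raw NTFS timestamp with 7 fractional digits (100 ns steps)."""
    # Python datetimes stop at microseconds, so keep the last digit separately.
    microseconds, ticks = divmod(filetime, 10)
    moment = NTFS_EPOCH + timedelta(microseconds=microseconds,
                                    hours=utc_offset_hours)
    return f"{moment:%Y-%m-%d %H:%M:%S}.{moment.microsecond:06d}{ticks}"

print(format_filetime(116444736000000000))        # start of the Unix epoch
print(format_filetime(116444736000000001, -9.5))  # same instant plus 100 ns, shifted by -9.30
```

Keeping the last decimal digit outside the datetime object avoids silently rounding the value to microseconds.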

Note

Resolved paths may not be correct for deleted files/folders. That is because the parent ref (or the parent's parent ref, etc.) may have been overwritten with a new entry.

Usage and examples

For all scenarios, one must:

Select the output format. Default is set to dump as much as possible.
Select the separator, and whether quotation marks should surround the values written to the csv. Default separator is pipe ("|").
Select which timezone the timestamps should be dumped to. Default is UTC 0.00.
Select if fixups should be skipped or not. Defaults to off. Usually only applicable in very specific situations.
Select if $MFT should be treated as broken. Defaults to off. Only applicable when using an $MFT file as input. Should be used when fragments of $MFT are used and the record for $MFT itself is not the first record.
Select if resident data should be extracted. Defaults to off. Requires selecting the path to extract to.
Select format of decoded timestamps, as well as the precision (None, MilliSec, NanoSec). Examples are displayed in the GUI. Choose if extra timestamps are to be dumped to a separate csv by ticking "split csv".
Select if UNICODE or ANSI. Default is ANSI output.
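
Once the csv is written, the separator chosen here is what a downstream script has to use. Below is a small Python sketch of reading pipe-separated output and filtering on the Alternate_Data_Streams counter. It assumes the first line of the data holds the column names. The sample rows are invented for illustration, and a real export has many more columns.

```python
import csv
import io

def read_rows(stream, separator="|"):
    """Yield each csv row as a dict keyed by the header line."""
    yield from csv.DictReader(stream, delimiter=separator)

def files_with_extra_data(rows, minimum=2):
    """Keep rows whose Alternate_Data_Streams counter is at least `minimum`."""
    return [row for row in rows
            if int(row["Alternate_Data_Streams"] or 0) >= minimum]

# Hypothetical excerpt; only column names mentioned in this document are used.
SAMPLE = (
    "RecordOffset|Alternate_Data_Streams\n"
    "0x00228000|0\n"
    "0x00228400|2\n"
)

hits = files_with_extra_data(read_rows(io.StringIO(SAMPLE)))
print([row["RecordOffset"] for row in hits])
```

For a real file, replace the in-memory sample with an opened file and pass the separator and text encoding that match the settings used during processing.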

Usage scenarios:

Parsing $MFT on the systemdrive from your running live system. Detected NTFS volumes will be displayed in the second combobox from the top. If new drive is attached or mounted, click "Rescan Mounted Drives". Select target volume in the second combo. Configure the stuff mentioned at the top. Press "Start Processing".
Parsing $MFT off a volume on \\.\PhysicalDrive1 which is not mounted. Check the dropdown in the first combo. Normally you should at least have PhysicalDrive0 in there. Optionally rescan for attached drives by clicking "Scan PhysicalDrive". Select the wanted drive in the first combo. Then click "Test it". Found volumes will be populated in the second combo, and temporarily replace the mounted volumes. When the correct one is chosen, configure the rest of the necessary stuff as already explained. Press "Start Processing".
Parsing an image file (of disk or partition). Press "Choose Image" and browse to the target image. Identified volumes will be populated into the combobox at the top. Select the target volume. Configure the stuff mentioned at the top. Press "Start Processing".
Parsing an already extracted $MFT. Press "Choose $MFT" and browse to the target $MFT. Configure the stuff mentioned at the top. Press "Start Processing".
Parse an $MFT reconstructed from a memory dump (or carved from unallocated). Run MFTCarver on the memory dump file, and output a pseudo $MFT file. Press "Choose $MFT" and browse to the file you just created. Configure the stuff mentioned at the top. Configure "Skip Fixups" according to what type of file MFTCarver output. Configure "Broken $MFT". Press "Start Processing".
Extract resident data from a memory dump. Do as in the memory dump scenario above, but also configure "Extract Resident". Then press "Set Extract Path" and browse to and select the target output path. Press "Start Processing".
Parsing $MFT directly from a shadow copy. Press the button "Scan Shadows"; any shadow copies found will appear in the top combo/drop down. Press the button "Test it" to verify it, and found volumes will be populated in the second combo, and temporarily replace the mounted volumes. When the correct one is chosen, configure the rest of the necessary stuff as already explained. Press "Start Processing".
Parse a single record. Copy the record with a hex editor or use MftRcrd with the -w switch. Press "Choose $MFT" and browse to the file you just created. Configure the stuff mentioned at the top. Configure "Broken $MFT". Press "Start Processing".

Note about extracted resident data

Only resident data can be extracted, so the nonresident flag must be set to 00. Regular files and alternate data streams are supported. These can be up to about 700 bytes in size. They are extracted under their original name, but no folder structure is regenerated. To reduce the risk of overwriting files with similar filenames and to help identify where in the $MFT the file was extracted from, the $MFT offset is prefixed to the filename, for instance: [0x00228000]import.reg

Any deleted files will be prefixed with [DEL] like this: [0x00228000][DEL]import.reg

Alternate data streams will have the name of the data stream prefixed with ADS_ and appended after the original filename, like: [0x00228400][DEL]import.reg[ADS_Zone.Identifier]
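
The naming rules above can be summed up in a few lines of Python. This is only a sketch of the scheme as described here; the function name is hypothetical, and the use of 8 uppercase hex digits is an assumption (the examples only contain the digits 0-9).

```python
def extracted_name(mft_offset: int, filename: str,
                   deleted: bool = False, stream: str = "") -> str:
    """Build the output filename for one piece of extracted resident data."""
    name = f"[0x{mft_offset:08X}]"   # offset of the record inside the $MFT
    if deleted:
        name += "[DEL]"              # marker for deleted files
    name += filename
    if stream:
        name += f"[ADS_{stream}]"    # alternate data stream name, if any
    return name

print(extracted_name(0x228000, "import.reg"))
print(extracted_name(0x228400, "import.reg", deleted=True, stream="Zone.Identifier"))
```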

ToDo

Move attribute header analysis out as a separate part.
Security attribute.
Group the data in a more sensible way.
Improve the speed.

Thanks and credits

DDan at forensicfocus.
AutoIt forums (KaFu & trancexx) where the starter code was provided; http://www.autoitscript.com/forum/topic/94269-mft-access-reading-parsing-the-master-file-table-on-ntfs-filesystems/
Ascend4nt's AutoIt wintimefunctions; http://sites.google.com/site/ascend4ntscode/
David Kovar's nice analyzeMFT.py python script; http://www.integriography.com/
jennico's extended hex to dec conversion AutoIt udf
The "ntfs-cmd" project; http://code.google.com/p/ntfs-cmd/
llewxam at the AutoIt forums
