Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Layers - Several fixes around maximum_address and chunk sizes #1141

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

gcmoreira
Copy link
Contributor

@gcmoreira gcmoreira commented May 6, 2024

maximum_address fixes in several layers

I think there are still other places to fix but I haven't fixed everything as testing it will be massive. I think we should review all occurrences of layer.maximum_address and where we calculate the data size based on that value. There are other places where I suspect it's being used incorrectly. To mention just a few:

  • PdbMSFStream::maximum_address looks like it's also wrongly calculated as len(self._pages) * self._pdb_layer.page_size should be the next byte following the maximum_address.
  • Additionally, there are other places that require double-checking. For instance, everywhere the maximum_address is in a while condition but the value is not included in the address range i.e.: lime, or avml. Technically looks wrong but I think these could still be ok because they are expecting more than 1 byte to read anyway. So it should mean that the layer/file is truncated/bad format/etc.

CommandLine: file_handler_class_factory ->CLIFileHandler

  • Fixed issue when the filename doesn't contain an extension: For instance, if we call the LayerWriter plugin with --output aaa ... it creates a .aaa (hidden filename in linux). Next iterations using the same argument will generate -1.aaa, -2.aaa, etc. insted of aaa, aaa-1, and aaa-2 respectively.
  • Stored the real filename created in file_handle._output_filename. That way we can know the final filename including both the "--output-dir" and the "--output" values and its "-1", "-2", etc (if used). Otherwise, the filename reported will be incorrect (and old version of this file), which could result in inaccurate analysis. Unfortunately, the final filename is determined when the file is closed, which might lead to some controversy regarding its implementation.

LayerWriter plugin. Several fixes

  • Fixed missing --output argument and other bugs, plus some code improvements.

@gcmoreira gcmoreira changed the title Layers - Several fixes around maximum_address and chunck sizes Layers - Several fixes around maximum_address and chunk sizes May 6, 2024
@gcmoreira
Copy link
Contributor Author

I was wrong with e5a5b89, it should be fine. Anyway, I think it is a more common way to express this kind of check and is generally considered more readable.

@gcmoreira
Copy link
Contributor Author

gcmoreira commented May 6, 2024

Tests:

Vmem image

Vol2

$ python2 ../volatility2/vol.py imagecopy -f ../ubuntu2004lts64bit/ubuntu2004lts64bit-Snapshot35.vmem -O ubuntu2004lts64bit-Snapshot35.vmem.raw.vol2
Volatility Foundation Volatility Framework 2.6
Writing data (5.00 MB chunks): |.......................................................................................................|

Vol3

$ python3 ../volatility3/vol.py -f ../ubuntu2004lts64bit/ubuntu2004lts64bit-Snapshot35.vmem LayerWriter --output ubuntu2004lts64bit-Snapshot35.vmem.raw.vol3
Volatility 3 Framework 2.7.0
Progress:  100.00		PDB scanning finished                  
Status
Progress:   99.61		Writing layer primary                  
Layer has been written to /home/user/vol3-imagecopy/ubuntu2004lts64bit-Snapshot35.vmem.raw.vol3

Test

-rw------- 1 user user 536870912 May 15  2023 /media/user/ubuntu2004lts64bit/ubuntu2004lts64bit-Snapshot35.vmem
-rw-rw-r-- 1 user user 536870912 May  6 16:09 ubuntu2004lts64bit-Snapshot35.vmem.raw.vol2
-rw------- 1 user user 536870912 May  6 16:09 ubuntu2004lts64bit-Snapshot35.vmem.raw.vol3

$ sha1sum ubuntu2004lts64bit-Snapshot35.vmem.raw.vol2 ubuntu2004lts64bit-Snapshot35.vmem.raw.vol3
48206033995ab862fdf729ed4bf2a8bd104654ab  ubuntu2004lts64bit-Snapshot35.vmem.raw.vol2
48206033995ab862fdf729ed4bf2a8bd104654ab  ubuntu2004lts64bit-Snapshot35.vmem.raw.vol3

Core image

Vol2

$ python2 ../volatility2/vol.py imagecopy -f ./ram-4.10.0-42.core -O ram-4.10.0-42.core.raw.vol2
Volatility Foundation Volatility Framework 2.6
Writing data (5.00 MB chunks): |...........................................................................................................................................................................................................................................|

Vol3

+ ./vol.py -f ./ram-4.10.0-42.core LayerWriter --output ram-4.10.0-42.core.raw.vol3
Volatility 3 Framework 2.7.0
Progress:  100.00		PDB scanning finished                   
Status
Progress:   99.98		Writing layer primary                   
Layer has been written to /home/user/vol3-imagecopy/ram-4.10.0-42.core.raw.vol3

Test

-rw-r--r-- 1 user user 1208166496 May  6 07:51 ./ram-4.10.0-42.core
-rw-rw-r-- 1 user user 4294967296 May  6 16:10 ram-4.10.0-42.core.raw.vol2
-rw------- 1 user user 4294967296 May  6 16:10 ram-4.10.0-42.core.raw.vol3

$ sha1sum ram-4.10.0-42.core.raw.vol2 ram-4.10.0-42.core.raw.vol3
36a504a0f1e203a6d6cb9bc1a4c0657fc3a47c97  ram-4.10.0-42.core.raw.vol2
36a504a0f1e203a6d6cb9bc1a4c0657fc3a47c97  ram-4.10.0-42.core.raw.vol3

@gcmoreira
Copy link
Contributor Author

@ikelos LayerWriter plugin needs #1142 to be fixed

Copy link
Member

@ikelos ikelos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with most of it, just the way of exposing the finally written filename. I'm happy to change the API if needed, or to use the preferred_filename value which was the original intention if it's not too confusing, but it needs to be one of those two options. Accessing a new uninitialized private variable is a no-no...

@@ -792,8 +790,8 @@ def close(self):
return None

self._file.close()
output_filename = self._get_final_filename()
os.rename(self._name, output_filename)
self._output_filename = self._get_final_filename()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self variables should be defined (and set to a default value, even if it's None) in the object's __init__ method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost sooner this sets the preferred filename just before the file is actually closed. I think the intention was that it would act as the filename that had been output, based on

Until this file has been written, this value may not be the final filename the data is written to.
. The comment suggests that this value would be set to the correct filename once the file is closed, could be done on this line instead of creating a new (private) variable. It looks like we never did that, but that was the intention.

@@ -95,15 +100,16 @@ def _generator(self):
if not self.config["layers"]:
self.config["layers"] = []
for name in self.context.layers:
if not self.context.layers[name].metadata.get("mapped", False):
if "mapped" not in self.context.layers[name].metadata:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was changed, but I don't particularly have a problem with it.

self.config["layers"] = [name]

for name in self.config["layers"]:
# Check the layer exists and validate the output file
if name not in self.context.layers:
yield 0, (f"Layer Name {name} does not exist",)
else:
output_name = self.config.get("output", ".".join([name, "raw"]))
default_output_name = f"{name}.raw"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do this for clarity, just wondering if there's any other reason for it?

@@ -114,10 +120,9 @@ def _generator(self):
progress_callback=self._progress_callback,
)
file_handle.close()
output_name = file_handle._output_filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing a private member is bad. If it's supposed to be exposed it should be public.

@@ -38,6 +38,11 @@ def get_requirements(cls) -> List[interfaces.configuration.RequirementInterface]
default=False,
optional=True,
),
requirements.StringRequirement(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also bear in mind, plugins should be UI agnostic. This is a very specific option to get a particular preferred filename chosen for the output. It pretty much only applies to the CLI because a web interface would have you download the output and choose the filename at the time you did that. I really don't want to clutter plugins with cosmetic options that are catering to a single UI because that reinforces that there's only one UI anyone bothers with. Luckily, this does use the FileHandler interface, so other UIs can gather and decide how to deal with files, but just as a reminder, the filename is just a preferred filename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh .. just to clarify, I'm not the author of this plugin and it wasn't my idea to add this option. What I did is to fix an option that is missing as the code below is expecting that.

Following the commits, it seems that, for some reason, you removed the LayerWriter plugin option in Plugins: Make layerwiter more configurable, but the option is still expected in the code. Not sure if you forgot to remove that as well or what happened there but it looks like something went wrong there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally understood, sorry if my wording was personalized. I was responding to the idea and how it relates to what we have at the moment, not necessarily you in particular, I apologize. I just started changing my mind as the review went on and by this point I felt quite strongly against the concept I think. I didn't spot that the plugin expected an output.

Having said that, since layerwriter can now accept and write out multiple layers at a time, a single output file wouldn't be right, so I don't think giving the user the choice of output name makes sense at all any more. We should name each layer after the layer's name and write it out that way. The user will then know what the file is finally called from the plugin output (although there's changes needed for that included in this PR I think) so can locate and rename it appropriately. We don't really have any other easily retrievable information to put in the name (such as filename, since the lowest layer might not be a FileLayer and there is no unified way of extracting a source from it).

@ikelos
Copy link
Member

ikelos commented May 16, 2024

In future it might be easier to submit these as separate, smaller PRs, just so that the trivial ones are easy to put through, and don't get held up by any of the others. The maximum_address fix seems really valuable, but is being held up by the output stuff in layerwriter...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants