Write symbolic links to disk #362

Open
csisl opened this issue Aug 10, 2023 · 6 comments

csisl commented Aug 10, 2023

What is the use case for the feature?

When OFRAK encounters a symbolic link, it does not write it to disk because it treats the link as an empty file. This can cause false negatives when searching for a file after unpacking, and the extracted tree no longer matches the original compressed package. The user should be able to resolve and follow the symbolic link as they see fit.

Does the feature contain any proprietary information about another company's intellectual property?

No

How would you implement this feature?

When writing files to disk, if an empty file is encountered, first check whether it is a symbolic link. If it is, write the link to disk.
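
A minimal sketch of the proposed behavior, using only the Python standard library rather than OFRAK's actual code. The function name write_entry and its parameters are hypothetical, chosen for illustration: if an entry carries a link target, a symbolic link is created; otherwise the data is written as a regular file.

```python
import os
from typing import Optional


def write_entry(dest: str, data: bytes, link_target: Optional[str] = None) -> None:
    """Write one unpacked entry to disk (hypothetical helper, not an OFRAK API)."""
    if link_target is not None:
        # Replace an existing file or link at dest; lexists() is also
        # true for dangling links, unlike exists().
        if os.path.lexists(dest):
            os.remove(dest)
        # The target string is stored verbatim and does not need to exist.
        os.symlink(link_target, dest)
    else:
        with open(dest, "wb") as f:
            f.write(data)
```

Because the target is stored as-is, dead links and self-referencing links can be written this way too, without being resolved.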

Are there any (reasonable) alternative approaches?

Are you interested in implementing it yourself?

csisl (Author) commented Aug 10, 2023

resource.get_children() does not return symbolic links

rbs-jacob self-assigned this Sep 5, 2023

rbs-jacob (Member) commented Sep 6, 2023

Hi @csisl! I'd love to help, but could use some additional info.

  • Can you give some more information about your system (OS, processor architecture, Docker/non-Docker environment, etc.)?
  • Can you give more information about the binary that is not unpacking symbolic links?

OFRAK is definitely supposed to handle symbolic links, so I want to make sure we address this, but I'm having trouble replicating this locally.


I made the following test file using these commands on Debian: csisl-test.tar.gz

mkdir -p /tmp/test
cd /tmp/test/
echo "Hello, world!" > hello.txt
touch empty.txt
ln -s hello.txt link.txt
ln -s self-link.txt self-link.txt
ln -s link.txt second-link.txt
ln -s nonexist.txt dead.txt
cd ..
tar -czvf csisl-test.tar.gz test/
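
As a cross-check that does not depend on OFRAK, the symbolic links stored in such an archive can be listed with Python's standard tarfile module. This is only a sketch; the helper name list_symlinks is made up for illustration.

```python
import tarfile
from typing import Dict


def list_symlinks(archive_path: str) -> Dict[str, str]:
    """Map each symbolic-link member of a tar archive to its stored target."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}
```

Calling list_symlinks("csisl-test.tar.gz") on the archive built above would be expected to report the four links (link.txt, self-link.txt, second-link.txt, and dead.txt) along with their targets.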

When I unpack in the OFRAK GUI, it looks like this. The resource.get_children method must necessarily return symbolic links for them to be displayed in the GUI, so I wonder if they are not being unpacked correctly in whatever binary you are testing.

[Screenshot: the unpacked archive in the OFRAK GUI]

Modifying "world" in hello.txt to "GitHub" and repacking gives this file: csisl-test-2.tar.gz

You can test the process yourself by running this script generated by the GUI.

from typing import Optional

from ofrak import *
from ofrak.core import *

async def main(ofrak_context: OFRAKContext, root_resource: Optional[Resource] = None):
    if root_resource is None:
        root_resource = await ofrak_context.create_root_resource_from_file(
            "csisl-test.tar.gz"
        )
    await root_resource.unpack()
    genericbinary_0x0 = await root_resource.get_only_child(
        r_filter=ResourceFilter(
            tags={GenericBinary},
            attribute_filters=[
                ResourceAttributeValueFilter(attribute=Data.Offset, value=0)
            ],
        )
    )
    await genericbinary_0x0.unpack()
    folder_test = await genericbinary_0x0.get_only_child(
        r_filter=ResourceFilter(
            tags={Folder},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="test"
                )
            ],
        )
    )
    file_hello_txt = await folder_test.get_only_child(
        r_filter=ResourceFilter(
            tags={File},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="hello.txt"
                )
            ],
        )
    )
    config = StringFindReplaceConfig(
        to_find="world",
        replace_with="GitHub",
        null_terminate=False,
        allow_overflow=True,
    )
    await file_hello_txt.run(StringFindReplaceModifier, config)
    await root_resource.pack_recursively()
    await root_resource.flush_to_disk("csisl-test-2.tar.gz")


if __name__ == "__main__":
    ofrak = OFRAK()
    ofrak.run(main)

Unpacking that repacked file again in OFRAK shows that the symbolic links are still there, so OFRAK can (at least in this case) handle unpacking and repacking symbolic links.

[Screenshot: the repacked archive unpacked again in the OFRAK GUI]

I also tried the following small script to flush a symbolic link to disk.

from ofrak import *
from ofrak.core import *

async def main(ofrak_context: OFRAKContext):
    root = await ofrak_context.create_root_resource_from_file("csisl-test.tar.gz")
    await root.unpack()
    tar = await root.get_only_child()
    await tar.unpack()
    folder_test = await tar.get_only_child()

    # Shows symbolic link children
    print(list(await folder_test.get_children()))
    file_dead_txt = await folder_test.get_only_child(
        r_filter=ResourceFilter(
            tags={FilesystemEntry},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="dead.txt"
                )
            ],
        )
    )

    # Successfully writes, but writes an empty file
    await file_dead_txt.flush_to_disk("test_dead.txt")

o = OFRAK()
o.run(main)

This script runs without issue and creates test_dead.txt, but it creates an empty file instead of a symbolic link. Is this the problem you're identifying?

root@ofrak:/tmp/test# stat test_dead.txt
  File: test_dead.txt
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 37h/55d Inode: 54802789    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-06 17:31:59.317037124 +0000
Modify: 2023-09-06 17:31:59.317037124 +0000
Change: 2023-09-06 17:31:59.317037124 +0000
 Birth: 2023-09-06 17:31:59.317037124 +0000

root@ofrak:/tmp/test# stat second-link.txt
  File: second-link.txt -> link.txt
  Size: 8               Blocks: 0          IO Block: 4096   symbolic link
Device: 37h/55d Inode: 54802762    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-06 17:12:27.417313186 +0000
Modify: 2023-09-06 17:12:17.877449957 +0000
Change: 2023-09-06 17:12:17.877449957 +0000
 Birth: 2023-09-06 17:12:17.877449957 +0000
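
The distinction the stat output shows can also be checked from Python. This is a sketch with a hypothetical helper name, describe_path; it uses os.lstat because that inspects the link itself rather than whatever it points to.

```python
import os
import stat


def describe_path(path: str) -> str:
    """Classify a path without following symbolic links."""
    st = os.lstat(path)  # lstat() reports on the link itself, not its target
    if stat.S_ISLNK(st.st_mode):
        return f"symbolic link -> {os.readlink(path)}"
    if stat.S_ISREG(st.st_mode):
        return "regular empty file" if st.st_size == 0 else "regular file"
    return "other"
```

A flushed link that comes back as "regular empty file" is the symptom shown for test_dead.txt above.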

rbs-jacob (Member) commented

Based on my current understanding of the issue you're reporting (that Resource.flush_to_disk for FilesystemEntry-tagged resources doesn't work properly), I've opened PR #373 to address the problem.

Does that fix the issue you are describing?

I still can't replicate resource.get_children not returning symbolic links.

csisl (Author) commented Sep 7, 2023

Hello! Thank you for the response. As for my environment, I am running with the following setup:

  • macOS on M1 / ARM
  • ofrak 3.2.0
  • Python 3.8.10
  • not using Docker
  • inside a virtual env

I followed the steps above that you used to create a directory with one text file and then several symbolic links.

mkdir -p /tmp/test
cd /tmp/test/
echo "Hello, world!" > hello.txt
touch empty.txt
ln -s hello.txt link.txt
ln -s self-link.txt self-link.txt
ln -s link.txt second-link.txt
ln -s nonexist.txt dead.txt
cd ..
tar -czvf csisl-test.tar.gz test/

When I do this and run ofrak on the command line, the only file that is preserved is hello.txt. Each line for a zero-size entry, including the symbolic links, ends with (not written).

% ofrak unpack test-issue.tar.gz -r
[   ofrak_cli.py:  173] No disassembler backend specified, so no disassembly will be possible
Unpacking file: test-issue.tar.gz

Extracting data to test-issue.tar.gz_extracted_20230907073332
┌test-issue.tar.gz: [attributes=(AttributesType[FilesystemEntry], Magic), size=290 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz]
└───┬TarArchive: [attributes=(Data, Magic), size=5120 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive]
    └───┬test-issue: [attributes=(Data, AttributesType[FilesystemEntry]), size=0 bytes, (not written)]
        ├────test-issue/dead.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        ├────empty.txt: [attributes=(Data, AttributesType[FilesystemEntry], Magic), size=0 bytes, (not written)]
        ├────hello.txt: [attributes=(Data, AttributesType[FilesystemEntry], Magic), size=12 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive.ofrak_children/test-issue.ofrak_children/hello.txt]
        ├────test-issue/link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        ├────test-issue/second-link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        └────test-issue/self-link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]

It took 0.043 seconds to run the OFRAK script

A file listing:

ls -la test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive.ofrak_children/test-issue.ofrak_children
total 8
drwxr-xr-x  3  wheel  96 Sep  7 07:33 .
drwxr-xr-x  3  wheel  96 Sep  7 07:33 ..
-rw-r--r--  1  wheel  12 Sep  7 07:33 hello.txt

This is the behavior I'm seeing whenever I run resource.unpack() from a script as well, which is where my main issue lies.


As for the get_children() call, this is what I'm seeing:

await resource.unpack()
children = await resource.get_children()
for child in children:
    await child.identify()
    caption = child.get_caption()
    print(caption)

While I can see the symbolic links in the GUI

[Screenshot: OFRAK GUI showing the symbolic links]

I cannot see the symbolic links that point to files whenever I iterate over the children:

[Screenshot: output of iterating over the children]

Interestingly, I hadn't tried a dead link until following your instructions above. As of now, it only recognizes the dead link, not the ones that point to the file hello.txt.

rbs-jacob (Member) commented Sep 12, 2023

We've now merged #373, so it should be possible to dump symbolic links with the following (assuming you've installed OFRAK from the master branch of the source repo rather than from pip).

if resource.has_tag(FilesystemEntry):
    entry = await resource.view_as(FilesystemEntry)
    await entry.flush_to_disk()

I'm working on a separate PR to make the ofrak unpack CLI command use this method instead of the (now renamed) resource.flush_data_to_disk method.

whyitfor (Contributor) commented

@rbs-jacob, can this issue be closed?
