Dex file size increases by ~50% without making changes (DexFileFactory.loadDexFile then DexFileFactory.writeDexFile) #872

Open
marwan-bushara opened this issue Jul 2, 2023 · 7 comments

marwan-bushara commented Jul 2, 2023

I'm using "DexFileFactory" class to read and write dex files.
A simple example:

DexFile dexFile = DexFileFactory.loadDexFile(inputPath, null);
DexFileFactory.writeDexFile(outputPath, dexFile);

Just by loading a dex file and writing it back out (without any modifications), the file size increases, sometimes by up to 50%.
What are the reasons for this increase in size and are there any ways to decrease the file’s size when writing it?

marwan-bushara changed the title from "Dex file size increase using DexFileFactory loadDexFile and writeDexFile by almost 50%" to "Dex file size increases by ~50% without making changes (DexFileFactory.loadDexFile then DexFileFactory.writeDexFile)" on Jul 2, 2023
@CunningLogic

Post the before/after dex and I'll take a look.

@marwan-bushara
Author

I'm attaching one example here. The size increase is about 40% in this case.
classes.zip

katzdan commented Jul 3, 2023

Hi, I'm following this issue as well.
Not sure what you mean by "modern optimizing compiler". Is there a way to reduce the changes in size between the initial dex file and the one written?
Thanks

@CunningLogic

@katzdan after I get the brisket going on the smoker, I will post a detailed explanation. I deleted my far too simple reply (I'm just waking up and was being a bit lazy). I do think there is something else going on here too, but I need to run the files through my disassembler to look closer.

but yes, there is a lot of duplicate data in a dex file, and a lot of data is pointed to with pointers/offsets.

For example, if two classes have identical debug sections, both can point to the same data for their debug section.
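A rough sketch of that sharing idea, in plain Java rather than dexlib2: a toy "data section" that either appends every blob it is given, or reuses the offset of an identical blob it has already written. The class name `DataSectionSketch` and its methods are hypothetical and exist only for illustration; this is not how smali or any particular compiler is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Toy data section (hypothetical): stores blobs and hands back the offset that refers to each. */
final class DataSectionSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // ByteBuffer compares by content, so identical blobs map to the same key.
    private final Map<ByteBuffer, Integer> seen = new HashMap<>();
    private final boolean dedupe;

    DataSectionSketch(boolean dedupe) {
        this.dedupe = dedupe;
    }

    /** Writes the blob, or reuses an identical earlier one when dedupe is on; returns its offset. */
    int add(byte[] blob) {
        if (dedupe) {
            Integer existing = seen.get(ByteBuffer.wrap(blob));
            if (existing != null) {
                return existing; // point this reference at the data already written
            }
        }
        int offset = data.size();
        data.write(blob, 0, blob.length);
        if (dedupe) {
            seen.put(ByteBuffer.wrap(blob.clone()), offset);
        }
        return offset;
    }

    /** Total bytes written so far. */
    int size() {
        return data.size();
    }
}
```

Adding the same 4-byte blob twice yields 8 bytes with `dedupe` off and 4 bytes with it on, with both references resolving to offset 0 in the second case.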

@CunningLogic

@katzdan @marwan-bushara

The after file is 3322156 bytes longer
the debug section accounts for 46276 of those bytes

TL;DR: at least a good chunk of the difference is because smali does not employ some of the space-saving tricks that some compilers use, such as not writing duplicate data and pointing multiple references at the same data.
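For anyone who wants a first look at where two dex files differ without a disassembler, here is a minimal sketch that reads a few size fields from the fixed 112-byte dex header and reports the deltas. The field offsets follow the published dex format (`header_item`); the class and method names are hypothetical, and a standard little-endian dex is assumed. The header alone cannot isolate the debug info, which lives inside the data section; the map list would be needed for that.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: compares a few dex header fields between two files. */
final class DexHeaderSketch {
    /** Reads selected header_item fields; assumes a little-endian dex with a 0x70-byte header. */
    static Map<String, Long> headerSizes(byte[] dex) {
        if (dex.length < 0x70) {
            throw new IllegalArgumentException("shorter than a dex header");
        }
        ByteBuffer buf = ByteBuffer.wrap(dex).order(ByteOrder.LITTLE_ENDIAN);
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("file_size", u32(buf, 0x20));        // bytes
        sizes.put("string_ids_size", u32(buf, 0x38));  // item count
        sizes.put("type_ids_size", u32(buf, 0x40));    // item count
        sizes.put("proto_ids_size", u32(buf, 0x48));   // item count
        sizes.put("field_ids_size", u32(buf, 0x50));   // item count
        sizes.put("method_ids_size", u32(buf, 0x58));  // item count
        sizes.put("class_defs_size", u32(buf, 0x60));  // item count
        sizes.put("data_size", u32(buf, 0x68));        // bytes
        return sizes;
    }

    /** Returns after minus before for every field read by headerSizes. */
    static Map<String, Long> diff(byte[] before, byte[] after) {
        Map<String, Long> b = headerSizes(before);
        Map<String, Long> a = headerSizes(after);
        Map<String, Long> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : b.entrySet()) {
            delta.put(e.getKey(), a.get(e.getKey()) - e.getValue());
        }
        return delta;
    }

    private static long u32(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL; // treat the field as unsigned
    }
}
```

If the ID counts are unchanged while `data_size` grows, the extra bytes sit in the data section (code items, debug info, annotations, and so on) rather than in new strings, types, or methods.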

katzdan commented Jul 11, 2023

Is there any hope of getting a reduction in dex size in the future, as an optimization feature?

@CunningLogic

@katzdan there are a number of ways to do that, it would be fairly easy.

Easiest route would be to just write a script to parse the smali files and remove debug information and dead code
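A minimal sketch of the debug-stripping half of that script, written as a line filter over smali text. The directive list (`.line`, `.local`, `.end local`, `.restart local`, `.prologue`, `.epilogue`, `.source`) is an assumption about which directives carry debug information; `.param` is deliberately left alone because it can carry annotations. The class name is hypothetical, and dead-code removal is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line filter that drops debug directives from smali source. */
final class SmaliDebugStripSketch {
    // Assumed set of debug-related directives; adjust for the smali output at hand.
    private static final String[] DEBUG_DIRECTIVES = {
        ".line", ".local", ".end local", ".restart local",
        ".prologue", ".epilogue", ".source"
    };

    /** Returns the input lines minus any line that is a debug directive. */
    static List<String> strip(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isDebugDirective(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Matches the directive exactly or followed by a space, so ".locals 2" is not treated as ".local". */
    static boolean isDebugDirective(String line) {
        String trimmed = line.trim();
        for (String directive : DEBUG_DIRECTIVES) {
            if (trimmed.equals(directive) || trimmed.startsWith(directive + " ")) {
                return true;
            }
        }
        return false;
    }
}
```

The filtered files would then be reassembled with smali as usual. Stack traces from the resulting dex lose line numbers, so this trades debuggability for size.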
