Feature vm compression #2977
Conversation
…ranch Similar to what build.sh does.
For uncompressed images, the compressed data and code sizes are equal to the data and code size fields.
The loader still needs to copy compressed data into a temporary buffer, and there is no decompression error handling yet.
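The header convention described above can be sketched as follows. This is an illustrative model only: the field names are hypothetical, not the actual Factor VM header layout, but it shows the idea that uncompressed images simply repeat the plain sizes in the compressed-size fields, so a loader can detect compression by comparing them.

```python
# Sketch of the convention: for uncompressed images the compressed-size
# fields equal the plain sizes, so equality means "not compressed".
# Field names are illustrative, not the real Factor VM image header.
from dataclasses import dataclass

@dataclass
class ImageHeader:
    data_size: int
    code_size: int
    compressed_data_size: int
    compressed_code_size: int

    def data_compressed(self) -> bool:
        return self.compressed_data_size != self.data_size

    def code_compressed(self) -> bool:
        return self.compressed_code_size != self.code_size

# An uncompressed image repeats the sizes; a compressed one does not.
plain = ImageHeader(1000, 500, 1000, 500)
packed = ImageHeader(1000, 500, 130, 70)
```

Under this convention no extra flag bit is needed; the size fields themselves carry the information.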
Those times don’t look right. I think your Factor is loading stuff on startup, maybe your -rc file. Run factor, then `save`, then try the compare again.
time ./factor -e=""
Right, thanks for the feedback; I ran a single time on my Windows machine:
real 0m0.982s
user 0m0.000s
sys 0m0.015s
/c/dev/project/build/factor [30104]
$ time ./factor.com -i=factor.image -e=""
real 0m0.934s
user 0m0.000s
sys 0m0.000s
so about 50 ms on top of 930 ms. Can't test the Linux times.
Well, for me, if it was 50 millis, it’s a 50% performance reduction. But in
your example it’s like 15%, which, if it held true on my computer, is only 20
millis.
It matters for some use cases like CGI or short-lived command-line
programs, and I’m already frustrated that we take 100 millis to mmap the image
file; I want to make it faster, not slower.
…On Fri, May 10, 2024 at 7:16 PM nomennescio wrote:
Those times don’t look right. I think your Factor is loading stuff on
startup maybe your -rc file. Run factor, then ‘save’ then try the compare
again.
Yes, it does a USE: on an editor, to set that variable. I think that's
quite typical use, but nothing else.
But I don't understand why you are so focusing on 50 ms for decompressing.
I don't think it's relevant at all, as it's a totally reasonable amount of
time to decompress into 130 MB, you cannot avoid it if you want
decompression, and it's purely optional to begin with, you still have 0 ms
added load time if you just use uncompressed images. What's your concern
here?
But I'll add the timing anyway: freshly built, started, created boot image,
booted, save, USE: binary.image.factor.compressor, then
compress-current-image
$ time ./factor.com -i=factor.image -e=""
real 0m0.281s
user 0m0.015s
sys 0m0.015s
$ time ./factor.com -i=factor.image.compressed -e=""
real 0m0.330s
user 0m0.000s
sys 0m0.031s
I understand your concern, but it's not related to this feature; this is about adding optional compression support, which has only a very limited startup cost, and only if you use a compressed image. As for using …
Some more statistics in #2577
With maximum compression that goes down further, to a compression factor of 8.0.
@mrjbq7 do you have any objections if I put this already in the master branch? We can always adapt things later.
Well, a few things need to be addressed, I guess, if you want to eventually merge it:
It's in the header of …, and 71671c mentions this is zstd release v1.5.6.
Alright, we can cherry-pick
Not a problem; my thinking was to have all kinds of binary subvocabs dealing with binary formats, such as
What overhead are you referring to? As for sections; the benefit is in no preloading stage, selective compression of only code or data (in case compression doesn't yield benefit), and a possibility to use ZSTD dictionaries tuned for code and for data.
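The "selective compression" idea above (compress a section only when it actually yields a benefit) can be sketched briefly. This is a hypothetical illustration, not the PR's implementation: Python's stdlib `zlib` stands in for zstd, and the function names are invented.

```python
# Hypothetical sketch of per-section selective compression: each section
# (data heap, code heap) is compressed independently, and the raw bytes
# are kept whenever compression does not actually shrink the section.
# zlib stands in here for zstd, which the PR actually uses.
import zlib

def pack_section(payload: bytes, level: int = 9) -> tuple[bytes, bool]:
    packed = zlib.compress(payload, level)
    if len(packed) < len(payload):
        return packed, True      # compression helped: store compressed
    return payload, False        # no benefit: store the section as-is

data_section = b"word word word " * 1000   # highly compressible
code_section = bytes(range(256))           # dense, may not shrink

data_blob, data_packed = pack_section(data_section)
code_blob, code_packed = pack_section(code_section)
```

A per-section scheme like this is also what makes section-specific dictionaries possible: each section can be compressed with a dictionary trained on that kind of content.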
I care. It gives a 7-8x reduction of the image size at negligible cost. Other approaches do not conflict with this approach at all,
No, the VM transparently handles this, independent of filename.
If it uses the same loader, it should work. If not, it's the same as for a regular image. Such a feature can be added later,
I disagree. If you care that much about startup speed you can still keep using an uncompressed image. Both are transparently supported at the same time. I'm willing to pay a very small cost of 50ms startup time to get 7-8x size reduction.
ZSTD is the most efficient and fastest binary compression scheme, and is therefore used by many bootloaders. Theoretically the approach you mention is possible, but that can be supported separately. I don't think it will bring much.
I think it's important that we're on the same page here. Do you understand that compression/decompression is done transparently in the VM loader, determined by inspecting the header of an image file (not its name), and that you don't get a performance penalty if you keep on using uncompressed images? And that by default Factor keeps on using uncompressed images, which are not affected at all by this feature?
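The transparent-loading behavior described above can be sketched as follows. This is a toy model, not the VM code: the magic markers are invented for illustration (the real VM inspects its existing image header fields, as discussed), and `zlib` stands in for zstd.

```python
# Toy sketch of transparent loading: the loader decides by inspecting
# the header of the image, never the filename. Markers are hypothetical;
# zlib stands in for the zstd codec the PR actually uses.
import zlib

MAGIC_COMPRESSED = b"FZC1"   # invented marker for a compressed image
MAGIC_PLAIN = b"FIMG"        # invented marker for an uncompressed image

def save_image(heap: bytes, compress: bool) -> bytes:
    if compress:
        return MAGIC_COMPRESSED + zlib.compress(heap, 9)
    return MAGIC_PLAIN + heap

def load_image(blob: bytes) -> bytes:
    magic, body = blob[:4], blob[4:]
    if magic == MAGIC_COMPRESSED:
        return zlib.decompress(body)   # only compressed images pay this cost
    if magic == MAGIC_PLAIN:
        return body                    # uncompressed path is unchanged
    raise ValueError("not a recognized image")

heap = b"\x00\x01example-heap\xff" * 500
```

Both kinds of file load through the same entry point, which is why uncompressed images see zero added cost: their code path never touches the decompressor.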
Yes. I also understand that it's 1) a lot slower than uncompressed, 2) not used by anyone by default, 3) not tested with deploy, 4) not necessarily the best approach to solving startup speed, and 5) the solution to 4 might change this a lot.
Compression and startup speed are totally unrelated issues. I'm not working on startup speed with this issue. This is about compression. Please make a separate issue out of startup speed. Compression is never going to help you with that, and improving startup speed is never going to help with compression. I don't get why you keep lumping these together. As for 2, of course it's not used by anyone until we have such a feature...
What is the point of implementing this if nothing ever uses it by default, if …
The reason I lump this with startup performance is 1) it makes it slower and 2) it's entirely possible that either whole-file compression or block-based compression is faster, and the latter would allow us to lazily decompress segments, which we don't currently do and which would completely change what you have done.
As an aside, I don't like the malloc in load_data_heap; if the tenured space is not initialized large enough, we should error out. The old code path made sure it was initialized large enough; why would it not be correct for the compressed path? If you have RAM to hold the uncompressed file, you certainly have hard-drive space to hold an uncompressed image.
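The block-based alternative floated above can be sketched briefly. This is a hypothetical illustration of the idea, not anything in the PR: fixed-size blocks are compressed independently, so any single block can be inflated on demand without touching the rest. `zlib` stands in for whatever codec would actually be chosen, and the block size is arbitrary.

```python
# Hypothetical sketch of block-based compression enabling lazy
# decompression: each fixed-size block is compressed independently,
# so a single byte can be read by inflating only its own block.
import zlib

BLOCK = 4096  # arbitrary block size for illustration

def compress_blocks(heap: bytes, block_size: int = BLOCK) -> list[bytes]:
    return [zlib.compress(heap[i:i + block_size], 9)
            for i in range(0, len(heap), block_size)]

def read_byte(blocks: list[bytes], offset: int,
              block_size: int = BLOCK) -> int:
    # Only the one block containing `offset` is decompressed.
    chunk = zlib.decompress(blocks[offset // block_size])
    return chunk[offset % block_size]

heap = bytes((i * 31) % 256 for i in range(3 * BLOCK)) + b"tail" * 100
blocks = compress_blocks(heap)
```

The trade-off is a worse compression ratio (no cross-block redundancy, per-block framing overhead) in exchange for random access, which is what would make lazy segment loading possible.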
I don't get you. What do you mean by "if nothing ever uses it by default"? If you want to compress an image, you compress it, and presto, it just works. The reason I introduce the current defaults is to not disturb existing usage. I prefer such an approach; only people who explicitly want it use it, and the others don't notice anything different. Especially since people like you don't want to add 50 ms of startup time to Factor, which is fine. From there we can work on the other scenarios you describe. Consider it an alpha test with features to be added. However, this feature is 2 years in the making, and I don't want to postpone it another 2 years doing all kinds of intermediate steps and constantly merging in from master. It's a relatively small change which doesn't harm any future changes. As for tests, it's not a problem to introduce tests.
It wouldn't change a thing about making things slower, as it's optional. If we come up with a better feature in the future, that's fine by me. But currently we have no compression at all, and this feature is ready to be integrated to give compression. If you think you can make compression and decompression faster and better, go ahead, but I won't hold my breath for that to happen. This feature won't block any such improvements at all. Not introducing a feature because in the future we might find a better solution goes against the usual philosophy of first getting things working, then improving upon them.
That's fine if you want to error out; I can implement it like that. The extra space is needed to load the compressed image and still have room for the uncompressed image; that's the way decompression works. The malloc is only a fallback mechanism in case the image doesn't fit in the preallocated space, but typically isn't needed. And by the way, the old code path doesn't make sure it's initialized large enough; it just picks fixed sizes in which Factor images currently happen to fit. And the exact same old code path is still in place for uncompressed images, so it's 100% backwards compatible.
That's certainly untrue, because it depends on personal use cases. I have limited time to work on Factor, so I don't want this to drag on for another 2 years; that's a waste of my time, and I find it very frustrating. It also doesn't exactly stimulate me to keep contributing, and I'm afraid that has happened to other contributors in the past, who stopped contributing and left the community. I'm open to accommodating improvements or changes you would like to see for this feature, but I don't think you've given arguments against introducing it other than, basically, that you wouldn't use it. I don't think this feature is going to hurt Factor in any way imaginable, nor does it impact any future work on better compression or better startup times; it gives a nice feature to people who want to save disk space, of whom I'm one. I'm happy to also discuss things one-on-one to see if we maybe misunderstand each other.
Because if this is only so that you can load compressed images, it's not going to get merged. It would get merged if:
I don't want to slow startup, so I think I'm unlikely to make compressed images a default, and I also don't want to miss an … I suppose you're saying "why not merge it, then we can delete it later if we dislike it"; well, in that context I might be interested, but you've previously wanted to be very careful about what public guarantees we make about things like the VM header flags and backwards compatibility, and that makes me a lot less interested in trying things and then getting stuck supporting them.
isn't …
You want to vote on that, or did you have something else in mind? I can't remember us ever doing that for another feature.
Which tools are you missing?
Which tools?
This is already implemented. The only thing is that save currently does not call compress-image, but that's trivial to add. Just tell me how you'd like it.
Isn't this the same as 1.?
I'm happy to thoroughly comment and review the source code and explain every nook and cranny.
I don't see a risk here; if at some point you want to drop VM support, you just remove it. All that's needed is a utility that can read, uncompress, and write existing compressed images and save them uncompressed, very similar to: … And as we already discussed previously, at some point in time we're going to introduce the next version of the image header.
This indeed overallocates the data heap.
I merged it so you don't have to worry about tracking old branches. I reserve the right to change it later, or move things around.
I don't understand why the inflated size would ever be larger than …
Thank you. I'm happy to support or discuss any changes you'd like to see.
Ready to merge