Feature vm compression #2977

Merged: 66 commits merged into master on May 14, 2024
Conversation

nomennescio
Contributor

Ready to merge

Typical MS braindead tool
For uncompressed images, the compressed data and code size fields are equal to the data and code size fields.
Still needs to copy compressed data into a temporary buffer.
No decompression error handling.
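
For illustration, a minimal C++ sketch of the header convention described above; the struct and field names here are hypothetical, not the actual layout used by the Factor VM:

  #include <cstdint>

  // Hypothetical sketch -- field names are illustrative, not the actual
  // image header layout in the Factor VM.
  struct image_header_sketch {
    uint64_t data_size;            // uncompressed data heap size
    uint64_t code_size;            // uncompressed code heap size
    uint64_t compressed_data_size; // equals data_size when the data heap is stored uncompressed
    uint64_t compressed_code_size; // equals code_size when the code heap is stored uncompressed
  };

  // The loader can detect compression per section without looking at the filename:
  static bool data_is_compressed(const image_header_sketch& h) {
    return h.compressed_data_size != h.data_size;
  }

With such a convention an uncompressed image is detected simply because the stored and uncompressed sizes match.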
@mrjbq7
Member

mrjbq7 commented May 10, 2024 via email

@nomennescio
Contributor Author

Those times don't look right. I think your Factor is loading stuff on startup, maybe your -rc file. Run factor, then 'save', then try the comparison again.

Yes, it does a USE: on an editor, to set that variable. I think that's quite typical use, but nothing else.

But I don't understand why you are focusing so much on 50 ms of decompression time. I don't think it's relevant at all: it's a totally reasonable amount of time to decompress into 130 MB, you cannot avoid it if you want decompression, and it's purely optional to begin with; you still have 0 ms of added load time if you just use uncompressed images. What's your concern here?

But I'll add the timings anyway: freshly built, started, created a boot image, booted, ran save, then USE: binary.image.factor.compressor followed by compress-current-image:

$ time ./factor.com -i=factor.image -e=""

real    0m0.281s
user    0m0.015s
sys     0m0.015s
$ time ./factor.com -i=factor.image.compressed -e=""

real    0m0.330s
user    0m0.000s
sys     0m0.031s

@mrjbq7
Member

mrjbq7 commented May 11, 2024 via email

@nomennescio
Contributor Author

nomennescio commented May 11, 2024

Well, for me, if it was 50 millis, it's a 50% performance reduction. But in your example it's more like 15%, which, if it held true on my computer, is only 20 millis. It matters for some use cases like CGI or short-lived command-line programs, and I'm already frustrated that we take 100 millis to mmap the image file; I want to make it faster, not slower.

I understand your concern, but it's not related to this feature; this is about adding optional compression support, which only has a very limited startup cost, and only if you use a compressed image.

As for using mmap, I doubt that will work for typical code, or that it will work across platforms; once code starts executing, pages will get read from disk, so the additional time is merely deferred. For me, however, that is a different discussion from what this patch tries to achieve. If you want to work on reducing load time, wouldn't it be better to open a dedicated issue for that? We can exchange ideas and work on it there.
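
(For illustration only, a minimal POSIX sketch of why mmap mostly defers the I/O cost rather than removing it; this is not part of the patch:)

  #include <cstddef>
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  // Mapping the image makes the initial call cheap, but the first touch of
  // each page later triggers a page fault and a disk read, so the I/O cost
  // is deferred to execution time rather than eliminated.
  void* map_image(const char* path, size_t size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping remains valid after the descriptor is closed
    return base == MAP_FAILED ? nullptr : base;
  }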

@nomennescio nomennescio self-assigned this May 11, 2024
@nomennescio
Contributor Author

Some more statistics in #2577

@nomennescio
Contributor Author

I'll also be doing a compression experiment with a load-all image; if we can practically start using that, then run-time loading of vocabs can be avoided entirely.

Got some issues with load-all on Windows, but nevertheless:

-rwxrwxr-x+ 1  129890744 May  8 23:57 factor.image
-rwxrwx---+ 1  571727240 May 10 15:18 factor-all.image
-rwxrwx---+ 1   78380047 May 10 15:28 factor-all.image.compressed

i.e. the compressed image for a load-all is smaller than a normal uncompressed image. Its compression factor is 571727240 / 78380047 ≈ 7.3.

With maximum compression that goes further down to:

-rwxrwx---+ 1   71276368 May 12 08:15 factor-all.image.compressed.22

a compression factor of 571727240 / 71276368 ≈ 8.0.

@nomennescio
Contributor Author

@mrjbq7 do you have any objections if I already put this into the master branch? We can always adapt things later.

@mrjbq7
Member

mrjbq7 commented May 13, 2024

Well, a few things need to be addressed, I guess, if you want to eventually merge it:

  1. Where does zstd.c come from? I don't see it in Facebook's release or source files.
  2. I don't generally like merge commits, and especially not ones with conflicts, because then we have to figure out if the code was appropriately de-conflicted...
  3. I don't really like the binary.image.factor.compressor vocab name, maybe tools.image-compressor or something.
  4. I'm not sure I like this approach, and it adds a lot of overhead -- why aren't we just compressing the whole file instead of sections of it? What benefit does that give us?
  5. For what use cases do we really care about file size anyway, and have you looked to see what the bulk of the image file's size is taken up by? Maybe there are other approaches to reducing file size.
  6. Does the VM look for the compressed file, or does this only work with -i=factor.image.compressed?
  7. Does this work in a deployed image?
  8. It's a lot slower than uncompressed; we need to speed up Factor startup, not slow it down:
$ time ./factor -i=factor.image -e=""  
./factor -i=factor.image -e=""  0.10s user 0.02s system 71% cpu 0.170 total

$ time ./factor -i=factor.image.z -e=""
./factor -i=factor.image.z -e=""  0.18s user 0.04s system 90% cpu 0.235 total
  9. Are there compression formats that would support random access and allow us to uncompress pages as needed, instead of the whole image file? Perhaps using an index generated at compression time.

@nomennescio
Contributor Author

nomennescio commented May 13, 2024

Well, a few things need to be addressed, I guess, if you want to eventually merge it:

  1. Where does zstd.c come from? I don't see it in Facebook's release or source files.

It's in the header of zstd.c: generated with

 *	python combine.py -r ../../lib -x legacy/zstd_legacy.h -o zstd.c zstd-in.c

and 71671c mentions this is zstd release v1.5.6

  2. I don't generally like merge commits, and especially not ones with conflicts, because then we have to figure out if the code was appropriately de-conflicted...

Alright, we can cherry-pick

  3. I don't really like the binary.image.factor.compressor vocab name, maybe tools.image-compressor or something.

Not a problem; my thinking was to have all kinds of binary subvocabs dealing with binary formats, such as binary.elf, binary.coff, binary.pe etc., and add support packages for dealing with such blobs. A future idea might be intelligent parsing of all kinds of binary formats.

  4. I'm not sure I like this approach, and it adds a lot of overhead -- why aren't we just compressing the whole file instead of sections of it? What benefit does that give us?

What overhead are you referring to?

As for sections: the benefits are that there is no preloading stage, that you can selectively compress only code or only data (in case compression doesn't yield a benefit for one of them), and that you could use ZSTD dictionaries tuned separately for code and for data.
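
As an illustration of the dictionary idea (not something this PR implements), a minimal sketch using zstd's dictionary API; the wrapper name and parameters are hypothetical:

  #include <zstd.h>

  // Each section (data heap, code heap) could be compressed with a dictionary
  // trained on that kind of content; error checking omitted for brevity.
  size_t compress_section_with_dict(void* dst, size_t dst_capacity,
                                    const void* src, size_t src_size,
                                    const void* dict, size_t dict_size,
                                    int level) {
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    size_t n = ZSTD_compress_usingDict(cctx, dst, dst_capacity,
                                       src, src_size, dict, dict_size, level);
    ZSTD_freeCCtx(cctx);
    return n; // check with ZSTD_isError(n) in real code
  }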

  5. For what use cases do we really care about file size anyway, and have you looked to see what the bulk of the image file's size is taken up by? Maybe there are other approaches to reducing file size.

I care. It gives a 7-8x reduction in image size at negligible cost. Other approaches do not conflict with this approach at all.

  6. Does the VM look for the compressed file, or does this only work with -i=factor.image.compressed?

No, the VM transparently handles this, independent of filename.

  7. Does this work in a deployed image?

If it uses the same loader, it should work. If not, it's the same as for a regular image. Such a feature can be added later.

  8. It's a lot slower than uncompressed; we need to speed up Factor startup, not slow it down:
$ time ./factor -i=factor.image -e=""  
./factor -i=factor.image -e=""  0.10s user 0.02s system 71% cpu 0.170 total

$ time ./factor -i=factor.image.z -e=""
./factor -i=factor.image.z -e=""  0.18s user 0.04s system 90% cpu 0.235 total

I disagree. If you care that much about startup speed, you can just keep using an uncompressed image; both are transparently supported at the same time. I'm willing to pay a very small cost of 50 ms of startup time to get a 7-8x size reduction.

  9. Are there compression formats that would support random access and allow us to uncompress pages as needed, instead of the whole image file? Perhaps using an index generated at compression time.

ZSTD is the most efficient and fastest binary compression scheme, and it is therefore used by many bootloaders. I don't think we can improve upon that.

Theoretically the approach you mention is possible, but that can be supported separately. I don't think it will bring much.
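
For reference, a hypothetical sketch of the index-based idea from point 9, compressing independent fixed-size blocks with zstd's one-shot API so that single blocks could later be decompressed on demand; error handling is omitted, and this is not what the PR implements:

  #include <zstd.h>
  #include <algorithm>
  #include <cstdint>
  #include <vector>

  // Compress the image in fixed-size blocks and record where each compressed
  // block starts, so a single block can later be decompressed on demand
  // without touching the rest of the file.
  struct block_index {
    size_t block_size;              // uncompressed bytes per block
    std::vector<size_t> offsets;    // start of each compressed block in the output
    std::vector<size_t> comp_sizes; // compressed size of each block
  };

  std::vector<char> compress_blocks(const char* data, size_t size,
                                    size_t block_size, block_index& idx) {
    idx.block_size = block_size;
    std::vector<char> out;
    for (size_t pos = 0; pos < size; pos += block_size) {
      size_t n = std::min(block_size, size - pos);
      size_t bound = ZSTD_compressBound(n);
      size_t start = out.size();
      out.resize(start + bound);
      // error checks (ZSTD_isError) omitted for brevity
      size_t written = ZSTD_compress(out.data() + start, bound, data + pos, n, 3);
      out.resize(start + written);
      idx.offsets.push_back(start);
      idx.comp_sizes.push_back(written);
    }
    return out;
  }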

@nomennescio
Contributor Author

I think it's important that we're on the same page here. Do you understand that compression/decompression is done transparently in the VM loader, that it is determined by inspecting the header of an image file (not its name), and that you don't get a performance penalty if you keep using uncompressed images? And that by default Factor keeps using uncompressed images, which are not affected at all by this feature?
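
To make that concrete, a minimal sketch of the transparent-loading idea (names are illustrative, not the literal VM code): a section whose stored size equals its uncompressed size takes the old copy path unchanged, otherwise it is decompressed.

  #include <zstd.h>
  #include <cstring>

  // If a section is stored uncompressed, its stored size equals its
  // uncompressed size and it is copied as before; otherwise it is
  // decompressed into the destination.
  bool load_section(char* dst, size_t uncompressed_size,
                    const char* src, size_t stored_size) {
    if (stored_size == uncompressed_size) {
      memcpy(dst, src, stored_size); // old path, no extra cost
      return true;
    }
    size_t n = ZSTD_decompress(dst, uncompressed_size, src, stored_size);
    return !ZSTD_isError(n) && n == uncompressed_size;
  }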

@mrjbq7
Member

mrjbq7 commented May 13, 2024

Yes, I also understand it's 1) a lot slower than uncompressed, 2) not used by anyone by default, 3) not tested with deploy, 4) not necessarily the best approach to solving startup speed, and 5) the solution to #4 might change this a lot.

@nomennescio
Contributor Author

Yes, I also understand it's 1) a lot slower than uncompressed, 2) not used by anyone by default, 3) not tested with deploy, 4) not necessarily the best approach to solving startup speed, and 5) the solution to #4 might change this a lot.

Compression and startup speed are totally unrelated issues. I'm not working on startup speed with this issue; this is about compression. Please make a separate issue out of startup speed. Compression is never going to help you with that, and improving startup speed is never going to help with compression.

I don't get why you keep lumping these together.

As for 2: of course it's not used by anyone until we have such a feature...
As for 3: it should not impact deployment, as normal images are still 100% supported.

@mrjbq7
Member

mrjbq7 commented May 13, 2024

What is the point of implementing this if nothing ever uses it by default, if save doesn't save a compressed image, and if deploy doesn't deploy a compressed image? If this is just a side feature with no tests and only complexity, it will never be used.

The reason I lump this with startup performance is that 1) it makes startup slower, and 2) it's entirely possible that either whole-file compression or block-based compression is faster, and the latter would allow us to lazily uncompress segments, which we don't currently do and which would completely change what you have done.

As an aside, I don't like the malloc in load_data_heap; if the tenured space is not initialized large enough, we should error out. The old code path made sure it was initialized large enough, so why would that not be correct for the compressed path?

If you have RAM to hold a zlib uncompressed file, you certainly have hard drive space to hold an uncompressed image.

@nomennescio
Contributor Author

What is the point of implementing this if nothing ever uses it by default, if save doesn't save a compressed image, and if deploy doesn't deploy a compressed image? If this is just a side feature with no tests and only complexity, it will never be used.

I don't follow you. What do you mean by "if nothing ever uses it by default"? If you want to compress an image, you compress it, and presto, it just works. The reason I introduced the current defaults is to not disturb existing usage. I prefer such an approach: only people who explicitly want it use it, and the others don't notice anything different. Especially since people like you don't want to add 50 ms of startup time to Factor, which is fine. From there we can work on the other scenarios you describe. Consider it an alpha test with features to be added.

However, this feature has been 2 years in the making, and I don't want to postpone it another 2 years to do all kinds of intermediate steps while constantly merging in from master. It's a relatively small change which doesn't harm any future changes.

As for tests, it's not a problem to introduce tests.

The reason I lump this with startup performance is that 1) it makes startup slower, and 2) it's entirely possible that either whole-file compression or block-based compression is faster, and the latter would allow us to lazily uncompress segments, which we don't currently do and which would completely change what you have done.

It wouldn't change a thing about making things slower, as it's optional. If we come up with a better feature in the future, that's fine by me. But currently we have no compression at all, and this feature is ready to be integrated to provide it. If you think you can make compression and decompression faster and better, go ahead, but I won't hold my breath for that to happen. This feature won't block any such improvements at all. Not introducing a feature because we might find a better solution in the future goes against the usual philosophy of first getting things working, then improving upon them.

As an aside, I don't like the malloc in load_data_heap; if the tenured space is not initialized large enough, we should error out. The old code path made sure it was initialized large enough, so why would that not be correct for the compressed path?

That's fine; if you want to error out, I can implement it like that. The extra space is needed to load the compressed image and still have room for the uncompressed image; that's how decompression works. The malloc is only a fallback mechanism in case the image doesn't fit in the preallocated space, but typically it isn't needed.
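
A minimal sketch of the fallback behaviour described above (illustrative names, not the literal load_data_heap code):

  #include <cstddef>
  #include <cstdlib>

  // Prefer decompressing directly into the preallocated tenured space; only
  // fall back to a temporary malloc'd buffer when the uncompressed image
  // would not fit, which typically does not happen.
  char* pick_decompress_target(char* tenured_start, size_t tenured_size,
                               size_t uncompressed_size, bool& used_malloc) {
    if (uncompressed_size <= tenured_size) {
      used_malloc = false;
      return tenured_start; // common case: no temporary buffer, no extra copy
    }
    used_malloc = true;
    return static_cast<char*>(malloc(uncompressed_size)); // freed after copying into the heap
  }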

And b.t.w, the old code path doesn't make sure it's initialized large enough, it just picks fixed sizes in which Factor images currently just happen to fit. And the exact same old code path is still in place for uncompressed images, it's 100% backwards compatible.

If you have RAM to hold a zlib uncompressed file, you certainly have hard drive space to hold an uncompressed image.

That's certainly untrue, because it depends on personal use cases.

I have limited time to work on Factor, so I don't want this to drag on for another 2 years; that's a waste of my time, and I find it very frustrating. It also doesn't exactly encourage me to keep contributing, and I'm afraid that has happened to other contributors in the past, who stopped contributing and left the community. I'm open to accommodating improvements or changes you would like to see for this feature, but I don't think you've given arguments against introducing it other than, basically, that you wouldn't use it. I don't think this feature is going to hurt Factor in any way imaginable, nor does it impact any future work on better compression or better startup times, but it gives a nice feature to people who want to save disk space, of which I'm one.

I'm happy to also discuss things one-on-one to see if we maybe misunderstand each other.

@mrjbq7
Member

mrjbq7 commented May 13, 2024

Because if this is only so that you can load compressed images, it's not going to get merged.

It would get merged if:

  1. compressed images are things people want to use
  2. compressed images are supported by the tools
  3. compressed images are saved/loaded appropriately by the tools
  4. compressed images are natively supported on launch (e.g., looking for factor.image.z and if not found then factor.image), or perhaps that doesn't matter since the file embeds compression and a flag, but then save needs to change to save it compressed optionally
  5. not just tests, but literally if this is a one-user feature then it's not worth it, but if you paint a path of usefulness, sure.

I don't want to slow startup, so I think I'm unlikely to make compressed images a default, and I also don't want to miss an xz moment someday because we now have some complicated compression thing that isn't being used/tested enough.

I suppose you're saying: why not merge it, then we can delete it later if we dislike it. Well, in that context I might be interested, but you've previously wanted to be very careful about what public guarantees we make about things like the VM header flags and backwards compatibility, and that makes me a lot less interested in trying things and then getting stuck supporting them.

@mrjbq7
Member

mrjbq7 commented May 13, 2024

And b.t.w, the old code path doesn't make sure it's initialized large enough, it just picks fixed sizes in which Factor images currently just happen to fit. And the exact same old code path is still in place for uncompressed images, it's 100% backwards compatible.

Isn't h->data_size correct for the uncompressed size? Why wouldn't that buffer be big enough to uncompress into?

@nomennescio
Contributor Author

nomennescio commented May 13, 2024

Because if this is only so that you can load compressed images, it's not going to get merged.

It would get merged if:

  1. compressed images are things people want to use

Do you want to vote on that, or did you have something else in mind? I can't remember us ever doing that for another feature.

  2. compressed images are supported by the tools

Which tools are you missing?

  3. compressed images are saved/loaded appropriately by the tools

Which tools?

  4. compressed images are natively supported on launch (e.g., looking for factor.image.z and if not found then factor.image), or perhaps that doesn't matter since the file embeds compression and a flag, but then save needs to change to save it compressed optionally

This is already implemented. The only thing is that save currently does not call compress-image, but that's trivial to add. Just tell me how you'd like it.

  5. not just tests, but literally if this is a one-user feature then it's not worth it, but if you paint a path of usefulness, sure.

Isn't this the same as 1.?

I don't want to slow startup, so I think I'm unlikely to make compressed images a default, and I also don't want to miss an xz moment someday because we now have some complicated compression thing that isn't being used/tested enough.

I'm happy to thoroughly comment and review the source code and explain every nook and cranny.

I suppose you're saying: why not merge it, then we can delete it later if we dislike it. Well, in that context I might be interested, but you've previously wanted to be very careful about what public guarantees we make about things like the VM header flags and backwards compatibility, and that makes me a lot less interested in trying things and then getting stuck supporting them.

I don't see a risk here; if at some point you want to drop VM support, you just remove it. All that is needed is a utility that can read, uncompress, and re-save existing compressed images as uncompressed images, very similar to binary.image.factor.compressor.

And as we already discussed previously, at some point we're going to introduce the next version of the image header.
Again, I'm happy to thoroughly document and review the current image header choices and their implications. I have been very careful about backward and forward compatibility.

@nomennescio
Contributor Author

nomennescio commented May 13, 2024

And b.t.w, the old code path doesn't make sure it's initialized large enough, it just picks fixed sizes in which Factor images currently just happen to fit. And the exact same old code path is still in place for uncompressed images, it's 100% backwards compatible.

Isn't h->data_size correct for the uncompressed size? Why wouldn't that buffer be big enough to uncompress into?

  p->tenured_size = std::max((h->data_size * 3) / 2, p->tenured_size);

  data_heap *d = new data_heap(&nursery,
                               p->young_size, p->aging_size, p->tenured_size);

This indeed overallocates the data heap.
However, not so for the code heap:

  if (h->code_size > p->code_size)
    fatal_error("Code heap too small to fit image", h->code_size);

@mrjbq7 mrjbq7 merged commit 4b4b0e4 into master May 14, 2024
3 checks passed
@mrjbq7
Member

mrjbq7 commented May 14, 2024

I merged it so you don't have to worry about tracking old branches.

I reserve the right to change it later, or move things around.

@mrjbq7
Member

mrjbq7 commented May 14, 2024

This indeed overallocates the data heap. However, not so for the code heap

I don't understand why the inflated size would ever be larger than h->code_size and h->data_size.

@nomennescio
Contributor Author

I merged it so you don't have to worry about tracking old branches.

I reserve the right to change it later, or move things around.

Thank you. I'm happy to support or discuss any changes you'd like to see.

Successfully merging this pull request may close these issues: Optimizing size of factor.image?