xz decompress very time consuming #15

Open
cdebaixo opened this issue Dec 17, 2020 · 10 comments

Comments

@cdebaixo

I have found that decompressing an xz file with the SWCompression package is extremely slow on iOS. For example, a 45.7 MB compressed file takes over 15 minutes to decompress, whereas decompressing the same file on a Mac using standard decompression tools takes a matter of seconds. Is there a way to speed up the process within my iOS app?

@tsolomko
Owner

Hi,

The most important thing for performance right now is building in Release mode, so please make sure that SWCompression is built in Release mode.

Please let me know if this helps or not, or if you're already building in Release mode.

@cdebaixo
Author

Yes, this makes a major difference. In Release mode I am able to decompress the file within 2 minutes, which is quite acceptable. Thanks for your help.

@cdebaixo
Author

I do, however, have a further issue. Because my compressed file is relatively large, the memory required for the decompressed data (in this case over 250 MB) exceeds my memory resource limits. Is there a way to decompress directly to an output file as opposed to storing the data in memory?

@cdebaixo
Author

One solution I have tried is to read the xz file in 8192-byte chunks and pass each chunk to the XzArchive.unarchive function, but this fails with the error "Precondition failed: file BitByteData/ByteReader.swift, line 59".
Is this a viable solution?

@tsolomko
Owner

tsolomko commented Dec 21, 2020

Sorry for the long response time: my laptop is in service so I have to resort to using mobile devices.

> One solution I have tried is to read the xz file in 8192-byte chunks and pass each chunk to the XzArchive.unarchive function, but this fails with the error "Precondition failed: file BitByteData/ByteReader.swift, line 59".
> Is this a viable solution?

No, it's not. The XzArchive.unarchive function expects the entire archive to be passed as its argument. Moreover, the XZ format doesn't provide any reliable way to determine the end of compressed data, so it's impossible to say ahead of time how much data is enough for successful processing. While there is the so-called "index" at the end of an XZ "stream", which could help in determining the boundaries of compressed data, an XZ file can consist of multiple "streams", so again we can't really know the structure of the file without reading it entirely.
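
To illustrate why an arbitrary chunk is not a valid archive on its own, here is a minimal sketch that only checks the magic bytes defined by the XZ file format at the start and end of a buffer. The function names are hypothetical and are not part of SWCompression; the footer check also assumes there is no stream padding after the last stream.

```swift
import Foundation

// Magic bytes from the XZ file format: stream header and stream footer.
let xzHeaderMagic: [UInt8] = [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00]
let xzFooterMagic: [UInt8] = [0x59, 0x5A] // ASCII "YZ"

/// Returns true if `data` begins with the XZ stream header magic.
/// (Hypothetical helper, for illustration only.)
func startsWithXZHeader(_ data: Data) -> Bool {
    return data.prefix(xzHeaderMagic.count).elementsEqual(xzHeaderMagic)
}

/// Returns true if `data` ends with the XZ stream footer magic.
/// Assumes no stream padding follows the footer.
func endsWithXZFooter(_ data: Data) -> Bool {
    return data.suffix(xzFooterMagic.count).elementsEqual(xzFooterMagic)
}
```

The first 8192 bytes of a large file would typically pass the header check but fail the footer check, and every later chunk would fail both, which is consistent with a one-shot decoder running out of input.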

> I do, however, have a further issue. Because my compressed file is relatively large, the memory required for the decompressed data (in this case over 250 MB) exceeds my memory resource limits. Is there a way to decompress directly to an output file as opposed to storing the data in memory?

I understand and sympathize with the issue, and I actually considered it back when I started this project. The original idea was that if memory is a concern, then the data should be read from the file using the Data.ReadingOptions.mappedIfSafe option of the Data.init(contentsOf:options:) initializer. I think this recommendation was even mentioned in the README at one point.

Some time later I abandoned this idea. Firstly, I did some very basic tests, and it seemed that this option doesn't change anything in terms of memory consumption. Secondly, around the time Swift 4.2 was released I pursued some very risky optimizations which relied on Data having contiguous backing storage. My understanding at the time was that using this option could lead to non-contiguous data, because the option (and the initializer) is actually a part of NSData, and this (bridging(?) between NSData and Data) was one of the few cases when Data could have non-contiguous storage. Anyway, since the second point is no longer relevant, I would recommend that you try this option yourself; maybe it will help you.
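
For reference, a minimal sketch of what trying this option could look like. The helper name decompressMappedFile is hypothetical, and the decompression step is passed in as a closure so that the snippet does not depend on any particular library.

```swift
import Foundation

/// Reads the file at `url` using memory mapping when Foundation considers it
/// safe, then hands the bytes to a one-shot decompression function.
/// (Hypothetical helper, for illustration only.)
func decompressMappedFile(
    at url: URL,
    using decompress: (Data) throws -> Data
) throws -> Data {
    // .mappedIfSafe asks Foundation to map the file into virtual memory
    // instead of copying its contents, if it is safe to do so.
    let archive = try Data(contentsOf: url, options: .mappedIfSafe)
    return try decompress(archive)
}
```

With SWCompression the closure would presumably be `{ try XzArchive.unarchive(archive: $0) }`. Note that this can only affect the input side: the decompressed result is still returned as a single in-memory Data value.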

In an ideal world, I should provide a way to decompress data from files in a streaming manner to help in situations when such files are too big to load into memory completely. I haven't had a chance to look into that in detail. Maybe Foundation's FileHandle could be helpful here, but I have some concerns about its performance (since it is a class, and classes negatively impact performance in Swift) and about how platform-independent it is (I need it to work on both Darwin and Linux).
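
As a rough idea of the I/O side only, here is a sketch of reading a file in fixed-size chunks with FileHandle. The helper forEachChunk is hypothetical, and SWCompression does not provide a streaming XZ decoder that could consume these chunks; XzArchive.unarchive cannot be used this way.

```swift
import Foundation

/// Reads the file at `url` in chunks of at most `chunkSize` bytes and passes
/// each chunk to `body`, so the whole file is never held in memory at once.
/// (Hypothetical helper, for illustration only.)
func forEachChunk(
    ofFileAt url: URL,
    chunkSize: Int = 8192,
    body: (Data) throws -> Void
) throws {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }
    while true {
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break } // an empty result signals end of file
        try body(chunk)
    }
}
```

A streaming decoder would need to keep its own state between calls to `body` and write its output incrementally, which is the part that does not exist yet.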

@cdebaixo
Author

Thanks for your response. I will try out your suggested solution. I'll let you know how it goes.

@rudedogg

rudedogg commented Apr 18, 2021

Hi, I wanted to chime in that I'm having performance issues with XZ decompression too. My test file is the 0.7.1 macOS release on https://ziglang.org/download/.

In a release build it takes ~9 seconds (on an i7-8700K). I haven't been patient enough to let a debug build finish decompressing, but I have probably waited a few minutes with my CPU pegged.

I wish I was more familiar with writing low level Swift code like this so I could help look for ways to improve performance. I ran the profiler and nothing obvious jumped out at me as being the issue, though it could just be that I'm not familiar with the codebase. However, these swift_release and swift_retain times do seem high (I don't use the Profiling tools very often, so maybe it's nothing).

[image: profiler screenshot]

@tsolomko
Owner

Hi @rudedogg,

I am somewhat familiar with these swift_release/retain-related performance issues (in some circles I think this is also referred to as "ARC traffic"), though in a slightly different context, which I will discuss at the end.

This performance problem arises from the heavy usage of classes. Basically, each time you access a class and/or its property you incur a swift_retain/swift_release call. In this specific case of the LZMA/LZMA2 implementation the solution is quite simple: convert classes into structs.
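
To show the kind of change meant here, below is a toy sketch. The types ToyModelClass and ToyModelStruct are made up for illustration and are not SWCompression's actual code; they only demonstrate the same small piece of state written first as a class and then as a struct.

```swift
/// Class version: each instance is a reference-counted heap object.
final class ToyModelClass {
    var probability: UInt16 = 1024

    func update(bit: Int) {
        if bit == 0 {
            probability += (2048 - probability) >> 5
        } else {
            probability -= probability >> 5
        }
    }
}

/// Struct version: identical logic, but the value is stored inline and
/// methods that change state must be marked `mutating`.
struct ToyModelStruct {
    var probability: UInt16 = 1024

    mutating func update(bit: Int) {
        if bit == 0 {
            probability += (2048 - probability) >> 5
        } else {
            probability -= probability >> 5
        }
    }
}
```

The struct has no reference count of its own to maintain, which is why this kind of conversion can reduce retain/release traffic in hot loops. The trade-off is that copies are independent values, so code that relied on sharing one instance has to be restructured.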

With this change implemented, I observed the following performance improvements using your file on my computer:

before: approx. 11.5 secs
after: approx. 7.2 secs

These improvements seem to be somewhat input-dependent, since in some of my other benchmarks I even noticed a 2x speed-up. So thank you for pointing out this problem!

I will release these changes as part of the 4.5.9 update, but since the release of Swift 5.4 seems to be imminent, I will probably wait until the new Swift is released to check whether anything is broken there. Meanwhile, you can check out these changes in the hotfix-4.5.9 branch.


As a side note, I've encountered these swift_retain/release issues before in my other project, which SWCompression relies on for data reading capabilities. Incidentally, this is a main source of the suboptimal performance of SWCompression in general. Unfortunately, in this case the issue cannot be easily resolved by converting classes into structs, because the implementation relies a lot on the reference semantics of classes. One can definitely do it regardless by sprinkling inout and mutating keywords, as well as ampersands, all over the code, but at that point it feels like you're trying to shoehorn structs into being classes.
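
A small sketch of what that "sprinkling" looks like in practice. SketchReader is a hypothetical type written for this example and is not the actual reader API; the point is only that every helper which advances the reader must take it as inout, and every call site needs an ampersand.

```swift
/// A struct-based byte reader (hypothetical, for illustration only).
struct SketchReader {
    let bytes: [UInt8]
    private(set) var offset = 0

    init(_ bytes: [UInt8]) {
        self.bytes = bytes
    }

    /// Returns the next byte and advances the offset.
    mutating func byte() -> UInt8 {
        defer { offset += 1 }
        return bytes[offset]
    }
}

/// With a class-based reader the parameter could be passed as a plain
/// reference. With a struct it must be `inout`, otherwise the caller's
/// reader would not advance.
func readUInt16LE(from reader: inout SketchReader) -> UInt16 {
    let low = UInt16(reader.byte())
    let high = UInt16(reader.byte())
    return low | (high << 8)
}

var reader = SketchReader([0x34, 0x12, 0xFF])
let value = readUInt16LE(from: &reader) // note the ampersand at the call site
```

Every layer that touches the reader needs the same treatment, which is the shoehorning effect described above.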

In case you're wondering why it was implemented this way, I should note that the majority of the code was written in the pre-Swift 4.2 era, when there was no (observable) performance difference between structs and classes. With Swift 4.2 came (a stricter version of) the "exclusivity enforcement" feature, which causes property access for classes to be noticeably slower than for structs. This is actually why so much time is being spent inside swift_retain/release.

@rudedogg

@tsolomko Thank you so much for the explanation, I learned a lot and you gave me some things to think about in my own projects. And I look forward to giving 4.5.9 a spin when it is released!

@tsolomko
Owner

tsolomko commented May 1, 2021

4.5.9 has been released.
