Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report thread: WIP asyncio (aka multi-threading) #1334

Open
t-mat opened this issue Dec 29, 2023 · 19 comments
Open

report thread: WIP asyncio (aka multi-threading) #1334

t-mat opened this issue Dec 29, 2023 · 19 comments
Milestone

Comments

@t-mat
Copy link
Contributor

t-mat commented Dec 29, 2023

Let's discuss about / play with asyncio branch.

Since we don't have GitHub "Discussions", I'd like to put some report here.

Please feel free to post your own experiment.

@t-mat
Copy link
Contributor Author

t-mat commented Dec 29, 2023

I've tested new WIP asyncio branch a bit on Windows. Even on WSL2, it works great.

  • Target file : the almost incompressable file ggml-alpaca-7b-q4.bin
  • Target environments
    • WSL2 (ubuntu 22, gcc-11.4, on NTFS (slow file I/O))
    • Windows 11 (Visual Studio 12, cl.exe 19.38.33133)

In this case WSL (gcc, pthread) version with --best works 7 times faster than single threaded MSVC version (95.0 sec -> 13.1 sec).

Task manager's CPU graph reports that WSL version works fine and it utilizes all cores well.

lz4-multi-threaded

It seems --fast on WSL has some issue. But I think it's mainly caused by slow NTFS I/O. I'll check it with WSL's "native" file system later.

Log
# branch

> git branch -v
* asyncio c6762ec [behind 3] fixed meson build
# WSL

$ ./programs/lz4 --version
*** lz4 v1.9.5 64-bits multithread, by Yann Collet ***

$ time ./programs/lz4 -f --fast ./ggml-alpaca-7b-q4.bin
Compressed filename will be : ./ggml-alpaca-7b-q4.bin.lz4
Compressed 4212727017 bytes into 4212583972 bytes ==> 100.00%

real    0m19.549s
user    0m1.357s
sys     0m1.304s

$ time ./programs/lz4 -f --best ./ggml-alpaca-7b-q4.bin
Compressed filename will be : ./ggml-alpaca-7b-q4.bin.lz4
Compressed 4212727017 bytes into 4194322100 bytes ==> 99.56%

real    0m13.084s
user    2m2.647s
sys     0m2.135s
# MSVC (Powershell)

> .\bin\x64_Release\lz4.exe --version
*** lz4 v1.9.5 64-bits single-thread, by Yann Collet ***

> Measure-Command { .\bin\x64_Release\lz4.exe -f --fast ..\..\ggml-alpaca-7b-q4.bin test.lz4 }
Compressed 4212727017 bytes into 4212583972 bytes ==> 100.00%

TotalSeconds      : 9.158449

> Measure-Command { .\bin\x64_Release\lz4.exe -f --best ..\..\ggml-alpaca-7b-q4.bin test.lz4 }
Compressed 4212727017 bytes into 4194322100 bytes ==> 99.56%

TotalSeconds      : 94.95627

@Cyan4973
Copy link
Member

Cyan4973 commented Dec 29, 2023

I plan to make a dedicated backend for thread management on Windows,
but I first need to get this pthread one done correctly.
Slowly getting there ...

And yes, I/O speed can be a problem for this test, so I also tend to test with higher compression levels, as you did, to properly observe cpu utilization. That's also the scenario that's likely going to benefit the most.

@t-mat
Copy link
Contributor Author

t-mat commented Dec 29, 2023

This should be disucussed after the pthread stabilization,
but as for MSVC, how about <threads.h> (C11) ?

They don't fully support it yet. But they recently began to support it.

With preview version of MSVC, above workaround threads.h, this trivial patch and adding /std:c11 compiler option, it seems MSVC version is partially working with <threads.h>

> .\bin\x64_Release\lz4.exe --version
*** lz4 v1.9.5 64-bits multithread, by Yann Collet ***


# lz4.exe --fast doesn't work for some reason (not run on debugger yet)
# EDIT : I test same command with/without debugger, but this error never happened.  
# It seems this error caused by my experiment environment/setup.

> Measure-Command { .\bin\x64_Release\lz4.exe -f --fast ..\..\ggml-alpaca-7b-q4.bin test.lz4 }
Read : 0 MiB   ==> inf%   Error 38 : Write error : cannot write compressed block


# lz4.exe --best works fine

> Measure-Command { .\bin\x64_Release\lz4.exe -f --best ..\..\ggml-alpaca-7b-q4.bin test.lz4 }
Compressed 4212727017 bytes into 4194322100 bytes ==> 99.56%

TotalSeconds      : 10.390827

Definitely we (or they) need some debugging.
But if we can wait their release, <threads.h> seems promising candidate since lz4 CLI always support C/C++ standards.

@Cyan4973
Copy link
Member

This is a possible solution.
I also like the idea of trying out completion ports.
This can be a playground for diverse strategies.

@Cyan4973
Copy link
Member

Cyan4973 commented Dec 29, 2023

Since we don't have GitHub "Discussions", I'd like to put some report here.

In case it would prove useful,
I've enabled the dicussions feature on this repository.

Let me know if you feel that's useful or redundant.

Also, @t-mat, let me know if you are lacking some control / privileges to properly manage it.
You are already part of lz4 organization, but it's never clear to me if you have sufficient rights.

Maybe I should setup an additional dummy account, just to experience right management properly...

@t-mat
Copy link
Contributor Author

t-mat commented Dec 30, 2023

Also, @t-mat, let me know if you are lacking some control / privileges to properly manage it.

Thanks for the update! I've tried to the "convert to discussion" button at the bottom right pane, but no luck. It reports error with this message: "Unable to convert this issue to discussion."

lz4-project-convert-to-discussion-error

I also don't see the "Settings" tab at the top (right) of the lz4 repo UI. It seems we need some more setup on my account, lz4 organization or lz4 repo.

@Cyan4973
Copy link
Member

The MT feature is now merged indev branch

@t-mat
Copy link
Contributor Author

t-mat commented Dec 31, 2023

Almost a note for myself.

We need new versionsTest which includes both single- and multi-threaded lz4 CLI.
It also needs bigger file which must be big enough to test potential issue of multi-threading.

Not sure we need different block size test or not. e.g. Give artificially small blocks with --fast to increase potential race condition.

@Cyan4973 , do you have a good idea / requirement for MT-enable versionsTest ?

@Cyan4973
Copy link
Member

I think versionsTest is focused on ensuring that the output of any version is compatible as an input to any other version.
I see other settings, such as multithreading, --fast or block size, as mildly incidental to this objective.

I'm concerned by the potential complexity of the matrix of settings if we start to actively employ such advanced features,
as their support across versions become conditional. And then there is the question "where to stop" ?

Ensuring that at least one sample is at least larger than jobSize (4 MB) looks good to me, as it does not impact the test logic, and passively test multithreading on most recent dev.

@t-mat
Copy link
Contributor Author

t-mat commented Dec 31, 2023

Sure. So I think versionsTest should be same, keep it simple.

  • versionsTest : Test (single-threaded) compatibility between all released versions and the HEAD.

We will be able to introduce new test which may implicitly rely on versionsTest.

  • new-test : Test between single-thread mode and multi-threaded mode of lz4 CLI.

@t-mat
Copy link
Contributor Author

t-mat commented Dec 31, 2023

Just an flash idea: CLI accepts relative number of threads

Motivation

Although this should be done by external (shell) script, it may be convenient.
Especially in the simple batch invocation for backup, etc.

Possible argument notation

I think the most "intuitive" notation is

--threads=50%

But if special chracter % must be avoided,
we can employ dedicated long command name such as

--threads-in-percentage=50

Code

Actual nbWorkers becomes

nbWorkers = max(arg_threads_in_percentage * UTIL_countCores() / 100, 1);

@Cyan4973
Copy link
Member

Cyan4973 commented Jan 1, 2024

It certainly is technically possible,
but I don't (yet) see why such a capability would be worth maintaining ?

@t-mat
Copy link
Contributor Author

t-mat commented Jan 1, 2024

Possible advantages of --threads-in-percentage=N are

  • Simply it's convenient, since POSIX doesn't have portable command/variable. (Am I right?)
  • Instead of using hidden logic (LZ4IO_defaultNbWorkers()), we can provide explicit default value e.g. --threads-in-percentage=87.
  • Users also define their own value in similar fashion. e.g. --threads-in-percentage=50

@Cyan4973
Copy link
Member

Cyan4973 commented Jan 1, 2024

On first thought, I would expect users interested in explicit control to provide the nb of threads they want as a direct value, such as -T# allows.
What I'm less sure to understand is in which cases someone would prefer to express it as a %?

@Cyan4973
Copy link
Member

Cyan4973 commented Jan 3, 2024

I also don't see the "Settings" tab at the top (right) of the lz4 repo UI. It seems we need some more setup on my account, lz4 organization or lz4 repo.

@t-mat I made some changes to the repository rights, it should give you more controls now.

@t-mat t-mat added this to the v1.9.5 milestone Feb 20, 2024
@t-mat
Copy link
Contributor Author

t-mat commented Feb 20, 2024

@Cyan4973 I just marked this functionality as a part of the next release. But please change it if it's not a part of it.

@Cyan4973
Copy link
Member

@Cyan4973 I just marked this functionality as a part of the next release. But please change it if it's not a part of it.

Yes, it will be

@ottokang
Copy link
Contributor

So after 1.9.5, LZ4 will have multi threaded cli tool?

@Cyan4973
Copy link
Member

yep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants