This repository has been archived by the owner on Jul 11, 2019. It is now read-only.

Why is parallel so slow for me? #52

Open
d33tah opened this issue Aug 1, 2017 · 7 comments

Comments


d33tah commented Aug 1, 2017

d33tah@d33tah-pc:/tmp$ cat /tmp/test.sh 
#!/bin/bash

export LC_ALL=C

for i in `seq 3`; do

    yes "banana" | dd count=$(( 10 ** $i )) > /tmp/yes2

    time /usr/bin/parallel                --pipe cat </tmp/yes2 >/dev/null
    time /home/d33tah/.cargo/bin/parallel --pipe cat </tmp/yes2 >/dev/null

    echo

done
d33tah@d33tah-pc:/tmp$ bash /tmp/test.sh 
10+0 records in
10+0 records out
5120 bytes (5.1 kB, 5.0 KiB) copied, 0.000121935 s, 42.0 MB/s

real    0m0.185s
user    0m0.140s
sys     0m0.032s

real    0m0.145s
user    0m0.040s
sys     0m0.240s

100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.000211743 s, 242 MB/s

real    0m0.125s
user    0m0.088s
sys     0m0.024s

real    0m1.348s
user    0m0.344s
sys     0m2.632s

1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.00160011 s, 320 MB/s

real    0m0.134s
user    0m0.100s
sys     0m0.016s

real    0m13.676s
user    0m3.236s
sys     0m26.820s
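
The scaling in these timings is easier to see as ratios. The following is a minimal sketch that divides the second `real` time of each run (the cargo-installed parallel) by the first (/usr/bin/parallel), using only the numbers printed above; `slowdown_table` is a hypothetical helper name.

```shell
# Columns: input size in bytes, real time of /usr/bin/parallel,
# real time of the cargo-installed parallel (seconds, copied from the run above).
slowdown_table() {
    printf '%s\n' \
        '5120 0.185 0.145' \
        '51200 0.125 1.348' \
        '512000 0.134 13.676' |
    awk '{ printf "%d bytes: %.1fx\n", $1, $3 / $2 }'
}

slowdown_table
```

In this run, /usr/bin/parallel stays roughly flat, while the cargo-installed binary's time grows about tenfold for each tenfold increase in input size.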
Owner

mmstick commented Aug 1, 2017 via email

Author

d33tah commented Aug 1, 2017

@mmstick

Looks like that's not the case:

[15:48:09] ➜  /tmp  cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

Installed via cargo install.

Author

d33tah commented Aug 1, 2017

@mmstick any other ideas then?

Owner

mmstick commented Aug 2, 2017

I'll have to investigate this when I have time to put into this project. I'm still heavily engaged in Ion Shell development, which takes priority over this. I believe this may have to do with Parallel making a copy of the input file even though the input is already a file. A fix could be to check whether stdin is a file and, if so, use it directly. I'd need to get some perf profiling done to find the exact cause.
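
The "is stdin already a file?" check can be illustrated in shell. This is only a sketch of the idea, assuming a Linux system where /dev/stdin refers to file descriptor 0; `stdin_kind` is a hypothetical name, and an actual fix in Parallel would have to do the equivalent check in Rust.

```shell
# Report what kind of object is attached to standard input.
stdin_kind() {
    if [ -f /dev/stdin ]; then
        echo file      # regular file, e.g. "cmd < /tmp/yes2"
    elif [ -p /dev/stdin ]; then
        echo pipe      # e.g. "yes banana | cmd"
    else
        echo other     # terminal, socket, device, ...
    fi
}

# Demonstration with a pipe and with a redirected temporary file.
echo x | stdin_kind
tmp=$(mktemp)
stdin_kind < "$tmp"
rm -f "$tmp"
```

With the redirection used in the benchmark (`</tmp/yes2`), the check would report a regular file, which is the case where an extra copy of the input could be skipped.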

Once Ion is complete, I will be integrating it directly into Parallel, as I'll ensure that Ion can be called as a library. Then there won't be a need to call an external shell to execute commands, and it will be able to use Ion as a scripting language in the same way that GNU Parallel uses Perl. Will be a major performance and feature win, given that Ion is drastically superior to Dash, both in performance and feature set.

Something else you can try, though, is to compile Parallel with MUSL. It eliminates the shared dependencies on glibc, which carry a high cost for short-lived parallel tasks.

rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl


amosbird commented Oct 1, 2017

Hi, just out of curiosity, why would transparent_hugepages set to always hurt parallel's performance?

Owner

mmstick commented Oct 1, 2017

@amosbird It's because THP has an issue where it majorly ruins memory-related performance when a binary is using jemalloc. This is especially so when that program performs a lot of forks, such as this program, where most of its time is spent forking. If set to always, THP is always in effect and aggressively purges caches that are used by jemalloc.
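
To see which THP mode is in effect, look at the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled (in the output earlier in this thread, that is `madvise`). The following is a minimal sketch that extracts it from such a line; `active_thp_mode` is a hypothetical helper name.

```shell
# Print the active THP mode, i.e. the bracketed word in a line such as
# "always [madvise] never". Fails if the line has no bracketed word.
active_thp_mode() {
    thp_line=$1
    case $thp_line in
        *\[*\]*) ;;
        *) return 1 ;;
    esac
    thp_line=${thp_line#*\[}            # drop everything up to and including "["
    printf '%s\n' "${thp_line%%\]*}"    # drop "]" and everything after it
}

# The line reported earlier in this thread:
active_thp_mode 'always [madvise] never'
```

On a live system it could be called as `active_thp_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"`.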

Owner

mmstick commented Oct 1, 2017

So I have a new project -- concurr. Still in its early stages, but it has a service (concurr-jobsd) and associated client for controlling nodes with that service running (concurr). Syntax will be very similar to Parallel, but it won't be drop-in compatible -- taking a different route.

The server is built using Tokio, and executes each command within embedded instances of the Ion shell. The client sends a command template to each configured node (which can contain multiple commands), and then asynchronously submits inputs to execute to each slot on each node, and then reads the results back in the order of submission. So distributed computing capabilities are a big feature with the new solution.
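
The "submit asynchronously, read back in submission order" idea can be shown on a single machine with background jobs. This is only a local sketch of the ordering behaviour, not concurr's networked Tokio implementation; `run_in_order` and `slow_echo` are hypothetical names.

```shell
# Run "$1 ARG" for each remaining ARG concurrently, then print the
# outputs in the order the inputs were given.
run_in_order() {
    rio_cmd=$1
    shift
    rio_dir=$(mktemp -d) || return 1
    rio_n=0
    for rio_arg in "$@"; do
        rio_n=$((rio_n + 1))
        # Each job writes to its own numbered file so outputs cannot interleave.
        "$rio_cmd" "$rio_arg" > "$rio_dir/$rio_n" 2>&1 &
    done
    wait    # block until every background job has finished
    rio_i=1
    while [ "$rio_i" -le "$rio_n" ]; do
        cat "$rio_dir/$rio_i"
        rio_i=$((rio_i + 1))
    done
    rm -rf "$rio_dir"
}

# The slowest job is submitted first, yet its output is still printed first.
slow_echo() { sleep "$1"; echo "slept $1"; }
run_in_order slow_echo 0.3 0.2 0.1
```

Buffering each job's output separately is what lets the jobs finish in any order while the results are still reported in input order.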

The client is currently very basic though. Syntax is as follows:

concurr 'COMMAND TO EXECUTE {}' : arg1 arg2 arg3 arg4
concurr ' COMMAND {}' :: file1 file2 file3

It doesn't yet support reading from stdin, or permuting inputs, or any of the more advanced optional features of Parallel (only on day 3 of development). I'll be working on that shortly. But it does offer TOML configuration and XDG app dir support. Example config:

# A list of nodes that the client will connect to.
nodes = [
    "127.0.0.1:31514",
    "192.168.1.3:31514",
    "192.168.1.194:31514"
]

# Defines whether the client should request outputs of inputs.
outputs = true

@mmstick mmstick added this to the 0.12 Rewrite milestone Dec 4, 2017