Blocking vs async performance #48

skyne98 · 2024-02-05T14:33:39Z

Hi! Recently did a small test to measure the performance and got very surprising results. (on aurora merged)

Average time per 1000 runs per 34707 verts: 336ms 231us 70ns
Average time per 1000 runs per 1000 verts: 9ms 889us 149ns
Average time per 1000 runs per 34707 verts (async): 217us 79ns
Average time per 1000 runs per 1000 verts (async): 6us 384ns

Can someone explain if maybe I did something wrong (see code below) or how the async version is so much faster? It doesn't look multithreaded, so how?

#![feature(core_intrinsics)]

use std::sync::Arc;

use glam::Vec2;
use humantime::format_duration;
use polyanya::{Mesh, PolyanyaFile};
use tokio::sync::RwLock;

fn run_and_measure<F: FnOnce() -> R, R>(f: F) -> (R, std::time::Duration) {
    let start = std::time::Instant::now();
    let result = f();
    let elapsed = start.elapsed();
    (result, elapsed)
}

#[tokio::main]
async fn main() {
    let mesh: Mesh = PolyanyaFile::from_file("meshes/aurora-merged.mesh").into();
    let mesh_min_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let mesh_min_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let width = mesh_max_x - mesh_min_x;
    let height = mesh_max_y - mesh_min_y;

    let runs = 10_000;
    let duration = run_and_measure(|| {
        for _ in 0..runs {
            let start_x = rand::random::<f32>() * width + mesh_min_x;
            let start_y = rand::random::<f32>() * height + mesh_min_y;
            let end_x = rand::random::<f32>() * width + mesh_min_x;
            let end_y = rand::random::<f32>() * height + mesh_min_y;

            std::intrinsics::black_box(
                mesh.path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)),
            );
        }
    });

    let duration_per_1000_runs = duration.1 / (runs / 1_000);
    let vertices = mesh.vertices.len();
    println!(
        "Average time per 1000 runs per {} verts: {}",
        vertices,
        format_duration(duration_per_1000_runs).to_string()
    );

    let duration_per_1000_runs_per_1000_verts = duration_per_1000_runs / (vertices as u32 / 1_000);
    println!(
        "Average time per 1000 runs per 1000 verts: {}",
        format_duration(duration_per_1000_runs_per_1000_verts).to_string()
    );

    // Now try the async version
    let runs = 1_000_000;
    let rw_lock_mesh = Arc::new(RwLock::new(mesh));
    let duration = run_and_measure(|| {
        let mut futures = vec![];
        for _ in 0..runs {
            let start_x = rand::random::<f32>() * width + mesh_min_x;
            let start_y = rand::random::<f32>() * height + mesh_min_y;
            let end_x = rand::random::<f32>() * width + mesh_min_x;
            let end_y = rand::random::<f32>() * height + mesh_min_y;

            let rw_lock_mesh = rw_lock_mesh.clone();
            futures.push(async move {
                let mesh = rw_lock_mesh.read().await;
                std::intrinsics::black_box(
                    mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y))
                        .await,
                );
            });
        }
        futures::future::join_all(futures)
    });

    let duration_per_1000_runs = duration.1 / (runs / 1_000);
    println!(
        "Average time per 1000 runs per {} verts (async): {}",
        vertices,
        format_duration(duration_per_1000_runs).to_string()
    );

    let duration_per_1000_runs_per_1000_verts = duration_per_1000_runs / (vertices as u32 / 1_000);
    println!(
        "Average time per 1000 runs per 1000 verts (async): {}",
        format_duration(duration_per_1000_runs_per_1000_verts).to_string()
    );
}

Thanks for the answer in advance! Very interested in the details :P

The text was updated successfully, but these errors were encountered:

mockersf · 2024-02-05T17:44:04Z

I tried in a benchmark using criterion

path sync           time:   [326.31 µs 335.50 µs 344.31 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

path async futures  time:   [327.87 µs 337.28 µs 346.35 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

path async smol     time:   [331.01 µs 362.15 µs 393.84 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

path async tokio    time:   [319.19 µs 328.30 µs 337.53 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

source for the bench

use criterion::async_executor::{FuturesExecutor, SmolExecutor};
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use glam::Vec2;
use polyanya::{Mesh, PolyanyaFile};
use rand::{rngs::StdRng, Rng, SeedableRng};

fn path_sync_vs_async(c: &mut Criterion) {
    let mesh: Mesh = PolyanyaFile::from_file("meshes/aurora-merged.mesh").into();
    let mesh_min_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let mesh_min_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let width = mesh_max_x - mesh_min_x;
    let height = mesh_max_y - mesh_min_y;

    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path sync".to_string(), |b| {
        b.iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async futures".to_string(), |b| {
        b.to_async(FuturesExecutor).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async smol".to_string(), |b| {
        b.to_async(SmolExecutor).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async tokio".to_string(), |b| {
        let rt = tokio::runtime::Runtime::new().unwrap();
        b.to_async(rt).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });
}

criterion_group!(benches, path_sync_vs_async);
criterion_main!(benches);

sync and async with the futures crate seem to have the same kind of performance, a tiny bit slower with smog, and a tiny bit faster with tokio.

I don't have a good explanation, maybe tokio is better at small tasks?

skyne98 · 2024-02-06T15:25:50Z

My bad! I wasn't awaiting the join_all properly in my test! I would consider the issue resolved 👍🏻
One last question I have though: what's the idea behind the async endpoint? Is it better than just quickly creating a task and calculating a path inside? 🤔 I think it's the first time I see an async endpoint in any kind of pathfinding library, so I am curious!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blocking vs async performance #48

Blocking vs async performance #48

skyne98 commented Feb 5, 2024

mockersf commented Feb 5, 2024

skyne98 commented Feb 6, 2024

Blocking vs async performance #48

Blocking vs async performance #48

Comments

skyne98 commented Feb 5, 2024

mockersf commented Feb 5, 2024

skyne98 commented Feb 6, 2024