Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocking vs async performance #48

Open
skyne98 opened this issue Feb 5, 2024 · 2 comments
Open

Blocking vs async performance #48

skyne98 opened this issue Feb 5, 2024 · 2 comments

Comments

@skyne98
Copy link

skyne98 commented Feb 5, 2024

Hi! Recently did a small test to measure the performance and got very surprising results. (on aurora merged)

Average time per 1000 runs per 34707 verts: 336ms 231us 70ns
Average time per 1000 runs per 1000 verts: 9ms 889us 149ns
Average time per 1000 runs per 34707 verts (async): 217us 79ns
Average time per 1000 runs per 1000 verts (async): 6us 384ns

Can someone explain if maybe I did something wrong (see code below) or how the async version is so much faster? It doesn't look multithreaded, so how?

#![feature(core_intrinsics)]

use std::sync::Arc;

use glam::Vec2;
use humantime::format_duration;
use polyanya::{Mesh, PolyanyaFile};
use tokio::sync::RwLock;

fn run_and_measure<F: FnOnce() -> R, R>(f: F) -> (R, std::time::Duration) {
    let start = std::time::Instant::now();
    let result = f();
    let elapsed = start.elapsed();
    (result, elapsed)
}

#[tokio::main]
async fn main() {
    let mesh: Mesh = PolyanyaFile::from_file("meshes/aurora-merged.mesh").into();
    let mesh_min_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let mesh_min_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let width = mesh_max_x - mesh_min_x;
    let height = mesh_max_y - mesh_min_y;

    let runs = 10_000;
    let duration = run_and_measure(|| {
        for _ in 0..runs {
            let start_x = rand::random::<f32>() * width + mesh_min_x;
            let start_y = rand::random::<f32>() * height + mesh_min_y;
            let end_x = rand::random::<f32>() * width + mesh_min_x;
            let end_y = rand::random::<f32>() * height + mesh_min_y;

            std::intrinsics::black_box(
                mesh.path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)),
            );
        }
    });

    let duration_per_1000_runs = duration.1 / (runs / 1_000);
    let vertices = mesh.vertices.len();
    println!(
        "Average time per 1000 runs per {} verts: {}",
        vertices,
        format_duration(duration_per_1000_runs).to_string()
    );

    let duration_per_1000_runs_per_1000_verts = duration_per_1000_runs / (vertices as u32 / 1_000);
    println!(
        "Average time per 1000 runs per 1000 verts: {}",
        format_duration(duration_per_1000_runs_per_1000_verts).to_string()
    );

    // Now try the async version
    let runs = 1_000_000;
    let rw_lock_mesh = Arc::new(RwLock::new(mesh));
    let duration = run_and_measure(|| {
        let mut futures = vec![];
        for _ in 0..runs {
            let start_x = rand::random::<f32>() * width + mesh_min_x;
            let start_y = rand::random::<f32>() * height + mesh_min_y;
            let end_x = rand::random::<f32>() * width + mesh_min_x;
            let end_y = rand::random::<f32>() * height + mesh_min_y;

            let rw_lock_mesh = rw_lock_mesh.clone();
            futures.push(async move {
                let mesh = rw_lock_mesh.read().await;
                std::intrinsics::black_box(
                    mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y))
                        .await,
                );
            });
        }
        futures::future::join_all(futures)
    });

    let duration_per_1000_runs = duration.1 / (runs / 1_000);
    println!(
        "Average time per 1000 runs per {} verts (async): {}",
        vertices,
        format_duration(duration_per_1000_runs).to_string()
    );

    let duration_per_1000_runs_per_1000_verts = duration_per_1000_runs / (vertices as u32 / 1_000);
    println!(
        "Average time per 1000 runs per 1000 verts (async): {}",
        format_duration(duration_per_1000_runs_per_1000_verts).to_string()
    );
}

Thanks for the answer in advance! Very interested in the details :P

@mockersf
Copy link
Member

mockersf commented Feb 5, 2024

I tried in a benchmark using criterion

path sync           time:   [326.31 µs 335.50 µs 344.31 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

path async futures  time:   [327.87 µs 337.28 µs 346.35 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

path async smol     time:   [331.01 µs 362.15 µs 393.84 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

path async tokio    time:   [319.19 µs 328.30 µs 337.53 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
source for the bench
use criterion::async_executor::{FuturesExecutor, SmolExecutor};
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use glam::Vec2;
use polyanya::{Mesh, PolyanyaFile};
use rand::{rngs::StdRng, Rng, SeedableRng};

fn path_sync_vs_async(c: &mut Criterion) {
    let mesh: Mesh = PolyanyaFile::from_file("meshes/aurora-merged.mesh").into();
    let mesh_min_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_x = mesh
        .vertices
        .iter()
        .map(|v| v.coords.x)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let mesh_min_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::INFINITY, |a, b| a.min(b));
    let mesh_max_y = mesh
        .vertices
        .iter()
        .map(|v| v.coords.y)
        .fold(f32::NEG_INFINITY, |a, b| a.max(b));
    let width = mesh_max_x - mesh_min_x;
    let height = mesh_max_y - mesh_min_y;

    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path sync".to_string(), |b| {
        b.iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async futures".to_string(), |b| {
        b.to_async(FuturesExecutor).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async smol".to_string(), |b| {
        b.to_async(SmolExecutor).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });

    // recreate random from the same seed
    let mut random = StdRng::from_seed([0; 32]);
    c.bench_function(&"path async tokio".to_string(), |b| {
        let rt = tokio::runtime::Runtime::new().unwrap();
        b.to_async(rt).iter(|| {
            let start_x = random.gen::<f32>() * width + mesh_min_x;
            let start_y = random.gen::<f32>() * height + mesh_min_y;
            let end_x = random.gen::<f32>() * width + mesh_min_x;
            let end_y = random.gen::<f32>() * height + mesh_min_y;

            black_box(mesh.get_path(Vec2::new(start_x, start_y), Vec2::new(end_x, end_y)))
        })
    });
}

criterion_group!(benches, path_sync_vs_async);
criterion_main!(benches);

sync and async with the futures crate seem to have the same kind of performance, a tiny bit slower with smog, and a tiny bit faster with tokio.

I don't have a good explanation, maybe tokio is better at small tasks?

@skyne98
Copy link
Author

skyne98 commented Feb 6, 2024

My bad! I wasn't awaiting the join_all properly in my test! I would consider the issue resolved 👍🏻
One last question I have though: what's the idea behind the async endpoint? Is it better than just quickly creating a task and calculating a path inside? 🤔 I think it's the first time I see an async endpoint in any kind of pathfinding library, so I am curious!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants