
Runner #41 (Open)

aldanor wants to merge 11 commits into master
Conversation

@aldanor commented Nov 28, 2020

@nbigaouette This is somewhat WIP in that, if it's deemed acceptable, some docs should be added and things cleaned up a bit, but it's fully functional. Basically I wanted to push this ASAP to see what you think.

  • On tiny examples it reduces the total runtime from 15µs to 8µs. Pretty much 100% of that time is now spent inside the ONNX C API.
  • For big examples, I think this would be handy as well, since it avoids both copying the inputs and allocating the outputs.
  • There are no more tracing calls on the hot path (since there are no drops).
  • Default is currently required for the output element type. As long as we're sure we'll never use strings, we might add Copy to the element type requirements so that Default can be dropped and the output array can be zero-initialized (see the sketch after the usage example below).

Can be used like this:

let mut runner = session.make_runner(input_arrays).with_output::<f32, Ix3>()?;
runner.execute()?; // no allocations to generate outputs, no copying of inputs
dbg!(&runner.outputs()[0]);
*(&mut runner.inputs()[0]) *= 2.0f32; // modify the inputs without copying or allocation
runner.execute()?; // no allocations, no copying
dbg!(&runner.outputs()[0]);
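
To illustrate the Default point above, here is a minimal sketch of how the output buffer can be preallocated (assuming ndarray's Array::default; the PR's actual allocation code may differ):

    use ndarray::{Array, Ix3};

    // With the current bound, the output is preallocated via Default:
    let output: Array<f32, Ix3> = Array::default((1, 5, 1));
    // With a Copy bound instead, the buffer could be zero-initialized
    // directly and the Default requirement dropped.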

@aldanor (Author) commented Dec 26, 2020

@nbigaouette Any thoughts about this? This has been open for a month now...

@nbigaouette (Owner)

Sorry about the delay; personal time is quite short these days and I was hoping the holidays would make things easier.

I'm taking a look right now.

@nbigaouette (Owner)

Love the zero-copy aspects. If I understand correctly, this zero-copy path is currently an addition to the API. I would like to have a single API instead, and thus get rid of the original one. I'll play around a bit with the new one.

@nbigaouette (Owner)

The zero-copy aspect is issue #11.

@aldanor (Author) commented Dec 26, 2020

Yea, this was kind of the main point. In my use case it's absolutely prohibitive to spend any time (even microseconds) on cloning arrays, re-validating what's already been validated, or doing any FFI calls aside from the actual execute().

I've added it as a separate API so as not to touch the original one, but I think it fully covers the original one, at least in the single-threaded case. Note that this is not thread-safe (I think), as it holds pointers to mutable arrays that you are supposed to fill out. You would probably have to create one runner per thread if you wanted multithreading (#38)? A rough sketch of that pattern is below.

Would sure be up for any discussion, anyway :)

+@marshallpierce too, since they've shown interest in the original issue.
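
For the record, here is a rough sketch of what one-runner-per-thread could look like. This is hypothetical: it assumes Session is Send and that per-thread sessions can be built at all (exactly what #38 is about), and make_inputs is a stand-in for whatever produces the input arrays:

    // Hypothetical sketch only: `sessions` holds one pre-built session
    // per worker thread; whether that is possible is the question in #38.
    let handles: Vec<_> = sessions
        .into_iter()
        .map(|session| {
            std::thread::spawn(move || {
                let mut runner = session
                    .make_runner(make_inputs()) // stand-in input builder
                    .with_output::<f32, Ix3>()
                    .unwrap();
                runner.execute().unwrap();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }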

@nbigaouette (Owner)

I was about to suggest merging the runner into the session and getting rid of the original API. But I think it actually makes sense to keep both.

The make_runner() call takes ownership of the array... The outputs get allocated and a reference (slice) to them is made available. What I would like to see is to pass a reference to the input. That way the user is free to decide how to handle the allocation (construct new inputs or reuse the buffer). That introduces a lot of lifetimes, but I guess it should be possible...
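
Something along these lines, perhaps (hypothetical stand-in types, only to sketch the lifetimes; not the PR's actual code):

    use ndarray::{Array, Dimension};

    // Stand-ins for the crate's types, just to keep the sketch self-contained.
    struct Session;
    struct RunnerBuilder<'a, TIn, DIn: Dimension> {
        inputs: &'a mut [Array<TIn, DIn>],
    }

    impl Session {
        // Borrow the inputs instead of taking ownership; `'a` ties the
        // runner to the caller-owned buffers, so the caller decides
        // whether to construct new inputs or reuse an allocation.
        fn make_runner_ref<'a, TIn, DIn: Dimension>(
            &'a self,
            inputs: &'a mut [Array<TIn, DIn>],
        ) -> RunnerBuilder<'a, TIn, DIn> {
            RunnerBuilder { inputs }
        }
    }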

@nbigaouette (Owner) left a comment

Public items will have to be documented when the API is settled.

self,
) -> Result<Runner<'s, TIn, DIn, TOut, DOut>> {
Runner::new(self.session, self.input_arrays)
}
@nbigaouette (Owner):

This is meant to be used like this:

    let mut runner = session
        .make_runner(input_tensor_values)
        .with_output::<f32, Ix4>()?;

Since the model (and thus the session) knows the shape of the output, can this be leveraged? Or would that require const generics (probably...)?

@nbigaouette (Owner):

Ok, the with_output_dyn() that follows answers the comment above; please ignore.
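
For readers following along, presumably the dynamically-shaped variant is used like this (a sketch based on the name only; the real signature lives in the PR diff):

    // Sketch: same builder, but the output dimension is resolved at
    // runtime rather than fixed by an Ix4-style type parameter.
    let mut runner = session
        .make_runner(input_tensor_values)
        .with_output_dyn::<f32>()?;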

.collect()
}

fn compute_output_shapes<TIn, DIn: Dimension>(
@nbigaouette (Owner):

I'd prefer if the trait bounds used where clauses, to be consistent with the rest of the crate.
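
That is, something like this (sketch only, with the bounds and parameters abbreviated):

    // Instead of inline bounds:
    //     fn compute_output_shapes<TIn, DIn: Dimension>(/* ... */) { /* ... */ }
    // prefer a where clause:
    fn compute_output_shapes<TIn, DIn>(/* ... */)
    where
        DIn: Dimension,
    { /* ... */ }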

@nbigaouette (Owner)

Forget my comment about the input reference; I see that it's made available for modification.

@nbigaouette (Owner)

I don't want to push to the branch but I did modify the 'sample' example. Here's the patch if you don't mind applying it:

From ade58d96d16703edb15ad806b8a37d64475e5d54 Mon Sep 17 00:00:00 2001
From: Nicolas Bigaouette <nbigaouette@gmail.com>
Date: Sat, 26 Dec 2020 14:44:50 -0500
Subject: [PATCH] Adapt 'sample' to runner API

---
 onnxruntime/examples/sample.rs | 35 +++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/onnxruntime/examples/sample.rs b/onnxruntime/examples/sample.rs
index fff0772..3d5ee51 100644
--- a/onnxruntime/examples/sample.rs
+++ b/onnxruntime/examples/sample.rs
@@ -1,9 +1,11 @@
 #![forbid(unsafe_code)]
 
-use ndarray::Array;
+use ndarray::{Array, Ix4, IxDyn};
 
 use onnxruntime::{
-    environment::Environment, tensor::OrtOwnedTensor, GraphOptimizationLevel, LoggingLevel,
+    environment::Environment,
+    runner::{Outputs, Runner},
+    GraphOptimizationLevel, LoggingLevel,
 };
 use tracing::Level;
 use tracing_subscriber::FmtSubscriber;
@@ -62,12 +64,31 @@ fn run() -> Result<(), Error> {
         .unwrap();
     let input_tensor_values = vec![array];
 
-    let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(input_tensor_values)?;
+    // You can simply run the session with the input to get the output...
+    // let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(input_tensor_values)?;
 
-    assert_eq!(outputs[0].shape(), output0_shape.as_slice());
-    for i in 0..5 {
-        println!("Score for class [{}] =  {}", i, outputs[0][[0, i, 0, 0]]);
-    }
+    // Or, you can build a runner to pre-allocate the output
+    let mut runner = session
+        .make_runner(input_tensor_values)
+        .with_output::<f32, Ix4>()?;
+    runner.execute()?;
+
+    print_runner_outputs(&runner);
+
+    // Since the runner now owns the input and keeps it alive, we can access it
+    // and modify it without reallocation.
+    *(&mut runner.inputs()[0]) *= 2.0f32;
+    runner.execute()?;
+
+    print_runner_outputs(&runner);
 
     Ok(())
 }
+
+fn print_runner_outputs(runner: &Runner<f32, IxDyn, f32, Ix4>) {
+    let outputs: Outputs<f32, Ix4> = runner.outputs();
+    let output = &outputs[0];
+    for i in 0..5 {
+        println!("Score for class [{}] =  {}", i, output[[0, i, 0, 0]]);
+    }
+}
-- 
2.29.2

@aldanor (Author) commented Dec 26, 2020

Yea, inputs are available for modification of course.

Here's the use case that's not covered: you have an &InputArray and you'd like to pass it in as-is, without copying, for execution. However, that would require creating tensor wrappers, re-validating the shapes, etc.; if you're doing it in a hot loop you'd probably want to avoid that, hence the chosen approach.

It would probably be possible to cover this use case as well if the runner had something like execute_with(...), which would allow overriding the inputs for just one execution (with all the involved overhead) when it's really needed.
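
Hypothetically (the name and signature are invented here, not part of the PR), that could look like:

    // One-off override: wraps and re-validates `other_inputs` for this
    // call only, leaving the runner's owned inputs untouched.
    runner.execute_with(&other_inputs)?;
    dbg!(&runner.outputs()[0]);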

@marshallpierce (Contributor)

FWIW, this style of API seems like a good fit for my use case: an HTTP service that receives ONNX tensor protobuf requests, runs them through onnxruntime via this crate, and emits the output tensor as the response. The model doesn't change over the lifetime of one process, so ideally I'd only ever need one Send + Sync runner.

@nbigaouette (Owner)

I have updated the branch with the latest changes from master (sorry if I pushed to your fork? Not sure how GitHub handled this...).

The only things remaining are documentation and trivial clippy fixes.

@aldanor (Author) commented Jan 7, 2021

@nbigaouette Apologies, I was a bit busy; I was planning to get back to this on Friday or over the weekend.

(As a minor note, it's generally a bad idea to merge master back into a feature branch that will itself be merged back into master. I'll see if I can rebase this branch on master and force-push.)

@aldanor (Author) commented Jan 7, 2021

I can try handling the documentation.

What else do you think should be done, overall?

@nbigaouette (Owner)

I agree a rebase should be done instead of merging master. GitHub's UI is not always clear about what an action's result will be...

What remains to be done:

  • Documentation
  • Trait bounds as "where clause" as commented above
  • Trivial clippy errors

After that I'd like to merge this and release a new version to crates.io so it can be experimented with more.

@nbigaouette nbigaouette mentioned this pull request Jan 9, 2021
Comment on lines +37 to +40
.map(|(jdx, dim)| match dim {
None => input_arrays[idx].shape()[jdx],
Some(d) => *d as usize,
})
Contributor:

This makes a bad assumption about the output shape corresponding to the input shape (see #56).

Is there an ORT API that can be used to get the output shape before execution? If not, can we allow users to specify the output shape?
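
If ORT can't report the output shape ahead of execution, a user-specified shape could be one escape hatch. A hypothetical sketch (this builder method is invented here, not part of the PR):

    // State the output shape explicitly instead of deriving it from the
    // input dimensions.
    let mut runner = session
        .make_runner(input_arrays)
        .with_output_shape::<f32>(&[1, 1000, 1, 1])?;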

@steckes commented Mar 15, 2023

No more updates on this branch? I think a lot of people could use a faster version; why did it never get merged?

@decahedron1 decahedron1 mentioned this pull request Apr 6, 2023