
Runner #41 (Open)

aldanor wants to merge 11 commits into master
Conversation

@aldanor commented Nov 28, 2020

@nbigaouette This is somewhat WIP in that, if it's deemed acceptable, some docs should be added and things cleaned up a bit, but it's fully functional. Basically I wanted to push this ASAP to see what you think.

  • On tiny examples it reduces the total runtime from 15µs to 8µs. Pretty much 100% of that time is now spent inside the ONNX C API.
  • For big examples, I think this would be handy as well, since it avoids both copying the inputs and allocating the outputs.
  • There are no more tracing calls on the hot path (since there are no drops).
  • Default is currently required for the output element type. As long as we're sure we'll never use strings, we might add Copy to the element type requirements so that Default can be dropped and the output array can be zero-initialized (see the sketch after the usage example below).

Can be used like this:

let mut runner = session.make_runner(input_arrays).with_output::<f32, Ix3>()?;
runner.execute()?; // no allocations to generate outputs, no copying of inputs
dbg!(&runner.outputs()[0]);
*(&mut runner.inputs()[0]) *= 2.0f32; // modify the inputs without copying or allocation
runner.execute()?; // no allocations, no copying
dbg!(&runner.outputs()[0]);
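
To illustrate the Default point above, here is a minimal sketch of how the output buffer can be preallocated (assuming ndarray's Array::default; the PR's actual allocation code may differ):

    use ndarray::{Array, Ix3};

    // With the current bound, the output is preallocated via Default:
    let output: Array<f32, Ix3> = Array::default((1, 5, 1));
    // With a Copy bound instead, the buffer could be zero-initialized
    // directly and the Default requirement dropped.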

@aldanor (Author) commented Dec 26, 2020

@nbigaouette Any thoughts about this? This has been open for a month now...

@nbigaouette (Owner)

Sorry about the delay; personal time is quite short these days and I was hoping the holidays would make things easier.

I'm taking a look right now.

@nbigaouette (Owner)

Love the zero-copy aspects. If I understand correctly, this zero-copy path is currently an addition to the API. I would like to have a single API instead, and thus get rid of the original one. I'll play around a bit with the new one.

@nbigaouette (Owner)

The zero-copy aspect is issue #11.

@aldanor (Author) commented Dec 26, 2020

Yea, this was kind of the main point. In my use case it's absolutely prohibitive to spend any time (even microseconds) on cloning arrays, re-validating what's already been validated, or doing any FFI calls aside from the actual execute().

I've added it as a separate API so as not to touch the original one, but I think it fully covers the original one, at least in the single-threaded case. Note that this is not thread-safe (I think), as it holds pointers to mutable arrays that you are supposed to fill out. You would probably have to create one runner per thread if you wanted multithreading (#38)? A rough sketch of that pattern is below.

Would sure be up for any discussion, anyway :)

+@marshallpierce too, since they've shown interest in the original issue.
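
For the record, here is a rough sketch of what one-runner-per-thread could look like. This is hypothetical: it assumes Session is Send and that per-thread sessions can be built at all (exactly what #38 is about), and make_inputs is a stand-in for whatever produces the input arrays:

    // Hypothetical sketch only: `sessions` holds one pre-built session
    // per worker thread; whether that is possible is the question in #38.
    let handles: Vec<_> = sessions
        .into_iter()
        .map(|session| {
            std::thread::spawn(move || {
                let mut runner = session
                    .make_runner(make_inputs()) // stand-in input builder
                    .with_output::<f32, Ix3>()
                    .unwrap();
                runner.execute().unwrap();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }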

@nbigaouette (Owner)

I was about to suggest merging the runner into the session and getting rid of the original API. But I think it actually makes sense to keep both.

The make_runner() call takes ownership of the array... The outputs get allocated and a reference (slice) to them is made available. What I would like to see is to pass a reference to the input. That way the user is free to decide how to handle the allocation (construct new inputs or reuse the buffer). That introduces a lot of lifetimes, but I guess it should be possible...
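
Something along these lines, perhaps (hypothetical stand-in types, only to sketch the lifetimes; not the PR's actual code):

    use ndarray::{Array, Dimension};

    // Stand-ins for the crate's types, just to keep the sketch self-contained.
    struct Session;
    struct RunnerBuilder<'a, TIn, DIn: Dimension> {
        inputs: &'a mut [Array<TIn, DIn>],
    }

    impl Session {
        // Borrow the inputs instead of taking ownership; `'a` ties the
        // runner to the caller-owned buffers, so the caller decides
        // whether to construct new inputs or reuse an allocation.
        fn make_runner_ref<'a, TIn, DIn: Dimension>(
            &'a self,
            inputs: &'a mut [Array<TIn, DIn>],
        ) -> RunnerBuilder<'a, TIn, DIn> {
            RunnerBuilder { inputs }
        }
    }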

@nbigaouette (Owner) left a comment

Public items will have to be documented when the API is settled.

self,
) -> Result<Runner<'s, TIn, DIn, TOut, DOut>> {
Runner::new(self.session, self.input_arrays)
}
@nbigaouette (Owner):

This is meant to be used like this:

    let mut runner = session
        .make_runner(input_tensor_values)
        .with_output::<f32, Ix4>()?;

Since the model (and thus the session) knows the shape of the output, can this be leveraged? Or would that require const generics (probably...)?

@nbigaouette (Owner):

Ok, the with_output_dyn() that follows answers the comment above; please ignore.
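
For readers following along, presumably the dynamically-shaped variant is used like this (a sketch based on the name only; the real signature lives in the PR diff):

    // Sketch: same builder, but the output dimension is resolved at
    // runtime rather than fixed by an Ix4-style type parameter.
    let mut runner = session
        .make_runner(input_tensor_values)
        .with_output_dyn::<f32>()?;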

.collect()
}

fn compute_output_shapes<TIn, DIn: Dimension>(
@nbigaouette (Owner):

I'd prefer if the trait bounds used where clauses, to be consistent with the rest of the crate.
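
That is, something like this (sketch only, with the bounds and parameters abbreviated):

    // Instead of inline bounds:
    //     fn compute_output_shapes<TIn, DIn: Dimension>(/* ... */) { /* ... */ }
    // prefer a where clause:
    fn compute_output_shapes<TIn, DIn>(/* ... */)
    where
        DIn: Dimension,
    { /* ... */ }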

@nbigaouette (Owner)

Forget my comment about the input reference; I see that it's made available for modification.

@nbigaouette (Owner)

I don't want to push to the branch but I did modify the 'sample' example. Here's the patch if you don't mind applying it:

From ade58d96d16703edb15ad806b8a37d64475e5d54 Mon Sep 17 00:00:00 2001
From: Nicolas Bigaouette <nbigaouette@gmail.com>
Date: Sat, 26 Dec 2020 14:44:50 -0500
Subject: [PATCH] Adapt 'sample' to runner API

---
 onnxruntime/examples/sample.rs | 35 +++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/onnxruntime/examples/sample.rs b/onnxruntime/examples/sample.rs
index fff0772..3d5ee51 100644
--- a/onnxruntime/examples/sample.rs
+++ b/onnxruntime/examples/sample.rs
@@ -1,9 +1,11 @@
 #![forbid(unsafe_code)]
 
-use ndarray::Array;
+use ndarray::{Array, Ix4, IxDyn};
 
 use onnxruntime::{
-    environment::Environment, tensor::OrtOwnedTensor, GraphOptimizationLevel, LoggingLevel,
+    environment::Environment,
+    runner::{Outputs, Runner},
+    GraphOptimizationLevel, LoggingLevel,
 };
 use tracing::Level;
 use tracing_subscriber::FmtSubscriber;
@@ -62,12 +64,31 @@ fn run() -> Result<(), Error> {
         .unwrap();
     let input_tensor_values = vec![array];
 
-    let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(input_tensor_values)?;
+    // You can simply run the session with the input to get the output...
+    // let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(input_tensor_values)?;
 
-    assert_eq!(outputs[0].shape(), output0_shape.as_slice());
-    for i in 0..5 {
-        println!("Score for class [{}] =  {}", i, outputs[0][[0, i, 0, 0]]);
-    }
+    // Or, you can build a runner to pre-allocate the output
+    let mut runner = session
+        .make_runner(input_tensor_values)
+        .with_output::<f32, Ix4>()?;
+    runner.execute()?;
+
+    print_runner_outputs(&runner);
+
+    // Since the runner now owns the input and keeps it alive, we can access it
+    // and modify it without reallocation.
+    *(&mut runner.inputs()[0]) *= 2.0f32;
+    runner.execute()?;
+
+    print_runner_outputs(&runner);
 
     Ok(())
 }
+
+fn print_runner_outputs(runner: &Runner<f32, IxDyn, f32, Ix4>) {
+    let outputs: Outputs<f32, Ix4> = runner.outputs();
+    let output = &outputs[0];
+    for i in 0..5 {
+        println!("Score for class [{}] =  {}", i, output[[0, i, 0, 0]]);
+    }
+}
-- 
2.29.2

@aldanor (Author) commented Dec 26, 2020

Yea, inputs are available for modification of course.

Here's the use case that's not covered: you have an &InputArray and you'd like to pass it in as-is, without copying, for execution. However, that would require creating tensor wrappers, re-validating the shapes, etc.; if you're doing it in a hot loop you'd probably want to avoid that, hence the chosen approach.

It would probably be possible to cover this use case as well if the runner had something like execute_with(...), which would allow overriding the inputs for just one execution (with all the involved overhead) when it's really needed.
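
Hypothetically (the name and signature are invented here, not part of the PR), that could look like:

    // One-off override: wraps and re-validates `other_inputs` for this
    // call only, leaving the runner's owned inputs untouched.
    runner.execute_with(&other_inputs)?;
    dbg!(&runner.outputs()[0]);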

@marshallpierce (Contributor)

FWIW, this style of API seems like a good fit for my use case: an HTTP service that receives ONNX tensor protobuf requests, runs them through onnxruntime via this crate, and emits the output tensor as the response. The model doesn't change over the lifetime of one process, so ideally I'd only ever need one Send + Sync runner.

@nbigaouette (Owner)

I have updated the branch with the latest changes from master (sorry if I pushed to your fork? Not sure how GitHub handled this...).

The only things remaining are documentation and trivial clippy fixes.

@aldanor (Author) commented Jan 7, 2021

@nbigaouette Apologies, I was a bit busy; I was planning to get back to this on Friday or over the weekend.

(As a minor note, it's generally a bad idea to merge master back into a feature branch that will itself be merged back into master. I'll see if I can rebase this branch on master and force-push.)

@aldanor (Author) commented Jan 7, 2021

I can try handling the documentation.

What else do you think should be done, overall?

@nbigaouette (Owner)

I agree a rebase should be done instead of merging master. GitHub's UI is not always clear about what an action's result will be...

What remains to be done:

  • Documentation
  • Trait bounds as "where clause" as commented above
  • Trivial clippy errors

After that I'd like to merge this and release a new version to crates.io so it can be experimented with more.

@nbigaouette nbigaouette mentioned this pull request Jan 9, 2021
Comment on lines +37 to +40
.map(|(jdx, dim)| match dim {
None => input_arrays[idx].shape()[jdx],
Some(d) => *d as usize,
})
Contributor:

This makes a bad assumption about the output shape corresponding to the input shape (see #56).

Is there an ORT API that can be used to get the output shape before execution? If not, can we allow users to specify the output shape?
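
If ORT can't report the output shape ahead of execution, a user-specified shape could be one escape hatch. A hypothetical sketch (this builder method is invented here, not part of the PR):

    // State the output shape explicitly instead of deriving it from the
    // input dimensions.
    let mut runner = session
        .make_runner(input_arrays)
        .with_output_shape::<f32>(&[1, 1000, 1, 1])?;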

@steckes commented Mar 15, 2023

No more updates on this branch? I think a lot of people could use a faster version; why did it never get merged?

@decahedron1 decahedron1 mentioned this pull request Apr 6, 2023