impl Default for Tensor? #822

Open

emchristiansen opened this issue Jul 20, 2023 · 3 comments

@emchristiansen

Thanks for working on this!
I once had a project where we regularly worked with 6-dimensional tensors, and it was such a pain to keep track of the axes that we wrote a separate library to track them for us - something like this would have been great!

Is it possible to impl Default for Tensor in any reasonable way?
E.g. if I only have the type Tensor<S, E, D, T>, can I generate a Tensor of zeros of that type?
So far the only construction method I've seen for Tensors uses a device object.
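
For reference, this is the device-based construction pattern I mean (shape and dtype chosen arbitrarily for illustration):

use dfdx::prelude::*;

fn main() {
    // Every constructor I've found hangs off a device object:
    let dev = Cpu::default();
    let zeros: Tensor<Rank2<3, 4>, f32, _> = dev.zeros();
}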

Why I care: I have deeply nested data structures that I want to compute gradients through using dfdx.
The data structures can be parameterized by anything "number-like", and for something to be "number-like" it has to have an additive identity element (zero), i.e. a default value.

Relatedly, ensuring Tensor<Rank0, _, _, _> impls all the num-like traits would be amazing, as it would make it a drop-in replacement for f32, with the side effect of getting gradients for free.
This would make me very happy.

@coreylowman
Owner

I think this would require thread-local device objects (like rand::thread_rng()), but if we had those it would be possible. Imagining something like:

pub fn thread_cpu() -> Cpu { ... }
pub fn thread_cuda(ordinal: usize) -> Cuda { ... }

impl<S: Shape, E: Dtype> Default for Tensor<S, E, Cpu> {
    fn default() -> Self {
        thread_cpu().zeros()
    }
}

impl<S: Shape, E: Dtype> Default for Tensor<S, E, Cuda> {
    fn default() -> Self {
        thread_cuda(0).zeros()
    }
}

I'm unsure how sound these thread-local objects are, though; I'd have to think about it. It would also be weird to mix the use of the thread-local object with a separate device object.
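
For the record, a hypothetical sketch of what thread_cpu() could look like, using std's thread_local! and relying on the device being cheap to clone (dfdx devices are handles around shared, reference-counted state):

thread_local! {
    // One Cpu device per thread, created lazily on first use.
    static THREAD_CPU: Cpu = Cpu::default();
}

pub fn thread_cpu() -> Cpu {
    // Cloning hands out another handle to the same per-thread device.
    THREAD_CPU.with(|dev| dev.clone())
}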

@emchristiansen
Author

emchristiansen commented Jul 20, 2023

As a workaround, assuming I'm doing everything on a single device (say the CPU for now), could I just define something like this and use it as my device everywhere, assuming I'm careful to stay on the same system thread*?

use once_cell::sync::Lazy;

pub static DFDX_DEVICE: Lazy<Cpu> = Lazy::new(|| Cpu::default());

fn foo() {
  let weight: Tensor<Rank2<4, 2>, f32, _, NoneTape> =
    DFDX_DEVICE.sample_normal();
  ...
}

But even if that worked, what about the gradient tape?
If T is NoneTape it's pretty clear what to do, but what if T is OwnedTape<..>?
What would the correct default value be in that case?

*Also, is thread locality important for Cpu or just Cuda?

@coreylowman
Owner

> As a workaround, assuming I'm doing everything on a single device (say the CPU for now), could I just define something like this and use it as my device everywhere, assuming I'm careful to stay on the same system thread*?

Yeah definitely!

> But even if that worked, what about the gradient tape?

Probably just call .traced() after construction - none of the tensor creation methods currently create OwnedTapes, so that would be consistent.
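
For example (a hypothetical sketch; the exact Trace API, e.g. whether traced takes a Gradients argument, varies between dfdx versions):

use dfdx::prelude::*;

fn main() {
    let dev = Cpu::default();
    // Construction always yields a NoneTape tensor...
    let x: Tensor<Rank1<4>, f32, _> = dev.zeros();
    // ...and gradient tracking is opted into explicitly afterwards:
    let x = x.traced(Gradients::leaky());
    let _ = x;
}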

> *Also, is thread locality important for Cpu or just Cuda?

If it's just for you, you probably don't need to worry about it. The main thing is to minimize the number of distinct device objects. For Cpu it's mainly important if you want to enable allocation caching (https://docs.rs/dfdx/latest/dfdx/tensor/trait.Cache.html). The same goes for Cuda, but Cuda will also load kernels into the device object, so you don't want to create several different ones.
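
For illustration, a small sketch of the allocation-caching pattern using the Cache trait methods from those docs (the reuse behavior described in the comments is my reading of the docs, not verified output):

use dfdx::prelude::*;

fn main() {
    let dev = Cpu::default();
    dev.enable_cache(); // freed tensor buffers are now kept for reuse
    let t: Tensor<Rank1<1024>, f32, _> = dev.zeros();
    drop(t); // this buffer goes back into the device's cache
    let _t: Tensor<Rank1<1024>, f32, _> = dev.zeros(); // can reuse the cached buffer
    dev.empty_cache(); // hand any cached memory back to the allocator
}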
