Welcome to the ffi-experimental wiki!
Experimental work on next-generation FFI bindings into the C++ libtorch library, in preparation for 0.0.2, which targets the 1.0 backend.
The PyTorch project provides prebuilt binaries of the C++ libtorch library on its official page, and a Debian package for Ubuntu. By using these reliable binaries, we can start running Haskell programs in various environments quickly. (Compiling C++ libtorch with CUDA support takes a long time.)
The PyTorch project develops very quickly, so its API changes frequently. This makes it difficult to maintain a Haskell API for it by hand.
There is a plan for Declarations.yaml to become the single, externally visible API; see this issue. Accordingly, code generation uses the generated Declarations.yaml spec instead of header parsing.
Declarations.yaml is located at ffi-experimental/deps/pytorch/build/aten/src/ATen/Declarations.yaml. The file is generated by building the libtorch binary or by running deps/get-deps.sh.
It covers the Native, TH, and NN functions, but it does not cover the methods of C++ classes. The code for methods is generated from spec/cppclass/*.yaml.
The dataflow is as follows:

spec/Declarations.yaml (pytorch) --+
spec/cppclass/*.yaml (this repo) --+--> codegen (a program of this repo) --> ffi (FFI bindings of this repo)
Use inline-c-cpp to bind the C++ API instead of the C API. inline-c-cpp generates C++ code and Haskell code at compilation time, using Template Haskell.
Technically, the symbols of the generated C++ code are wrapped with extern "C" (see How to mix C and C++).
The generated Haskell code uses the FFI.
The original inline-c-cpp does not support C++ namespaces and templates. To support them, we use a modified inline-c-cpp; see this PR.
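Because the generated C++ symbols are wrapped in extern "C", the Haskell side can bind them with ordinary foreign imports. The following is a minimal, self-contained sketch of that mechanism, using the standard C sqrt function in place of a libtorch symbol (the real bindings are generated by inline-c-cpp rather than written by hand):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- A sketch of the shape of a generated binding: the C++ side exposes
-- an extern "C" symbol, and the Haskell side imports it with a plain
-- foreign import. Here the standard C sqrt stands in for a libtorch
-- wrapper function.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: Double -> Double

main :: IO ()
main = print (c_sqrt 4.0)
```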
C++ has two storage locations for objects: the heap and the stack.
libtorch functions return objects on the stack.
When a function that holds an object in a local variable returns, the object on the stack is deleted.
For example, in the code below, when test() returns, the Tensor a on the stack is deleted:
void test(){
  at::Tensor a = at::ones({2, 2}, at::kInt);
  at::Tensor b = at::randn({2, 2});
  auto c = a + b.to(at::kInt);
}
So this FFI puts the object on the heap using new, so that it is not deleted:
at::Tensor* ones_for_haskell(){
  at::Tensor a = at::ones({2, 2}, at::kInt);
  return new at::Tensor(a);
}
C data is passed to function arguments directly, by value. C++ objects are passed to function arguments by object pointer.
Likewise, at the end of a function call, C data is returned by value, while a C++ object is returned as a pointer to a heap copy made with new.
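As a concrete illustration of these two calling conventions, the sketch below binds two standard C functions: abs receives plain data by value, while strlen receives its argument through a pointer (here a C string stands in for a C++ object):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))

-- Plain C data crosses the FFI boundary by value:
foreign import ccall unsafe "abs"
  c_abs :: CInt -> CInt

-- Object-like data crosses by pointer (a C string here):
foreign import ccall unsafe "strlen"
  c_strlen :: CString -> IO CSize

main :: IO ()
main = do
  print (c_abs (-7))
  withCString "hello" c_strlen >>= print
```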
Use GHC's garbage collection.
The generated FFI code consists of unmanaged code (ffi-experimental/ffi/src/Aten/Unmanaged/*) and managed code (ffi-experimental/ffi/src/Aten/Managed/*).
Unmanaged code uses the Ptr type, which corresponds to a raw C/C++ pointer.
Managed code uses the ForeignPtr type, which is managed by GHC.
To convert unmanaged code to managed code, a C++ object has to be an instance of the CppObject type class, and the managed code wraps the unmanaged code using the cast functions of the Castable type class. You can see the details of cast in ffi-experimental/ffi/src/Aten/Cast.hs:
class CppObject a where
  fromPtr :: Ptr a -> IO (ForeignPtr a)

class Castable a b where
  cast :: a -> (b -> IO r) -> IO r
  uncast :: b -> (a -> IO r) -> IO r

instance (CppObject a) => Castable (ForeignPtr a) (Ptr a) where
  cast x f = withForeignPtr x f
  uncast x f = fromPtr x >>= f

cast0 :: (Castable a ca) => IO ca -> IO a
cast0 f = f >>= \ca -> uncast ca return

cast1 :: (Castable a ca, Castable y cy)
      => (ca -> IO cy) -> a -> IO y
cast1 f a = cast a $ \ca -> f ca >>= \cy -> uncast cy return

...
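To make the cast pattern concrete, here is a self-contained, runnable sketch. It replaces the C++ object with a malloc'ed Int so it compiles with only base: rawDouble plays the role of a generated unmanaged binding that returns a new heap object, and cast1 lifts it to a managed function whose result GHC's garbage collector finalizes. (The toy Castable instance uses finalizerFree where the real Cast.hs attaches a finalizer that calls the C++ delete.)

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Foreign

-- An "unmanaged" function in the style of the generated raw bindings:
-- it takes a raw pointer and returns a freshly malloc'ed result,
-- mimicking a C++ wrapper that returns `new T(...)`.
rawDouble :: Ptr Int -> IO (Ptr Int)
rawDouble p = do
  x <- peek p
  q <- malloc
  poke q (2 * x)
  return q

class Castable a b where
  cast   :: a -> (b -> IO r) -> IO r
  uncast :: b -> (a -> IO r) -> IO r

-- Toy instance: the real code installs a finalizer that calls the
-- C++ delete; here free (finalizerFree) is enough for malloc'ed data.
instance Castable (ForeignPtr Int) (Ptr Int) where
  cast x f   = withForeignPtr x f
  uncast p f = newForeignPtr finalizerFree p >>= f

cast1 :: (Castable a ca, Castable y cy) => (ca -> IO cy) -> a -> IO y
cast1 f a = cast a $ \ca -> f ca >>= \cy -> uncast cy return

-- The "managed" wrapper: GHC's GC now owns the result.
managedDouble :: ForeignPtr Int -> IO (ForeignPtr Int)
managedDouble = cast1 rawDouble

main :: IO ()
main = do
  fp <- mallocForeignPtr
  withForeignPtr fp (`poke` 21)
  r  <- managedDouble fp
  withForeignPtr r peek >>= print
```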
When a C++ function of libtorch fails, it throws an exception.
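The repository's actual error handling lives in the generated bindings; as a hypothetical illustration only, the sketch below shows one common pattern for surfacing a failed foreign call as a Haskell exception that callers can catch with try (rawCall, checked, and LibtorchError are invented names for this sketch, not part of this repo):

```haskell
import Control.Exception

-- Hypothetical exception type for failures reported by the C++ side.
newtype LibtorchError = LibtorchError String deriving Show
instance Exception LibtorchError

-- Pretend raw call that signals failure with Nothing
-- (e.g. standing in for a null pointer result).
rawCall :: IO (Maybe Int)
rawCall = return Nothing

-- Convert the failure signal into a Haskell exception.
checked :: IO (Maybe a) -> IO a
checked act =
  act >>= maybe (throwIO (LibtorchError "libtorch call failed")) return

main :: IO ()
main = do
  r <- try (checked rawCall)
  case r of
    Left (LibtorchError msg) -> putStrLn ("caught: " ++ msg)
    Right v                  -> print (v :: Int)
```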
- For now, use stack. (To use cabal-v2, update shell.nix and cabal.project)
- CircleCI
- Ubuntu18.04
- stack
- Use a pinned libtorch binary
# Download libtorch-binary and generate 'Declarations.yaml'
> pushd deps
> ./get-deps.sh
> popd
# Generate the FFI code into the output directory.
> stack exec codegen-exe
# Check difference and copy the generated codes.
> diff -r output/Aten ffi/src/Aten
> cp -r output/Aten ffi/src/
# Build and test
> stack test ffi
See MemorySpec.hs.
See BasicTest.hs.
- The prebuilt libtorch uses the old gcc ABI to maintain backwards compatibility. Pass -D_GLIBCXX_USE_CXX11_ABI=0 to gcc.
- Integrate this FFI into hasktorch/hasktorch.
- What does a generated function's suffix mean? e.g. the tts of add_tts.
- C++ supports overloading; Haskell does not. We use the suffix so that function names do not conflict in Haskell.
- Is torch::Tensor the same as at::Tensor?
- Yes.
- Why not use fficxx?
- fficxx does not support managed code using ForeignPtr.
- What are native_functions.yaml and nn.yaml?
- These files are used to generate Declarations.yaml.
Please feel free to update this document and add FAQ entries.