
OpenCL support #22

Closed
outlace opened this issue Nov 9, 2015 · 542 comments
Labels
stat:contribution welcome Status - Contributions welcome

Comments

@outlace

outlace commented Nov 9, 2015

I understand TensorFlow only supports CUDA. What would need to be done to add in OpenCL support?

@nmabhinandan

It's strange that Google ditched open OpenCL for proprietary CUDA.
im-just-saying

@ebrevdo
Contributor

ebrevdo commented Nov 9, 2015

At the very least, the Eigen library would have to support OpenCL.

@bhack
Contributor

bhack commented Nov 9, 2015

👍

@keveman keveman added the cuda label Nov 9, 2015
@jamesliu96

👍

1 similar comment
@alexatknit

👍

@dhess

dhess commented Nov 11, 2015

thumbs up and all that.

@gujunli

gujunli commented Nov 11, 2015

I would be interested in expanding TensorFlow with OpenCL, as we have already released OpenCL Caffe: https://github.com/amd/OpenCL-caffe. Hopefully it can be integrated in a lightweight way. Is anyone interested in working together on this?

@bhack
Contributor

bhack commented Nov 11, 2015

@gujunli Nice to see AMD here. /cc @naibaf7 @lunochod

@nmabhinandan

That would be great.

@sasadep

sasadep commented Nov 11, 2015

👍

@bhack
Contributor

bhack commented Nov 15, 2015

/cc @lukeiwanski for Eigen/OpenCL/SYCL

@ankdesh

ankdesh commented Nov 16, 2015

@gujunli Certainly would be interested in contributing. Please let me know when you plan to start.

@lukeiwanski

Hi all,

Here at Codeplay we are looking into running Eigen's tensors on GPUs using SYCL (a modern C++ layer on top of OpenCL). From what we have gathered so far, the GPU tensor design is very closely coupled with CUDA, and it will require interface changes to support another programming model, particularly a SYCL/OpenCL 1.2 version.

If anyone is interested in digging deeper / helping out, we are most certainly interested in contributing.

Thanks,
Luke

@bhack
Contributor

bhack commented Nov 25, 2015

@lukeiwanski Thank you for the feedback. I think that @benoitsteiner worked on the tensor extension part of Eigen.

@jszuppe

jszuppe commented Dec 6, 2015

👍 I can help write some OpenCL/SYCL code if someone makes a plan and divides the work into tasks. I recommend using Boost.Compute as a wrapper for OpenCL (it makes running kernels, testing, and templating easier).

@ieee8023

ieee8023 commented Dec 7, 2015

+1

1 similar comment
@armish

armish commented Dec 7, 2015

👍

@lukeiwanski

Hi all,

Just to keep you posted, we are still investigating how we can change the Eigen interface to better fit the SYCL/OpenCL 1.2 programming model.
Once we come up with a reasonable approach that targets heterogeneous programming models (not only OpenCL/SYCL), we will create a proposal.

Thanks,
Luke

@gujunli

gujunli commented Dec 8, 2015

Please keep me updated. I developed OpenCL Caffe for AMD. I am also looking at TensorFlow.

Thanks.
Junlu

@bhack
Contributor

bhack commented Dec 9, 2015

/cc @ptillet @gongzg Is there any interest in this at Intel? I really hope that we don't fragment OpenCL here like in Caffe, where we have an AMD fork, unmerged Intel PRs, another semi-unofficial AMD PR, and a long-staging user PR (plus two old abandoned OpenCL efforts). Anyone interested in the history can take a look at the BVLC/caffe#2610 comments.

@martinwicke martinwicke added the stat:contribution welcome Status - Contributions welcome label Dec 16, 2015
@gongzg

gongzg commented Dec 17, 2015

@bhack We do have interest in this. Thanks for letting me know. If there is a proposal for Eigen's OpenCL/SYCL implementation, we will see what we can do from Intel side.

@benoitsteiner benoitsteiner self-assigned this Dec 23, 2015
@ZirconCode

👍

@bhack
Contributor

bhack commented Jan 1, 2016

An interesting initiative at https://github.com/ptillet/isaac, even though here we rely on the Eigen tensor extension.

@DanMcLaughlin

I would also like to contribute. @benoitsteiner, can you organize it?

@FelixSchwarz

FelixSchwarz commented Aug 27, 2018

I think this thread is mostly meaningless for developers (too much noise - and I'll add some more ;-), but I think many comments are missing the point:
If you want to run TensorFlow with AMD cards, OpenCL IS NOT what you are looking for - please head over to https://github.com/ROCmSoftwarePlatform/ and install the ROCm stack. AFAIK AMD's current strategy is based on ROCm instead of OpenCL for TensorFlow/PyTorch.

Generic OpenCL was too much maintenance and did not give enough performance benefits to be worthwhile for AMD. Therefore this ticket is only interesting if you are running (e.g.) an ARM platform which supports OpenCL only.

(Disclaimer: I'm just an outsider with no real insight into TensorFlow development, so maybe the information above is completely wrong and misleading. Feel free to bash me if you know better.)

@ghost

ghost commented Aug 27, 2018

Just a thought: what about LLVM with the new GPU offload? That would put a great level of abstraction between TensorFlow and CUDA-specific code.

@mirh

mirh commented Aug 27, 2018

What about all of you reading just ten posts above and noticing there is already a fork by lukeiwanski/codeplaysoftware that you can try?
(Also, my hat's off to Xiaomi for, for once, contributing a serious open-source effort.)

@fantesykikachu

@FelixSchwarz Just so you are aware, ROCm uses OpenCL; it is AMD's userspace OpenCL driver on Linux (which is why it doesn't support Windows). For those unfamiliar with how AMD's driver ecosystem on Linux works: there are the kernel-side drivers AMDGPU and AMDKFD (which is now being merged into AMDGPU), and then the userspace drivers RadeonSI (for OpenGL), RadV/AMDVLK (for Vulkan), and ROCm (for OpenCL).

@XVilka

XVilka commented Sep 15, 2018

Judging by the dynamics of this bug and the other forks, Google has zero interest in this and will never implement it in the official repository. I would vote for closing (or locking) this issue so as not to give anyone false hope.

@klokik

klokik commented Sep 15, 2018 via email

@tamusjroyce

tamusjroyce commented Nov 28, 2018

There is a TensorRT that supports the Movidius Pi Hat. And that Movidius Pi Hat is Google’s $45 “AIY Vision Kit”. Google links to Target to buy it.

This doesn't have any ties to CUDA or Nvidia? It says it uses an Intel chip. At its heart, maybe the chip is an FPGA? Does anyone know anything more about it?

@znmeb

znmeb commented Nov 28, 2018

I know quite a bit about the big Movidius unit - it's inference-only, and it runs pre-compiled TensorFlow or Caffe models. IIRC they're all in 16-bit mode.

The Movidius chip itself is much more powerful, but you have to be a qualified partner to get the SDK.

@talregev

talregev commented Dec 11, 2018

Here are some links for others trying to get TensorFlow with OpenCL:

https://github.com/hughperkins/tf-coriander
https://github.com/ChiahungTai/tensorflow-cl
https://github.com/guoyejun/tensorflow-cl
https://github.com/honggui/tensorflow-cl
https://github.com/benoitsteiner/tensorflow-opencl
https://github.com/lukeiwanski/tensorflow (repository is out of date)
https://github.com/codeplaysoftware/tensorflow
Also maybe worth checking:

https://documen.tician.de/pyopencl/
https://pypi.org/project/DeepCL/
https://www.khronos.org/sycl/

Feel free to add working projects.

@filips123

Is there any update? This issue is over 3 years old.

@mirh

mirh commented Jan 9, 2019

YES THERE IS JUST LOOK AT THE LAST HANDFUL OF POSTS.

@XVilka

XVilka commented Jan 10, 2019

@filips123 no, there are no updates, and there never will be in any foreseeable future - the probability of that is lower than that of an alien invasion or of finding a way to travel back in time.

@lppier

lppier commented Jan 10, 2019

This Intel initiative, PlaidML, works reasonably well and is worth checking out.
https://github.com/plaidml/plaidml
It runs on OpenCL or Metal on Mac. It works with MacBook Pro AMD GPUs, which is what I was looking for.
Meanwhile, could you guys help vote for PyTorch support in PlaidML? plaidml/plaidml#63
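For anyone trying PlaidML with Keras as discussed here: backend selection happens through an environment variable that must be set before Keras is imported. A minimal sketch, assuming the `plaidml-keras` package is installed (the actual import is left commented so the snippet runs without it):

```python
import os

# PlaidML provides its own Keras backend module; Keras picks the backend up
# from the KERAS_BACKEND environment variable, which must be set *before*
# keras is imported anywhere in the process.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

# import keras  # uncomment with plaidml-keras installed; Keras then runs on PlaidML
print(os.environ["KERAS_BACKEND"])  # plaidml.keras.backend
```

With the backend selected, PlaidML's bundled `plaidml-setup` CLI is normally run once beforehand to pick the OpenCL or Metal device.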

@mirh

mirh commented Jan 10, 2019

PlaidML is certainly all nice and dandy (I, for one, somehow could get more performance out of an Nvidia GPU on OpenCL than with TF's CUDA itself)..
But it's a backend for Keras? A complete replacement for TensorFlow, which, you know, is the repo we are discussing this in?
(As far as I understand, the latest TF versions can export models directly to Keras, so there's that..)

Anyway, for the fourth damn time: if you want a recent solution on OpenCL, something that is still being actively developed (and also the thing with an actual chance to be merged here for real one day), there's just the Codeplay stack.
Again:
https://developer.codeplay.com/computecppce/latest/tensorflow-overview
https://github.com/Rbiessy/tensorflow/tree/dev/amd_gpu

@lppier

lppier commented Jan 11, 2019

PlaidML is certainly all nice and dandy (I, for one, somehow could get more performance out of an Nvidia GPU on OpenCL than with TF's CUDA itself)..
But it's a backend for Keras? A complete replacement for TensorFlow, which, you know, is the repo we are discussing this in?
(As far as I understand, the latest TF versions can export models directly to Keras, so there's that..)

Anyway, for the fourth damn time: if you want a recent solution on OpenCL, something that is still being actively developed (and also the thing with an actual chance to be merged here for real one day), there's just the Codeplay stack.
Again:
https://developer.codeplay.com/computecppce/latest/tensorflow-overview
https://github.com/Rbiessy/tensorflow/tree/dev/amd_gpu

My apologies, I had not realised there was no TensorFlow support. My assuming brain thought that Keras GPU support == TensorFlow support.

@iperov

iperov commented Feb 18, 2019

PlaidML is super cool. It works on Keras.
Of course I had to port some TF code to pure Keras in order to work on the PlaidML backend (for example tf.image.ssim).
But as a result, my code works on both NVIDIA and AMD cards.

Also, PlaidML is heaven for researchers. It automatically generates the gradient for any function you write in the "Tile" language, and it will run on your GPU at 80% of the speed of TensorFlow.

So I cannot understand why ML researchers are still using PyTorch. Let's boost ML science with Intel's PlaidML!

@Degerz

Degerz commented Feb 26, 2019

@iperov Care to know why practically no one uses PlaidML?

  1. It runs pitifully slowly on AMD's OpenCL implementations compared to TensorFlow's CUDA backend, so there goes at least half the reason to use it. The performance is so bad that using TensorFlow with CPUs is competitive with, or even outright beats, their hardware using PlaidML.

  2. Nobody is interested in maintaining their specialized Tile programming language, which only someone like a pure maths professor would concoct, so PlaidML's code quality just goes down the drain, and no serious programmer in their right mind would want to deal with overly clever code ...

  3. This pretty much ties into point 2, but ever since Intel bought out Vertex.AI, they don't care about PlaidML anymore. Intel's solution for GPU-accelerated machine learning is a new compiler specifically for deep learning, now known as nGraph, which targets TensorFlow, PyTorch, and other deep learning frameworks as a backend. There is no reason for them to keep developing PlaidML as their intermediary when they have nGraph ...

People use PyTorch for other reasons, such as maintainability and other features. To sum it up, PlaidML is Intel's tool, and they probably don't intend for it to play any role in the final parts of their plans. nGraph's current Intel GPU backend is based on OpenCL 2.1, of which only Intel has a conformant implementation, so Intel is only looking out for themselves rather than purely for the betterment of machine learning. As Intel goes on to further develop nGraph, I can't see them continue basing their GPU backend on OpenCL 2.1 alone, since many deep learning frameworks have templated kernels which are not compatible with the separate-source programming models of OpenCL, Metal, or Vulkan, so it's probably only for experimentation purposes. Intel's final GPU backend is probably going to be based either on SYCL 2.2 or on something else entirely, like OpenMP, and maybe they'll even bring a vendor-specific solution ...

As for AMD, who cares? OpenCL is irrelevant to them, and they're finally showing some results with their work on HIP ...

@talregev

@iperov Care to know why practically no one uses PlaidML?

  1. It runs pitifully slowly on AMD's OpenCL implementations compared to TensorFlow's CUDA backend, so there goes at least half the reason to use it. The performance is so bad that using TensorFlow with CPUs is competitive with, or even outright beats, their hardware using PlaidML.
  2. Nobody is interested in maintaining their specialized Tile programming language, which only someone like a pure maths professor would concoct, so PlaidML's code quality just goes down the drain, and no serious programmer in their right mind would want to deal with overly clever code ...
  3. This pretty much ties into point 2, but ever since Intel bought out Vertex.AI, they don't care about PlaidML anymore. Intel's solution for GPU-accelerated machine learning is a new compiler specifically for deep learning, now known as nGraph, which targets TensorFlow, PyTorch, and other deep learning frameworks as a backend. There is no reason for them to keep developing PlaidML as their intermediary when they have nGraph ...

People use PyTorch for other reasons, such as maintainability and other features. To sum it up, PlaidML is Intel's tool, and they probably don't intend for it to play any role in the final parts of their plans. nGraph's current Intel GPU backend is based on OpenCL 2.1, of which only Intel has a conformant implementation, so Intel is only looking out for themselves rather than purely for the betterment of machine learning. As Intel goes on to further develop nGraph, I can't see them continue basing their GPU backend on OpenCL 2.1 alone, since many deep learning frameworks have templated kernels which are not compatible with the separate-source programming models of OpenCL, Metal, or Vulkan, so it's probably only for experimentation purposes. Intel's final GPU backend is probably going to be based either on SYCL 2.2 or on something else entirely, like OpenMP, and maybe they'll even bring a vendor-specific solution ...

As for AMD, who cares? OpenCL is irrelevant to them, and they're finally showing some results with their work on HIP ...

What about all the GPUs inside ARM machines, like mobile phones, the Raspberry Pi, the ODROID, etc.?
Don't they support OpenCL?
Google should care about bringing TensorFlow to the GPU on Android.
The biggest neural-network training libraries run only on Nvidia GPUs, which just makes Nvidia GPUs more and more expensive (because people and companies buy them only for professional neural-network training), and Google will lose more money that way.

@iperov

iperov commented Feb 26, 2019

@Degerz what planet did you come from?
How can you compare TF on CPU with an AMD GPU?
An AMD GPU on PlaidML is 30x faster than TF on CPU.

  1. It runs pitifully slowly on AMD's OpenCL implementations compared to TensorFlow's CUDA backend, so there goes at least half the reason to use it

In my deepfakes tests OpenCL is slower by only 20%, and in some mini networks OpenCL is 20% FASTER.

My project DeepFaceLab has many users who have been waiting for AMD support. So many people were delighted when deepfakes could finally be trained on AMD cards.
Also, PlaidML is the only backend for Keras that supports AMD/Intel HD out of the box.
If a new AMD backend for Keras appears, of course my project will switch to it.
PyTorch has no future.

What is there to maintain in PlaidML? Ops are auto-differentiable; there is nothing to maintain.

Tile programming language which only someone like a pure maths professor would concoct

Machine learning was invented by professors of mathematics, wasn't it?
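The "auto differentiable" point above refers to reverse-mode automatic differentiation. As a rough illustration of the idea (a pure-Python sketch of the general technique, not PlaidML's actual Tile machinery), gradients can be accumulated by walking the expression graph backward with the chain rule:

```python
# Minimal reverse-mode automatic differentiation: each operation records its
# inputs and the local derivative with respect to each input, and backward()
# propagates seed gradients along every path, summing contributions.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Accumulate the chain-rule product along this path, then recurse.
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)

# f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
x, y = Var(3.0), Var(4.0)
f = x * y + x
f.backward()
print(f.value, x.grad, y.grad)  # 15.0 5.0 3.0
```

Frameworks differ in how they record the graph (static vs. dynamic, recursion vs. topological sort), but this is the mechanism that lets a system derive gradients for any op built from differentiable primitives.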

@Degerz

Degerz commented Feb 26, 2019

@talregev What about ARM or Broadcom? The former probably has a subpar OpenCL implementation, and the latter doesn't even officially provide OpenCL drivers! It's not Google's responsibility to create and maintain a competent compute stack for hardware vendors ...

@iperov You realize that training neural nets with embedding layers on PlaidML is painful, right? PlaidML has a bunch of other limitations as well, such as not being all that well suited to DenseNets, the fact that its computation graphs are static, and does PlaidML even work well with RNNs?

As for your project, don't worry about it. You'll move on to something better like TensorFlow, since AMD will soon offer a native GPU backend for it once MIOpen gets upstreamed. MIOpen is their GPU-accelerated library of primitives for deep neural networks, similar to their competitor's cuDNN library, and both will leave PlaidML in the dust in terms of performance. Who cares about Intel iGPUs anyway? If Intel is truly committed to delivering high-performance deep learning on their future discrete graphics hardware, then they'll offer a single-source option just like the others (AMD/HIP and Nvidia/CUDA) did before them ...

PyTorch has no future.

Envy much? PyTorch is ~10x more popular than PlaidML, the newest techniques in DL are implemented easily on PyTorch, it has tons of different contributors, and it is actively developed by Facebook, all the while Intel hasn't contributed to PlaidML in nearly a month.

What is there to maintain in PlaidML? Ops are auto-differentiable; there is nothing to maintain.

So I take it from you that PlaidML shouldn't receive any fixes or new features going forward? If you don't see the value in improving code, then there's no point in convincing you to acknowledge PlaidML's glaring flaws ...

Machine learning was invented by professors of mathematics, wasn't it?

That doesn't mean we have to take up whatever programming language they make up, especially in the case of Tile, where elegance is clearly favoured over readability. It's no wonder so many potential contributors are scared away from contributing ...

@unoexperto

Jesus, I wish you guys would STFU and get back to work instead. I'll have to unsubscribe from this ticket because it's unbearable to get emails full of flame wars. Too bad the maintainers do not mute the thread.

@gunan @caisq @sanjoy Could you please do something about it?

@tensorflow tensorflow locked as too heated and limited conversation to collaborators Feb 26, 2019
@rthadur rthadur removed the cuda label Mar 19, 2019
@rthadur
Contributor

rthadur commented Jun 24, 2021

Closing the issue; please refer to https://blog.tensorflow.org/2020/08/faster-mobile-gpu-inference-with-opencl.html

@rthadur rthadur closed this as completed Jun 24, 2021