Default build system support for CUDA/PTX #19302

Closed
maleadt opened this issue Nov 11, 2016 · 17 comments

Labels: domain:building (Build system, or building Julia or its dependencies), domain:gpu (Affects running Julia on a GPU)
Milestone: 0.6.0

maleadt (Member) commented Nov 11, 2016

As master is getting closer to being compatible with CUDAnative.jl, I'd like to discuss how we make the default build and/or the binary releases compatible with it. I think it would be a great addition for users to be able to target GPUs without too much effort.

  1. I've been using LLVM 3.9 on my branch, including some patches to get rid of some PTX-specific bugs. I guess the upgrade to LLVM 3.9 (#19123) is bound to happen before that, at which point including some extra patches shouldn't hurt?

  2. The next obvious change is to enable the PTX back-end in the default build of LLVM. I'm not sure how hard we try to keep libLLVM small (vs., e.g., the effort to keep the sysimg small), but FWIW enabling the PTX target next to X86 increases the LLVM library file size from 36624 to 37664 kB (Linux x64), a mere 3% increase. (A sketch of what this could look like follows this list.)

  3. Lastly, CUDAnative.jl uses LLVM.jl, which wraps and extends the C API. This requires llvm-config as well as the LLVM headers, neither of which is part of the binary build. @vtjnash suggested creating both a regular build tarball and an extras or tools archive containing non-critical headers and tools (this could also include, e.g., libclang for Cxx.jl).
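
To make points 2 and 3 a bit more concrete, here's a rough sketch of what's involved; the variable and file names, and whether Make.user's LLVM_TARGETS actually passes through to LLVM's LLVM_TARGETS_TO_BUILD CMake option, are assumptions rather than a tested recipe:

    # Point 2 -- Make.user sketch: build LLVM with the NVPTX back-end next to
    # the host target (assumed to end up in LLVM's LLVM_TARGETS_TO_BUILD)
    LLVM_TARGETS := host;NVPTX

    # Point 3 -- the kind of step LLVM.jl needs llvm-config and the headers for:
    # compiling a small C++ shim against the shipped LLVM (names hypothetical)
    #   g++ -fPIC -shared extras.cpp -o libLLVM_extras.so \
    #       $(llvm-config --cxxflags --ldflags --libs)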

cc @vchuravy @tkelman @staticfloat

Added a tentative 0.6.0 milestone; it would be nice to have experimental GPU support in that version.
But I haven't been on the triage calls, so please change if that has already been decided against.

@maleadt maleadt added the domain:building Build system, or building Julia or its dependencies label Nov 11, 2016
@maleadt maleadt added this to the 0.6.0 milestone Nov 11, 2016
tkelman (Contributor) commented Nov 11, 2016

I don't think we should be adding "nice to have" items to the milestone if we plan on feature freezing next month. If this can be made to work and doesn't break anything or make the binaries noticeably larger, we can do it, but I wouldn't consider it release blocking.

In terms of llvm-config and headers for packages, I think the packages are going to need to figure out how to build and distribute binaries for the platforms they want to support. I don't want the base buildbots being responsible for providing package binaries for anything beyond "stdlib." There's not enough bandwidth or S3 budget to provide binaries for all packages, so we have to say no to some, and then you get into playing favorites - what conditions qualify a package as "important enough" to provide binaries for? I think the only sensible answer is: if it's in the future stdlib it qualifies; if it isn't, it doesn't.

maleadt (Member, Author) commented Nov 11, 2016

The "nice to have" extends it just being nice: I'd like to get real user feedback on how people want to use GPU codegen support in order to improve and stabilize the interface (params, hooks) with inference and codegen before 1.0.

Moving the burden of providing the necessary binaries to the packages (LLVM.jl, in my case) is definitely an option, but given how much development has gone into tweaking and tuning the buildbots, I'm not sure how easy it would be to faithfully reproduce that on, e.g., Travis?

ViralBShah (Member) commented Nov 11, 2016

I think in the early user-feedback stage we can go with a Cxx-like model. Let's not worry about the buildbots, but get it to a stage where people can easily build from source and make it work.

Patches can certainly be included in master for ease of building - even though GPU support may only work in source builds.

As for the S3 budget, I don't think adding some GPU support is even going to register; it's less of a concern. As GPU support becomes more and more reliable and usable, we will certainly want to provide more support - but that seems like more of a 1.0-timeframe discussion. This is not a package thing but a core compiler capability, the way I see it.

tkelman (Contributor) commented Nov 11, 2016

The S3 budget remark was referring to building multiple classes of binary distributions, with separate binaries for things like headers or libraries/tools only used at build time by base.

We will be working on making a buildbot-like generic binary build system easier to reproduce for packages anyway. The current buildbot setup is fragile and not automated enough to really scale beyond what we're currently using it for.

StefanKarpinski (Sponsor Member) commented

@maleadt: this can go on the milestone if you're willing to own it and make it happen by end of year (feature freeze for 0.6). Regardless, I think we should switch to LLVM 3.9 ASAP.

ViralBShah (Member) commented

If we are to move to LLVM 3.9, shouldn't we be doing it about now, to give adequate testing time?

StefanKarpinski (Sponsor Member) commented

Yes – ASAP. I've been preaching this for a while. There was some pushback but I forget why.

maleadt (Member, Author) commented Nov 14, 2016

I'll focus on getting source builds GPU-compatible for 0.6, which should be pretty easy and non-controversial (enable the PTX back-end, add some patches).

I also spent some time figuring out whether it's possible to distribute binaries for LLVM.jl, but that most likely won't work; see LLVM.jl/#10. But I agree that figuring this out is 1.0 territory.

tkelman (Contributor) commented Dec 29, 2016

closed by #19323 + #19678

@tkelman tkelman closed this as completed Dec 29, 2016
maleadt (Member, Author) commented Dec 30, 2016

Point 3 hasn't really been resolved yet, but we can do without for now. It'll require users wanting GPU support to do a source build, albeit an unmodified one, and we can work on providing the necessary auxiliary files at a later time.

Also, for those wanting to try it out: there are some outstanding issues in CUDAnative due to #17057, but I'll be working on those next week.

tkelman (Contributor) commented Dec 31, 2016

How much extra code do you need to compile that requires llvm-config and the headers? If it's a small set of functions you want to expose, could we put entry points to them here in libjulia?

maleadt (Member, Author) commented Jan 4, 2017

(missed your comment)

It's not that much code, but it needs to change in lockstep with the rest of the package, so I'd rather keep it there. Although it would solve the problem, of course...

tkelman (Contributor) commented Jan 4, 2017

Can you ship a binary version in the package then?

maleadt (Member, Author) commented Jan 4, 2017

I tried, but couldn't get it to work:

This is probably not going to work. The extras library uses some LLVM C++ APIs, e.g. here, which might result in object-layout-dependent code getting baked into the extras library. E.g., continuing on the example above, Function::getAttributes and AttributeSet::isEmpty are both inlined, resulting in a call to AttributeSetImpl::getNumSlots, which is defined in the header and hence compiled into our library.

And indeed, memory layout is mostly implementation-defined, and it does differ between, e.g., Travis's compiler (on their Trusty image) and my local clang 3.8, resulting in faulty behavior.
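
As a contrived illustration of the hazard (a toy header, not the actual LLVM code), an inline accessor defined in a header bakes the implementation object's layout into whatever compiles against it:

    // toy header, not LLVM's: the inline accessor hard-codes Impl's layout
    struct Impl {
        unsigned flags;   // if the prebuilt library was compiled without this field
        unsigned slots;   // (or with the fields reordered), the offset below is wrong
    };

    struct Handle {
        Impl *impl;
        // inlined into every client that calls it: "read the second unsigned of Impl"
        unsigned numSlots() const { return impl->slots; }
    };

If the library handing out those Impl pointers was built with a different layout, the client silently reads garbage, without any link-time error to catch it.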

If anybody reading this has any suggestions, please chime in.

tkelman (Contributor) commented Jan 4, 2017

Were you building with the same compiler as the Julia binaries, or the system compiler? The latter can easily differ in ABI. Either way, I guess revisit when things stabilize a bit in the package? You can always be more specific about the package's supported version range of Julia if that helps make this kind of thing more predictable going forward.

maleadt (Member, Author) commented Jan 4, 2017

Yeah, I wouldn't want to rush that code into base right now; having users test it and revisiting this in a couple of months seems fine.

But continuing on ABI differences, wouldn't that imply that it's never safe to build and link against a prebuilt C++ library (i.e. libLLVM) unless the toolchain is identical (which it never will be)?

tkelman (Contributor) commented Jan 4, 2017

The toolchain will be identical if we standardize the way packages build binaries. On Linux especially, producing generic binaries does require using a uniform toolchain.
