Version 0.6.9: Documentation update, unique_span's, bug fixes, many small improvements

eyalroz released this on 10 Mar 23:10

(This is planned to be the last release before 0.7.0, which will add support for CUDA graphs.)

Changes since v0.6.8:

Memory allocation & copying-related changes

  • #606 Can now copy directly to and from containers with contiguous storage - without going through pointers or specifying the size
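As a rough illustration of the idea behind #606, the following standalone sketch shows how a copy function can deduce both the address and the element count of any container with contiguous storage, using std::data() and std::size() (C++17). The name copy_contiguous is hypothetical and is not the library's signature. std::memcpy stands in for the CUDA copy that the library actually performs.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <iterator>   // std::data, std::size (C++17)
#include <vector>

// Hypothetical helper: copies all of `src` into the beginning of `dst`.
// std::data()/std::size() work for std::vector, std::array, std::string
// and built-in arrays, so the caller passes no raw pointers or byte counts.
// std::memcpy is a host-only stand-in for the underlying CUDA copy.
template <typename Dst, typename Src>
void copy_contiguous(Dst& dst, const Src& src)
{
    // A byte-wise copy only makes sense if element sizes agree.
    static_assert(sizeof(*std::data(dst)) == sizeof(*std::data(src)),
                  "element sizes must match");
    assert(std::size(dst) >= std::size(src) && "destination too small");
    std::memcpy(std::data(dst), std::data(src),
                std::size(src) * sizeof(*std::data(src)));
}

// Usage:
//   std::vector<int> src{1, 2, 3, 4};
//   std::array<int, 4> dst{};
//   copy_contiguous(dst, src);   // no pointers, no sizes
```

The size is taken from the source container itself, which removes a common class of errors where the byte count passed to a copy call does not match the buffer actually being copied.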

Owning typed and untyped memory: unique_span and unique_region

  • #291 Added a unique_span<T> template class, combining the functionality of cuda::unique_ptr and cuda::span (and somewhat similar to std::dynarray, which almost made it into C++14). Many CUDA programs want a single variable to represent both the ownership of allocated memory and the range of that memory for actual use, without the on-the-fly reallocation behavior of std::vector. This is now possible. An untyped counterpart, named unique_region, has also been implemented.
  • #617 Replaced memory::external::mapped_region_t with memory::unique_region
  • #601 Added an empty() method to cuda::span (to match that of std::span, as it is sometimes used)
  • #603 Use unique_span instead of our cuda::dynarray (which had been an std::vector under the hood), in various places in the API, especially RTC
  • #610 The cuda::rtc::program_output methods that allocate their own buffers (those for getting the compilation log, the cubin data, the PTX and the LTO IR) now return unique_span objects.
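The core idea of unique_span (#291) can be sketched, host-side only, as a move-only class that owns a fixed-size array and also exposes it as a range. This is a hypothetical simplification named unique_span_sketch, not the library's implementation: the real class also works with CUDA device, managed and pinned allocations and their matching deleters, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch: ownership (like std::unique_ptr<T[]>) plus a range
// view (like a span) in one object. The size is fixed at construction;
// there is no push_back/resize, so the storage never reallocates.
template <typename T>
class unique_span_sketch {
public:
    unique_span_sketch() = default;

    explicit unique_span_sketch(std::size_t count)
        : data_(count ? new T[count]() : nullptr), size_(count) {}

    // Move-only: declaring the move operations implicitly deletes copying.
    unique_span_sketch(unique_span_sketch&& other) noexcept
        : data_(std::move(other.data_)), size_(other.size_) { other.size_ = 0; }

    unique_span_sketch& operator=(unique_span_sketch&& other) noexcept {
        if (this != &other) {
            data_ = std::move(other.data_);
            size_ = other.size_;
            other.size_ = 0;
        }
        return *this;
    }

    // Span-like access.
    T* data() noexcept { return data_.get(); }
    const T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }

    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

    T* begin() noexcept { return data(); }
    T* end() noexcept { return data() + size_; }
    const T* begin() const noexcept { return data(); }
    const T* end() const noexcept { return data() + size_; }

private:
    std::unique_ptr<T[]> data_;   // frees the array when the owner dies
    std::size_t size_ = 0;
};
```

Because nothing in this sketch can grow, pointers obtained from data() remain valid for as long as the allocation is owned, which is the property that distinguishes it from std::vector.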

More robust memory regions (memory::region_t)

  • #592 Changed the approach used in v0.6.8 for bringing managed regions and general regions in line with each other; now, memory::managed::region_t inherits from memory::region_t
  • #594 Now using memory::region_t for mapped memory rather than a different, mapped-memory specific region class
  • #602 Make memory::region_t more constexpr-friendly
  • #604 memory::region_t's are now CUDA-independent, i.e. they do not use any CUDA-specific definitions
  • #605 Can now construct const_region_t's from rvalue references to regions
  • #640 Users no longer need to know about range_attribute_t or advice_t, which are now confined to detail_ namespaces; also fixed the implementation of attribute setting for device-agnostic attributes
  • #647 Mapped memory: Can now implicitly convert memory::mapped::span_pair_t<T> into a pair of region_t's
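The relationships among the region types described above can be approximated by the following simplified, hypothetical stand-ins, placed in a `sketch` namespace; the library's actual classes have more members and operations. They illustrate three of the points listed: a managed region is-a plain region (#592), the types rely on no CUDA definitions (#604) and are usable in constant expressions (#602), and a const region can be constructed from a temporary region (#605).

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical, simplified region: just a start address and a size in bytes.
// Only standard C++ types are used; no CUDA headers are needed.
struct region_t {
    void* start = nullptr;
    std::size_t size = 0;

    constexpr region_t() = default;
    constexpr region_t(void* start_, std::size_t size_) : start(start_), size(size_) {}
    constexpr bool empty() const { return size == 0; }
};

// Read-only counterpart.
struct const_region_t {
    const void* start = nullptr;
    std::size_t size = 0;

    constexpr const_region_t(const void* start_, std::size_t size_) : start(start_), size(size_) {}
    // Taking the source by const reference also accepts temporaries (rvalues).
    constexpr const_region_t(const region_t& r) : start(r.start), size(r.size) {}
};

namespace managed {
// A managed region is-a region, so it can be passed wherever a plain
// region is expected; the base class's constructors are inherited.
struct region_t : sketch::region_t {
    using sketch::region_t::region_t;
};
} // namespace managed

} // namespace sketch

// Constexpr-friendliness: this check is evaluated entirely at compile time.
static_assert(sketch::region_t{}.empty(), "a default region is empty");
```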

Documentation & comments

  • #595 Corrected the documentation for supports_memory_pools

Launch configuration & launch config builder changes

  • #596 Corrected a check against the associated device in the kernel-setting method of the launch config builder
  • #619, #618 Fixed launch configuration comparisons, which now use a defaulted comparison operator
  • #619 Fixed a bug in checking whether some CUDA-12-introduced launch config parameters are set

CUDA libraries and in-library, non-associated kernel support

  • #598 Corrected the API and implementation of get_attribute() and set_attribute() for library kernels

Internal refactoring

  • #607 Split off a detail/type_traits.hpp from types.hpp
  • #620 context::current::scoped_override_t now declared in current_context.hpp
  • #611 Reduced code repetition between context_t and primary_context_t.
  • #622 link::marshalled_options_t and link::option_t are now in the detail_ namespace - users should typically never need to use them
  • #624 Now collecting the log-related link options into a sub-structure of link::options_t
  • #625 Dropped specify_default_load_caching_mode from link::options_t, in favor of using an stdx::optional
  • #626 Now using optionals instead of bespoke constructs in pci_location_t
  • #628 Corrected the signature of context::current::peer_to_peer functions
  • #630 Moved program_base_t into the detail_ namespace
  • #632 Moved rtc::marshalled_options_t and rtc::marshal() into the detail_ subnamespace - users should not need to use these themselves
  • #643 Moved memory::pool::ipc::ptr_handle_t out of ipc.hpp up into types.hpp (so that memory_pool.hpp doesn't depend on ipc.hpp)
  • #621 Renamed: link::fallback_strategy_t -> link::fallback_strategy_for_binary_code_t
  • #600 Now adhering to the underscore-suffix naming convention for proxy class field names
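The pattern behind #625 and #626, replacing a "was this specified?" flag or another bespoke marker with an optional, can be sketched as follows. All type names and enumerators here are invented for illustration, and std::optional (C++17) stands in for the stdx::optional named in #625.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enumeration; the enumerator names are invented.
enum class caching_mode { none, l2_only, l1_and_l2 };

// Before: a flag and a value that must be kept in sync by convention.
struct options_before {
    bool specify_caching_mode = false;
    caching_mode mode = caching_mode::none;   // ignored unless the flag is set
};

// After: "not specified" is simply an empty optional.
struct options_after {
    std::optional<caching_mode> mode;
};

// Consumers either use the specified value or fall back to a default.
inline caching_mode effective_mode(const options_after& opts, caching_mode fallback)
{
    return opts.mode.value_or(fallback);
}
```

With the optional, an inconsistent state (a value set but the flag cleared, or vice versa) can no longer be represented at all.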

Other changes

  • #599 An invalid file, name_caching_program.hpp, had snuck into our code - removed it
  • #609 "Robustified" the buffers returned from cuda::rtc::program_output's various methods, so that they are all padded with an extra '\0' character past the end of the span's actual range. This is not strictly necessary, and that data should hopefully never actually be reached, but let's be on the safe side.
  • #627 Dropped context-specification from host-memory allocator functions - it's not actually used
  • #629 Added a device ID field to the texture view object
  • #631 Dropped examples/rtc_common.hpp, which is no longer used
  • #638 Dropped the native_word_t type
  • #639 Simplified the memory access permissions code somewhat + some renaming (access_permissions -> permissions)
  • #637 Devices and contexts no longer have flag-related members in their public interface; these are now just implementation details
  • #645 Bug fix: Now using the correct free() function in memory::managed::detail_::deleter
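The padding described in #609 amounts to allocating one byte more than the logical length and storing '\0' there. Below is a minimal host-side sketch of that idea; the names padded_char_buffer and make_padded_copy are hypothetical and do not come from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical buffer: `size` is the logical length and excludes the
// extra '\0' byte stored just past the end.
struct padded_char_buffer {
    std::unique_ptr<char[]> storage;
    std::size_t size;

    const char* data() const { return storage.get(); }
};

// Copies `size` bytes from `src` (which need not be NUL-terminated)
// into a new allocation of size + 1 bytes, then writes the padding byte.
inline padded_char_buffer make_padded_copy(const char* src, std::size_t size)
{
    padded_char_buffer buf{std::unique_ptr<char[]>(new char[size + 1]), size};
    std::memcpy(buf.storage.get(), src, size);
    buf.storage[size] = '\0';   // the extra byte past the logical end
    return buf;
}
```

Code that respects `size` never touches the extra byte; code that mistakenly treats `data()` as a C string stops at the padding instead of reading past the allocation.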