Version 0.6.9: Documentation update, unique_span's, bug fixes, many small improvements
(This is planned to be the last release before 0.7.0, which will add support for CUDA graphs.)
Changes since v0.6.8:
Memory allocation & copying-related changes
- #606 Can now copy directly to and from containers with contiguous storage - without going through pointers or specifying the size
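To illustrate the idea behind #606, here is a minimal sketch, assuming plain host memory and with std::memcpy standing in for the actual device copy. The helper names copy_from_container, copy_to_container and round_trip are hypothetical and are not the library's API; the point is only that a container with contiguous storage already carries its own address and size, so the caller need not pass either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not the library's API): copy the contents of any
// container exposing data() and size() into a raw destination buffer.
// std::memcpy stands in for a host-to-device copy.
template <typename Container>
void copy_from_container(void* destination, const Container& source)
{
    using value_type = typename Container::value_type;
    if (source.size() == 0) { return; } // nothing to copy
    std::memcpy(destination, source.data(), source.size() * sizeof(value_type));
}

// Hypothetical helper: fill a container from a raw source buffer, copying
// exactly as many elements as the container already holds.
template <typename Container>
void copy_to_container(Container& destination, const void* source)
{
    using value_type = typename Container::value_type;
    if (destination.size() == 0) { return; } // nothing to copy
    std::memcpy(destination.data(), source, destination.size() * sizeof(value_type));
}

// Round-trips a vector through a staging buffer (playing the role of
// device memory) and returns what came back.
inline std::vector<int> round_trip(const std::vector<int>& input)
{
    std::vector<unsigned char> staging(input.size() * sizeof(int));
    std::vector<int> output(input.size());
    copy_from_container(staging.data(), input);
    copy_to_container(output, staging.data());
    return output;
}
```

Neither call site mentions a pointer or an element count; both are deduced from the container.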
Owning typed and untyped memory: unique_span and unique_region
- #291 Added a unique_span<T> template class, combining the functionality of cuda::unique_ptr and cuda::span (and being somewhat similar to std::dynarray, which almost made it into C++14). Many CUDA programs want to represent both the ownership of allocated memory, and the range of that memory for actual use, in the same variable - without the on-the-fly reallocation behavior of std::vector. This is now possible. Also implemented an untyped version of this, named unique_region.
- #617 Replaced memory::external::mapped_region_t with memory::unique_region
- #601 Added an empty() method to cuda::span (to match that of std::span - as it is now sometimes used)
- #603 Use unique_span instead of our cuda::dynarray (which had been an std::vector under the hood), in various places in the API, especially RTC
- #610 Return unique_span's from the cuda::rtc::program_output class methods which allocated their own buffers: The methods for getting the compilation log, the cubin data, the PTX and the LTO IR.
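The combination described in #291 (ownership plus a fixed range, with no reallocation) can be sketched roughly as follows. This is an illustration only, assuming host memory managed through std::unique_ptr; the class name owning_span and the function sum_of_first_n are hypothetical, and the library's actual unique_span<T> has its own interface and construction rules.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>

// Hypothetical illustration (not the library's unique_span): a move-only
// object which both owns a block of memory and knows its extent, and which
// never reallocates.
template <typename T>
class owning_span {
public:
    explicit owning_span(std::size_t size) : data_(new T[size]()), size_(size) {}

    // Moving transfers ownership and leaves the source empty
    owning_span(owning_span&& other) noexcept
        : data_(std::move(other.data_)), size_(other.size_) { other.size_ = 0; }

    owning_span& operator=(owning_span&& other) noexcept
    {
        if (this != &other) {
            data_ = std::move(other.data_);
            size_ = other.size_;
            other.size_ = 0;
        }
        return *this;
    }

    T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }
    T* begin() const noexcept { return data_.get(); }
    T* end() const noexcept { return data_.get() + size_; }
    T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::unique_ptr<T[]> data_; // ownership; also makes the class non-copyable
    std::size_t size_;          // the range of the owned memory
};

// Fills an owning span with 1..n, moves it, and sums it through the new owner.
inline int sum_of_first_n(std::size_t n)
{
    owning_span<int> values(n);
    std::iota(values.begin(), values.end(), 1);
    owning_span<int> moved = std::move(values); // 'values' is now empty
    return std::accumulate(moved.begin(), moved.end(), 0);
}
```

Unlike a std::vector, such an object has no push_back() or resize(), so the pointer and size it hands out stay valid for its entire lifetime.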
More robust memory regions (memory::region_t)
- #592 Changed the approach used in v0.6.8 to bring managed regions and general regions in line with each other; now, memory::managed::region_t inherits memory::region_t
- #594 Now using memory::region_t for mapped memory rather than a different, mapped-memory specific region class
- #602 Make memory::region_t more constexpr-friendly
- #604 memory::region_t's are now CUDA-independent, i.e. do not utilize any CUDA-specific definitions
- #605 Can now construct const_region_t's from rvalue references to regions
- #640 User no longer needs to know about range_attribute_t or advice_t - those are left to detail_ namespaces; also, fixed implementation of attribute setting for non-device-specific attributes
- #647 Mapped memory: Can now implicitly convert memory::mapped::span_pair_t<T> into a pair of region_t's
Documentation & comments
- #595 Correct the documentation for supports_memory_pools
Launch configuration & launch config builder changes
- #596 Corrected a check against the associated device in the kernel-setting method of the launch config builder
- #619, #618 Fixed launch configuration comparisons, and now using defaulted comparison
- #619 Fixed a bug in checking whether some CUDA-12-introduced launch config parameters are set
CUDA libraries and in-library, non-associated kernel support
- #598 Corrected the API and implementation of get_attribute() and set_attribute() for library kernels
Internal refactoring
- #607 Split off a detail/type_traits.hpp from types.hpp
- #620 context::current::scoped_override_t now declared in current_context.hpp
- #611 Reduced code repetition between context_t and primary_context_t.
- #622 link::marshalled_options_t and link::option_t are now in the detail_ namespace - the user should typically never use them
- #624 Now collecting the log-related link options into a sub-structure of link::options_t
- #625 Dropped specify_default_load_caching_mode from link::options_t, in favor of using an stdx::optional
- #626 Now using optional's instead of bespoke constructs in pci_location_t
- #628 Corrected the signature of context::current::peer_to_peer functions
- #630 Moved program_base_t into the detail_ namespace
- #632 Move rtc::marshalled_options_t and rtc::marshal() into the detail_ subnamespace - users should not need to use this themselves
- #643 Moved memory::pool::ipc::ptr_handle_t out of ipc.hpp up into types.hpp (so that memory_pool.hpp doesn't depend on ipc.hpp)
- #621 Renamed: link::fallback_strategy_t -> link::fallback_strategy_for_binary_code_t
- #600 Now adhering to underscore suffix for proxy class field names
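The pattern behind #625 and #626 (an optional replacing a value-plus-flag pair) might look roughly like this. It is a sketch under stated assumptions: it uses C++17's std::optional as a stand-in for the library's stdx::optional, and the enum caching_mode, both structs and describe() are hypothetical names invented for the illustration, not the actual fields of link::options_t.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical enum, for illustration only
enum class caching_mode { cache_all, cache_global_only };

// "Before" style: a value, plus a separate flag saying whether the value
// should be taken into account at all. The two can fall out of sync.
struct options_with_flag {
    bool specify_caching_mode = false;
    caching_mode mode = caching_mode::cache_all;
};

// "After" style: one field which is either unset or holds a value.
struct options_with_optional {
    std::optional<caching_mode> mode; // empty means "leave at the default"
};

// Reports which setting would be passed on, if any.
inline std::string describe(const options_with_optional& opts)
{
    if (!opts.mode) { return "default"; }
    return (*opts.mode == caching_mode::cache_all) ? "cache_all" : "cache_global_only";
}
```

With the optional, "not specified" is represented by the absence of a value rather than by a second member the user has to remember to set.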
Other changes
- #599 An invalid file, name_caching_program.hpp, had snuck into our code - removed it
- #609 "Robustified" the buffers returned from cuda::rtc::program_output's various methods, so that they are all padded with an extra '\0' character past the end of the span's actual range. This is not necessary, and that data should hopefully not actually be reached, but - let's be on the safe side.
- #627 Dropped context-specification from host-memory allocator functions - it's not actually used
- #629 Added a device ID field to the texture view object
- #631 Dropped examples/rtc_common.hpp, which is no longer in use
- #638 Dropped the native_word_t type
- #639 Simplified the memory access permissions code somewhat + some renaming (access_permissions -> permissions)
- #637 Devices and contexts no longer have flag-related members in their public interface; these are now just implementation details
- #645 Bug fix: Now using the correct free() function in memory::managed::detail_::deleter
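The padding described in #609 can be pictured with the following minimal sketch, assuming host memory. The struct padded_buffer and the function make_padded_copy are hypothetical and do not reflect how cuda::rtc::program_output is actually implemented; the sketch only shows a buffer whose reported size excludes one extra '\0' stored just past its end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical buffer type: 'size' is the range exposed to the user, while
// the allocation behind 'storage' is one character longer.
struct padded_buffer {
    std::unique_ptr<char[]> storage; // holds size + 1 characters
    std::size_t size;                // does not count the padding
};

// Copies 'size' characters from 'source' and appends a '\0' past the end.
inline padded_buffer make_padded_copy(const char* source, std::size_t size)
{
    padded_buffer result{ std::unique_ptr<char[]>(new char[size + 1]), size };
    if (size > 0) { std::memcpy(result.storage.get(), source, size); }
    result.storage[size] = '\0'; // the extra character, outside the reported range
    return result;
}
```

Code which mistakenly treats such a buffer as a C string (for example, by passing it to std::strlen) then stops at the padding instead of reading past the allocation.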