Nahim shader defines #643

devshgraphicsprogramming · 2024-01-10T09:06:25Z

Description

Testing

TODO list:

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

devshgraphicsprogramming · 2024-01-13T22:41:44Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+#define NBL_GENERATE_GET_OR_DEFAULT(field, ty, default) \
+template<typename S, bool = has_member_##field<S>::value> struct get_or_default_##field : integral_constant<ty,S::field> {}; \
+template<typename S> struct get_or_default_##field<S,false> : integral_constant<ty,default> {};


this macro causes Boost.Waveto throw exceptions, find out why

@nipunG314 does it still throw exceptions?

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

devshgraphicsprogramming · 2024-01-13T22:45:52Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_GET_OR_DEFAULT(nonCoherentAtomSize, uint16_t, 0);
+
+NBL_GENERATE_GET_OR_DEFAULT(subgroupSize, uint16_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(subgroupOpsShaderStages, core::bitflag<asset::IShader::E_SHADER_STAGE, 0);


obviously there's no bitflag on hlsl, so use a plain uint8_t

just a small annoying note, not every bitflag is a uint8_t

maybe see if you can make a nbl::hlsl::bitflag thats similar to core::bitflag except without constexpr ?

Else if its not possible/usable, you could move the enums to .hlsl headers, e.g.

namespace nbl { namespace hlsl { enum ShaderStage : uint8_t { ... }; } }

then add a using E_SHADER_STAGE = ShaderStage; alias in place of the original enum

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

devshgraphicsprogramming · 2024-01-13T22:47:07Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_GET_OR_DEFAULT(allowCommandBufferQueryCopies, bool, false);
+NBL_GENERATE_GET_OR_DEFAULT(maxOptimallyResidentWorkgroupInvocations, uint32_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(maxResidentInvocations, uint32_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(spirvVersion, asset::CGLSLCompiler::E_SPIRV_VERSION, 0);


we might need to make that enum global and then alias or something?

I'm not sure what you mean here. Can't we just replace the enum with an integral type like the rest?

this is what I mean
#643 (comment)

devshgraphicsprogramming · 2024-01-13T22:50:31Z

Need a matching modification to nbl/video/CJITIncludeLoader.cpp

Use uint32_t instead of VkExtent2D Use vector types for small arrays Replace core::bitflag<> with uint8_t

…into nahim_shader_defines

devshgraphicsprogramming · 2024-05-15T12:22:55Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_MEMBER_TESTER(largePoints);
+NBL_GENERATE_MEMBER_TESTER(shaderCullDistance);


alphaToOne is missing

devshgraphicsprogramming · 2024-05-15T12:35:54Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_MEMBER_TESTER(lineWidthGranularity);
+
+NBL_GENERATE_MEMBER_TESTER(strictLines);
+


add standard sample locations

devshgraphicsprogramming · 2024-05-15T12:45:31Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_MEMBER_TESTER(shaderUniformBufferArrayNonUniformIndexingNative);
+NBL_GENERATE_MEMBER_TESTER(shaderSampledImageArrayNonUniformIndexingNative);
+NBL_GENERATE_MEMBER_TESTER(shaderStorageImageArrayNonUniformIndexingNative);
+NBL_GENERATE_MEMBER_TESTER(shaderInputAttachmentArrayNonUniformIndexingNative);


missing shaderStorageBufferArrayNonUniformIndexingNative I think

devshgraphicsprogramming · 2024-05-15T13:14:43Z

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl

+NBL_GENERATE_GET_OR_DEFAULT(maxBufferViewTexels, uint32_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(maxUBOSize, uint32_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(MaxSSBOSize, uint32_t, 0);
+NBL_GENERATE_GET_OR_DEFAULT(maxPushConstantsSize, uint32_t, 0);


pay attention to match the types in SPhysicalDeviceLimits.h

ouch, also the default values... they can't all be 0

We'll eventually write a JSON + python based generator for these three headers

devshgraphicsprogramming · 2024-05-31T11:55:02Z

src/nbl/video/device_features.json

+    "headerGuard": "_NBL_VIDEO_S_PHYSICAL_DEVICE_FEATURES_H_INCLUDED_",
+    "includePath": "nbl/video/",
+    "includes": [
+        "ECommonEnums.h"
+    ],
+    "namespace": "nbl::video",
+    "structName": "SPhysicalDeviceFeatures",
+    "structComment": [
+        "Usage of feature API",
+        "## LogicalDevice creation enabled features shouldn't necessarily equal the ones it reports as enabled (superset)",
+        "**RARE: Creating a physical device with all advertised features/extensions:**",
+        "auto features = physicalDevice->getFeatures();",
+        "",
+        "ILogicalDevice::SCreationParams params = {};",
+        "params.queueParamsCount = ; // set queue stuff",
+        "params.queueParams = ; // set queue stuff",
+        "params.enabledFeatures = features;",
+        "auto device = physicalDevice->createLogicalDevice(params);",
+        "**FREQUENT: Choosing a physical device with the features**",
+        "IPhysicalDevice::SRequiredProperties props = {}; // default initializes to apiVersion=1.1, deviceType = ET_UNKNOWN, pipelineCacheUUID = '\\0', device UUID=`\\0`, driverUUID=`\\0`, deviceLUID=`\\0`, deviceNodeMask= ~0u, driverID=UNKNOWN",
+        "// example of particular config",
+        "props.apiVersion = 1.2;",
+        "props.deviceTypeMask = ~IPhysicalDevice::ET_CPU;",
+        "props.driverIDMask = ~(EDI_AMD_PROPRIETARY|EDI_INTEL_PROPRIETARY_WINDOWS); // would be goot to turn the enum into a mask",
+        "props.conformanceVersion = 1.2;",
+        "",
+        "SDeviceFeatures requiredFeatures = {};",
+        "requiredFeatures.rayQuery = true;",
+        "",
+        "SDeviceLimits minimumLimits = {}; // would default initialize to worst possible values (small values for maximum sizes, large values for alignments, etc.)"
+    ],
+    "content": {


don't have this stuff in the json and python...

Nobody is stopping you from making a SPhysicalDeviceFeatures.h that just looks like

#ifndef __NBL_VIDEO_S_PHYSICAL_DEVICE_FEATURES_H_INCLUDED_ #define __NBL_VIDEO_S_PHYSICAL_DEVICE_FEATURES_H_INCLUDED_ ... all the includes ... // the big comment struct SPhysicalDeviceLimits.h { #include "generated/SPhysicalDeviceLimits_members.h" // or some other appropriate path as per @AnastaZIuk }; #endif

it will be much saner

devshgraphicsprogramming · 2024-05-31T11:55:58Z

src/nbl/video/device_features.json

+                    "[DO NOT EXPOSE] just reserved numbers, extension never shipped",
+                    "VK_NV_extension_152",
+                    "[ALIASED TO] VK_AMD_mixed_attachment_samples",
+                    "VK_NV_framebuffer_mixed_samples",
+                    "[DO NOT EXPOSE] For now. For 2D ui",
+                    "VK_NV_fill_rectangle",
+                    "[EXPOSE AS A LIMIT] Enabled by Default, Moved to Limits",
+                    "ShaderSMBuiltinsFeaturesNV",
+                    "VK_NV_shader_sm_builtins",
+                    "Enabled by Default, Moved to Limits",
+                    "VK_EXT_post_depth_coverage",
+                    "[DEPRECATED] Vulkan 1.1 Core and ROADMAP 2022",
+                    "SamplerYcbcrConversionFeaturesKHR",
+                    "VK_KHR_sampler_ycbcr_conversion",
+                    "[DEPRECATED] Core 1.1 implemented on default path and there's no choice in not using it",
+                    "VK_KHR_bind_memory2",
+                    "[DO NOT EXPOSE] Too ",
+                    "intrinsically",
+                    "linux\"\"VK_EXT_image_drm_format_modifier",
+                    "[DO NOT EXPOSE] just reserved numbers, extension never shipped",
+                    "VK_EXT_extension_160",
+                    "[TODO LATER] Expose when we start to experience slowdowns from validation",
+                    "VK_EXT_validation_cache",
+                    "[DEPRECATED] KHR extension supersedes and then Vulkan 1.2",
+                    "VK_EXT_descriptor_indexing",
+                    "[DO NOT EXPOSE] We don't support yet",
+                    "VK_EXT_shader_viewport_index_layer",
+                    "VK_KHR_portability_subset - PROVISINAL/NOTAVAILABLEANYMORE",
+                    "[DO NOT EXPOSE] not implementing or exposing VRS in near or far future, also has interactions with fragment density maps",
+                    "ShadingRateImageFeaturesNV",
+                    "VK_NV_shading_rate_image",
+                    "[DEPRECATED] Superseded by KHR",
+                    "VK_NV_ray_tracing",
+                    "RepresentativeFragmentTestFeaturesNV",
+                    "VK_NV_representative_fragment_test"


thats broken down in a super weird way, I'd rather have them as hidden variables (no export) with visible comments

devshgraphicsprogramming · 2024-05-31T11:56:26Z

src/nbl/video/device_limits.json

+    "headerGuard": "_NBL_VIDEO_S_PHYSICAL_DEVICE_LIMITS_H_INCLUDED_",
+    "includePath": "nbl/asset/",
+    "includes": [
+        "utils/CGLSLCompiler.h",
+        "IImage.h",
+        "IRenderpass.h"
+    ],
+    "stlIncludes": [
+        "type_traits"
+    ],
+    "namespace": "nbl::video",
+    "structName": "SPhysicalDeviceLimits",
+    "structComment": [
+        "Struct is populated with Nabla Core Profile Limit Minimums"
+    ],
+    "content": {


same as for the feature struct

devshgraphicsprogramming · 2024-05-31T11:58:55Z

src/nbl/video/device_limits.json

+                "type": "function",
+                "function": [
+                    "// we don't compare certain capabilities because they don't mean better/worse",
+                    "inline bool isSubsetOf(const SPhysicalDeviceLimits& _rhs) const",
+                    "{",
+                    "    if (maxImageDimension1D > _rhs.maxImageDimension1D) return false;",
+                    "    if (maxImageDimension2D > _rhs.maxImageDimension2D) return false;",
+                    "    if (maxImageDimension3D > _rhs.maxImageDimension3D) return false;",
+                    "    if (maxImageDimensionCube > _rhs.maxImageDimensionCube) return false;",
+                    "    if (maxImageArrayLayers > _rhs.maxImageArrayLayers) return false;",
+                    "    if (maxBufferViewTexels > _rhs.maxBufferViewTexels) return false;",
+                    "    if (maxUBOSize > _rhs.maxUBOSize) return false;",


this should be missing from the JSON, and the body should be autogenerated by python by just knowing the type of the member variable you declared above.

If you manually neeed to maintain this then we're back to square 1 this becomes a handwritten header but worse

devshgraphicsprogramming · 2024-05-31T12:01:55Z

src/nbl/video/device_limits.json

+                    "// utility functions",
+                    "// In the cases where the workgroups synchronise with each other such as work DAGs (i.e. `CScanner`),",
+                    "// `workgroupSpinningProtection` is meant to protect against launching a dispatch so wide that",
+                    "// a workgroup of the next cut of the DAG spins for an extended time to wait on a workgroup from a previous one.",
+                    "inline uint32_t computeOptimalPersistentWorkgroupDispatchSize(const uint64_t elementCount, const uint32_t workgroupSize, const uint32_t workgroupSpinningProtection=1u) const",
+                    "{",
+                    "    assert(elementCount!=0ull && \"Input element count can't be 0!\");",
+                    "    const uint64_t infinitelyWideDeviceWGCount = (elementCount-1ull)/(static_cast<uint64_t>(workgroupSize)*static_cast<uint64_t>(workgroupSpinningProtection))+1ull;",
+                    "    const uint32_t maxResidentWorkgroups = maxResidentInvocations/workgroupSize;",
+                    "    return static_cast<uint32_t>(core::min<uint64_t>(infinitelyWideDeviceWGCount,maxResidentWorkgroups));",
+                    "}"


if you follow the

Some handwritten code struct MyStruct { #include "python_generated.h" More handwritten code }; Some more handwritten code

pattern, you can add some handwritten members/methods to the struct.

devshgraphicsprogramming · 2024-05-31T12:02:21Z

src/nbl/CMakeLists.txt

+# JSON
+	${NBL_ROOT_PATH}/src/nbl/video/device_limits.json
+	${NBL_ROOT_PATH}/src/nbl/video/device_features.json


@AnastaZIuk is this even a right thing to do?

of course it's invalid, currently they seem to be inputs for combined gen.py and DeviceGen.py python scripts, they should be handled with custom target + custom command pair, those json files should be flagged as DEPENDS of the custom command. I guess output locations are known as well - they should be passed to the custom command's OUTPUT and custom target's DEPENDS.

also handled with python or not - those scripts should never hardcode location to the inputs like do do currently, the script should take an input and expected output + be executed by a custom command which prepares the execution of the script given correct arguments for the script execution

add shader defines

7aaf269

devshgraphicsprogramming commented Jan 13, 2024

View reviewed changes

include/nbl/builtin/hlsl/device_capabilities_traits.hlsl Outdated Show resolved Hide resolved