Adam Sawicki - Homepage

Graphics programming, game programming, C++, games, Windows, Internet and more...

3 States of Preprocessor Macros 30 Jun 8:16 AM (5 days ago)

This will be a beginner-level article for programmers working in C, C++, or other languages that use a similar preprocessor - such as shader languages like HLSL or GLSL. The preprocessor is a powerful feature. While it can be misused in ways that make code more complex and error-prone, it can also be a valuable tool for building programs and libraries that work across multiple platforms and external environments.

In this post, I’ll focus specifically on conditional compilation using the #if and #ifdef directives. These allow you to include or exclude parts of your code at compile time, which is much more powerful than a typical runtime if() condition. For example, you can completely remove a piece of code that might not even compile in certain configurations. This is especially useful when targeting specific platforms, external libraries, or particular versions of them.

When it comes to enabling or disabling a feature in your code, there are generally two common approaches:

Solution 1: Define or don’t define a macro and use #ifdef:

// To disable the feature: leave the macro undefined.

// To enable the feature: define the macro (with or without a value).
#define M

// Later in the code...

#ifdef M
    // Use the feature...
#else
    // Use fallback path...
#endif

Solution 2: Define a macro with a numeric value (0 or 1), and use #if:

// To disable the feature: define the macro as 0.
#define M 0
// To enable the feature: define the macro as a non-zero value.
#define M 1

// Later in the code...

#if M
    // Use the feature...
#else
    // Use fallback path...
#endif

There are more possibilities to consider, so let’s summarize how different macro definitions behave with #ifdef and #if in the table below:

Macro definition    #ifdef M    #if M
(Undefined)         No          No
#define M           Yes         ERROR
#define M 0         Yes         No
#define M 1         Yes         Yes
#define M (1)       Yes         Yes
#define M FOO       Yes         No
#define M "FOO"     Yes         ERROR

The #ifdef M directive simply checks whether the macro M is defined, no matter whether it has an empty value or any other value. On the other hand, #if M attempts to evaluate the value of M as an integer constant expression. This means it works correctly if M is defined as a literal number like 1 or even as an arithmetic expression like (OTHER_MACRO + 1). Interestingly, using an undefined symbol in #if evaluates to 0, but defining a macro with an empty value or a non-numeric token (like a string) will cause a compilation error - such as “error C1017: invalid integer constant expression” in Visual Studio.

It's also worth noting that #if can be used to check whether a macro is defined by writing #if defined(M). While this is more verbose than #ifdef M, it’s also more flexible and robust. It allows you to combine multiple conditions using logical operators like && and ||, enabling more complex preprocessor logic. It is also the only option for conditions like #elif defined(OTHER_M), unless you are using C++23, which adds the missing #elifdef and #elifndef directives.
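For example, a small sketch of such combined conditions (all macro names here are made up for illustration):

// Hypothetical macro names, used only to illustrate combining conditions:
#if defined(MYLIB_ENABLE_FEATURE) && !defined(MYLIB_DISABLE_EVERYTHING)
    // Feature path...
#elif defined(MYLIB_LEGACY_MODE) || (MYLIB_API_VERSION >= 2)
    // Legacy path... (an undefined MYLIB_API_VERSION evaluates to 0 here)
#else
    // Fallback path...
#endif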

So, which of the two approaches should you choose? We could argue for one or the other, but when developing the Vulkan Memory Allocator and D3D12 Memory Allocator libraries, I decided to treat some configuration macros as having three distinct states:

  1. The user explicitly defined the macro as 0, meaning they want the feature disabled.
  2. The user explicitly defined the macro as 1, meaning they want the feature enabled.
  3. The user left the macro undefined, meaning they have no preference - so I make the decision based on internal logic.

To support this pattern, I use the following structure:

#ifndef M
    #if (MY OWN CONDITION...)
        #define M 1
    #else
        #define M 0
    #endif
#endif

// Somewhere later...

#if M
    // Use the feature...
#else
    // Use fallback path...
#endif


The Secrets of Floating-Point Numbers - a New Article 28 May 6:53 AM (last month)

Today I would like to present my new article: "The Secrets of Floating-Point Numbers". It can be helpful to any programmer, no matter what programming language they use. In this article, I discuss floating-point numbers compliant with the IEEE 754 standard, which are available in most programming languages. I describe their structure, capabilities, and limitations. I also address the common belief that these numbers are inaccurate or nondeterministic. Furthermore, I highlight many non-obvious pitfalls that await developers who use them.

The article was first published a few months ago in Polish, in issue 5/2024 (115) (November/December 2024) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions:


D3d12info app and online GPU database 29 Apr 12:16 PM (2 months ago)

This post is about the D3d12info open-source project that I'm involved in. The project is in continuous development, but I noticed I haven't blogged about it since I first announced it in 2022. Here, I describe the story behind it and its current state. The post may be interesting to you if you are a programmer coding graphics for Windows using DirectX 12.

Introduction

Various GPUs (discrete graphics cards, processor-integrated graphics chips) from various vendors (AMD, Intel, Nvidia, …) have various capabilities. Even when a GPU supports a specific API (OpenGL, DirectX 11, DirectX 12, Vulkan), some of the features may not be supported. These features span from the big ones that even non-programmers recognize, like ray tracing, to the most obscure, like the lengthy D3D12_FEATURE_DATA_D3D12_OPTIONS::VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation and the even lengthier Vulkan VkPhysicalDeviceShaderIntegerDotProductProperties::integerDotProductAccumulatingSaturating4x8BitPackedMixedSignednessAccelerated 🙂

Before using any of these features in our apps, we need to query whether the feature is supported on the current GPU. Checking it programmatically is relatively simple. Graphics APIs offer functions for that purpose, like ID3D12Device::CheckFeatureSupport and vkGetPhysicalDeviceProperties2. When the feature is not supported, the app should either fall back to some other implementation (e.g. using screen-space reflections instead of ray-traced reflections) or display an error saying that the GPU doesn’t meet our minimum hardware requirements.
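For example, a feature check for ray tracing could look like the minimal sketch below (device is assumed to be an already created ID3D12Device*; error handling omitted):

// Query the OPTIONS5 feature struct and check the ray tracing tier.
D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
HRESULT hr = device->CheckFeatureSupport(
    D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5));
const bool ray_tracing_supported =
    SUCCEEDED(hr) && options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0;
if (!ray_tracing_supported)
{
    // Fall back, e.g. to screen-space reflections, or report that the GPU
    // doesn't meet the minimum hardware requirements.
}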

However, when we plan to use some optional feature of the API and think about testing it on a variety of platforms and eventually shipping it to end users, we may ask:

  1. Is there any existing tool that would query and display the capabilities of the local GPU, without the need to develop our own program?
  2. Is there a way to know which features are widely supported and which are not - like a database of various GPUs and the capabilities they support?

State of Vulkan

For Vulkan, the answers to these questions are: yes & yes. For querying the capabilities of the local GPU, the Vulkan SDK comes with a small command-line program called “vulkaninfo”. After running it, we can see all the extensions, properties, features, and limits of the GPU in a human-readable text format. Alternatively, JSON and HTML output formats are also available.

For the database of GPUs, Sascha Willems maintains the Vulkan Hardware Database and an accompanying GUI app, the Vulkan Hardware Capability Viewer, which presents the capabilities of the local GPU and also allows submitting this report to the database.

Former state of DX12

For Direct3D 12, however, I wasn’t aware of any such application or database. The Windows SDK comes with a GUI app that can be found in "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe". It presents some features of DirectDraw, Direct3D 9, 11, and DXGI, as well as some options of Direct3D 12, but it doesn’t seem to be complete in terms of all the latest options available. There is no updated version of it distributed with the DirectX 12 Agility SDK, and there is no way to use it from the command line. At least Microsoft open-sourced it: DxCapsViewer @ GitHub.

D3d12info

This is why I decided to develop D3d12info, intended as a DX12 equivalent of vulkaninfo. Written in C++, this small Windows console app prints all the DX12 capabilities to the standard output in text format. The project is open source under the MIT license, but you can also download precompiled binaries by picking the latest release.

JSON is also available as the output format, which makes the app suitable for automated processing as part of some larger pipeline.

I published the first “draft” version of D3d12info in 2018, but it wasn’t until July 2022 that I released the first version I considered complete and marked it as 1.0.0. The app has had many releases since then. I update it as Microsoft ships new versions of the Agility SDK, so it fetches newly added capabilities (including ones from the “preview” version of the SDK).

The app also fetches and prints some other information apart from the DX12 features. The pieces I consider most important are:

However, I try to limit the scope of the project to avoid feature creep, so I refuse some feature requests. For example, I decided not to include capabilities queried from DirectX Video or WDDM.

D3d12infoGUI

D3d12info would have remained only a command-line tool if not for Dmytro Bulatov “Devaniti” - a Ukrainian developer working at ZibraAI, who joined the project and developed D3d12infoGUI. This app is a convenient overlay that unpacks the command-line D3d12info, launches it, and converts its output into a nice-looking HTML page, which is then saved to a temporary file and opened in a web browser. This allows browsing the capabilities of the current GPU in a convenient way. Dmytro also contributed significantly to the code of my D3d12info project.

If you scroll down the report, you can see a table with texture formats and the capabilities they support. Many of them are mandatory for every GPU supporting feature level 12_0; these are marked with a hollow check mark. However, as you can see below, my GPU supports some additional formats as “UAV Typed Load”:

D3d12infoDB

The web page with the report also offers a large green button near the top that submits it to the online database. Here comes the last part of the ecosystem: D3d12infoDB. This is something I had been dreaming about for years but couldn’t build myself, since I am not a proficient web developer. Now Dmytro, along with other contributors from the open-source community, has developed a website that gathers reports about various GPUs, offering multiple ways of browsing, searching, and filtering them.

One great feature they’ve added recently is the Feature Table. It lists DX12 capabilities as rows, while the columns are subsequent GPU generations from AMD, Nvidia, Intel, and Qualcomm. This way, we can easily see which features are supported by older GPU generations and make a better decision about the minimum feature set required by the game we develop. For example, we can see that ray tracing (DXR 1.1) and mesh shaders have been supported by Nvidia since the Turing architecture (GeForce RTX 2000 series, released in 2018), but support from AMD is more recent, since the RDNA2 architecture (Radeon RX 6000 series, released in 2020).

What’s next?

As I mentioned above, I keep the D3d12info tool updated to the latest DirectX 12 Agility SDK, to fetch and print newly added capabilities. This also includes major features like DirectSR or metacommands. The D3d12infoGUI app and the D3d12infoDB website are also updated frequently.

I want to avoid expanding my app too much. One major feature I am considering is separate executables for the x86 32-bit, x86 64-bit, and ARM architectures, as I’ve heard there are differences in the DX12 capabilities supported between them, and some graphics programmers (e.g. on the demoscene) still target 32 bits. Please let me know if it would be useful to you!

Call to action!

Finally, here is my call to action! You can help the project by submitting your GPU to the online database. Every submission counts - even a report from the same GPU with a different graphics driver version constitutes a separate entry. Please download the latest D3d12infoGUI release, launch it, and when the local web page opens, press that large green button to submit your report.

However, if you are a developer working for a GPU vendor and you use prototype future GPU hardware or an internal, unreleased build of the graphics driver, please don’t do it. We don’t want to leak any confidential information through this website. If you accidentally submitted such a report, please contact us and we will remove it.


Fixing Godot 4.3 Hang on ASUS TUF Gaming Laptop 26 Mar 1:08 PM (3 months ago)

In January 2025, I participated in PolyJam - a Global Game Jam site in Warsaw, Poland. I shared my experiences in a blog post: Global Game Jam 2025 and First Impressions from Godot. This post focuses on a specific issue I encountered during the jam: Godot 4.3 frequently hanging on my ASUS TUF Gaming laptop. If you're in a hurry, you can SCROLL DOWN to skip straight to the solution that worked for me.

The laptop I used was an ASUS TUF Gaming FX505DY. Interestingly, it has two different AMD GPUs onboard - a detail that becomes important later:

The game we developed wasn’t particularly complex or demanding - it was a 2D pixel art project. Yet, the Godot editor kept freezing frequently, even without running the game. The hangs occurred at random moments, often while simply navigating the editor UI. Each time, I had to force-close and restart the process. I was using Godot 4.3 Stable at the time.

I needed a quick solution. My first step was verifying that both Godot 4.3 and my AMD graphics drivers were up to date (they were). Then, I launched Godot via "Godot_v4.3-stable_win64_console.exe", which displays a console window with debug logs alongside the editor. That’s when I noticed an error message appearing every time the hang occurred:

ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
   at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)

This suggested the issue might be GPU-related, specifically involving the Vulkan API. However, I wasn’t entirely sure - the same error message occasionally appeared even when the engine wasn’t hanging, so it wasn’t a definitive indicator.

To investigate further, I decided to enable the Vulkan validation layer, hoping it would reveal more detailed error messages about what the engine was doing wrong. Since I have the Vulkan SDK installed on my system, I launched the Vulkan Configurator app that comes with it ("Bin\vkconfig.exe"), set Vulkan Layers Management = Layers Controlled by the Vulkan Configurator, and selected Validation.

Unfortunately, when I launched Godot again, no new error messages appeared in the console. (Looking back, I’m not even sure if that console window actually captured the process’s standard output.) For a brief moment, I thought enabling the Vulkan validation layer had fixed the hangs - but they soon returned. Maybe they were less frequent, or perhaps it was just wishful thinking.

Next, I considered forcing Godot to use the integrated GPU (Radeon Vega 8) instead of the more powerful discrete GPU (RX 560X). To test this, I adjusted Windows power settings to prioritize power saving over maximum performance. However, this didn’t work - Godot still reported using the Radeon RX 560X.

THE SOLUTION: What finally worked was forcing Godot to use the integrated GPU by launching it with a specific command-line parameter. Instead of running the editor normally, I used:

Godot_v4.3-stable_win64_console.exe --verbose --gpu-index 1

This made Godot use the second GPU (index 1) - the slower Radeon Vega 8 - instead of the default RX 560X. The result? No more hangs. While the integrated GPU is less powerful, it was more than enough for our 2D pixel art game.

I am not sure why it helped, considering that both GPUs in my laptop are from AMD and are supported by the same driver. I also haven’t checked whether Godot 4.4, which has been released since then, fixes this bug. I am just leaving this story here in case someone stumbles upon the same problem in the future.


DirectX 12 Agility SDK 1.716.0-preview Explained 2 Feb 5:40 AM (5 months ago)

On January 30th 2025 Microsoft released a new version of the DirectX 12 Agility SDK: 1.615.0 (D3D12SDKVersion = 615) and 1.716.0-preview (D3D12SDKVersion = 716). The main article announcing this release is: AgilitySDK 1.716.0-preview and 1.615-retail. Files are available to download from DirectX 12 Agility SDK Downloads, as always, in the form of .nupkg files (which are really ZIP archives).

I can see several interesting additions in the new SDK, so in this article I am going to describe them and delve into details of some of them. This way, I aim to consolidate information that is scattered across multiple Microsoft pages and provide links to all of them. The article is intended for advanced programmers who use DirectX 12 and are interested in the latest developments of the API and its surrounding ecosystem, including features that are currently in preview mode and will be included in future retail versions.

Shader hash bypass

This is the only feature added to both the retail and preview versions of the new SDK. The article announcing it is: Agility SDK 1.716.0-preview & 1.615-retail: Shader hash bypass. A more extensive article explaining this feature is available here: Validator Hashing.

The problem:

If you use DirectX 12, you most likely know that shaders are compiled in two stages. First, the source code in HLSL (High-Level Shading Language) is compiled using the Microsoft DXC compiler into an intermediate binary code. This often happens offline when the application is built. The intermediate form is commonly referred to as DXBC (as the container format and the first 4 bytes of the file) or DXIL (as the intermediate language of the shader code, somewhat similar to SPIR-V or LLVM IR). This intermediate code is then passed to a DirectX 12 function that creates a Pipeline State Object (PSO), such as ID3D12Device::CreateGraphicsPipelineState. During this step, the second stage of compilation occurs within the graphics driver, converting the intermediate code into machine code (ISA) specific to the GPU. I described this process in more detail in my article Shapes and forms of DX12 root signatures, specifically in the "Shader Compilation" section.

What you may not know is that the intermediate compiled shader blob is digitally signed by the DXC compiler using a hash embedded within it. This hash is then validated during PSO creation, and the function fails if the hash doesn’t match. Moreover, despite the DXC compiler being open source and hosted on github.com/microsoft/DirectXShaderCompiler, the signing process is handled by a separate library, "dxil.dll", which is not open source.

If you only use the DXC compiler provided by Microsoft, you may never encounter any issues with this. I first noticed this problem when I accidentally used "dxc.exe" from the Vulkan SDK instead of the Windows SDK to compile my shaders. This happened because the Vulkan SDK appeared first in my "PATH" environment variable. My shaders compiled successfully, but since the closed-source "dxil.dll" library is not distributed with the Vulkan SDK, they were not signed. As a result, I couldn’t create PSO objects from them. As the ecosystem of graphics APIs continues to grow, this could also become a problem for libraries and tools that aim to generate DXIL code directly, bypassing the HLSL source code and DXC compiler. Some developers have even reverse-engineered the signing algorithm to overcome this obstacle, as described by Stephen Gutekanst / Hexops in this article: Building the DirectX shader compiler better than Microsoft?.

The solution:

With this new SDK release, Microsoft has made two significant changes:

Technologies that generate DXIL shader code can now use either of these methods to produce a valid shader.

The capability to check whether this new feature is supported is exposed through D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED::Supported. However, it appears to be implemented entirely at the level of the Microsoft DirectX runtime rather than the graphics driver, as it returns TRUE on every system I tested.
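A query could look like the sketch below. The structure name comes from the SDK announcement, but the matching D3D12_FEATURE enum value is my assumption based on the usual naming convention, so treat this as illustrative only:

// Sketch only: the feature enum name below is assumed to follow the usual
// D3D12_FEATURE_* naming convention and may differ in the actual SDK headers.
D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED bypass = {};
if (SUCCEEDED(device->CheckFeatureSupport(
        D3D12_FEATURE_BYTECODE_BYPASS_HASH_SUPPORTED, &bypass, sizeof(bypass))))
{
    // bypass.Supported tells whether DXIL signed with the bypass hash is accepted.
}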

One caveat is that "dxil.dll" not only signs the shader but also performs some form of validation. Microsoft didn’t want to leave developers without the ability to validate their shaders when using the bypass hash. To address this, they have now integrated the validation code into the D3D Debug Layer, allowing shaders to be validated as they are passed to the PSO creation function.

Tight alignment of resources

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Tight Alignment of Resources. There is also a specification: Direct3D 12 Tight Placed Resource Alignment, but it is very low-level, describing even the interface for the graphics driver.

The problem:

This one is particularly interesting to me, as I develop the D3D12 Memory Allocator and Vulkan Memory Allocator libraries, which focus on GPU memory management. In DirectX 12, buffers require alignment to 64 KB, which can be problematic and lead to significant memory waste when creating a large number of very small buffers. I previously discussed this issue in my older article: Secrets of Direct3D 12: Resource Alignment.
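To illustrate the problem, here is a minimal sketch using the d3dx12.h helper CD3DX12_RESOURCE_DESC (device is assumed to be an existing ID3D12Device*):

// Even a tiny buffer reports 64 KB alignment (and a size rounded up to 64 KB),
// so thousands of small buffers can waste a lot of memory.
D3D12_RESOURCE_DESC buf_desc = CD3DX12_RESOURCE_DESC::Buffer(16); // 16-byte buffer
D3D12_RESOURCE_ALLOCATION_INFO info =
    device->GetResourceAllocationInfo(0, 1, &buf_desc);
// info.Alignment == 65536 and info.SizeInBytes == 65536 here.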

The solution:

This is one of many features that the Vulkan API got right, and Microsoft is now aligning DirectX 12 in the same direction. In Vulkan, developers need to query the required size and alignment of each resource using functions like vkGetBufferMemoryRequirements, and the driver can return a small alignment if supported. For more details, you can refer to my older article: Differences in memory management between Direct3D 12 and Vulkan. Microsoft is now finally allowing buffers in DirectX 12 to support smaller alignments by introducing the following new API elements:

I have already implemented support for this new feature in the D3D12MA library. Since this is a preview feature, I’ve done so on a separate branch for now. You can find it here: D3D12MemoryAllocator branch resource-tight-alignment.

This feature requires support from the graphics driver, and as of today, no drivers support it yet. The announcement article mentions that AMD plans to release a supporting driver in early February, while other GPU vendors are also interested and will support it in an "upcoming driver" or at some indefinite point in the future - similar to other preview features described below.

However, testing is possible right now using the software (CPU) implementation of DirectX 12 called WARP. Here’s how you can set it up:

Microsoft has also shared a sample application to test this feature: DirectX-Graphics-Samples - HelloTightAlignment.

Application specific driver state

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Application Specific Driver State. It is intended for capture-replay tools rather than general usage in applications.

The problem:

A graphics API like Direct3D or Vulkan serves as a standardized contract between a game, game engine, or other graphics application, and the graphics driver. In an ideal world, every application that correctly uses the API would work seamlessly with any driver that correctly implements the API. However, we know that software is far from perfect and often contains bugs, which can exist on either side of the API: in the application or in the graphics driver.

It’s no secret that graphics drivers often detect specific popular or problematic games and applications to apply tailored settings to them. These settings might include tweaks to the DirectX 12 driver or the shader compiler, for example. Such adjustments can improve performance in cases where default heuristics are not optimal for a particular application or shader, or they can provide workarounds for known bugs.

For the driver to detect a specific application, it would be helpful to pass some form of application identification. Vulkan includes this functionality in its core API through the VkApplicationInfo structure, where developers can provide the application name, engine name, application version, and engine version. DirectX 12, however, lacks this feature. The AMD GPU Services (AGS) library adds this capability with the AGSDX12ExtensionParams structure, but this is specific to AMD and not universally adopted by all applications.
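For reference, the Vulkan structure mentioned above is filled roughly like this (a sketch; the names and version numbers are made up):

// Identifying the application and engine to the driver at instance creation.
VkApplicationInfo app_info = {};
app_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
app_info.pApplicationName = "ThatGreatGame";           // Hypothetical names.
app_info.applicationVersion = VK_MAKE_VERSION(1, 2, 3);
app_info.pEngineName = "MyEngine";
app_info.engineVersion = VK_MAKE_VERSION(4, 5, 6);
app_info.apiVersion = VK_API_VERSION_1_3;

VkInstanceCreateInfo instance_info = {};
instance_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
instance_info.pApplicationInfo = &app_info;
// vkCreateInstance(&instance_info, nullptr, &instance);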

Because of this limitation, DirectX 12 drivers must rely on detecting applications solely by their .exe file name. This can cause issues with capture-replay tools such as PIX on Windows, RenderDoc or GFXReconstruct. These tools attempt to replay the same sequence of DirectX 12 calls but use a different executable name, which means driver workarounds are not applied.

Interestingly, there is a workaround for PIX that you can try if you encounter issues opening or analyzing a capture:

  1. Rename the "WinPixEngineHost.exe" file to match the name of the original application, such as "ThatGreatGame.exe".
  2. Create a file system link called "WinPixEngineHost.exe" pointing to that new file: mklink WinPixEngineHost.exe ThatGreatGame.exe
  3. Launch PIX, open the capture, and start the Analysis.

This way, PIX will use "WinPixEngineHost.exe" to launch the DirectX 12 workload, but the driver will see the original executable name. This ensures that the app-specific profile is applied, which may resolve the issue.

The solution:

With this new SDK release, Microsoft introduces an API to retrieve and apply an "application-specific driver state." This state will take the form of an opaque blob of binary data. With this feature and a supporting driver, capture-replay tools will hopefully be able to instruct the driver to apply the same app-specific profile and workarounds when replaying a recorded graphics workload as it would for the original application - even if the executable file name of the replay tool is different. This means that workarounds like the one described above will no longer be necessary.

The support for this feature can be queried using D3D12_FEATURE_DATA_APPLICATION_SPECIFIC_DRIVER_STATE::Supported. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.

Recreate at GPUVA

This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Recreate At GPUVA. It is intended for capture-replay tools rather than general usage in applications.

The problem:

Graphics APIs are gradually moving toward the use of free-form pointers, known as GPU Virtual Addresses (GPUVA). If such pointers are embedded in buffers, capture-replay tools may struggle to replay the workload accurately, as the addresses of the resources may differ in subsequent runs. Microsoft mentions that in PIX, they intercept the indirect argument buffer used for ExecuteIndirect to patch these pointers, but this approach may not always be fully reliable.
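These free-form pointers are, for example, what an application writes into an indirect argument buffer. A minimal sketch of obtaining one (buffer is assumed to be an existing ID3D12Resource* created as a buffer):

// The address is only valid for buffers and may differ between runs,
// which is exactly what makes replay difficult.
D3D12_GPU_VIRTUAL_ADDRESS gpuva = buffer->GetGPUVirtualAddress();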

The solution:

With this new SDK release, Microsoft introduces an API to retrieve the address of a resource and to request the creation of a new resource at a specific address. To ensure that no other resources are assigned to the intended address beforehand, there will also be an option to reserve a list of GPUVA address ranges before creating a Direct3D 12 device.

The support for this feature can be queried using D3D12_FEATURE_DATA_D3D12_OPTIONS20::RecreateAtTier. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.

This is yet another feature that Vulkan already provides, while Microsoft is only now adding it. In Vulkan, the ability to recreate resources at a specific address was introduced alongside the VK_KHR_buffer_device_address extension, which introduced free-form pointers. This functionality is provided through "capture replay" features, such as the VkBufferOpaqueCaptureAddressCreateInfo structure.
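A sketch of how that looks on the Vulkan side (variable names like previously_captured_address are placeholders, and the bufferDeviceAddressCaptureReplay device feature must be enabled):

// Recreate a buffer at the device address captured in a previous run.
VkBufferOpaqueCaptureAddressCreateInfo capture_info = {};
capture_info.sType = VK_STRUCTURE_TYPE_BUFFER_OPAQUE_CAPTURE_ADDRESS_CREATE_INFO;
capture_info.opaqueCaptureAddress = previously_captured_address; // Saved during capture.

VkBufferCreateInfo buffer_info = {};
buffer_info.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
buffer_info.pNext = &capture_info;
buffer_info.flags = VK_BUFFER_CREATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT;
buffer_info.size = buffer_size;
buffer_info.usage = VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT;
// vkCreateBuffer(device, &buffer_info, nullptr, &buffer);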

Runtime bypass

This feature works automatically and does not introduce any new API. It improves performance by passing some DirectX 12 function calls directly to the graphics driver, bypassing intermediate functions in Microsoft’s DirectX 12 runtime code.

If I understand it correctly, this appears to be yet another feature that Vulkan got right, and Microsoft is now catching up. For more details, see the article Architecture of the Vulkan Loader Interfaces, which describes how dynamically fetching pointers to Vulkan functions using vkGetInstanceProcAddr and vkGetDeviceProcAddr can point directly to the "Installable Client Driver (ICD)," bypassing "trampoline functions."
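A sketch of the Vulkan mechanism referenced above (device is an already created VkDevice):

// A device-level function pointer fetched this way can point straight into the
// installable client driver (ICD), skipping the loader's trampoline.
PFN_vkCmdDrawIndexed pfn_cmd_draw_indexed =
    (PFN_vkCmdDrawIndexed)vkGetDeviceProcAddr(device, "vkCmdDrawIndexed");
// Later: pfn_cmd_draw_indexed(command_buffer, index_count, 1, 0, 0, 0);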

Additional considerations

There are also some additions to D3D12 Video. The article announcing them is: Agility SDK 1.716.0-preview: New D3D12 Video Encode Features. However, since I don’t have much expertise in D3D12 Video, I won’t describe them here.

Microsoft also released new versions of PIX that support all these new features from day 0! See the announcement article for PIX version 2501.30 and 2501.30-preview.

Queries for the new capabilities added in this update to the Agility SDK (both retail and preview versions) have already been integrated into the D3d12info command-line tool, the D3d12infoGUI tool, and the D3d12infoDB online database of DX12 GPU capabilities. You can contribute to this project by running the GUI tool and submitting your GPU’s capabilities to the database!


Global Game Jam 2025 and First Impressions from Godot 29 Jan 12:25 PM (5 months ago)

Last weekend, 24-26 January 2025, I participated in the Global Game Jam - more specifically, at PolyJam 2025, a site in Warsaw, Poland. In this post I'll share the game we've made (including the full source code) and describe my first impressions of the Godot Engine, which we used for development.

We made a simple 2D pixel-art game with mechanics similar to Overcooked. It was designed for 2 to 4 players in co-op mode, using keyboard and gamepads.

Entry of the game at globalgamejam.org

GitHub repository with the source code

A side note: The theme of GGJ 2025 was "Bubble". Many teams created games about bubbles in water, while others interpreted it more creatively. For example, the game Startup Panic: The Grind Never Stops featured minigames like drawing graphs or typing buzzwords such as "Machine Learning" to convince investors to fund your startup – an obvious bubble 🙂 Our game, on the other hand, focused on taking care of babies and fulfilling their needs so they could grow up successfully. In Polish, the word for "bubbles" is "bąbelki", but it’s also informally used to refer to babies. Deliberately misspelled as "bombelki", it is a wordplay that makes sense and fits the theme in Polish.

My previous game jam was exactly two years ago. Before that jam, I had learned a bit of Cocos Creator and used it to develop my game, mainly to try something new. I described my impressions in this post: Impressions After Global Game Jam 2023. This time, I took a similar approach and started learning the Godot engine about three weeks before the jam. Having some experience with Unity and Unreal Engine, my first impressions of Godot have been very positive. Despite being an open-source project, it doesn’t have that typical "open-source feeling" of being buggy, unfinished, or inconvenient to use. Quite the opposite! Here are the things I especially like about the engine:

I like that it’s small, lightweight, and easy to set up. All you need to do is download a 55 MB archive, unpack it, and you’re ready to start developing. This is because it’s a portable executable that doesn’t require any installation. The only time you need to download additional files (over 1 GB) is when you’re preparing to create a build for a specific platform.

I also appreciate how simple the core ideas of the engine are:

I’m not sure if this approach is optimal in terms of performance or whether it’s as well-optimized as the full Entity Component System (ECS) that some other engines use. However, I believe a good engine should be designed like this one – with a simple and intuitive interface, while handling performance optimizations seamlessly under the hood.

I also appreciate the idea that the editor is built using the same GUI controls available for game development. This approach provides access to a wide range of advanced controls: not just buttons and labels, but also movable splitters, multi-line text editors, tree views, and more. They can all be skinned with custom colors and textures.

Similarly, files saved by the engine are text files in the INI-like format with sections like [SectionName] and key-value pairs like Name = Value. Unlike binary files, XML, or JSON, these files are very convenient to merge when conflicts arise after two developers modify the same file. The same format is also available and recommended for use in games, such as for saving settings.

Then, there is GDScript - a custom scripting language. While Godot also offers a separate version that supports C# and even has a binding to Rust, GDScript is the native way of implementing game logic. I like it a lot. Some people compare it to Python, but it’s not a fork or extension of Python; it’s a completely separate language. The syntax shares similarities with Python, such as using indentation instead of braces {} to define scopes. However, GDScript includes many features that Python lacks, specifically tailored for convenient and efficient game development.

One such feature is an interesting mix of dynamic and static typing. By default, variables can have dynamic types (referred to as "variant"), but there are ways to define a static type for a variable. In such cases, assigning a value of a different type results in an error – a feature that Python lacks.

var a = 0.1
a = "Text" # OK - dynamic type.
var b: float
b = 0.1
b = "Text" # Error! b must be a number.
var c := 0.1
c = "Text" # Error! c must be a number.

Another great feature is the inclusion of vector types for 2D, 3D, or 4D vectors of floats or integers. These types are both convenient and intuitive to use – they are passed by value (creating an implicit copy) and are mutable, meaning you can modify individual xyzw components. This is something that Python cannot easily replicate: in Python, tuples are immutable, while lists and custom classes are passed by reference. As a result, assigning or passing them as function parameters in Python makes the new variable refer to the original object. In GDScript, on the other hand:

var a := Vector2(1.0, 2.0)
var b := a # Made a copy.
b.x = 3.0  # Can modify a single component.
print(a)   # Prints (1, 2).

I really appreciate the extra language features that are clearly designed for game development. For example, the @export attribute before a variable exposes it to the Inspector as a property of a specific type, making it available for visual editing. The $NodeName syntax allows you to reference other nodes in the scene, and it supports file system-like paths, such as using / to navigate down the hierarchy and .. to go up. For instance, you can write something like $../AudioPlayers/HitAudioPlayer.play().

I also like how easy it is to animate any property of any object using paths like the one shown above. This can be done using a dedicated AnimationPlayer node, which provides a full sequencer experience with a timeline. Alternatively, you can dynamically change properties over time using a temporary Tween object. For example, the following code changes the font color of a label to a transparent color over 0.5 seconds, using a specific easing function selected from the many available options (check out the Godot tweening cheat sheet for more details):

create_tween().tween_property(addition_label.label_settings, ^":font_color", transparent_color, 0.5).set_trans(Tween.TRANS_CUBIC).set_ease(Tween.EASE_IN)

I really appreciate the documentation. All core language features, as well as the classes and functions available in the standard library, seem to be well-documented. The documentation is not only available online but also integrated into the editor (just press F1), allowing you to open documentation tabs alongside your script code tabs.

I also like the debugger. Being able to debug the code I write is incredibly important to me, and Godot delivers a full debugging experience. It allows you to pause the game (automatically pausing when an error occurs), inspect the call stack, view variable values, explore the current scene tree, and more.

That said, I’m sure Godot isn’t perfect. For me, it was just a one-month adventure, so I’ve only described my first impressions. There must be reasons why AAA games aren’t commonly made in this engine. It likely has some rough edges and missing features. I only worked with 2D graphics, but I can see it supports 3D graphics with a Forward+ renderer and PBR materials. While it could potentially be used for 3D projects, I’m certain it’s not as powerful as Unreal Engine in that regard. I also encountered some serious technical issues with the engine during the game jam, but I’ll describe those in separate blog posts to make them easier to find for anyone searching the Internet for a solution.

I also don’t know much about Godot’s performance. The game we made was very simple. If we had thousands of objects on the scene to render and complex logic to calculate every frame, performance would become a critical factor. Doing some work in every object every frame using the _process function is surely an anti-pattern, and it runs serially on a single thread. However, I can see that GDScript also supports multithreading – another feature that sets it apart from Python.

To summarize, I now believe that Godot is a great engine at least for game jams and fast prototyping.


Thoughts Beyond C++ 30 Dec 2024 9:37 AM (6 months ago)

Earlier this month, Timothy Lottes published a document on Google Docs called “Fixing the GPU”, where he describes many ideas about how programming compute shaders on the GPU could be improved. It might be an interesting read for those advanced enough to understand it. The document is open for suggestions and comments, and there are a few comments there already.

On a different topic, on 25 November I attended the Code::Dive conference in Wrocław, Poland. It was mostly dedicated to programming in the C++ language. I usually attend conferences about game development, so it was an interesting experience for me. Big thanks to Tomasz Łopuszański from Programista magazine for inviting me there! It was great to see Bjarne Stroustrup and Herb Sutter live, among other good speakers. By the way, recordings of the talks are available on YouTube.

Those two events inspired me to write down my thoughts – my personal “wishlist” about programming languages, from the perspective of someone interested in games and real-time graphics programming. I gathered my opinions about things I like and dislike in C++ and some ideas about what a new, better language could look like. It is less about proposing a specific syntax and more about high-level ideas. You can find it under the following shortened address, but it is really a document on Google Docs. Comments are welcome.

» “Thoughts Beyond C++” «

Of course I am aware of Rust, D, Circle, Carbon, and other programming languages that share the same goal of replacing C++. I just wanted to write down my own thoughts about this topic.


FP8 data type - all values in a table 24 Sep 2024 12:02 PM (9 months ago)

Floating-point numbers are a great invention. Thanks to dedicating separate bits to the sign, exponent, and mantissa (also called significand), they can represent a wide range of numbers on a limited number of bits - numbers that are positive or negative, very large or very small (close to zero), integer or fractional.

In programming, we typically use double-precision (64-bit) or single-precision (32-bit) numbers. These are the data types available in programming languages (like double and float in C/C++) and supported by processors, which can perform calculations on them efficiently. Those of you who deal with graphics programming using graphics APIs like OpenGL, DirectX, or Vulkan may know that some GPUs also support a 16-bit floating-point type, also known as half-float.

Such a 16-bit "half" type obviously has limited precision and range compared to the "single" or "double" versions. Because of these limitations, I am reserved about recommending its use in graphics. I summarized the capabilities and limits of these 3 types in a table in my old "Floating-Point Formats Cheatsheet".

Now that artificial intelligence (AI) / machine learning (ML) is a popular topic, programmers use low-precision numbers in this domain. When I learned that floating-point formats based on only 8 bits had been proposed, I immediately thought: 256 possible values are few enough that they can all be visualized in a 16x16 table! I developed a script that generates such tables, so I invite you to take a look at my new article:

"FP8 data type - all values in a table"


How to do a good code review - a new article 19 Aug 2024 9:24 AM (10 months ago)

Today I would like to present my new, comprehensive article: "How to do a good code review". It can be helpful to any programmer, no matter what programming language they use. Conducting good code reviews is a skill worth mastering. In this article, we will discuss the advantages and disadvantages of this process and explore the types of projects where it is most beneficial. We will consider the best approach to take when reviewing code, how to effectively carry out the process, which aspects of the code to focus on, and finally – how to write comments on the code in a way that benefits the project. The goal is to ensure that the review process serves as an opportunity for fruitful communication among team members rather than a source of conflict.

The article was first published a few months ago in Polish, in issue 112 (March/April 2024) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions:

I wasn't active on my blog in the past months because I took some time for vacation, but also because I'm now learning about machine learning. I may be late to the party, but I recognize that machine learning algorithms are useful tools in many applications. As people learn the basics, they often feel an urge to teach others about it. Some good reads authored by game/graphics developers are: "Machine Learning for Game Devs" by Alan Wolfe and "Crash Course in Deep Learning (for Computer Graphics)" by Jakub Boksansky. I don't want to duplicate their work, so I will only blog about it when I have something unique to show.


Shapes and forms of DX12 root signatures 14 May 2024 10:17 AM (last year)

This article is for you if you are a programmer using Direct3D 12. We will talk about a specific part of the API: root signatures. I will provide a comprehensive description of the various formats in which they can be specified and stored, and the ways to convert between them. The difficulty of this article is intermediate. You are expected to know at least some basics of D3D12. I think that advanced developers can also learn something new, as some of the topics shown here are not what we typically use in day-to-day development with D3D12.

Tools of the trade

I will use C++ as the programming language. Wherever possible, I will also try to use standalone command-line tools instead of writing custom code. To repeat the experiments demonstrated in this article, you will need these two tools:

You don't need to know the command-line syntax of these tools to understand the article. I will describe everything step-by-step.

Warning about DXC: If you also have the Vulkan SDK installed, very likely your PATH environment variable points to "dxc.exe" in that SDK instead of the Windows SDK, which can cause problems. To check this, type the command: where dxc. If you find the Vulkan SDK listed first, make sure you call "dxc.exe" from the Windows SDK, e.g. by explicitly specifying the full path to the executable file.

Warning about RGA: If you want to repeat command-line experiments presented here, make sure to use Radeon GPU Analyzer in the latest version, at least 2.9.1. In older versions, the commands I present wouldn't work.

Shader compilation

A side note about shader compilation: native CPU code, like the code we create when compiling our C++ programs, is saved in .exe files. It contains instructions in a common format called x86, which is sent directly to the CPU for execution. It works regardless of whether you have an AMD or Intel processor in your computer, because they comply with the same standard. With programs written for the GPU (which we call shaders), things are different. Every GPU vendor (AMD, Nvidia, Intel) has its own instruction set, necessitating a two-step process for shader compilation:

  1. As graphics programmers, we write shaders in high-level languages like HLSL or GLSL. We then compile them using a shader compiler like "dxc.exe" to a binary format. It is actually an intermediate format common to all GPU vendors, defined by Microsoft for Direct3D (called DXIL) or by Khronos for Vulkan (called SPIR-V). We are encouraged to compile our shaders offline and only ship these compiled binaries to end users.
  2. When our application uses a graphics API (like Direct3D 12 or Vulkan) and creates a pipeline state object (PSO), it specifies these shaders as inputs. This intermediate code then goes to the graphics driver, which performs the second stage of the compilation - translating it into instructions valid for the specific GPU (also called the Instruction Set Architecture - ISA). We typically don't see this assembly code and we never write it directly, although inspecting it can be useful for optimizations. Nvidia's ISA is secret, but AMD and Intel publish documents describing theirs. The RGA tool mentioned earlier can show the AMD ISA.

What is a root signature?

In Direct3D 12, a root signature is a data structure that describes the resource bindings used by a pipeline on all the shader stages. Let's see an example and work with the file "Shader1.hlsl": a very simple HLSL source that contains 2 entry points: the function VsMain for the vertex shader and the function PsMain for the pixel shader:

struct VsInput
{
 float3 pos : POSITION;
 float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
 float4 pos : SV_Position;
 float2 tex_coord : TEXCOORD;
};

struct VsConstants
{
 float4x4 model_view_proj;
};
ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);

VsOutput VsMain(VsInput i)
{
 VsOutput o;
 o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
 o.tex_coord = i.tex_coord;
 return o;
}

Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);

float4 PsMain(VsOutput i) : SV_Target
{
 return color_texture.Sample(color_sampler, i.tex_coord);
}

I assume you already know that a shader is a program executed on a GPU that processes a single vertex or pixel with clearly defined inputs and outputs. To perform the work, it can also reach out to video memory to access additional resources, like buffers and textures. In the code shown above:

A root signature is a data structure that describes what I said above - what resources should be bound to the pipeline at individual shader stages. In this specific example, it will be a constant buffer at register b4, a texture at t0, and a sampler at s0. It can also be shown in the form of a table:

Root param index    Register    Shader stage
0                   b4          VS
1                   t0          PS
2                   s0          PS

I am simplifying things here, because this article is not about teaching you the basics of root signatures. For more information about them, you can check:

To prepare for our experiments, let's compile the shaders shown above using commands:

dxc -T vs_6_0 -E VsMain -Fo Shader1.vs.bin Shader1.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader1.ps.bin Shader1.hlsl

Note that a single HLSL source file can contain multiple functions (VsMain, PsMain). When we compile it, we need to specify one function as the entry point. For example, the first command compiles the "Shader1.hlsl" file using the VsMain function as the entry point (-E parameter), treated as a vertex shader targeting Shader Model 6.0 (-T parameter). Similarly, the second command compiles the PsMain function as a pixel shader. The compiled shaders are saved in two separate files: "Shader1.vs.bin" and "Shader1.ps.bin".

#1. Data structure

It is time to show some C++ code. Imagine we have D3D12 already initialized, our compiled shaders loaded from files to memory, and now we want to render something on the screen. I said a root signature is a data structure, and indeed, we can create one by filling in some structures. The main one is D3D12_ROOT_SIGNATURE_DESC. Let's fill in the structures according to the table above.

// There will be 3 root parameters.
D3D12_ROOT_PARAMETER root_params[3] = {};

// Root param 0: CBV at b4, passed as descriptor table, visible to VS.
D3D12_DESCRIPTOR_RANGE vs_constant_buffer_desc_range = {};
vs_constant_buffer_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
vs_constant_buffer_desc_range.NumDescriptors = 1;
vs_constant_buffer_desc_range.BaseShaderRegister = 4; // b4

root_params[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
root_params[0].DescriptorTable.NumDescriptorRanges = 1;
root_params[0].DescriptorTable.pDescriptorRanges = &vs_constant_buffer_desc_range;

// Root param 1: SRV at t0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_texture_desc_range = {};
color_texture_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
color_texture_desc_range.NumDescriptors = 1;
color_texture_desc_range.BaseShaderRegister = 0; // t0

root_params[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[1].DescriptorTable.NumDescriptorRanges = 1;
root_params[1].DescriptorTable.pDescriptorRanges = &color_texture_desc_range;

// Root param 2: sampler at s0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_sampler_desc_range = {};
color_sampler_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER;
color_sampler_desc_range.NumDescriptors = 1;
color_sampler_desc_range.BaseShaderRegister = 0; // s0

root_params[2].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[2].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[2].DescriptorTable.NumDescriptorRanges = 1;
root_params[2].DescriptorTable.pDescriptorRanges = &color_sampler_desc_range;

// The main structure describing the whole root signature.
D3D12_ROOT_SIGNATURE_DESC root_sig_desc = {};
root_sig_desc.NumParameters = 3;
root_sig_desc.pParameters = root_params;
root_sig_desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

Variable root_sig_desc of type D3D12_ROOT_SIGNATURE_DESC is our data structure specifying the root signature. Let's call it a root signature representation number #1.

The code may look scary at first, but if you analyze it carefully, I am sure you can recognize the parameters of the 3 resources to bind that we talked about earlier. This code is so complex because a buffer or a texture can be bound in multiple ways, differing in the number of levels of indirection. Describing it is out of scope of this article, but I explained it comprehensively in my old article: Direct3D 12: Long Way to Access Data.

There is also an even more general structure, D3D12_VERSIONED_ROOT_SIGNATURE_DESC, that allows using root signatures in versions higher than 1.0, but we won't discuss it in this article, so as not to complicate things.

#2. Serialized root signature

If you also use Vulkan, you may recognize that the equivalent structure is VkDescriptorSetLayoutCreateInfo. From it, you can call the function vkCreateDescriptorSetLayout to create an object of type VkDescriptorSetLayout, and then a VkPipelineLayout, which is roughly equivalent to the DX12 root signature.

In DX12, however, this is not that simple. There is an intermediate step we need to go through. Microsoft requires converting this data structure to a special binary format first. They call it "serialization". We can do it using function D3D12SerializeRootSignature, like this:

ComPtr<ID3DBlob> root_sig_blob, error_blob;
HRESULT hr = D3D12SerializeRootSignature(&root_sig_desc, D3D_ROOT_SIGNATURE_VERSION_1_0,
    &root_sig_blob, &error_blob);
// Check hr...
const void* root_sig_data = root_sig_blob->GetBufferPointer();
size_t root_sig_data_size = root_sig_blob->GetBufferSize();

An object of type ID3DBlob is just a simple container that owns a memory buffer with binary data of some size. ("BLOB" stands for "Binary Large OBject".) This buffer we created here is our representation number #2 of the root signature.

If we save it to a file, we can see that our example root signature has 188 bytes. It starts with the characters "DXBC", just like the shaders we previously compiled with the dxc tool, which indicates that root signatures use the same container format as compiled shaders. I am not sure whether this binary format is documented anywhere, but it should be possible to decipher anyway, as the DirectX Shader Compiler (dxc) is open source. I have never needed to work with this binary format directly, and we won't do it here either.

I guess Microsoft's intention was to encourage developers to prepare root signatures beforehand and store them in files, just like compiled shaders, so they are not assembled at runtime on every application launch. Is it worth it, though? Shader compilation is slow for sure, but would loading a file be faster than filling in the data structure and serializing it with D3D12SerializeRootSignature? I doubt it, unless Microsoft implemented this function extremely inefficiently. Very likely, this additional level of indirection is just an extra, unnecessary complication that Microsoft prepared for us. It wouldn't be the only case where they did, as you can read in my old article Do RTV and DSV descriptors make any sense?

Note that if a serialized root signature is saved to a file and loaded later, it doesn't need to be stored in an ID3DBlob object. All we need is a pointer to the data and its size (number of bytes). The data can be stored in a byte array like char* arr = new char[size], in a std::vector<char> (I like to use this one), or in any other form.
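For example, a minimal sketch of loading such a blob back from a file (no error handling; the file name matches the one used later in this article):

#include <fstream>
#include <vector>

std::vector<char> LoadBinaryFile(const char* path)
{
    // Open at the end to learn the file size, then read the whole file.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    std::vector<char> data((size_t)file.tellg());
    file.seekg(0);
    file.read(data.data(), (std::streamsize)data.size());
    return data;
}

// std::vector<char> root_sig_data = LoadBinaryFile("RootSigFromCode.bin");
// root_sig_data.data() and root_sig_data.size() can then be passed to
// ID3D12Device::CreateRootSignature.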

#3. Root signature object

With this extra level of indirection done, we can use this serialized binary root signature to create an object of type ID3D12RootSignature. This is an opaque object that represents the root signature in memory, ready to be used by D3D12. Let's call it root signature representation number #3. The code for creating it is very simple:

ComPtr<ID3D12RootSignature> root_sig_obj;
// The first parameter is the node mask for multi-adapter setups; 0 means the default, single GPU.
hr = g_Device->CreateRootSignature(0, root_sig_data, root_sig_data_size,
    IID_PPV_ARGS(&root_sig_obj));
// Check hr...

#4. Pipeline state object

Having this root signature object, we can pass it as part of the D3D12_GRAPHICS_PIPELINE_STATE_DESC and use it to create a ID3D12PipelineState - a Pipeline State Object (PSO) that can be used for rendering.

D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = root_sig_obj.Get(); // Root signature!
pso_desc.VS.pShaderBytecode = vs_data; // Vertex shader from "Shader1.vs.bin".
pso_desc.VS.BytecodeLength = vs_data_size;
pso_desc.PS.pShaderBytecode = ps_data; // Pixel shader from "Shader1.ps.bin".
pso_desc.PS.BytecodeLength = ps_data_size;
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;

ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...

If we have the serialized root signature saved to the file "RootSigFromCode.bin", we can also play around with assembling a PSO without writing any code, using Radeon GPU Analyzer instead. Try the following command:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA

The meaning of the individual parameters is:

-s dx12 - selects the DirectX 12 mode of RGA.
-c gfx1100 - selects the target GPU architecture.
--all-hlsl Shader1.hlsl - the HLSL source file used for all shader stages.
--all-model 6_0 - the shader model used for all stages.
--vs-entry VsMain, --ps-entry PsMain - entry point functions of the vertex and pixel shaders.
--rs-bin RootSigFromCode.bin - the file with the serialized root signature binary.
--offline - compiles offline, without the need for an AMD GPU and its driver installed.
--isa AMD_ISA - requests ISA disassembly, written to files with this name.

When it succeeds, this command creates 2 text files with the disassembly of the vertex and pixel shaders: "gfx1100_AMD_ISA_vert.isa" and "gfx1100_AMD_ISA_pixel.isa". The pixel shader looks like this:

; D3D12 Shader Hash 0x46f0bbb15b95e2453380ad3c9765222a
; API PSO Hash 0xd96cc024d8cb165d
; Driver Internal Pipeline Hash 0xf3a0f055053cc59f
; -------- Disassembly --------------------
shader main
asic(GFX11)
type(PS)
sgpr_count(14)
vgpr_count(8)
wave_size(64)
                                                        // s_ps_state in s0
s_version     UC_VERSION_GFX11 | UC_VERSION_W64_BIT   // 000000000000: B0802006
s_set_inst_prefetch_distance  0x0003                  // 000000000004: BF840003
s_mov_b32     m0, s4                                  // 000000000008: BEFD0004
s_mov_b64     s[12:13], exec                          // 00000000000C: BE8C017E
s_wqm_b64     exec, exec                              // 000000000010: BEFE1D7E
s_getpc_b64   s[0:1]                                  // 000000000014: BE804780
s_waitcnt_depctr  depctr_vm_vsrc(0) & depctr_va_vdst(0) // 000000000018: BF880F83
lds_param_load  v2, attr0.x wait_vdst:0               // 00000000001C: CE000002
lds_param_load  v3, attr0.y wait_vdst:0               // 000000000020: CE000103
s_mov_b32     s4, s2                                  // 000000000024: BE840002
s_mov_b32     s5, s1                                  // 000000000028: BE850001
s_mov_b32     s0, s3                                  // 00000000002C: BE800003
s_load_b256   s[4:11], s[4:5], null                   // 000000000030: F40C0102 F8000000
s_load_b128   s[0:3], s[0:1], null                    // 000000000038: F4080000 F8000000
v_interp_p10_f32  v4, v2, v0, v2 wait_exp:1           // 000000000040: CD000104 040A0102
v_interp_p10_f32  v0, v3, v0, v3 wait_exp:0           // 000000000048: CD000000 040E0103
                                                    s_delay_alu  instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) // 000000000050: BF870112
v_interp_p2_f32  v2, v2, v1, v4 wait_exp:7            // 000000000054: CD010702 04120302
v_interp_p2_f32  v0, v3, v1, v0 wait_exp:7            // 00000000005C: CD010700 04020303
s_and_b64     exec, exec, s[12:13]                    // 000000000064: 8BFE0C7E
s_waitcnt     lgkmcnt(0)                              // 000000000068: BF89FC07
image_sample  v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D // 00000000006C: F06C0F05 00010002 00000000
s_waitcnt     vmcnt(0)                                // 000000000078: BF8903F7
v_cvt_pk_rtz_f16_f32  v0, v0, v1                      // 00000000007C: 5E000300
v_cvt_pk_rtz_f16_f32  v2, v2, v3                      // 000000000080: 5E040702
s_mov_b64     exec, s[12:13]                          // 000000000084: BEFE010C
exp           mrt0, v0, v2, off, off done             // 000000000088: F8000803 00000200
s_endpgm                                              // 000000000090: BFB00000

We will not analyze it here in detail, but it is worth noting that we have 3 memory loading instructions here, which correspond to the operations we do in the pixel shader: s_load_b256 and s_load_b128 load the descriptors of the sampler s0 and the texture t0, which are then both used by the image_sample instruction to perform the texture sampling.

The diagram

We have talked about many different formats of root signatures already, and there will be more. It is time to show a diagram that gathers them all and presents the transitions between them. This is the central part of our article, which we will keep referring to. Note that we have already talked about representations number #1, #2, #3, #4, which you can find on the diagram.

Deserializing root signature

There is a way to convert a serialized root signature blob back into data structures. Microsoft offers the function D3D12CreateRootSignatureDeserializer for this purpose. It creates an object of type ID3D12RootSignatureDeserializer, which owns a D3D12_ROOT_SIGNATURE_DESC structure and the other structures referenced by it. Example code:

ComPtr<ID3D12RootSignatureDeserializer> root_sig_deserializer;
hr = D3D12CreateRootSignatureDeserializer(root_sig_data, root_sig_data_size,
    IID_PPV_ARGS(&root_sig_deserializer));
// Check hr...
const D3D12_ROOT_SIGNATURE_DESC* root_sig_desc = root_sig_deserializer->GetRootSignatureDesc();
// Inspect decoded root_sig_desc... 

When using higher root signature versions, you need to use the function D3D12CreateVersionedRootSignatureDeserializer and the interface ID3D12VersionedRootSignatureDeserializer instead.
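
For completeness, here is a minimal sketch of the versioned path, assuming the same root_sig_data buffer as above:

ComPtr<ID3D12VersionedRootSignatureDeserializer> versioned_deserializer;
hr = D3D12CreateVersionedRootSignatureDeserializer(root_sig_data, root_sig_data_size,
    IID_PPV_ARGS(&versioned_deserializer));
// Check hr...
const D3D12_VERSIONED_ROOT_SIGNATURE_DESC* versioned_desc =
    versioned_deserializer->GetUnconvertedRootSignatureDesc();
// versioned_desc->Version tells which member of the union to read, e.g.:
if(versioned_desc->Version == D3D_ROOT_SIGNATURE_VERSION_1_0)
{
    const D3D12_ROOT_SIGNATURE_DESC& desc_1_0 = versioned_desc->Desc_1_0;
    // Inspect desc_1_0...
}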

#5. Text format

We are only in the middle of this article, because Microsoft prepared one more representation of the root signature - a text representation. For it, they defined a simple domain-specific language, which is fully documented on the page Specifying Root Signatures in HLSL. As an example, the simple root signature presented in this article would look like this:

RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT),
DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX),
DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL),
DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)

I am sure you can recognize the same parameters we passed when we assembled a data structure describing this root signature in our C++ code. The text representation is clearly more concise and readable.

However, this is not exactly the way we specify root signatures in the text format. It will go into our HLSL shader source file, but before we can put it there, we must pack it into a string defined using a #define macro, so it takes the form of:

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

This is our root signature representation number #5 on the diagram. It looks somewhat clumsy, but this is the way we need to format it. The backslash symbol "\" at the end of each line except the last one is necessary to continue the #define macro on the next line. This is a feature of the HLSL preprocessor, the same as in the C and C++ preprocessors.

We could simplify this macro by putting the whole string with our root signature on a single line, but I am not convinced it would make it more readable. Besides, formatting root signatures the way I showed above is recommended by Microsoft in their documentation.

If you think about converting a root signature back to the text representation, there is no ready-made function for that, but you can find such code in the RGA source, file "source/radeon_gpu_analyzer_backend/autogen/be_rootsignature_dx12.cpp", class RootSignatureUtil. I marked it as an arrow leading from #1 to #5 on the diagram, described as "Custom code".
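
To give an idea of what such custom code involves, here is a minimal sketch of my own that handles only the cases used in this article - descriptor tables with a single range of one descriptor each:

#include <string>

std::string RootSigDescToText(const D3D12_ROOT_SIGNATURE_DESC& desc)
{
    std::string s;
    auto append = [&s](const std::string& part) { if(!s.empty()) s += ",\n"; s += part; };
    if(desc.Flags & D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)
        append("RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)");
    for(UINT i = 0; i < desc.NumParameters; ++i)
    {
        const D3D12_ROOT_PARAMETER& param = desc.pParameters[i];
        if(param.ParameterType != D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE)
            continue; // Other root parameter types are not handled in this sketch.
        const D3D12_DESCRIPTOR_RANGE& range = param.DescriptorTable.pDescriptorRanges[0];
        const char* range_prefix =
            range.RangeType == D3D12_DESCRIPTOR_RANGE_TYPE_CBV ? "CBV(b" :
            range.RangeType == D3D12_DESCRIPTOR_RANGE_TYPE_SRV ? "SRV(t" :
            range.RangeType == D3D12_DESCRIPTOR_RANGE_TYPE_UAV ? "UAV(u" : "Sampler(s";
        const char* visibility =
            param.ShaderVisibility == D3D12_SHADER_VISIBILITY_VERTEX ? "SHADER_VISIBILITY_VERTEX" :
            param.ShaderVisibility == D3D12_SHADER_VISIBILITY_PIXEL  ? "SHADER_VISIBILITY_PIXEL" :
            "SHADER_VISIBILITY_ALL";
        append("DescriptorTable(" + std::string(range_prefix) +
            std::to_string(range.BaseShaderRegister) + "), visibility=" + visibility + ")");
    }
    return s;
}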

#6. Attaching root signature to shaders

Having our root signature defined in the text format, packed into a #define macro, and included in our HLSL shader source file is the first step. Just like a single HLSL file can contain multiple entry points for various shaders, it can also contain multiple root signature definitions, so we need to specify the one to use. To do this, we attach a root signature to the function used as the shader entry point, using the [RootSignature()] attribute with the name of our macro inside.

Here is the full contents of a new shader file "Shader2.hlsl" with root signature embedded:

struct VsInput
{
    float3 pos : POSITION;
    float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
    float4 pos : SV_Position;
    float2 tex_coord : TEXCOORD;
};

struct VsConstants
{
    float4x4 model_view_proj;
};

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);

[RootSignature(MyRootSig)]
VsOutput VsMain(VsInput i)
{
    VsOutput o;
    o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
    o.tex_coord = i.tex_coord;
    return o;
}

Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);

[RootSignature(MyRootSig)]
float4 PsMain(VsOutput i) : SV_Target
{
    return color_texture.Sample(color_sampler, i.tex_coord);
}

If you compile VS and PS from this file using commands:

dxc -T vs_6_0 -E VsMain -Fo Shader2.vs.bin Shader2.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader2.ps.bin Shader2.hlsl

New files "Shader2.vs.bin" and "Shader2.ps.bin" will have size greater than respective "Shader1.vs.bin" and "Shader1.ps.bin" we created earlier by exactly 168 bytes, which is similar to the size of our serialized root signature. This indicates that our root signature is bundled together with the compiled shader code. This is the representation number #6 on the diagram.

Shaders compiled with an embedded root signature can then be used in the C++/D3D12 code to create a PSO without the need to specify the root signature explicitly. The member D3D12_GRAPHICS_PIPELINE_STATE_DESC::pRootSignature can be set to null. Our PSO creation code can now look like this:

D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = NULL; // Sic!
pso_desc.VS.pShaderBytecode = vs.data(); // Vertex shader from "Shader2.vs.bin".
pso_desc.VS.BytecodeLength = vs.size();
pso_desc.PS.pShaderBytecode = ps.data(); // Pixel shader from "Shader2.ps.bin".
pso_desc.PS.BytecodeLength = ps.size();
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;

ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...

Similarly, we can use RGA to compile those shaders, assemble the PSO, and output AMD GPU assembly:

rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

Because we can use multiple shaders at different shader stages (vertex shader, pixel shader, possibly also hull, domain, geometry, amplification, mesh shader...) when creating a PSO, and we attached a [RootSignature()] attribute to all of them, you may ask what happens if some shader stages don't specify a root signature or specify a different one. The most important rule is that they must be consistent: if two stages embed different root signatures, PSO creation fails with an error like this one:

D3D12 ERROR: ID3D12Device::CreateGraphicsPipelineState: Root Signature doesn't match Pixel Shader: Root signature of Vertex Shader doesn't match the root signature of Pixel Shader

Compiling standalone root signature from text

When we have a root signature encoded in the text format, we can use it in two ways. One is attaching it to a shader entry point function using the [RootSignature()] attribute, as we've seen in the previous section. The second one is compiling the root signature alone. For this, we need to use dedicated command-line arguments of "dxc.exe" and specify the name of our macro.

Let's create a separate HLSL file with only the root signature, called "RootSig.hlsl":

#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
    "DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"

Let's now use the following command to compile it:

dxc -T rootsig_1_0 -E MyRootSig -Fo RootSigFromHlsl.bin RootSig.hlsl

The output of this command is the file "RootSigFromHlsl.bin", which is 188 bytes - exactly the same size as the file "RootSigFromCode.bin" we created earlier by filling in data structures in C++ and serializing them. Thus, we can say we have just learned a way to create a serialized root signature binary from the text representation. We can now connect two existing blocks in our diagram with an arrow leading from #5 to #2.

Note that you can use our previous file "Shader2.hlsl" instead of "RootSig.hlsl" with the same effect. That file contains shader functions, but they just get ignored, as we only use the MyRootSig macro.
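
To close the loop, such a binary file can be fed straight into ID3D12Device::CreateRootSignature, just like in section #3 - a small sketch, reusing the file-loading code shown earlier:

// Load "RootSigFromHlsl.bin" into root_sig_loaded (a std::vector<char>) as shown before, then:
ComPtr<ID3D12RootSignature> root_sig_from_hlsl;
hr = g_Device->CreateRootSignature(0, root_sig_loaded.data(), root_sig_loaded.size(),
    IID_PPV_ARGS(&root_sig_from_hlsl));
// Check hr...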

Repacking root signatures

Because there are so many ways of storing root signatures, Microsoft provided ways to convert between them using dedicated command-line parameters of DXC:

We can specify a compiled shader with a root signature embedded and extract only the root signature blob from it (connection from #6 to #2 in our diagram):

dxc -dumpbin -extractrootsignature -Fo RootSigExtracted.bin Shader2.vs.bin

The -dumpbin parameter means that the input file (specified as the positional argument at the end) is a compiled binary, not a text file with HLSL source.

We can transform a compiled shader file into one with the embedded root signature removed. This path is not shown in the diagram. The output file "ShaderNoRootSig.vs.bin" has the same size (4547 B) as "Shader1.vs.bin" that we compiled previously without a root signature.

dxc -dumpbin -Qstrip_rootsignature -Fo ShaderNoRootSig.vs.bin Shader2.vs.bin

We can also join two binary files - one with a compiled shader, one with a root signature blob - and create a file with the shader and the root signature embedded in it. This is shown on the diagram as a path from #2 to #6.

dxc -dumpbin -setrootsignature RootSigFromCode.bin -Fo ShaderWithRootSigAdded.vs.bin Shader1.vs.bin

Warning about DXC parameters

I've shown all these commands here because it is very important to get them right. Microsoft did a terrible job here: the command-line syntax offers many options that are easy to confuse.

Moreover, if you get it wrong, DXC prints some cryptic, unrelated error message, or prints nothing, does nothing, and exits with process exit code 0. Not very helpful!

Usage of RGA

Radeon GPU Analyzer utilizes DXC internally, so it can be used to compile shaders from HLSL source code all the way to the pipeline state object (both stages of the shader compilation). That PSO is created internally just to extract the final ISA assembly from it. Here is a command we've seen before:

rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

However, RGA supports many more command-line options. Input shaders can be specified in HLSL format using --all-hlsl FILE or per-stage --vs FILE, --ps FILE, etc., with mandatory entry point function names passed as --vs-entry NAME, --ps-entry NAME, etc. Alternatively, we can specify compiled shader binaries as input. Then, the input is the intermediate shader representation, and RGA performs only the second stage of the shader compilation.

rga -s dx12 -c gfx1100 --vs-blob Shader2.vs.bin --ps-blob Shader2.ps.bin --offline --isa AMD_ISA

Similarly, a root signature can be specified in one of many ways:

1. Embedded in shaders, like in the 2 commands shown above, as our "Shader2" was compiled with the root signature.

2. From a separate HLSL file and specific #define macro:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-hlsl RootSig.hlsl --rs-macro MyRootSig --offline --isa AMD_ISA

3. From a binary file with the serialized root signature:

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA

4. None at all. Then, a root signature matching the compiled shaders gets auto-generated. This is a new feature of RGA 2.9.1.

rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA

Using DXC as a library

Up to this point, our discussion has centered around C++ code specifically tailored for loading compiled shaders and creating a PSO, typically for D3D12 rendering purposes, such as game development or other graphics applications. The compilation of shaders was carried out exclusively with standalone command-line tools: DXC and RGA.

However, the DXC shader compiler can also be used in the form of a C++ library. Everything we can do with "dxc.exe" we can also do programmatically from our code by using the equivalent library. To use the library:

  1. LoadLibrary "dxil.dll" and "dxcompiler.dll".
  2. GetProcAddress of only one function: DxcCreateInstance, as everything starts from it (see the sketch below).
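
Here is a minimal sketch of this setup, assuming the two DLLs are placed next to the executable and using ComPtr as in the rest of this article:

#include <windows.h>
#include <wrl/client.h>
#include <dxcapi.h>
using Microsoft::WRL::ComPtr;

// Load the DXC DLLs. "dxil.dll" needs to be loadable so that compiled shaders get validated and signed.
HMODULE dxil_dll = LoadLibraryW(L"dxil.dll");
HMODULE dxcompiler_dll = LoadLibraryW(L"dxcompiler.dll");
// Check for NULL...

// Fetch the single exported entry point.
DxcCreateInstanceProc dxc_create_instance = reinterpret_cast<DxcCreateInstanceProc>(
    GetProcAddress(dxcompiler_dll, "DxcCreateInstance"));
// Check for NULL...

// Create the basic objects we will work with.
ComPtr<IDxcUtils> dxc_utils;
ComPtr<IDxcCompiler3> dxc_compiler;
HRESULT hr = dxc_create_instance(CLSID_DxcUtils, IID_PPV_ARGS(&dxc_utils));
// Check hr...
hr = dxc_create_instance(CLSID_DxcCompiler, IID_PPV_ARGS(&dxc_compiler));
// Check hr...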

I won't describe the library in detail here. It is out of the scope of this article, and the article is already very long. However, I would like to point out some interesting features:

1. Certainly, we can compile a shader. To do it, use the function IDxcCompiler3::Compile. Interestingly, we don't fill in data structures with specific parameters for the compiler, like we would normally expect from a programmatic API. Instead, we are asked to format a list of strings with parameters, the same as we would pass to the command-line DXC, e.g.:

const wchar_t* arguments[] = {
    L"-T", L"ps_6_0", // Target profile. Note that an option and its value are separate array elements.
    L"-E", L"PsMain", // Entry point.
    // Etc...
};

Because we talk about root signatures here, it is worth noting that we can check whether a compiled shader has one embedded. Calling IDxcResult::GetOutput with the parameter DXC_OUT_OBJECT returns the compiled shader blob, DXC_OUT_ERRORS returns a string with errors and warnings, while DXC_OUT_ROOT_SIGNATURE tells us that the shader had a root signature attached.
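
Putting it together, a compilation call could look like the following sketch. It assumes dxc_utils and dxc_compiler created as shown earlier and our "Shader2.hlsl" file, with simplified error handling:

// Load the HLSL source from a file into a blob.
ComPtr<IDxcBlobEncoding> source_blob;
hr = dxc_utils->LoadFile(L"Shader2.hlsl", NULL, &source_blob);
// Check hr...

DxcBuffer source_buffer = {};
source_buffer.Ptr = source_blob->GetBufferPointer();
source_buffer.Size = source_blob->GetBufferSize();
source_buffer.Encoding = DXC_CP_ACP; // Let the compiler detect the encoding.

const wchar_t* arguments[] = { L"-T", L"ps_6_0", L"-E", L"PsMain" };

ComPtr<IDxcResult> result;
hr = dxc_compiler->Compile(&source_buffer, arguments, _countof(arguments),
    NULL, IID_PPV_ARGS(&result));
// Check hr...

// Errors and warnings, if any, come as a UTF-8 string.
ComPtr<IDxcBlobUtf8> errors;
result->GetOutput(DXC_OUT_ERRORS, IID_PPV_ARGS(&errors), NULL);
if(errors && errors->GetStringLength() > 0)
    printf("%s\n", errors->GetStringPointer());

// The compiled shader binary.
ComPtr<IDxcBlob> shader_blob;
result->GetOutput(DXC_OUT_OBJECT, IID_PPV_ARGS(&shader_blob), NULL);

// True if the shader has a root signature embedded.
bool has_root_sig = result->HasOutput(DXC_OUT_ROOT_SIGNATURE);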

2. The DXC library offers an interesting feature called reflection. It allows inspecting an existing compiled shader binary for various parameters, including inputs, outputs, and resource bindings. Inputs and outputs are vertex attributes or (in the case of a pixel shader) the render targets written. The list of resource bindings is the most interesting for us here, because it allows generating a root signature compatible with the shader.

Certainly, there isn't just one possible root signature compatible with a given shader, so a generated one may not align with your requirements. For example, a constant buffer b4 can be bound to a shader in one of 3 ways: as a 32-bit root constant, as a root CBV, or as a descriptor table containing a CBV. Similarly, multiple consecutive slots like (b2, b3, b4) can be defined in a root signature as separate root parameters or as a single parameter with a descriptor table carrying numDescriptors = 3. However, reflection can still be useful if you develop your own engine and want to automate resource binding based on the shader code.

To use this feature, call IDxcUtils::CreateReflection, pass the shader binary, and retrieve a new object of type ID3D12ShaderReflection. You can then query it for parameters, e.g. using ID3D12ShaderReflection::GetResourceBindingDesc. You can see an example of shader reflection used to generate a root signature in the RGA source code - see file "source/radeon_gpu_analyzer_backend/autogen/be_reflection_dx12.cpp" and other related places.
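
A minimal sketch of this, assuming shader_blob is the compiled shader from the previous snippet (you can also pass just the blob obtained from GetOutput with DXC_OUT_REFLECTION):

#include <d3d12shader.h> // For ID3D12ShaderReflection and related structures.

DxcBuffer reflection_buffer = {};
reflection_buffer.Ptr = shader_blob->GetBufferPointer();
reflection_buffer.Size = shader_blob->GetBufferSize();
reflection_buffer.Encoding = DXC_CP_ACP;

ComPtr<ID3D12ShaderReflection> reflection;
hr = dxc_utils->CreateReflection(&reflection_buffer, IID_PPV_ARGS(&reflection));
// Check hr...

// Enumerate the resource bindings (CBVs, SRVs, UAVs, samplers) used by the shader.
D3D12_SHADER_DESC shader_desc = {};
reflection->GetDesc(&shader_desc);
for(UINT i = 0; i < shader_desc.BoundResources; ++i)
{
    D3D12_SHADER_INPUT_BIND_DESC bind_desc = {};
    reflection->GetResourceBindingDesc(i, &bind_desc);
    // bind_desc.Name, bind_desc.Type (e.g. D3D_SIT_CBUFFER, D3D_SIT_TEXTURE, D3D_SIT_SAMPLER),
    // bind_desc.BindPoint, and bind_desc.Space describe one binding, e.g. t0 in space 0.
}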

3. The DXC library also provides tools to manipulate the binary container format, enabling tasks such as extracting, adding, or removing a root signature from a shader. To use it, search the library header for the interface IDxcContainerReflection or the simpler function IDxcUtils::GetDxilContainerPart, as well as the interface IDxcContainerBuilder. For example, you can check if a shader binary contains an embedded root signature using the following code:

// shader_binary is a DxcBuffer pointing to the compiled shader data.
void* part_data = NULL; uint32_t part_size = 0;
HRESULT hr = dxc_utils->GetDxilContainerPart(&shader_binary,
    DXC_PART_ROOT_SIGNATURE, &part_data, &part_size);
bool has_root_signature = SUCCEEDED(hr);

How many root signatures to use?

As for the policy regarding the usage of root signatures, do they need to match our shaders exactly? No, but the following rules apply:

  1. The root signature must cover every register the shader actually uses (like b4, t0, s0 in our example) with a compatible type of binding. Otherwise, PSO creation fails.
  2. The root signature may define more bindings than a given shader uses. The extra entries are simply ignored by that shader.

You may ask: can I just create one big, all-encompassing root signature that defines all the resource bindings I may ever need and use it for all my shaders? Theoretically you could, but there are two main arguments against doing this.

  1. Root signatures cannot be arbitrarily big. There is a limit on the number of root parameters, calculated in units of some virtual DWORDs. Whether these correspond to real 32-bit DWORDs in the D3D12 implementation provided by the graphics driver is not important here. What matters is that every root constant costs 1 DWORD per 4 B of data, every root descriptor costs 2 DWORDs, and every descriptor table costs 1 DWORD, while the limit is 64 DWORDs in total. You can find it documented here: "Root Signature Limits". Sample code for calculating this cost can be found in the RGA source: file "source/radeon_gpu_analyzer_backend/autogen/be_rootsignature_dx12.cpp", function CalculateRootSignatureCost; a small sketch of the calculation follows this list. This limit can be overcome, though, by defining descriptor tables with whole ranges of descriptors in them as a single root parameter, e.g. DescriptorTable(CBV(b0, numDescriptors=10)).
  2. "AMD RDNA Performance Guide" in section "Descriptors" recommends trying to minimize the size of root signatures/descriptor set layouts, to avoid spilling user data to memory. This would likely mean an extra level of indirection in the internal implementation causing some performance overhead.

On the other hand, switching the root signature for every shader and rebinding all the root arguments has its own overhead. If you look at Cyberpunk 2077, for example, you can see that they just use one big root signature for all graphics shaders and a second one for all compute shaders in the game. I am not disclosing any secret here. If you own the game on Steam or GOG, you can capture a frame using PIX on Windows and see it for yourself. If they could do it in a AAA game that looks and runs so well, do we really need to optimize better? 😀

Update 2024-05-15: In the comments below my post on Mastodon, others disclosed that the Frostbite engine by DICE, as well as the engine developed by Digital Extremes, takes the same approach.

Summary

This article offers a comprehensive description of the various formats of root signatures in Direct3D 12. We've explored some C++ code along with command-line tools such as the DXC shader compiler from Microsoft and the Radeon GPU Analyzer (RGA) from AMD. A root signature can be authored or stored as:

1. A D3D12_ROOT_SIGNATURE_DESC data structure filled in C++ code.
2. A serialized binary blob.
3. An ID3D12RootSignature object.
4. Part of a Pipeline State Object (PSO).
5. A text representation inside HLSL source code.
6. Embedded in a compiled shader binary.

We've learned how to use these representations and how to convert between them.
