In January 2025, I participated in PolyJam - a Global Game Jam site in Warsaw, Poland. I shared my experiences in a blog post: Global Game Jam 2025 and First Impressions from Godot. This post focuses on a specific issue I encountered during the jam: Godot 4.3 frequently hanging on my ASUS TUF Gaming laptop. If you're in a hurry, you can SCROLL DOWN to skip straight to the solution that worked for me.
The laptop I used was an ASUS TUF Gaming FX505DY. Interestingly, it has two different AMD GPUs onboard - a detail that becomes important later: the integrated Radeon Vega 8 and the discrete Radeon RX 560X.
The game we developed wasn’t particularly complex or demanding - it was a 2D pixel art project. Yet, the Godot editor kept freezing frequently, even without running the game. The hangs occurred at random moments, often while simply navigating the editor UI. Each time, I had to force-close and restart the process. I was using Godot 4.3 Stable at the time.
I needed a quick solution. My first step was verifying that both Godot 4.3 and my AMD graphics drivers were up to date (they were). Then, I launched Godot via "Godot_v4.3-stable_win64_console.exe", which displays a console window with debug logs alongside the editor. That’s when I noticed an error message appearing every time the hang occurred:
ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
This suggested the issue might be GPU-related, specifically involving the Vulkan API. However, I wasn’t entirely sure - the same error message occasionally appeared even when the engine wasn’t hanging, so it wasn’t a definitive indicator.
To investigate further, I decided to enable the Vulkan validation layer, hoping it would reveal more detailed error messages about what the engine was doing wrong. Having the Vulkan SDK installed on my system, I launched the Vulkan Configurator app that comes with it ("Bin\vkconfig.exe"), set Vulkan Layers Management = Layers Controlled by the Vulkan Configurator, and selected Validation.
Unfortunately, when I launched Godot again, no new error messages appeared in the console. (Looking back, I’m not even sure if that console window actually captured the process’s standard output.) For a brief moment, I thought enabling the Vulkan validation layer had fixed the hangs - but they soon returned. Maybe they were less frequent, or perhaps it was just wishful thinking.
Next, I considered forcing Godot to use the integrated GPU (Radeon Vega 8) instead of the more powerful discrete GPU (RX 560X). To test this, I adjusted Windows power settings to prioritize power saving over maximum performance. However, this didn’t work - Godot still reported using the Radeon RX 560X.
THE SOLUTION: What finally worked was forcing Godot to use the integrated GPU by launching it with a specific command-line parameter. Instead of running the editor normally, I used:
Godot_v4.3-stable_win64_console.exe --verbose --gpu-index 1
This made Godot use the second GPU (index 1) - the slower Radeon Vega 8 - instead of the default RX 560X. The result? No more hangs. While the integrated GPU is less powerful, it was more than enough for our 2D pixel art game.
I am not sure why it helped, considering that both GPUs on my laptop are from AMD and they are supported by one driver. I also didn't check whether the new Godot 4.4 that was released since then has this bug fixed. I am just leaving this story here, in case someone stumbles upon the same problem in the future.
On January 30th 2025, Microsoft released a new version of the DirectX 12 Agility SDK: 1.615.0 (D3D12SDKVersion = 615) and 1.716.0-preview (D3D12SDKVersion = 716). The main article announcing this release is: AgilitySDK 1.716.0-preview and 1.615-retail. Files are available to download from DirectX 12 Agility SDK Downloads, as always, in the form of .nupkg files (which are really ZIP archives).
I can see several interesting additions in the new SDK, so in this article I am going to describe them and delve into details of some of them. This way, I aim to consolidate information that is scattered across multiple Microsoft pages and provide links to all of them. The article is intended for advanced programmers who use DirectX 12 and are interested in the latest developments of the API and its surrounding ecosystem, including features that are currently in preview mode and will be included in future retail versions.
This is the only feature added to both the retail and preview versions of the new SDK. The article announcing it is: Agility SDK 1.716.0-preview & 1.615-retail: Shader hash bypass. A more extensive article explaining this feature is available here: Validator Hashing.
The problem:
If you use DirectX 12, you most likely know that shaders are compiled in two stages. First, the source code in HLSL (High-Level Shading Language) is compiled using the Microsoft DXC compiler into an intermediate binary code. This often happens offline when the application is built. The intermediate form is commonly referred to as DXBC (as the container format and the first 4 bytes of the file) or DXIL (as the intermediate language of the shader code, somewhat similar to SPIR-V or LLVM IR). This intermediate code is then passed to a DirectX 12 function that creates a Pipeline State Object (PSO), such as ID3D12Device::CreateGraphicsPipelineState. During this step, the second stage of compilation occurs within the graphics driver, converting the intermediate code into machine code (ISA) specific to the GPU. I described this process in more detail in my article Shapes and forms of DX12 root signatures, specifically in the "Shader Compilation" section.
What you may not know is that the intermediate compiled shader blob is digitally signed by the DXC compiler using a hash embedded within it. This hash is then validated during PSO creation, and the function fails if the hash doesn’t match. Moreover, despite the DXC compiler being open source and hosted on github.com/microsoft/DirectXShaderCompiler, the signing process is handled by a separate library, "dxil.dll", which is not open source.
If you only use the DXC compiler provided by Microsoft, you may never encounter any issues with this. I first noticed this problem when I accidentally used "dxc.exe" from the Vulkan SDK instead of the Windows SDK to compile my shaders. This happened because the Vulkan SDK appeared first in my "PATH" environment variable. My shaders compiled successfully, but since the closed-source "dxil.dll" library is not distributed with the Vulkan SDK, they were not signed. As a result, I couldn’t create PSO objects from them. As the ecosystem of graphics APIs continues to grow, this could also become a problem for libraries and tools that aim to generate DXIL code directly, bypassing the HLSL source code and DXC compiler. Some developers have even reverse-engineered the signing algorithm to overcome this obstacle, as described by Stephen Gutekanst / Hexops in this article: Building the DirectX shader compiler better than Microsoft?.
The solution:
With this new SDK release, Microsoft has made two significant changes:

1. The hash embedded in a compiled shader can now be set to one of two special "magic" values that make the runtime skip hash validation: 01010101010101010101010101010101 for "BYPASS", or 02020202020202020202020202020202 for "PREVIEW_BYPASS". Technologies that generate DXIL shader code can now use either of these values to produce a valid shader.
2. Shader validation, previously performed while signing, has been integrated into the D3D Debug Layer, as described below.
The capability to check whether this new feature is supported is exposed through D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED::Supported. However, it appears to be implemented entirely at the level of the Microsoft DirectX runtime rather than the graphics driver, as it returns TRUE on every system I tested.
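For completeness, here is a minimal sketch of such a query. The feature enum name D3D12_FEATURE_BYTECODE_BYPASS_HASH_SUPPORTED is my assumption, derived from the usual D3D12_FEATURE_DATA_* naming convention; check the actual Agility SDK headers.

// Sketch: querying bypass-hash support on an existing ID3D12Device* device.
D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED bypass_hash = {};
HRESULT hr = device->CheckFeatureSupport(
    D3D12_FEATURE_BYTECODE_BYPASS_HASH_SUPPORTED, &bypass_hash, sizeof(bypass_hash));
if (SUCCEEDED(hr) && bypass_hash.Supported)
{
    // Shaders carrying one of the magic bypass hashes can be used to create PSOs.
}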
One caveat is that "dxil.dll" not only signs the shader but also performs some form of validation. Microsoft didn’t want to leave developers without the ability to validate their shaders when using the bypass hash. To address this, they have now integrated the validation code into the D3D Debug Layer, allowing shaders to be validated as they are passed to the PSO creation function.
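To get this validation, the debug layer has to be enabled before device creation, the usual way:

// Sketch: enabling the D3D12 Debug Layer, which now also validates shaders
// signed with a bypass hash when they are passed to PSO creation.
ComPtr<ID3D12Debug> debug_interface;
if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debug_interface))))
    debug_interface->EnableDebugLayer(); // Must happen before creating the device.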
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Tight Alignment of Resources. There is also a specification: Direct3D 12 Tight Placed Resource Alignment, but it is very low-level, even describing the interface for the graphics driver.
The problem:
This one is particularly interesting to me, as I develop the D3D12 Memory Allocator and Vulkan Memory Allocator libraries, which focus on GPU memory management. In DirectX 12, buffers require alignment to 64 KB, which can be problematic and lead to significant memory waste when creating a large number of very small buffers. I previously discussed this issue in my older article: Secrets of Direct3D 12: Resource Alignment.
The solution:
This is one of many features that the Vulkan API got right, and Microsoft is now aligning DirectX 12 in the same direction. In Vulkan, developers need to query the required size and alignment of each resource using functions like vkGetBufferMemoryRequirements, and the driver can return a small alignment if supported. For more details, you can refer to my older article: Differences in memory management between Direct3D 12 and Vulkan. Microsoft is now finally allowing buffers in DirectX 12 to support smaller alignments by introducing the following new API elements:

1. A capability query: D3D12_FEATURE_DATA_TIGHT_ALIGNMENT::SupportTier.
2. A new flag, D3D12_RESOURCE_FLAG_USE_TIGHT_ALIGNMENT, to add to the description of the resource you are about to create.
3. When calling ID3D12Device::GetResourceAllocationInfo, the function may now return an alignment smaller than 64 KB. As Microsoft states: "Placed buffers can now be aligned as tightly as 8 B (max of 256 B). Committed buffers have also had alignment restrictions reduced to 4 KiB."

I have already implemented support for this new feature in the D3D12MA library. Since this is a preview feature, I've done so on a separate branch for now. You can find it here: D3D12MemoryAllocator branch resource-tight-alignment.
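Here is a minimal sketch of how opting in could look. The feature enum and tier value names are my assumptions based on the announcement and the usual naming conventions; check the preview SDK headers before relying on them.

// Sketch: query tight-alignment support, then opt in per-resource.
D3D12_FEATURE_DATA_TIGHT_ALIGNMENT tight = {};
if (SUCCEEDED(device->CheckFeatureSupport(
        D3D12_FEATURE_TIGHT_ALIGNMENT, &tight, sizeof(tight))) &&
    tight.SupportTier != D3D12_TIGHT_ALIGNMENT_TIER_NOT_SUPPORTED)
{
    D3D12_RESOURCE_DESC desc = CD3DX12_RESOURCE_DESC::Buffer(64); // A tiny 64-B buffer.
    desc.Flags |= D3D12_RESOURCE_FLAG_USE_TIGHT_ALIGNMENT;
    D3D12_RESOURCE_ALLOCATION_INFO info = device->GetResourceAllocationInfo(0, 1, &desc);
    // Without the flag, info.Alignment would be 64 KB; with it, the returned
    // alignment (and therefore the memory wasted per small buffer) can be much smaller.
}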
This feature requires support from the graphics driver, and as of today, no drivers support it yet. The announcement article mentions that AMD plans to release a supporting driver in early February, while other GPU vendors are also interested and will support it in an "upcoming driver" or at some indefinite point in the future - similar to other preview features described below.
However, testing is possible right now using the software (CPU) implementation of DirectX 12 called WARP; the setup steps are described in the announcement article.
Microsoft has also shared a sample application to test this feature: DirectX-Graphics-Samples - HelloTightAlignment.
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Application Specific Driver State. It is intended for capture-replay tools rather than general usage in applications.
The problem:
A graphics API like Direct3D or Vulkan serves as a standardized contract between a game, game engine, or other graphics application, and the graphics driver. In an ideal world, every application that correctly uses the API would work seamlessly with any driver that correctly implements the API. However, we know that software is far from perfect and often contains bugs, which can exist on either side of the API: in the application or in the graphics driver.
It’s no secret that graphics drivers often detect specific popular or problematic games and applications to apply tailored settings to them. These settings might include tweaks to the DirectX 12 driver or the shader compiler, for example. Such adjustments can improve performance in cases where default heuristics are not optimal for a particular application or shader, or they can provide workarounds for known bugs.
For the driver to detect a specific application, it would be helpful to pass some form of application identification. Vulkan includes this functionality in its core API through the VkApplicationInfo
structure, where developers can provide the application name, engine name, application version, and engine version. DirectX 12, however, lacks this feature. The AMD GPU Services (AGS) library adds this capability with the AGSDX12ExtensionParams
structure, but this is specific to AMD and not universally adopted by all applications.
Because of this limitation, DirectX 12 drivers must rely on detecting applications solely by their .exe file name. This can cause issues with capture-replay tools such as PIX on Windows, RenderDoc or GFXReconstruct. These tools attempt to replay the same sequence of DirectX 12 calls but use a different executable name, which means driver workarounds are not applied.
Interestingly, there is a workaround for PIX that you can try if you encounter issues opening or analyzing a capture:
mklink WinPixEngineHost.exe ThatGreatGame.exe
This way, PIX will use "WinPixEngineHost.exe" to launch the DirectX 12 workload, but the driver will see the original executable name. This ensures that the app-specific profile is applied, which may resolve the issue.
The solution:
With this new SDK release, Microsoft introduces an API to retrieve and apply an "application-specific driver state." This state will take the form of an opaque blob of binary data. With this feature and a supporting driver, capture-replay tools will hopefully be able to instruct the driver to apply the same app-specific profile and workarounds when replaying a recorded graphics workload as it would for the original application - even if the executable file name of the replay tool is different. This means that workarounds like the one described above will no longer be necessary.
The support for this feature can be queried using D3D12_FEATURE_DATA_APPLICATION_SPECIFIC_DRIVER_STATE::Supported. Since this feature is intended for tools rather than typical graphics applications, I won't delve into further details here.
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Recreate At GPUVA. It is intended for capture-replay tools rather than general usage in applications.
The problem:
Graphics APIs are gradually moving toward the use of free-form pointers, known as GPU Virtual Addresses (GPUVA). If such pointers are embedded in buffers, capture-replay tools may struggle to replay the workload accurately, as the addresses of the resources may differ in subsequent runs. Microsoft mentions that in PIX, they intercept the indirect argument buffer used for ExecuteIndirect
to patch these pointers, but this approach may not always be fully reliable.
The solution:
With this new SDK release, Microsoft introduces an API to retrieve the address of a resource and to request the creation of a new resource at a specific address. To ensure that no other resources are assigned to the intended address beforehand, there will also be an option to reserve a list of GPUVA address ranges before creating a Direct3D 12 device.
The support for this feature can be queried using D3D12_FEATURE_DATA_D3D12_OPTIONS20::RecreateAtTier. Since this feature is intended for tools rather than typical graphics applications, I won't delve into further details here.
This is yet another feature that Vulkan already provides, while Microsoft is only now adding it. In Vulkan, the ability to recreate resources at a specific address was introduced alongside the VK_KHR_buffer_device_address extension, which introduced free-form pointers. This functionality is provided through "capture replay" features, such as the VkBufferOpaqueCaptureAddressCreateInfo
structure.
This feature works automatically and does not introduce any new API. It improves performance by passing some DirectX 12 function calls directly to the graphics driver, bypassing intermediate functions in Microsoft’s DirectX 12 runtime code.
If I understand it correctly, this appears to be yet another feature that Vulkan got right, and Microsoft is now catching up. For more details, see the article Architecture of the Vulkan Loader Interfaces, which describes how dynamically fetching pointers to Vulkan functions using vkGetInstanceProcAddr
and vkGetDeviceProcAddr
can point directly to the "Installable Client Driver (ICD)," bypassing "trampoline functions."
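As a tiny sketch of what this looks like on the Vulkan side (using the standard loader API):

// Fetching a device-level function pointer so that subsequent calls skip the
// loader's trampoline and jump straight into the driver (ICD).
PFN_vkCmdDraw pfn_cmd_draw =
    (PFN_vkCmdDraw)vkGetDeviceProcAddr(device, "vkCmdDraw");
pfn_cmd_draw(cmd_buf, 3, 1, 0, 0); // Draw 3 vertices, 1 instance.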
There are also some additions to D3D12 Video. The article announcing them is: Agility SDK 1.716.0-preview: New D3D12 Video Encode Features. However, since I don’t have much expertise in D3D12 Video, I won’t describe them here.
Microsoft also released new versions of PIX that support all these new features from day 0! See the announcement article for PIX version 2501.30 and 2501.30-preview.
Queries for the new capabilities added in this update to the Agility SDK (both retail and preview versions) have already been integrated into the D3d12info command-line tool, the D3d12infoGUI tool, and the D3d12infoDB online database of DX12 GPU capabilities. You can contribute to this project by running the GUI tool and submitting your GPU’s capabilities to the database!
Last weekend, 24-26 January 2025, I participated in the Global Game Jam - more specifically, PolyJam 2025, a site in Warsaw, Poland. In this post, I'll share the game we made (including the full source code) and describe my first impressions of the Godot Engine, which we used for development.
We made a simple 2D pixel-art game with mechanics similar to Overcooked. It was designed for 2 to 4 players in co-op mode, using keyboard and gamepads.
Entry of the game at globalgamejam.org
GitHub repository with the source code
A side note: The theme of GGJ 2025 was "Bubble". Many teams created games about bubbles in water, while others interpreted it more creatively. For example, the game Startup Panic: The Grind Never Stops featured minigames like drawing graphs or typing buzzwords such as "Machine Learning" to convince investors to fund your startup – an obvious bubble 🙂 Our game, on the other hand, focused on taking care of babies and fulfilling their needs so they could grow up successfully. In Polish, the word for "bubbles" is "bąbelki", but it’s also informally used to refer to babies. Deliberately misspelled as "bombelki", it is a wordplay that makes sense and fits the theme in Polish.
My previous game jam was exactly two years ago. Before that jam, I had learned a bit of Cocos Creator and used it to develop my game, mainly to try something new. I described my impressions in this post: Impressions After Global Game Jam 2023. This time, I took a similar approach and started learning the Godot engine about three weeks before the jam. Having some experience with Unity and Unreal Engine, my first impressions of Godot have been very positive. Despite being an open-source project, it doesn't have that typical "open-source feeling" of being buggy, unfinished, or inconvenient to use. Quite the opposite! Here are the things I especially like about the engine:
I like that it’s small, lightweight, and easy to set up. All you need to do is download a 55 MB archive, unpack it, and you’re ready to start developing. This is because it’s a portable executable that doesn’t require any installation. The only time you need to download additional files (over 1 GB) is when you’re preparing to create a build for a specific platform.
I also appreciate how simple the core ideas of the engine are: everything is a node, nodes are organized into a tree, and a scene is just a saved tree of nodes that can be instantiated and nested.
I’m not sure if this approach is optimal in terms of performance or whether it’s as well-optimized as the full Entity Component System (ECS) that some other engines use. However, I believe a good engine should be designed like this one – with a simple and intuitive interface, while handling performance optimizations seamlessly under the hood.
I also appreciate the idea that the editor is built using the same GUI controls available for game development. This approach provides access to a wide range of advanced controls: not just buttons and labels, but also movable splitters, multi-line text editors, tree views, and more. They can all be skinned with custom colors and textures.
Similarly, files saved by the engine are text files in an INI-like format, with sections like [SectionName] and key-value pairs like Name = Value. Unlike binary files, XML, or JSON, these files are very convenient to merge when conflicts arise after two developers modify the same file. The same format is also available and recommended for use in games, such as for saving settings.
Then, there is GDScript - a custom scripting language. While Godot also offers a separate version that supports C# and even has a binding to Rust, GDScript is the native way of implementing game logic. I like it a lot. Some people compare it to Python, but it’s not a fork or extension of Python; it’s a completely separate language. The syntax shares similarities with Python, such as using indentation instead of braces {}
to define scopes. However, GDScript includes many features that Python lacks, specifically tailored for convenient and efficient game development.
One such feature is an interesting mix of dynamic and static typing. By default, variables can have dynamic types (referred to as "variant"), but there are ways to define a static type for a variable. In such cases, assigning a value of a different type results in an error – a feature that Python lacks.
var a = 0.1
a = "Text" # OK - dynamic type.
var b: float
b = 0.1
b = "Text" # Error! b must be a number.
var c := 0.1
c = "Text" # Error! c must be a number.
Another great feature is the inclusion of vector types for 2D, 3D, or 4D vectors of floats or integers. These types are both convenient and intuitive to use – they are passed by value (creating an implicit copy) and are mutable, meaning you can modify individual xyzw
components. This is something that Python cannot easily replicate: in Python, tuples are immutable, while lists and custom classes are passed by reference. As a result, assigning or passing them as function parameters in Python makes the new variable refer to the original object. In GDScript, on the other hand:
var a := Vector2(1.0, 2.0)
var b := a # Made a copy.
b.x = 3.0 # Can modify a single component.
print(a) # Prints (1, 2).
I really appreciate the extra language features that are clearly designed for game development. For example, the @export attribute before a variable exposes it to the Inspector as a property of a specific type, making it available for visual editing. The $NodeName syntax allows you to reference other nodes in the scene, and it supports file system-like paths, such as using / to navigate down the hierarchy and .. to go up. For instance, you can write something like $../AudioPlayers/HitAudioPlayer.play().
I also like how easy it is to animate any property of any object using paths like the one shown above. This can be done using a dedicated AnimationPlayer
node, which provides a full sequencer experience with a timeline. Alternatively, you can dynamically change properties over time using a temporary Tween
object. For example, the following code changes the font color of a label to a transparent color over 0.5 seconds, using a specific easing function selected from the many available options (check out the Godot tweening cheat sheet for more details):
create_tween().tween_property(addition_label.label_settings, ^":font_color", transparent_color, 0.5).set_trans(Tween.TRANS_CUBIC).set_ease(Tween.EASE_IN)
I really appreciate the documentation. All core language features, as well as the classes and functions available in the standard library, seem to be well-documented. The documentation is not only available online but also integrated into the editor (just press F1), allowing you to open documentation tabs alongside your script code tabs.
I also like the debugger. Being able to debug the code I write is incredibly important to me, and Godot delivers a full debugging experience. It allows you to pause the game (automatically pausing when an error occurs), inspect the call stack, view variable values, explore the current scene tree, and more.
That said, I’m sure Godot isn’t perfect. For me, it was just a one-month adventure, so I’ve only described my first impressions. There must be reasons why AAA games aren’t commonly made in this engine. It likely has some rough edges and missing features. I only worked with 2D graphics, but I can see it supports 3D graphics with a Forward+ renderer and PBR materials. While it could potentially be used for 3D projects, I’m certain it’s not as powerful as Unreal Engine in that regard. I also encountered some serious technical issues with the engine during the game jam, but I’ll describe those in separate blog posts to make them easier to find for anyone searching the Internet for a solution.
I also don’t know much about Godot’s performance. The game we made was very simple. If we had thousands of objects on the scene to render and complex logic to calculate every frame, performance would become a critical factor. Doing some work in every object every frame using _process
function is surely an anti-pattern and it runs serially on a single thread. However, I can see that GDScript also supports multithreading – another feature that sets it apart from Python.
To summarize, I now believe that Godot is a great engine at least for game jams and fast prototyping.
Earlier this month, Timothy Lottes published a document on Google Docs called "Fixing the GPU", where he describes many ideas about how programming compute shaders on the GPU could be improved. It might be an interesting read for those advanced enough to understand it. The document is open for suggestions and comments, and there are a few comments there already.
On a different topic, on 25 November I attended the Code::Dive conference in Wrocław, Poland. It was mostly dedicated to programming in the C++ language. I usually attend conferences about game development, so it was an interesting experience for me. Big thanks to Tomasz Łopuszański from Programista magazine for inviting me there! It was great to see Bjarne Stroustrup and Herb Sutter live, among other good speakers. By the way, recordings from the talks are available on YouTube.
Those two events inspired me to write down my thoughts – my personal "wishlist" for programming languages, from the perspective of someone interested in games and real-time graphics programming. I gathered my opinions about things I like and dislike in C++ and some ideas about what a new, better language could look like. It is less about a specific syntax to propose and more about high-level ideas. You can find it under the following shortened address, but it is really a document on Google Docs. Comments are welcome.
Of course I am aware of Rust, D, Circle, Carbon, and other programming languages that share the same goal of replacing C++. I just wanted to write down my own thoughts about this topic.
Floating-point numbers are a great invention. Thanks to dedicating separate bits to the sign, exponent, and mantissa (also called significand), they can represent a wide range of numbers on a limited number of bits - numbers that are positive or negative, very large or very small (close to zero), integer or fractional.
In programming, we typically use double-precision (64b) or single-precision (32b) numbers. These are the data types available in programming languages (like double
and float
in C/C++) and supported by processors, which can perform calculations on them efficiently. Those of you who deal with graphics programming using graphics APIs like OpenGL, DirectX, or Vulkan, may know that some GPUs also support 16-bit floating-point type, also known as half-float.
Such a 16-bit "half" type obviously has limited precision and range compared to the "single" or "double" versions. Because of these limitations, I am reserved about recommending its use in graphics. I summarized the capabilities and limits of these 3 types in a table in my old "Floating-Point Formats Cheatsheet".
Now, as artificial intelligence (AI) / machine learning (ML) is a popular topic, programmers use low-precision numbers in this domain. When I learned that floating-point formats based on only 8 bits were proposed, I immediately thought: 256 possible values are few enough that they could all be visualized in a 16x16 table! I developed a script that generates such tables, and so I invite you to take a look at my new article:
"FP8 data type - all values in a table"
Today I would like to present my new, comprehensive article: "How to do a good code review". It can be helpful to any programmer, no matter what programming language they use. Conducting good code reviews is a skill worth mastering. In this article, we will discuss the advantages and disadvantages of this process and explore the types of projects where it is most beneficial. We will consider the best approach to take when reviewing code, how to effectively carry out the process, which aspects of the code to focus on, and finally – how to write comments on the code in a way that benefits the project. The goal is to ensure that the review process serves as an opportunity for fruitful communication among team members rather than a source of conflict.
The article was first published a few months ago in Polish in issue 112 (March/April 2024) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions:
I wasn't active on my blog in the past months because I took some time for vacation, but also because I'm now learning about machine learning. I may be late to the party, but I recognize that machine learning algorithms are useful tools in many applications. As people learn the basics, they often feel an urge to teach others about it. Some good reads authored by game/graphics developers are: "Machine Learning for Game Devs" by Alan Wolfe and "Crash Course in Deep Learning (for Computer Graphics)" by Jakub Boksansky. I don't want to duplicate their work, so I will only blog about it when I have something unique to show.
This article is for you if you are a programmer using Direct3D 12. We will talk about a specific part of the API: root signatures. I will provide a comprehensive description of the various formats in which they can be specified and stored, and ways to convert between them. The difficulty of this article is intermediate. You are expected to know at least some basics of D3D12. I think that advanced developers can also learn something new, as some of the topics shown here are not what we typically use in day-to-day development with D3D12.
I will use C++ as the programming language. Wherever possible, I will also try to use standalone command-line tools instead of writing custom code. To repeat the experiments demonstrated in this article, you will need these two tools:
1. DXC (DirectX Shader Compiler), e.g. the one from the Windows SDK, with its directory added to your PATH environment variable, so you can open Command Prompt and just type "dxc" to use it.
2. Radeon GPU Analyzer (RGA).

You don't need to know the command-line syntax of these tools to understand the article. I will describe everything step-by-step.
Warning about DXC: If you also have the Vulkan SDK installed, very likely your PATH environment variable points to "dxc.exe" from that SDK instead of the Windows SDK, which can cause problems. To check this, type the command: where dxc. If you find the Vulkan SDK listed first, make sure you call "dxc.exe" from the Windows SDK, e.g. by explicitly specifying the full path to the executable file.
Warning about RGA: If you want to repeat command-line experiments presented here, make sure to use Radeon GPU Analyzer in the latest version, at least 2.9.1. In older versions, the commands I present wouldn't work.
A side note about shader compilation: Native CPU code, like the one we create when compiling our C++ programs, is saved in .exe files. It contains instructions in a common format called x86, which is sent directly to the CPU for execution. It works regardless of whether you have an AMD or Intel processor in your computer, because they comply with the same standard. With programs written for the GPU (which we call shaders), things are different. Every GPU vendor (AMD, Nvidia, Intel) has its own instruction set, necessitating a two-step process for shader compilation:

1. The HLSL source code is compiled offline into an intermediate binary form (DXIL), using the DXC compiler.
2. The intermediate code is compiled into the machine code (ISA) of the specific GPU, by the graphics driver.
In Direct3D 12, a root signature is a data structure that describes resource bindings used by a pipeline on all the shader stages. Let's see an example. Let's work with file "Shader1.hlsl": a very simple HLSL code that contains 2 entry points: function VsMain
for vertex shader and function PsMain
for pixel shader:
struct VsInput
{
float3 pos : POSITION;
float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
float4 pos : SV_Position;
float2 tex_coord : TEXCOORD;
};
struct VsConstants
{
float4x4 model_view_proj;
};
ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);
VsOutput VsMain(VsInput i)
{
VsOutput o;
o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
o.tex_coord = i.tex_coord;
return o;
}
Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);
float4 PsMain(VsOutput i) : SV_Target
{
return color_texture.Sample(color_sampler, i.tex_coord);
}
I assume you already know that a shader is a program executed on a GPU that processes a single vertex or pixel with clearly defined inputs and outputs. To perform the work, it can also reach out to video memory to access additional resources, like buffers and textures. In the code shown above:
A root signature is a data structure that describes what I said above - what resources should be bound to the pipeline at individual shader stages. In this specific example, it will be a constant buffer at register b4, a texture at t0, and a sampler at s0. It can also be shown in form of a table:
| Root param index | Register | Shader stage |
|---|---|---|
| 0 | b4 | VS |
| 1 | t0 | PS |
| 2 | s0 | PS |
I am simplifying things here, because this article is not about teaching you the basics of root signatures. For more information about them, you can check:
To prepare for our experiments, let's compile the shaders shown above using commands:
dxc -T vs_6_0 -E VsMain -Fo Shader1.vs.bin Shader1.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader1.ps.bin Shader1.hlsl
Note that a single HLSL source file can contain multiple functions (VsMain, PsMain). When we compile it, we need to specify one function as an entry point. For example, the first command compiles the "Shader1.hlsl" file using the VsMain function as the entry point (-E parameter), treated as a vertex shader in Shader Model 6.0 (-T parameter). Similarly, the second command compiles the PsMain function as a pixel shader. Compiled shaders are saved in two separate files: "Shader1.vs.bin" and "Shader1.ps.bin".
It is time to show some C++ code. Imagine we have D3D12 already initialized, our compiled shaders loaded from files to memory, and now we want to render something on the screen. I said a root signature is a data structure, and indeed, we can create one by filling in some structures. The main one is D3D12_ROOT_SIGNATURE_DESC. Let's fill in the structures according to the table above.
// There will be 3 root parameters.
D3D12_ROOT_PARAMETER root_params[3] = {};
// Root param 0: CBV at b4, passed as descriptor table, visible to VS.
D3D12_DESCRIPTOR_RANGE vs_constant_buffer_desc_range = {};
vs_constant_buffer_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
vs_constant_buffer_desc_range.NumDescriptors = 1;
vs_constant_buffer_desc_range.BaseShaderRegister = 4; // b4
root_params[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
root_params[0].DescriptorTable.NumDescriptorRanges = 1;
root_params[0].DescriptorTable.pDescriptorRanges = &vs_constant_buffer_desc_range;
// Root param 1: SRV at t0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_texture_desc_range = {};
color_texture_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
color_texture_desc_range.NumDescriptors = 1;
color_texture_desc_range.BaseShaderRegister = 0; // t0
root_params[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[1].DescriptorTable.NumDescriptorRanges = 1;
root_params[1].DescriptorTable.pDescriptorRanges = &color_texture_desc_range;
// Root param 2: sampler at s0, passed as descriptor table, visible to PS.
D3D12_DESCRIPTOR_RANGE color_sampler_desc_range = {};
color_sampler_desc_range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER;
color_sampler_desc_range.NumDescriptors = 1;
color_sampler_desc_range.BaseShaderRegister = 0; // s0
root_params[2].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
root_params[2].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
root_params[2].DescriptorTable.NumDescriptorRanges = 1;
root_params[2].DescriptorTable.pDescriptorRanges = &color_sampler_desc_range;
// The main structure describing the whole root signature.
D3D12_ROOT_SIGNATURE_DESC root_sig_desc = {};
root_sig_desc.NumParameters = 3;
root_sig_desc.pParameters = root_params;
root_sig_desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;
Variable root_sig_desc
of type D3D12_ROOT_SIGNATURE_DESC
is our data structure specifying the root signature. Let's call it a root signature representation number #1.
The code may look scary at first, but if you analyze it carefully, I am sure you can recognize the parameters of the 3 resources to bind that we talked about earlier. This code is so complex because a buffer or a texture can be bound in multiple ways, differing in the number of levels of indirection. Describing it is out of scope of this article, but I explained it comprehensively in my old article: Direct3D 12: Long Way to Access Data.
There is also an even more general structure, D3D12_VERSIONED_ROOT_SIGNATURE_DESC, that allows using root signatures in versions higher than 1.0, but we won't talk about it in this article to keep things simple.
If you also use Vulkan, you may recognize that the equivalent structure is VkDescriptorSetLayoutCreateInfo. From it, you can call the function vkCreateDescriptorSetLayout to create an object of type VkDescriptorSetLayout, and then a VkPipelineLayout, which is roughly equivalent to the DX12 root signature.
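For comparison, a minimal Vulkan sketch of that path, with a single sampled-image binding (like our t0) visible to the fragment (pixel) shader:

// Sketch: the Vulkan counterpart of one of our root parameters.
VkDescriptorSetLayoutBinding binding = {};
binding.binding = 0;
binding.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
binding.descriptorCount = 1;
binding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

VkDescriptorSetLayoutCreateInfo layout_info = {};
layout_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layout_info.bindingCount = 1;
layout_info.pBindings = &binding;

VkDescriptorSetLayout set_layout = VK_NULL_HANDLE;
vkCreateDescriptorSetLayout(device, &layout_info, NULL, &set_layout);
// A VkPipelineLayout built from such set layouts plays the role of the root signature.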
In DX12, however, this is not that simple. There is an intermediate step we need to go through. Microsoft requires converting this data structure to a special binary format first. They call it "serialization". We can do it using the function D3D12SerializeRootSignature, like this:
ComPtr<ID3DBlob> root_sig_blob, error_blob;
HRESULT hr = D3D12SerializeRootSignature(&root_sig_desc, D3D_ROOT_SIGNATURE_VERSION_1_0,
&root_sig_blob, &error_blob);
// Check hr...
const void* root_sig_data = root_sig_blob->GetBufferPointer();
size_t root_sig_data_size = root_sig_blob->GetBufferSize();
An object of type ID3DBlob
is just a simple container that owns a memory buffer with binary data of some size. ("BLOB" stands for "Binary Large OBject".) This buffer we created here is our representation number #2 of the root signature.
If we save it to a file, we can see that our example root signature has 188 bytes. It starts with the characters "DXBC", just like the shaders we previously compiled with the dxc tool, which indicates that root signatures use the same container format as compiled shaders. I am not sure whether this binary format is documented anywhere. It should be possible to decipher anyway, as the DirectX Shader Compiler (dxc) is open source. I never needed to work with this binary format directly, and we won't do it here either.
I guess Microsoft's intention was to encourage developers to prepare root signatures beforehand and store them in files, just like compiled shaders, so they are not assembled at runtime on every application launch. Is it worth it, though? Shader compilation is slow for sure, but would loading a file be faster than filling in the data structure and serializing it with D3D12SerializeRootSignature? I doubt it, unless Microsoft implemented this function extremely inefficiently. Very likely, this additional level of indirection is just an extra unnecessary complication that Microsoft prepared for us. That wouldn't be the only case where they did that, as you can read in my old article Do RTV and DSV descriptors make any sense?
Note that if a serialized root signature is saved to a file and loaded later, it doesn't need to be stored in an ID3DBlob object. All we need is a pointer to the data and its size (number of bytes). The data can be stored in a byte array like char* arr = new char[size], or std::vector<char> (I like to use this one), or any other form.
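For example, a minimal sketch of loading it from a file into std::vector<char>:

// Sketch: loading a previously serialized root signature from a file.
// Requires <fstream>, <vector>, <iterator>.
std::ifstream file("RootSigFromCode.bin", std::ios::binary);
std::vector<char> root_sig_data(
    (std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
// root_sig_data.data() and root_sig_data.size() can now be passed to
// ID3D12Device::CreateRootSignature, just like the blob contents.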
With this extra level of indirection done, we can use this serialized binary root signature to create an object of type ID3D12RootSignature. This is an opaque object that represents the root signature in memory, ready to be used by D3D12. Let's call it root signature representation number #3. The code for creating it is very simple:
ComPtr<ID3D12RootSignature> root_sig_obj;
hr = g_Device->CreateRootSignature(0, root_sig_data, root_sig_data_size,
IID_PPV_ARGS(&root_sig_obj));
// Check hr...
Having this root signature object, we can pass it as part of the D3D12_GRAPHICS_PIPELINE_STATE_DESC and use it to create an ID3D12PipelineState - a Pipeline State Object (PSO) that can be used for rendering.
D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = root_sig_obj.Get(); // Root signature!
pso_desc.VS.pShaderBytecode = vs_data; // Vertex shader from "Shader1.vs.bin".
pso_desc.VS.BytecodeLength = vs_data_size;
pso_desc.PS.pShaderBytecode = ps_data; // Pixel shader from "Shader1.ps.bin".
pso_desc.PS.BytecodeLength = ps_data_size;
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;
ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...
If we have the serialized root signature saved to a file "RootSigFromCode.bin", we can also play around with assembling a PSO without any coding, but using Radeon GPU Analyzer instead. Try the following command:
rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA
The meaning of the individual parameters is as follows:

- -s dx12 - selects DirectX 12 as the API. It is needed because this tool supports other APIs as well.
- -c gfx1100 - selects the GPU generation to use. gfx1100 means the latest generation of AMD GPUs at the moment I write this article, which is the Radeon RX 7000 series.
- --all-hlsl Shader1.hlsl - specifies the input file with HLSL code.
- --vs-entry VsMain --ps-entry PsMain - specify entry points (function names) for the vertex and pixel shader, respectively.
- --rs-bin RootSigFromCode.bin - specifies the file with the serialized root signature.
- --offline - enables offline mode, which allows RGA to work even without an AMD graphics driver installed in the system, e.g. when you have an Nvidia or Intel card.
- --isa AMD_ISA - enables the ISA (assembly) as the requested output and specifies the name to be used in output files.

When it succeeds, this command creates 2 text files with the disassembly of the vertex and pixel shader: "gfx1100_AMD_ISA_vert.isa" and "gfx1100_AMD_ISA_pixel.isa". The pixel shader looks like this:
; D3D12 Shader Hash 0x46f0bbb15b95e2453380ad3c9765222a
; API PSO Hash 0xd96cc024d8cb165d
; Driver Internal Pipeline Hash 0xf3a0f055053cc59f
; -------- Disassembly --------------------
shader main
asic(GFX11)
type(PS)
sgpr_count(14)
vgpr_count(8)
wave_size(64)
// s_ps_state in s0
s_version UC_VERSION_GFX11 | UC_VERSION_W64_BIT // 000000000000: B0802006
s_set_inst_prefetch_distance 0x0003 // 000000000004: BF840003
s_mov_b32 m0, s4 // 000000000008: BEFD0004
s_mov_b64 s[12:13], exec // 00000000000C: BE8C017E
s_wqm_b64 exec, exec // 000000000010: BEFE1D7E
s_getpc_b64 s[0:1] // 000000000014: BE804780
s_waitcnt_depctr depctr_vm_vsrc(0) & depctr_va_vdst(0) // 000000000018: BF880F83
lds_param_load v2, attr0.x wait_vdst:0 // 00000000001C: CE000002
lds_param_load v3, attr0.y wait_vdst:0 // 000000000020: CE000103
s_mov_b32 s4, s2 // 000000000024: BE840002
s_mov_b32 s5, s1 // 000000000028: BE850001
s_mov_b32 s0, s3 // 00000000002C: BE800003
s_load_b256 s[4:11], s[4:5], null // 000000000030: F40C0102 F8000000
s_load_b128 s[0:3], s[0:1], null // 000000000038: F4080000 F8000000
v_interp_p10_f32 v4, v2, v0, v2 wait_exp:1 // 000000000040: CD000104 040A0102
v_interp_p10_f32 v0, v3, v0, v3 wait_exp:0 // 000000000048: CD000000 040E0103
s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) // 000000000050: BF870112
v_interp_p2_f32 v2, v2, v1, v4 wait_exp:7 // 000000000054: CD010702 04120302
v_interp_p2_f32 v0, v3, v1, v0 wait_exp:7 // 00000000005C: CD010700 04020303
s_and_b64 exec, exec, s[12:13] // 000000000064: 8BFE0C7E
s_waitcnt lgkmcnt(0) // 000000000068: BF89FC07
image_sample v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D // 00000000006C: F06C0F05 00010002 00000000
s_waitcnt vmcnt(0) // 000000000078: BF8903F7
v_cvt_pk_rtz_f16_f32 v0, v0, v1 // 00000000007C: 5E000300
v_cvt_pk_rtz_f16_f32 v2, v2, v3 // 000000000080: 5E040702
s_mov_b64 exec, s[12:13] // 000000000084: BEFE010C
exp mrt0, v0, v2, off, off done // 000000000088: F8000803 00000200
s_endpgm // 000000000090: BFB00000
We will not analyze it here in detail, but it is worth noting that we have 3 memory-loading instructions here, which correspond to the operations we do in the pixel shader: s_load_b256 and s_load_b128 load the descriptors of the sampler s0 and the texture t0, which are then both used by the image_sample instruction to perform the texture sampling.
We talked about many different formats of root signatures already, and there will be more. It is time to show a diagram that gathers them all and presents transitions between them. This is the central part of our article that we will refer to. Note that we already talked about representations number #1, #2, #3, #4, which you can find on the diagram.
There is a way to convert a serialized root signature blob back to data structures. Microsoft offers the function D3D12CreateRootSignatureDeserializer for this purpose. It creates an object of type ID3D12RootSignatureDeserializer, which owns the structure D3D12_ROOT_SIGNATURE_DESC and other structures referenced by it. Example code:
ComPtr<ID3D12RootSignatureDeserializer> root_sig_deserializer;
hr = D3D12CreateRootSignatureDeserializer(root_sig_data, root_sig_data_size,
IID_PPV_ARGS(&root_sig_deserializer));
// Check hr...
const D3D12_ROOT_SIGNATURE_DESC* root_sig_desc = root_sig_deserializer->GetRootSignatureDesc();
// Inspect decoded root_sig_desc...
When using higher root signature versions, you need to use function D3D12CreateVersionedRootSignatureDeserializer
and interface ID3D12VersionedRootSignatureDeserializer
instead.
We are only in the middle of this article. This is because Microsoft prepared one more representation of the root signature - a text representation. For it, they defined a simple domain-specific language, which is fully documented on page Specifying Root Signatures in HLSL. As an example, our simple root signature presented in this article would look like this:
RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT),
DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX),
DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL),
DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)
I am sure you can recognize the same parameters we passed when we assembled a data structure describing this root signature in our C++ code. The text representation is clearly more concise and readable.
However, this is not exactly the way we specify root signatures in text format. It will go into our HLSL shader source file, but before we can put it there, we must pack it into a string defined using a #define macro, so it takes the form of:
#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
"DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"
This is our root signature representation number #5 on the diagram. It looks somewhat clumsy, but this is the way we need to format it. The backslash symbol "\" at the end of each line except the last one is necessary to continue the #define macro in the next line. This is a feature of the HLSL preprocessor, the same as in the C and C++ preprocessors.
We could simplify this macro by putting the whole string with our root signature in a single line, but I am not convinced it would make it more readable. Besides, formatting root signatures as shown above is the way recommended by Microsoft in their documentation.
If you think about converting a root signature back to the text representation, there is no ready-made function for that, but you can find such code in the RGA source, file "source/radeon_gpu_analyzer_backend/autogen/be_rootsignature_dx12.cpp", class RootSignatureUtil. I marked it as an arrow leading from #1 to #5 on the diagram, described as "Custom code".
Having our root signature defined in the text format, packed into a #define macro, and included in our HLSL shader source file is the first step. Just like a single HLSL file can contain multiple entry points to various shaders, it can also contain multiple root signature definitions, so we need to specify the one to use. To do this, we can attach a root signature to the function used as the shader entry point, using the [RootSignature()] attribute with the name of our macro inside.
Here is the full contents of a new shader file "Shader2.hlsl" with root signature embedded:
struct VsInput
{
float3 pos : POSITION;
float2 tex_coord : TEXCOORD;
};
struct VsOutput
{
float4 pos : SV_Position;
float2 tex_coord : TEXCOORD;
};
struct VsConstants
{
float4x4 model_view_proj;
};
#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
"DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"
ConstantBuffer<VsConstants> vs_constant_buffer : register(b4);
[RootSignature(MyRootSig)]
VsOutput VsMain(VsInput i)
{
VsOutput o;
o.pos = mul(float4(i.pos, 1.0), vs_constant_buffer.model_view_proj);
o.tex_coord = i.tex_coord;
return o;
}
Texture2D<float4> color_texture : register(t0);
SamplerState color_sampler : register(s0);
[RootSignature(MyRootSig)]
float4 PsMain(VsOutput i) : SV_Target
{
return color_texture.Sample(color_sampler, i.tex_coord);
}
If you compile VS and PS from this file using commands:
dxc -T vs_6_0 -E VsMain -Fo Shader2.vs.bin Shader2.hlsl
dxc -T ps_6_0 -E PsMain -Fo Shader2.ps.bin Shader2.hlsl
New files "Shader2.vs.bin" and "Shader2.ps.bin" will have size greater than respective "Shader1.vs.bin" and "Shader1.ps.bin" we created earlier by exactly 168 bytes, which is similar to the size of our serialized root signature. This indicates that our root signature is bundled together with the compiled shader code. This is the representation number #6 on the diagram.
Shaders compiled with a root signature embedded can then be used in the C++/D3D12 code for creating a PSO without a need to specify the root signature explicitly. Variable D3D12_GRAPHICS_PIPELINE_STATE_DESC::pRootSignature
can be set to null. Our PSO creation code can now look like this:
D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
pso_desc.pRootSignature = NULL; // Sic!
pso_desc.VS.pShaderBytecode = vs.data(); // Vertex shader from "Shader2.vs.bin".
pso_desc.VS.BytecodeLength = vs.size();
pso_desc.PS.pShaderBytecode = ps.data(); // Pixel shader from "Shader2.ps.bin".
pso_desc.PS.BytecodeLength = ps.size();
pso_desc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
pso_desc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
pso_desc.InputLayout.NumElements = _countof(input_elems);
pso_desc.InputLayout.pInputElementDescs = input_elems;
pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
pso_desc.NumRenderTargets = 1;
pso_desc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM_SRGB;
pso_desc.SampleDesc.Count = 1;
ComPtr<ID3D12PipelineState> pso;
hr = g_Device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pso));
// Check hr...
Similarly, we can use RGA to compile those shaders, assemble the PSO, and output AMD GPU assembly:
rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA
Because we can use multiple shaders at different shader stages (vertex shader, pixel shader, possibly also hull, domain, geometry, amplification, mesh shader...) when creating a PSO, and we attached a [RootSignature()] attribute to all of them, you may ask what happens if some shader stages don't specify a root signature or specify a different one. If the root signatures of the stages don't match, PSO creation fails with an error like this:

D3D12 ERROR: ID3D12Device::CreateGraphicsPipelineState: Root Signature doesn't match Pixel Shader: Root signature of Vertex Shader doesn't match the root signature of Pixel Shader
When we have a root signature encoded in the text format, we can use it in two ways. One is attaching it to a shader entry point function using the [RootSignature()] attribute, as we've seen in the previous section. The second is compiling the root signature alone. For this, we need to use dedicated command-line arguments for "dxc.exe" and specify the name of our macro.
Let's create a separate HLSL file with only the root signature, called "RootSig.hlsl":
#define MyRootSig "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), " \
"DescriptorTable(CBV(b4), visibility=SHADER_VISIBILITY_VERTEX), " \
"DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL), " \
"DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL)"
Let's now use the following command to compile it:
dxc -T rootsig_1_0 -E MyRootSig -Fo RootSigFromHlsl.bin RootSig.hlsl
The output of this command is the file "RootSigFromHlsl.bin", which is 188 bytes - exactly the same size as the file "RootSigFromCode.bin" we created earlier by filling in data structures in C++ and serializing them. Thus, we can say we just learned a way to create a serialized root signature binary from the text representation. We can now connect two existing blocks in our diagram with the arrow leading from #5 to #2.
Note you can use our previous file "Shader2.hlsl" instead of "RootSig.hlsl" with the same effect. That file contains shader functions, but they just get ignored, as we only use the MyRootSig
macro.
Because there are so many ways of storing root signatures, Microsoft provided a possibility to convert between them using dedicated command-line parameters of DXC:
We can specify a compiled shader with a root signature embedded and extract only the root signature blob from it (connection from #6 to #2 in our diagram):
dxc -dumpbin -extractrootsignature -Fo RootSigExtracted.bin Shader2.vs.bin
The -dumpbin
parameter means that the input file (specified as the positional argument at the end) is a compiled binary, not a text file with HLSL source.
We can transform a compiled shader file into one with the embedded root signature removed. This path is not shown in the diagram. The output file "ShaderNoRootSig.vs.bin" has the same size (4547 B) as "Shader1.vs.bin" that we compiled previously without a root signature.
dxc -dumpbin -Qstrip_rootsignature -Fo ShaderNoRootSig.vs.bin Shader2.vs.bin
We can also join two binary files: one with compiled shader, one with root signature blob, and create a file with the shader and the root signature embedded in it. This is shown on the diagram as a path from #2 to #6.
dxc -dumpbin -setrootsignature RootSigFromCode.bin -Fo ShaderWithRootSigAdded.vs.bin Shader1.vs.bin
I've shown all these commands here, because it is very important to get them right. Microsoft did a terrible job here offering many options in the command-line syntax that can be misleading. For example:
- When displaying help with the --help parameter, the -T rootsig_1_0 profile is not shown among the possible -T options.
- The name of the macro to compile is specified using -E, while there is also a parameter -rootsig-define that looks better suited for this task.
- The output file is specified using -Fo, while there is also a parameter -Frs that looks better suited for this task, described as "Output root signature to the given file".

Moreover, if you do it the wrong way, DXC prints some cryptic, unrelated error message, or prints nothing, does nothing, and exits with process exit code 0. Not very helpful!
Radeon GPU Analyzer utilizes DXC internally, so it can be used to compile shaders from HLSL source code all the way to the pipeline state object (both stages of the shader compilation). That PSO is created internally just to extract the final ISA assembly from it. Here is a command we've seen before:
rga -s dx12 -c gfx1100 --all-hlsl Shader2.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA
However, RGA supports many more command-line options. Input shaders can be specified in HLSL format using --all-hlsl FILE or per-stage --vs FILE, --ps FILE, etc., with mandatory entry point function names passed as --vs-entry NAME, --ps-entry NAME, etc. Alternatively, we can specify compiled binary shaders as input. Then, the input is the intermediate shader representation, and RGA performs only the second stage of the shader compilation.
rga -s dx12 -c gfx1100 --vs-blob Shader2.vs.bin --ps-blob Shader2.ps.bin --offline --isa AMD_ISA
Similarly, a root signature can be specified in one of many ways:
1. Embedded in shaders, like in the 2 commands shown above, as our "Shader2" was compiled with the root signature.
2. From a separate HLSL file and a specific #define macro:
rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-hlsl RootSig.hlsl --rs-macro MyRootSig --offline --isa AMD_ISA
3. From a binary file with the serialized root signature:
rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --rs-bin RootSigFromCode.bin --offline --isa AMD_ISA
4. None at all. Then, a root signature matching the compiled shaders gets auto-generated. This is a new feature of RGA 2.9.1.
rga -s dx12 -c gfx1100 --all-hlsl Shader1.hlsl --all-model 6_0 --vs-entry VsMain --ps-entry PsMain --offline --isa AMD_ISA
Up to this point, our discussion has centered on C++ code for loading compiled shaders and creating a PSO, typically for D3D12 rendering purposes, such as game development or other graphics applications. Shaders were compiled exclusively using standalone command-line tools: DXC and RGA.
However, the DXC shader compiler can also be used as a C++ library. Everything we can do with "dxc.exe", we can also do programmatically from our code using the equivalent library. To use the library, we need to:
- LoadLibrary "dxil.dll" and "dxcompiler.dll".
- GetProcAddress of only one function: DxcCreateInstance, as everything starts from it.
I won't describe the library in detail here. It is out of scope of this article, and the article is already very long. However, I would like to point to some interesting features:
1. Certainly, we can compile a shader. To do it, use the function IDxcCompiler3::Compile. Interestingly, we don't fill in data structures with specific parameters for the compiler, like we would normally expect from a programmatic API. Instead, we are asked to format a list of strings with parameters, same as we would pass to the command-line DXC, e.g.:
const wchar_t* arguments[] = {
    L"-T", L"ps_6_0",
    L"-E", L"PsMain",
    // Etc. Note that each token is a separate string.
};
Because we talk about root signatures here, it is worth noting that we can check whether a compiled shader has one embedded. Calling IDxcResult::GetOutput with parameter DXC_OUT_OBJECT returns the compiled shader blob, DXC_OUT_ERRORS returns a string with errors and warnings, while DXC_OUT_ROOT_SIGNATURE tells us that the shader has a root signature attached. The sketch below puts these pieces together.
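To give a taste of the library, here is a minimal sketch of this flow: it loads the DLLs, compiles a pixel shader from our file "Shader2.hlsl", and checks for an embedded root signature. Error handling is omitted, so treat it as an illustration of the API usage rather than production-ready code:

#include <windows.h>
#include <atlbase.h> // CComPtr
#include <dxcapi.h>
#include <cstdio>

int main()
{
    // "dxil.dll" must be loadable, or compiled shaders won't get signed.
    LoadLibrary(L"dxil.dll");
    HMODULE dxcompiler_dll = LoadLibrary(L"dxcompiler.dll");

    // The only function we need to fetch manually: DxcCreateInstance.
    auto DxcCreateInstanceFn = (DxcCreateInstanceProc)GetProcAddress(
        dxcompiler_dll, "DxcCreateInstance");

    CComPtr<IDxcUtils> utils;
    DxcCreateInstanceFn(CLSID_DxcUtils, IID_PPV_ARGS(&utils));
    CComPtr<IDxcCompiler3> compiler;
    DxcCreateInstanceFn(CLSID_DxcCompiler, IID_PPV_ARGS(&compiler));

    // Load the HLSL source code.
    CComPtr<IDxcBlobEncoding> source;
    utils->LoadFile(L"Shader2.hlsl", nullptr, &source);
    DxcBuffer source_buf = {
        source->GetBufferPointer(), source->GetBufferSize(), DXC_CP_ACP };

    // Arguments formatted like for the command-line DXC.
    const wchar_t* arguments[] = { L"-T", L"ps_6_0", L"-E", L"PsMain" };

    CComPtr<IDxcResult> result;
    compiler->Compile(&source_buf, arguments, _countof(arguments),
        nullptr, IID_PPV_ARGS(&result));

    // Print errors and warnings, if any.
    CComPtr<IDxcBlobUtf8> errors;
    result->GetOutput(DXC_OUT_ERRORS, IID_PPV_ARGS(&errors), nullptr);
    if (errors && errors->GetStringLength() > 0)
        printf("%s\n", errors->GetStringPointer());

    // Fetch the compiled shader blob and check for a root signature.
    CComPtr<IDxcBlob> object;
    result->GetOutput(DXC_OUT_OBJECT, IID_PPV_ARGS(&object), nullptr);
    if (result->HasOutput(DXC_OUT_ROOT_SIGNATURE))
        printf("The shader has a root signature embedded.\n");
    return 0;
}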
2. The DXC library offers an interesting feature called reflection. It allows inspecting an existing compiled shader binary for various parameters, including inputs, outputs, and resource bindings. Inputs and outputs are vertex attributes or (in case of a pixel shader) the render targets written. The list of resource bindings is the most interesting for us here, because it allows generating a root signature compatible with the shader.
Of course, there isn't just one possible root signature compatible with a given shader, so a generated one may not align with your requirements. For example, a constant buffer b4 can be bound to a shader in one of 3 ways: as a 32-bit root constant, as a root CBV, or as a descriptor table containing a CBV. Similarly, multiple subsequent slots like (b2, b3, b4) can be defined in a root signature as separate root parameters or as a single parameter with a descriptor table carrying numDescriptors = 3. However, reflection can still be useful if you develop your own engine and want to automate resource binding based on the shader code.
To use this feature, call IDxcUtils::CreateReflection, pass the shader binary, and retrieve a new object of type ID3D12ShaderReflection. You can then query it for parameters, like ID3D12ShaderReflection::GetResourceBindingDesc. You can see an example of shader reflection used to generate the root signature in RGA source code - see file "source/radeon_gpu_analyzer_backend/autogen/be_reflection_dx12.cpp" and other related places.
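As an illustration, a short sketch of this flow could look like below, assuming a compiled shader binary "Shader2.vs.bin", the utils object created as in the previous sketch, and error handling again omitted. Note the shader must have been compiled without -Qstrip_reflect for the reflection data to be present:

#include <d3d12shader.h> // ID3D12ShaderReflection and related types

void PrintResourceBindings(IDxcUtils* utils)
{
    // Load a compiled shader binary from a file.
    CComPtr<IDxcBlobEncoding> shader_blob;
    utils->LoadFile(L"Shader2.vs.bin", nullptr, &shader_blob);
    DxcBuffer buf = {
        shader_blob->GetBufferPointer(), shader_blob->GetBufferSize(), 0 };

    // Create the reflection object.
    CComPtr<ID3D12ShaderReflection> reflection;
    utils->CreateReflection(&buf, IID_PPV_ARGS(&reflection));

    // Enumerate resource bindings used by the shader.
    D3D12_SHADER_DESC shader_desc = {};
    reflection->GetDesc(&shader_desc);
    for (UINT i = 0; i < shader_desc.BoundResources; ++i)
    {
        D3D12_SHADER_INPUT_BIND_DESC bind_desc = {};
        reflection->GetResourceBindingDesc(i, &bind_desc);
        // Type is D3D_SIT_CBUFFER, D3D_SIT_TEXTURE, D3D_SIT_SAMPLER, etc.
        // BindPoint and Space identify the register, e.g. b4 in space0.
        printf("%s: Type=%d, BindPoint=%u, Space=%u\n", bind_desc.Name,
            (int)bind_desc.Type, bind_desc.BindPoint, bind_desc.Space);
    }
}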
3. The DXC library also provides tools to manipulate the binary container format, enabling tasks such as extracting, adding, or removing a root signature from a shader. To use it, search the library header for the interface IDxcContainerReflection or the simpler function IDxcUtils::GetDxilContainerPart, as well as the interface IDxcContainerBuilder. For example, you can check whether a shader binary contains an embedded root signature using the following code:
// shader_binary is a DxcBuffer pointing to the compiled shader container.
void* part_data = NULL;
uint32_t part_size = 0;
HRESULT hr = dxc_utils->GetDxilContainerPart(&shader_binary,
    DXC_PART_ROOT_SIGNATURE, &part_data, &part_size);
// The call succeeds only if the container has a root signature part.
bool has_root_signature = SUCCEEDED(hr);
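Similarly, stripping a root signature programmatically could look like the following sketch, assuming shader_blob is an IDxcBlob with the compiled shader and DxcCreateInstanceFn was fetched as shown earlier. This is my loose equivalent of the -Qstrip_rootsignature command shown before, written from the interface declarations rather than a tested recipe:

// Create the container builder and load the existing shader into it.
CComPtr<IDxcContainerBuilder> builder;
DxcCreateInstanceFn(CLSID_DxcContainerBuilder, IID_PPV_ARGS(&builder));
builder->Load(shader_blob);

// Remove the root signature part. This fails if no such part exists.
builder->RemovePart(DXC_PART_ROOT_SIGNATURE);

// Serialize the modified container back to a new blob.
CComPtr<IDxcOperationResult> op_result;
builder->SerializeContainer(&op_result);
CComPtr<IDxcBlob> stripped_shader;
op_result->GetResult(&stripped_shader);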
As for the policy regarding the usage of root signatures, do they need to match our shaders exactly? No, but the following rules apply:
- If a shader uses a resource binding, e.g. Texture2D<float4> normal_texture : register(t1), but the root signature doesn't mention an SRV at the slot t1, the PSO creation will fail with an error.
- The opposite is fine: a root signature may define bindings that a shader never uses.
You may ask: Can I just create one big all-encompassing root signature that defines all the resource bindings I may ever need and use it for all my shaders? Theoretically you could, but there are two main arguments against doing this.
- A root signature has a limited capacity (64 DWORDs), and the common performance advice is to keep it as small as possible, so an all-encompassing root signature wastes this budget.
- Its descriptor ranges would need to be wide, like DescriptorTable(CBV(b0, numDescriptors=10)), even for shaders that use only some of these slots.
On the other hand, switching the root signature for every shader and rebinding all the root arguments can have its overhead, too. If you look at Cyberpunk 2077, for example, you can see that they just use one big root signature for all graphics shaders and a second one for all compute shaders in the game. I am not disclosing any secret here. If you own the game on Steam or GOG, you can capture a frame using PIX on Windows and see it for yourself. If they could do it in a AAA game that looks and runs so well, do we really need to optimize better? 😀
Update 2024-05-15: In the comments below my post on Mastodon, others disclosed that the Frostbite engine by DICE, as well as the engine developed by Digital Extremes, take the same approach.
This article offers a comprehensive description of the various formats of root signatures in Direct3D 12. We've explored some C++ code along with the utilization of command-line tools such as the DXC shader compiler from Microsoft and the Radeon GPU Analyzer (RGA) from AMD. A root signature can be authored or stored as:
1. C++ data structures, starting with D3D12_ROOT_SIGNATURE_DESC.
2. A serialized binary blob, stored in memory or in a file.
3. An ID3D12RootSignature object.
4. Part of ID3D12PipelineState, which is the final object used during rendering.
5. A text representation in HLSL, typically stored in a #define macro.
6. Embedded inside a compiled shader binary.
We've learned how to use these representations and how to convert between them.
If you are a programmer coding mostly in C++ for Windows, as I do, you likely use Microsoft Visual Studio as your IDE, including its code editor and debugger. When using Visual Studio for development, we typically compile, launch, and debug the code we develop. However, Visual Studio can also be used for debugging third-party executables. Having neither debug symbols (.pdb files) nor the source code of the application results in a limited debugging experience, but it can still be useful in some cases. In this article, I will share a few tips and tricks for using the Visual Studio debugger to debug external .exe files.
I think this article is suitable for programmers of all levels of experience, including beginners. We won't be looking at the x86 assembly code, I promise! 😀 All the screenshots are made with Visual Studio 2022 version 17.9.6 on Windows 10.
You can attach the debugger to an existing process using the Debug > Attach to Process... command, but I think often we want to launch the process with the debugger attached from the very beginning. You can launch a selected .exe file under the Visual Studio debugger without having its source code available and building it. To do this:
- Select File > Open > Project/Solution...
- In the file window, pick the .exe file you want to debug.
- Confirm with the Open button.
This creates a special, dummy solution with 1 project inside that is not a C++ or C# project building any source code, but just a command to launch and debug the selected executable file. You can actually save this solution as an .sln file and go back to it later.
Once this is done, you control a debug session with the following commands:
- Debug > Start Debugging (F5)
- Debug > Break All (Ctrl+Alt+Break)
- Debug > Stop Debugging (Shift+F5)
- Debug > Restart (Ctrl+Shift+F5)
These commands function just like they normally do in the Visual Studio debugger.
In the Solution Explorer panel, when you right-click on the project and select Properties, you can change the settings used for launching the process under the debugger, such as:
- Command - the path to the .exe file to launch.
- Command Arguments - the command-line parameters passed to it.
- Working Directory
- Environment - extra environment variables for the process.
This works regardless of whether you use a normal project that compiles source code or a project that just launches an .exe file. The screenshot illustrates the settings for an .exe project. Similarly, in case of a C++ project, you can find those settings in the project properties window in the Configuration Properties > Debugging tab.
Editing command-line parameters as a single-line string can be inconvenient when using a complex CLI syntax with many parameters and testing their various combinations. Fortunately, there is a great Visual Studio extension that helps with this: Smart Command Line Arguments. After installing it, you can open a new panel with the command: View > Other Windows > Command Line Arguments. Within the panel, you can conveniently edit parameters to pass to the current startup project in the form of a list, with a possibility to enable/disable individual items. There is also a button to "Copy whole command line to clipboard" in case you need it.
Unfortunately, this extension seems to work only with solutions that build source code, not the ones that just launch an .exe file.
Among the project settings shown above, there is also the Command parameter, which points to the .exe file to be launched. The possibility to change it means that we can work on a project that builds some executable, but launch a completely different executable for a debugging session.
This capability also extends to static (.lib) and dynamic (.dll) libraries. While it's not possible to "launch" a library directly, you can still set a library project as the startup project. By changing the Command parameter to point to some .exe file, you can enjoy a full debugging experience of your library, including breakpoints, watches, etc., as long as that executable links or loads this library.
What if the software we want to debug consists of multiple processes? There might be, for example, one "launcher" or "watchdog" process which then starts another process that does the actual work. Visual Studio debugs only the single process we start.
However, there is a nice extension that solves this problem: Microsoft Child Process Debugging Power Tool. After installing it, it needs to be enabled. To do this:
- Open Debug > Other Debug Targets > Child Process Debugging Settings...
- Check Enable child process debugging.
- Click Save.
When this is done, Visual Studio will debug the process we start and all the processes created by it.
When we debug our source code, we typically set up breakpoints, look at the call stack, inspect local variables, or use the Watch panel to enter custom expressions to evaluate. We usually know what processes and threads we have and what .dll libraries we load. When working with third-party executables without having the source code, we need a more general picture of the software we debug. For this, the following panels may be useful:
- Debug > Windows > Processes - lists the processes being debugged.
- Debug > Windows > Threads - lists the threads running in the current process.
- Debug > Windows > Modules - lists the modules (.exe and .dll files) loaded by the process, together with the status of their debug symbols.
Finally, the panel available under Debug > Windows > Output offers a text log that can be invaluable in some cases. It shows all the key events, like process start, process exit (with its exit code), thread exit, and .dll libraries loading, plus custom debug messages that the app prints using the OutputDebugString function. To find out more about this function, see my old article "Ways to Print and Capture Text Output of a Process", section "OutputDebugString".
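For example, emitting such a message from C++ code is a one-liner (the message text here is, of course, just an illustration):

#include <windows.h>

void ReportProgress()
{
    // The message shows up in the debugger's Output panel
    // (or in standalone tools like DebugView).
    OutputDebugString(TEXT("MyApp: level loading finished.\n"));
}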
When debugging some application, in order to see function names on the call stack and other useful information, we need "debug symbols". They have the form of a .pdb file that accompanies the .exe file. When developing our own app and compiling it from the source code, this happens automatically. To be more specific:
- The compiler generates debug information when the /Zi option is used, which Visual Studio projects enable by default.
- The linker then stores it in a .pdb file next to the output binary when the /DEBUG option is used.
When debugging a third-party executable, we likely don't have debug symbols for it, as .pdb files typically are not distributed with binaries packaged for end users. However, all hope for obtaining useful information on the call stack is not lost. Some threads may execute code from Windows system libraries, for which Microsoft provides debug symbols. Furthermore, Visual Studio can automatically download these symbols. All you need to do is enable their symbol server:
- Go to Tools > Options > Debugging > Symbols.
- Check Microsoft Symbol Servers on the list of symbol file locations.
There is one caveat: downloading these symbols can take a long time. When you enable symbol servers and Visual Studio attempts to download symbols, you'll encounter a window like the one shown below. Unfortunately, this process is fully synchronous: it blocks Visual Studio, as the window is modal, and it also blocks the application being debugged. I wish Microsoft would improve this by loading the symbols in the background, without interrupting the workflow.
Fortunately, the lengthy download typically happens only on the first launch, or when a new .dll file is loaded that wasn't used previously. For subsequent launches, the symbols (.pdb files) for the loaded .dll files are already cached on the local drive, resulting in faster debugging sessions. Note, however, that after a Windows Update, symbols may need to be downloaded again, as system libraries could have changed.
After loading debug symbols for system libraries, it becomes:
- Possible to see real function names of system modules on the call stack.
- Much easier to guess what a thread is currently doing or waiting for, even without symbols for the application itself.
If you have obtained a .pdb file for an executable or a dynamic library that you want to debug, you can manually point Visual Studio to that file. Simply right-click on the module in the Modules panel and select "Load Symbols" from the context menu.
Finally, the last tip is about catching exceptions. When an unhandled exception occurs in a process that is not running under a debugger, the process simply terminates. However, when running under a debugger, the debugger catches the exception and provides an appropriate message showing what went wrong, such as a memory access violation.
We can configure Visual Studio to pause the application and break into the debugger when additional types of exceptions occur, even if they are caught and handled by the application, so they don't go unnoticed. To enable this feature:
- Open the Debug > Windows > Exception Settings panel.
- Check the categories or individual exception types you want to break on, e.g. Win32 Exceptions or C++ Exceptions.
When an exception like this occurs, you can simply hit Debug > Continue to resume the application execution.
To learn more about catching various types of exceptions (C++ exceptions, SEH exceptions), see my old article: "Why I Catch Exceptions in Main Function in C++".
When learning about using a debugger, we typically focus on setting up breakpoints, observing the call stack, and inspecting values of variables and data structures. In this article, I explained some of the higher-level features of the Visual Studio debugger that deal with entire processes, their launch parameters, modules (.dll libraries), and debug symbols.
Technological advancements don't come out of nowhere. They are a sum of many small steps. Video games added interactivity to films displayed on a screen, hence the name "video games". Film, in turn, is a successor of theater, which dates back to ancient times. No wonder that when we make modern games, e.g. using Unreal Engine, we use concepts from theater and film, like an "actor", "scene", "camera", or "light".
What inspired me to write this blog post was my visit to the Linen Industry Museum in Żyrardów, a city located not far from Warsaw, Poland. Formerly an industrial hub, Żyrardów now houses a museum dedicated to showcasing the rich history of the linen industry through a collection of preserved machinery and artifacts.
Probably the most interesting for us programmers is the Jacquard machine, which used punched cards to program the pattern to be created on a textile. Punched cards like these later became the medium for storing programs for the first computers. For that machine, it wasn't yet programming in terms of Turing-completeness, but it surely was a digital code that controlled the machine.
It should be no surprise that in modern computer science, we use the term "thread", which comes from the textile industry. Nvidia also uses the term "warp", which is another word from that industry. We can think of a modern GPU as a machine like this one below. There are a lot of threads running in parallel. Each thread produces one pixel of a certain color. Our role as graphics programmers is to make this machine run fast, with no jams, and make sure the correct pattern is produced on the fabric on the computer screen.
So many threads! 😀
(All photos were taken by me in the aforementioned Linen Industry Museum in Żyrardów. If you happen to drive around Warsaw in Poland, make sure to visit it!)
Believe it or not, today marks the 20th anniversary of my blog. Exactly on February 13th, 2004, I published my first entry, which is still online: "Nareszcie w Sieci" (eng. "Finally on the Web"). It was Friday the 13th, but I said there that I don't believe in bad luck, and apparently I was right. Today, I would like to take this opportunity to look back and write a few words about this website.
This wasn't my first or last venture on the Internet. Even before I launched this page, together with my friends from the neighborhood in my home town Częstochowa, still as teenagers, we formed a group that we called "ProgrameX". We had a website where we published various applications, games, articles, and art. We even created and shared custom Windows cursors and screensavers. I mentioned it in my past article: "Internet in Poland - My History". By the way, we all ended up earning M.Sc. degrees in computer science and now work in the IT field. Greetings to Grzesiek, Damian, and Tomek! I was also actively involved in the Polish Internet game developers community known as "Warsztat" (eng. "Workshop"), and over the years I became a moderator and then an administrator of it. That website doesn't exist anymore. Its last address was warsztat.gd.
At first, I was blogging in Polish, as I didn't feel confident writing in English. Only in June 2009 did I officially switch to English. Over these 20 years, I have gathered more than a thousand entries. This one is the 1168th. I know I could be ashamed of the old ones and remove them (their links and images probably don't work anymore), but I still keep them, because I think some of them may provide useful knowledge. I like educating people, and I know there are always more beginners than advanced programmers in each area, so what now seems obvious to me may be an interesting and new finding for someone else. I've only included a disclaimer for older entries, acknowledging that they may not reflect my current knowledge and beliefs.
While I may not consider myself particularly creative or spontaneous, persistence is a skill I do possess. I can work towards my goals step after step, and I don't get bored too quickly. Maybe this is why my favorite sports are working out at the gym and riding my bike, my favorite music genres are electronic ones like psytrance, trance, and techno, my approach to money is passive investing, and after 20 years I'm still writing this blog.
When it comes to the technical side of this page, I don't use any CMS like WordPress. While I'm not a proficient web developer, I've written all the scripts myself using PHP and MySQL. The page is still running pretty much the same scripts, with small modifications. In August 2017, I familiarized myself with "responsive design" and changed the page and its CSS stylesheet to improve readability on mobile devices. Until October 2019, the page was not even using the UTF-8 character set! Instead, it relied on the one-byte ISO-8859-2 code page, historically used for diacritic characters in Central and Eastern European languages. Just last week, I upgraded the scripts from PHP 5 to 8 and configured MySQL to accept any Unicode characters, like emoticons 😀
My website has always been and remains completely non-commercial. I never intended to make money from it. Quite the opposite - I need to cover annual expenses for hosting and for the "asawicki.info" domain. Occasionally, I receive offers for commercial collaboration, such as displaying advertisements on the website or publishing third-party content. I don't know which ones are legit and which ones are pure spam or scam, and I don't even care - I ignore or reject them all. On only a few occasions, I accepted offers for complimentary e-books or development tool licenses in exchange for publishing their reviews.
Because of this, I also don't check visit statistics too often. Statistics for January 2024 show 24076 unique guests, 21728 visits, 91680 pages, and 304438 requests in that month, which means 777 guests, 701 visits, 2957 pages, and 9821 requests per day. Is that a significant amount? I'm not sure. I'm not a web developer or an SEO expert. I understand that the topics I cover on my blog are niche. I don't publish content solely to increase popularity. I don't delude myself into believing that I have a dedicated fan base following my website. I'm delighted if you visit the site occasionally or subscribe to my RSS feed, but I know that most visitors come here from a search engine like Google, looking for specific keywords. Unsurprisingly, the most visited pages include "How to quickly convert MKV to MP4 file using VLC?" (although the method described may not be the most optimal) and "Car Physics for Games" by Marco Monster (an article that is not mine, which I mirrored because the original URL became inactive) - not the most important ones from my perspective. However, if someone searches for a specialized topic such as "sparse textures in Vulkan" or "how to scalarize a shader" and finds my blog useful, then I know it all makes sense. I have worked on entertainment (games, video) my entire life - I'm not going to find a cure for cancer - but educating developers and providing answers to their questions (whether needed for work or just for a hobby project) is one good thing I can do for others.
Speaking about the topics covered on this blog, I told myself from the very beginning that I would write exclusively about my professional interests listed at the top of the page: "programming, graphics, games, media, C++, Windows, Internet and more...". I never intended it to serve as a personal diary. Later, as my career advanced to the roles of "senior" and more recently "principal", I also promised myself to always stay close to the code on this blog and never switch to posting exclusively about some high-level, "visionary" ideas, like some veteran programmers do.
Over the past 20 years, many changes have occurred. When I started this blog, it was at the dawn of the "Web 2.0" era. Numerous websites still adhered to the old "Web 1.0" style, characterized by textured backgrounds and animated GIFs. There were many "portals" gathering people and knowledge around a specific topic. Personal blogs were a new thing. They allowed us to have our own place in the virtual space, to become content creators rather than just consumers. Hosting custom websites or even entire servers was the norm. Only later did we witness the rise of social media platforms and the centralization of data and power in the hands of a few major corporations, such as Facebook, Google, and X (Twitter).
The content published on the Internet has also changed its form. These days, I should probably post my articles on LinkedIn, have a YouTube channel, or record podcasts, rather than just publishing articles here. I'm seriously considering posting videos on YouTube, especially as I have some experience in public speaking, as well as in recording and editing videos. Somehow, I haven't started yet. As one ages, time mysteriously passes faster, so I find it challenging to consistently post something at least once a month, despite having a lengthy list of new topics to explore.
What will the future bring? I don't know, and I don't like to play a futurologist. I prefer to focus on the "here and now". The recent surge in popularity of AI chatbots, such as ChatGPT, raises questions about the future transformation of the Internet. On one hand, they offer convenience by providing quick answers to specific queries. As my friend recently said, "the best thing about these AI services is that you can get an answer for how long to cook the porridge without opening an entire long article that describes where the porridge comes from and who was the first to cook it". On the other hand, scraping knowledge from websites like mine and serving it rephrased, without crediting the author, raises copyright concerns. Regardless of what lies ahead, I remain optimistic, as I believe the fields of graphics and games have a bright future.