I recently stumbled upon a problem while trying to install the Epic Games Launcher on a fresh installation of Windows 11. The installation wizard was showing the message: “Epic Games Launcher Wizard ended prematurely because of an error.” and the launcher wasn’t installing. Trying to install it from the Microsoft Store was also failing, showing the error code 0x8A150049.
Solution: Create a different user account - one without a space in the username. Change its type to Administrator. Sign out, sign in to that account, and use it to install the launcher. After that, you can return to your main account and delete the temporary one. The Epic Games Launcher will remain installed and ready to use.
Full story: I got a new PC with a fresh installation of Windows 11. I started installing all the necessary software and my programming environment. (For the list of Windows apps I recommend, see my older post: My Favorite Windows Apps.) When I tried to install the Epic Games Launcher, I was surprised that after completing the setup wizard, the app didn’t appear in my system. Only on the second attempt did I carefully read the message that appeared on the final page:
Epic Games Launcher Wizard ended prematurely because of an error.
I searched the Internet and tried many solutions, but none of them helped:
- Running msiexec /i PATH\EpicInstaller-18.12.1.msi as Administrator.

Finally, somewhere on the Internet I found information that the installer leaves a text log file in "c:\Users\MY_LOGIN\AppData\Local\Epic Games\Epic Online Services\EOSInstaller\Logs\EOSInstaller-DATE-TIME.log". I opened it and found the following messages inside:
[2025.11.04-10.00.56:329][---]Log file opened.
[2025.11.04-10.00.56:329][---]FApplication: Version 1.2.26 ran with extract=C:\Users\Adam Sawicki\AppData\Local\Temp\7a4515cf-dde6-44f9-afb4-b5b1e0dee697
[2025.11.04-10.00.56:348][---]FApplication: Extract mode
[2025.11.04-10.00.56:349][---]FApplication: Extracting bundled MSI
[2025.11.04-10.00.56:349][---]FApplication: Could not create temp directory "C:\\Users\\Adam" system:183
[2025.11.04-10.00.56:349][---]FApplication: Failed to build MSI
[2025.11.04-10.00.56:349][---]Log file closed.
The line "Could not create temp directory "C:\Users\Adam"" gave me a clue that the installer likely fails because of the space in my Windows username, which is “Adam Sawicki”. That’s how I came up with the solution of using a Windows account without a space in the username.
Obviously, this is a bug in Epic's code. Their installer shouldn't depend on whether a username contains spaces or other special characters. They probably just forgot to properly enclose the path in quotation marks (" ") somewhere in their code. Epic, please fix it!
This article will be short and straight to the point. While working with geometry in 2D, I was recently looking for an algorithm to calculate the bounding box of a specific shape that I initially called a "cone". Actually, as I'm talking about 2D, I should rather say I needed the bounding rectangle of a circular sector - a part of a circle with a limited angle around an axis pointing in a specific direction.
When developing a 2D game, this shape can represent, for example, the area of effect of an attack, such as punching nearby enemies, firing a shotgun, spraying some substance ahead, or casting a magical spell. Calculating its bounding rectangle can be useful for querying a space-partitioning data structure (like a grid, a quadtree, etc.) for potentially affected objects.
I prototyped my solution in ShaderToy, which you can see here: shadertoy.com/view/w3jcRw.
A circular sector is described by:
- vec2 apex - the starting point and the center of the circle that this shape is part of
- vec2 direction - a vector pointing in the direction of the axis (must be normalized)
- float halfAngle - the angle between the axis and the edges, or half of the angle between the opposing edges (in radians, in range 0...π)
- float radius - the radius of the circle that this shape is part of

The output bounding rectangle is described by just vec2 MinPos, MaxPos - two points defining the minimum and maximum coordinates it contains.
To calculate the bounding rectangle of our cone, we need to consider all possible points that extend the furthest along the X and Y axes, and take their min/max. The first such point is the apex. The next two are what I call "edge points."

However, there are cases where this is not enough. We also need to check four "extra points" located at a distance of radius from the apex along -X, +X, -Y, +Y, as long as each of these points belongs to the cone.

My final algorithm in GLSL is:
void CalcConeBoundingRect(vec2 apex, vec2 direction, float halfAngle, float radius,
    out vec2 boundingRectMinPos, out vec2 boundingRectMaxPos)
{
    float sinHalfAngle = sin(halfAngle);
    float cosHalfAngle = cos(halfAngle);
    vec2 edgeVec1 = vec2(
        direction.x * cosHalfAngle - direction.y * sinHalfAngle,
        direction.y * cosHalfAngle + direction.x * sinHalfAngle);
    vec2 edgeVec2 = vec2(
        direction.x * cosHalfAngle + direction.y * sinHalfAngle,
        direction.y * cosHalfAngle - direction.x * sinHalfAngle);
    vec2 edgePoint1 = apex + edgeVec1 * radius;
    vec2 edgePoint2 = apex + edgeVec2 * radius;
    boundingRectMinPos = min(min(edgePoint1, edgePoint2), apex);
    boundingRectMaxPos = max(max(edgePoint1, edgePoint2), apex);
    vec2 unitVec[4] = vec2[](
        vec2(-1.0, 0.0), vec2(1.0, 0.0),
        vec2(0.0, -1.0), vec2(0.0, 1.0));
    for(int i = 0; i < 4; ++i)
    {
        if(dot(unitVec[i], direction) >= cosHalfAngle)
        {
            vec2 extraPoint = apex + unitVec[i] * radius;
            boundingRectMinPos = min(boundingRectMinPos, extraPoint);
            boundingRectMaxPos = max(boundingRectMaxPos, extraPoint);
        }
    }
}
Note that we don't use raw angles here, apart from the initial parameter. We don't call the atan2 function, nor do we compare whether one angle is smaller than another. We simply operate on vectors - a common theme in well-designed geometric algorithms.
The algorithm can be optimized further if we store the sine and cosine of the angle in advance. Alternatively, if we have only one of them, we can compute the other using the formula below. This way, we never need to use the raw angle value at all.
float sinHalfAngle = sqrt(1.0 - cosHalfAngle * cosHalfAngle);
EDIT: Big thanks to Matthew Arcus for suggesting an improvement to the code! I applied it to the listing above.
This is a guest post from my friend Łukasz Izdebski Ph.D.
It’s been a while since my last guest post on Adam’s blog, but I’m back with something short and practical—think of it as an epilogue to this earlier post on Bézier curves in animation. The last post focused on the theory and mathematics behind Bézier curves. What it lacked was a practical perspective—an opportunity to see the implementation in action. I wanted to share with you a simple library that I have created. Its purpose is to directly represent cubic Bézier Curves as Easing Functions.
The library is designed with C++20 and newer standards in mind, taking advantage of modern language features for clarity and performance. If needed, support for earlier versions of C++ can be added to ensure broader compatibility.
The core of the library is the EasingCubicBezier<T> class. It handles the interpolation of parameters used in the keyframe method, following the same principles as standard Bézier curve evaluation. To compute an interpolated value, you call its evaluate function with a parameter t, which should lie between x0 (the X coordinate of the first control point, representing the start time of the frame) and x3 (the X coordinate of the fourth control point, representing the end time). As presented in the previous blog post, the character of EasingCubicBezier as an easing function depends solely on the X coordinates of the control points.
The tests were prepared for a single, fixed value of the Y coordinates of the Bézier curve control points (their value does not affect the interpolation performance in any way), and for a set of 256 different variants of the X coordinates of the control points.
The aim was to cover as wide a range of control point locations as possible (in particular, the two inner points).
Performance measurements were carried out using the Google Benchmark framework, ensuring reliable and consistent results. Further details and test results are available in the library repository.
The new approach using EasingCubicBezier<T> has been benchmarked against two commonly used methods in game engines and graphics applications. Both of these alternatives rely on solving cubic polynomial equations, either through algebraic solutions or numerical techniques.
In the case of numerical methods, a critical factor is the choice of the initial starting point. This selection plays a major role in determining the algorithm’s convergence speed and stability.
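To make this concrete, below is a rough, self-contained sketch (my own illustration, not code from the library, from Blender, or from the benchmark) of the kind of numerical evaluation such baselines perform: given an input time x, Newton iterations find the curve parameter s for which the X polynomial equals x, starting from a chosen initial guess, and then the Y polynomial is evaluated at s.

#include <cmath>
#include <cstdio>

// Cubic Bezier polynomial B(s) for scalar control values p0..p3.
static double Bezier(double p0, double p1, double p2, double p3, double s)
{
    const double u = 1.0 - s;
    return u*u*u*p0 + 3.0*u*u*s*p1 + 3.0*u*s*s*p2 + s*s*s*p3;
}

// Derivative dB/ds of the cubic Bezier polynomial.
static double BezierDeriv(double p0, double p1, double p2, double p3, double s)
{
    const double u = 1.0 - s;
    return 3.0*u*u*(p1 - p0) + 6.0*u*s*(p2 - p1) + 3.0*s*s*(p3 - p2);
}

// Numerical evaluation of a cubic Bezier easing function: given time x in [x0, x3],
// find s such that X(s) == x using Newton iterations started at 'initialGuess'
// (e.g. 0.5 or the input value itself), then return Y(s).
double EvalBezierEasingNumeric(
    double x0, double y0, double x1, double y1,
    double x2, double y2, double x3, double y3,
    double x, double initialGuess)
{
    double s = initialGuess;
    for (int i = 0; i < 8; ++i) // fixed iteration count for simplicity
    {
        const double f  = Bezier(x0, x1, x2, x3, s) - x;
        const double df = BezierDeriv(x0, x1, x2, x3, s);
        if (std::fabs(df) < 1e-12)
            break;
        s -= f / df;
    }
    return Bezier(y0, y1, y2, y3, s);
}

int main()
{
    // An "ease-in-out"-like curve with X normalized to [0, 1].
    for (double x = 0.0; x <= 1.0; x += 0.25)
        std::printf("x = %.2f -> %.4f\n", x,
            EvalBezierEasingNumeric(0,0, 0.42,0, 0.58,1, 1,1, x, /*initialGuess=*/x));
    return 0;
}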
The following tests compared 5 different algorithms:
- Easing Cubic Bezier - the proposed approach based on EasingCubicBezier<T>.
- Numeric Solution 1 - a numerical solver with the initial starting point fixed at 0.5 (because the X coordinates of the curve were previously normalised to the interval [0, 1]).
- Numeric Solution 2 - a numerical solver with the initial starting point set to t, where t is the input parameter for which the Bézier curve interpolation is being evaluated.
- Original Blender - the original implementation used in Blender.
- Optimised Blender - an optimised variant of the Blender implementation.

The chart and the table below present the benchmark results using a box plot, highlighting the distribution and variability of each algorithm's performance in PRECISE mode with AVX2 extensions turned On.
| Algorithm | Min | Q1 | Median | Q3 | Max | Average | Std dev |
|---|---|---|---|---|---|---|---|
| Easing Cubic Bezier | 1562.5 | 21875.0 | 32812.5 | 32812.5 | 39062.5 | 28991.7 | 7803.5 |
| Numeric Solution 1 | 7812.5 | 17187.5 | 23437.5 | 53515.6 | 150000.0 | 40856.9 | 32900.5 |
| Numeric Solution 2 | 4687.5 | 15625.0 | 21093.8 | 52343.8 | 173438.0 | 37292.5 | 31220.9 |
| Original Blender | 40625.0 | 54687.5 | 56250.0 | 56250.0 | 60937.5 | 55096.4 | 2659.2 |
| Optimised Blender | 12500.0 | 40625.0 | 42187.5 | 43750.0 | 45312.5 | 41003.4 | 5931.3 |
The chart and the table below present the benchmark results using a box plot, highlighting the distribution and variability of each algorithm's performance in FAST mode with AVX2 extensions turned On.
| Algorithm | Min | Q1 | Median | Q3 | Max | Average | Std dev |
|---|---|---|---|---|---|---|---|
| Easing Cubic Bezier | 3125.0 | 15625.0 | 21875.0 | 23437.5 | 23437.5 | 19714.4 | 4838.2 |
| Numeric Solution 1 | 7812.5 | 17187.5 | 23437.5 | 59375.0 | 331250.0 | 42059.3 | 37539.5 |
| Numeric Solution 2 | 4687.5 | 15625.0 | 20312.5 | 48437.5 | 156250.0 | 36926.3 | 31357.5 |
| Original Blender | 32812.5 | 44921.9 | 50000.0 | 50000.0 | 50000.0 | 47418.2 | 3817.4 |
| Optimised Blender | 12500.0 | 35937.5 | 42187.5 | 42187.5 | 43750.0 | 39953.6 | 5554.2 |
The table below summarizes the key conclusions drawn from the benchmark tests.
| Algorithm | Performance | Variation | Conclusions |
|---|---|---|---|
| Easing Cubic Bezier | Very stable and consistently low execution time | Minimal | Most predictable and effective in typical use cases |
| Numeric Solution 1 | Highly variable — ranging from excellent to extremely slow | Huge, with many outliers | Efficient in some cases, but unstable and prone to severe slowdowns |
| Numeric Solution 2 | Similar to Numeric Solution 1, but with more symmetrical behavior | Large, but less extreme | More balanced overall, though still susceptible to performance issues |
| Original Blender | High execution time | Very small | Stable and predictable; useful when consistency is more important than speed |
| Optimised Blender | Moderate execution time | Small | A good compromise between speed and stability |
By representing Bézier curves explicitly in just 28 bytes (float) or 56 bytes (double), the proposed method delivers both speed and stability, making it ideal for real-time animation systems. Storing the curve in this form makes runtime execution straightforward: parameter values are interpolated directly, which eliminates the overhead typically associated with solving cubic polynomial equations at runtime.
The cost of determining the interpolating function corresponding to a given Bézier curve is deferred to the construction of the EasingCubicBezier<T> object.
This is just the beginning of my journey with easing functions. I am working on another solution, whose main goal will be maximum performance in runtime, while maintaining flexibility comparable to that offered by cubic Bézier curves.
Stay tuned!
I believe that in today’s world, e-mail newsletters still make a lot of sense. Back in the early days of the Internet - before search engines like Google became truly effective - there were websites that provided manually curated catalogs of links, organized into categories and subcategories. Later, full-text search engines such as Yahoo and Google took over, making it easy to find almost anything online.
But now, with the overwhelming flood of new content published every day, aggressive Search Engine Optimization (SEO) tactics, and the rise of AI-generated noise, I find it valuable to rely on trusted people who periodically curate and share the most interesting articles and videos within a specific field.
So here’s my question for you: Which programming-related newsletters do you recommend? I’m especially interested in those covering C++, graphics rendering, game development, and similar topics.
Here is my current list:
EDIT: There are additional newsletters recommended in comments under my social media posts on X/Twitter and LinkedIn.
By the way, I still use RSS/Atom feeds to follow interesting websites and blogs. Not every site offers one, but when they do, it’s a convenient way to aggregate recent posts in a single place. For this, I use the free online service Feedly.
If you also follow news feeds this way, you can subscribe to the Atom feed of my blog.
I also use the social bookmarking service Pinboard. You can browse my public links about graphics under the tags rendering and graphics. Some of these links point to individual articles, while others lead to entire websites or blogs.
If you’re programming graphics using modern APIs like DirectX 12 or Vulkan and you're working with an AMD GPU, you may already be familiar with the Radeon Developer Tool Suite. In this article, I’d like to highlight one of the tools it includes - Driver Experiments - and specifically focus on two experiments that can help you debug AMD-specific issues in your application, such as visual glitches.

Not an actual screenshot from a game, just an illustration.
Before diving into the details, let’s start with the basics. Driver Experiments is one of the tabs available in the Radeon Developer Panel, part of the Radeon Developer Tool Suite. To get started:
The Driver Experiments tool provides a range of toggles that control low-level driver behavior. These settings are normally inaccessible to anyone outside AMD and are certainly not intended for end users or gamers. However, in a development or testing environment - which is our focus here - they can be extremely valuable.
Comprehensive documentation for the tool and its individual experiments is available at GPUOpen.com: Radeon Developer Panel > Features > Driver Experiments.
When using these settings, please keep in mind the following limitations:
Among the many available experiments, some relate to enabling or disabling specific API features (such as ray tracing or mesh shaders), while others target internal driver optimizations. These toggles can help diagnose bugs in your code, uncover optimization opportunities, or even verify suspected driver issues. In the next section, I’ll describe two experiments that I find especially helpful when debugging problems that tend to affect AMD hardware more frequently than other vendors.
This is about a topic I already warned about back in 2015, right after DirectX 12 was released, in my article "Direct3D 12 - Watch out for non-uniform resource index!". To recap: when writing shaders that perform dynamic indexing of an array of descriptors (buffers, textures, samplers), the index is assumed to be scalar - that is, to have the same value across all threads in a wave. For an explanation of what that means, see my old post: "Which Values Are Scalar in a Shader?" When it is not scalar (e.g. it varies from pixel to pixel), we need to decorate it with the NonUniformResourceIndex qualifier in HLSL or the nonuniformEXT qualifier in GLSL:
Texture2D<float4> allTextures[400] : register(t3);
...
float4 color = allTextures[NonUniformResourceIndex(materialIndex)].Sample(
mySampler, texCoords);
The worst thing is that if we forget about NonUniformResourceIndex while the index is indeed non-uniform, we may get undefined behavior, which typically means indexing into the wrong descriptor and results in visual glitches. It won't be reported as an error by the D3D Debug Layer. (EDIT: But PIX can help detect it.) It typically affects only AMD GPUs, while working fine on NVIDIA. This is because in the AMD GPU assembly (ISA) (which is publicly available – see AMD GPU architecture programming documentation) descriptors are scalar, so when the index is non-uniform, the shader compiler needs to generate instructions for a "waterfall loop" that have some performance overhead.
I think that whoever designed the NonUniformResourceIndex qualifier in shader languages is guilty of hours of debugging and frustration for countless developers who stumbled upon this problem. This approach of "performance by default, correctness as opt-in" is not a good design. A better language design would be to do the opposite:
- By default, treat every dynamic descriptor index as potentially non-uniform and always generate correct code. The compiler could still detect indices that are clearly uniform (e.g. myCB.myConstIndex + 10) and then optimize them.
- For performance, offer an explicit opt-in UniformResourceIndex() qualifier, thus declaring that we know what we are doing and we agree to introduce a bug if we don't keep our promise to ensure the index is really scalar.

But the reality is what it is, and no one seems to be working on fixing this. (EDIT: Not fully true, there is some discussion.) That's where Driver Experiments can help. When you activate the "Force NonUniformResourceIndex" experiment, all shaders are compiled as if every dynamic descriptor index were annotated with NonUniformResourceIndex. This may incur a performance cost, but it can also resolve visual bugs. If enabling it fixes the issue, you've likely found a missing NonUniformResourceIndex somewhere in your shaders - you just need to identify which one.
This relates to a topic I touched on in my older post: "Texture Compression: What Can It Mean?". "Compression" in the context of textures can mean many different things. Here, I’m not referring to packing textures in a ZIP file or even using compressed pixel formats like BC7 or ASTC. I’m talking about internal compression formats that GPUs sometimes apply to textures in video memory. These formats are opaque to the developer, lossless, and specific to the GPU vendor and model. They’re not intended to reduce memory usage - in fact, they may slightly increase it due to additional metadata - but they can improve performance when the texture is used. This kind of compression is typically applied to render-target (DX12) or color-attachment (Vulkan) and depth-stencil textures. The decision of when and how to apply such compression is made by the driver and depends on factors like pixel format, MSAA usage, and even texture dimensions.
The problem with this form of compression is that, while invisible to the developer, it can introduce bugs that wouldn’t occur if the texture were stored as a plain, uncompressed pixel array. Two issues in particular come to mind:
(1) Missing or incorrect barrier. Some GPUs may not support certain compression formats for all types of texture usage. Imagine a texture that is first bound as a render target. Rendering triangles to it is optimized thanks to the specialized internal compression. Later, we want to use that texture in a screen-space post-processing pass, sampling it as an SRV (shader resource). In DX12 and Vulkan, this requires inserting a barrier between the two usages. A barrier typically ensures correct execution order - so that the next draw call starts only after the previous one finishes - and flushes or invalidates relevant caches. However, if the GPU doesn’t support the render-target compression format for SRV usage, the barrier must also trigger decompression, converting the entire texture into a different internal format. This step may be slow, but it’s necessary for rendering to work correctly. That’s exactly what D3D12_RESOURCE_STATES and VkImageLayout enums are designed to control.
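As a concrete illustration, here is a minimal D3D12 sketch (not taken from any particular engine) of such a transition, written as a small helper:

#include <d3d12.h>

// Transition a texture that was just used as a render target so it can be sampled
// as an SRV in a subsequent post-processing pass.
void TransitionRenderTargetToShaderResource(
    ID3D12GraphicsCommandList* cmdList, ID3D12Resource* texture)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource = texture;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    // Besides enforcing ordering and flushing caches, this is also the point where
    // the driver may insert a decompression step if the SRV usage doesn't support
    // the render-target compression format.
    cmdList->ResourceBarrier(1, &barrier);
}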
Now, imagine what happens if we forget to issue this barrier or issue an incorrect one. The texture remains in its compressed render-target format but is then sampled as a shader resource. As a result, we read incorrect data - leading to completely broken output, such as the kind of visual garbage shown in the image above. In contrast, if the driver hadn’t applied any compression, the missing barrier would be less critical because there’d be no format transition required.
(2) Missing or incorrect clear. I discussed this in detail in my older articles: "Initializing DX12 Textures After Allocation and Aliasing" and the follow-up "States and Barriers of Aliasing Render Targets". To recap: when a texture is placed in memory that may contain garbage data, it needs to be properly initialized before use. This situation can occur when the texture is created as placed in a larger memory block (using the CreatePlacedResource function), and that memory was previously used for something else, or when the texture aliases other resources. Proper initialization usually involves a Clear operation. However, if we don’t care about the contents, we can also use the DiscardResource function (in DX12) or transition the texture from VK_IMAGE_LAYOUT_UNDEFINED (in Vulkan).
Here comes the tricky part. What if we’re going to overwrite the entire texture by using it as a render target or UAV / storage image? Surprisingly, that is not considered proper initialization. If the texture were uncompressed, everything might work fine. But when an internal compression format is applied, visual artifacts can appear - and sometimes persist - even after a full overwrite as an RT or UAV. This issue frequently shows up on AMD GPUs while going unnoticed on NVIDIA. The root cause is that the texture’s metadata wasn't properly initialized. The DiscardResource function handles this correctly: it initializes the metadata while leaving the actual pixel values undefined.
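For illustration, here is a minimal D3D12 sketch of that discard-based initialization (my own example; the texture is assumed to be a render target that is already in the render-target state):

#include <d3d12.h>

// Initialize a freshly placed or aliased render target whose previous contents we
// don't care about. Pixel values stay undefined, but the internal compression
// metadata gets properly initialized.
void DiscardAliasedRenderTarget(
    ID3D12GraphicsCommandList* cmdList, ID3D12Resource* texture)
{
    cmdList->DiscardResource(texture, nullptr); // nullptr = the entire resource
}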
The Driver Experiments tool can also help with debugging this type of issue by providing the "Disable color texture compression" experiment (and in DX12, also "Disable depth-stencil texture compression"). When enabled, the driver skips applying internal compression formats to textures in video memory. While this may result in reduced performance, it can also eliminate rendering bugs. If enabling this experiment resolves the issue, it’s a strong indicator that the problem lies in a missing or incorrect initialization (typically a Clear operation) or a barrier involving a render-target or depth-stencil texture. The next step is to identify the affected texture and insert the appropriate command at the right place in the rendering process.
The Driver Experiments tab in the Radeon Developer Panel is a collection of toggles for the AMD graphics driver, useful for debugging and performance tuning. I've focused on two of them in this article, but there are many more, each potentially useful in different situations. Over the years, I’ve encountered various issues across many games. For example:
This will be a beginner-level article for programmers working in C, C++, or other languages that use a similar preprocessor - such as shader languages like HLSL or GLSL. The preprocessor is a powerful feature. While it can be misused in ways that make code more complex and error-prone, it can also be a valuable tool for building programs and libraries that work across multiple platforms and external environments.
In this post, I’ll focus specifically on conditional compilation using the #if and #ifdef directives. These allow you to include or exclude parts of your code at compile time, which is much more powerful than a typical runtime if() condition. For example, you can completely remove a piece of code that might not even compile in certain configurations. This is especially useful when targeting specific platforms, external libraries, or particular versions of them.
When it comes to enabling or disabling a feature in your code, there are generally two common approaches:
Solution 1: Define or don’t define a macro and use #ifdef:
// To disable the feature: leave the macro undefined.
// To enable the feature: define the macro (with or without a value).
#define M
// Later in the code...
#ifdef M
// Use the feature...
#else
// Use fallback path...
#endif
Solution 2: Define a macro with a numeric value (0 or 1), and use #if:
// To disable the feature: define the macro as 0.
#define M 0
// To enable the feature: define the macro as a non-zero value.
#define M 1
// Later in the code...
#if M
// Use the feature...
#else
// Use fallback path...
#endif
There are more possibilities to consider, so let’s summarize how different macro definitions behave with #ifdef and #if in the table below:
| Macro definition | #ifdef M | #if M |
|---|---|---|
| (Undefined) | No | No |
| #define M | Yes | ERROR |
| #define M 0 | Yes | No |
| #define M 1 | Yes | Yes |
| #define M (1) | Yes | Yes |
| #define M FOO | Yes | No |
| #define M "FOO" | Yes | ERROR |
The #ifdef M directive simply checks whether the macro M is defined, regardless of whether it has an empty value or any other value. On the other hand, #if M attempts to evaluate the value of M as an integer constant expression. This means it works correctly if M is defined as a literal number like 1, or even as an arithmetic expression like (OTHER_MACRO + 1). Interestingly, an undefined symbol used in #if evaluates to 0, but defining a macro with an empty value or a non-numeric token (like a string) will cause a compilation error - such as “error C1017: invalid integer constant expression” in Visual Studio.
It's also worth noting that #if can be used to check whether a macro is defined by writing #if defined(M). While this is more verbose than #ifdef M, it's also more flexible and robust. It allows you to combine multiple conditions using logical operators like && and ||, enabling more complex preprocessor logic. It is also the only option when you need #elif defined(OTHER_M), unless you are using C++23, which adds the missing #elifdef and #elifndef directives.
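For example, a combined condition could look like this (the macro names are purely illustrative):

// Macro names below are illustrative only.
#if defined(MY_FEATURE) && (MY_FEATURE_VERSION >= 2) && !defined(MY_FEATURE_DISABLED)
    // Use version 2 of the feature...
#elif defined(MY_OTHER_FEATURE)
    // Use the alternative feature...
#else
    // Use fallback path...
#endif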
So, which of the two approaches should you choose? We may argue for one or the other, but when developing the Vulkan Memory Allocator and D3D12 Memory Allocator libraries, I decided to treat some configuration macros as having three distinct states:

- Not defined by the user - the library picks a sensible default itself, based on its own compile-time checks.
- Defined as 0 - the feature is explicitly disabled.
- Defined as 1 (or another non-zero value) - the feature is explicitly enabled.
To support this pattern, I use the following structure:
#ifndef M
    #if (MY OWN CONDITION...)
        #define M 1
    #else
        #define M 0
    #endif
#endif

// Somewhere later...
#if M
    // Use the feature...
#else
    // Use fallback path...
#endif
Today I would like to present my new article: "The Secrets of Floating-Point Numbers". It can be helpful to any programmer, no matter what programming language they use. In this article, I discuss floating-point numbers compliant with the IEEE 754 standard, which are available in most programming languages. I describe their structure, capabilities, and limitations. I also address the common belief that these numbers are inaccurate or nondeterministic. Furthermore, I highlight many non-obvious pitfalls that await developers who use them.
The article was first published a few months ago in Polish, in issue 5/2024 (115) (November/December 2024) of the Programista magazine. Now I have the right to share it publicly for free, so I'm publishing it in two language versions:
This post is about the D3d12info open-source project that I'm involved in. The project is under continuous development, yet I noticed I haven't blogged about it since I first announced it in 2022. Here, I describe the story behind it and its current state. The post may be interesting to you if you are a programmer writing graphics code for Windows using DirectX 12.
Various GPUs (discrete graphics cards, processor integrated graphics chips) from various vendors (AMD, Intel, Nvidia, …) have various capabilities. Even when a GPU supports a specific API (OpenGL, DirectX 11, DirectX 12, Vulkan), some of the features may not be supported. These features span from the big ones that even non-programmers recognize, like ray tracing, to the most obscure, like the lengthy D3D12_FEATURE_DATA_D3D12_OPTIONS::VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation and even lengthier Vulkan VkPhysicalDeviceShaderIntegerDotProductProperties::integerDotProductAccumulatingSaturating4x8BitPackedMixedSignednessAccelerated 🙂
Before using any of these features in our apps, we need to query if the feature is supported on the current GPU. Checking it programmatically is relatively simple. Graphics APIs offer functions for that purpose, like ID3D12Device::CheckFeatureSupport and vkGetPhysicalDeviceProperties2. When the feature is not supported, the app should either fall back to some other implementation (e.g. using screen-space reflections instead of ray-traced reflections) or display an error telling that the GPU doesn’t meet our minimum hardware requirements.
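For example, a check for ray tracing support might look like this (a minimal sketch; device is an already created ID3D12Device*):

#include <windows.h>
#include <d3d12.h>

// Query an optional capability - here the ray tracing (DXR) tier - and let the
// caller decide on a fallback when it is not supported.
bool IsRayTracingSupported(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
    if (FAILED(device->CheckFeatureSupport(
            D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5))))
        return false;
    return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0;
}

// Usage: if (IsRayTracingSupported(device)) { /* ray-traced reflections */ }
//        else { /* fall back e.g. to screen-space reflections */ }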
However, when we plan to use some optional feature of the API and think about testing it on a variety of platforms and eventually shipping it to end users, we may ask:

- Is there a convenient way to check which features the local GPU supports, without writing code?
- Is there a public database of the capabilities reported by the various GPUs available on the market?
For Vulkan, the answers to these questions are: yes & yes. For querying the capabilities of the local GPU, the Vulkan SDK comes with a small command-line program called "vulkaninfo". After running it, we can see all the extensions, properties, features, and limits of the GPU in a human-readable text format. JSON and HTML output formats are also available.
For the database of GPUs, Sascha Willems maintains Vulkan Hardware Database and an accompanying GUI app Vulkan Hardware Capability Viewer that presents the capabilities of the local GPU and also allows submitting this report to the database.
For Direct3D 12, however, I wasn't aware of any such application or database. The Windows SDK comes with a GUI app that can be found in "c:\Program Files (x86)\Windows Kits\10\bin\*\x64\dxcapsviewer.exe". It presents some features of DirectDraw, Direct3D 9 and 11, and DXGI, as well as some options of Direct3D 12, but it doesn't seem to be complete in terms of all the latest options available. There is no updated version of it distributed with the DirectX 12 Agility SDK. There is also no way to use it from the command line. At least Microsoft open sourced it: DxCapsViewer @ GitHub.
This is why I decided to develop D3d12info, to become a DX12 equivalent of vulkaninfo. Written in C++, this small Windows console app prints all the capabilities of DX12 to the standard output, in text format. The project is open source, under MIT license, but you can also download precompiled binaries by picking the latest release.
JSON is also available as the output format, which makes the app suitable for automated processing as part of some larger pipeline.
I published the first "draft" version of D3d12info in 2018, but it wasn't until July 2022 that I released the first version I considered complete and marked it as 1.0.0. The app has had many releases since then. I update it as Microsoft ships new versions of the Agility SDK, to fetch newly added capabilities (including the ones from the "preview" version of the SDK).
There is also some other information fetched and printed by the app apart from the DX12 features. The pieces I consider most important are:
However, I try to limit the scope of the project to avoid feature creep, so I refuse some feature requests. For example, I decided not to include capabilities queried from DirectX Video or WDDM.
D3d12info would have stayed just a command-line tool if not for Dmytro Bulatov "Devaniti" - a Ukrainian developer working at ZibraAI, who joined the project and developed D3d12infoGUI. This app is a convenient overlay that unpacks the command-line D3d12info, launches it, and converts its output into a nice-looking HTML page, which is then saved to a temporary file and opened in a web browser. This allows browsing the capabilities of the current GPU in a convenient way. Dmytro also contributed significantly to the code of my D3d12info project.
If you scroll down the report, you can see a table with texture formats and the capabilities they support. Many of them are mandatory for every GPU supporting feature level 12_0, which are marked by a hollow check mark. However, as you can see below, my GPU supports some additional formats as “UAV Typed Load”:
The web page with the report also offers a large green button near the top that submits it to the online database. Here comes the last part of the ecosystem: D3d12infoDB. This is something I was dreaming about for years, but I couldn’t make it since I am not a proficient web developer. Now, Dmytro along with other contributors from the open source community developed a website that gathers reports about various GPUs, offering multiple ways of browsing, searching, and filtering them.
One great feature they've added recently is the Feature Table. It gathers DX12 capabilities as rows, while the columns are subsequent generations of GPUs from AMD, Nvidia, Intel, and Qualcomm. This way, we can easily see which features are supported by older GPU generations, to make a better decision about the minimum feature set required by the game we develop. For example, we can see that ray tracing (DXR 1.1) and mesh shaders have been supported by Nvidia since the Turing architecture (GeForce RTX 2000 series, released in 2018), while support from AMD is more recent, since the RDNA2 architecture (Radeon RX 6000 series, released in 2020).
As I mentioned above, I keep the D3d12info tool updated to the latest DirectX 12 Agility SDK, to fetch and print newly added capabilities. This also includes major features like DirectSR or metacommands. D3d12infoGUI app and D3d12infoDB website are also updated frequently.
I want to avoid expanding my app too much. One major feature I am considering is providing separate executables for x86 32-bit, x86 64-bit, and ARM architectures, as I've heard there are differences in the DX12 capabilities supported between them, while some graphics programmers (e.g. on the demoscene) still target 32 bits. Please let me know if it would be useful to you!
Finally, here is my call to action! You can help the project by submitting your GPU to the online database. Every submission counts. Even having a different version of the graphics driver constitutes a separate entry. Please download the latest D3d12infoGUI release, launch it, and when the local web page opens, press that large green button to submit your report.
The only exception: if you are one of those developers working for a GPU vendor and you use prototype future GPU hardware or an internal, unreleased build of the graphics driver, then please don't do it. We don't want to leak any confidential information through this website. If you accidentally submitted such a report, please contact us and we will remove it.
In January 2025, I participated in PolyJam - a Global Game Jam site in Warsaw, Poland. I shared my experiences in a blog post: Global Game Jam 2025 and First Impressions from Godot. This post focuses on a specific issue I encountered during the jam: Godot 4.3 frequently hanging on my ASUS TUF Gaming laptop. If you're in a hurry, you can SCROLL DOWN to skip straight to the solution that worked for me.
The laptop I used was an ASUS TUF Gaming FX505DY. Interestingly, it has two different AMD GPUs onboard - a detail that becomes important later:

- AMD Radeon RX 560X - the more powerful, discrete GPU
- AMD Radeon Vega 8 - the slower GPU integrated with the CPU
The game we developed wasn’t particularly complex or demanding - it was a 2D pixel art project. Yet, the Godot editor kept freezing frequently, even without running the game. The hangs occurred at random moments, often while simply navigating the editor UI. Each time, I had to force-close and restart the process. I was using Godot 4.3 Stable at the time.
I needed a quick solution. My first step was verifying that both Godot 4.3 and my AMD graphics drivers were up to date (they were). Then, I launched Godot via "Godot_v4.3-stable_win64_console.exe", which displays a console window with debug logs alongside the editor. That’s when I noticed an error message appearing every time the hang occurred:
ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
This suggested the issue might be GPU-related, specifically involving the Vulkan API. However, I wasn’t entirely sure - the same error message occasionally appeared even when the engine wasn’t hanging, so it wasn’t a definitive indicator.
To investigate further, I decided to enable the Vulkan validation layer, hoping it would reveal more detailed error messages about what the engine was doing wrong. Having Vulkan SDK installed in my system, I launched the Vulkan Configurator app that comes with it ("Bin\vkconfig.exe"), I selected Vulkan Layers Management = Layers Controlled by the Vulkan Configurator, and selected Validation.
Unfortunately, when I launched Godot again, no new error messages appeared in the console. (Looking back, I’m not even sure if that console window actually captured the process’s standard output.) For a brief moment, I thought enabling the Vulkan validation layer had fixed the hangs - but they soon returned. Maybe they were less frequent, or perhaps it was just wishful thinking.
Next, I considered forcing Godot to use the integrated GPU (Radeon Vega 8) instead of the more powerful discrete GPU (RX 560X). To test this, I adjusted Windows power settings to prioritize power saving over maximum performance. However, this didn’t work - Godot still reported using the Radeon RX 560X.
THE SOLUTION: What finally worked was forcing Godot to use the integrated GPU by launching it with a specific command-line parameter. Instead of running the editor normally, I used:
Godot_v4.3-stable_win64_console.exe --verbose --gpu-index 1
This made Godot use the second GPU (index 1) - the slower Radeon Vega 8 - instead of the default RX 560X. The result? No more hangs. While the integrated GPU is less powerful, it was more than enough for our 2D pixel art game.
I am not sure why this helped, considering that both GPUs in my laptop are from AMD and are supported by the same driver. I also didn't check whether Godot 4.4, which has been released since then, fixes this issue. I am just leaving this story here, in case someone stumbles upon the same problem in the future.
On January 30th, 2025, Microsoft released a new version of the DirectX 12 Agility SDK: 1.615.0 (D3D12SDKVersion = 615) and 1.716.0-preview (D3D12SDKVersion = 716). The main article announcing this release is: AgilitySDK 1.716.0-preview and 1.615-retail. Files are available to download from DirectX 12 Agility SDK Downloads, as always, in the form of .nupkg files (which are really ZIP archives).
I can see several interesting additions in the new SDK, so in this article I am going to describe them and delve into details of some of them. This way, I aim to consolidate information that is scattered across multiple Microsoft pages and provide links to all of them. The article is intended for advanced programmers who use DirectX 12 and are interested in the latest developments of the API and its surrounding ecosystem, including features that are currently in preview mode and will be included in future retail versions.
This is the only feature added to both the retail and preview versions of the new SDK. The article announcing it is: Agility SDK 1.716.0-preview & 1.615-retail: Shader hash bypass. A more extensive article explaining this feature is available here: Validator Hashing.
The problem:
If you use DirectX 12, you most likely know that shaders are compiled in two stages. First, the source code in HLSL (High-Level Shading Language) is compiled using the Microsoft DXC compiler into an intermediate binary code. This often happens offline when the application is built. The intermediate form is commonly referred to as DXBC (as the container format and the first 4 bytes of the file) or DXIL (as the intermediate language of the shader code, somewhat similar to SPIR-V or LLVM IR). This intermediate code is then passed to a DirectX 12 function that creates a Pipeline State Object (PSO), such as ID3D12Device::CreateGraphicsPipelineState. During this step, the second stage of compilation occurs within the graphics driver, converting the intermediate code into machine code (ISA) specific to the GPU. I described this process in more detail in my article Shapes and forms of DX12 root signatures, specifically in the "Shader Compilation" section.
What you may not know is that the intermediate compiled shader blob is digitally signed by the DXC compiler using a hash embedded within it. This hash is then validated during PSO creation, and the function fails if the hash doesn’t match. Moreover, despite the DXC compiler being open source and hosted on github.com/microsoft/DirectXShaderCompiler, the signing process is handled by a separate library, "dxil.dll", which is not open source.
If you only use the DXC compiler provided by Microsoft, you may never encounter any issues with this. I first noticed this problem when I accidentally used "dxc.exe" from the Vulkan SDK instead of the Windows SDK to compile my shaders. This happened because the Vulkan SDK appeared first in my "PATH" environment variable. My shaders compiled successfully, but since the closed-source "dxil.dll" library is not distributed with the Vulkan SDK, they were not signed. As a result, I couldn’t create PSO objects from them. As the ecosystem of graphics APIs continues to grow, this could also become a problem for libraries and tools that aim to generate DXIL code directly, bypassing the HLSL source code and DXC compiler. Some developers have even reverse-engineered the signing algorithm to overcome this obstacle, as described by Stephen Gutekanst / Hexops in this article: Building the DirectX shader compiler better than Microsoft?.
The solution:
With this new SDK release, Microsoft has made two significant changes. Most important here: instead of carrying a real signature, the hash embedded in the compiled shader can now be set to a special bypass value - 01010101010101010101010101010101 for "BYPASS", or 02020202020202020202020202020202 for "PREVIEW_BYPASS". Technologies that generate DXIL shader code can now use either of these values to produce a valid shader.
The capability to check whether this new feature is supported is exposed through D3D12_FEATURE_DATA_BYTECODE_BYPASS_HASH_SUPPORTED::Supported. However, it appears to be implemented entirely at the level of the Microsoft DirectX runtime rather than the graphics driver, as it returns TRUE on every system I tested.
One caveat is that "dxil.dll" not only signs the shader but also performs some form of validation. Microsoft didn’t want to leave developers without the ability to validate their shaders when using the bypass hash. To address this, they have now integrated the validation code into the D3D Debug Layer, allowing shaders to be validated as they are passed to the PSO creation function.
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Tight Alignment of Resources. There is also a specification: Direct3D 12 Tight Placed Resource Alignment, but it is very low-level, describing even the interface for the graphics driver.
The problem:
This one is particularly interesting to me, as I develop the D3D12 Memory Allocator and Vulkan Memory Allocator libraries, which focus on GPU memory management. In DirectX 12, buffers require alignment to 64 KB, which can be problematic and lead to significant memory waste when creating a large number of very small buffers. I previously discussed this issue in my older article: Secrets of Direct3D 12: Resource Alignment.
The solution:
This is one of many features that the Vulkan API got right, and Microsoft is now aligning DirectX 12 in the same direction. In Vulkan, developers need to query the required size and alignment of each resource using functions like vkGetBufferMemoryRequirements, and the driver can return a small alignment if supported. For more details, you can refer to my older article: Differences in memory management between Direct3D 12 and Vulkan. Microsoft is now finally allowing buffers in DirectX 12 to support smaller alignments by introducing the following new API elements:
- A new capability to query: D3D12_FEATURE_DATA_TIGHT_ALIGNMENT::SupportTier.
- A new resource flag, D3D12_RESOURCE_FLAG_USE_TIGHT_ALIGNMENT, to be added to the description of the resource you are about to create.
- When calling ID3D12Device::GetResourceAllocationInfo, the function may now return an alignment smaller than 64 KB. As Microsoft states: "Placed buffers can now be aligned as tightly as 8 B (max of 256 B). Committed buffers have also had alignment restrictions reduced to 4 KiB."

I have already implemented support for this new feature in the D3D12MA library. Since this is a preview feature, I've done so on a separate branch for now. You can find it here: D3D12MemoryAllocator branch resource-tight-alignment.
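Here is a minimal sketch of how this could look in code, based only on the names mentioned above (it's a preview feature, so details may still change); device is an existing ID3D12Device* and CD3DX12_RESOURCE_DESC comes from the d3dx12.h helpers:

#include <d3d12.h>
#include "d3dx12.h" // for CD3DX12_RESOURCE_DESC

// Ask the runtime/driver what alignment a tiny buffer gets when it opts in to
// tight alignment. On a supporting driver, the result may be far below 64 KB.
UINT64 QueryTightBufferAlignment(ID3D12Device* device)
{
    D3D12_RESOURCE_DESC desc = CD3DX12_RESOURCE_DESC::Buffer(64); // a 64 B buffer
    desc.Flags |= D3D12_RESOURCE_FLAG_USE_TIGHT_ALIGNMENT;        // opt in (preview SDK flag)
    const D3D12_RESOURCE_ALLOCATION_INFO info =
        device->GetResourceAllocationInfo(0, 1, &desc);
    return info.Alignment;
}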
This feature requires support from the graphics driver, and as of today, no drivers support it yet. The announcement article mentions that AMD plans to release a supporting driver in early February, while other GPU vendors are also interested and will support it in an "upcoming driver" or at some indefinite point in the future - similar to other preview features described below.
However, testing is possible right now using the software (CPU) implementation of DirectX 12 called WARP. Here’s how you can set it up:
Microsoft has also shared a sample application to test this feature: DirectX-Graphics-Samples - HelloTightAlignment.
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Application Specific Driver State. It is intended for capture-replay tools rather than general usage in applications.
The problem:
A graphics API like Direct3D or Vulkan serves as a standardized contract between a game, game engine, or other graphics application, and the graphics driver. In an ideal world, every application that correctly uses the API would work seamlessly with any driver that correctly implements the API. However, we know that software is far from perfect and often contains bugs, which can exist on either side of the API: in the application or in the graphics driver.
It’s no secret that graphics drivers often detect specific popular or problematic games and applications to apply tailored settings to them. These settings might include tweaks to the DirectX 12 driver or the shader compiler, for example. Such adjustments can improve performance in cases where default heuristics are not optimal for a particular application or shader, or they can provide workarounds for known bugs.
For the driver to detect a specific application, it would be helpful to pass some form of application identification. Vulkan includes this functionality in its core API through the VkApplicationInfo structure, where developers can provide the application name, engine name, application version, and engine version. DirectX 12, however, lacks this feature. The AMD GPU Services (AGS) library adds this capability with the AGSDX12ExtensionParams structure, but this is specific to AMD and not universally adopted by all applications.
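For reference, this is roughly how an application identifies itself in Vulkan (a minimal sketch; the names and version numbers are placeholders):

#include <vulkan/vulkan.h>

// Fill VkApplicationInfo so the driver can recognize the application and engine.
VkInstance CreateIdentifiedInstance()
{
    VkApplicationInfo appInfo = {};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "ThatGreatGame";
    appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.pEngineName = "MyEngine";
    appInfo.engineVersion = VK_MAKE_VERSION(2, 3, 0);
    appInfo.apiVersion = VK_API_VERSION_1_3;

    VkInstanceCreateInfo createInfo = {};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    VkInstance instance = VK_NULL_HANDLE;
    vkCreateInstance(&createInfo, nullptr, &instance); // error handling omitted
    return instance;
}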
Because of this limitation, DirectX 12 drivers must rely on detecting applications solely by their .exe file name. This can cause issues with capture-replay tools such as PIX on Windows, RenderDoc or GFXReconstruct. These tools attempt to replay the same sequence of DirectX 12 calls but use a different executable name, which means driver workarounds are not applied.
Interestingly, there is a workaround for PIX that you can try if you encounter issues opening or analyzing a capture:
mklink WinPixEngineHost.exe ThatGreatGame.exe

This way, PIX will use "WinPixEngineHost.exe" to launch the DirectX 12 workload, but the driver will see the original executable name. This ensures that the app-specific profile is applied, which may resolve the issue.
The solution:
With this new SDK release, Microsoft introduces an API to retrieve and apply an "application-specific driver state." This state will take the form of an opaque blob of binary data. With this feature and a supporting driver, capture-replay tools will hopefully be able to instruct the driver to apply the same app-specific profile and workarounds when replaying a recorded graphics workload as it would for the original application - even if the executable file name of the replay tool is different. This means that workarounds like the one described above will no longer be necessary.
The support for this feature can be queried using D3D12_FEATURE_DATA_APPLICATION_SPECIFIC_DRIVER_STATE::Supported. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.
This feature is only available in the preview SDK version. The article announcing it is: Agility SDK 1.716.0-preview: Recreate At GPUVA. It is intended for capture-replay tools rather than general usage in applications.
The problem:
Graphics APIs are gradually moving toward the use of free-form pointers, known as GPU Virtual Addresses (GPUVA). If such pointers are embedded in buffers, capture-replay tools may struggle to replay the workload accurately, as the addresses of the resources may differ in subsequent runs. Microsoft mentions that in PIX, they intercept the indirect argument buffer used for ExecuteIndirect to patch these pointers, but this approach may not always be fully reliable.
The solution:
With this new SDK release, Microsoft introduces an API to retrieve the address of a resource and to request the creation of a new resource at a specific address. To ensure that no other resources are assigned to the intended address beforehand, there will also be an option to reserve a list of GPUVA address ranges before creating a Direct3D 12 device.
The support for this feature can be queried using D3D12_FEATURE_DATA_D3D12_OPTIONS20::RecreateAtTier. Since this feature is intended for tools rather than typical graphics applications, I won’t delve into further details here.
This is yet another feature that Vulkan already provides, while Microsoft is only now adding it. In Vulkan, the ability to recreate resources at a specific address was introduced alongside the VK_KHR_buffer_device_address extension, which introduced free-form pointers. This functionality is provided through "capture replay" features, such as the VkBufferOpaqueCaptureAddressCreateInfo structure.
This feature works automatically and does not introduce any new API. It improves performance by passing some DirectX 12 function calls directly to the graphics driver, bypassing intermediate functions in Microsoft’s DirectX 12 runtime code.
If I understand it correctly, this appears to be yet another feature that Vulkan got right, and Microsoft is now catching up. For more details, see the article Architecture of the Vulkan Loader Interfaces, which describes how dynamically fetching pointers to Vulkan functions using vkGetInstanceProcAddr and vkGetDeviceProcAddr can point directly to the "Installable Client Driver (ICD)," bypassing "trampoline functions."
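For comparison, this is the Vulkan mechanism mentioned above (a minimal sketch): fetching a device-level function pointer that calls directly into the driver, bypassing the loader's trampoline.

#include <vulkan/vulkan.h>

// Fetch a device-level entry point; calls through this pointer go straight to the
// installable client driver (ICD) instead of through the loader.
PFN_vkCmdDraw GetDirectCmdDraw(VkDevice device)
{
    return reinterpret_cast<PFN_vkCmdDraw>(
        vkGetDeviceProcAddr(device, "vkCmdDraw"));
}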
There are also some additions to D3D12 Video. The article announcing them is: Agility SDK 1.716.0-preview: New D3D12 Video Encode Features. However, since I don’t have much expertise in D3D12 Video, I won’t describe them here.
Microsoft also released new versions of PIX that support all these new features from day 0! See the announcement article for PIX version 2501.30 and 2501.30-preview.
Queries for the new capabilities added in this update to the Agility SDK (both retail and preview versions) have already been integrated into the D3d12info command-line tool, the D3d12infoGUI tool, and the D3d12infoDB online database of DX12 GPU capabilities. You can contribute to this project by running the GUI tool and submitting your GPU’s capabilities to the database!