I recently got a new Windows PC, which I use for development. I work on a game based on Unreal Engine, and I build both the game and the engine from C++ source code using Visual Studio. From the very beginning, I had an annoying problem with this machine: every first launch of the game took almost five minutes. I don’t mean loading textures, other assets, or getting into gameplay. I mean the time from launching the app to seeing anything on the screen indicating that it had even started loading. I had to wait that long every time I started or restarted the system. Subsequent launches were almost instantaneous, since I use a fast M.2 SSD. Something was clearly slowing down the first launch.
Solution: open Windows Settings and disable Smart App Control. This is a security feature that Microsoft enables by default on fresh Windows installations, and it can severely slow down application launches. If you installed your system a long time ago, you may not have it enabled. Once you turn it off, it cannot be turned back on - but that’s fine for me.
Full story: I observed my game taking almost five minutes to launch for the first time after every system restart. Before I found the solution, I tried many things to debug the problem. When running the app under the Visual Studio debugger, I noticed messages like the following slowly appearing in the Output panel:
'UnrealEditor-Win64-DebugGame.exe' (Win32): Loaded (...)Engine\Binaries\Win64\UnrealEditor-(...).dll. Symbols loaded.
That’s how I realized that loading each .dll was what took so long. In total, launching the Unreal Engine editor on my system requires loading 914 unique .exe and .dll files.
At first, I blamed the loading of debug symbols from .pdb files, but I quickly ruled that out, because launching the game without the debugger attached (Ctrl+F5 in Visual Studio) was just as slow - only without any indication of what the process was doing during those five minutes before anything appeared on the screen.
Next, I started profiling this slow launch to see what was happening on the call stack. I used the Very Sleepy profiler, as well as the Concurrency Visualizer extension for Visual Studio. However, I didn’t find anything unusual beyond standard LoadLibrary calls and other generic system functions. That led me to suspect that something was happening in kernel space or in another process, while my process was simply blocked, waiting on each DLL load.
Naturally, my next assumption was that some security feature was scanning every .dll file for viruses. I opened Windows Settings → Protection & security → Virus & threat protection and added the folder containing my project’s source code and binaries to the exclusions list. That didn’t help. I then completely disabled real-time protection and the other toggles on that page. That didn’t help either. For completeness, I should add that I don’t have any third-party antivirus software installed on this machine.
I was desperate to find a solution, so I thought: what if I wrote a separate program that calls LoadLibrary on each .dll file required by the project, in parallel, using multiple threads? Would that “pre-warm” whatever scanning was happening, so that launching the game afterward would be instantaneous?
I saved the debugger log containing all the “Loaded … .dll” messages to a text file and wrote a small C++ program to process it, calling LoadLibrary on each entry. It turned out that doing this on multiple threads didn’t help at all - it still took 4–5 minutes. Apparently, there must be some internal mutex preventing any real parallelism within a single process.
Next, I modified the tool to spawn multiple separate processes, each responsible for loading every N-th .dll file. That actually helped. Processing all files this way took less than a minute, and afterward I could launch my game quickly. Still, this was clearly just a workaround, not a real solution.
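The idea can be sketched like this (a simplified illustration rather than the exact tool; the list file format and command-line arguments are hypothetical). Each worker process loads every N-th DLL from a text file listing one path per line:

#include <windows.h>
#include <fstream>
#include <string>
#include <cstdlib>

// Usage: PrewarmDlls.exe <ListFile> <ProcessIndex> <ProcessCount>
// Launch ProcessCount instances, each with a different ProcessIndex, to load the DLLs in parallel.
int main(int argc, char* argv[])
{
    if(argc != 4)
        return 1;
    const int processIndex = std::atoi(argv[2]);
    const int processCount = std::atoi(argv[3]);

    std::ifstream listFile(argv[1]);
    std::string line;
    for(int lineIndex = 0; std::getline(listFile, line); ++lineIndex)
    {
        if(lineIndex % processCount != processIndex)
            continue;
        // Loading the library is enough to trigger whatever first-launch scanning the system does.
        HMODULE mod = LoadLibraryA(line.c_str());
        if(mod != NULL)
            FreeLibrary(mod); // We only want to "pre-warm" the scan, not keep the DLL loaded.
    }
    return 0;
}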
I lived with this workaround for weeks, until I stumbled upon an article about the Smart App Control feature in Windows while reading random IT news. I immediately tried disabling it - and it solved the problem completely.
Apparently, Microsoft is trying to improve security by scanning and potentially blocking every executable and .dll library before it loads, likely involving sending it to their servers, which takes a very long time. I understand the motivation: these days, most users launch only a web browser and maybe one or two additional apps like Spotify, so every newly seen executable could indeed be malware trying to steal their banking credentials. However, for developers compiling and running large software projects with hundreds of binaries, this results in an egregious slowdown.
This article is about a quite niche topic - the functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat of the ID3D12GraphicsCommandList interface. You may be familiar with them if you are a programmer using DirectX 12. Their official documentation - ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat - provides some details, but there is much more to say about their behavior. I could not find sufficiently detailed information anywhere on the Internet, so here is my take on this topic.
The two functions discussed here allow “clearing” a buffer, or a subregion of it, by setting every element to a specific numeric value (sometimes also called “splatting” or “broadcasting”). The function ClearUnorderedAccessViewUint accepts a UINT[4] array, while the function ClearUnorderedAccessViewFloat accepts a FLOAT[4] array. They are conceptually similar to the functions ClearRenderTargetView and ClearDepthStencilView, which we commonly use for clearing textures. In the realm of CPU code, they can also be compared to the standard function memset.
These functions work with typed buffers. Buffer views can come in three flavors:
- Typed buffers, where each element has a format expressed as a DXGI_FORMAT, just like pixels of a texture. For example, using DXGI_FORMAT_R32G32B32A32_FLOAT means each element has four floats (x, y, z, w), 16 bytes in total.
- Structured buffers, where each element is a structure with a specified stride.
- Raw (byte address) buffers, read and written as 32-bit values, one uint at a time.
I plan to write a more comprehensive article about buffers in DX12 and their types. For now, I recommend the excellent article Let’s Close the Buffer Zoo by Joshua Barczak for more information. In the rest of this article, we will use only typed buffers.
The functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat have a quite inconvenient interface, requiring us to provide two UAV descriptors for the buffer we are about to clear: a GPU handle in a shader-visible descriptor heap and a CPU handle in a non-shader-visible descriptor heap. This means we need to have both kinds of descriptor heaps, we need to write the descriptor for our buffer twice, and we need three different descriptor handles - the third one being a CPU handle to the shader-visible heap, which we use to actually create the descriptor using the CreateUnorderedAccessView function. The code may look like this:
// Created with desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
// desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE.
ID3D12DescriptorHeap* shaderVisibleDescHeap = ...
// Created with desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
// desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE.
ID3D12DescriptorHeap* nonShaderVisibleDescHeap = ...
UINT handleIncrementSize = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
UINT descIndexInVisibleHeap = ... // Descriptor index in the shader-visible heap.
UINT descIndexInNonVisibleHeap = ... // Descriptor index in the non-shader-visible heap.
D3D12_GPU_DESCRIPTOR_HANDLE shaderVisibleGpuDescHandle =
    shaderVisibleDescHeap->GetGPUDescriptorHandleForHeapStart();
shaderVisibleGpuDescHandle.ptr += descIndexInVisibleHeap * handleIncrementSize;
D3D12_CPU_DESCRIPTOR_HANDLE shaderVisibleCpuDescHandle =
    shaderVisibleDescHeap->GetCPUDescriptorHandleForHeapStart();
shaderVisibleCpuDescHandle.ptr += descIndexInVisibleHeap * handleIncrementSize;
D3D12_CPU_DESCRIPTOR_HANDLE nonShaderVisibleCpuDescHandle =
    nonShaderVisibleDescHeap->GetCPUDescriptorHandleForHeapStart();
nonShaderVisibleCpuDescHandle.ptr += descIndexInNonVisibleHeap * handleIncrementSize;
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; // My buffer element format.
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 1024; // My buffer element count.
ID3D12Resource* buf = ... // My buffer resource.
device->CreateUnorderedAccessView(buf, NULL, &uavDesc, shaderVisibleCpuDescHandle);
device->CreateUnorderedAccessView(buf, NULL, &uavDesc, nonShaderVisibleCpuDescHandle);
UINT values[4] = {1, 2, 3, 4}; // Values to clear.
commandList->ClearUnorderedAccessViewUint(
    shaderVisibleGpuDescHandle,    // ViewGPUHandleInCurrentHeap
    nonShaderVisibleCpuDescHandle, // ViewCPUHandle
    buf,     // pResource
    values,  // Values
    0,       // NumRects
    NULL);   // pRects
Why did Microsoft make it so complicated? We may find the answer in the official function documentation mentioned above, which says: "This is to allow drivers that implement the clear as a fixed-function hardware operation (rather than as a dispatch) to efficiently read from the descriptor, as shader-visible heaps may be created in WRITE_COMBINE memory." I suspect this was needed mostly for older, DX11-class GPUs with more fixed-function hardware, while modern GPUs can read from and write to video memory more freely.
We must also remember to set the shader-visible descriptor heap as the current one using ID3D12GraphicsCommandList::SetDescriptorHeaps before performing the clear. Interestingly, on my RTX 4090 it works even without this step, but this is still incorrect and may not work on a different GPU. The D3D Debug Layer emits an error in this case.
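For completeness, using the heap from the listing above, this can look like:

ID3D12DescriptorHeap* descHeaps[] = { shaderVisibleDescHeap };
commandList->SetDescriptorHeaps(1, descHeaps); // Must be recorded before the clear command.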
Note this is different from texture clears performed using ClearRenderTargetView and ClearDepthStencilView, where we use Render-Target View (RTV) and Depth-Stencil View (DSV) descriptors, which can never be shader-visible, so they cannot be used in SetDescriptorHeaps. For more information, see my older article: "Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?".
The functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat require the buffer to be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state, just like when writing to it from a compute shader. If the buffer was in a different state before, we need to issue a transition barrier (D3D12_RESOURCE_BARRIER_TYPE_TRANSITION). If we use the buffer as a UAV before or after the clear, the state doesn't change, but we need to issue a UAV barrier (D3D12_RESOURCE_BARRIER_TYPE_UAV) to make the commands wait for each other. Otherwise, a race condition could occur, as the commands could run in parallel.
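For example, if the buffer was last used as a copy destination and will be accessed by a compute shader right after the clear, the barriers could look like this (a sketch; the previous state D3D12_RESOURCE_STATE_COPY_DEST is just an assumption here):

D3D12_RESOURCE_BARRIER toUavBarrier = {};
toUavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
toUavBarrier.Transition.pResource = buf;
toUavBarrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
toUavBarrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
toUavBarrier.Transition.StateAfter = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
commandList->ResourceBarrier(1, &toUavBarrier);

// ... ClearUnorderedAccessViewUint / ClearUnorderedAccessViewFloat here ...

// UAV barrier between the clear and a following dispatch that accesses the same buffer.
D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = buf;
commandList->ResourceBarrier(1, &uavBarrier);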
These restrictions make buffer clearing with these functions similar to using compute shaders, and different from ClearRenderTargetView and ClearDepthStencilView, which are used for clearing textures, or from copy operations (CopyResource, CopyBufferRegion), which do not require barriers around them. For a more in-depth investigation of this distinction, see my older article: "Secrets of Direct3D 12: Copies to the Same Buffer".
Here comes the main part that inspired me to write this article. I asked myself how the UINT[4] values passed to the ClearUnorderedAccessViewUint function are converted to the values of elements in the buffer, depending on the element format. I could not find any mention of this on the Internet, so I did some experiments. Below, I summarize my findings. Unfortunately, the behavior is inconsistent between GPU vendors! I tested on Nvidia (GeForce RTX 4090, driver 591.44), AMD (Radeon RX 9060 XT, driver 25.11.1), and Intel (Arc B580, driver 32.0.101.8250) - all on Windows 25H2 (OS Build 26200.7462) with DirectX Agility SDK 1.618.3 Retail.
Here is a summary of the behavior of the ClearUnorderedAccessViewUint function for different formats of buffer elements:
- With DXGI_FORMAT_R32G32B32A32_UINT, the values are written as-is, because the format matches exactly the values accepted by the function, so for example {1, 2, 3, 4} gets written as 0x00000001, 0x00000002, 0x00000003, 0x00000004, 0x00000001, 0x00000002, 0x00000003, 0x00000004, ...
- With a format that has fewer components, like DXGI_FORMAT_R32G32_UINT, from {1, 2, 3, 4} the buffer will be filled with 0x00000001, 0x00000002, 0x00000001, 0x00000002, ... - only the first 2 components are used.
- With a format that has narrower components, like DXGI_FORMAT_R16G16B16A16_UINT, out-of-range values behave inconsistently: a value like 0x20003 gets written as 0xFFFF on some GPUs and as 0x0003 on others.
- With UNORM formats, the values are treated just like for UINT formats. For example, values {0, 2, 255, 0xFFFFFFFFu} written to a buffer with DXGI_FORMAT_R8G8B8A8_UNORM become {0x00, 0x02, 0xFF, 0xFF}, despite logically representing {0, 2.0/255.0, 1.0, 1.0}. The normalization logic isn't applied here. Note this is different from the Float version of the function described in the next section. However, it makes sense, because otherwise the only useful values in this function would be 0 and 1. An out-of-range value like 0x20003 becomes 0x03.
- With an SINT format like DXGI_FORMAT_R16G16B16A16_SINT, the results for negative values are the least consistent. The results also depend on whether we pass only the lowest 16 bits of the value, like 0xFFF0u to write the value -16 (which can end up clamped to 0x7FFF), or the full 32-bit representation 0xFFFFFFF0.
- With DXGI_FORMAT_R32G32B32A32_FLOAT, the values are bit-casted to float, e.g. value 0x3F800000u becomes 1.0.
To summarize, the ClearUnorderedAccessViewUint function may be useful when we want to set a specific bit pattern to the elements of a buffer in a UINT format, but for other formats or out-of-range values the behavior is unreliable and we cannot be sure it won't change in the future.
Here is a similar summary of the behavior of the function ClearUnorderedAccessViewFloat, which takes 4 floats as the value, for different formats of buffer elements:
- With DXGI_FORMAT_R32G32B32A32_FLOAT, the values are written as-is, because the format matches exactly the values accepted by the function.
- If we pass {1.0f, 2.0f, 3.0f, 4.0f} but the format has only 2 components, like DXGI_FORMAT_R32G32_FLOAT, only {1.0f, 2.0f} is written repeatedly.
- With smaller floating-point formats, the values are converted. For example, {1.0f, -1.0f, 0.5f, 2.0f} in DXGI_FORMAT_R16G16B16A16_FLOAT become the half-float values 0x3C00, 0xBC00, 0x3800, 0x4000.
- With SINT formats, the behavior is inconsistent between GPUs. Values {0.0f, 123.0f, -1.0f, -10.5f} written to a buffer with DXGI_FORMAT_R32G32B32A32_SINT become {0, 123, -1, -10} on some GPUs, while on others they become {0, 0x42f60000, 0xbf800000, 0xc1200000} - the floating-point values specified on input, just directly bit-casted. Similarly, -100.0 becomes either 0xFFFFFF9C or 0xC2C80000.
- With UINT formats, the behavior is also inconsistent. Values {0.0f, 1.0f, 123.0f, 1000.f} written to a buffer with DXGI_FORMAT_R8G8B8A8_UINT become {0, 1, 123, 255} on some GPUs, but {0x00, 0xFF, 0xFF, 0xFF} on others.
- When the format is UNORM or SNORM, values are converted and clamped, respecting their logical representation with normalization. For example, with DXGI_FORMAT_R8G8B8A8_UNORM, values 0.0...1.0 are mapped to 0...255 and values above 1.0 become 255. Note this is different from the Uint version of the function described above. The Float version is better suited to such formats.
To summarize, the ClearUnorderedAccessViewFloat function is useful when we want to set a specific numeric value, correctly converted to the specific format, especially when it is FLOAT, UNORM, or SNORM. For consistent behavior across GPUs, we should avoid using it with UINT and SINT formats.
If we want to limit the range of elements to clear, we have 2 equivalent ways of doing so:
1. Set the limit when filling the descriptor:
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Format = ...
uavDesc.Buffer.FirstElement = firstElementIndex; // !!!
uavDesc.Buffer.NumElements = elementCount; // !!!
device->CreateUnorderedAccessView(...
2. Set the limit as a "rectangle" to clear:
D3D12_RECT rect = {
    firstElementIndex,                // left
    0,                                // top
    firstElementIndex + elementCount, // right
    1};                               // bottom
commandList->ClearUnorderedAccessViewUint(
    shaderVisibleGpuDescHandle,    // ViewGPUHandleInCurrentHeap
    nonShaderVisibleCpuDescHandle, // ViewCPUHandle
    buf,     // pResource
    values,  // Values
    1,       // NumRects !!!
    &rect);  // pRects !!!
Note that in both cases, the boundaries are expressed in entire elements, not bytes.
The behavior I presented above is based on my experiments, as it is not described precisely in the official documentation of ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat functions in DX12. The state of DX12 documentation in general is somewhat messy, as I described in my recent post "All Sources of DirectX 12 Documentation". Normally, when something is not defined in DX12 documentation, we might resort to the older DX11 documentation. In this case, however, that would be misleading, because DX12 behaves differently from DX11:
- The DX11 documentation describes the clear values as being applied per component of the view format, so with a single-component format like DXGI_FORMAT_R32_UINT, {1, 2, 3, 4} is written as 0x00000001, 0x00000001, ...
- The DX11 specification mentions element types that do not exist in the DXGI_FORMAT enum (shared by DX11 and DX12). There are no 3-byte formats at all, and 8-bit floating-point numbers were not supported by GPUs until recently (as I described in my article "FP8 data type - all values in a table").
- Writing out-of-range values like 100.0, -100.0 to a buffer in a normalized format like DXGI_FORMAT_R16G16_SNORM issues no error and clamps the value to the minimum/maximum representing +1.0 / -1.0, becoming 0x7FFF, 0x8001 in our case.
Every API needs documentation. Even more so in the case of a graphics API, where there is no single implementation (like in the case of a specific library), but countless users of the API (video games and other graphics apps) and several implementers on the other side of the API (graphics drivers for GPUs from various vendors like AMD, Intel, and Nvidia).
Vulkan documentation, for example, is very extensive, detailed, and precise. Sure, it is not perfect, but it's getting better over time. It's also very formal and difficult to read, but that's how a reference specification should be. For learning the basics, third-party tutorials are better. Documentation is needed for more advanced, day-to-day work with the API. I like to think of the documentation as law. A software bug is like a crime. When the application crashes, you as a programmer are a detective investigating "who killed it". You check the specification to see if the app "broke the law" by using the API incorrectly - meaning your app is guilty of the bug - or whether the usage is correct and the culprit is on the other side: a bug in the graphics driver. There are, of course, some gray areas and unclear situations as well.
Direct3D 12, unfortunately, doesn't have just one main documentation. In this post, I would like to gather and describe links to all official documents that describe the API... and also complain a bit about the state of all this.
1. Direct3D 12 programming guide @ learn.microsoft.com
This looks like the main page of the D3D12 documentation. Indeed, we can find many general chapters there describing various concepts of the API, as well as the API reference for individual interfaces and methods. For example:
- The many revisions of the device interface: ID3D12Device, ID3D12Device1, ..., up to 10.
- The documentation of the D3D12_FEATURE enumeration mentions D3D12_FEATURE_D3D12_OPTIONS20, but it links only to documentation up to the D3D12_FEATURE_DATA_D3D12_OPTIONS11 structure. By manually changing the URL, I found the documentation of 12 and 13 as well, although they look quite empty or “TBD”. At the same time, the latest Agility SDK already defines option structures up to version 21.
But there are also hidden gems - sections that, in my opinion, deserve separate pages, yet are buried inside the documentation of specific API elements.
2. Direct3D 11.3 Functional Specification
The documentation linked in point 1 is not fully complete. Direct3D 12, although revolutionary and not backward-compatible, still builds on top of Direct3D 11 in some ways. For that older API, there is this one long and comprehensive document. Sometimes you may need to resort to that specification to find answers to more advanced questions. For example, I remember searching it to find out the alignment requirements for elements of a vertex or index buffer. Be aware, though, that only those parts of this document apply to D3D12 which the D3D12 documentation doesn't redefine and which are applicable to D3D12 at all.
3. github.com/microsoft/DirectX-Specs
On the other hand, recent updates to DirectX 12 are also not included in the documentation mentioned in point 1, as Microsoft now puts their new documents in a GitHub repository. You can find .md files there describing new features added in newer versions of the DirectX 12 Agility SDK - from small ones like ID3D12InfoQueue1, to very large ones like DirectX Raytracing (DXR) or Work Graphs. This repository also provides pages describing what's new in each shader model, starting from 6.0, 6.1, 6.2, etc., up to 6.8 (at the moment I’m writing this post). A convenient way to read these docs is through the website microsoft.github.io/DirectX-Specs/.
4. github.com/microsoft/DirectXShaderCompiler/wiki
Then there is the HLSL shader language and its compiler: DXC. Microsoft also maintains documentation for the compiler and the shader language in a separate GitHub repo, this time using the GitHub Wiki feature. There, we can find descriptions of language features like 16 Bit Scalar Types, what's new in each major HLSL language version (2015, 2016, ..., 2021 - see Language Versions), and... again a list of what has been added in recent shader models (see Shader Model).
5. github.com/microsoft/hlsl-specs
When it comes to the HLSL language itself, it’s sometimes hard to tell what code is correct and supported, because there is no fully formal specification like there is for C++, for example. There is only the High-level shader language (HLSL) section of the documentation mentioned in point 1, which briefly describes elements of the syntax. However, Microsoft recently started writing new documentation for HLSL, which can be found in yet another GitHub repository that is most convenient to read online at microsoft.github.io/hlsl-specs/.
I should also mention the DirectX Developer Blog, which is worth following for the latest news about new releases of the Agility SDK and recent additions to the API, as well as updates on related projects like PIX, DirectStorage, and DirectSR (which is pretty much dead now - it was removed from the preview Agility SDK before reaching the retail version). The blog also features nice standalone articles, such as Getting Started with the Agility SDK or the HLSL 2021 Migration Guide, which could easily be part of the main documentation.
As one example I stumbled upon just last week: the description of ByteAddressBuffer at learn.microsoft.com mentions that it has methods Load, Load2, Load3, Load4 that read uint values from a buffer. But to learn that modern HLSL versions also support templated Load<MyType>, I had to go to a separate document ByteAddressBuffer Load Store Additions on the DirectXShaderCompiler Wiki - which describes only that specific addition.
What a mess! Why is the DirectX 12 documentation so scattered across so many websites in different shapes and forms? Of course, I don't know - I don't work at Microsoft. But having worked at big companies for more than 10 years, it isn’t shocking to me. I can imagine how things like this happen. First, engineering managers, project/program managers, and other decision-makers tend to focus on adding new features (everyone wants to “build their pyramid”) while also moving quickly and cutting costs. Creating good documentation is not a top priority. Then, there is Conway’s Law, which states that “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” So if there are separate teams developing DXC, the Agility SDK, etc., they will likely want their own outlets for publishing documentation, while no one takes responsibility for the overall user experience. Still, seeing new initiatives like the HLSL specification, I’m hopeful that things will get better over time.
Finally, the DirectX Landing Page is also worth mentioning, as it gathers links to many SDKs, tools, helpers, samples, and other projects related to DirectX.
I recently stumbled upon a problem while trying to install the Epic Games Launcher on a fresh installation of Windows 11. The installation wizard was showing the message: “Epic Games Launcher Wizard ended prematurely because of an error.” and the launcher wasn’t installing. Trying to install it from the Microsoft Store was also failing, showing the error code 0x8A150049.
Solution: Create a different user account - one without a space in the username. Change its type to Administrator. Sign out, sign in to that account, and use it to install the launcher. After that, you can return to your main account and delete the temporary one. The Epic Games Launcher will remain installed and ready to use.
Full story: I got a new PC with a fresh installation of Windows 11. I started installing all the necessary software and my programming environment. (For the list of Windows apps I recommend, see my older post: My Favorite Windows Apps.) When I tried to install the Epic Games Launcher, I was surprised that after completing the setup wizard, the app didn’t appear in my system. Only on the second attempt did I carefully read the message that appeared on the final page:
Epic Games Launcher Wizard ended prematurely because of an error.
I searched the Internet and tried many solutions, including running msiexec /i PATH\EpicInstaller-18.12.1.msi as Administrator, but none of them helped.
Finally, somewhere on the Internet I found information that the installer leaves a text log file in "c:\Users\MY_LOGIN\AppData\Local\Epic Games\Epic Online Services\EOSInstaller\Logs\EOSInstaller-DATE-TIME.log". I opened it and found the following messages inside:
[2025.11.04-10.00.56:329][---]Log file opened.
[2025.11.04-10.00.56:329][---]FApplication: Version 1.2.26 ran with extract=C:\Users\Adam Sawicki\AppData\Local\Temp\7a4515cf-dde6-44f9-afb4-b5b1e0dee697
[2025.11.04-10.00.56:348][---]FApplication: Extract mode
[2025.11.04-10.00.56:349][---]FApplication: Extracting bundled MSI
[2025.11.04-10.00.56:349][---]FApplication: Could not create temp directory "C:\\Users\\Adam" system:183
[2025.11.04-10.00.56:349][---]FApplication: Failed to build MSI
[2025.11.04-10.00.56:349][---]Log file closed.
The line "Could not create temp directory "C:\Users\Adam"" gave me a clue that the installer likely fails because of the space in my Windows username, which is “Adam Sawicki”. That’s how I came up with the solution of using a Windows account without a space in the username.
Ultimately, this is clearly a bug in Epic’s code. An installer shouldn’t break depending on whether a username contains spaces or other special characters. They probably just forgot to properly escape the path with quotation marks (" ") somewhere in their code. Epic, please fix it!
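For illustration, the difference can be as simple as quoting the path when building a command line (a hypothetical sketch - the path and file name here are made up):

#include <iostream>
#include <string>

int main()
{
    const std::string msiPath = R"(C:\Users\Adam Sawicki\AppData\Local\Temp\Bundled.msi)";
    // Without quotes, the space splits the path into two separate arguments:
    const std::string badCmd  = "msiexec /i " + msiPath;
    // With quotes, the whole path is passed as a single argument:
    const std::string goodCmd = "msiexec /i \"" + msiPath + "\"";
    std::cout << badCmd << "\n" << goodCmd << "\n";
}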
This article will be short and straight to the point. While working with geometry in 2D, I was recently looking for an algorithm to calculate the bounding box of a specific shape that I initially called a "cone". Actually, as I'm talking about 2D, I should rather say I needed the bounding rectangle of a circular sector - a part of a circle with a limited angle around an axis pointing in a specific direction.
When developing a 2D game, this shape can represent, for example, the area of effect of an attack, such as punching nearby enemies, firing a shotgun, spraying some substance ahead, or casting a magical spell. Calculating its bounding rectangle can be useful for querying a space-partitioning data structure (like a grid, a quadtree, etc.) for potentially affected objects.
I prototyped my solution in ShaderToy, which you can see here: shadertoy.com/view/w3jcRw.
A circular sector is described by:
- vec2 apex - the starting point and the center of the circle that this shape is part of
- vec2 direction - a vector pointing in the direction of the axis (must be normalized)
- float halfAngle - the angle between the axis and the edges, or half of the angle between the opposing edges (in radians, in range 0...π)
- float radius - the radius of the circle that this shape is part of
The output bounding rectangle is described by just vec2 MinPos, MaxPos - two points defining the minimum and maximum coordinates it contains.
To calculate the bounding rectangle of our cone, we need to consider all possible points that extend the furthest along the X and Y axes, and take their min/max. The first such point is the apex. The next two are what I call "edge points."

However, there are cases where this is not enough. We also need to check four "extra points" located at a distance of radius from the apex along -X, +X, -Y, +Y, as long as each of these points belongs to the cone.

My final algorithm in GLSL is:
void CalcConeBoundingRect(vec2 apex, vec2 direction, float halfAngle, float radius,
    out vec2 boundingRectMinPos, out vec2 boundingRectMaxPos)
{
    float sinHalfAngle = sin(halfAngle);
    float cosHalfAngle = cos(halfAngle);
    vec2 edgeVec1 = vec2(
        direction.x * cosHalfAngle - direction.y * sinHalfAngle,
        direction.y * cosHalfAngle + direction.x * sinHalfAngle);
    vec2 edgeVec2 = vec2(
        direction.x * cosHalfAngle + direction.y * sinHalfAngle,
        direction.y * cosHalfAngle - direction.x * sinHalfAngle);
    vec2 edgePoint1 = apex + edgeVec1 * radius;
    vec2 edgePoint2 = apex + edgeVec2 * radius;
    boundingRectMinPos = min(min(edgePoint1, edgePoint2), apex);
    boundingRectMaxPos = max(max(edgePoint1, edgePoint2), apex);
    vec2 unitVec[4] = vec2[](
        vec2(-1.0, 0.0), vec2(1.0, 0.0),
        vec2(0.0, -1.0), vec2(0.0, 1.0));
    for(int i = 0; i < 4; ++i)
    {
        if(dot(unitVec[i], direction) >= cosHalfAngle)
        {
            vec2 extraPoint = apex + unitVec[i] * radius;
            boundingRectMinPos = min(boundingRectMinPos, extraPoint);
            boundingRectMaxPos = max(boundingRectMaxPos, extraPoint);
        }
    }
}
Note that we don't use raw angles here, apart from the initial parameter. We don't call the atan2 function, nor do we compare whether one angle is smaller than another. We simply operate on vectors - a common theme in well-designed geometric algorithms.
The algorithm can be optimized further if we store the sine and cosine of the angle in advance. Alternatively, if we have only one of them, we can compute the other using the formula below. This way, we never need to use the raw angle value at all.
float sinHalfAngle = sqrt(1.0 - cosHalfAngle * cosHalfAngle);
EDIT: Big thanks to Matthew Arcus for suggesting an improvement to the code! I applied it to the listing above.
This is a guest post from my friend Łukasz Izdebski Ph.D.
It’s been a while since my last guest post on Adam’s blog, but I’m back with something short and practical—think of it as an epilogue to this earlier post on Bézier curves in animation. The last post focused on the theory and mathematics behind Bézier curves. What it lacked was a practical perspective—an opportunity to see the implementation in action. I wanted to share with you a simple library that I have created. Its purpose is to directly represent cubic Bézier Curves as Easing Functions.
The library is designed with C++20 and newer standards in mind, taking advantage of modern language features for clarity and performance. If needed, support for earlier versions of C++ can be added to ensure broader compatibility.
- The main class of the library is EasingCubicBezier<T>. This class handles the interpolation of parameters used in the keyframe method. The interpolation of parameters follows the same principles as standard Bézier curve evaluation.
- To interpolate, call the evaluate function with a parameter t, which should lie between x0 (the X coordinate of the first control point, representing the start time of the frame) and x3 (the X coordinate of the fourth control point, representing the end time). As presented in the previous blog post, the character of EasingCubicBezier as an easing function depends solely on the X coordinates of the control points.
The tests were prepared for a single, fixed value of the Y coordinates of the Bézier curve control points (their value does not affect the interpolation performance in any way), and for a set of 256 different variants of the X coordinates of the control points.
The aim was to cover as wide a range of control point locations as possible (in particular, the two inner points).
Performance measurements were carried out using the Google Benchmark framework, ensuring reliable and consistent results. Further details and test results are available in the library repository.
The new approach using EasingCubicBezier<T> has been benchmarked against two commonly used methods in game engines and graphics applications. Both of these alternatives rely on solving cubic polynomial equations, either through algebraic solutions or numerical techniques.
In the case of numerical methods, a critical factor is the choice of the initial starting point. This selection plays a major role in determining the algorithm’s convergence speed and stability.
The following tests compared 5 different algorithms:
- Easing Cubic Bezier - the new approach based on the EasingCubicBezier<T> class.
- Numeric Solution 1 and Numeric Solution 2 - numerical methods that differ in the choice of the initial starting point: one starts from 0.5 (because the X coordinates of the curve were previously normalised to the interval [0, 1]), the other from t, where t is the input parameter for which the Bézier curve interpolation is being evaluated.
- Original Blender and Optimised Blender - two variants of the cubic polynomial solver used in Blender.
The chart and the table below present the benchmark results using a box plot, highlighting the distribution and variability of each algorithm’s performance in PRECISE mode with AVX2 extensions turned On.
| Algorithm | Min | Q1 | Median | Q3 | Max | Average | Std dev |
|---|---|---|---|---|---|---|---|
| Easing Cubic Bezier | 1562.5 | 21875.0 | 32812.5 | 32812.5 | 39062.5 | 28991.7 | 7803.5 |
| Numeric Solution 1 | 7812.5 | 17187.5 | 23437.5 | 53515.6 | 150000.0 | 40856.9 | 32900.5 |
| Numeric Solution 2 | 4687.5 | 15625.0 | 21093.8 | 52343.8 | 173438.0 | 37292.5 | 31220.9 |
| Original Blender | 40625.0 | 54687.5 | 56250.0 | 56250.0 | 60937.5 | 55096.4 | 2659.2 |
| Optimised Blender | 12500.0 | 40625.0 | 42187.5 | 43750.0 | 45312.5 | 41003.4 | 5931.3 |
The chart and the table below present the benchmark results using a box plot, highlighting the distribution and variability of each algorithm’s performance in FAST mode with AVX2 extensions turned On.
| Algorithm | Min | Q1 | Median | Q3 | Max | Average | Std dev |
|---|---|---|---|---|---|---|---|
| Easing Cubic Bezier | 3125.0 | 15625.0 | 21875.0 | 23437.5 | 23437.5 | 19714.4 | 4838.2 |
| Numeric Solution 1 | 7812.5 | 17187.5 | 23437.5 | 59375.0 | 331250.0 | 42059.3 | 37539.5 |
| Numeric Solution 2 | 4687.5 | 15625.0 | 20312.5 | 48437.5 | 156250.0 | 36926.3 | 31357.5 |
| Original Blender | 32812.5 | 44921.9 | 50000.0 | 50000.0 | 50000.0 | 47418.2 | 3817.4 |
| Optimised Blender | 12500.0 | 35937.5 | 42187.5 | 42187.5 | 43750.0 | 39953.6 | 5554.2 |
The table below summarizes the key conclusions drawn from the benchmark tests.
| Algorithm | Performance | Variation | Conclusions |
|---|---|---|---|
| Easing Cubic Bezier | Very stable and consistently low execution time | Minimal | Most predictable and effective in typical use cases |
| Numeric Solution 1 | Highly variable — ranging from excellent to extremely slow | Huge, with many outliers | Efficient in some cases, but unstable and prone to severe slowdowns |
| Numeric Solution 2 | Similar to Numeric Solution 1, but with more symmetrical behavior | Large, but less extreme | More balanced overall, though still susceptible to performance issues |
| Original Blender | High execution time | Very small | Stable and predictable; useful when consistency is more important than speed |
| Optimised Blender | Moderate execution time | Small | A good compromise between speed and stability |
By representing Bézier curves explicitly in just 28 bytes (float) or 56 bytes (double) using the proposed method, this approach delivers both speed and stability, making it ideal for real-time animation systems. Storing the curve in this form makes runtime evaluation straightforward: it directly interpolates parameter values without solving cubic polynomial equations, eliminating the overhead typically associated with doing so at runtime.
The cost of determining the interpolating function corresponding to a given Bézier curve is deferred to the construction of the EasingCubicBezier<T> object.
This is just the beginning of my journey with easing functions. I am working on another solution, whose main goal will be maximum performance in runtime, while maintaining flexibility comparable to that offered by cubic Bézier curves.
Stay tuned!
I believe that in today’s world, e-mail newsletters still make a lot of sense. Back in the early days of the Internet - before search engines like Google became truly effective - there were websites that provided manually curated catalogs of links, organized into categories and subcategories. Later, full-text search engines such as Yahoo and Google took over, making it easy to find almost anything online.
But now, with the overwhelming flood of new content published every day, aggressive Search Engine Optimization (SEO) tactics, and the rise of AI-generated noise, I find it valuable to rely on trusted people who periodically curate and share the most interesting articles and videos within a specific field.
So here’s my question for you: Which programming-related newsletters do you recommend? I’m especially interested in those covering C++, graphics rendering, game development, and similar topics.
Here is my current list:
EDIT: There are additional newsletters recommended in comments under my social media posts on X/Twitter and LinkedIn.
By the way, I still use RSS/Atom feeds to follow interesting websites and blogs. Not every site offers one, but when they do, it’s a convenient way to aggregate recent posts in a single place. For this, I use the free online service Feedly.
If you also follow news feeds this way, you can subscribe to the Atom feed of my blog.
I also use the social bookmarking service Pinboard. You can browse my public links about graphics under the tags rendering and graphics. Some of these links point to individual articles, while others lead to entire websites or blogs.
If you’re programming graphics using modern APIs like DirectX 12 or Vulkan and you're working with an AMD GPU, you may already be familiar with the Radeon Developer Tool Suite. In this article, I’d like to highlight one of the tools it includes - Driver Experiments - and specifically focus on two experiments that can help you debug AMD-specific issues in your application, such as visual glitches.

Not an actual screenshot from a game, just an illustration.
Before diving into the details, let’s start with the basics. Driver Experiments is one of the tabs available in the Radeon Developer Panel, part of the Radeon Developer Tool Suite.
The Driver Experiments tool provides a range of toggles that control low-level driver behavior. These settings are normally inaccessible to anyone outside AMD and are certainly not intended for end users or gamers. However, in a development or testing environment - which is our focus here - they can be extremely valuable.
Comprehensive documentation for the tool and its individual experiments is available at GPUOpen.com: Radeon Developer Panel > Features > Driver Experiments.
When using these settings, please keep in mind that they come with some limitations.
Among the many available experiments, some relate to enabling or disabling specific API features (such as ray tracing or mesh shaders), while others target internal driver optimizations. These toggles can help diagnose bugs in your code, uncover optimization opportunities, or even verify suspected driver issues. In the next section, I’ll describe two experiments that I find especially helpful when debugging problems that tend to affect AMD hardware more frequently than other vendors.
This is about a topic I already warned about back in 2015, right after DirectX 12 was released, in my article "Direct3D 12 - Watch out for non-uniform resource index!". To recap: when writing shaders that perform dynamic indexing of an array of descriptors (buffers, textures, samplers), the index is assumed to be scalar - that is, to have the same value across all threads in a wave. For an explanation of what that means, see my old post: "Which Values Are Scalar in a Shader?" When it is not scalar (e.g. it varies from pixel to pixel), we need to decorate it with the NonUniformResourceIndex qualifier in HLSL or the nonuniformEXT qualifier in GLSL:
Texture2D<float4> allTextures[400] : register(t3);
...
float4 color = allTextures[NonUniformResourceIndex(materialIndex)].Sample(
    mySampler, texCoords);
The worst thing is that if we forget about NonUniformResourceIndex while the index is indeed non-uniform, we may get undefined behavior, which typically means indexing into the wrong descriptor and results in visual glitches. It won't be reported as an error by the D3D Debug Layer. (EDIT: But PIX can help detect it.) It typically affects only AMD GPUs, while working fine on NVIDIA. This is because in the AMD GPU assembly (ISA) (which is publicly available – see AMD GPU architecture programming documentation) descriptors are scalar, so when the index is non-uniform, the shader compiler needs to generate instructions for a "waterfall loop" that have some performance overhead.
I think that whoever designed the NonUniformResourceIndex qualifier in shader languages is guilty of hours of debugging and frustration for countless developers who stumbled upon this problem. This approach of "performance by default, correctness as opt-in" is not a good design. A better language design would be to do the opposite:
- By default, the compiler should assume that a dynamic descriptor index may be non-uniform and generate correct (if slower) code, unless it can prove the index is uniform (e.g. an expression based only on constant buffer values, like myCB.myConstIndex + 10), and then optimize it.
- For the cases where we know the index is uniform, there could be an opt-in UniformResourceIndex() qualifier, thus declaring that we know what we are doing and we agree to introduce a bug if we don’t keep our promise to ensure the index is really scalar.
But the reality is what it is, and no one seems to be working on fixing this. (EDIT: Not fully true, there is some discussion.) That’s where Driver Experiments can help. When you activate the "Force NonUniformResourceIndex" experiment, all shaders are compiled as if every dynamic descriptor index were annotated with NonUniformResourceIndex. This may incur a performance cost, but it can also resolve visual bugs. If enabling it fixes the issue, you’ve likely found a missing NonUniformResourceIndex somewhere in your shaders - you just need to identify which one.
This relates to a topic I touched on in my older post: "Texture Compression: What Can It Mean?". "Compression" in the context of textures can mean many different things. Here, I’m not referring to packing textures in a ZIP file or even using compressed pixel formats like BC7 or ASTC. I’m talking about internal compression formats that GPUs sometimes apply to textures in video memory. These formats are opaque to the developer, lossless, and specific to the GPU vendor and model. They’re not intended to reduce memory usage - in fact, they may slightly increase it due to additional metadata - but they can improve performance when the texture is used. This kind of compression is typically applied to render-target (DX12) or color-attachment (Vulkan) and depth-stencil textures. The decision of when and how to apply such compression is made by the driver and depends on factors like pixel format, MSAA usage, and even texture dimensions.
The problem with this form of compression is that, while invisible to the developer, it can introduce bugs that wouldn’t occur if the texture were stored as a plain, uncompressed pixel array. Two issues in particular come to mind:
(1) Missing or incorrect barrier. Some GPUs may not support certain compression formats for all types of texture usage. Imagine a texture that is first bound as a render target. Rendering triangles to it is optimized thanks to the specialized internal compression. Later, we want to use that texture in a screen-space post-processing pass, sampling it as an SRV (shader resource). In DX12 and Vulkan, this requires inserting a barrier between the two usages. A barrier typically ensures correct execution order - so that the next draw call starts only after the previous one finishes - and flushes or invalidates relevant caches. However, if the GPU doesn’t support the render-target compression format for SRV usage, the barrier must also trigger decompression, converting the entire texture into a different internal format. This step may be slow, but it’s necessary for rendering to work correctly. That’s exactly what D3D12_RESOURCE_STATES and VkImageLayout enums are designed to control.
Now, imagine what happens if we forget to issue this barrier or issue an incorrect one. The texture remains in its compressed render-target format but is then sampled as a shader resource. As a result, we read incorrect data - leading to completely broken output, such as the kind of visual garbage shown in the image above. In contrast, if the driver hadn’t applied any compression, the missing barrier would be less critical because there’d be no format transition required.
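For reference, such a barrier between the two usages might look like this in DX12 (a sketch; the texture and command list variables are assumed, and the transition goes from render target to pixel shader resource):

D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource = texture; // The render-target texture to be sampled next.
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
// The driver may perform decompression as part of this barrier if needed.
commandList->ResourceBarrier(1, &barrier);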
(2) Missing or incorrect clear. I discussed this in detail in my older articles: "Initializing DX12 Textures After Allocation and Aliasing" and the follow-up "States and Barriers of Aliasing Render Targets". To recap: when a texture is placed in memory that may contain garbage data, it needs to be properly initialized before use. This situation can occur when the texture is created as placed in a larger memory block (using the CreatePlacedResource function), and that memory was previously used for something else, or when the texture aliases other resources. Proper initialization usually involves a Clear operation. However, if we don’t care about the contents, we can also use the DiscardResource function (in DX12) or transition the texture from VK_IMAGE_LAYOUT_UNDEFINED (in Vulkan).
Here comes the tricky part. What if we’re going to overwrite the entire texture by using it as a render target or UAV / storage image? Surprisingly, that is not considered proper initialization. If the texture were uncompressed, everything might work fine. But when an internal compression format is applied, visual artifacts can appear - and sometimes persist - even after a full overwrite as an RT or UAV. This issue frequently shows up on AMD GPUs while going unnoticed on NVIDIA. The root cause is that the texture’s metadata wasn't properly initialized. The DiscardResource function handles this correctly: it initializes the metadata while leaving the actual pixel values undefined.
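A minimal DX12 sketch of such initialization, assuming the texture aliases other resources in the same memory and we are going to fully overwrite it as a render target:

// Activate the texture in the aliased memory region.
D3D12_RESOURCE_BARRIER aliasingBarrier = {};
aliasingBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
aliasingBarrier.Aliasing.pResourceBefore = NULL; // Or the resource previously occupying this memory.
aliasingBarrier.Aliasing.pResourceAfter = texture;
commandList->ResourceBarrier(1, &aliasingBarrier);
// DiscardResource initializes the internal metadata while leaving pixel values undefined -
// enough when we are going to overwrite the whole texture anyway.
// (A render-target texture must be in the RENDER_TARGET state when discarded.)
commandList->DiscardResource(texture, NULL);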
The Driver Experiments tool can also help with debugging this type of issue by providing the "Disable color texture compression" experiment (and in DX12, also "Disable depth-stencil texture compression"). When enabled, the driver skips applying internal compression formats to textures in video memory. While this may result in reduced performance, it can also eliminate rendering bugs. If enabling this experiment resolves the issue, it’s a strong indicator that the problem lies in a missing or incorrect initialization (typically a Clear operation) or a barrier involving a render-target or depth-stencil texture. The next step is to identify the affected texture and insert the appropriate command at the right place in the rendering process.
The Driver Experiments tab in the Radeon Developer Panel is a collection of toggles for the AMD graphics driver, useful for debugging and performance tuning. I've focused on two of them in this article, but there are many more, each potentially useful in different situations. Over the years, I’ve encountered various issues across many games.
This will be a beginner-level article for programmers working in C, C++, or other languages that use a similar preprocessor - such as shader languages like HLSL or GLSL. The preprocessor is a powerful feature. While it can be misused in ways that make code more complex and error-prone, it can also be a valuable tool for building programs and libraries that work across multiple platforms and external environments.
In this post, I’ll focus specifically on conditional compilation using the #if and #ifdef directives. These allow you to include or exclude parts of your code at compile time, which is much more powerful than a typical runtime if() condition. For example, you can completely remove a piece of code that might not even compile in certain configurations. This is especially useful when targeting specific platforms, external libraries, or particular versions of them.
When it comes to enabling or disabling a feature in your code, there are generally two common approaches:
Solution 1: Define or don’t define a macro and use #ifdef:
// To disable the feature: leave the macro undefined.
// To enable the feature: define the macro (with or without a value).
#define M
// Later in the code...
#ifdef M
// Use the feature...
#else
// Use fallback path...
#endif
Solution 2: Define a macro with a numeric value (0 or 1), and use #if:
// To disable the feature: define the macro as 0.
#define M 0
// To enable the feature: define the macro as a non-zero value.
#define M 1
// Later in the code...
#if M
// Use the feature...
#else
// Use fallback path...
#endif
There are more possibilities to consider, so let’s summarize how different macro definitions behave with #ifdef and #if in the table below:
| Macro definition | #ifdef M | #if M |
|---|---|---|
| (Undefined) | No | No |
| #define M | Yes | ERROR |
| #define M 0 | Yes | No |
| #define M 1 | Yes | Yes |
| #define M (1) | Yes | Yes |
| #define M FOO | Yes | No |
| #define M "FOO" | Yes | ERROR |
The #ifdef M directive simply checks whether the macro M is defined, no matter whether it has an empty value or any other value. On the other hand, #if M attempts to evaluate the value of M as an integer constant expression. This means it works correctly if M is defined as a literal number like 1 or even as an arithmetic expression like (OTHER_MACRO + 1). Interestingly, using an undefined symbol in #if evaluates to 0, but defining a macro with an empty value or a non-numeric token (like a string) will cause a compilation error - such as “error C1017: invalid integer constant expression” in Visual Studio.
It's also worth noting that #if can be used to check whether a macro is defined by writing #if defined(M). While this is more verbose than #ifdef M, it’s also more flexible and robust. It allows you to combine multiple conditions using logical operators like && and ||, enabling more complex preprocessor logic. It is also the only option when doing #elif defined(OTHER_M), unless you are using C++23, which adds missing #elifdef and #elifndef directives.
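For example, a combined condition could look like this (the MYLIB_* macro names here are made up for illustration; _WIN32 and _M_X64 are predefined by MSVC on 64-bit Windows builds):

// Enable the fast path only on 64-bit MSVC Windows builds, unless explicitly disabled.
#if defined(_WIN32) && defined(_M_X64) && !defined(MYLIB_DISABLE_FAST_PATH)
    #define MYLIB_USE_FAST_PATH 1
#else
    #define MYLIB_USE_FAST_PATH 0
#endif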
So, which of the two approaches should you choose? We may argue for one or the other, but when developing the Vulkan Memory Allocator and D3D12 Memory Allocator libraries, I decided to treat some configuration macros as having three distinct states:
- Undefined - the library decides on the right value automatically, based on other conditions.
- Defined to 0 - the user explicitly disables the feature.
- Defined to 1 (or another non-zero value) - the user explicitly enables the feature.
To support this pattern, I use the following structure:
#ifndef M
#if (MY OWN CONDITION...)
#define M 1
#else
#define M 0
#endif
#endif
// Somewhere later...
#if M
// Use the feature...
#else
// Use fallback path...
#endif
Today I would like to present my new article: "The Secrets of Floating-Point Numbers". It can be helpful to any programmer, no matter what programming language they use. In this article, I discuss floating-point numbers compliant with the IEEE 754 standard, which are available in most programming languages. I describe their structure, capabilities, and limitations. I also address the common belief that these numbers are inaccurate or nondeterministic. Furthermore, I highlight many non-obvious pitfalls that await developers who use them.
The article was first published a few months ago in Polish in issue 5/2024 (115) (November/December 2024) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions: Polish and English.