Adam Sawicki - Homepage

Graphics programming, game programming, C++, games, Windows, Internet and more...

DirectX 12 News from GDC 2026 - My Comments 15 Mar 1:02 PM (6 days ago)

Game Developers Conference (GDC) just took place on 9–13 March 2026 (renamed this year to “GDC Festival of Gaming”). Microsoft announced lots of interesting news during the event regarding further development of DirectX 12, its accompanying libraries, and tools. I didn’t attend the conference this year, but the announcements were also published on their DirectX Developer Blog. In this article, I will gather and summarize these news items, provide links to the appropriate web pages, and include my comments.

When I was working at AMD, I was deeply involved in this area - writing code, publishing articles on the GPUOpen website, attending GDC, and even giving a talk there twice. Back then, I didn’t post much about it on my personal blog, as I would have needed to watch my words and secure corporate approvals. Now, as I’m just a programmer at Plastic - a small game development studio - I can observe these developments from the outside and comment on them freely, which I am going to do with brutal honesty 🙂 If I had a YouTube channel, I would record a “reaction” type of video, but since I’m writing a blog, this will be my reaction in text form.

Disclaimer: Everything in this article is based on information shared publicly by Microsoft and GPU vendors and could be gathered by any graphics programmer. No insider information was used.

Introduction

Every year, GDC is the pinnacle moment for game developers, including rendering programmers. Whether it’s worth going there is a separate question, considering the cost of travel (especially for someone living in Europe) and the price of a hotel in San Francisco. There are other important events as well: Nvidia has GTC, Epic has Unreal Fest, Vulkan has the Vulkanised conference as its main event, and graphics researchers gather at SIGGRAPH. But GDC in March is the annual opportunity for people from all these worlds to get together, listen to talks, and socialize, so we should all pay attention to what happens there.

DirectX Innovation at GDC 2026 is the article announcing the talks Microsoft delivered during the event. “DirectX State of the Union” is the main one, spanning a variety of topics related to recent DX12 developments. “DirectX is Bringing Console-Level Developer Tools to Windows” is also very interesting, announcing unprecedented progress in graphics tooling, especially PIX. Microsoft was joined on stage by representatives of all four current PC GPU vendors (AMD, Intel, Nvidia, Qualcomm), who normally compete with each other. This is always a sign that something important is happening (like the memorable presentation "Approaching Zero Driver Overhead in OpenGL" from GDC 2014, which I mentioned as part of the story in my article Graphics APIs – Yesterday, Today, and Tomorrow). Hopefully video recordings or slides from this year’s talks will be available online after some time. For now, we need to analyze what we have - just the web pages they published.

GPU vendors also posted their announcements simultaneously, expressing support for Microsoft’s announcements.

On one hand, for Microsoft and GPU vendors (also called Independent Hardware Vendors - IHVs), GDC is a major milestone every year. Developers crunch on their code to finish demos, articles and slide decks are prepared, travel and meetings get scheduled... On top of the engineering effort, there is surely politics involved. Cooperation between teams in a big corporation is difficult enough. Here, Microsoft needed to coordinate with all PC GPU vendors. I believe it is even more challenging for Khronos, where all hardware and software vendors that care about graphics (both desktop and mobile) come together, but I’m sure it was a hectic time there as well.

But on the other hand, what has been announced in these talks will only ship to the public in the upcoming months, basically “when it’s done”, so nothing really changed this week. We just received new information that had been secret so far. It’s like watching a teaser video for a new game (when we read an article announcing new technology) or maybe a gameplay trailer (when we can see an actual API spec), with the game shipping later this year.

1. Tool improvements

The main blog post on DirectX Developer Blog:
DirectX: Bringing Console-Level Developer Tools to Windows

This topic is the most interesting and most important for me personally. Compared to the debuggers we have for CPU programming languages (whether high-level like Python or native like C and C++), I think GPU debugging is really in the Stone Age. GPU crashes (also known as Timeout Detection and Recovery - TDR - whether caused by a timeout or a memory page fault) are especially difficult to debug. Sure, frame capture tools like PIX on Windows and RenderDoc are helpful in debugging various problems, but they require a frame that successfully rendered until the Present() call. The Debug Layer and GPU-Based Validation also do a great job. Finally, crash capture tools like DRED and custom vendor tools - Nvidia Aftermath and Radeon GPU Detective - help identify the cause of a crash... when they work, because sometimes they don’t show any meaningful information.

I wish we had a proper debugger for shaders, just like we have debuggers for CPU code - with breakpoints, step-by-step execution, watching the values of local variables, etc. (They would likely need to be conditional breakpoints for a specific frame, draw call, and pixel coordinate or thread ID, because there are thousands of threads running simultaneously on the GPU and millions of them starting every frame.) Sure, PIX and RenderDoc offer shader debugging, but RenderDoc does it by emulating shaders on the CPU, which can break in many cases because it doesn’t show what really happens on the GPU. What if there are timing issues or race conditions, such as a missing barrier? What if there is a bug in the shader compiler for a specific GPU?

Of course, this is not as easy on the GPU as it is on the CPU, where a thread or a process can be suspended while the operating system, IDE, and other apps continue running. On a GPU, the entire chip may be busy executing just a single draw call. However, I can still imagine a scenario where the GPU is stopped, breaking into a debugging session, while the debugger runs on a separate machine or on the same machine with the desktop rendered using integrated graphics. In fact, stopping the entire machine and debugging it from a remote system connected through a network is possible using WinDbg, but that is kernel-mode, low-level, hardcore debugging. So we can say that making GPU debugging possible is mostly a matter of developing good high-level tools.

Despite the confidentiality of console SDKs, it's no secret - Microsoft disclosed it in the very title of their talk - that consoles like Xbox have better graphics tools than the PC. In DirectX: Bringing Console-Level Developer Tools to Windows, they announced a major update to PIX on Windows, along with other tools and APIs intended for debugging. Let’s see what exactly they announced.

DirectX Dump Files

First of all, they disclosed a plan to support new DirectX Dump Files (.dxdmp), containing a dump of the GPU state after it crashes. They will be available to open in PIX. It’s great to see they have buy-in from all GPU vendors, as this surely requires their deep involvement and support. It’s also great that there will be a toggle to adjust the trade-off between performance overhead and "actionability" (as they called it), since gathering more data may impact performance, which in turn could hide the bug. The “no overhead” mode, where supported, may even be suitable to leave always enabled for end users. Dump files will be available for manual management or collected by Microsoft via Watson.

This all looks very promising. I have just one concern. Based on the example screenshots shown from all four GPU vendors, I suspect this whole system may just be a generic placeholder, with the specific data gathered and displayed being highly dependent on the GPU. We can only hope that the actual crash dumping and the data gathered from the crash will be:

From the screenshots shown:

  1. I would score Nvidia the highest. Given the unreliable nature of GPU crash capturing, showing information like "Confidence: X%" or "Possible Cause 1/2/..." looks like a good direction.
  2. AMD seems to display the same information that the current text-based reports from Radeon GPU Detective contain - a tool that I had the honor and pleasure to work on. This is great, because I know their crash capturing is very reliable. The information shown is sometimes just too low-level, though. For example, it shows a "Make Resident" event (also visible in this screenshot) as an internal driver operation, even if the developer didn’t explicitly call ID3D12Device::MakeResident.
  3. Intel scores the worst, as it shows raw GPU registers with cryptic names like VS_INVOCATION_COUNT_RCSUNIT_BE_ and their hexadecimal values. It’s great that they are able to dump the exact state of the GPU at the moment it crashed, but do they really believe game developers will read thousands of pages of their GPU documentation to make sense of it?

PIX API

Next, Microsoft announced the PIX API - a programmatic way to access all the data a PIX capture and the new crash dump can contain, available to C++, C#, and Python. This is a great step forward. It will definitely be useful for various kinds of automation. I can also imagine developing an MCP server, so an AI agent could use such captures to help with graphics debugging or performance optimization. It’s worth mentioning that by doing this, PIX is basically catching up with RenderDoc, which already has its own Python API.

DebugBreak

Next, they are going to add a HLSL intrinsic function DebugBreak(). See the specification: 0039 - Debugging Intrinsics. It will be similar to the assert macro known from C++ and other CPU languages, triggering an instant crash, which can help identify exactly where an asserted condition was hit - in which frame, which draw call or dispatch, and hopefully also with a pixel or thread ID. It definitely has a chance to work more reliably than my hacky ShaderCrashingAssert library.

Too bad they didn’t define any parameter that would allow the shader to return some data to the crash dump. Even a single uint or uint4 would make a huge difference. Dear Microsoft, can we have that, please???
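To show what I mean, here is a hypothetical sketch in plain C++ (every name here is my invention, nothing is a real API): an assert that deposits a payload into a crash record before trapping, which is the behavior I wish DebugBreak() had on the GPU:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical "crash dump" record that a GPU fault handler could collect.
// On a real GPU, this would live in a dedicated buffer read back after a TDR.
struct CrashRecord {
    bool     hit     = false;
    uint32_t payload = 0;  // The single uint I wish DebugBreak() accepted.
};

static CrashRecord g_crashRecord;

// Sketch of an asserting intrinsic: on failure, record the payload first,
// then trap. Here the "trap" is just a printf, so the example stays runnable.
inline void ShaderAssert(bool condition, uint32_t payload)
{
    if (!condition) {
        g_crashRecord.hit = true;
        g_crashRecord.payload = payload;
        std::printf("Assertion failed, payload = 0x%08X\n", (unsigned)payload);
        // A real implementation would now raise a GPU fault, e.g. by
        // accessing a non-resident address, as ShaderCrashingAssert does.
    }
}
```

A shader would then call something like `ShaderAssert(threadId < elementCount, threadId)`, and the payload in the dump would immediately tell you which thread went out of bounds.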

I also wish that together with this “assert” function, they worked on some standardized “printf” function that would let shaders output messages and data while running, even without crashing. Having those two would move GPU debugging at least to the Middle Ages. Implementing convenient logging in HLSL has been discussed many times, including in the good article An Experimental Approach to printf in HLSL by Chris Bieneman - still unofficial, despite the fact that he is a Microsoft employee. After this year’s GDC there are still no signs that Microsoft is working on standardizing this, unfortunately.

PIX event configurability

Then we have PIX events (also known as markers) - the begin/end events with custom strings that we use to organize our draw calls into a logical hierarchy for easier debugging in PIX or RenderDoc. Microsoft announced that they will finally be passed to the graphics driver. This is a huge thing. Let me explain why.
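For readers who haven't used these markers: in C++ they are typically emitted through pix3.h (PIXBeginEvent / PIXEndEvent), often wrapped in an RAII scope class. Below is a minimal self-contained sketch of that wrapper pattern, with stand-in begin/end functions instead of the real WinPixEventRuntime calls so it compiles anywhere:

```cpp
#include <string>
#include <vector>

// Stand-ins for PIXBeginEvent / PIXEndEvent from pix3.h. They just record
// the event hierarchy into a log instead of talking to the graphics runtime.
static std::vector<std::string> g_eventLog;
static int g_depth = 0;

void BeginEvent(const char* name)
{
    g_eventLog.push_back(std::string(g_depth * 2, ' ') + name);
    ++g_depth;
}
void EndEvent() { --g_depth; }

// RAII wrapper: the event ends automatically when the scope closes,
// so begin/end pairs can never get out of balance.
struct ScopedEvent {
    explicit ScopedEvent(const char* name) { BeginEvent(name); }
    ~ScopedEvent() { EndEvent(); }
};

void RenderShadows() { ScopedEvent e("Shadow pass"); /* draw calls... */ }

void RenderFrame()
{
    ScopedEvent frame("Frame");
    RenderShadows();
    {
        ScopedEvent e("Opaque geometry");
        // draw calls...
    }
}
```

After `RenderFrame()`, the log contains "Frame", "  Shadow pass", "  Opaque geometry", indented by nesting depth - exactly the hierarchy you then see in the event tree of PIX or RenderDoc.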

There are basically two ways graphics debugging tools can operate. The first is injecting themselves into the game process and intercepting all calls to the graphics API. That’s what PIX, RenderDoc, and GfxReconstruct do. This is great, because it captures all the calls and all the data at a high level, exactly as the game developer passed them. On the other hand, it causes multiple problems:

The second mode is a tool connecting directly to the graphics driver (e.g. through a socket) and fetching data from it. That’s how Radeon Developer Tools operate. It works independently of what game is running or how it was launched. The game process is not altered in any way. A strategic decision to implement tools like this has many advantages. They work very reliably. However, there is one big caveat: the calls that the driver sees are not necessarily the same calls that the game developer made using the graphics API.

In Vulkan, this is not such a big issue, because the driver basically implements the API “on the other side”. It can also expose new extensions (like VK_EXT_debug_utils) that are available to the game code for direct use. With DirectX 12, however, this is not that simple. The sequence looks like this: Game engine code -> DX12 API -> DirectX Runtime (Microsoft code) -> Device Driver Interface (DDI) API -> Driver. Some information gets lost along the way. This includes PIX events. Because of that, AMD had to define their own way of submitting events so they become visible in their tools, in the form of the AMD GPU Services (AGS) library or a replacement header for PIX events. See RGD documentation - Known issues and workarounds for more details. Now, with Microsoft changing their mind and allowing standard PIX events to reach the driver, this will no longer be necessary. Finally!

I hope that together with begin/end events, Microsoft will also pass to the driver the string names that we assign to our textures and buffers using ID3D12Object::SetName calls. Preferably also data set through ID3D12Object::SetPrivateData - that would provide much more flexibility. Their new PIX Markers spec mentions SetName to be passed to DDI. This is great, because currently, AMD tools fetch these names through the Event Tracing for Windows (ETW) mechanism, which is a workaround and doesn’t always work reliably. (Which is not a secret - see the article Viewing DirectX® 12 Named Resources with Radeon™ Memory Visualizer 1.3).

By the way, I personally like the Vulkan approach of formally defining a way to inject layers into the API. I believe it’s high time for Microsoft to do the same. I was secretly hoping they would add such support for layers a long time ago, but they still haven’t, so every tool like PIX, RenderDoc, GfxReconstruct, or GPU Reshape needs to implement its own solution and struggle with numerous issues. Nvidia Streamline seems to be an effort in this direction, providing a generic layer system for DX12, but because it’s made by Nvidia, unfortunately other GPU vendors didn’t join the effort to spread its adoption. (Hopefully they will have more luck with the Slang shading language.)

Live Shader Debugging

Microsoft also announced they are working on “real-time, on-chip shader debugging”. This almost sounds too good to be true. But they are giving themselves time until 2027, so I believe that by then they may be able to achieve it, with intense effort and support from GPU vendors. That would be the holy grail that could bring GPU debugging into modern times, catching up with CPU programming languages.

One big challenge in bringing this feature to PC is stopping the GPU and inspecting its state at the exact moment, as I described above. But another problem is showing and stepping through the shader code. I hope it won’t be the GPU assembly (ISA) or the intermediate language (DXIL, SPIR-V), but the real high-level shader language (HLSL, GLSL) as written by the game developer. Supporting this would require all stages of shader compilation to preserve the correlation between ISA instructions and high-level shader source lines, as well as between GPU registers and variables in the shader source. Shaders are highly optimized, with everything inlined. There is no call stack to inspect. Providing a decent debugging experience must therefore be a complex endeavor.

2. ML advancements

The main blog post on the DirectX Developer Blog:
Evolving DirectX for the ML Era on Windows

Let's say it plainly: machine learning using deep neural networks is, at a low level, largely about matrix multiplication. GPUs are great at that. We still call them "graphics processing units", but developers discovered a long time ago that they perform well for many other kinds of tasks. First it was breaking ciphers, then crypto mining, and now it's AI. At a high level, such workloads may look very similar to graphics, which is also about regular, massively parallel processing of vectors and matrices. However, when we dig deeper, there are many differences, including:

GPU vendors introduced dedicated hardware to accelerate these types of matrix multiplication ML workloads. Nvidia has been the most effective at marketing theirs, making sure that even average gamers have heard about Tensor Cores, but AMD also has its Wave Matrix Multiply-Accumulate (WMMA) instructions, and Intel has its XMX units. The problem is that they are available in purely compute-oriented APIs like Nvidia CUDA, AMD HIP/ROCm, and Intel oneAPI, not in graphics APIs and shader languages. They are also used in proprietary upscaling technologies: DLSS, FSR 4, and XeSS, respectively - all implemented as black-box libraries performing some vendor-specific magic under the hood.

Is this the desired and target state of things? Having a custom, proprietary, high-quality technology like DLSS surely gives a vendor some competitive advantage when gamers decide which graphics card to buy. But on the other hand, do GPU vendors really like the fact that precious transistors on their chips and their potential TOPS of computational power stay idle most of the time? I don't think so. I think the real issue is exposing these capabilities in a way that is unified across vendors, high-level, and convenient enough for game developers on one hand, while still utilizing the full performance potential of the hardware on the other.

Vulkan faces the same problem. They solve it in a typical Vulkan way - by introducing extensions. We have VK_KHR_shader_integer_dot_product, VK_KHR_cooperative_matrix, VK_KHR_shader_float16_int8, VK_EXT_shader_float8, and many others. In DirectX 12, Microsoft has made several attempts to offer similar capabilities, with mixed success.

Side note: GPU Work Graphs are not this kind of technology. This recent addition to the DX12 API allows the execution of entire graphs of different shaders. However, we still need to focus our implementation on individual threads and manually control how they are spawned. There is no automatic operator fusion or other optimization performed automatically, like ML frameworks do. Work Graphs are more intended for compute work serving rendering workloads, as an extension of Indirect calls that let the GPU spawn new work on its own. They provide more flexibility by supporting a switch to a different shader, as opposed to only changing the vertex/index buffers and the number of vertices/instances/threads, as in Indirect calls.

Here we are now, after GDC 2026, and Microsoft's current plans to expose these ML-related hardware capabilities seem to come from both angles at once:

1. As instructions available in normal shaders

These can be useful for implementing inference of small ML models as part of existing vertex/pixel/compute shaders. For example, this could include neural texture (de)compression, approximation of complex material and lighting models (BRDFs), character animation, or approximate physics simulation. In the future, we may have many small models evaluated every render frame.

These are not really news from this week: they were announced earlier, and their specifications have been available for quite some time. Microsoft develops its HLSL language advancements quite openly by sharing HLSL specification proposals.

Long vectors, as specified in proposal 0026 - HLSL Long Vectors. It adds support for vectors with more than four elements, e.g. vector<float, 15>. Note that they are still normal variables, local to an individual shader thread.

Linear algebra, as specified in proposal 0035 - Linear Algebra Matrix. It adds a matrix type, such as Matrix<ComponentType::F16, 8, 32, MatrixUse::A, MatrixScope::Wave>, as well as vector-matrix and matrix-matrix operations like Multiply and MultiplyAccumulate.

Note that the matrix has many template parameters:

This API looks very clean and convenient. I believe they may have found a good abstraction. Hopefully, with enough effort put into developing shader compilers, it can also deliver good performance while utilizing the hardware capabilities of each GPU. My concern is whether the API will keep up with new hardware capabilities that GPU vendors may want to expose. By the time it ships in the retail SDK, they may already want support for more data types (like INT4) or even features that go beyond what the API offers, such as sparsity.
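To make the semantics concrete, here is a scalar C++ reference of what such a multiply-accumulate computes (C = A×B + C). The HLSL types above come from proposal 0035; the C++ below is only my illustration of the math, not any real API:

```cpp
#include <array>
#include <cstddef>

// Scalar reference of matrix multiply-accumulate: C = A * B + C,
// where A is M x K, B is K x N, and C is M x N, all stored row-major.
template <size_t M, size_t K, size_t N>
void MultiplyAccumulate(const std::array<float, M * K>& A,
                        const std::array<float, K * N>& B,
                        std::array<float, M * N>& C)
{
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];
            for (size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}
```

On real hardware (Tensor Cores, WMMA, XMX), a single instruction executes a whole tile of this computation cooperatively across a wave, which is where the speedup comes from - and a fused multiply-accumulate like this is the inner loop of every dense neural network layer.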

2. DirectX Compute Graph Compiler

That's a new announcement from GDC 2026. Microsoft teased it as a completely new technology that will consume entire ML models and optimize them for efficient execution on a specific GPU. It will feature "graph optimization, memory planning, and operator fusion". This is clearly an approach to executing ML workloads intended to keep the entire GPU busy for some time, similar to upscaling and other screen-space effects. They will likely execute as multiple compute dispatches, maybe even as separate command buffer submissions.

Note that ML frameworks can already do these things. With this project, Microsoft is basically creating another one, but tailored for cooperation with DirectX 12 and graphics workloads.

Note also that the graph approach is well known in the game development community. Advanced game engines often implement their own graphs representing render passes and dependencies between them, like the Render Dependency Graph in Unreal Engine. AMD also developed a similar solution called Render Pipeline Shaders. However, it never gained traction, possibly because developers saw it as overkill to employ LLVM to compile a custom domain-specific language. I'm not sure if it was ever used in any game. The project looks abandoned now. Game engines already have their own graph solutions, which are often simpler and based on C++ templates or macros.

Will the new DirectX Compute Graph Compiler allow game developers to create their own ML-based effects like DLSS or FSR, executed with comparable performance? I hope so. That would finally allow them to fully utilize these GPU hardware capabilities with their own code. But here we are only talking about inference. Designing and training an ML model, and gathering enough high-quality training data, is a separate topic. Training on data from a specific game title could be beneficial in some cases, but on the other hand, IHVs cooperating with many game developers and having access to so many games will still give them a competitive advantage when training their super-resolution upscaling and other models.

3. Advanced Shader Delivery

The main blog post on DirectX Developer Blog:
Advanced Shader Delivery: What’s New at GDC 2026
Specification:
Advanced Shader Delivery - Shader Compiler Plugin

This is basically yet another announcement of what they already announced before. The problem is this: every GPU has a different instruction set, so shaders cannot be compiled to native code like the .exe files of our games. The second stage of shader compilation (from DXIL/SPIR-V intermediate code to GPU ISA) currently happens inside the graphics driver. The shader compiler is basically one of the driver modules.

  1. In the old days of OpenGL and DirectX 11, different states of the graphics pipeline could trigger shader recompilation, so you might toggle some state and the next draw call would cause a massive hitch.
  2. In Vulkan and DirectX 12, we need to create Pipeline State Objects upfront, encapsulating all the states. This gives full control over when shader compilation happens, but one may argue that the cure is worse than the disease, because all the required combinations of states require so many PSOs that it takes a very long time to create them all at startup, or... creating them at runtime causes hitches again, this time explicitly triggered by the game.
  3. What Microsoft now proposes is precompiling and bundling these shaders into packages that can be downloaded over the Internet. Game storefronts like Steam and EGS will participate in this.

I'm not sure it's such a big deal. We will just change "compiling shaders..." to "downloading shaders...", and it will still happen whenever we update the game or the graphics driver. The only hope is that downloading these shaders will be faster than compiling them. The work done by shader compilers is difficult - trying to optimize the shader while still finishing the process within milliseconds. Allowing the compilation to take longer can result in better optimization. I remember debugging a game once that was supposedly hanging on startup, only to find out there were ray tracing shaders taking over a minute each to compile.

But the second feature they’ve announced may provide more performance benefits, even during game development, when shaders keep changing. Partial Graphics Programs will allow creating pipeline objects that contain only "pre-rasterization" shaders (like a vertex shader) or only a pixel shader, and later linking them together, which hopefully won’t require full recompilation at that point.

4. DirectStorage 1.4

The main blog post on DirectX Developer Blog:
DirectStorage 1.4 release adds support for Zstandard

I must admit I don't follow the development of this API very closely. Overall, it looks like a good idea. Update 1.4 brings support for the Zstandard aka zstd compression format, which is open, free, and developed by Meta. Before that, Microsoft promoted the GDeflate algorithm proposed by Nvidia. This change may be perceived as a step in the right direction - toward greater neutrality among GPU vendors.

On top of that, they presented the Game Asset Conditioning Library - a library offering pre-/post-processing of data, swizzling it so the core lossless compression algorithm performs better. Doing such tricks to squeeze the maximum potential out of a compression algorithm is known in the demoscene, for example, and is used to create those amazing intros that fit into 4 KB or 64 KB. The library supports BC1–5 and BC7 texture formats, which also looks like a good idea, considering that textures typically constitute a major portion of game asset data.
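A toy illustration of why such pre-processing helps (my own example, not the actual library): deinterleaving the high and low bytes of 16-bit values groups similar bytes together, which gives a run-length or LZ coder much longer matches:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Count runs of identical bytes - a crude proxy for RLE/LZ compressibility
// (fewer runs = more repetition = better compression).
size_t CountRuns(const std::vector<uint8_t>& data)
{
    size_t runs = data.empty() ? 0 : 1;
    for (size_t i = 1; i < data.size(); ++i)
        if (data[i] != data[i - 1]) ++runs;
    return runs;
}

// Natural layout: low byte, high byte, low byte, high byte, ...
std::vector<uint8_t> InterleavedBytes(const std::vector<uint16_t>& values)
{
    std::vector<uint8_t> out;
    for (uint16_t v : values) {
        out.push_back(uint8_t(v & 0xFF));
        out.push_back(uint8_t(v >> 8));
    }
    return out;
}

// Swizzled layout: all low bytes first, then all high bytes.
std::vector<uint8_t> DeinterleaveBytes(const std::vector<uint16_t>& values)
{
    std::vector<uint8_t> out;
    for (uint16_t v : values) out.push_back(uint8_t(v & 0xFF));  // low plane
    for (uint16_t v : values) out.push_back(uint8_t(v >> 8));    // high plane
    return out;
}
```

For a ramp of 16-bit values 0x0100..0x0107, the interleaved stream has 14 byte runs, while the deinterleaved one has only 9 - the whole high-byte plane collapses into a single run. The demoscene tricks mentioned above, and presumably this library, build on the same principle.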

There is still some concern about the benefits and adoption of DirectStorage overall. As far as I know, many years after the initial release, we are still waiting to see the API used by a large number of games or integrated and enabled by default in major game engines. Whether that ever happens depends on whether the API can consistently demonstrate sufficient benefits over traditional file reading APIs. It definitely has potential thanks to more direct access to SSD NVMe drives and GPU-accelerated decompression directly into buffers and textures in video memory. Whether we see wider adoption or not, the API is public and production-grade, so it is very unlikely Microsoft will silently kill it, as they did with DirectSR.

5. DXR 2.0

Specification:
DirectX Raytracing (DXR) Functional Spec, Part 2

A new update to the ray tracing API, defined as D3D12_RAYTRACING_TIER_2_0 and Shader Model 6.10, will bring the following features, focused mostly on the process of building and updating acceleration structures:

Cluster Level Acceleration Structure (CLAS). This makes acceleration structures three-level. CLAS will be like a meshlet for ray tracing, representing a small piece of a triangle mesh, up to 256 vertices and 256 triangles. Then, an actual Bottom Level Acceleration Structure (BLAS) can be created from such CLASes. This can improve performance and also distribute the performance cost of building acceleration structures more evenly across frames by splitting it into smaller pieces of work.

Cluster Template - like CLAS but without vertex positions. An actual CLAS can be "instantiated" from such a template by providing vertex positions, possibly many times when the object is animated. This can provide another performance gain. This API is "intended to be an upgrade versus traditional updates / refits".

Interestingly, they also proposed a new compressed representation of vertex positions, called Compressed1 position encoding, which seems to use shared exponents and delta values to save memory. It reminds me of the Dense Geometry Format that AMD proposed some time ago.

Partitioned TLAS (PTLAS) adds another level to the hierarchy by defining a PTLAS as a new kind of Top Level Acceleration Structure (TLAS) that stores references to Partitions, which in turn reference Instances. A Partition holding a range of 100–1000 instances is recommended. This can definitely help solve the problem of rebuilding the entire TLAS every frame, which games typically do today as the set of objects in the virtual world keeps changing dynamically.

Finally, the new specification also introduces acceleration structure building operations as Indirect commands.


Total Commander - a Plugin Supporting a Custom Archive Format - a New Article 7 Mar 5:18 AM (14 days ago)

Today I would like to present my new article: "Total Commander – a Plugin Supporting a Custom Archive Format". In this article, we will design our own file format that allows multiple files to be packed and compressed into a single archive, similar to formats like ZIP or 7Z. Using the C++ language and the Visual Studio environment on Windows, we will then write a plugin for the Total Commander file manager that enables creating and manipulating such an archive, including freely adding and removing files inside it.

The article was first published a few months ago in Polish in issue 5/2025 (120) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions:

See also the GitHub repository with the source code accompanying the article: sawickiap/TotalCommanderPluginTutorial


A Formula for Overall System Load 26 Feb 4:33 AM (23 days ago)

Some time ago I shared my thoughts about coming up with a nice formula to calculate memory fragmentation. I recently had another such math puzzle, this time themed around "system load". Below you will find the formula I developed, this time with an interactive demo!

The problem is this: Imagine a system that consists of several sub-systems, each having a gauge indicating the current load, 0...100%. It may be some real device like a computer (with CPU busy %, GPU busy % and the current memory consumption), some vehicle (a car, an airplane), or some virtual contraption, like a spaceship in a sci-fi video game (with the current engine load, force field strength, temperature that can make it overheat, etc.) Apart from the load of the individual subsystems, we want to show a main indicator of the overall system load.

What formula should be used to calculate it? All the load values, displayed as 0...100%, are internally stored as numbers in the range 0.0...1.0. The key principle is that if at least one subsystem is overloaded (at or near 100%), the entire system is considered overloaded. A good formula for the overall system load should have the following properties:

  1. The output is 0 if and only if all inputs are 0. In other words, if any subsystem has load >0%, the overall system load is also >0%.
  2. The output is 1 if at least one input is 1. In other words, it's enough for one subsystem to be fully loaded at 100% to consider the entire system loaded at 100%.
  3. If the output is <1, any increase in any input value should result in some increase in the output value.

Note that these requirements don't specify what happens between 0 and 1. For example, should the output become 0.5 if just one input reaches 0.5, or only after all inputs reach 0.5? We have full freedom here.

#1. My first idea was to use AVERAGE(input[i]). It meets requirements 1 and 3, but it doesn't meet requirement 2, because it requires all inputs to be 1 for the output to become 1.

#2. My second idea was to use MAX(input[i]). It meets requirements 1 and 2, but it doesn't meet requirement 3, because for any input other than the largest one, a change in its value doesn't change the output.

#3. There is a more complex formula that meets all 3 requirements. It may be called the "inverse product" and it looks like this:

output1 = 1.0 - ( (1.0 - input[0]) * (1.0 - input[1]) * ... )

You can think of it as multiplying together the "headrooms" left in each subsystem and subtracting that product from 1.

#4. Unfortunately, the formula shown above has a tendency to give very high values nearing 100% even for low input values, which is not very user-friendly. A result that is closer to MAX() reflects the overall system load better. Considering this, but still wanting to preserve our requirement 3, I ended up blending AVERAGE() with the inverse product 50/50:

finalOutput = (AVERAGE(input[i]) + output1) * 0.5
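All four candidate formulas can be sketched in a few lines of C++ (a minimal sketch; the function names are mine, not from any library):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// #1: simple average - meets requirements 1 and 3, fails requirement 2.
double LoadAverage(const std::vector<double>& in) {
    double sum = 0.0;
    for (double v : in) sum += v;
    return sum / static_cast<double>(in.size());
}

// #2: maximum - meets requirements 1 and 2, fails requirement 3.
double LoadMax(const std::vector<double>& in) {
    double m = 0.0;
    for (double v : in) m = std::max(m, v);
    return m;
}

// #3: "inverse product" - meets all 3 requirements, but skews toward 100%.
double LoadInverseProduct(const std::vector<double>& in) {
    double headroom = 1.0;
    for (double v : in) headroom *= 1.0 - v;
    return 1.0 - headroom;
}

// #4: the final formula - a 50/50 blend of #1 and #3.
double LoadBlended(const std::vector<double>& in) {
    return (LoadAverage(in) + LoadInverseProduct(in)) * 0.5;
}
```

For example, with inputs {0.5, 0.5}, the average is 0.5, the inverse product is 1 − 0.5·0.5 = 0.75, and the blend gives 0.625.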

Here is an interactive demo of all the formulas described above. You can move the sliders to control the input values.

Further modifications are possible:


Graphics APIs – Yesterday, Today, and Tomorrow - a New Article 20 Jan 12:05 PM (2 months ago)

Today I would like to present my new article: "Graphics APIs – Yesterday, Today, and Tomorrow". In this article, we will take a quick walk through the history of graphics APIs such as DirectX, OpenGL, Vulkan, and the accompanying development of graphics cards on the one hand, and video games on the other, over the years. We will not be learning how to program in any of these APIs. This article should be understandable and may be engaging for anyone curious about games or graphics, or at least for those who played games in childhood.

The article was first published a few months ago in Polish, in issue 4/2025 (119) (July/August 2025) of the Programista magazine. Now I have the right to show it publicly for free, so I share it in two language versions:


State of GPU Hardware (End of Year 2025) - New Article 29 Dec 2025 5:10 AM (2 months ago)

I published a guest article by Dmytro “Boolka” Bulatov, providing an overview of the current GPU market in the context of what features are supported on end users’ machines:

» "State of GPU Hardware (End of Year 2025)" «


How I Fixed My App Taking 5 Minutes to Start 22 Dec 2025 12:08 PM (2 months ago)

I recently got a new Windows PC, which I use for development. I work on a game based on Unreal Engine, and I build both the game and the engine from C++ source code using Visual Studio. From the very beginning, I had an annoying problem with this machine: every first launch of the game took almost five minutes. I don’t mean loading textures, other assets, or getting into gameplay. I mean the time from launching the app to seeing anything on the screen indicating that it had even started loading. I had to wait that long every time I started or restarted the system. Subsequent launches were almost instantaneous, since I use a fast M.2 SSD. Something was clearly slowing down the first launch.

Solution: open Windows Settings and disable Smart App Control. This is a security feature that Microsoft enables by default on fresh Windows installations, and it can severely slow down application launches. If you installed your system a long time ago, you may not have it enabled. Once you turn it off, it cannot be turned back on - but that’s fine for me.

Full story: I observed my game taking almost five minutes to launch for the first time after every system restart. Before I found the solution, I tried many things to debug the problem. When running the app under the Visual Studio debugger, I noticed messages like the following slowly appearing in the Output panel:

'UnrealEditor-Win64-DebugGame.exe' (Win32): Loaded (...)Engine\Binaries\Win64\UnrealEditor-(...).dll. Symbols loaded.

That’s how I realized that loading each .dll was what took so long. In total, launching the Unreal Engine editor on my system requires loading 914 unique .exe and .dll files.

At first, I blamed the loading of debug symbols from .pdb files, but I quickly ruled that out, because launching the game without the debugger attached (Ctrl+F5 in Visual Studio) was just as slow - only without any indication of what the process was doing during those five minutes before anything appeared on the screen.

Next, I started profiling this slow launch to see what was happening on the call stack. I used the Very Sleepy profiler, as well as Concurrency Visualizer extension for Visual Studio. However, I didn’t find anything unusual beyond standard LoadLibrary calls and other generic system functions. That led me to suspect that something was happening in kernel space or in another process, while my process was simply blocked, waiting on each DLL load.

Naturally, my next assumption was that some security feature was scanning every .dll file for viruses. I opened Windows Settings → Protection & security → Virus & threat protection and added the folder containing my project’s source code and binaries to the exclusions list. That didn’t help. I then completely disabled real-time protection and the other toggles on that page. That didn’t help either. For completeness, I should add that I don’t have any third-party antivirus software installed on this machine.

I was desperate to find a solution, so I thought: what if I wrote a separate program that calls LoadLibrary on each .dll file required by the project, in parallel, using multiple threads? Would that “pre-warm” whatever scanning was happening, so that launching the game afterward would be instantaneous?

I saved the debugger log containing all the “Loaded … .dll” messages to a text file and wrote a small C++ program to process it, calling LoadLibrary on each entry. It turned out that doing this on multiple threads didn’t help at all - it still took 4–5 minutes. Apparently, there must be some internal mutex preventing any real parallelism within a single process.

Next, I modified the tool to spawn multiple separate processes, each responsible for loading every N-th .dll file. That actually helped. Processing all files this way took less than a minute, and afterward I could launch my game quickly. Still, this was clearly just a workaround, not a real solution.

I lived with this workaround for weeks, until I stumbled upon an article about the Smart App Control feature in Windows while reading random IT news. I immediately tried disabling it - and it solved the problem completely.

Apparently, Microsoft is trying to improve security by scanning and potentially blocking every executable and .dll library before it loads, likely involving sending it to their servers, which takes a very long time. I understand the motivation: these days, most users launch only a web browser and maybe one or two additional apps like Spotify, so every newly seen executable could indeed be malware trying to steal their banking credentials. However, for developers compiling and running large software projects with hundreds of binaries, this results in an egregious slowdown.


Secrets of Direct3D 12: The Behavior of ClearUnorderedAccessViewUint/Float 17 Dec 2025 12:14 PM (3 months ago)

This article is about a quite niche topic - the functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat of the ID3D12GraphicsCommandList interface. You may be familiar with them if you are a programmer using DirectX 12. Their official documentation - ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat - provides some details, but there is much more to say about their behavior. I could not find sufficiently detailed information anywhere on the Internet, so here is my take on this topic.

What do they do?

The two functions discussed here allow “clearing” a buffer, or a subregion of it, by setting every element to a specific numeric value (sometimes also called “splatting” or “broadcasting”). The function ClearUnorderedAccessViewUint accepts a UINT[4] array, while ClearUnorderedAccessViewFloat accepts a FLOAT[4] array. They are conceptually similar to the functions ClearRenderTargetView and ClearDepthStencilView, which we commonly use for clearing textures. In the realm of CPU code, they can also be compared to the standard function memset.

Typed buffers

These functions work with typed buffers. Buffer views can come in three flavors:

  1. Typed, which means a buffer consists of elements of a specific DXGI_FORMAT, just like pixels for a texture. For example, using DXGI_FORMAT_R32G32B32A32_FLOAT means each element has four floats (x, y, z, w), 16 bytes in total.
  2. Structured, which means a buffer consists of elements of a custom type, which may be a basic type or a custom structure having various members and a size spanning many bytes.
  3. Byte address (aka raw), which means a buffer is treated as raw binary data, read and written one uint at a time.

I plan to write a more comprehensive article about buffers in DX12 and their types. For now, I recommend the excellent article Let’s Close the Buffer Zoo by Joshua Barczak for more information. In my article below, we will use only typed buffers.
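The three flavors differ in how the UAV descriptor is filled. Below is a minimal C++ sketch of one descriptor per flavor (assumptions: d3d12.h is included, the element counts and the MyElement structure are made up for illustration; creating the views with CreateUnorderedAccessView is omitted here and shown in the listing later in this article):

```cpp
// 1. Typed: each element has a specific DXGI_FORMAT.
D3D12_UNORDERED_ACCESS_VIEW_DESC typedDesc = {};
typedDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
typedDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
typedDesc.Buffer.FirstElement = 0;
typedDesc.Buffer.NumElements = 1024;

// 2. Structured: format is UNKNOWN; the element size comes from StructureByteStride.
struct MyElement { float pos[3]; uint32_t flags; }; // hypothetical element type
D3D12_UNORDERED_ACCESS_VIEW_DESC structuredDesc = {};
structuredDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
structuredDesc.Format = DXGI_FORMAT_UNKNOWN;
structuredDesc.Buffer.FirstElement = 0;
structuredDesc.Buffer.NumElements = 1024;
structuredDesc.Buffer.StructureByteStride = sizeof(MyElement);

// 3. Byte address (raw): R32_TYPELESS format plus the RAW flag.
D3D12_UNORDERED_ACCESS_VIEW_DESC rawDesc = {};
rawDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
rawDesc.Format = DXGI_FORMAT_R32_TYPELESS;
rawDesc.Buffer.FirstElement = 0;
rawDesc.Buffer.NumElements = 1024; // counted in 32-bit words for raw views
rawDesc.Buffer.Flags = D3D12_BUFFER_UAV_FLAG_RAW;
```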

Descriptors needed

The functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat have a quite inconvenient interface, requiring us to provide two UAV descriptors for the buffer we are about to clear: a GPU handle in a shader-visible descriptor heap and a CPU handle in a non-shader-visible descriptor heap. This means we need to have both kinds of descriptor heaps, we need to write the descriptor for our buffer twice, and we need three different descriptor handles - the third one being a CPU handle to the shader-visible heap, which we use to actually create the descriptor with the CreateUnorderedAccessView function. The code may look like this:

// Created with desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
// desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE.
ID3D12DescriptorHeap* shaderVisibleDescHeap = ...
// Created with desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV,
// desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE.
ID3D12DescriptorHeap* nonShaderVisibleDescHeap = ...

UINT handleIncrementSize = device->GetDescriptorHandleIncrementSize(
    D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

UINT descIndexInVisibleHeap = ... // Descriptor index in the heap.
UINT descIndexInNonVisibleHeap = ... // Descriptor index in the heap.

D3D12_GPU_DESCRIPTOR_HANDLE shaderVisibleGpuDescHandle =
    shaderVisibleDescHeap->GetGPUDescriptorHandleForHeapStart();
shaderVisibleGpuDescHandle.ptr += descIndexInVisibleHeap * handleIncrementSize;

D3D12_CPU_DESCRIPTOR_HANDLE shaderVisibleCpuDescHandle =
    shaderVisibleDescHeap->GetCPUDescriptorHandleForHeapStart();
shaderVisibleCpuDescHandle.ptr += descIndexInVisibleHeap * handleIncrementSize;

D3D12_CPU_DESCRIPTOR_HANDLE nonShaderVisibleCpuDescHandle =
    nonShaderVisibleDescHeap->GetCPUDescriptorHandleForHeapStart();
nonShaderVisibleCpuDescHandle.ptr += descIndexInNonVisibleHeap * handleIncrementSize;

D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT; // My buffer element format.
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = 1024; // My buffer element count.

ID3D12Resource* buf = ... // My buffer resource.
device->CreateUnorderedAccessView(buf, NULL, &uavDesc, shaderVisibleCpuDescHandle);
device->CreateUnorderedAccessView(buf, NULL, &uavDesc, nonShaderVisibleCpuDescHandle);

UINT values[4] = {1, 2, 3, 4}; // Values to clear.
commandList->ClearUnorderedAccessViewUint(
    shaderVisibleGpuDescHandle, // ViewGPUHandleInCurrentHeap
    nonShaderVisibleCpuDescHandle, // ViewCPUHandle
    buf, // pResource
    values, // Values
    0, // NumRects
    NULL); // pRects

Why did Microsoft make it so complicated? We may find the answer in the official function documentation mentioned above, which says: "This is to allow drivers that implement the clear as a fixed-function hardware operation (rather than as a dispatch) to efficiently read from the descriptor, as shader-visible heaps may be created in WRITE_COMBINE memory." I suspect this was needed mostly for older, DX11-class GPUs with more fixed-function hardware, while modern GPUs can read from and write to video memory more freely.

We must also remember to set the shader-visible descriptor heap as the current one using ID3D12GraphicsCommandList::SetDescriptorHeaps before performing the clear. Interestingly, on my RTX 4090 it works even without this step, but this is still incorrect and may not work on a different GPU. The D3D Debug Layer emits an error in this case. Note this is different from texture clears performed using ClearRenderTargetView and ClearDepthStencilView, where we use Render-Target View (RTV) and Depth-Stencil View (DSV) descriptors, which can never be shader-visible, so they cannot be used in SetDescriptorHeaps. For more information, see my older article: "Secrets of Direct3D 12: Do RTV and DSV descriptors make any sense?".
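Continuing with the names from the listing above, that step is a single call before recording the clear:

```cpp
// Bind the shader-visible descriptor heap before the clear
// (assumes `commandList` and `shaderVisibleDescHeap` from the listing above).
ID3D12DescriptorHeap* heaps[] = { shaderVisibleDescHeap };
commandList->SetDescriptorHeaps(1, heaps);
```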

Barriers needed

The functions ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat require the buffer to be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state, just like when writing to it from a compute shader. If the buffer was in a different state before, we need to issue a transition barrier (D3D12_RESOURCE_BARRIER_TYPE_TRANSITION). If we use the buffer as a UAV before or after the clear, the state doesn't change, but we need to issue a UAV barrier (D3D12_RESOURCE_BARRIER_TYPE_UAV) to make the commands wait for each other. Otherwise, a race condition could occur, as the commands could run in parallel.
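The two barrier cases described above can be sketched as follows (assumes `buf` and `commandList` from the earlier listing; the "before" state is just an example):

```cpp
// Case 1: the buffer was in another state - transition it to UNORDERED_ACCESS.
D3D12_RESOURCE_BARRIER transition = {};
transition.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
transition.Transition.pResource = buf;
transition.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
transition.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST; // example previous state
transition.Transition.StateAfter = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
commandList->ResourceBarrier(1, &transition);

// Case 2: the buffer was (and stays) in the UNORDERED_ACCESS state -
// a UAV barrier between the accesses is enough.
D3D12_RESOURCE_BARRIER uavBarrier = {};
uavBarrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
uavBarrier.UAV.pResource = buf;
commandList->ResourceBarrier(1, &uavBarrier);
```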

These restrictions make buffer clearing with these functions similar to using compute shaders, and different from ClearRenderTargetView and ClearDepthStencilView, which are used for clearing textures, or from copy operations (CopyResource, CopyBufferRegion), which do not require barriers around them. For a more in-depth investigation of this distinction, see my older article: "Secrets of Direct3D 12: Copies to the Same Buffer".

Conversions from uint

Here comes the main part that inspired me to write this article. I asked myself how the UINT[4] values passed to the ClearUnorderedAccessViewUint function are converted to the values of elements in the buffer, depending on the element format. I could not find any mention of this on the Internet, so I did some experiments. Below, I summarize my findings. Unfortunately, the behavior is inconsistent between GPU vendors! I tested on Nvidia (GeForce RTX 4090, driver 591.44), AMD (Radeon RX 9060 XT, driver 25.11.1), and Intel (Arc B580, driver 32.0.101.8250) - all on Windows 25H2 (OS Build 26200.7462) with DirectX Agility SDK 1.618.3 Retail.

To summarize, the ClearUnorderedAccessViewUint function may be useful when we want to set a specific bit pattern to the elements of a buffer in UINT format, but for other formats or out-of-range values the behavior is unreliable and we cannot be sure it won't change in the future.

Conversions from float

Here is a similar summary of the behavior of the function ClearUnorderedAccessViewFloat, which takes 4 floats as the value, for different formats of buffer elements:

To summarize, the ClearUnorderedAccessViewFloat function is useful when we want to set a specific numeric value, correctly converted to the specific format, especially a FLOAT, UNORM, or SNORM one. For consistent behavior across GPUs, we should avoid using it with UINT and SINT formats.

Limited range

If we want to limit the range of elements to clear, we have two equivalent ways of doing so:

1. Set the limit when filling the descriptor:

D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.ViewDimension = D3D12_UAV_DIMENSION_BUFFER;
uavDesc.Format = ...
uavDesc.Buffer.FirstElement = firstElementIndex; // !!!
uavDesc.Buffer.NumElements = elementCount; // !!!
device->CreateUnorderedAccessView(...

2. Set the limit as a "rectangle" to clear:

D3D12_RECT rect = {
    firstElementIndex, // left
    0, // top
    firstElementIndex + elementCount, // right
    1}; // bottom
commandList->ClearUnorderedAccessViewUint(
    shaderVisibleGpuDescHandle, // ViewGPUHandleInCurrentHeap
    shaderInvisibleCpuDescHandle, // ViewCPUHandle
    buf, // pResource
    values, // Values
    1, // NumRects !!!
    &rect); // pRects !!!

Note that in both cases, the boundaries are expressed in entire elements, not bytes.

Documentation

The behavior I presented above is based on my experiments, as it is not described precisely in the official documentation of ClearUnorderedAccessViewUint and ClearUnorderedAccessViewFloat functions in DX12. The state of DX12 documentation in general is somewhat messy, as I described in my recent post "All Sources of DirectX 12 Documentation". Normally, when something is not defined in DX12 documentation, we might resort to the older DX11 documentation. In this case, however, that would be misleading, because DX12 behaves differently from DX11:


All Sources of DirectX 12 Documentation 25 Nov 2025 10:59 AM (3 months ago)

Every API needs documentation. Even more so in the case of a graphics API, where there is no single implementation (like in the case of a specific library), but countless users of the API (video games and other graphics apps) and several implementers on the other side of the API (graphics drivers for GPUs from various vendors like AMD, Intel, and Nvidia).

Vulkan documentation, for example, is very extensive, detailed, and precise. Sure, it is not perfect, but it's getting better over time. It's also very formal and difficult to read, but that's how a reference specification should be. For learning the basics, third-party tutorials are better. Documentation is needed for more advanced, day-to-day work with the API. I like to think of the documentation as law. A software bug is like a crime. When the application crashes, you as a programmer are a detective investigating "who killed it". You check the specification to see if the app "broke the law" by using the API incorrectly - meaning your app is guilty of the bug - or whether the usage is correct and the culprit is on the other side: a bug in the graphics driver. There are, of course, some gray areas and unclear situations as well.

Direct3D 12, unfortunately, doesn't have just one main documentation. In this post, I would like to gather and describe links to all official documents that describe the API... and also complain a bit about the state of all this.

1. Direct3D 12 programming guide @ learn.microsoft.com

This looks like the main page of the D3D12 documentation. Indeed, we can find many general chapters there describing various concepts of the API, as well as the API reference for individual interfaces and methods. For example:

But there are also hidden gems - sections that, in my opinion, deserve separate pages, yet are buried inside the documentation of specific API elements. For example:

2. Direct3D 11.3 Functional Specification

The documentation linked in point 1 is not fully complete. Direct3D 12, although revolutionary and not backward-compatible, still builds on top of Direct3D 11 in some ways. For that older API, there is this one long and comprehensive document. Sometimes you may need to resort to that specification to find answers to more advanced questions. For example, I remember searching it to find out the alignment requirements for elements of a vertex or index buffer. Be aware, though, that the parts of this document that apply to D3D12 are only those that the D3D12 documentation doesn't define, and that are applicable to D3D12 at all.

3. github.com/microsoft/DirectX-Specs

On the other hand, recent updates to DirectX 12 are also not included in the documentation mentioned in point 1, as Microsoft now puts their new documents in a GitHub repository. You can find .md files there describing new features added in newer versions of the DirectX 12 Agility SDK - from small ones like ID3D12InfoQueue1, to very large ones like DirectX Raytracing (DXR) or Work Graphs. This repository also provides pages describing what's new in each shader model, starting from 6.0, 6.1, 6.2, etc., up to 6.8 (as of the moment I’m writing this post). A convenient way to read these docs is through this link: microsoft.github.io/DirectX-Specs/.

4. github.com/microsoft/DirectXShaderCompiler/wiki

Then there is the HLSL shader language and its compiler: DXC. Microsoft also maintains documentation for the compiler and the shader language in a separate GitHub repo, this time using the GitHub Wiki feature. There, we can find descriptions of language features like 16 Bit Scalar Types, what's new in each major HLSL language version (2015, 2016, ..., 2021 - see Language Versions), and... again a list of what has been added in recent shader models (see Shader Model).

5. github.com/microsoft/hlsl-specs

When it comes to the HLSL language itself, it’s sometimes hard to tell what code is correct and supported, because there is no fully formal specification like there is for C++, for example. There is only the High-level shader language (HLSL) section of the documentation mentioned in point 1, which briefly describes elements of the syntax. However, Microsoft recently started writing new documentation for HLSL, which can be found in yet another GitHub repository that is most convenient to read online at microsoft.github.io/hlsl-specs/.

6. DirectX Developer Blog

I should also mention the DirectX Developer Blog, which is worth following for the latest news about new releases of the Agility SDK and recent additions to the API, as well as updates on related projects like PIX, DirectStorage, and DirectSR (which is pretty much dead now - it was removed from the preview Agility SDK before reaching the retail version). The blog also features nice standalone articles, such as Getting Started with the Agility SDK or the HLSL 2021 Migration Guide, which could easily be part of the main documentation.

As one example I stumbled upon just last week: the description of ByteAddressBuffer at learn.microsoft.com mentions that it has methods Load, Load2, Load3, Load4 that read uint values from a buffer. But to learn that modern HLSL versions also support templated Load<MyType>, I had to go to a separate document ByteAddressBuffer Load Store Additions on the DirectXShaderCompiler Wiki - which describes only that specific addition.

What a mess! Why is the DirectX 12 documentation so scattered across so many websites in different shapes and forms? Of course, I don't know - I don't work at Microsoft. But having worked at big companies for more than 10 years, it isn’t shocking to me. I can imagine how things like this happen. First, engineering managers, project/program managers, and other decision-makers tend to focus on adding new features (everyone wants to “build their pyramid”) while also moving quickly and cutting costs. Creating good documentation is not a top priority. Then, there is Conway’s Law, which states that “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” So if there are separate teams developing DXC, the Agility SDK, etc., they will likely want their own outlets for publishing documentation, while no one takes responsibility for the overall user experience. Still, seeing new initiatives like the HLSL specification, I’m hopeful that things will get better over time.

Finally, DirectX Landing Page is also worth mentioning, as it gathers links to many SDKs, tools, helpers, samples, and other projects related to DirectX.


Solution to Epic Games Launcher Wizard Ended Prematurely 4 Nov 2025 2:46 AM (4 months ago)

I recently stumbled upon a problem while trying to install the Epic Games Launcher on a fresh installation of Windows 11. The installation wizard was showing the message: “Epic Games Launcher Wizard ended prematurely because of an error.” and the launcher wasn’t installing. Trying to install it from the Microsoft Store was also failing, showing the error code 0x8A150049.

Solution: Create a different user account - one without a space in the username. Change its type to Administrator. Sign out, sign in to that account, and use it to install the launcher. After that, you can return to your main account and delete the temporary one. The Epic Games Launcher will remain installed and ready to use.

Full story: I got a new PC with a fresh installation of Windows 11. I started installing all the necessary software and my programming environment. (For the list of Windows apps I recommend, see my older post: My Favorite Windows Apps.) When I tried to install the Epic Games Launcher, I was surprised that after completing the setup wizard, the app didn’t appear in my system. Only on the second attempt did I carefully read the message that appeared on the final page:

Epic Games Launcher Wizard ended prematurely because of an error.

I searched the Internet and tried many solutions, but none of them helped:

Finally, somewhere on the Internet I found information that the installer leaves a text log file in "c:\Users\MY_LOGIN\AppData\Local\Epic Games\Epic Online Services\EOSInstaller\Logs\EOSInstaller-DATE-TIME.log". I opened it and found the following messages inside:

[2025.11.04-10.00.56:329][---]Log file opened.
[2025.11.04-10.00.56:329][---]FApplication: Version 1.2.26 ran with extract=C:\Users\Adam Sawicki\AppData\Local\Temp\7a4515cf-dde6-44f9-afb4-b5b1e0dee697
[2025.11.04-10.00.56:348][---]FApplication: Extract mode
[2025.11.04-10.00.56:349][---]FApplication: Extracting bundled MSI
[2025.11.04-10.00.56:349][---]FApplication: Could not create temp directory "C:\\Users\\Adam" system:183
[2025.11.04-10.00.56:349][---]FApplication: Failed to build MSI
[2025.11.04-10.00.56:349][---]Log file closed.

The line "Could not create temp directory "C:\Users\Adam"" gave me a clue that the installer likely fails because of the space in my Windows username, which is “Adam Sawicki”. That’s how I came up with the solution of using a Windows account without a space in the username.

All in all, this is clearly a bug in Epic’s code. The installer shouldn’t depend on whether a username contains spaces or other special characters. They probably just forgot to properly escape the path with quotation marks (" ") somewhere in their code. Epic, please fix it!
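A hypothetical illustration of this failure mode (this is my own sketch, not Epic's actual code): if an unquoted path is split at whitespace, everything after the first space is lost, which matches the truncated "C:\Users\Adam" seen in the log.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the suspected bug: a path containing a space,
// used without surrounding quotation marks, gets cut at the first space.
std::string FirstTokenOfUnquotedPath(const std::string& path) {
    return path.substr(0, path.find(' '));
}
```

For example, "C:\Users\Adam Sawicki\AppData\Local\Temp" would be truncated to "C:\Users\Adam" - exactly the directory the installer failed to create.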


Calculating the Bounding Rectangle of a Circular Sector 19 Oct 2025 1:09 AM (5 months ago)

This article will be short and straight to the point. While working with geometry in 2D, I was recently looking for an algorithm to calculate the bounding box of a specific shape that I initially called a "cone". Actually, as I'm talking about 2D, I should rather say I needed the bounding rectangle of a circular sector - a part of a circle with a limited angle around an axis pointing in a specific direction.

When developing a 2D game, this shape can represent, for example, the area of effect of an attack, such as punching nearby enemies, firing a shotgun, spraying some substance ahead, or casting a magical spell. Calculating its bounding rectangle can be useful for querying a space-partitioning data structure (like a grid, a quadtree, etc.) for potentially affected objects.

I prototyped my solution in ShaderToy, which you can see here: shadertoy.com/view/w3jcRw.

A circular sector is described by:

The output bounding rectangle is described by just vec2 MinPos, MaxPos - two points defining the minimum and maximum coordinates it contains.

To calculate the bounding rectangle of our cone, we need to consider all possible points that extend the furthest along the X and Y axes, and take their min/max. The first such point is the apex. The next two are what I call "edge points."

However, there are cases where this is not enough. We also need to check four "extra points" located at a distance of radius from the apex along -X, +X, -Y, +Y, as long as each of these points belongs to the cone.

My final algorithm in GLSL is:

void CalcConeBoundingRect(vec2 apex, vec2 direction, float halfAngle, float radius,
    out vec2 boundingRectMinPos, out vec2 boundingRectMaxPos)
{
    float sinHalfAngle = sin(halfAngle);
    float cosHalfAngle = cos(halfAngle);
    vec2 edgeVec1 = vec2(
        direction.x * cosHalfAngle - direction.y * sinHalfAngle,
        direction.y * cosHalfAngle + direction.x * sinHalfAngle);
    vec2 edgeVec2 = vec2(
        direction.x * cosHalfAngle + direction.y * sinHalfAngle,
        direction.y * cosHalfAngle - direction.x * sinHalfAngle);
    vec2 edgePoint1 = apex + edgeVec1 * radius;
    vec2 edgePoint2 = apex + edgeVec2 * radius;
    boundingRectMinPos = min(min(edgePoint1, edgePoint2), apex);
    boundingRectMaxPos = max(max(edgePoint1, edgePoint2), apex);
    
    vec2 unitVec[4] = vec2[](
        vec2(-1.0, 0.0), vec2(1.0, 0.0),
        vec2(0.0, -1.0), vec2(0.0, 1.0));
    for(int i = 0; i < 4; ++i)
    {
        if(dot(unitVec[i], direction) >= cosHalfAngle)
        {
            vec2 extraPoint = apex + unitVec[i] * radius;
            boundingRectMinPos = min(boundingRectMinPos, extraPoint);
            boundingRectMaxPos = max(boundingRectMaxPos, extraPoint);
        }
    }
}

Note that we don't use raw angles here, apart from the initial parameter. We don't call the atan2 function, nor do we compare whether one angle is smaller than another. We simply operate on vectors - a common theme in well-designed geometric algorithms.

The algorithm can be optimized further if we store the sine and cosine of the angle in advance. Alternatively, if we have only one of them, we can compute the other using the formula below. This way, we never need to use the raw angle value at all.

float sinHalfAngle = sqrt(1.0 - cosHalfAngle * cosHalfAngle);

EDIT: Big thanks to Matthew Arcus for suggesting an improvement to the code! I applied it to the listing above.
