Once upon a time, when > was faster than < ... Wait, what?

c optimization opengl cpu gpu

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.

However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.

What I was referring to was a specific old optimization where one frame you would use GL_LESS with a range of [0, 0.5]. Next frame, you render with GL_GREATER with a range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.

This loses one bit of depth precision, but you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.

c optimization opengl cpu gpu

The answer is almost certainly that for whatever incarnation of chip+driver was used, the Hierarchical Z only worked in the one direction - this was a fairly common issue back in the day. Low level assembly/branching has nothing to do with it - Z-buffering is done in fixed function hardware, and is pipelined - there is no speculation and hence, no branch prediction.

c optimization opengl cpu gpu

Optimization like that will hurt performance on many embedded graphics solutions because it will make framebuffer resolve less efficient. Clearing a buffer is a clear signal to the driver that it does not need to store and restore the buffer when binning.

Little background information: a tiling/binning rasterizer processes the screen in number of very small tiles which fit into the on-chip memory. This reduces writes and reads to external memory which reduces traffic on memory bus. When a frame is complete (swap is called, or FIFOs are flushed because they are full, framebuffer bindings change, etc) the framebuffer must be resolved; this means every bin is processed in turn.

The driver must assume that the previous contents must be preserved. The preservation means that the bin has to be written out to the external memory and later restored from external memory when the bin is processed again. The clear operation tells the driver that the contents of the bin are well defined: the clear color. This is a situation which is trivial to optimize. There are also extensions to "discard" the buffer contents.

CodeHunter

Once upon a time, when > was faster than < ... Wait, what?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last