Better Vertex Cache Optimised Index Buffer Compression

I’ve had a second pass at the compression/decompression algorithm now, after some discussion with davean on #flipCode. The new iteration of the algorithm (which is available alongside the original) is a fair bit faster for decompression, slightly faster for compression and has a slightly better compression ratio. The updated algorithm doesn’t however support degenerate triangles (triangles that have a duplicate vertex indice), where as the original does.

The original is discussed here.

The main change to the algorithm is to change from per vertex codes to a single composite triangle code. Currently for edge FIFO hits there are two 2 bits codes and for other cases, three 2 bit codes. There are a lot of redundant codes here, because not all combinations are valid and some code combinations are rotations of each other. The new algorithm rotates the vertex order of triangles to reduce the number of codes, which lead to one 4 bit code per triangle. This nets us a performance improvement and a mild compression gain.

There were also some left over codes, so I added a special encoding that took the edge + new vertex triangle case and special cased it for the 0th and 1st edges in the FIFOs. This could potentially net very big wins if you restrict your ordering to always use a triangle adjacent to the last one, trading vertex cache optimisation for compression.

Here’s the performance of the old algorithm:

MODEL	TRIANGLES	UNCOMPRESSED	COMPRESSED	COMPRESSION TIME	DECOMPRESSION TIME
Bunny	69,630	835,560	111,786	3.7ms	1.45ms
Armadillo	345,944	4,151,328	563,122	17ms	6.5ms
Dragon	871,306	10,455,672	1,410,502	44.8ms	16.6ms
Buddha	1,087,474	13,049,688	1,764,509	54ms	20.8ms

And here is the performance of the new one:

MODEL	TRIANGLES	UNCOMPRESSED	COMPRESSED	COMPRESSION TIME	DECOMPRESSION TIME
Bunny	69,630	835,560	108,908	2.85ms	0.92ms
Armadillo	345,944	4,151,328	547,781	15.9s	4.6ms
Dragon	871,306	10,455,672	1,363,265	36.3ms	11.8ms
Buddha	1,087,474	13,049,688	1,703,928	46ms	14.7ms

Overall, there seems to be a 30%-40% speedup on decompression and a modest 3% improvement on compression.

Also, I’d like to thank Branimir Karadžić, whose changes I back-ported, which should hopefully add support for 16bit indices and MSVC 2008 (unless I’ve broken them with my changes - I don’t have MSVC 2008 installed to test).

If you haven’t checked out the source, it’s available on GitHub.

← Previous Archive Next →

blog comments powered by Disqus

Published

30 September 2014

Better Vertex Cache Optimised Index Buffer Compression

Published

Category

Tags