Better Vertex Cache Optimised Index Buffer Compression
I’ve had a second pass at the compression/decompression algorithm now, after some discussion with davean on #flipCode. The new iteration of the algorithm (which is available alongside the original) is a fair bit faster for decompression, slightly faster for compression and has a slightly better compression ratio. The updated algorithm doesn’t, however, support degenerate triangles (triangles that have a duplicate vertex index), whereas the original does.
The original is discussed here.
The main change to the algorithm is to switch from per-vertex codes to a single composite triangle code. Currently, edge FIFO hits use two 2-bit codes and the other cases use three 2-bit codes. There are a lot of redundant codes here, because not all combinations are valid and some code combinations are rotations of each other. The new algorithm rotates the vertex order of triangles to reduce the number of codes, which leads to one 4-bit code per triangle. This nets us a performance improvement and a mild compression gain.
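To make the rotation idea concrete, here is a simplified sketch. The code values and case names are hypothetical (the actual codes live in the repository); the point is that rotating a triangle so the "interesting" vertex or edge comes first collapses rotated variants into one case, so a single 4-bit composite code suffices:

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical 4-bit composite triangle codes. The real code
// assignments in the repository differ; these are illustrative.
enum TriangleCode : uint8_t {
    CODE_EDGE_NEW    = 0,  // edge FIFO hit + brand-new vertex
    CODE_EDGE_CACHED = 1,  // edge FIFO hit + vertex FIFO hit
    CODE_EDGE_FREE   = 2,  // edge FIFO hit + raw/relative vertex
    CODE_NEW_NEW_NEW = 3   // three new vertices
    // ... remaining composite cases fill out the 4-bit space
};

// Rotate the triangle (a, b, c) so that the vertex at position
// 'start' comes first, preserving winding order. After rotation,
// rotated variants of the same case share one composite code.
static void RotateTriangle(uint32_t& a, uint32_t& b, uint32_t& c, int start)
{
    uint32_t v[3] = { a, b, c };
    a = v[start % 3];
    b = v[(start + 1) % 3];
    c = v[(start + 2) % 3];
}
```

Because the winding order is preserved, the decompressor reconstructs a triangle that renders identically, even though its first vertex may differ from the one originally submitted.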
There were also some leftover codes, so I added a special encoding that takes the edge + new vertex triangle case and special-cases it for the 0th and 1st edges in the FIFO. This could potentially net very big wins if you restrict your ordering to always use a triangle adjacent to the last one, trading vertex cache optimisation for compression.
Here’s the performance of the old algorithm:
MODEL | TRIANGLES | UNCOMPRESSED (BYTES) | COMPRESSED (BYTES) | COMPRESSION TIME | DECOMPRESSION TIME |
---|---|---|---|---|---|
Bunny | 69,630 | 835,560 | 111,786 | 3.7ms | 1.45ms |
Armadillo | 345,944 | 4,151,328 | 563,122 | 17ms | 6.5ms |
Dragon | 871,306 | 10,455,672 | 1,410,502 | 44.8ms | 16.6ms |
Buddha | 1,087,474 | 13,049,688 | 1,764,509 | 54ms | 20.8ms |
And here is the performance of the new one:
MODEL | TRIANGLES | UNCOMPRESSED (BYTES) | COMPRESSED (BYTES) | COMPRESSION TIME | DECOMPRESSION TIME |
---|---|---|---|---|---|
Bunny | 69,630 | 835,560 | 108,908 | 2.85ms | 0.92ms |
Armadillo | 345,944 | 4,151,328 | 547,781 | 15.9ms | 4.6ms |
Dragon | 871,306 | 10,455,672 | 1,363,265 | 36.3ms | 11.8ms |
Buddha | 1,087,474 | 13,049,688 | 1,703,928 | 46ms | 14.7ms |
Overall, there seems to be a 30%-40% speedup on decompression and a modest 3% improvement on compression.
Also, I’d like to thank Branimir Karadžić, whose changes I back-ported, which should hopefully add support for 16-bit indices and MSVC 2008 (unless I’ve broken them with my changes - I don’t have MSVC 2008 installed to test).
If you haven’t checked out the source, it’s available on GitHub.