|sam2 on 2D Dynamic Lighting Demo|
|Sam on 2D Dynamic Lighting Demo|
|sam on 2D Dynamic Lighting Demo|
|Nathan Beam on Web 2.0 with Haxe|
|Sam on 2D Dynamic Lighting Demo|
Flash, Haxe, Game Dev and more…
Once again I’ve revisited my rendering system. This time with the intention of switching over to a batched setup instead of a one-draw-call-per-sprite setup. Overall the switch was relatively painless, but I did learn some lessons along the way. I wanted to share some of the pitfalls that I experienced.
First, I will briefly explain what a batched sprite rendering system is. The basic idea behind batched sprite rendering is that everytime you call drawTriangles() you incur some overhead. If you are using a naive setup (Like I was) you are probably calling drawTriangles() once for every sprite. This is okay for a few sprites, but once you get a couple thousand it becomes extremely inefficient. Basically the name of the game is to minimize the number of drawTriangles() calls. To do this, we don’t immediately render the sprite when render is called. Instead we batch the sprite’s verticies onto a global vertex buffer with the intent of rendering everything at the end with one drawTriangles() call. Simple. Well not really. Using this method has some implications:
1. We can no longer use the GPU to apply sprite-specific transforms.
2. A batch of sprites must share the same texture.
The implications of (1) mean that we must do the coordinate transforms on the CPU. This not the best, but it is unavoidable. If you have a reasonable amount of sprites (Say a couple thousand) then this should be fine.
Because of (2) we must make seperate drawTriangles() calls everytime we need to switch textures. But wait hang on. If we need to make another call everytime we switch textures then doesn’t that leave us where we started seeing as different sprites will likely have different images? The answer is yes — and no. First and foremost you can group sprites of the same image into the same draw call. However, you can even go one step further.Instead of allocating a texture for every image we allocate a global store of massive textures (2048×2048 pixels). We then stamp in all the smaller textures and give the render jobs appropriate U,V coordinates. Very nice! If your game has a lot of small sprites this will be lightning fast. Probably under 3 draw calls.
Ok so we go ahead and do this and oops we have another problem. Because we are grouping the sprites by texture they will no longer necessarily be sorted by depth. Ok, so this is a set-back, but perhaps we could just batch the sprites until we encounter a texture change then flush the buffer and start again. This will of course work, but it is only efficient if sprites of similar depth also share the same texture which may not be true. For example when testing this with my game I went from 3 draw calls to about 300. This is unacceptable.
Well we are rendering on a 3D graphics card so why not use the depth buffer to do the sorting for us! Every frame update we assign a global depth value to each sprite accordingly and enable the depth buffer. This may appear to be the best solution possible, but it does have one major flaw. You can’t use translucent textures. The reason behind this is the depth buffer does not understand alpha compositing. All it understands is geometry. Either a triangle is blocking something behind it or it is not. This is where we have to make a decision. There are four equally valid solutions that I have come up with.
General Purpose Solutions
1. Fall back to the render on texture switch method and do some optimizations to try and group depth-locality with texture-locality. (Could be optimal depending on setup)
2. Render all opaque images first entirely on the GPU. Then do transparent images after using method (1). (Works very well if there isn’t many transparent images)
3. Only use opaque textures. (Optimal)
4. Only use textures with quantized alpha values of either 0 or 1. (Optimal, but requires an extra instruction in the shader)
All of these methods have there strengths and weaknesses. Personally I decided to go with method (4) for my game. Method (4) is very similar to method (3). They only differ by one instruction in the shader. For (4) you include a KIL opcode (kill() in hxsl) which when the alpha channel is less than one will abort the pixel and depth buffer writes.
There are definitely other solutions out there, but these were the best ones I could come up with after working on this for several hours. Hope this helps.