Sam MacPherson

Flash, Haxe, Game Dev and more…

2D GPU-Accelerated Rendering with Molehill/HxSL

Ok, so I’ve been working on this rendering problem for probably a little over a month now. The problem: Flash’s vector renderer is just too slow for my needs.

My first approach, which I illustrated in the last post, was to use blitting with a cached store for repeated affine transformations. I ended up with a more or less decent renderer, but it was not without issues. For one, it did not work very well in the general case. I could only get decent performance with very contrived examples, which of course is not very helpful.

Another issue with the blitting renderer was the complexity of the code. I was not aware when I started out just how complicated it would be to optimize. I went through several revisions, with many hours of hair pulling, before I got something reasonable (reasonable meaning 30 FPS).

When I started working on the blitting engine I was aware of the Flash Player 11 beta and Molehill, but I was determined to get my own version working. I don’t really have any good reason for not using Molehill as a rendering engine right away, other than perhaps my own ignorance. That changed shortly after I started playing around with the Molehill API using HxSL (Haxe Shader Language) – http://haxe.org/manual/hxsl.

If you don’t know what HxSL is, don’t worry. For now you can think of it as an easier way of writing shaders (programs that run on the GPU). The current Adobe alternative involves writing low-level assembly code, which is not very pretty. Once again Haxe is at the forefront of Flash technology.

At first glance 3D programming seems very complicated. I had never done anything with the graphics card myself, and I was actually surprised at how quickly I picked the whole thing up. I’m not going to explain how to use HxSL or how to program 3D applications; there are a ton of tutorials out there. The ones I used were http://haxe.org/doc/advanced/flash3d for examples of how to use HxSL, and http://lab.polygonal.de/2011/02/27/simple-2d-molehill-example/ for a 2D example as well as some conceptual material on matrices. This post is about the 2D rendering engine I made using Haxe.

When I started learning Molehill about two weeks ago, I was surprised at how few examples there were for HxSL, specifically for 2D rendering with HxSL. Really the only things I could find were the general 3D examples in HxSL and a whole ton of ActionScript examples. I did find another 2D rendering engine written in ActionScript (https://github.com/egreenfield/M2D), but I wanted a solution in Haxe! So I decided to do it myself.

So after some learning and experimentation I was finally ready to put it all together. The basic idea is to set up a simulated 2D environment by fixing the camera (the screen) and representing all the display objects as flat rectangles that ‘hover’ slightly in front of the camera.

The end result looks identical to a normal 2D environment (excuse my bad 3D drawing skills).

So let’s get into some code!

private function _initFrame ():Void {
     _s = flash.Lib.current.stage.stage3Ds[0];
     _s.viewPort = new Rectangle(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);
     _s.addEventListener(Event.CONTEXT3D_CREATE, _onReady);
     _s.requestContext3D();
}
private function _onReady (e:Event):Void {
     _c = _s.context3D;
     _c.configureBackBuffer(Std.int(_s.viewPort.width), Std.int(_s.viewPort.height), ANTI_ALIAS, true);

     //Setup projection matrix
     _mproj = new Matrix3D();
     _mproj.appendTranslation(-SCREEN_WIDTH/2, -SCREEN_HEIGHT/2, 0);
     _mproj.appendScale(2/SCREEN_WIDTH, -2/SCREEN_HEIGHT, -1);
     _mproj.appendTranslation(2/SCREEN_WIDTH, 2/SCREEN_HEIGHT, 1);

     //Setup shader
     _shader = new Shader(_c);
     _ready = true;
}

This may look scary, but most of the code above is boilerplate. The one piece that isn’t is the projection matrix. The math behind projection matrices can get complicated fast, but you can think of it like this: the projection matrix acts as a camera, and we need it to map points in the 3D world onto the 2D screen. In the code above we fix the camera at (0, 0, 1) facing the origin, and apply a perspective change so that the x/y plane at z=0 shows exactly SCREEN_WIDTH x SCREEN_HEIGHT units. This gives us the exact same setup as a regular 2D environment.
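To see what those three appends actually do, it helps to trace a screen-space point through them by hand. Here is a quick sketch in plain Haxe (no Stage3D required; the function simply mirrors the arithmetic of the matrix above):

```haxe
// Trace a screen-space point (x, y) through the three appends above.
// 1. appendTranslation(-W/2, -H/2, 0)  -- center the screen on the origin
// 2. appendScale(2/W, -2/H, -1)        -- normalize to clip space, flip y
// 3. appendTranslation(2/W, 2/H, 1)    -- small clip-space offset
static function project (x:Float, y:Float, w:Float, h:Float):{ x:Float, y:Float } {
	var cx = (x - w / 2) * (2 / w) + 2 / w;
	var cy = (y - h / 2) * (-2 / h) + 2 / h;
	return { x:cx, y:cy };
}
// project(0, 0, 640, 480)     -> roughly (-1,  1): top-left     -> clip top-left
// project(640, 480, 640, 480) -> roughly ( 1, -1): bottom-right -> clip bottom-right
```

So the screen’s top-left corner lands at clip-space (-1, 1) and the bottom-right at (1, -1), which is exactly the mapping a 2D coordinate system needs.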

@:shader({
     var input:{
          pos:Float3,
          uv:Float2
     };
     var tuv:Float2;
     function vertex (mpos:M44, mproj:M44) {
          out = pos.xyzw * mpos * mproj;
          tuv = uv;
     }
     function fragment (t:Texture) {
          out = t.get(tuv);
     }
}) class Shader extends format.hxsl.Shader {
}

Now here is where the beauty of HxSL comes in. The above piece of code is a shader. If you did this in ActionScript you would have had to write it in assembly. Yuck! I don’t want to get into too much detail about how this shader works, as that is not really the purpose of this tutorial; if you are interested you can read the HxSL documentation (http://haxe.org/manual/hxsl). It’s a pretty basic shader: the vertex function transforms each vertex by the object’s position matrix and the projection matrix, and the fragment function samples the texture at the interpolated (u, v) coordinate.

private inline function _render ():Void {
     //Clear last render and setup next one
     _c.clear(0, 0, 0, 0);
     _c.setDepthTest(true, Context3DCompareMode.ALWAYS);
     _c.setCulling(Context3DTriangleFace.BACK);
     _c.setBlendFactors(Context3DBlendFactor.SOURCE_ALPHA, Context3DBlendFactor.ONE_MINUS_SOURCE_ALPHA);

     //Render children and display
     _renderChild(this);
     _c.present();
}
private function _renderChild (child:CanvasObject):Void {
     var frame:Frame = child.getFrame();
     if (frame != null) {
          _shader.init(
               { mpos:child.getStageTransform(), mproj:_mproj },
               { t:frame.texture }
          );
          _shader.bind(frame.vbuf);
          _c.drawTriangles(frame.ibuf);
     }
     for (i in 0 ... child.getSize()) {
          _renderChild(child.get(i));
     }
}

The above code runs once every frame update. The function _render() clears the last render and sets up the properties for the next one; you don’t have to worry too much about that. The magic happens in _renderChild(), which walks the display tree and draws every object that has a frame.

To start, you may be wondering what the CanvasObject class is. It is not part of Molehill; it is my own top-level class that represents a graphics object. The implementation of CanvasObject is extensive and not part of this tutorial. Basically you need to concentrate on these two functions:

CanvasObject.getFrame():Frame;
CanvasObject.getStageTransform():Matrix3D;

getStageTransform() returns a 3D matrix containing rotations/translations/scalings/etc. (all the standard 2D transformations) that bring the object into stage coordinates. In the simplest case you can just return an identity matrix and have the graphic drawn at (0, 0) (really (0, 0, 0)).
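For a concrete example, here is roughly how such a matrix could be built for an object with a position, rotation and scale. This is a sketch, not my library’s actual implementation; the function name and parameters are just for illustration:

```haxe
import flash.geom.Matrix3D;
import flash.geom.Vector3D;

// Hypothetical sketch: build a stage transform for an object scaled by
// (sx, sy), rotated by `deg` degrees and positioned at (x, y).
function make2DTransform (x:Float, y:Float, deg:Float, sx:Float, sy:Float):Matrix3D {
	var m = new Matrix3D();
	m.appendScale(sx, sy, 1);               // scale in the x/y plane
	m.appendRotation(deg, Vector3D.Z_AXIS); // a 2D rotation is a rotation about z
	m.appendTranslation(x, y, 0);           // move into stage coordinates
	return m;
}
```

The order matters: scale and rotate around the object’s own origin first, then translate into place.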

The other function, getFrame(), returns a Frame object, which is basically just a wrapper class for a bunch of properties. The three most important are:

Frame.texture:Texture;
Frame.vbuf:VertexBuffer3D;
Frame.ibuf:IndexBuffer3D;

All of these are part of the Molehill API. There are a bunch of tutorials out there on how to create these objects, but for 2D bitmaps the setup is static.

Since the graphics card can only draw triangles, we need two triangles per bitmap to make a rectangle. First we create the vertex buffer:

vbuf = c.createVertexBuffer(4, 5);
var vpts:flash.Vector<Float> = new flash.Vector<Float>();
vpts.push(bounds.xmin);
vpts.push(bounds.ymin);
vpts.push(0);
vpts.push(0);
vpts.push(0);

vpts.push(bounds.xmax);
vpts.push(bounds.ymin);
vpts.push(0);
vpts.push(bounds.intervalX / bmdPow2.width);
vpts.push(0);

vpts.push(bounds.xmin);
vpts.push(bounds.ymax);
vpts.push(0);
vpts.push(0);
vpts.push(bounds.intervalY / bmdPow2.height);

vpts.push(bounds.xmax);
vpts.push(bounds.ymax);
vpts.push(0);
vpts.push(bounds.intervalX / bmdPow2.width);
vpts.push(bounds.intervalY / bmdPow2.height);
vbuf.uploadFromVector(vpts, 0, 4);

In the code above we define 4 vertices, each with 5 coordinates. The first 3 are (x, y, z) (notice how the z coordinate in all four vertices is 0). The last two are (u, v) coordinates, which map the (x, y, z) position to the texture. The division in the (u, v) coords is needed because the bounds might not be a power of 2: the graphics driver requires that all textures have dimensions that are powers of 2. The bounds variable is just a rectangle defining the bounds of the bitmap.
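The code above references bmdPow2, the padded power-of-2 copy of the source bitmap. Building it looks something like this (a sketch; the helper names are mine):

```haxe
import flash.display.BitmapData;
import flash.geom.Point;

// Round up to the next power of two (e.g. 50 -> 64, 64 -> 64).
function nextPow2 (n:Int):Int {
	var p = 1;
	while (p < n) p <<= 1;
	return p;
}

// Copy the source bitmap into a transparent power-of-two canvas.
// The original pixels sit in the top-left corner, which is why the
// (u, v) coords above divide the real size by the padded size.
function padToPow2 (src:BitmapData):BitmapData {
	var padded = new BitmapData(nextPow2(src.width), nextPow2(src.height), true, 0x00000000);
	padded.copyPixels(src, src.rect, new Point(0, 0));
	return padded;
}
```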

ibuf = c.createIndexBuffer(6);
var ipts:flash.Vector<UInt> = new flash.Vector<UInt>();
ipts.push(0);
ipts.push(1);
ipts.push(3);

ipts.push(0);
ipts.push(3);
ipts.push(2);
ibuf.uploadFromVector(ipts, 0, 6);

The index buffer just links the vertices together to define two triangles. The four vertices we defined in the vertex buffer are indexed 0-3, so we link vertices 0, 1 and 3 to form the first triangle and 0, 3 and 2 to form the second. Voila, we have defined a 2D sprite which can be drawn to the screen. Well, almost.

texture = c.createTexture(bmdPow2.width, bmdPow2.height, flash.display3D.Context3DTextureFormat.BGRA, false);
texture.uploadFromBitmapData(bmdCpy);

We have to upload the image into a texture. Ok, now we are done.

So there we have it. If you pre-compute every vector graphic into a bitmap, you can render everything through this method REALLY fast. I have yet to fully test this, but initial results are looking good: a full 30 FPS on my current zombie game project (even in software rendering mode).

I know the above code is in bits and pieces, but I wanted to describe how to do 2D rendering in general, so I pulled the code directly from my game dev library. I will probably release the library as open source sometime in the near future. It has support for converting MovieClips/Sprites into my rendering framework, as well as a unified asset loading system.

Cheers.


Blitting with Caching = Real Time Rendering

Ok, so this is my first time blogging or even really publishing my thoughts anywhere. There are a few things I like to share from time to time, so I figured I would start a blog to publish some of my ideas and experiments. A few of you may already know me under the alias Blank101 on pawngame.com. For those of you who don’t know me, I design video games with my friend Justin (alias JPillz). I am also a CS undergrad at the University of Waterloo, Canada.

Mostly this blog will be concerned with ActionScript and Haxe, with some Java sprinkled in, specifically as they relate to game design and programming. I may also move this blog to our new site once it is ready.

Well, the reason I started this blog in the first place was to write about something I achieved today, so I’ll get right to it…

So up until recently I’ve been doing all my rendering with the Flash Player’s built-in vector renderer. I don’t really have a good reason for this other than that I hadn’t considered an alternative. That was until I had a talk with Sean McGee (creator of games like Thing-Thing) at FGS this year. He told me about the widely known concept of blitting and we went over it for a bit.

For those of you who don’t know, blitting is a way of rendering a game using the fast BitmapData.copyPixels() method. A practical way of utilizing this is to pre-render all of your vector graphics as bitmaps using the BitmapData.draw() method; then, to draw a graphic, you just call BitmapData.copyPixels().
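In code the whole technique boils down to two calls. A minimal sketch, where clip is assumed to be some vector DisplayObject and canvas the BitmapData backing the screen:

```haxe
import flash.display.BitmapData;
import flash.geom.Point;

// Pre-render once: rasterize the vector clip into a bitmap.
var cached = new BitmapData(Std.int(clip.width), Std.int(clip.height), true, 0x00000000);
cached.draw(clip);

// Per frame: stamp the cached bitmap onto the screen buffer.
// copyPixels() is a straight pixel copy, so it is very fast.
canvas.copyPixels(cached, cached.rect, new Point(x, y));
```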

However, there are some drawbacks to blitting. For one, the BitmapData.copyPixels() method does not allow general affine transformations (rotation, scaling, etc.). This is a problem.

The obvious solution to this problem is to figure out which assets will need to be rotated/scaled and pre-render all possible orientations at some small increment epsilon. This works, but at the cost of huge memory consumption. To give you an idea of how much memory we are talking about, say we had a small 50×50 pixel movieclip with 10 frames and pre-rendered all the images at increments of 5 degrees. A single render is about 10KiB (50×50 pixels at 4 bytes each), so that gives 10KiB × 10 frames × (360/5 renders per frame) ≈ 7MiB to store this one movieclip. That may be acceptable if you have only a few different assets, but with, say, ~1000 different assets it adds up. I don’t know many machines with 7GiB of RAM available to the Flash Player.

So my idea was to store only the base images, without any rotations/scaling, and let the user choose whether to speed up rendering with an option that caches recently rendered affine transforms in a global cache. This may not be appropriate in all cases, but if you have a lot of instances of the same movieclip playing over and over while rotating, the speed-up will be very noticeable. Not only does this cache the base assets, it can also cache static images that have been generated on-the-fly.
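As a sketch, the global cache can be little more than a map keyed by asset id plus a quantized rotation angle. The class below is illustrative only; the names and the 5-degree bucket size are not my library’s actual API:

```haxe
import flash.display.BitmapData;
import flash.geom.Matrix;

class TransformCache {
	static var cache = new Map<String, BitmapData>();
	static inline var STEP = 5; // quantize rotation into 5-degree buckets

	// Return a rotated render of `base`, reusing a cached copy when the
	// same asset was recently drawn at (nearly) the same angle.
	public static function get (assetId:String, base:BitmapData, deg:Float):BitmapData {
		var bucket = Math.round(deg / STEP) * STEP;
		var key = assetId + ":" + bucket;
		var bmd = cache.get(key);
		if (bmd == null) {
			var m = new Matrix();
			m.rotate(bucket * Math.PI / 180);
			// Note: a real implementation must also translate so the
			// rotated image lands inside the new bitmap's bounds.
			bmd = new BitmapData(base.width, base.height, true, 0x00000000);
			bmd.draw(base, m);
			cache.set(key, bmd);
		}
		return bmd;
	}
}
```

A real cache would also evict old entries to bound memory; the quantization step trades rotation smoothness for hit rate.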

In essence you get the full flexibility of the Flash Player renderer, with optional caching for similar images that need to be transformed a lot. An example of where this is useful is rendering a lot of enemies that all use the same graphic.

Now, the memory usage can still be fairly high depending on the numbers you give the cache, so I have included a QUALITY property: a number between 0 and 1 which gets factored into the dimensions of the images to trade render quality for memory. The factor is quadratically related to the memory usage, which is good news if you don’t mind a loss of quality for a LOT of memory saved.
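The quadratic relationship is easy to see: QUALITY scales both dimensions, so the pixel count, and hence the memory, scales by QUALITY squared. A tiny sketch:

```haxe
// Memory for one cached render at a given quality factor (4 bytes per pixel).
static inline function renderBytes (w:Int, h:Int, quality:Float):Int {
	return Std.int(w * quality) * Std.int(h * quality) * 4;
}
// A 50x50 render: quality 1.0 -> 10000 bytes,
// quality 0.5 -> 2500 bytes, a quarter of the memory.
```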

I was going to upload an example, but it seems like too much of a hassle on here. I will start posting Flash examples when I transfer this blog to my new site.

Also, I welcome any feedback on my writing style. This is my first time doing this so let me know if my writing is too verbose, not verbose enough, etc.