Progress Report November 2017
Posted on Nov 01, 2017
Here's a short update regarding what has been going on with the engine recently, and my plans for the near future.
In general, development is going as strong as ever, and the initial feature set for version 1.0 is nearing completion. It's great to see that after so much work, the last pieces are finally falling into place. Aside from adding new major features I've also done a lot of refactoring and cleanup (as usual) to ensure I have a well designed, quality codebase as a foundation for the future, which is my priority right now.
While the internals have been getting cleaned up (even more than they already were), the editor has been gathering dust for a while. It still sorely lacks polish and some basic usability fixes. But since the C++ side of things is nearing completion, and is in a much better state than the editor, I decided to focus on publishing v1.0 of the C++ framework first, before the release of the editor.
So the editor will remain in the dust for a while longer, but once I start working on it, it should be fairly quick to polish up and to fix whatever I broke lately. All the basic systems and internals are there, they just need minor tweaks to get them up to a standard I feel comfortable with. Its release will likely follow just a few months after the framework.
Continue reading for specifics of what was added over the course of the last few months.
I'm happy to announce that Banshee (both the engine and the editor) now runs on Linux. All the features, from low level utilities like the filesystem, to windowing, OpenGL, Vulkan and high-level editor features like folder monitor, drag and drop and a variety of others are now fully functional. MonoDevelop support was also integrated with the editor to help you write C# scripts.
I'm sure there will still be minor issues to iron out, especially with Linux distributions I didn't have a chance to test, but the bulk of the work is done and any remaining issues should be fairly trivial to fix.
I expected GPU driver issues regarding high end OpenGL or Vulkan rendering, as I had them on Windows with some less commonly used features for certain high-end effects, but luckily I haven't encountered any.
It took about two months in total to port to Linux, which I believe is pretty impressive considering the scope and how intrusive a procedure like porting is. I think it's a good testament to Banshee's design. Much of that time was also spent on actually learning the needed APIs, as this was technically my first Linux port.
A lot of the time was spent on fixing the build as it was very Windows focused. This will certainly help with further ports to come and make them easier.
I completed the bulk of the work on the renderer a couple of months ago (just before I started work on the Linux port). All the fancy features listed last time are functional across the DirectX, OpenGL and Vulkan APIs. (See the feature list for everything that was implemented.)
I wanted to post some screenshots showing off the new renderer, but you'll have to wait for them a bit longer. I have yet to set up a better (prettier) test scene that's more representative of what the engine can do, plus I still have a few minor features to wrap up.
There have also been a few (previously unplanned) changes to the renderer, primarily in its design but also in the quality/performance department. I've pretty much completely rewritten the renderer internals so they are much more neatly designed and easier to follow and modify. I haven't seen any engine so far with such a renderer design and I believe it will make adding new effects, profiling, debugging and even creating separate rendering paths (e.g. for mobile) much easier.
The main changes were:
Renderer material variations
RendererMaterial class was created to allow those writing renderer effects (i.e. shaders) to easily reference them, set up their parameters and execute the effect. This encapsulated the entire effect in a nice clean interface and essentially allowed external code to execute the effect with a single method call.
I noticed that Unreal Engine's internals use the same approach. There were two problems with it, however:
- The RendererMaterial class has to be instantiated in order to be used. That doesn't sound bad on its own, but you generally only need one instance of an effect, which means these tend to be statics. And the problem with statics is that they get destroyed after the main application exits. This is bad since a RendererMaterial references a lot of resources that don't live as long, so you need to worry about manually destroying each such instance as well.
- The second, bigger problem was variations. You rarely have only a single version of an effect (shader). Generally there are some #defines that allow you to tweak the effect quality and similar options (e.g. number of MSAA samples, enable high or low quality path, etc.). The way I handled that, and the way I noticed Unreal also does it, was to use template parameters. This way each variation would essentially be its own RendererMaterial class. That had two problems of its own:
  - First, internally this caused the compiler to duplicate a lot of code.
  - Second, and more importantly, it made a huge mess when the code needed to choose which variation to use. Since each variation has its own class you'd have to use switch statements to pick the one you require. Alternatively you could use polymorphism, but in that case you need to create a base class interface, which is boilerplate code, and one of the main purposes of RendererMaterial is to make writing effects easier. This became worse the more methods the RendererMaterial had, as external code had to switch for every method call. It got even worse if the variations had multiple parameters, as every parameter combination had to be handled. Additionally, since each RendererMaterial had to be instantiated, each variation also had to be instantiated explicitly.
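To make the problem concrete, here is a minimal sketch of what the templated approach can look like, using an invented MSAA resolve effect. The names `ResolveMat`, `gResolveMat1` and `runResolve` are hypothetical and only illustrate the pattern; they are not Banshee's (or Unreal's) actual code, and `execute` just returns a string where a real effect would bind parameters and draw.

```cpp
#include <cassert>
#include <string>

// Hypothetical effect with one variation parameter (MSAA sample count).
// Each template instantiation is a distinct, unrelated class.
template <int MSAA_COUNT>
class ResolveMat
{
public:
    // Stand-in for binding parameters and issuing the draw call.
    std::string execute() const
    {
        return "resolve with MSAA_COUNT=" + std::to_string(MSAA_COUNT);
    }
};

// Every variation has to be instantiated explicitly, typically as a static.
static ResolveMat<1> gResolveMat1;
static ResolveMat<2> gResolveMat2;
static ResolveMat<4> gResolveMat4;

// The variations share no common type, so every call site has to switch on
// the runtime option. Each extra method or variation parameter multiplies
// the number of cases that must be written by hand.
std::string runResolve(int msaaCount)
{
    switch (msaaCount)
    {
    case 1: return gResolveMat1.execute();
    case 2: return gResolveMat2.execute();
    case 4: return gResolveMat4.execute();
    default:
        assert(false && "unsupported MSAA sample count");
        return std::string();
    }
}
```

With a second variation parameter (say, a quality level) the switch would have to cover every combination, which is where the boilerplate starts to pile up.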
In short, it was messy and needed quite a lot of boilerplate code, especially for variations with a lot of parameters. I wanted something cleaner so I went with a non-templated approach. Instead, each RendererMaterial registers all of its potential variations during static initialization, and a static get method is provided that simply retrieves the requested variation. The get method always returns the same class for a given material, regardless of the variation, meaning the calling code can be identical no matter which variation is used, and we don't even need polymorphism (the RendererMaterial simply points to a different Material instance that was compiled with different #defines).
It's a simple concept but it greatly simplifies the code, at the cost of almost non-existent performance overhead (the material variation lookup). This change had me removing hundreds of lines of boilerplate code, making the whole renderer neater.
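The registry idea can be sketched roughly as follows. This is an illustration under assumptions, not the engine's implementation: `Variation`, `Material`, `ResolveMat` and the `MSAA_COUNT` define are hypothetical names, and the registry is built lazily on first use (a function-local static) rather than during static initialization, simply to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A variation is modelled as a set of #define name/value pairs (assumption).
using Variation = std::map<std::string, int>;

// Stand-in for a shader compiled with a particular set of #defines.
struct Material
{
    Variation defines;
};

// Hypothetical effect. Every variation is the same C++ class; instances
// differ only in which compiled Material they point to.
class ResolveMat
{
public:
    // Retrieves the instance registered for the requested variation.
    static ResolveMat* get(const Variation& variation)
    {
        std::map<Variation, ResolveMat>& reg = registry();
        auto it = reg.find(variation);
        assert(it != reg.end() && "variation was not registered");
        return &it->second;
    }

    // Stand-in for binding parameters and issuing the draw call.
    std::string execute() const
    {
        return "resolve with MSAA_COUNT=" +
               std::to_string(mMaterial.defines.at("MSAA_COUNT"));
    }

private:
    explicit ResolveMat(Material material) : mMaterial(std::move(material)) {}

    // Registers all known variations once, the first time it is called.
    static std::map<Variation, ResolveMat>& registry()
    {
        static std::map<Variation, ResolveMat> instance = [] {
            std::map<Variation, ResolveMat> reg;
            for (int count : {1, 2, 4})
            {
                Variation v{{"MSAA_COUNT", count}};
                reg.emplace(v, ResolveMat(Material{v}));
            }
            return reg;
        }();
        return instance;
    }

    Material mMaterial;
};
```

The calling code builds a `Variation` from the current options, calls `ResolveMat::get` and then `execute` on the result; no switch is needed, and adding a second define only changes the registration loop.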
Node based compositor
The new variation handling code solved part of the problem, but much of the renderer code was still very spaghetti-like. This is primarily because there is an extensive set of renderer options that allow you to tweak and enable/disable various effects. And each effect usually depends on some previous effect, which might or might not be enabled. All in all there end up being a lot of ifs and other kinds of conditionals, and the problem only gets worse as new effects are added, or when existing ones are updated (as making changes can be pretty hellish).
My approach to resolving this issue is a pretty simple concept, but it works really well and I haven't seen it used in other engines. Most engines, even many large AAA ones go with the spaghetti approach described above. Closest thing I've seen is a post-processing effect graph in some engines, except Banshee's version works with all renderer effects, not just post-processing. The idea is simply to split each effect into its own "node". Each node has a set of inputs, a set of outputs, and a set of dependencies.
Each node can then implement an effect like "render to G-buffer", "perform lighting", "build hi-z", etc. Internally the nodes just create the necessary output resources (like textures and buffers) and call one or multiple RendererMaterials, which further encapsulate the work required. This makes each effect usually less than 100 lines of code and very simple.
The render compositor system can then dynamically "patch up" different paths through the nodes depending on enabled options and node dependencies. No complex spaghetti code, and each effect is neatly encapsulated in its own little box, making the rendering code even easier to understand and modify. Each node pulls in its own dependencies, and nodes re-use the results of effects they share, making sure nothing unnecessary is rendered.
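As a rough illustration of how such a compositor can resolve a path, here is a minimal sketch that walks node dependencies depth-first and emits each node exactly once. All names (`Compositor`, `Node`, `RenderOptions`, the node names) are hypothetical, nodes only report their dependencies rather than render anything, and there is no cycle detection; the real system is certainly more involved.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical renderer options that influence which nodes are needed.
struct RenderOptions
{
    bool ssaoEnabled;
};

using NodeList = std::vector<std::string>;

// A node reports which other nodes it depends on under the current options.
struct Node
{
    std::function<NodeList(const RenderOptions&)> dependencies;
};

class Compositor
{
public:
    void registerNode(const std::string& name, Node node)
    {
        mNodes[name] = std::move(node);
    }

    // Returns the execution order needed to produce `target`. Nodes that
    // nothing depends on under the given options never appear in the list.
    NodeList build(const std::string& target, const RenderOptions& options) const
    {
        NodeList order;
        std::set<std::string> visited;
        visit(target, options, visited, order);
        return order;
    }

private:
    void visit(const std::string& name, const RenderOptions& options,
               std::set<std::string>& visited, NodeList& order) const
    {
        // Shared dependencies are only scheduled the first time they are seen.
        if (!visited.insert(name).second)
            return;

        auto it = mNodes.find(name);
        assert(it != mNodes.end() && "unknown node");

        for (const std::string& dep : it->second.dependencies(options))
            visit(dep, options, visited, order);

        order.push_back(name); // dependencies first, then the node itself
    }

    std::map<std::string, Node> mNodes;
};

// Example graph: lighting only pulls in SSAO (and thus hi-z) when enabled.
Compositor makeExampleCompositor()
{
    Compositor c;
    c.registerNode("GBuffer", Node{[](const RenderOptions&) { return NodeList{}; }});
    c.registerNode("HiZ", Node{[](const RenderOptions&) { return NodeList{"GBuffer"}; }});
    c.registerNode("SSAO", Node{[](const RenderOptions&) { return NodeList{"GBuffer", "HiZ"}; }});
    c.registerNode("Lighting", Node{[](const RenderOptions& o) {
        return o.ssaoEnabled ? NodeList{"GBuffer", "SSAO"} : NodeList{"GBuffer"};
    }});
    return c;
}
```

Building the path for "Lighting" with SSAO disabled skips both the SSAO and hi-z nodes without any explicit conditionals in the lighting code, and the G-buffer node is scheduled once even though three nodes depend on it.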
Additionally this approach allows for easy profiling, as each node execution can be trivially surrounded with a GPU query, as well as easy debugging (e.g. contents of each node outputs can be dumped for debugging purposes).
Most importantly it makes adding or removing effects (i.e. nodes) easy, as well as the ability to create different node trees depending on the feature set (e.g. desktop vs. mobile). Later on it should be possible to extend the system to allow users to add new nodes through a scripting interface, taking the renderer extensibility to a whole new level.
Temporal filtering
Unlike the last two changes, which focus on the design, this one can significantly improve rendering quality and/or performance. I won't go into details about it, but the gist of it is that by combining rendering results from previous frames you can spread out various shading calculations over those frames.
As one example take SSAO, which needs to take a bunch of samples for every pixel in order to determine its occlusion value. Instead of taking 64 samples every frame, you can take 8 samples every frame and then blend the effect over 8 frames. This works for a variety of effects, including SSR or just general-purpose antialiasing (e.g. MSAA takes 4 samples per frame, Temporal AA takes 1, yet the quality is nearly the same, and temporal also solves temporal aliasing issues like shimmer, so it's potentially even better quality at almost a quarter of the cost).
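The arithmetic behind the SSAO example can be sketched for a single pixel. This is only an illustration: `TemporalAccumulator`, `estimateOcclusion` and `accumulateFrames` are hypothetical names, the samples are plain floats standing in for occlusion tests, and a static scene is assumed so the history never has to be rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-pixel running average across frames. Once `maxFrames` results have
// been blended it behaves like an exponential moving average.
struct TemporalAccumulator
{
    float history = 0.0f;
    int frameCount = 0;

    float accumulate(float current, int maxFrames)
    {
        assert(maxFrames > 0);
        frameCount = std::min(frameCount + 1, maxFrames);
        history += (current - history) / static_cast<float>(frameCount);
        return history;
    }
};

// Averages `count` occlusion samples starting at index `first`; this is the
// per-frame cost (e.g. 8 samples instead of 64).
float estimateOcclusion(const std::vector<float>& samples,
                        std::size_t first, std::size_t count)
{
    assert(count > 0 && first + count <= samples.size());
    float sum = 0.0f;
    for (std::size_t i = first; i < first + count; ++i)
        sum += samples[i];
    return sum / static_cast<float>(count);
}

// Takes `samplesPerFrame` new samples each frame and blends the per-frame
// estimates over `frames` frames.
float accumulateFrames(const std::vector<float>& samples,
                       std::size_t samplesPerFrame, int frames)
{
    TemporalAccumulator acc;
    float result = 0.0f;
    for (int f = 0; f < frames; ++f)
    {
        std::size_t first = static_cast<std::size_t>(f) * samplesPerFrame;
        result = acc.accumulate(
            estimateOcclusion(samples, first, samplesPerFrame), frames);
    }
    return result;
}
```

In this idealized case, blending 8 frames of 8 samples each converges to the same value as averaging all 64 samples in one frame, up to floating point rounding.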
The main problem with this approach is of course camera and object movement. It only works if the blended pixels represent the same surface as they did last frame. If the camera and scene are still, we have no problem, but that is rarely the case. It turns out, however, that with some fairly simple heuristics (tracking camera and object movement through a velocity buffer, then comparing the distance and, more importantly, color similarity) the effect works really well.
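A heuristic of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the velocity convention (screen-space motion since the previous frame, so the previous position is `uv - velocity`), reading "distance" as a relative depth difference, the tolerance parameters, and all of the names. It is not the filter used in the engine.

```cpp
#include <cassert>
#include <cmath>

struct Vec2
{
    float x, y;
};

// What a pixel "sees": depth plus a color, for the current or previous frame.
struct Surface
{
    float depth;
    float r, g, b;
};

// Finds where the current pixel was last frame, given its screen-space
// velocity (assumed to be motion from the previous frame to this one).
Vec2 reproject(Vec2 uv, Vec2 velocity)
{
    return Vec2{uv.x - velocity.x, uv.y - velocity.y};
}

// Decides whether the history sample still represents the same surface.
bool historyIsValid(Vec2 prevUV, const Surface& current, const Surface& history,
                    float depthTolerance, float colorTolerance)
{
    // The surface was off-screen last frame, so there is no history for it.
    if (prevUV.x < 0.0f || prevUV.x > 1.0f || prevUV.y < 0.0f || prevUV.y > 1.0f)
        return false;

    // A large relative depth change suggests a different surface.
    if (std::fabs(current.depth - history.depth) > depthTolerance * current.depth)
        return false;

    // Color similarity, compared as squared distance to avoid a square root.
    float dr = current.r - history.r;
    float dg = current.g - history.g;
    float db = current.b - history.b;
    return dr * dr + dg * dg + db * db <= colorTolerance * colorTolerance;
}

// Blends toward the history when it is valid, otherwise restarts from the
// current frame's value.
float resolveTemporal(float currentValue, float historyValue, bool valid,
                      float historyWeight)
{
    assert(historyWeight >= 0.0f && historyWeight <= 1.0f);
    if (!valid)
        return currentValue;
    return currentValue + (historyValue - currentValue) * historyWeight;
}
```

In a real shader the history would be sampled from the previous frame's buffers at the reprojected position; here the caller is expected to supply that sample directly.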
I've implemented a general-purpose temporal filtering algorithm that's quite customizable, and is currently used for the SSR effect. Eventually I'll add it for the SSAO effect and also add a full on temporal anti-aliasing effect. They should be pretty simple to add now that the filtering algorithm is done, but I decided I didn't want to spend even more time on unplanned renderer features, so they'll come a bit later.
I can see this being used in a lot of areas, and many large engines have already adopted it.
I've started work on the macOS port, which is the final major feature before v1.0. The macOS specific features should be implemented fairly quickly and easily, but what worries me is that macOS only supports OpenGL 4.1. And this is a fairly ancient OpenGL version that doesn't support some major rendering effects which were based on OpenGL 4.5 (like compute :( ).
I have yet to explore this, but one solution would be to add Metal render API support. It wouldn't be impossible, but I would then also need to add cross-compilation for Metal shaders. Also not impossible, but those features would be another ~3 months of work at least, which I can't afford at the moment. It is definitely something I want to do in the future however.
Instead I'm likely going to create separate versions of the rendering effects not supported on OpenGL 4.1 (i.e. a separate rendering path). The quality loss should hopefully be minimal, and those versions will then also serve for the mobile version of the renderer (since mobiles also have limited capabilities). This means I might be adding mobile versions of the engine sooner than planned. Their ports should be even simpler than Linux/macOS, as the editor doesn't need to be ported, and the editor uses a lot of platform specific functionality that isn't used during normal runs.
Another thing I'm slowly working on is porting all the boilerplate scripting glue code to the automated ScriptBindingGenerator approach. This system allows me to "tag" different C++ elements for export, and the tool (based on Clang) does the rest - it's very intuitive and makes it super easy to export C++ code to the scripting world (takes literally seconds!). Most components/resources have already been ported, and when I'm completely done it should be fairly easy to auto-generate a set of SWIG files, exposing the entire API to other languages such as Python, Java or Lua!