raytracing 187,500 voxels in the browser with only 13 kilobytes

js13k is a web game development competition that happens every year from August 13th - September 13th. Participants are challenged during that time to build a game around a specific theme. The catch is that the final submitted zip file for the game must be less than or equal to 13 kilobytes. That is exactly 13,312 bytes.

I participated for the first time back in 2018 and won 14th place that year. Career and life obligations have prevented me from participating in the years since. Unfortunately, this year I was not able to come up with an idea for a game in time, at least not one I thought was interesting enough. I went through many prototypes, but none of them ever went further than that.

So this year, I ended up submitting my entry “F-Stop” to the “Unfinished” category as an engine tech demo.


For a while now, I have been fascinated with the game Teardown and its approach to rendering. I thought js13k was the perfect opportunity to try and implement a similar voxel engine. The challenge of getting it to run in the browser with only 13 kilobytes was also intriguing.

Teardown has a unique and somewhat unconventional approach to rendering. What stands out immediately is the size of its voxels. They are incredibly tiny compared to other voxel-based engines, such as Minecraft. And there are millions of them on the screen at once. How is this possible?

Teardown


In normal rasterization, 3D triangle data is sent to the GPU, transformed by vertex shaders, and then rasterized into the 2D pixels those triangles cover on screen. Each of these pixels can then be shaded by a fragment shader.

rasterization pipeline


In Teardown and F-Stop, the entire scene is instead rendered entirely inside these fragment shaders using ray tracing. A single ray (one for each pixel) is shot from the camera, through each pixel on the screen, and out into the scene. For example, a screen resolution of 964 x 732 would send 705,648 rays from the camera. The scene is then drawn based on what objects these rays intersect as they travel through it.

ray tracing
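As a rough sketch, generating those per-pixel rays in a fragment shader looks something like this. The uniform names (uResolution, uCameraPos, uCameraTarget, uFov) are illustrative, not taken from F-Stop:

    // Per-pixel ray generation (illustrative uniform names, not from F-Stop).
    uniform vec2 uResolution;    // canvas size in pixels
    uniform vec3 uCameraPos;     // camera position in world space
    uniform vec3 uCameraTarget;  // point the camera is looking at
    uniform float uFov;          // vertical field of view in radians

    void getCameraRay(in vec2 fragCoord, out vec3 rayOrigin, out vec3 rayDir) {
        // Map the pixel to [-1, 1] vertically, preserving aspect ratio.
        vec2 uv = (2.0 * fragCoord - uResolution) / uResolution.y;

        // Build an orthonormal camera basis looking at the target.
        vec3 forward = normalize(uCameraTarget - uCameraPos);
        vec3 right = normalize(cross(forward, vec3(0.0, 1.0, 0.0)));
        vec3 up = cross(right, forward);

        rayOrigin = uCameraPos;
        rayDir = normalize(uv.x * right + uv.y * up + forward / tan(uFov * 0.5));
    }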


Since voxels are simple box shapes, we can perform ray-box intersection tests to see if the ray hits any voxels in its path. However, this requires the world coordinates of the voxel to be known. You might be wondering, “how in the world are we supposed to do a ray-box intersection test for potentially millions of voxels, for every ray, every frame?” If you think that sounds inefficient, then you are correct.
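For reference, a single test is the standard slab method, something along these lines (this is the textbook version, not necessarily the exact code in F-Stop):

    // Slab-method ray vs. axis-aligned box test. Returns true on a hit,
    // with tNear/tFar giving the entry and exit distances along the ray.
    bool rayBoxIntersect(vec3 rayOrigin, vec3 rayDir, vec3 boxMin, vec3 boxMax,
                         out float tNear, out float tFar) {
        vec3 invDir = 1.0 / rayDir;
        vec3 t0 = (boxMin - rayOrigin) * invDir;
        vec3 t1 = (boxMax - rayOrigin) * invDir;
        vec3 tMin = min(t0, t1);
        vec3 tMax = max(t0, t1);
        tNear = max(max(tMin.x, tMin.y), tMin.z);
        tFar = min(min(tMax.x, tMax.y), tMax.z);
        return tFar >= max(tNear, 0.0);
    }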

If we want our application to actually run, there is no way we can perform millions, or even thousands, or even hundreds of individual ray-box intersection tests for every ray, every frame, even if we were NOT trying to keep the code under 13 kilobytes. It is impossible (maybe hundreds is possible on a really powerful GPU? I don't know). To solve this problem, we can store every voxel inside a 3D texture, then step through this volume and determine which voxels the ray hits along the way.

ray traversal through a voxel volume


In the F-Stop demo, the camera always sits slightly above this volume. It is a 250 x 3 x 250 texture with space for 187,500 voxels of size 0.1 each. Each voxel has a coordinate that can range from (0, 0, 0) to (249, 2, 249). Using JavaScript, we pass the texture to the shader as the uniform uVoxelTexture.
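The JavaScript side of that looks roughly like the snippet below, where gl is the WebGL2 context and program is the compiled shader program. I am also assuming a single-channel R8 format here, with 0 meaning empty and any other value acting as a material id; the actual format F-Stop uses may differ.

    // Pack the 250 x 3 x 250 grid into a flat byte array (x fastest, then y, then z).
    // The R8 single-channel format is an assumption, not necessarily what F-Stop uses.
    const W = 250, H = 3, D = 250;
    const voxels = new Uint8Array(W * H * D); // fill with 0 = empty, >0 = material id

    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_3D, tex);
    gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1); // rows are 250 bytes, not a multiple of 4
    gl.texImage3D(gl.TEXTURE_3D, 0, gl.R8, W, H, D, 0, gl.RED, gl.UNSIGNED_BYTE, voxels);
    gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

    // Bind it to texture unit 0 and point the uVoxelTexture uniform at it.
    gl.activeTexture(gl.TEXTURE0);
    gl.bindTexture(gl.TEXTURE_3D, tex);
    gl.uniform1i(gl.getUniformLocation(program, "uVoxelTexture"), 0);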



Then using GLSL texelFetch, we can quickly look up the information about any voxel in the texture.
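With the single-channel format assumed above, the lookup is essentially a one-liner (getVoxel is my own illustrative wrapper, not a name from the demo):

    uniform highp sampler3D uVoxelTexture;

    // Returns the value stored at an integer voxel coordinate.
    // With an R8 texture, the stored byte comes back normalized to 0.0..1.0,
    // so 0.0 means the voxel is empty.
    float getVoxel(ivec3 coord) {
        return texelFetch(uVoxelTexture, coord, 0).r;
    }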



This is done by determining the point where each ray first intersects with the bounding box of the volume, and then calculating the (x, y, z) coordinates of each voxel the ray hits as it travels straight through the volume. The best part is that each ray can quickly perform these tests in parallel with every other ray, every frame. The GPU magic of shaders!
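The stepping itself is essentially the classic Amanatides & Woo grid traversal. Below is a condensed sketch that reuses rayBoxIntersect and getVoxel from above, and assumes the volume starts at the world origin with voxels of size 0.1; the names and constants are illustrative rather than copied from F-Stop.

    const vec3 GRID_SIZE = vec3(250.0, 3.0, 250.0);
    const float VOXEL_SIZE = 0.1;

    // Walk the grid cell by cell, starting where the ray enters the volume
    // (tEnter comes from the ray-box test). Returns true if a solid voxel is hit.
    bool traceVoxels(vec3 rayOrigin, vec3 rayDir, float tEnter, out ivec3 hitCell) {
        rayDir += vec3(equal(rayDir, vec3(0.0))) * 1e-6; // nudge zero components to avoid division by zero

        // Convert the entry point into voxel (grid) space.
        vec3 p = (rayOrigin + rayDir * tEnter) / VOXEL_SIZE;
        ivec3 cell = ivec3(clamp(floor(p), vec3(0.0), GRID_SIZE - 1.0));

        ivec3 stepDir = ivec3(sign(rayDir));
        vec3 tDelta = abs(1.0 / rayDir);                      // t between successive grid planes
        vec3 nextPlane = vec3(cell) + max(sign(rayDir), 0.0); // next plane on each axis
        vec3 tMax = (nextPlane - p) / rayDir;                 // t until each of those planes

        for (int i = 0; i < 600; i++) {                       // hard cap on steps
            if (getVoxel(cell) > 0.0) { hitCell = cell; return true; }

            // Step to whichever neighboring cell is closest along the ray.
            if (tMax.x < tMax.y && tMax.x < tMax.z) { cell.x += stepDir.x; tMax.x += tDelta.x; }
            else if (tMax.y < tMax.z)               { cell.y += stepDir.y; tMax.y += tDelta.y; }
            else                                    { cell.z += stepDir.z; tMax.z += tDelta.z; }

            // Stop once we leave the volume.
            if (any(lessThan(cell, ivec3(0))) || any(greaterThanEqual(cell, ivec3(GRID_SIZE)))) break;
        }
        return false;
    }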

Lighting and shadows are calculated by allowing each ray to bounce around the scene multiple times every frame. The max number of bounces for each ray is set to 3, which I found to be the sweet spot between nice visuals and good performance. This is able to run pretty well (even on my not-so-great GPU) because the texture height is only 3. Unless the camera is facing straight down along the negative Y axis (Y is vertical here), many rays will never even hit the volume to begin a traversal. Instead, they immediately hit the sky and terminate. This greatly increases performance.
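A heavily simplified sketch of that per-pixel loop is below. Only the structure is meaningful: volMin, volMax, seed, skyColor, voxelAlbedo, voxelNormal, hitPosition, and randomHemisphereDir are placeholders standing in for the real sky, shading, and sampling code, which I am not reproducing here.

    // Per-pixel bounce loop (structure only; the helper functions are placeholders).
    vec3 color = vec3(0.0);
    vec3 throughput = vec3(1.0);

    for (int bounce = 0; bounce < 3; bounce++) { // max 3 bounces
        float tNear, tFar;
        ivec3 hitCell;
        if (!rayBoxIntersect(rayOrigin, rayDir, volMin, volMax, tNear, tFar) ||
            !traceVoxels(rayOrigin, rayDir, max(tNear, 0.0), hitCell)) {
            color += throughput * skyColor(rayDir); // missed everything: sky, then terminate
            break;
        }
        // Attenuate by the voxel's color, then bounce in a new randomized direction.
        throughput *= voxelAlbedo(hitCell);
        rayOrigin = hitPosition(rayOrigin, rayDir, hitCell);
        rayDir = randomHemisphereDir(voxelNormal(hitCell), seed);
    }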

It is possible to optimize further by turning the 250 x 3 x 250 grid into an octree. This data structure allows rays to skip over large empty spaces of the grid that have no voxels occupying them. However, I was not able to work out a proper octree implementation in time. I got close but it ended up costing too many bytes. The demo runs well enough without an octree, but if I were to increase the height of the grid a bit more or remove many of the randomly placed voxels (which would open up lots of empty space), then I would definitely need to implement an octree in order for the application to continue running smoothly.

Now let's talk about noise. Noise is a side effect of ray tracing that shows up whenever lighting and shadows are approximated with a limited number of random samples per pixel. It is unavoidable, but it can be cleaned up. Professionally made triple-A games with ray tracing deal with it by using special denoising filters and other techniques. I did not have that luxury here due to the 13 kilobyte constraint, but I was able to at least make it a bit easier on the eyes by turning it into blue noise.

Normally you would generate blue noise by sampling from a texture file. But that was immediately eliminated as an option due to the size of texture files. For example, I had a little 128 x 128 blue noise PNG that was 40 KB on its own. We must instead generate the noise pattern ourselves in the code. Here is a short blue noise function in GLSL that introduces randomness in a structured way to make the noise that appears in shadows more pleasant to look at when the camera moves.
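The version below is a representative reconstruction rather than the exact code from the demo: it is Jorge Jimenez's interleaved gradient noise, a dot product and a couple of fracts that produce a blue-noise-like pattern across the screen.

    // Cheap per-pixel "blue noise" style hash (interleaved gradient noise).
    // The exact function and constants in F-Stop may differ.
    float blueNoise(vec2 fragCoord) {
        return fract(52.9829189 * fract(dot(fragCoord, vec2(0.06711056, 0.00583715))));
    }

    // Usage: float n = blueNoise(gl_FragCoord.xy);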

It does this by hashing the UV pixel coordinates, creating a pseudo-random value for each pixel that is consistent but appears randomized across the screen.

Fun fact: blue noise is easier on the human eye (compared to other common noise patterns like white noise) because the distribution of photoreceptors in our eyes has a pattern often compared to blue noise. This arrangement helps us process visual information in ways that reduce the perception of visual noise and make the world around us appear smooth and continuous. I thought that was interesting.



I’ll end it here. If you want to know anything more, then let me know. If anyone actually read this far, then thank you!
