As we discussed earlier, the general features of a 3D file format are:
- Encoding geometry of the 3D model
- Storing appearance of the 3D model
- Saving scene information
- Encoding animations

1. 3D File Formats: Encoding Geometry of the 3D Model

Every 3D model has a unique geometry, and the capability of encoding this geometry can be considered the most basic feature of a 3D file format. Every 3D file format supports this; otherwise, it wouldn't be considered a 3D file format. There are three distinct ways of encoding surface geometry, each with its corresponding strengths and weaknesses: approximate mesh, precise mesh, and constructive solid geometry (CSG).
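To make the "approximate mesh" option concrete, here is a minimal Python sketch of the idea of encoding surface geometry as a list of vertices plus triangle indices. The names are made up for illustration; this is not the layout of any specific file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class TriangleMesh:
    """Approximate-mesh encoding: geometry as vertices plus triangle indices.

    Illustrative sketch only, not any particular format's on-disk layout.
    """
    vertices: List[Vec3] = field(default_factory=list)                    # 3D points
    triangles: List[Tuple[int, int, int]] = field(default_factory=list)   # indices into `vertices`

# A unit quad approximated by two triangles.
quad = TriangleMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
)
print(len(quad.vertices), "vertices,", len(quad.triangles), "triangles")
```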
I won't cover the full history of graphics here, just sprites and rasterization. Early 3D graphics were see-through and there was no hidden-line removal. Everything appeared to be made out of wire, and indeed the name given to the technology was wireframe graphics. That was the very late '70s and early '80s, but in 1982 a home computer launched that would change my world. It was the ZX Spectrum, an 8-bit computer developed by the UK company Sinclair Research. Here we can see the original model with rubber keys, and the ZX Spectrum Plus which launched at the end of 1984. Let's discuss what was going on under the hood all this time.

We saw that the wireframe graphics of early games were made of polygons. Polygons are themselves a collection of triangles, and triangles can be represented by three points in 3D space called vertices. Each vertex contains information like its position in 3D space, as well as color, texture, and its facing. A process called rasterization simply takes a stream of vertices and transforms them into the corresponding 2D image on your screen. The rasterization algorithm actually projects triangles onto the screen. In other words, we go from a 3D representation to a 2D representation of that triangle using perspective projection. The next step in the algorithm is to fill all the pixels of the image that are covered by that 2D triangle. This rasterization algorithm is object centric because we start from the geometry and project back to the camera, creating the initial image on screen. This is then further processed, or shaded, depending on elements like light sources interacting with the pixel and whatever textures are applied. What we get at the end is a final color in each pixel on our screens. A rough code sketch of this projection-and-fill step follows below.

Programming techniques and hardware advanced, but the underlying technique of creating objects out of points in 3D space and then rasterizing to the screen remains to this day. So why use this particular method? Well, first of all it's fast, very fast, allowing for over 60 frames per second. One of the major reasons why it's fast is that this is how it's always been done, and if you want to get faster at doing something in particular, it's no real surprise when the hardware evolves to get better at that task. Graphics cards are built to rasterize. So what's the problem then? We've always done it this way, and as I said in the first five minutes, graphics have improved massively over the years. Everybody knows the meme "Can it run Crysis?", and the reason we still say that is that Crysis was the last truly jaw-dropping moment in PC gaming. But it's over 10 years old, and we aren't making anywhere near the same level of advancement in graphics as we previously did. There are a couple of reasons for this, but it essentially all comes down to the problems of rasterization. It's not photorealistic; it's a hack, an approximation of real life that we can make more realistic with shadow mapping, ambient occlusion, depth of field, etc. But each new technology we develop to make games look more realistic adds to the artistic workload, which in turn increases production costs and production time. This is because rasterization, while looking good, has become so extremely complex that artists are spending too much time on the technicalities of a project rather than the art itself.
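Here is that projection-and-fill step as a rough Python sketch. The screen size, focal length, and flat "color" are invented for illustration, and a real rasterizer adds clipping, depth testing, attribute interpolation, and shading on top of this.

```python
# Minimal rasterization sketch: perspective-project one triangle and fill
# the pixels it covers.  Illustrative only.

WIDTH, HEIGHT, FOCAL = 80, 40, 30.0  # tiny "screen" and an assumed focal length

def project(v):
    """Perspective projection of a 3D point (camera at the origin, looking down +z)."""
    x, y, z = v
    return (WIDTH / 2 + FOCAL * x / z, HEIGHT / 2 - FOCAL * y / z)

def edge(a, b, p):
    """Signed-area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri3d):
    a, b, c = (project(v) for v in tri3d)          # 3D -> 2D screen coordinates
    image = [[" "] * WIDTH for _ in range(HEIGHT)]
    for py in range(HEIGHT):
        for px in range(WIDTH):
            p = (px + 0.5, py + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            # The pixel is covered if it lies on the same side of all three edges.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                image[py][px] = "#"                # flat "color"; shading would happen here
    return "\n".join("".join(row) for row in image)

print(rasterize([(-1.0, -1.0, 4.0), (1.0, -1.0, 4.0), (0.0, 1.0, 4.0)]))
```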
So what, if any, are the alternatives to rasterization? Remember, rasterization is object centric. That is, we start from the geometry and project back to the camera. One older method known as ray casting reverses this process: casting rays from the camera to the objects, one ray per pixel on the screen. When the ray hits an object, or, using the correct terminology, when an object in the scene intersects the ray, the color information of the point closest to the camera, which is determined by its texture, is recorded for that pixel. But let's say we have 1,000 by 1,000 pixels; that is 1 million rays to be cast, with each ray being many lines of code. Clearly, this is very computationally expensive, and the result ends up looking very flat, as we can see in the case of the 1992 game "Wolfenstein 3D", which actually used this rendering method. The reason it looks flat is that the color information simply comes from the texture; there's no lighting or shadow affecting any pixel.

Going back to the beginning of this video, where I mentioned NVIDIA's recent Turing presentation and their short graphics-history opening movie: part of that showed this demo from 1978, "The Compleat Angler", by a gentleman named John Turner Whitted working at Bell Labs. This was the first true ray tracing demo, where we can see shading, where some light is blocked by objects; reflection, where we see the image of one object reflected on another object; and refraction, where light passes through transparent or semi-transparent objects. This demo took almost two weeks to render.

Remember, with ray casting the final pixel color on screen simply came from the texture of the first object that intersected the ray. In the case of the first demo that would be a rather flat red color. In this new demo showing two apples on a table, using the ray casting algorithm the ray would send back the color information of the table; in this case it would be a rather flat-looking brown. But with his ray tracing algorithm, Turner introduced recursion. Now, when a ray hits the table surface, instead of the color information being sent to the display instantly, the ray can generate up to three new rays: shadow, reflection, and refraction. Shadow rays are traced from the surface towards each light source, and if any opaque object (the apples in this example) lies between the surface and the light source, then the surface must be in shadow. In this case we can see that the red apple is blocking a light source, causing the surface to be in shadow. Zooming in, we can see the ray has just hit outside of the green apple's much darker shadow. So rather than the final pixel color being updated to a flat brown color like we'd get with ray casting, we instead have a shaded brown due to ray tracing. If a ray hits a reflective surface, a reflection ray is traced at the mirror reflection angle, and the first object it intersects, in this case the table, will be seen in the reflection. Obviously this object can be in shadow too, and in this case again we can see that it is.

What's important to grasp here is that with this ray tracing algorithm, the final pixel you see on the screen is a much more complete combination of object color, materials and lighting interaction, resulting in a far more realistic image overall. Imagine trying to get the true pixel color in a case like this using rasterization hacks. If you think about it, rasterization, projecting the image to the screen from the object, seems flawed at a fundamental level.
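To contrast the two approaches in code, here is a hedged Python sketch: cast_ray is plain ray casting (return the nearest hit's flat color), while trace_ray adds Whitted-style recursion with a shadow ray toward the light and a reflection ray. The sphere scene, light position, and constants are invented for illustration, and refraction is left out to keep it short.

```python
import math

# --- tiny vector helpers -------------------------------------------------
def add(a, b): return (a[0]+b[0], a[1]+b[1], a[2]+b[2])
def sub(a, b): return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def mul(a, s): return (a[0]*s, a[1]*s, a[2]*s)
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
def norm(a):
    length = math.sqrt(dot(a, a)); return mul(a, 1.0 / length)

# Scene: spheres as (center, radius, color, reflectivity); one point light.
SPHERES = [((0.0, 0.0, 5.0), 1.0, (1.0, 0.2, 0.2), 0.3),       # "red apple"
           ((0.0, -101.0, 5.0), 100.0, (0.6, 0.4, 0.2), 0.5)]  # "table"
LIGHT = (5.0, 5.0, 0.0)

def hit_sphere(origin, direction, sphere):
    """Return the distance along the ray to the sphere, or None if it misses."""
    center, radius, _, _ = sphere
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def nearest_hit(origin, direction):
    best = None
    for s in SPHERES:
        t = hit_sphere(origin, direction, s)
        if t is not None and (best is None or t < best[0]):
            best = (t, s)
    return best

def cast_ray(origin, direction):
    """Ray casting: just return the nearest surface's flat color."""
    hit = nearest_hit(origin, direction)
    return hit[1][2] if hit else (0.0, 0.0, 0.0)

def trace_ray(origin, direction, depth=2):
    """Whitted-style ray tracing: shadow ray plus a recursive reflection ray."""
    hit = nearest_hit(origin, direction)
    if hit is None:
        return (0.0, 0.0, 0.0)
    t, (center, _, color, refl) = hit
    point = add(origin, mul(direction, t))
    normal = norm(sub(point, center))
    # Shadow ray: is anything between this point and the light?
    # (simplified: ignores whether the blocker is beyond the light)
    to_light = norm(sub(LIGHT, point))
    in_shadow = nearest_hit(point, to_light) is not None
    diffuse = 0.1 if in_shadow else max(dot(normal, to_light), 0.0)
    local = mul(color, diffuse)
    if depth == 0 or refl == 0.0:
        return local
    # Reflection ray at the mirror angle, traced recursively.
    refl_dir = norm(sub(direction, mul(normal, 2.0 * dot(direction, normal))))
    reflected = trace_ray(point, refl_dir, depth - 1)
    return add(mul(local, 1.0 - refl), mul(reflected, refl))

ray = (0.0, 0.0, 0.0), norm((0.0, -0.3, 1.0))
print("ray casting :", cast_ray(*ray))
print("ray tracing :", trace_ray(*ray))
```

The key structural difference is the recursion: trace_ray calls itself for the reflection ray, which is exactly what makes each ray traced pixel so much more expensive than a ray cast one.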
Tracing rays from the camera to the object seems a far more sensible method, and it is if you desire realistic-looking images. The major drawback of ray tracing, however, as is probably obvious to you by now, is of course that it's incredibly computationally expensive, with each individual ray being multiple lines of code and bouncing off objects to create even more rays, all for one pixel. This is why, up to now, ray tracing has generally been done offline in render farms, with single frames taking hours to render. That's a far cry from the minimum 60 frames per second requirement of high-end gaming. So perhaps it wasn't surprising when, back in May, the Unreal Engine Star Wars demo showing real-time ray tracing was met with a high degree of cynicism, even with the use of extremely high-end NVIDIA supercomputers. In the subsequent months it all became clear that NVIDIA was about to shift the gaming industry in a new direction, and with the recent release of their Turing architecture we learned that this will be done through a combination of rasterization and ray tracing: what's been called hybrid rendering, branded as RTX. At the same time, Microsoft revealed DXR, their new ray tracing update to the DirectX 12 API. Essentially, the ray tracing elements of DXR and RTX are limited to reflections, shadows, area lighting and ambient occlusion, and this is what people were actually seeing in the Unreal Engine Star Wars demo. In other words, this is not full-scene ray tracing yet; this is only ray tracing specific scene elements while the rest is still rasterized: hybrid rendering.

Turing's RTX relies heavily on denoising hardware and its Tensor Cores. Denoising is a huge part of ray tracing due to each ray being so computationally heavy. Casting a large number of rays, especially in real time, is currently out of the question with today's hardware, but unless you leave a ray tracer running long enough to fill in the scene, you end up with a lot of unpleasant-looking noise. Even with optimizations to help choose which rays are cast, you still end up with a lot of noise. Turing, I believe, only casts one or two rays per pixel in real time, and such a small number of rays is guaranteed to leave a very noisy image. Clearly, for gaming purposes, neither leaving the ray tracer running long enough to fill in the scene nor allowing the noise to remain is acceptable: the first would dramatically lower frame rate and the second would drastically lower image quality. To show what I mean, here is an example RTX scene using ray-traced shadows with hundreds of rays per pixel, a number far higher than any real-time ray tracing card is capable of today. Dropping down to only one SPP (that's one sample per pixel), more in line with Turing's real-time ray tracing, we see how noisy the shadows look. The magic occurs when Turing's denoising hardware gets to work. We see the image is cleaned up massively even at only one ray per pixel, and the final look is very close to the ground truth with over 100 rays per pixel. NVIDIA claims that their denoising hardware can do this in under one millisecond. Impressive stuff, but we can see from this Electronic Arts SEED presentation on hybrid rendering that we're really still looking at a series of hacks; that's what rasterization is today, a series of hacks. One of the main reasons for implementing ray tracing in the first place was artists trying to scale back on these hacks in order to make their lives easier and to bring down production costs.
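The samples-per-pixel idea can be shown in miniature. Below is a toy Python sketch, with an invented 2D scene and numbers, that estimates an area-light shadow term with N random shadow rays per pixel and then applies a simple box blur as a crude stand-in for a denoiser. NVIDIA's actual denoiser is a dedicated hardware/AI solution, not a blur, so treat this purely as an illustration of why one sample per pixel looks noisy.

```python
import random

# Toy 2D setup (invented numbers): ground pixels along x, an area light
# spanning x in [20, 30] at height 10, and an occluder spanning x in
# [22, 26] at height 5.  The "shadow term" per pixel is the fraction of
# shadow rays toward random points on the light that are NOT blocked.
NUM_PIXELS = 64
LIGHT_X0, LIGHT_X1, LIGHT_Y = 20.0, 30.0, 10.0
OCC_X0, OCC_X1, OCC_Y = 22.0, 26.0, 5.0

def shadow_term(px, spp, rng):
    unblocked = 0
    for _ in range(spp):
        lx = rng.uniform(LIGHT_X0, LIGHT_X1)          # random point on the area light
        x_cross = px + (lx - px) * (OCC_Y / LIGHT_Y)  # where the ray crosses the occluder's height
        if not (OCC_X0 <= x_cross <= OCC_X1):
            unblocked += 1
    return unblocked / spp

def render(spp, seed=1):
    rng = random.Random(seed)
    return [shadow_term(x + 0.5, spp, rng) for x in range(NUM_PIXELS)]

def box_blur(values, radius=2):
    """Crude stand-in for a denoiser: average each pixel with its neighbours."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def show(label, values):
    shades = " .:-=+*#"   # darker character = deeper shadow
    print(f"{label:>20}: " + "".join(shades[int((1 - v) * (len(shades) - 1))] for v in values))

show("1 spp (noisy)", render(spp=1))
show("1 spp + box blur", box_blur(render(spp=1)))
show("256 spp (reference)", render(spp=256))
```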
Hybrid rendering, in its current implementation at least, appears to be replacing one set of hacks with another, and it's still only for shadows and reflections. It's also important to realize that even full-scene ray tracing still doesn't give physically perfect results. It comes closer to the real world than rasterization, but it's no simulation of reality. Just like rasterization engines have to cheat to achieve reflections and refractions, a ray tracer has to cheat to get soft shadows, caustics and global illumination in order to achieve photorealism. RTX and DXR will also require a massive industry-wide investment from studios to adopt as a tool. Is this really worth all the time and effort, or is there a better way?
With path tracing, rays are again sent from the camera into the scene, but unlike ray tracing, which traces new rays to points of light, now the ray is bounced around the objects in the scene, collecting information like color and material with each bounce. When a ray is finished bouncing, the final result is taken as a sample, as in the samples per pixel previously discussed. Each sample is added to the average for the source pixel, and the final pixel color you see on screen is the average of all the sample values for that pixel. Tens, hundreds, or even thousands of samples can be taken for each pixel, depending on how capable your hardware is and also how long you're happy to wait. Path tracing also makes use of different sizes of lights rather than just simple point lights, and this allows for free soft shadows without the need for hacks, as larger light surfaces mean softer shadows. Tracing the path of rays also gives free color bleeding, truly intense colors when an object is intensely lit, and of course caustics from reflections and from materials like glass and water. Path tracing also behaves similarly to how light behaves in the real world. It's still not a 100% physically accurate model, of course, but this global illumination gets us pretty close to photorealism.

Let's now take a look at this fully path-traced scene inside a house with only two samples per pixel. Remember, Turing is one sample per pixel, for shadows and lighting effects only. The scene is fairly obviously a room within an apartment and we can make out most of the objects, but we'd never accept this during gameplay. When we let the denoiser get to work, the image is cleaned of the noise and we have something that looks like it's perhaps from 2004. Doubling to four samples per pixel, we still have a noisy mess; however, all scene objects are obvious, and in fact it looks vastly better than two samples did. Denoising on top of the four samples, however, and now we're getting to something we'd almost be happy to play. Doubling again to eight samples, first with noise and then denoised, and we again have an increase in the final image quality. It's especially noticeable on the walls in this example, but you can probably already figure that with samples it's very much going to be a case of diminishing returns. At fifty noisy samples we can see the image taking on a more natural color. Removing the noise, and we have a very nice-looking final render. At one thousand samples we see another improvement in the overall ambience of the scene, but again very much diminishing returns given it requires twenty times the horsepower. By this point the denoiser is basically removing the grain from the brick wall. Finally, at twelve thousand samples per pixel, call it the final, perfect image, there is almost zero difference after denoising. Checking out the difference between 50 samples per pixel and 12,000 samples... well, I'll let you decide. It does depend on other factors too. I wouldn't drink from these 8-samples-per-pixel wine glasses.
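Here is a hedged Python sketch of that loop: each path bounces around a tiny invented sphere scene, picking up emitted light and being tinted by surface colors, and the pixel value is the average over the chosen number of samples per pixel. It uses the common simplification of cosine-weighted diffuse bounces; a real path tracer adds proper materials, explicit light sampling, and many other refinements.

```python
import math, random

# Minimal diffuse path tracer sketch (invented scene and constants).

def add(a, b): return (a[0]+b[0], a[1]+b[1], a[2]+b[2])
def sub(a, b): return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def mul(a, s): return (a[0]*s, a[1]*s, a[2]*s)
def mulv(a, b): return (a[0]*b[0], a[1]*b[1], a[2]*b[2])
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
def norm(a): s = math.sqrt(dot(a, a)); return mul(a, 1.0 / s)

# Spheres: (center, radius, albedo, emission).  The big bright sphere acts
# as an area light, which is what gives soft shadows "for free".
SPHERES = [((0.0, -100.5, 3.0), 100.0, (0.7, 0.7, 0.7), (0.0, 0.0, 0.0)),  # floor
           ((0.0, 0.0, 3.0), 0.5, (0.8, 0.3, 0.3), (0.0, 0.0, 0.0)),       # red ball
           ((0.0, 5.0, 3.0), 3.0, (0.0, 0.0, 0.0), (4.0, 4.0, 4.0))]       # area light

def hit_sphere(origin, direction, sphere):
    center, radius, _, _ = sphere
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def nearest_hit(origin, direction):
    best = None
    for s in SPHERES:
        t = hit_sphere(origin, direction, s)
        if t is not None and (best is None or t < best[0]):
            best = (t, s)
    return best

def random_unit_vector(rng):
    while True:
        d = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        if 0 < dot(d, d) <= 1.0:
            return norm(d)

def trace_path(origin, direction, rng, max_bounces=4):
    """One sample: bounce the ray around, tinting by albedo and adding emission."""
    color, throughput = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
    for _ in range(max_bounces):
        hit = nearest_hit(origin, direction)
        if hit is None:
            break
        t, (center, _, albedo, emission) = hit
        point = add(origin, mul(direction, t))
        normal = norm(sub(point, center))
        color = add(color, mulv(throughput, emission))   # pick up any emitted light
        throughput = mulv(throughput, albedo)            # tint by the surface color
        # Cosine-weighted diffuse bounce (the usual "normal + random unit vector" trick).
        origin, direction = point, norm(add(normal, random_unit_vector(rng)))
    return color

def render_pixel(spp, seed=0):
    """Average `spp` path samples for one camera ray (one pixel)."""
    rng = random.Random(seed)
    total = (0.0, 0.0, 0.0)
    cam_dir = norm((0.0, -0.1, 1.0))
    for _ in range(spp):
        total = add(total, trace_path((0.0, 0.0, 0.0), cam_dir, rng))
    return mul(total, 1.0 / spp)

for spp in (2, 8, 64, 1024):
    r, g, b = render_pixel(spp)
    print(f"{spp:5d} spp -> ({r:.3f}, {g:.3f}, {b:.3f})")
```

Running it at 2, 8, 64 and 1024 samples per pixel should show the same diminishing-returns behaviour described above: the averaged value settles down as the sample count grows, with ever smaller changes per doubling.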