https://www.youtube.com/watch?v=pQcC2CqReSA
In this tutorial we're going to be talking about the different coordinate systems that you're going to be working with throughout the rest of the semester. This is going to require us to talk about matrices as well, at least at a high level, because those are what allow us to transition between these different coordinate systems.
The first coordinate system we're going to talk about is called the local coordinate system, and this is sometimes called object space. Now realize that when an artist makes a 3D model, they typically use some 3D modeling software like Maya, Blender, or 3D Studio Max. In the image that you see here I went to a site called Mixamo, downloaded a model, and then imported it into Maya. Now what's important to note is where this model is in relation to the point (0,0,0). In this case that point is directly between the monster's feet, but sometimes you'll see that the model has actually been shifted down so that that point is directly in the middle of the model. So in summary, the local coordinate system is simply the coordinate system that the model was made in.
Alright, the next coordinate system that we're going to talk about is the world coordinate system, sometimes called world space, and this is the coordinate system of the virtual environment that the model is going to be in. So for example, we may want to embed this creature into some kind of virtual world, but notice that the point between his feet is no longer at (0,0,0). Instead he's now been translated into the world coordinate system, and if we were to view this environment from the top, you can see that these worlds can be pretty large. Now the question is: how did we get the monster to go from the local coordinate system into the world coordinate system? Well, before we talk about that we need to talk about one more coordinate system.

The next coordinate system we need to talk about is called the camera coordinate system, and this is sometimes called camera space or view space. Realize that in this coordinate system everything is relative to the camera's position, and when you think of it in these terms the camera never moves. Instead the world moves around the camera. As an example of this concept, look at the image below. In this case we have a creature directly in front of us, but because the camera's position is fixed at the origin in the camera coordinate system, the creature's position in the camera coordinate system might be something like (0,0,-10). If we were to move closer to the creature, its camera space coordinates might be something like (0,0,-5). Realize that the camera is not moving forward through the environment; everything in the environment is moving around the camera.

So here's the main idea: how do we transition our creature between these different coordinate systems? Well, every model in the scene is going to have a model matrix, and what this is going to do is transform that model from object space into world space. To get that model into the camera coordinate system we're going to have a view matrix. Realize that unlike needing one model matrix per model, we're only going to need one view matrix. Now, depending on who you talk to, some people combine the model and view matrices into one matrix called (appropriately enough) the modelview matrix; however, in this class we're going to keep this information separate because it makes it easier to think about.

Now we've been using the term matrix, but we haven't talked about what a matrix is. A matrix is a mathematical structure that has the ability to translate, rotate, and scale 2D and 3D points, and usually we store matrices as a 4 by 4 array of floats in memory.
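As a concrete illustration of those two matrices, here is a minimal sketch using the GLM math library; the positions, names, and values are invented for this example and are not from the video.

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    // A vertex in the monster's local (object) space, near the origin
    // between its feet. The w = 1 marks this as a position, not a direction.
    glm::vec4 localPos(0.5f, 1.0f, 0.0f, 1.0f);

    // Model matrix: one per model, placing it somewhere in the world.
    glm::mat4 model = glm::translate(glm::mat4(1.0f),
                                     glm::vec3(30.0f, 0.0f, -12.0f));

    // View matrix: only one for the whole scene. It re-expresses the world
    // relative to the camera, which stays fixed at the origin of camera space.
    glm::mat4 view = glm::lookAt(glm::vec3(30.0f, 2.0f, -2.0f),  // camera position
                                 glm::vec3(30.0f, 1.0f, -12.0f), // point it looks at
                                 glm::vec3(0.0f, 1.0f, 0.0f));   // world "up"

    // Local space -> world space -> camera space.
    glm::vec4 worldPos  = model * localPos;
    glm::vec4 cameraPos = view * worldPos;
    return 0;
}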
Now the basic idea is that we're going to multiply each vertex of a model by a matrix, and this is going to give us a new point. So for example, we may take a vertex and multiply it by a rotation matrix to figure out where that vertex would be if it was rotated. The really cool thing about this is that your graphics card is really efficient with matrices, so we try to get everything into matrix form. Alright, now if you multiply an integer times an integer you're going to get back an integer. Similarly, if you multiply a matrix by a matrix you get back a single matrix. So the question is: how could we construct the model matrix that takes us from object space into world space? Well, imagine that we have a couple of different matrices: we have a translation matrix called T, a couple of different rotation matrices called R1 and R2, and a scaling matrix called S. All of these matrices might be necessary to correctly place our creature into the world. Initially you might think we would have an equation that looks like the one you see here; however, before we go on we have to have a meaningful discussion about matrix order.
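If it helps to see that composition in code, here is a hedged sketch with GLM; T, R1, R2, and S follow the names above, while the angles and distances are made up.

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 makeModelMatrix() {
    const glm::mat4 I(1.0f); // identity matrix
    glm::mat4 T  = glm::translate(I, glm::vec3(10.0f, 0.0f, 0.0f));                  // move 10 units along x
    glm::mat4 R1 = glm::rotate(I, glm::radians(45.0f), glm::vec3(0.0f, 1.0f, 0.0f)); // 45 degrees about y
    glm::mat4 R2 = glm::rotate(I, glm::radians(10.0f), glm::vec3(1.0f, 0.0f, 0.0f)); // 10 degrees about x
    glm::mat4 S  = glm::scale(I, glm::vec3(2.0f));                                   // uniform scale by 2

    // Matrix times matrix gives back a single matrix, so this whole chain
    // collapses into one mat4 that can be applied to every vertex. Which
    // order of multiplication is actually correct is the subject of the
    // discussion that follows.
    return T * R1 * R2 * S;
}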
Alright, when we multiply matrices together, the effects that they have on the vertex are going to occur from left to right, and to show what I mean by this we have to walk through an example. If the order of operations says first rotate by 45 degrees and then translate by 10 units, the end result is going to look like what you see here. However, if the order of operations is reversed, such that we do the translation first and then the rotation, you can see that the results are drastically different, and so the lesson learned here is that the order of matrix operations is important.
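Here is the same experiment as a runnable sketch (GLM again, with an invented starting point) so you can watch the two results diverge.

#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    const glm::mat4 I(1.0f);
    glm::mat4 R = glm::rotate(I, glm::radians(45.0f), glm::vec3(0.0f, 0.0f, 1.0f));
    glm::mat4 T = glm::translate(I, glm::vec3(10.0f, 0.0f, 0.0f));
    glm::vec4 p(1.0f, 0.0f, 0.0f, 1.0f);

    // Note: GLM multiplies column vectors on the right, so the matrix
    // written closest to the vertex takes effect first.
    glm::vec4 a = T * R * p; // rotate 45 degrees, then translate 10 units
    glm::vec4 b = R * T * p; // translate 10 units, then rotate the result

    std::printf("rotate then translate: (%.2f, %.2f)\n", a.x, a.y); // (10.71, 0.71)
    std::printf("translate then rotate: (%.2f, %.2f)\n", b.x, b.y); // (7.78, 7.78)
    return 0;
}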
So going back to how we constructed the model matrix, you can see that we actually did it backwards. Instead you'll probably want to scale first, followed by a series of rotations, and then finish with a translation.

So the last matrix we have to talk about is called the projection matrix, and its responsibility is to take 3D data and project it into 2D space. Now we have two kinds of projection. We have orthographic projection, and that means that depth doesn't really matter. The other kind of projection we have is perspective projection, and this is what's used to give depth to the scene. In other words, as objects get further away from the camera they become smaller. Now interestingly, the end result of multiplying by the projection matrix is that it gives us normalized device coordinates, and if you remember, we've seen this term before. These are coordinates between negative 1 and positive 1. Now the difference between an orthographic and a perspective projection can be subtle, but I tried to give you a demonstration of what that would look like. The left image is the orthographic projection and the right one is perspective. I've purposely put a couple of red spheres in the scene so you can see the difference between these two. In the orthographic projection you can see that those spheres remain the same size. However, if you look at the perspective projection on the right-hand side, you can see that the spheres are different sizes. In other words, the spheres that are further away from the camera appear smaller.

Now we're finally at the point that we can understand what's been going on in the vertex shader. The example that you see here is an old vertex shader that we've been using, and if you remember, the position information that we passed in as vPosition had to be in the range of -1.0 to +1.0, and if you look down here in main you can see that we've assigned the vPosition variable into gl_Position, which is in normalized device coordinates. Now in our new vertex shader a couple of things have changed. First of all, vPosition is assumed to be in the local coordinate system (or the one that it was modeled in). Also notice that we have three different matrices here called M, V, and P, and those are the model, view, and projection matrices. Where it gets interesting is how we use those down in main(). In this case we take vPosition and we multiply it by those three matrices, and that's going to give us a new position in normalized device coordinates. In other words, in this one line of code we've taken a vertex from its local coordinates and calculated its new position in normalized device coordinates. So that's it. I know that was a lot to digest, but these are pretty important concepts, so what I would recommend, as always, is to go back and review this material a couple of times.
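To tie the pieces together, here is a sketch of what the new vertex shader and its matrices might look like. The names vPosition, M, V, and P come from the transcript; the GLSL version, field of view, and positions are assumptions for illustration.

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// One plausible GLSL form of the shader described above, stored as a
// C++ string for compilation at runtime.
const char* vertexShaderSource = R"(
#version 300 es
in vec4 vPosition;   // vertex in local (object) coordinates
uniform mat4 M;      // model:      local  -> world
uniform mat4 V;      // view:       world  -> camera
uniform mat4 P;      // projection: camera -> normalized device coordinates
void main() {
    // One line takes the vertex from local space all the way to NDC.
    gl_Position = P * V * M * vPosition;
}
)";

// Host-side construction of the three matrices.
glm::mat4 M = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -10.0f));
glm::mat4 V = glm::lookAt(glm::vec3(0.0f),              // camera at the origin
                          glm::vec3(0.0f, 0.0f, -1.0f), // looking down -z
                          glm::vec3(0.0f, 1.0f, 0.0f)); // world "up"
glm::mat4 P = glm::perspective(glm::radians(60.0f),     // vertical field of view
                               960.0f / 640.0f,         // aspect ratio
                               0.1f, 100.0f);           // near and far planes
// For an orthographic projection instead (no size change with distance):
// glm::mat4 P = glm::ortho(-10.0f, 10.0f, -10.0f, 10.0f, 0.1f, 100.0f);

Each frame, the three uniforms would then be uploaded with glUniformMatrix4fv, using glm::value_ptr from <glm/gtc/type_ptr.hpp> to get at the raw floats.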
In this tutorial I'll explain why OpenGL ES needs the viewport transformation matrix, and how this matrix is derived. It's the final transformation applied to the vertices before the graphic is ready to be rendered, and having reached this stage of the OpenGL ES transformation pipeline signifies that the end is nigh. If you've missed the previous articles about how the transformation matrices work in OpenGL ES, please read "The OpenGL transformation matrices". The viewport transformation matrix is the last matrix transformation applied to the vertices before they are drawn on the physical screen. The incoming vertices are in NDC space (Normalized Device Coordinates), and the viewport matrix will transform the coordinates to actual screen pixel coordinates. This means that all the vertices that will be visible in the final render already lie in the range [-1,-1,-1] to [1,1,1]; they are all contained within the unit cube. Any vertices outside this range will not be rendered.
Before we continue with the explanation, let's take a look at the OpenGL glViewport function. The definition is:

void glViewport(GLint x, GLint y, GLsizei width, GLsizei height);

x, y: specify the lower left corner of the viewport rectangle. Usually, this value would be (0,0).
width, height: specify the width and height of the viewport.

Passing the full screen dimensions gives a viewport covering the whole display. If, on the other hand, we want to cover only a portion of the screen, say we want our OpenGL viewport to be in the top-right corner of the display, we'd pass (480,320,480,320) as arguments. This indicates that the viewport's bottom-left corner will be at position (480,320), its width will be 480 pixels, and its height will be 320 pixels. This effectively sets the viewport in the top-right quarter of the screen.

Now, getting back to the viewport matrix: the reason for squashing the vertices into the space of a unit cube is that it makes the mapping onto an arbitrarily sized screen very easy. To understand this better, let's work with real numbers, taking the x-components of the vertices as an example. We know that if we had to look at the x-component of all the incoming vertices, they would all lie somewhere in the range between -1 and 1 (thanks to the projection matrix). Let's work with a viewport that covers the entire screen, so the arguments passed into the glViewport() function would be (0,0,960,640). Our iPhone screen is 960 pixels wide and 640 pixels high. So if the x-component of a vertex happens to be -1, we want it to be transformed by the viewport matrix to be at position 0. If the x-component value is 1, then we'd want it to be transformed so it has a value of 960. If you're following, you'll see we are transforming the x-components into physical screen pixel values. We will do the same for the y-components as well, but in this case we'll use the height of the screen instead of the width. The z-components are not mapped onto the screen; rather, they're placed into the depth buffer (if it has been created). Placing info into the depth buffer ensures that our objects appear the way they're supposed to in the scene: it places the objects at the correct distance from the camera when being drawn, i.e. it places the objects in front of or behind each other as we'd expect. In the following example, I've derived the mapping of the x-component; the y and z components follow a similar derivation and are not shown. In my derivation, I've used $x_w$ to indicate the x value for the window, which is my screen; I use the two terms interchangeably.
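The derivation itself appeared as a figure in the original article; what follows is a reconstruction of the standard linear mapping it describes, written out for the full-screen viewport (x, y, w, h) = (0, 0, 960, 640).

% Map x_{ndc} \in [-1, 1] linearly onto the viewport's x range [x, x + w]:
% shift [-1, 1] up to [0, 2], halve it to [0, 1], scale by w, offset by x.
\begin{align*}
x_w &= \frac{x_{ndc} + 1}{2}\, w + x \\
    &= \frac{w}{2}\, x_{ndc} + \left( x + \frac{w}{2} \right)
\end{align*}
% Sanity check with (x, w) = (0, 960):
%   x_{ndc} = -1  gives  x_w = 0    (left edge of the screen)
%   x_{ndc} = +1  gives  x_w = 960  (right edge of the screen)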
So, in summary, the vector that the NDC stage outputs is a 4D vector $\begin{pmatrix} x & y & z & 1 \end{pmatrix}^T$. The x and y components of the vector now have to be mapped onto a 2D screen, and since our screen is a standard iPhone screen with a resolution of 960×640 in landscape mode, we have $l_w = 960$ and $h_w = 640$, where the bottom-left coordinate is (0,0) and the top-right coordinate is (960,640). The x and y components are mapped from [-1,1] to the screen size, and the z-component is mapped from [-1,1] into the depth buffer, ranging by default over [0,1], where it is used to determine whether the fragment is visible or not.

And by the way, concerning the depth buffer: the glDepthRange function sets the viewport matrix's $Z_{min}$ and $Z_{max}$ values. The $Z_{min}$ and $Z_{max}$ values indicate the depth range into which the scene will be rendered. Most applications will set these values to 0.0 and 1.0 to enable OpenGL to render to the entire range of depth values in the depth buffer. In some cases, you can achieve special effects by using other depth ranges. For instance, to render a heads-up display in a game, you can set both values to 0.0 to force OpenGL to render objects in a scene in the foreground, or you might set them both to 1.0 to render an object that should always be in the background. By default, the depth buffer's range is 0 to 1, where 0.0 is closest to the viewer and 1.0 is furthest away. The OpenGL function glDepthRange(n, f) sets the values of the depth range, and the function glViewport(x, y, w, h) sets the screen dimensions.
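Putting the x, y, and z mappings together, and writing $n$ and $f$ for the glDepthRange values ($Z_{min}$ and $Z_{max}$), the viewport matrix consistent with everything above takes the standard form below; applied to $(x_{ndc}, y_{ndc}, z_{ndc}, 1)^T$ it produces window coordinates.

% Viewport matrix for glViewport(x, y, w, h) and glDepthRange(n, f):
M_{viewport} =
\begin{pmatrix}
  \frac{w}{2} & 0           & 0               & x + \frac{w}{2} \\
  0           & \frac{h}{2} & 0               & y + \frac{h}{2} \\
  0           & 0           & \frac{f - n}{2} & \frac{f + n}{2} \\
  0           & 0           & 0               & 1
\end{pmatrix}
% With the defaults n = 0 and f = 1, the third row maps z_{ndc} = -1 to 0
% (closest to the viewer) and z_{ndc} = +1 to 1 (furthest away).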