## DOT3 Versus Tangent-Space Bump-Mapping

One concept which has been used often in the design of Fragment Shaders and/or Materials, is “DOT3 Bump-Mapping”. The way in which this scheme works is rather straightforward. A Bump-Map, which is being provided as one (source) texture image out of several, does not define coloration, but rather relief, as a kind of Height-Map. And it must first be converted into a Normal-Map, which is a specially-formatted type of image, in which the Red, Green and Blue component channels for each texel are able to represent floating-point values from (-1.0 … +1.0) , even though each color channel is still only an assumed 8-bit pixel-value belonging to the image. There are several ways to do this, out of which one has been accepted as standard, but then the Red, Green and Blue channels represent a Normal-Vector and its X, Y, and Z components.

The problem arises in the design of simple shaders, that this technique offers two Normal-Vectors, because an original Normal-Vector was already provided, and interpolated from the Vertex-Normals. There are basically two ways to blend these Normal-Vectors into one: An easy way and a difficult way.

Using DOT3, the assumption is made that the Normal-Map is valid when its surface is facing the camera directly, but that the actual computation of its Normal-Vectors was never extremely accurate. What DOT3 does is to add the vectors, with one main caveat. We want the combined Normal-Vector to be accurate at the edges of a model, as seen from the camera-position, even though something has been added to the Vertex-Normal.

The way DOT3 solves this problem, is by setting the (Z) component of the Normal-Map to zero, before performing the addition, and to normalize the resulting sum, after the addition, so that we are left with a unit vector anyway.

On that assumption, the (X) and (Y) components of the Normal-Map can just as easily be computed as a differentiation of the Bump-Map, in two directions. If we want our Normal-Map to be more accurate than that, then we should also apply a more-accurate method of blending it with the Vertex-Normal, than DOT3.

And so there exists Tangent-Space Mapping. According to Tangent-Mapping, the Vertex-Normal is also associated with at least one tangent-vector, as defined in model space, and a bitangent-vector must either be computed by the Vertex Shader, or provided as part of the model definition, as part of the Vertex Array.

What the Fragment Shader must next do, after assuming that the Vertex- Normal, Tangent and Bitangent vectors correspond also to the Z, X and Y components of the Normal-Map, and after normalizing them, since anything interpolated from unit vectors cannot be assumed to have remained a unit vector, is to treat them as though they formed the columns of another matrix, IF Mapped Normal-Vectors multiplied by this texture matrix, are simply to be rotated in 3D, into View Space.

(Above Corrected 07/05/2018 . )

I suppose I should add, that these 3 vectors were part of the model definition, and needed to find their way into View Space, before building this matrix. If the rendering engine supplies one, this is where the Normal Matrix would come in – once per Vertex Shader invocation.

Ideally, the Fragment Shader would perform a complete Orthonormalization of the resulting matrix, but to do so also requires a lot of GPU work in the FS, and would therefore assume a very powerful graphics card. But an Orthonormalization will also ensure, that a Transposed Matrix does correspond to an Inverse Matrix. And the sense must be preserved, of whether we are converting from View Space to Tangent-Space, or from Tangent-Space into View Space.

## Modern Photogrammetry

Modern Photogrammetry makes use of a Geometry Shader – i.e.  Shader which starts with a coarse grid in 3D, and which interpolates a fine grid of microplygons, again in 3D.

The principle goes, that a first-order, approximate 3D model provides per-vertex “normal vector” – i.e. vectors that always stand out at right angles from the 3D model’s surface in an exact way, in 3D – and that a Geometry Shader actually renders many interpolated points, to several virtual camera positions. And these virtual camera positions correspond in 3D, to the assumed positions from which real cameras photographed the subject.

The Geometry Shader displaces each of these points, but only along their interpolated normal vector, derived from the coarse grid, until the position which those points render to, take light-values from the real photos, that correlate to the closest extent. I.e. the premise is that at some exact position along the normal vector, a point generated by a Geometry Shader will have positions on all the real camera-views, at which all the real, 2D cameras photographed the same light-value. Finding that point is a 1-dimensional process, because it only takes place along the normal vector, and can thus be achieved with successive approximation.

(Edit 01/10/2017 : To make this easier to visualize. If the original geometry was just a rectangle, then all the normal vectors would be parallel. Then, if we subdivided this rectangle finely enough, and projected each micropolygon some variable distance along that vector, There would be no reason to say that there exists some point in the volume in front of the rectangle, which would not eventually be crossed. At a point corresponding to a 3D surface, all the cameras viewing the volume should in principle have observed the same light-value.

Now, if the normal-vectors are not parallel, then these paths will be more dense in some parts of the volume, and less dense in others. But then the assumption becomes, that their density should never actually reach zero, so that finer subdivision of the original geometry can also counteract this to some extent.

But there can exist many 3D surfaces, which would occupy more than one point along the projected path of one micropolygon – such as a simple sphere in front of an initial rectangle. Many paths would enter the sphere at one distance, and exit it again at another. There could exist a whole, complex scene in front of the rectangle. In those cases, starting with a coarse mesh which approximates the real geometry in 3D, is more of a help than a hindrance, because then, optimally, again there is only one distance of projection of each micropolygon, that will correspond to the exact geometry. )

Now one observation which some people might make, is that the initial, coarse grid might be inaccurate to begin with. But surprisingly, this type of error cancels out. This is because each microploygon-point will have been displaced from the coarse grid enough, that the coarse grid will finally no longer be recognizable from the positions of micropolygons. And the way the micropolygons are displaced is also such, that they never cross paths – since their paths as such are interpolated normal vectors – and so no Mathematical contradictions can result.

To whatever extent geometric occlusion has been explained by the initial, coarse model.

Granted, If the initial model was partially concave, then projecting all the points along their normal vector will eventually cause their paths to cross. But then this also defines the extent, at which the system no longer works.

But, According to what I just wrote, even the lighting needs to be consistent between one set of 2D photos, so that any match between their light-values actually has the same meaning. And really, it’s preferable to have about 6 such photos…

Yet, there are some people who would argue, that superior Statistical Methods could still find the optimal correlations in 1-dimensional light-values, between a higher number of actual photos…

One main limitation to providing photogrammetry in practice, is the fact that the person doing it may have the strongest graphics card available, but that he eventually needs to export his data to users who do not. So in one way it works for public consumption, the actual photogrammetry will get done on a remote server – perhaps a GPU farm, but then simplified data can actually get downloaded onto our tablets or phones, which the mere GPU of that tablet or phone is powerful enough to render.

But the GPU of the tablet or phone is itself not powerful enough, to do the actual successive approximation of the micropolygon-points.

I suppose, that Hollywood might not have that latter limitation. As far as they are concerned, all their CGI specialists could all have the most powerful GPUs, all the time…

Dirk

P.S. There exists a numerical approach, which simplifies computing Statistical Variance in such a way, that Variance can effectively be computed between ‘an infinite number of sample-points’, at a computational cost which is ‘only proportional to the number of sample-points’. And the equation is not so complicated.
 s = Mean(X2) - ( Mean(X) )2 
(Next)