Modern Photogrammetry

Modern Photogrammetry makes use of a Geometry Shader – i.e.  Shader which starts with a coarse grid in 3D, and which interpolates a fine grid of microplygons, again in 3D.

The principle goes, that a first-order, approximate 3D model provides per-vertex “normal vector” – i.e. vectors that always stand out at right angles from the 3D model’s surface in an exact way, in 3D – and that a Geometry Shader actually renders many interpolated points, to several virtual camera positions. And these virtual camera positions correspond in 3D, to the assumed positions from which real cameras photographed the subject.

The Geometry Shader displaces each of these points, but only along their interpolated normal vector, derived from the coarse grid, until the position which those points render to, take light-values from the real photos, that correlate to the closest extent. I.e. the premise is that at some exact position along the normal vector, a point generated by a Geometry Shader will have positions on all the real camera-views, at which all the real, 2D cameras photographed the same light-value. Finding that point is a 1-dimensional process, because it only takes place along the normal vector, and can thus be achieved with successive approximation.

(Edit 01/10/2017 : To make this easier to visualize. If the original geometry was just a rectangle, then all the normal vectors would be parallel. Then, if we subdivided this rectangle finely enough, and projected each micropolygon some variable distance along that vector, There would be no reason to say that there exists some point in the volume in front of the rectangle, which would not eventually be crossed. At a point corresponding to a 3D surface, all the cameras viewing the volume should in principle have observed the same light-value.

Now, if the normal-vectors are not parallel, then these paths will be more dense in some parts of the volume, and less dense in others. But then the assumption becomes, that their density should never actually reach zero, so that finer subdivision of the original geometry can also counteract this to some extent.

But there can exist many 3D surfaces, which would occupy more than one point along the projected path of one micropolygon – such as a simple sphere in front of an initial rectangle. Many paths would enter the sphere at one distance, and exit it again at another. There could exist a whole, complex scene in front of the rectangle. In those cases, starting with a coarse mesh which approximates the real geometry in 3D, is more of a help than a hindrance, because then, optimally, again there is only one distance of projection of each micropolygon, that will correspond to the exact geometry. )

Now one observation which some people might make, is that the initial, coarse grid might be inaccurate to begin with. But surprisingly, this type of error cancels out. This is because each microploygon-point will have been displaced from the coarse grid enough, that the coarse grid will finally no longer be recognizable from the positions of micropolygons. And the way the micropolygons are displaced is also such, that they never cross paths – since their paths as such are interpolated normal vectors – and so no Mathematical contradictions can result.

To whatever extent geometric occlusion has been explained by the initial, coarse model.

Granted, If the initial model was partially concave, then projecting all the points along their normal vector will eventually cause their paths to cross. But then this also defines the extent, at which the system no longer works.

But, According to what I just wrote, even the lighting needs to be consistent between one set of 2D photos, so that any match between their light-values actually has the same meaning. And really, it’s preferable to have about 6 such photos…

Yet, there are some people who would argue, that superior Statistical Methods could still find the optimal correlations in 1-dimensional light-values, between a higher number of actual photos…

One main limitation to providing photogrammetry in practice, is the fact that the person doing it may have the strongest graphics card available, but that he eventually needs to export his data to users who do not. So in one way it works for public consumption, the actual photogrammetry will get done on a remote server – perhaps a GPU farm, but then simplified data can actually get downloaded onto our tablets or phones, which the mere GPU of that tablet or phone is powerful enough to render.

But the GPU of the tablet or phone is itself not powerful enough, to do the actual successive approximation of the micropolygon-points.

I suppose, that Hollywood might not have that latter limitation. As far as they are concerned, all their CGI specialists could all have the most powerful GPUs, all the time…


P.S. There exists a numerical approach, which simplifies computing Statistical Variance in such a way, that Variance can effectively be computed between ‘an infinite number of sample-points’, at a computational cost which is ‘only proportional to the number of sample-points’. And the equation is not so complicated.

s = Mean(X2) - ( Mean(X) )2


If somebody would like to implement such a Geometry Shader, here are a few caveats, illustrated with some of my own, personal pseudocode.

In reality, rather than performing a successive approximation, because we are using a GPU, we have the luxury of simplifying the problem a tad. The interpolated normal-vector needs to be normalized again – as usual – after which to scalar-multiply it by a negative distance works as well as by a positive one. Thus, if we wanted the minimum and maximum displacement to be (-Dmax, +Dmax), then the actual function to determine the multiplier with this unit vector would be,

M(step, tess) := 2.0 * Dmax * step / tess - Dmax;

Since it would be decided, to tessellate the initial mesh geometry by some factor to arrive at a number of discrete micropolygons, that same tessellation factor could also be used for the displacement-trials for each micropolygon.

Further, the real-world assumption should be that the data is always slightly askew, meaning that although possible, a single, perfect solution is improbable. The algorithm should keep a record of the best-candidate displacement Do and the best-candidate deviation So. Then, if the currently-tried displacement D produces a deviation S which is 10% smaller than So, the displacement whose absolute is closer to the original geometry should should prevail. Thus, we would obtain:


Probe() {
  step = 0;
  Do = 0.0;
  So = SomeRidiculouslyHighNumber;
  StepO = tess / 2;
  Vis = 0;

  do {
    D = M(step, tess);
    S = FindVariance( D ... );

    if ((S < So * 0.9) ||
      (S + So == 0.0 ||
        abs(S - So) / (S + So) < 0.05264 ) &&
      abs(D) < abs(Do) ) ) {
        Do = D;
        So = S;
        StepO = step;
        Vis = 255;

  } while (step <= tess) ;


Another question which people might justifiably ask would be, ‘How is this GS supposed to output its results?’

The scheme I have suggested is, that this GS like any other GS, generates a series of 3D coordinates in model-space, which can be transformed into screen-space for any of the cameras, which have provided a photograph, using a matrix. But, using a matching matrix, the GS actually transforms these coordinates into texture-coordinates, to sample each provided photo as a texture instead.

In theory the same model-view matrix can be used, if the following facts are considered:

  • Screen Output Coordinates are floating-point values from ( -1.0 … +1.0 ) while Texture Sampling Coordinates have a range from [ 0.0 … + 1.0 ) , to cover the same surface. Hence, the output from a unified matrix would need to be given an additional, constant transformation.
  • The coordinate space is a projection space either way. This means that the 4th (W) vector-element does not stay equal to 1.0, and the actual, linear 3D coordinates need to be divided by the Z coordinate at some point in time, to arrive at surface coordinates that are tangent functions of former angles. This means, that the algorithm inherent in the natural rendering system, needs to be known to write the final shader code, in order for the behavior of Texture Sampling Coordinates truly to match. Some of us are not 100% clear what algorithm our rendering system applies.
  • More specifically, it does not follow from the rules of Physics, whether the rendering engine divides the vector by its own (3rd) Z-component, or by its own (4th) W-component. It does either invisibly to most Devs, when we tell the engine just to accept our vector, the way the provided matrix formats it, as the position-vector of a vertex, to the Fragement Shader stage. If it naturally divides by (Z), then (W) is being used to set up the Z-Buffer, while if it divides everything by (W), then (Z) is actually being used to set up the Z-buffer. Either way, some unadulterated record of (Z) must be passed to the rasterizer downstream, because the interpolations that take place on their way to the Fragment Shader are corrected for perspective, using ‘real Z’. Due to the versatility of Linear Algebra, the difference between these two scenarios is considered trivial. Two rows can simply be switched, without changing how Linear Algebra works in any way. But when a render-target needs to double as a texture, the difference is critcal, because the Dev needs to duplicate whatever is being used, in his own shader code. I have encapsulated this step in a function I simply named ‘Clip()’.
  • The Model-View Matrix simply transforms all coordinates ‘into View Space’, which means that they are still Linear and three-dimensionally correct, only that they have the camera-position at their center, instead of the model-position. Here, the 4th, (W) component stays equal to 1.0 . The Model-View-Projection Matrix is provided by any one rendering engine, and is used in practice, to format the coordinates, so that the rasterizer can both make 2D screen coordinates out of them, and determine what the fractional Z-buffer-value should be, from [ 0.0 … 1.0 ) , 0.0 corresponding to (Real-Z == clip-near), and 1.0 corresponding either to (Real-Z == clip-far) or (Real-Z -> Infinity) . Many game-devs are not told how to do more, and the difference between the two vectors generated is slight. The actual (X) and (Y) components generated, are the same, unless we have also specified a non-default viewing arc, which could be wider or narrower. Because of this, (X) and (Y) will simply be scaled differently by Model-View-Projection, from how it is by Model-View…


However, just as any vertex is usually U,V-mapped to an actual Texture, each of the vertices of the original mesh should also be U,V-mapped, to a 2D image, which is to become the height-map of the results of this Geometry Shader.

After completing its computations for the current Point, the GS substitutes the interpolated U,V-coordinates as its screen-position, to render the Point to, as a point-sprite. And the pixel-value to render, will have to become

int Pixel.r = StepO * 255 / tess; Pixel.a = Vis;

For the sake of argument, a subsequent Fragment Shader stage can rasterize this point-sprite and fill in the output, height-map.

So then, the only question which remains will be, how the GS manages to interpolate the per-vertex attributes, which were fed in by the Vertex Shader. In this capacity, the GS will have to act as a kind of Tessellator, which runs its own systematic loop, nested 2 deep, with each Point it aims to output as one iteration.


mat4x4 Tex2Screen =
  {{2.0, 0.0,  0.0, -1.0},
   {0.0, 2.0,  0.0, -1.0},
   {0.0, 0.0, -1.0,  0.0},
   {0.0, 0.0,  0.0,  1.0}};

main() {
  for (int i = 0; i <= tess; i++) {
    for (int j = 0; j <= i; j++) {
      Vertex1Weight = float(tess - i)/tess;
      Vertex2Weight = float(j)/tess;
      Vertex3Weight = float(i - j)/tess;

      Position = Vertex1.Position * Vertex1Weight;
      Position += Vertex2.Position * Vertex2Weight;
      Position += Vertext3.Position * Vertex3,Weight;

      Normal = Vertex1.Normal * Vertex1Weight;
      Normal += Vertex2.Normal * Vertex2Weight;
      Normal += Vertex3.Normal * Vertex3Weight;

      HeightMapUV = Vertex1.uv * Vertex1Weight;
      HeightMapUV += Vertex2.uv * Vertex2Weight;
      HeightMapUV += Vertex3.uv * Vertex3Weight;

//  Actual Geometry Shader code begins below here:



      ViewPosition =
        Vec4(Vec3(HeightMapUV, 1.0) *
          (clip-near + 2.5), 1.0);

      Position = ViewPosition *
        Tex2Screen * ProjectionMatrix;
      //  Output Pixel();



Now, A rendering device with a powerful GPU would not need to do the derivation of this output Heightmap-Value, only to follow it, to find the derived micropolygon-position of each Point. This means that the model on the rendering device needs to be U,V-mapped exactly corresponding to this example, but to the image as one of its textures, which the above example rendered to.

Contrarily, for a rendering device with a low-powered GPU, the output from the above example needs to be processed by a CPU-based program, which decides an arbitrarily modest value for (tess), and which turns the micropolygons into vertices, stored in a vertex array in the more traditional way, so that no GS would be needed to render that.

Obviously, values for (tess) below 2 will not work properly, because they lack an effort to test (Do == 0.0) for suitability, only outputting the possibilities of (Do == -Dmax , Do == +Dmax) . So their results should be regarded as always inferior to the original geometry. The use of even-numbered values for (tess) is not absolutely necessary, especially in the case of higher values, but at least the even-numbered ones will produce (Do == 0.0) as one of their trials.

Should the reader need to output something other than ‘point-sprites’, let us say because he has found a better way to translate the height-map into an actual model geometry, based on GPU, then the best alternative I could suggest would be a GS output topology of ‘triangle-strip’, since at least that follows the logic of my tessellation loops partially. Yet then, the loops above need to be modified, to produce one triangle-strip for every value of (i) greater than zero, and then additional vertex-weights for (i-1) must be computed for every iteration. This exercise would be a bit beyond the scope of the present posting, but still possible… In practice that would require that a function be defined, which sets only 3 vertex-weight-variables, given parameters that state (i) and (j), but which is reused 3 times, starting from (j == 1) within each triangle strip, after which the simpler operation can be repeated, to fetch a height-map texel, displace a Point, and then Emit that Point according to true geometry.


SetWeights(i2, j2) {
      Vertex1Weight = float(tess - i2)/tess;
      Vertex2Weight = float(j2)/tess;
      Vertex3Weight = float(i2 - j2)/tess;;

//  Actual Geometry Shader code begins below here:

FollowHeightMap() {

      /*  Todo

      (Blend Position)
      (Blend Normal-Vector)
      (Blend U,V-heightmap coordinates)
      (Fetch former StepO, Vis)
      (Compute Displacement using M() )
      (Displace Position)
      (Transform into Screen-Space)



RenderHeightMap() {
  for (int i = 1; i <= tess; i++) {
    for (int j = 1; j <= i; j++) {

      SetWeights(i, j-1);

      SetWeights(i-1, j-1);

      SetWeights(i, j);



Interpolated normal-vectors are best just passed through.


Finally, to compute Variance (required earlier):


Mat4x4 Screen2Tex =
  {{0.5, 0.0, 0.0, 0.5},
   {0.0, 0.5, 0.0, 0.5},
   {0.0, 0.0, 1.0, 0.0},
   {0.0, 0.0, 0.0, 1.0}};

Vec2 ClipSpace(Vec4 ViewPos) {
  return Vec2(ViewPos * (1 / -ViewPos.z))

float FindVariance(float D, Vec3 Position,
    Vec3 Normal, Mat4x4 MA[],
    Texture TA[], Texture32-bit SMapA[]) {
  Vec3 Position2;
  int N = 0;
  int VisN = 0;
  float XSqA = 0.0;
  float YSqA = 0.0;
  float XA = 0.0;
  float YA = 0.0;
  Texel T;
  Texel32-bit SMapT;
  Vec2 coords;
  int Lum = 0;
  float temp = 0.0;

  while (N < NumPictures) {
    Position2 =
      Vec4(Position + (D * Normal), 1.0) *
    coords =
      ClipSpace(Position2 * Screen2Tex);

    SMapT = LoadTexel(SMapA[N], coords);

    if (Position2.z > SMapT - 0.01) {

      T = LoadTexel(TA[N], coords);

      Lum = max(1, T.r + T.g + T.b);

      //  2-channel approach:
      temp = ( float(T.r) / Lum );
      XSqA += temp * temp;
      XA += temp;

      temp = ( float(T.g) / Lum );
      YSqA += temp * temp;
      YA += temp;


  if (VisN < 2) {
    return SomeRidiculouslyHighNumber;
  } else {
    return ( ((XSqA + YSqA) / VisN) -
      ((XA / VisN) * (XA / VisN)) -
      ((YA / VisN) * (YA / VisN)) );


(Edit 01/08/2017 : )

Now, existing, commercial-grade tessellators allow the graphics specialists to specify 3 separate intervals of tessellation, one for each edge of the triangle input. Some brain-tree has figured out a way to compute 3 vertex-weights from that. I am happy for him, but consider that question to be a waste of my time.

Further, if we have s sufficiently-high specification of Shader Model, we can use PN-Triangles, as opposed to linearly-interpolated ones, where PN-Triangles have Positions according to cubic splines, and Normals according to quadratic splines. Yet, I do not see us needing them here.

However, should it be needed, I can offer an algorithm which defines 2 degrees of tessellation. One reason I did not pursue this at first, was my own idea that the order in which 3 vertices are being handed to the tessellator is unknown. Assuming that (i) defines (not(Vertex1)), and that (jMax) caps (Vertex2 / Vertex 3), the way in which the corresponding code-block would need to be modified is such:


    // Inside the (i) loop, to control the
    // (j) loop:
    int jMaxl = min(jMax, i);

      Vertex1Weight = float(iMax - i)/iMax;
      Vertex2Weight = float(i * j)/iMax/jMaxl;
      Vertex3Weight = 1.0 - Vertex1Weight - Vertex2Weight;

    // If generating a triangle-strip, append:

    if (i > jMaxl) {
      Compute(i - 1, jMaxl);


The next question a reader might ask would be,

‘My tablet has GLES 3.4 . Why am I still not able to download such a model in the best format (as a Height-Map), and why must I nevertheless download it as simplified geometry, to be rendered from a static vertex-array, befitting of GLES 2.x?’

The answer to this is the fact that given DirectX 11, which is equivalent to OpenGL 4, we should leave it up to the explicit pipeline-stages, to implement our tessellator. Doing so will also do us the favor, of getting rid of the nested ‘for()‘ loops, which many graphics cards do not have the shader language necessary, to allow.

Practical examples of this approach will have a GS which really begins from the ‘Probe()‘ or the ‘FollowHeightMap()‘ functions above, where I took care only to use ‘while()‘ clauses.

The GPUs that allow nested ‘for()‘ loops, are also the ones which support OpenCL and CUDA fully, and which support OpenGL 4.

In fact, I suspect that my function ‘FindVariance()‘ is so long, it actually needs to run as a subroutine, while the default for compiling high-level shader-code is ‘inline‘ (with conditional execution), and then the distance the ‘while()‘ clause inside ‘Probe()‘, would ask the GPU to branch back, is too large a number of GPU instructions, for which practical GPUs set limits. And this would be, because each GPU core is fed instructions as a stream with a limited buffer-size.

Does your tablet actually support GLES 4?

Also, ‘Conditional Execution’ deserves some explanation. High-Level shader languages often allow them to be written in a language that resembles C. But, actual GPU instructions are not as complex as main-CPU instructions. They work with a conditional-execution bit, that controls whether the following instructions are executed or not. So that this bit can be reset, each ‘if ()‘ instruction also has an unconditional execution bit, which tells the GPU whether to ignore the conditional bit, and always to execute the instruction.

Above, in my function ‘Probe()‘, I gave an ‘if ()‘ clause that was slightly convoluted. Shader-programmers have no guarantee that their compiler can do the job, required for such complex conditions. If not, they must unravel such conditional clauses manually, in low-level form. So, if I may express the unconditional bit with a preceding exclamation-point, this is how my example unravels:


  while (step <= tess) {
    D = M(step, tess);
    S = FindVariance( D ... );

    if ((S < So * 0.9) ||
      (S + So == 0.0 ||
        abs(S - So) / (S + So) < 0.05264 ) &&
      abs(D) < abs(Do) ) ) {
        Do = D;
        So = S;
        StepO = step;
        Vis = 255;


//  Flattens to:

  D = M(step, tess);
  S = FindVariance( D ... );

  float temp1 = 0.0;
  if (S + So == 0.0) UNSET_EXEC;
  temp1 = abs(S - So) / (S + So);
! if (temp1 < 0.05264) SET_EXEC;
  if (abs(D) >= abs(Do)) UNSET_EXEC;
! if (S < So * 0.9) SET_EXEC;
  Do = D;
  So = S;
  StepO = step;
  Vis = 255;
! if (step <= tess) SET_EXEC;
  branch Label;


I realize that this turns my earlier ‘while()‘ clause, into a ‘do ... while()‘ clause, meaning that the first pass is no longer conditional. But my human understanding tells me that (step) was initialized to zero, so that the result is the same. The compiler may not understand this, as its main reason not to give the ability to execute my conditional clause, conditionally.


Print Friendly, PDF & Email

4 thoughts on “Modern Photogrammetry”

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>