Modern Photogrammetry

Modern Photogrammetry makes use of a Geometry Shader, i.e. a Shader which starts with a coarse grid in 3D, and which interpolates a fine grid of micropolygons, again in 3D.

The principle goes, that a first-order, approximate 3D model provides per-vertex “normal vectors”, i.e. vectors that always stand out at right angles from the 3D model’s surface, and that a Geometry Shader actually renders many interpolated points to several virtual camera positions. These virtual camera positions correspond, in 3D, to the assumed positions from which real cameras photographed the subject.
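To make the interpolation step concrete, here is a minimal sketch in Python rather than in actual shader code. It assumes that the coarse grid consists of triangles, and that points and normals are blended with barycentric weights. The names `fine_points`, `blend` and `normalize`, as well as the sample triangle, are just made up for this illustration.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def blend(a, b, c, wa, wb, wc):
    """Weighted sum of three 3D vectors."""
    return tuple(wa * x + wb * y + wc * z for x, y, z in zip(a, b, c))

def fine_points(positions, normals, steps):
    """Return (point, unit_normal) pairs on a regular barycentric grid
    laid over one coarse triangle. 'steps' is the subdivision level."""
    result = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            wa = i / steps
            wb = j / steps
            wc = 1.0 - wa - wb
            point = blend(*positions, wa, wb, wc)
            # Blended normals are generally not unit length, so renormalize.
            normal = normalize(blend(*normals, wa, wb, wc))
            result.append((point, normal))
    return result

# Hypothetical coarse triangle with slightly diverging vertex normals.
positions = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
normals = (normalize((-0.3, -0.3, 1.0)),
           normalize((0.3, 0.0, 1.0)),
           normalize((0.0, 0.3, 1.0)))
samples = fine_points(positions, normals, steps=4)
```

Each returned pair is one candidate point, together with the direction along which it would later be allowed to move.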

The Geometry Shader displaces each of these points, but only along its interpolated normal vector, derived from the coarse grid, until the positions to which that point renders take light-values from the real photos that correlate as closely as possible. I.e. the premise is that at some exact position along the normal vector, a point generated by a Geometry Shader will project to positions in all the real camera-views, at which all the real, 2D cameras photographed the same light-value. Finding that point is a 1-dimensional process, because it only takes place along the normal vector, and can thus be achieved with successive approximation.
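A rough sketch of that 1-dimensional search follows, again in Python and not as a real shader. I am assuming a ternary search as the form of successive approximation, which only works if the mismatch between the cameras has a single minimum inside the bracket that is searched. The toy cameras are entirely hypothetical: each one is just a function from a 3D point to a light-value, constructed so that they all agree at a depth of 0.3.

```python
from statistics import pvariance

def displace(base, normal, t):
    """Move a point a signed distance t along its (unit) normal vector."""
    return tuple(b + t * n for b, n in zip(base, normal))

def mismatch(point, cameras):
    """Variance of the light-values which the cameras report for one point.
    Zero means that all cameras saw the same light-value."""
    return pvariance([cam(point) for cam in cameras])

def search_along_normal(base, normal, cameras, t_lo, t_hi, iterations=60):
    """Ternary search for the displacement with the smallest mismatch.
    Assumes the mismatch has one minimum between t_lo and t_hi."""
    for _ in range(iterations):
        third = (t_hi - t_lo) / 3.0
        t_a, t_b = t_lo + third, t_hi - third
        cost_a = mismatch(displace(base, normal, t_a), cameras)
        cost_b = mismatch(displace(base, normal, t_b), cameras)
        if cost_a < cost_b:
            t_hi = t_b   # the minimum cannot lie in the upper third
        else:
            t_lo = t_a   # the minimum cannot lie in the lower third
    return 0.5 * (t_lo + t_hi)

def make_toy_camera(slope, true_depth=0.3):
    """Hypothetical stand-in for sampling a real photo: all toy cameras
    agree only for points that lie on the plane z = true_depth."""
    def cam(point):
        x, y, z = point
        return 0.5 + 0.1 * x + slope * (z - true_depth)
    return cam

cameras = [make_toy_camera(s) for s in (-1.0, -0.4, 0.3, 0.8, 1.2, 2.0)]
t_best = search_along_normal((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                             cameras, -1.0, 1.0)
```

In a real system, each camera function would have to project the 3D point into its photo and sample a pixel there, and the bracket would have to be chosen so that only one surface falls inside it.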

(Edit 01/10/2017 : To make this easier to visualize: If the original geometry was just a rectangle, then all the normal vectors would be parallel. Then, if we subdivided this rectangle finely enough, and projected each micropolygon some variable distance along that vector, there would be no reason to say that there exists some point in the volume in front of the rectangle, which would not eventually be crossed. At a point corresponding to a 3D surface, all the cameras viewing the volume should in principle have observed the same light-value.

Now, if the normal-vectors are not parallel, then these paths will be more dense in some parts of the volume, and less dense in others. But then the assumption becomes, that their density should never actually reach zero, so that finer subdivision of the original geometry can also counteract this to some extent.

But there can exist many 3D surfaces, which would occupy more than one point along the projected path of one micropolygon – such as a simple sphere in front of an initial rectangle. Many paths would enter the sphere at one distance, and exit it again at another. There could exist a whole, complex scene in front of the rectangle. In those cases, starting with a coarse mesh which approximates the real geometry in 3D, is more of a help than a hindrance, because then, optimally, again there is only one distance of projection of each micropolygon, that will correspond to the exact geometry. )

Now one observation which some people might make, is that the initial, coarse grid might be inaccurate to begin with. But surprisingly, this type of error cancels out. This is because each micropolygon-point will have been displaced from the coarse grid enough, that the coarse grid will finally no longer be recognizable from the positions of the micropolygons. And the way the micropolygons are displaced is also such, that they never cross paths, since their paths as such are interpolated normal vectors, and so no Mathematical contradictions can result.

This holds to whatever extent geometric occlusion has been explained by the initial, coarse model.

Granted, if the initial model was partially concave, then projecting all the points along their normal vectors will eventually cause their paths to cross. But then this also defines the extent beyond which the system no longer works.

But, according to what I just wrote, even the lighting needs to be consistent within one set of 2D photos, so that any match between their light-values actually has the same meaning. And really, it’s preferable to have about 6 such photos…

Yet, there are some people who would argue, that superior Statistical Methods could still find the optimal correlations in 1-dimensional light-values, between a higher number of actual photos…

One main limitation to providing photogrammetry in practice, is the fact that the person doing it may have the strongest graphics card available, but eventually needs to export the data to users who do not. So one way in which this works for public consumption, is that the actual photogrammetry gets done on a remote server, perhaps a GPU farm, after which simplified data can get downloaded onto our tablets or phones, which the mere GPU of that tablet or phone is powerful enough to render.

But the GPU of the tablet or phone is itself not powerful enough, to do the actual successive approximation of the micropolygon-points.

I suppose that Hollywood might not have that latter limitation. As far as they are concerned, their CGI specialists could all have the most powerful GPUs, all the time…


P.S. There exists a numerical approach, which simplifies computing Statistical Variance in such a way, that Variance can effectively be computed between ‘an infinite number of sample-points’, at a computational cost which is ‘only proportional to the number of sample-points’. And the equation is not so complicated.

s = Mean(X²) - ( Mean(X) )²
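As a small sketch of why the cost stays proportional to the number of sample-points: the mean of the squares, minus the square of the mean, only requires three running totals, which can be updated one sample at a time. The class name `RunningVariance` is made up for this example, and the result is the population variance (dividing by N, not by N - 1).

```python
class RunningVariance:
    """Accumulates Mean(X^2) - (Mean(X))^2 from a stream of samples,
    without ever storing the samples themselves."""

    def __init__(self):
        self.count = 0
        self.sum_x = 0.0    # running total of X
        self.sum_x2 = 0.0   # running total of X squared

    def add(self, x):
        self.count += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def variance(self):
        mean = self.sum_x / self.count
        return self.sum_x2 / self.count - mean * mean

acc = RunningVariance()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    acc.add(value)
result = acc.variance()
```

One caveat is that, in floating-point arithmetic, this form can lose precision when the mean is very large compared to the spread of the values, because two nearly equal numbers get subtracted.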



There are situations in which Photogrammetry won’t do the job.

In place of painting a human actor with a laser-grid, there now exists an alternative, which is called “Photogrammetry”. This is a procedure, by which multiple 2D photos from different angles, of the same subject, are combined by a computer program into an optimal 3D model.

The older photogrammetry required humans in the loop, while the newer approaches do not.

With the 3D grid-lines, a trick they use is to have their cameras take two sets of photos: First with the grid off, and then with the grid on. The second is used to make the 3D model, while the first is used to create the texture-images.

One main problem with photogrammetry is instead, that the subject must have exactly the same geometry in 3D, shared between 4, 5, 6 photos etc., depending on how high we want the level of quality to be.

Peter Cushing, for example, would need to have been standing in a circle of cameras once, that all fired at once, in order to have been recreated in “Star Wars – Rogue One”.

Instead, the stock footage consists of many 2D views, each from one perspective, each with the subject in a different pose, each with the subject bearing a different facial expression, each with his hair done slightly differently…

That type of footage tends to be the least useful for photogrammetry.

So what they probably did, was try to create a 3D model of him ‘to the best of their human ability’. And the way human vision works, that data only needs to be wrong by one iota, for the viewer ‘not to see the same person’.

Similarly, I still don’t think that a 3D Texture, as opposed to a 2D Texture, can just be photographed. 3D, Tangent-Mapped Textures need to have normal-maps generated, which derive from depth-maps, and these depth-maps tend to be the works of human Texture Artists, who Paint – yes, Paint them.
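For what it is worth, the step from a depth-map to a normal-map is the mechanical part, and can be sketched as follows. This is only an illustration under my own assumptions: the depth-map is a plain list of rows of height values, slopes are taken with finite differences, and the normal is expressed in tangent space with Z pointing out of the surface. The function name and the `strength` parameter are hypothetical, and real tools differ in their sign and encoding conventions.

```python
import math

def normals_from_depth(height, strength=1.0):
    """Derive one unit normal per texel from a 2D grid of height values."""
    rows, cols = len(height), len(height[0])
    normals = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Neighbour indices, clamped at the borders of the map.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            # Slopes in X and Y from finite differences.
            dx = (height[y][x1] - height[y][x0]) / max(x1 - x0, 1)
            dy = (height[y1][x] - height[y0][x]) / max(y1 - y0, 1)
            # The normal leans away from the uphill direction.
            nx, ny, nz = -strength * dx, -strength * dy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals

# A flat map and a ramp that rises by 1 per texel in X.
flat = normals_from_depth([[0.0] * 3 for _ in range(3)])
ramp = normals_from_depth([[float(x) for x in range(3)] for _ in range(3)])
```

The flat map yields normals that point straight out of the surface, while the ramp yields normals tilted by 45 degrees against the direction of the slope. The painted depth-map itself remains the part that needs the Texture Artist.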

They can sometimes also be ‘off’. The Texture Artist may exaggerate certain scars or pimples that the real Actor had, and cause the 3D model not to look real – again.