December 22, 2005

Synthesis, analysis and invariants

"If people do not believe that mathematics is simple, it is only
because they do not realize how complicated life is." -- John von Neumann

One can wonder why computer vision (CV) has not benefited from the increase in computational power as much as computer graphics (CG), although the two fields grew from the same seminal work (Roberts, 1965).

CG is a synthesis activity: it starts from mathematical models of physical objects and their environment, and applies mathematical models of light and material physics to create an image of the objects in that environment under various conditions. Better models and more computational power translate directly into more realistic images. The models are designed by humans.
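As a rough illustration of synthesis, the sketch below renders a shaded sphere from nothing but a geometric model and a light model. It assumes Lambert's cosine law for a purely diffuse surface and an orthographic camera; the function names (`lambert`, `render_sphere`) and the light direction are made up for this example and do not come from any particular graphics system.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def lambert(normal, light_dir, albedo=1.0):
    """Diffuse intensity: albedo * max(0, n . l) (Lambert's cosine law)."""
    n = normalize(normal)
    l = normalize(light_dir)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, l)))

def render_sphere(size=9, light_dir=(1.0, 1.0, 1.0)):
    """Orthographic view of a unit sphere, returned as rows of intensities."""
    image = []
    for row in range(size):
        y = 1.0 - 2.0 * (row + 0.5) / size  # top row of the image is y = +1
        line = []
        for col in range(size):
            x = 2.0 * (col + 0.5) / size - 1.0
            r2 = x * x + y * y
            if r2 > 1.0:
                line.append(0.0)  # background pixel, no surface here
            else:
                z = math.sqrt(1.0 - r2)  # surface point facing the viewer
                # On a unit sphere the surface normal equals the point itself.
                line.append(lambert((x, y, z), light_dir))
        image.append(line)
    return image

if __name__ == "__main__":
    shades = " .:-=+*#%@"  # hypothetical character ramp, dark to bright
    for line in render_sphere():
        print("".join(shades[min(int(v * len(shades)), len(shades) - 1)] for v in line))
```

Every quantity in the picture follows from the models: refining the light model or raising `size` improves the image without any change of approach.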

Computer vision is an analysis activity: it seeks to form models of objects, environments, situations, activities, etc., from images (still pictures or video streams). Depending on their intended purpose, these models are not necessarily of the same nature as those used in graphics. The task in CV is thus very different in nature from that of CG, and does not benefit directly from traditional mathematics.

Mathematics makes it possible to manipulate invariants in order to study their properties and produce other invariants. Only specific branches of mathematics are concerned with extracting invariants from observations. "Machine learning" does give interesting results when applied to analytical (classification/recognition) tasks, for example in speech recognition and CV.

However, much remains to be done to completely formalize and understand the processes by which meaningful invariants are learned, recognized and used in cognitive activities. One principle that seems important is that an observer can never make the same observation twice: both the environment and the observer keep evolving. Mathematical models, by contrast, guarantee that the same processing of the same picture always gives the same result, no matter how many times it is run. A human presented with the same picture several times will, intuitively, have a different experience each time, and may notice different things in the picture. On the other hand, when presented with two pictures of the same object or person, even in totally different circumstances, a human will recognize that the object or person depicted is the same physical entity. The same goes for any perceptual/cognitive task.
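The repeatability of a mathematical model can be seen in a few lines. The sketch below is only an illustration: it uses a toy gradient filter (chosen for this example, not a method from the discussion above) and a made-up 4x4 picture, and applies the same processing to the same picture several times.

```python
def gradient_magnitude(image):
    """Sum of absolute horizontal and vertical differences at each interior pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            out[r][c] = abs(gx) + abs(gy)
    return out

# Hypothetical picture: a bright square on a dark background.
picture = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]

# Process the same picture five times and compare every result to the first.
results = [gradient_magnitude(picture) for _ in range(5)]
all_identical = all(r == results[0] for r in results)
```

The function has no memory and no changing state, so `all_identical` holds however many times the loop runs; nothing in it corresponds to an observer who evolves between two viewings.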

To come back to the original question and the quote, it seems that a possible reason for CG's success is that mathematical tools apply directly to the problem. A possible reason for CV's apparent lag is that the mathematical tools that directly address the roots of the problem are not as well developed.
