December 28, 2005

Peer review, human nature and scientific publication

Peer review determines what gets published and what gets funded in science and technology, at least in academia. Since most (all?) institutions evaluate their researchers based on their publication and fundraising records, peer review is the defining principle of scientific life. It is what distinguishes the respected professional from the mad scientist operating unchecked in his garage/home office.

A bulletin from the Office of Management and Budget titled "Final Information Quality Bulletin for Peer Review" (www.whitehouse.gov/omb/inforeg/peer2004/peer_bulletin.pdf) gives the following definition:

"Peer review is one of the important procedures used to ensure that the quality of published information meets the standards of the scientific and technical community. It is a form of deliberation involving an exchange of judgments about the appropriateness of methods and the strength of the author's inferences. Peer review involves the review of a draft product for quality by specialists in the field who were not involved in producing the draft."


Criticisms of the peer review system abound, and not all of them are unjustified, but nobody has been able to come up with something better so far...

Academic success is "all about how you choose your peers." Nowadays, it is not rare for researchers to log tens of "publications" a year (or every few years) on average. Journals multiply, and so do peer-reviewed conferences and workshops, whose acceptance rates have become lower than journals', while their proceedings have grown so large that nobody wants to carry the printed version back home anymore. All this is further accentuated by the increase in the number of researchers.

Does the peer review system scale up? When the pool of peers grows, who selects the ones to act as reviewers? Are the best researchers likely to have a lot of time to devote to reviewing everybody else's drafts? The original purpose of peer review was to check for consistency and correctness of inference. However, peer review has become a ranking system to select the few "worthy" drafts among hundreds or thousands of submissions. Reviewers therefore influence what gets published or funded based not on scientific merit alone, but also on more subjective factors. As the number of peers grows, the choice of reviewers itself influences success.

Now, if reality TV has taught us one thing, it is that the candidate best placed to win the game according to its stated rules rarely does. Successful contestants team up to eliminate underperforming individuals early, and use the high performers until they become too much of a threat, at which point they eliminate them too. The final head-to-head confrontation involves the contestants who best understood and leveraged the human aspect of the game. Often the human (political) factors defeat the purpose of the simply stated rules. On TV, it makes for (supposedly) exciting drama. In science, it could make for sub-optimal progress in seemingly prosperous fields. Consider the recent scandal sparked by the revelation that Dr. Hwang Woo Suk, a leader in stem cell research, falsified some of his reported findings. The key here is not a failure of the peer review system for this specific article, but rather how he reached such a prominent position and how he came to fake data (see http://www.nytimes.com/2005/12/25/science/25clone.html?th&emctheh).

While current evaluation schemes seem to favor quantity over quality of publications, quality assessment remains a problem. The number of citations (akin to the principle behind Google's PageRank algorithm) is a common metric, but it is difficult to assess and quite easily biased (search results are a good example). Publication count, weighted by some quality measure for the publishing venue, is another metric, also quite difficult to interpret and easily biased.
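As a rough illustration of the citation metric, here is a minimal PageRank-style sketch over a tiny, hypothetical citation graph; the paper names, damping factor and iteration count are made-up assumptions, not a description of any real evaluation system:

```python
# Minimal sketch of a PageRank-style citation score on a toy citation graph.
# The graph, damping factor and iteration count are illustrative assumptions.

def citation_rank(cites, damping=0.85, iterations=50):
    """Compute PageRank-like scores; every paper must appear as a key in `cites`."""
    papers = list(cites)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for citing, cited_list in cites.items():
            if not cited_list:
                continue
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Hypothetical citation graph: each paper maps to the papers it cites.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for paper, score in sorted(citation_rank(graph).items()):
        print(paper, round(score, 3))
```

Even in this toy form the bias problem is visible: a small clique of papers citing each other inflates every member's score.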

The reasoning behind these methods of evaluation is a bit confusing. It seems relatively safe to hypothesize, based on the core principles of scientific publication (including peer review), that (1) successful scientists publish papers that are frequently cited.

Evaluations seem to be based on the premise that (2) scientists who publish (a lot) and are frequently cited must be successful, which cannot be inferred from (1).

(2) is often replaced with a more "efficient" variant (i.e. easier for busy committee members to work with): (2') all scientists who publish a lot must be successful.

Somewhat of a stretch. It seems that the problem is tackled backwards. In fact, this remark suggests a different approach to evaluation, using machine learning techniques. Somebody should build graphical models that capture, in each field, the dependencies between number of publications in each relevant journal, conference, workshop, etc. and scientific success. The models could be trained on a large amount of data (the level of success for each individual used for learning could be evaluated by--what else?--a peer review system).
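As a minimal sketch of that idea (a plain logistic regression stands in here for the graphical model, and the venues and data are entirely synthetic), the model would simply learn how much each venue's publication count predicts a peer-assessed success label:

```python
# Minimal sketch: learn how publication counts per venue relate to a
# peer-assessed "success" label.  Logistic regression stands in for the
# graphical model imagined above; venues and data are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Features: publication counts in [journal, conference, workshop] (hypothetical venues).
X = rng.poisson(lam=[3.0, 5.0, 8.0], size=(200, 3)).astype(float)
# Synthetic "ground truth": success driven mostly by journal papers.
true_logits = 0.9 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * X[:, 2] - 4.0
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

# Fit weights by gradient descent on the logistic loss.
w, b = np.zeros(3), 0.0
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of "success"
    w -= 0.01 * (X.T @ (p - y)) / len(y)
    b -= 0.01 * np.mean(p - y)

print("learned venue weights:", np.round(w, 2))  # how much each venue "counts"
```

The learned weights are exactly the kind of field-specific venue weighting that committees currently guess at by hand.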

A common overgeneralization of (1) is that all successful scientists publish a lot and are frequently cited, which suggests that scientists who do not publish a lot and/or are not frequently cited must be bad.

This is clearly wrong: before publishing anything, all great scientists were obviously already doing good research. In fact, the peer review system favors continuity and discourages bold innovation and radical departures (see "We Are Sorry to Inform You..." in IEEE Computer, volume 38, issue 12). This might be why great scientists keep praising "cross-disciplinarity": if anything new has even a slight chance of being accepted by a given community, it should be coming from a different field.

Come to think of it, the fact that a researcher has been successful in the past does not guarantee success in the future. So maybe time should be taken into consideration?

In any case, human nature is such that there will always be doubt as to whether a successful scientist is successful at doing great science, or just successful at leading a scientific career. "Scientist" could be replaced with pretty much any other profession, but it is especially ironic for scientists...

December 22, 2005

Synthesis, analysis and invariants

"If people do not believe that mathematics is simple, it is only
because they do not realize how complicated life is." -- J. H. Von Neumann

One can wonder why computer vision (CV) has not benefited from the increase in computational power as much as computer graphics (CG), although the two fields grew from the same seminal work (Roberts, 1965).

CG is a synthesis activity: it starts from mathematical models of physical objects and their environment, and applies mathematical models of light and material physics to create an image of those objects in their environment under various conditions. Better models and more computational power translate directly into more realistic images. The models are designed by humans.

Computer vision is an analysis activity: it seeks to form models of objects, environments, situations, activities, etc., from images (still pictures or video streams). Depending on their intended purpose, these models are not necessarily of the same nature as those used in graphics. The task in CV is of a very different nature from that of CG, and does not benefit directly from traditional mathematics.

Mathematics makes it possible to manipulate invariants, study their properties and produce other invariants. Only specific branches of mathematics are concerned with extracting invariants from observations. "Machine learning" does give interesting results when applied to analytical (classification/recognition) tasks, for example in speech recognition and CV.

However, much remains to be done to completely formalize and understand the processes by which meaningful invariants are learned, recognized and used in cognitive activities. One principle that seems important is that an observer can never make the same observation twice: both the environment and the observer keep evolving. For example, mathematical models enforce that the same processing of the same picture will always give the same result, no matter how many times it is run. Intuitively, a human presented with the same picture several times will have a different experience each time, and may notice different things in the picture. On the other hand, when presented with two pictures of the same object or person, even in totally different circumstances, a human will recognize that the object or person depicted is the same physical entity. The same goes for any perceptual/cognitive task.
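To make the determinism point concrete, here is a trivial sketch (the image is random data and the blur is a stand-in for any fixed mathematical operation): the same deterministic processing applied twice to the same "picture" is bit-identical, which is exactly what repeated human observations are not.

```python
# Trivial illustration: a deterministic "processing" of the same picture
# gives bit-identical results on every run.  The image is random data and
# the blur is a stand-in for any fixed mathematical operation.
import numpy as np

rng = np.random.default_rng(42)
picture = rng.integers(0, 256, size=(64, 64)).astype(float)  # stand-in for an image

def box_blur(img):
    # Average each pixel with its four neighbours (wrapping at the borders).
    return (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5.0

print(np.array_equal(box_blur(picture), box_blur(picture)))  # True, run after run
```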

To come back to the original question and the quote, it seems that a possible reason for CG's success is that mathematical tools apply directly to the problem. A possible reason for CV's apparent lag is that the mathematical tools that directly address the roots of the problem are not as well developed.

Scientific freedom

"I have just one wish for you--the good luck to be somewhere where you are free to maintain the kind of integrity I have described, and where you do not feel forced by the need to maintain your position in the organization, or financial support, or so on, to lose your integrity. May you have that freedom."

Richard P. Feynman, "Cargo Cult Science," adapted from the Caltech commencement address given in 1974, published in "Surely You're Joking, Mr. Feynman!" (Norton, 1985).