December 28, 2005

Peer review, human nature and scientific publication

Peer review determines what gets published and what gets funded in science and technology, at least in academia. Since most (all?) institutions evaluate their researchers based on their publication and fundraising records, peer review is the defining principle of scientific life. It is what distinguishes the respected professional from the mad scientist operating unchecked in his garage/home office.

A bulletin from the Office of Management and Budget titled "Final Information Quality Bulletin for Peer Review" (www.whitehouse.gov/omb/inforeg/peer2004/peer_bulletin.pdf) gives the following definition:

"Peer review is one of the important procedures used to ensure that the quality of published information meets the standards of the scientific and technical community. It is a form of deliberation involving an exchange of judgments about the appropriateness of methods and the strength of the author's inferences. Peer review involves the review of a draft product for quality by specialists in the field who were not involved in producing the draft."


Criticisms of the peer review system abound, not all unjustified, but nobody has been able to come up with something better so far...

Academic success is "all about how you choose your peers." Nowadays, it is not rare for researchers to log tens of "publications" per (few) year(s) on average. Journals multiply, and so do peer-reviewed conferences and workshops, for which acceptance rates have become lower than journals', while the size of their proceedings has increased so much that nobody wants to carry the printed version back home anymore. All this is further accentuated by the increase in the number of researchers.

Does the peer review system scale up? When the pool of peers grows, who selects the ones to act as reviewers? Are the best researchers likely to have a lot of time to devote to reviewing everybody else's drafts? The original purpose of peer review was to check for consistency and correctness of inference. However, peer review has become a ranking system to select the few "worthy" drafts among hundreds or thousands of submissions. Therefore the reviewers influence what will get published or funded not on scientific merit alone, but also on more subjective factors. With the increasing number of peers, the choice of reviewers also influences success.

Now, if reality TV has taught us one thing, it is that the best candidate to win the game according to the rules rarely wins. Successful candidates team up to eliminate underperforming individuals early, and use the high-performing individuals until they become too much of a threat, at which point they eliminate them. The final head-to-head confrontation involves the contestants who best understood and leveraged the human aspect of the game. Often the human (political) factors defeat the purpose of the simply stated rules. On TV it makes for (supposedly) exciting drama. In science it could make for sub-optimal progress in seemingly prosperous fields. Consider the recent scandal sparked by the revelation that Dr. Hwang Woo Suk, a leader in stem cell research, falsified some of his reported findings. The key here is not a failure of the peer review system for this specific article, but rather how he reached such a prominent position and how he got to faking data (see http://www.nytimes.com/2005/12/25/science/25clone.html?th&emctheh).

While current evaluation schemes seem to favor quantity over quality of publications, quality assessment remains a problem. Number of citations (similar to the system used by Google's PageRank algorithm) is a common metric, but it is difficult to assess and quite easily biased (search results are a good example). Publication count, weighted by some quality measure for the publishing venue, is another metric, also quite difficult to interpret and easily biased.
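To make the weighted publication count concrete, here is a minimal sketch. The venue names, the weights and the two researchers are all made up for illustration; no real ranking is implied. It only shows how easily such a score lets quantity outweigh quality.

```python
from collections import Counter

# Hypothetical venue weights (illustrative only, not a real ranking).
VENUE_WEIGHTS = {"journal_a": 3.0, "conference_b": 2.0, "workshop_c": 0.5}


def weighted_publication_count(venues, weights, default=0.0):
    """Sum the venue weight of every publication in `venues`.

    Venues missing from `weights` contribute `default`.
    """
    counts = Counter(venues)
    return sum(weights.get(venue, default) * n for venue, n in counts.items())


# Two made-up publication records.
few_strong = ["journal_a", "journal_a"]   # two papers in the top-weighted venue
many_minor = ["workshop_c"] * 14          # fourteen low-weight workshop papers

score_few = weighted_publication_count(few_strong, VENUE_WEIGHTS)
score_many = weighted_publication_count(many_minor, VENUE_WEIGHTS)

print(score_few, score_many)
```

With these assumed weights, the fourteen workshop papers outscore the two journal papers, and the outcome flips as soon as somebody picks different weights.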

The reasoning behind these methods of evaluation is a bit confusing. It seems relatively safe to hypothesize, based on the core principles of scientific publication (including peer review), that (1) successful scientists publish papers, which are frequently cited.

Evaluations seem to be based on the premise that (2) scientists who publish (a lot) and are frequently cited must be successful, which cannot be inferred from (1).
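A small numeric sketch of why (2) does not follow from (1), using Bayes' rule. All three probabilities below are assumptions chosen purely for illustration, not measurements: even if nearly every successful scientist publishes a lot and is frequently cited, a prolific and cited scientist is not necessarily likely to be successful.

```python
def posterior_success(prior, p_prolific_given_success, p_prolific_given_other):
    """P(successful | prolific and cited), computed with Bayes' rule."""
    evidence = (p_prolific_given_success * prior
                + p_prolific_given_other * (1.0 - prior))
    return p_prolific_given_success * prior / evidence


# Assumed numbers, for illustration only.
prior = 0.10                     # fraction of scientists who are "successful"
p_prolific_given_success = 0.90  # statement (1): successful ones publish and get cited
p_prolific_given_other = 0.50    # many others publish a lot and get cited too

p = posterior_success(prior, p_prolific_given_success, p_prolific_given_other)
print(round(p, 3))
```

Under these assumptions, (1) holds at 90% while the converse direction, which is what (2) needs, comes out at about one in six.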

(2) is often replaced with a more "efficient" variant (i.e. easier for busy committee members to work with): (2') all scientists who publish a lot must be successful.

Somewhat of a stretch. It seems that the problem is tackled backwards. In fact, this remark suggests a different approach to evaluation, using machine learning techniques. Somebody should build graphical models that capture, in each field, the dependencies between number of publications in each relevant journal, conference, workshop, etc. and scientific success. The models could be trained on a large amount of data (the level of success for each individual used for learning could be evaluated by--what else?--a peer review system).
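As a rough sketch of what such a model could look like, here is about the simplest graphical model available: a naive Bayes classifier in which "success" is the parent of the publication count in each venue, with counts modeled as Poisson. Everything here is an assumption made for illustration: the three venue columns, the toy training rows, the labels (standing in for the peer-reviewed assessments mentioned above) and the smoothing constants.

```python
import math


def fit_poisson_naive_bayes(rows, labels):
    """Estimate a class prior and one Poisson rate per venue for each class."""
    model = {}
    for cls in set(labels):
        members = [row for row, y in zip(rows, labels) if y == cls]
        prior = len(members) / len(labels)
        # Smoothed mean count per venue, so that no rate is exactly zero.
        rates = [(sum(col) + 0.5) / (len(members) + 1.0) for col in zip(*members)]
        model[cls] = (prior, rates)
    return model


def log_poisson(k, lam):
    """Log of the Poisson probability of observing count k at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def predict_proba(model, row):
    """Posterior probability of each class given per-venue counts."""
    log_scores = {
        cls: math.log(prior) + sum(log_poisson(k, lam) for k, lam in zip(row, rates))
        for cls, (prior, rates) in model.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    scores = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


# Toy data: counts per venue as [journal, conference, workshop].
# Labels are hypothetical panel verdicts: 1 = judged successful, 0 = not.
rows = [[3, 1, 0], [2, 2, 1], [4, 0, 1], [0, 1, 6], [1, 0, 8], [0, 2, 5]]
labels = [1, 1, 1, 0, 0, 0]

model = fit_poisson_naive_bayes(rows, labels)
proba_journals = predict_proba(model, [3, 1, 0])
proba_workshops = predict_proba(model, [0, 1, 9])
print(proba_journals, proba_workshops)
```

The point of learning the dependencies from data is that raw volume is not rewarded by construction: in this toy set, a record with nine workshop papers lands in the "not successful" class despite having more publications overall. A real attempt would need far more data, richer dependencies between venues, and labels that somebody trusts.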

A common overgeneralization of (1) is that all successful scientists publish a lot and are frequently cited, which suggests that scientists who do not publish a lot and/or are not frequently cited must be bad.

This is clearly wrong: before publishing anything, all great scientists were obviously doing good research. In fact, the peer review system favors continuity and discourages bold innovation and radical change (see "We Are Sorry to Inform You..." in IEEE Computer volume 38, issue 12). This might be why great scientists keep praising "cross-disciplinarity." If anything new has a slight chance of being accepted by a given community, it should be coming from a different field.

Come to think of it, the fact that a researcher has been successful in the past does not guarantee success in the future. So maybe time should be taken into consideration?

In any case, human nature is such that there will always be a doubt whether a successful scientist is successful at doing great science, or is just successful at leading a scientific career. "Scientist" could be replaced with pretty much anything else, but it is especially ironic for scientists...
