Peer Review and the Scientific Process

IainB:
Very interesting (if not scary) report via Mother Jones about the trial of AstraZeneca's Seroquel drug, relevant to this thread:
The Deadly Corruption of Clinical Trials
This is just an excerpt (my emphasis):
...Yet the more I examined the medical and court records, the more I became convinced that the problem was worse than the Pioneer Press had reported. The danger lies not just in the particular circumstances that led to Dan's death, but in a system of clinical research that has been thoroughly co-opted by market forces, so that many studies have become little more than covert instruments for promoting drugs. The study in which Dan died starkly illustrates the hazards of market-driven research and the inadequacy of our current oversight system to detect them...

--- End quote ---

IainB:
Just provisionally copying the "Faith" discussion posts across to another (new) thread under the name "Faith under the microscope", though the name may change, or the thread might get moved to the Basement or expunged.

IainB:
Very relevant analysis here: Moriarty on peer review
There is compelling evidence that, across the disciplines, peer review often fails to root out science fraud. Yet even basic errors in the literature can now be extremely difficult to correct on any reasonable timescale. –Philip Moriarty, Times Higher Education, 18 April 2013

--- End quote ---

IainB:
Informative WUWT post on How a scientist becomes a con man.
Excerpt (my emphasis):
...finding that Stapel’s fraud went undetected for so long because of “a general culture of careless, selective and uncritical handling of research and data.” If Stapel was solely to blame for making stuff up, the report stated, his peers, journal editors and reviewers of the field’s top journals were to blame for letting him get away with it. The committees identified several practices as “sloppy science” — misuse of statistics, ignoring of data that do not conform to a desired hypothesis and the pursuit of a compelling story no matter how scientifically unsupported it may be.

--- End quote ---

Well worth a read, though likely to be a bit depressing if you (like me) used to place a high degree of trust in "science". Mind you, Stapel was apparently a "Sociological scientist", or something, and thus arguably not a scientist per se, as one of the commenters points out:
Paraphrasing Robert Heinlein, “Any discipline with the word ‘science’ in the name, such as ‘social science’, isn’t one.”

--- End quote ---

IainB:
Over at hunch.net the Machine Learning (Theory) blog has a cogent and useful post on the subject of peer reviews:
Representative Reviewing
6/16/2013
Tags: Conferences, Reviewing, Workshop (10:09 am)

When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always only discussed in the specific context of a paper (i.e. your rejected paper), or at most an area (i.e. what a “good paper” looks like for that area) rather than general principles. Neither individual papers nor areas are sufficiently general for a large conference: every paper differs in the details, and what if you want to build a new area and/or cross areas?

An unavoidable reason for reviewing is that the community of research is too large. In particular, it is not possible for a researcher to read every paper which someone thinks might be of interest. This reason for reviewing exists independent of constraints on rooms or scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term — growing conferences simply use more rooms and/or change formats to accommodate the growth.

This suggests that a generic test for paper acceptance should be “Are there a significant number of people who will be interested?” This question could theoretically be answered by sending the paper to every person who might be interested and simply asking them. In practice, this would be an intractable use of people’s time: We must query far fewer people and achieve an approximate answer to this question. Our goal then should be minimizing the approximation error for some fixed amount of reviewing work.
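As a rough illustration of the "approximation error for a fixed amount of reviewing work" idea (this sketch is not part of the quoted post): if the people asked were a random sample of the interested community, the share who say "interested" is an estimate whose standard error shrinks with the square root of the sample size. The function name and the random-sample assumption are hypothetical; real reviewer pools are not random samples, which is exactly the failure mode the post goes on to discuss.

```python
import math

def estimate_interest(responses):
    """Estimate the share of interested readers from a sample of yes/no answers.

    Assumes (hypothetically) that `responses` is a simple random sample,
    so the usual binomial standard error applies.
    """
    n = len(responses)
    share = sum(responses) / n                       # True counts as 1
    std_error = math.sqrt(share * (1 - share) / n)   # binomial standard error
    return share, std_error

# Same underlying share (75% interested), two sample sizes.
small = estimate_interest([True] * 3 + [False] * 1)
large = estimate_interest([True] * 12 + [False] * 4)
```

Under these assumptions, asking four times as many people only halves the error, which is why who gets asked matters at least as much as how many.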

Viewed from this perspective, the first way that things can go wrong is by misassignment of reviewers to papers, for which there are two easy failure modes available.

* 1. When reviewer/paper assignment is automated based on an affinity graph, the affinity graph may be low quality or the constraint on the maximum number of papers per reviewer can easily leave some papers with low affinity to all reviewers orphaned.
* 2. When reviewer/paper assignments are done by one person, that person may choose reviewers who are all like-minded, simply because this is the crowd that they know. I’ve seen this happen at the beginning of the reviewing process, but the more insidious case is when it happens at the end, where people are pressed for time and low quality judgements can become common.

An interesting approach for addressing the constraint objective would be optimizing a different objective, such as the product of affinities rather than the sum. I’ve seen no experimentation of this sort.
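To make the sum-versus-product suggestion concrete, here is a minimal sketch (not from the quoted post, which notes that no such experiment has been tried). The affinity numbers are invented, each reviewer is assumed to take exactly one paper, and the brute-force search is only practical for toy sizes.

```python
from itertools import permutations
from math import prod

# Hypothetical affinities: AFFINITY[paper][reviewer], higher is a better match.
AFFINITY = [
    [1.0, 0.5],  # paper 0
    [0.5, 0.1],  # paper 1
]

def best_assignment(affinity, objective):
    """Try every one-paper-per-reviewer assignment and keep the best one.

    `objective` combines the per-paper affinities of an assignment into a
    single score (for example `sum` or `math.prod`).
    """
    n = len(affinity)

    def score(assignment):
        return objective(affinity[paper][reviewer]
                         for paper, reviewer in enumerate(assignment))

    return max(permutations(range(n)), key=score)

by_sum = best_assignment(AFFINITY, sum)       # (0, 1): scores 1.0 + 0.1
by_product = best_assignment(AFFINITY, prod)  # (1, 0): scores 0.5 * 0.5
```

In this toy case the sum objective gives paper 0 its ideal reviewer and leaves paper 1 with an affinity of 0.1, while the product objective settles for 0.5 on both papers, because a single near-zero match drags the whole product down.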

For ICML, there are about 3 levels of “reviewer”: the program chair who is responsible for all papers, the area chair who is responsible for organizing reviewing on a subset of papers, and the program committee member/reviewer who has primary responsibility for reviewing. In 2012 we tried to avoid these failure modes in a least-system-effort way using a blended approach. We used bidding to get a higher quality affinity matrix. We used a constraint system to assign the first reviewer to each paper and two area chairs to each paper. Then, we asked each area chair to find one reviewer for each paper. This obviously dealt with the one-area-chair failure mode. It also helps substantially with low quality assignments from the constrained system since (a) the first reviewer chosen is typically higher quality than the last due to it being the least constrained, (b) misassignments to area chairs are diagnosed at the beginning of the process by ACs trying to find reviewers, and (c) ACs can reach outside of the initial program committee to find reviewers, which existing automated systems cannot do.

The next way that reviewing can go wrong is via biased reviewing.

* 1. Author name bias is a famous one. In my experience it is real: well known authors automatically have their paper taken seriously, which particularly matters when time is short. Furthermore, I’ve seen instances where well-known authors can slide by with proof sketches that no one fully understands.
* 2. Review anchoring is a very significant problem if it occurs. This does not happen in the standard review process, because the reviews of others are not visible to other reviewers until they are complete.
* 3. A more subtle form of bias is when one reviewer is simply much louder or more charismatic than others. Reviewing without an in-person meeting is actually helpful here, as it reduces this problem substantially.

Reviewing can also be low quality. A primary issue here is time: most reviewers will submit a review within a time constraint, but it may not be high quality due to limits on time. Minimizing average reviewer load is quite important here. Staggered deadlines for reviews are almost certainly also helpful. A more subtle thing is discouraging low quality submissions. My favored approach here is to publish all submissions nonanonymously after some initial period of time.

Another significant issue in reviewer quality is motivation. Making reviewers not anonymous to each other helps with motivation as poor reviews will at least be known to some. Author feedback also helps with motivation, as reviewers know that authors will be able to point out poor reviewing. It is easy to imagine that further improvements in reviewer motivation would be helpful.

A third form of low quality review is based on miscommunication. Maybe there is a silly typo in a paper? Maybe something was confusing? Being able to communicate with the author can greatly reduce ambiguities.

The last problem is dictatorship at decision time, of which I’ve seen several variants. Sometimes this comes in the form of giving each area chair a budget of papers to “champion”. Sometimes this comes in the form of an area chair deciding to override all reviews and either accept or, more likely, reject a paper. Sometimes this comes in the form of a program chair doing this as well. The power of dictatorship is often available, but it should not be used: the wiser course is keeping things representative.

At ICML 2012, we tried to deal with this via a defined power approach. When reviewers agreed on the accept/reject decision, that was the decision. If the reviewers disagreed, we asked the two area chairs to make decisions and if they agreed, that was the decision. It was only when the ACs disagreed that the program chairs would become involved in the decision.
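The escalation rule described above can be sketched as a small function (an illustration only, not code from the quoted post). The function name, the boolean vote encoding, and the idea that the program chairs' outcome arrives as a single pre-made vote are all assumptions; the post does not say how the program chairs reached their decision.

```python
def decide(reviewer_votes, area_chair_votes, program_chair_vote):
    """Return (accept, decided_by) under the 'defined power' escalation rule.

    Votes are booleans, True meaning accept. `program_chair_vote` is a
    hypothetical stand-in for whatever the program chairs conclude.
    """
    # Unanimous reviewers settle the matter.
    if len(set(reviewer_votes)) == 1:
        return reviewer_votes[0], "reviewers"
    # Otherwise the two area chairs decide, if they agree with each other.
    if len(set(area_chair_votes)) == 1:
        return area_chair_votes[0], "area chairs"
    # Only a split between the area chairs reaches the program chairs.
    return program_chair_vote, "program chairs"

agreed = decide([True, True, True], [False, False], False)
escalated = decide([True, False, True], [False, False], True)
top_level = decide([True, False, True], [True, False], False)
```

Note that a higher level is never consulted when a lower level agrees, which is what keeps any single person from overriding a unanimous set of reviews.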

The above provides an understanding of how to create a good reviewing process for a large conference. With this in mind, we can consider various proposals at the peer review workshop and elsewhere.

* 1. Double Blind Review. This reduces bias, at the cost of decreasing reviewer motivation. Overall, I think it’s a significant long term positive for a conference as “insiders” naturally become more concerned with review quality and “outsiders” are more prone to submit.
* 2. Better paper/reviewer matching. A pure win, with the only caveat that you should be familiar with failure modes and watch out for them.
* 3. Author feedback. This improves review quality by placing a check on unfair reviews and reducing miscommunication at some cost in time.
* 4. Allowing an appendix or ancillary materials. This allows authors to better communicate complex ideas, at the potential cost of reviewer time. A standard compromise is to make reading an appendix optional for reviewers.
* 5. Open reviews. Open reviews means that people can learn from other reviews, and that authors can respond more naturally than in single round author feedback.

It’s important to note that none of the above are inherently contradictory. This is not necessarily obvious as proponents of open review and double blind review have found themselves in opposition at times. These approaches can be accommodated by simply hiding authors’ names for a fixed period of 2 months while the initial review process is ongoing.

Representative reviewing seems like the real difficult goal. If a paper is rejected in a representative reviewing process, then perhaps it is just not of sufficient interest. Similarly, if a paper is accepted, then perhaps it is of real and meaningful interest. And if the reviewing process is not representative, then perhaps we should fix the failure modes.

Edit: Crossposted on CACM.

--- End quote ---

This is coincidentally the same website as @mouser referred to in another DC Forum discussion thread in 2006: Nice blog essays on Fixing Peer Reviews and Collaborative Research « on: 2006-09-19, 00:54:05 » - where he quoted from the hunch.net post What is missing for online collaborative research?:

I've been reading http://hunch.net/ for their take on machine learning articles but they've been posting some nice essays recently on the underlying frameworks for reviewing papers, etc.
Reviewing is a fairly formal process which is integral to the way academia is run. Given this integral nature, the quality of reviewing is often frustrating. I’ve seen plenty of examples of false statements, misbeliefs, reading what isn’t written, etc…, and I’m sure many other people have as well.

Recently, mechanisms like double blind review and author feedback have been introduced to try to make the process more fair and accurate in many machine learning (and related) conferences. My personal experience is that these mechanisms help, especially the author feedback. Nevertheless, some problems remain.

The game theory take on reviewing is that the incentive for truthful reviewing isn’t there. Since reviewers are also authors, there are sometimes perverse incentives created and acted upon. (Incidentally, these incentives can be both positive and negative.)

Setting up a truthful reviewing system is tricky because there is no final reference truth available in any acceptable (say: subyear) timespan. There are several ways we could try to get around this.
...

--- End quote ---
-mouser (September 19, 2006, 12:54 AM)
--- End quote ---
