Eighteen years ago in this journal, Spitzer and colleagues¹ published "Quantification of Agreement in Psychiatric Diagnosis," in which they argued that a new measure, Cohen's κ statistic,² was the appropriate index of diagnostic agreement in psychiatry. They pointed out that other measures of diagnostic reliability then in use, such as the total percent agreement and the contingency coefficient, were flawed as indexes of agreement since they either overestimated the discriminating power of the diagnosticians or were affected by associations among the diagnoses other than strict agreement. The new statistic seemed to overcome the weaknesses of the other measures. It took into account the fact that raters agree by chance alone some of the time, and it only gave a perfect value if there was total agreement among the raters. Furthermore, generalizations of the simple κ statistic were already available. This family of statistics could be used to assess
Shrout PE, Spitzer RL, Fleiss JL. Quantification of Agreement in Psychiatric Diagnosis Revisited. Arch Gen Psychiatry. 1987;44(2):172–177. doi:10.1001/archpsyc.1987.01800140084013