• Because it corrects for chance agreement, kappa (K) is a useful statistic for calculating interrater concordance. However, K has been criticized because its computed value is a function not only of sensitivity and specificity but also of the prevalence, or base rate, of the illness of interest in the particular population under study. For example, it has been shown that, for a hypothetical case in which sensitivity and specificity remain constant at .95 each, K falls from .81 to .14 when the prevalence drops from 50% to 1%. Thus, differing values of K may be entirely due to differences in prevalence. Calculation of agreement presents different problems depending on whether one is studying reliability or validity. We discuss quantification of agreement in the pure validity case, the pure reliability case, and those studies that fall somewhere in between. As a way of minimizing the base rate problem, we propose a statistic for the quantification of agreement (the Y statistic), which can be related to K but which is completely independent of prevalence in the case of validity studies and relatively so in the case of reliability studies.
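The .81-to-.14 illustration can be reproduced under one common reading of the setup: two conditionally independent raters, each with sensitivity and specificity of .95 against the true diagnosis, with K computed from their joint agreement table. The sketch below makes that assumption explicit; it also identifies the Y statistic with Yule's coefficient of colligation, which matches the abstract's claim of prevalence independence in the validity case but is our inference, not stated in the text above.

```python
import math

def kappa_two_raters(prevalence, sens=0.95, spec=0.95):
    """Cohen's kappa for two conditionally independent raters,
    each with the given sensitivity and specificity (assumed model)."""
    p = prevalence
    # Observed agreement: both raters positive, or both negative.
    po = p * (sens**2 + (1 - sens)**2) + (1 - p) * (spec**2 + (1 - spec)**2)
    # Marginal probability that a single rater calls a case positive.
    p_pos = p * sens + (1 - p) * (1 - spec)
    # Chance agreement computed from the marginals.
    pe = p_pos**2 + (1 - p_pos)**2
    return (po - pe) / (1 - pe)

def yule_y(sens=0.95, spec=0.95):
    """Yule's Y (coefficient of colligation) for a test against a
    gold standard. The p(1-p) prevalence factor cancels in the
    cross-product ratio, so Y depends only on sensitivity and
    specificity -- prevalence-independent in the validity case."""
    ad = sens * spec               # concordant-cell product (prevalence factored out)
    bc = (1 - sens) * (1 - spec)   # discordant-cell product
    return (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))

print(round(kappa_two_raters(0.50), 2))  # 0.81
print(round(kappa_two_raters(0.01), 2))  # 0.14
print(round(yule_y(), 2))                # 0.9, regardless of prevalence
```

Under this two-rater model the kappa values match the abstract's figures exactly, while Yule's Y stays fixed as prevalence varies, illustrating why a prevalence-free index is attractive.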
Spitznagel EL, Helzer JE. A Proposed Solution to the Base Rate Problem in the Kappa Statistic. Arch Gen Psychiatry. 1985;42(7):725–728. doi:10.1001/archpsyc.1985.01790300093012