Evaluating the incremental value of a new model: Area under the ROC curve or under the PR curve
Incremental value (IncV) measures the improvement in performance from an existing risk model to a new model. In this paper, we compare the IncV of the area under the receiver operating characteristic curve (IncV-AUC) and the IncV of the area under the precision-recall curve (IncV-AP). Since both are semi-proper scoring rules, we also compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (IncV-sBrS). The comparisons are demonstrated via a numerical study under various event rates. The results show that the IncV-AP and the IncV-sBrS are highly consistent, whereas the IncV-AUC and the IncV-sBrS are negatively correlated at a low event rate. The IncV-AUC and the IncV-AP are the least consistent of the three pairs, and their differences become more pronounced as the event rate decreases. To investigate this phenomenon, we derive expressions for these two metrics. Both are weighted averages of the changes (from the existing model to the new one) in the separation of the risk score distributions between events and non-events. However, the IncV-AP assigns heavier weights to the changes in the higher-risk group, while the IncV-AUC weights the entire population equally. We further illustrate this point via a data example of two risk models for predicting acute ovarian failure. The new model has a slightly lower AUC but increases the AP by 48%. When the aim is to identify the high-risk group, the IncV-AP is a more appropriate metric, especially when the event rate is low.
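To make the three incremental values concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes common empirical estimators: the AUC as the proportion of event/non-event pairs ranked correctly, the AP as a step sum of precision over recall gains, and the scaled Brier score as 1 - BrS / (π(1 - π)), with π the observed event rate. The function names and the ten-subject data set are hypothetical and were constructed only to show how the IncV-AUC and the IncV-AP can disagree in sign.

```python
def auc(y, risk):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [r for yi, r in zip(y, risk) if yi == 1]
    nonevents = [r for yi, r in zip(y, risk) if yi == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


def average_precision(y, risk):
    """Step-sum estimate of the area under the precision-recall curve."""
    n_events = sum(y)
    pairs = sorted(zip(risk, y), reverse=True)  # highest risk first
    ap, tp, i = 0.0, 0, 0
    while i < len(pairs):
        tp_before, j = tp, i
        # Subjects sharing the same risk score cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            j += 1
        recall_gain = (tp - tp_before) / n_events
        precision = tp / j  # j subjects are flagged at this threshold
        ap += recall_gain * precision
        i = j
    return ap


def scaled_brier(y, risk):
    """1 - BrS / (pi * (1 - pi)), where pi is the observed event rate."""
    pi = sum(y) / len(y)
    brs = sum((yi - r) ** 2 for yi, r in zip(y, risk)) / len(y)
    return 1 - brs / (pi * (1 - pi))


def incremental_values(y, risk_existing, risk_new):
    """IncV = metric(new model) - metric(existing model), for each metric."""
    metrics = {"AUC": auc, "AP": average_precision, "sBrS": scaled_brier}
    return {name: f(y, risk_new) - f(y, risk_existing)
            for name, f in metrics.items()}


# Hypothetical data: 2 events among 10 subjects (event rate 0.2).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Existing model: both events rank just below one non-event.
risk_existing = [0.50, 0.40, 0.60, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05]
# New model: one event moves to the top, the other drops to fifth place.
risk_new = [0.80, 0.20, 0.30, 0.28, 0.25, 0.15, 0.10, 0.08, 0.06, 0.05]

for name, value in incremental_values(y, risk_existing, risk_new).items():
    print(f"IncV-{name}: {value:+.4f}")
```

On this toy data the IncV-AUC is negative (about -0.06) while the IncV-AP and the IncV-sBrS are both positive (about +0.12 and +0.16): moving one event to the very top of the ranking is rewarded by the AP, which emphasizes the highest-risk subjects, even though the overall ranking of events against non-events gets slightly worse.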