Why predictive indexes perform less well in validation studies. Is it magic or methods?
Surgical Procedures, Operative
When prognostic indexes have been tested in a second population, they have often performed less well. Because this decline is widely believed to be inevitable, methodologic differences that may explain the discrepancies have been overlooked. Data from a prospective study of 232 patients undergoing noncardiac surgery were used to examine how methodologic differences in assembly of the population, postoperative surveillance, and the criteria for cardiac complications affect the performance of Goldman's cardiac risk index. Our prospective population was used to simulate the methods of Goldman's study and of three other studies that used the risk index, to demonstrate the potential impact of differences in population, surveillance, and outcome criteria for cardiac complications. If Goldman's detection and outcome criteria were employed and only the eligibility criteria used to assemble the populations differed, the overall complication rates would range from 5.2% to 6.9%, and the complication rates within each Goldman class would be similar. When both the detection strategies and the outcome criteria differed, however, important discrepancies in cardiac complication rates emerged. For example, complication rates in class 2 varied from 2% to 23%. In conclusion, important discrepancies in the performance of prognostic indexes may arise from differences in surveillance strategies and definitions of outcome. With sufficient attention to methodologic consistency, the performance of predictive indexes need not inevitably deteriorate in subsequent studies.
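The mechanism described above, in which the same cohort yields different class-specific complication rates depending on which events count as outcomes and how they are detected, can be illustrated with a minimal sketch. The patient records, event names, surveillance labels, and the function `complication_rates` below are all hypothetical and invented for illustration; they are not data or definitions from the study.

```python
from collections import defaultdict

# Hypothetical toy records, NOT data from the study. Each patient has a
# Goldman class and a list of (event type, surveillance method that would
# detect the event) pairs. All names here are illustrative assumptions.
patients = [
    {"cls": 1, "events": []},
    {"cls": 1, "events": [("ischemia", "routine_ecg")]},
    {"cls": 2, "events": [("infarction", "clinical")]},
    {"cls": 2, "events": [("ischemia", "routine_ecg")]},
    {"cls": 2, "events": []},
    {"cls": 2, "events": []},
    {"cls": 3, "events": [("cardiac_death", "clinical")]},
    {"cls": 3, "events": [("pulmonary_edema", "clinical"),
                          ("ischemia", "routine_ecg")]},
]


def complication_rates(patients, outcome_criteria, surveillance):
    """Return the complication rate per class under one study's methods.

    A patient counts as having a complication only if at least one event
    is both included in the outcome criteria and detectable by the
    surveillance methods that the simulated study used.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [complications, total]
    for p in patients:
        counts[p["cls"]][1] += 1
        detected = any(
            kind in outcome_criteria and method in surveillance
            for kind, method in p["events"]
        )
        if detected:
            counts[p["cls"]][0] += 1
    return {c: n / total for c, (n, total) in sorted(counts.items())}


# Narrow definition: major events only, detected clinically.
narrow = complication_rates(
    patients,
    outcome_criteria={"infarction", "cardiac_death", "pulmonary_edema"},
    surveillance={"clinical"},
)

# Broad definition: ischemia also counts, and routine ECGs are obtained.
broad = complication_rates(
    patients,
    outcome_criteria={"infarction", "cardiac_death", "pulmonary_edema",
                      "ischemia"},
    surveillance={"clinical", "routine_ecg"},
)

for cls in sorted(narrow):
    print(f"class {cls}: narrow={narrow[cls]:.2f} broad={broad[cls]:.2f}")
```

In this toy cohort the class 2 rate doubles when the broader outcome definition and more intensive surveillance are applied, even though the patients themselves are unchanged. That is the kind of discrepancy the abstract attributes to methods rather than to the index.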