A simulation study for evaluating the area under the ROC curve and the error rate in binary classifications

The area under the ROC curve (AUC) and the error rate are two important criteria for measuring the performance of classifiers. The maximum AUC and the minimum error rate each indicate the best classification. However, one cannot obtain the minimum error rate and the maximum AUC simultaneously under the same classifier. It is thus of interest to investigate the relationship between the AUC and the error rate. Studying this relationship, Cortes and Mehryar (2004) provided an expression for the expected value of the AUC at a given error rate. In this thesis, I first study the validity of the expression given by Cortes and Mehryar (2004); I then investigate the distribution of the error rate under a fixed range of AUC. My results show that Cortes and Mehryar’s expression is not valid in some specific situations, and that the expected average value of AUC is always smaller than the estimate of AUC from Monte Carlo samples. When the proportion of positive samples is not close to 0.5, the expected average value of AUC calculated from Cortes and Mehryar’s expression deviates substantially from the Monte Carlo samples of AUC. This indicates that the expression for the expected average value of AUC at a given error rate may not be accurate and should be used with caution. I also provide useful information on the quantiles of the error rate for a given fixed range of AUC, with the proportion of positive samples varying in [0.1, 0.5].
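To make the two criteria concrete, the following is a minimal Python sketch of how AUC and error rate might be estimated side by side from Monte Carlo samples. It is not the simulation design used in the thesis: the Gaussian score model, the midpoint threshold, and the names `auc`, `error_rate`, `simulate`, `shift`, and `pos_fraction` are all assumptions made for illustration.

```python
import random


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of AUC: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def error_rate(pos_scores, neg_scores, threshold):
    """Fraction misclassified when score >= threshold is predicted positive."""
    false_neg = sum(1 for p in pos_scores if p < threshold)
    false_pos = sum(1 for n in neg_scores if n >= threshold)
    return (false_neg + false_pos) / (len(pos_scores) + len(neg_scores))


def simulate(n_total=200, pos_fraction=0.3, shift=1.0, n_reps=200, seed=0):
    """Draw n_reps Monte Carlo samples and return (AUC, error rate) pairs.

    Assumption (illustrative only): negative scores are N(0, 1), positive
    scores are N(shift, 1), and the threshold is the midpoint shift / 2.
    """
    rng = random.Random(seed)
    n_pos = round(n_total * pos_fraction)
    n_neg = n_total - n_pos
    results = []
    for _ in range(n_reps):
        pos = [rng.gauss(shift, 1.0) for _ in range(n_pos)]
        neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        results.append((auc(pos, neg), error_rate(pos, neg, shift / 2)))
    return results


if __name__ == "__main__":
    pairs = simulate()
    mean_auc = sum(a for a, _ in pairs) / len(pairs)
    mean_err = sum(e for _, e in pairs) / len(pairs)
    print(f"mean AUC = {mean_auc:.3f}, mean error rate = {mean_err:.3f}")
```

Varying `pos_fraction` over [0.1, 0.5] and grouping the simulated pairs by error rate or by AUC range would give a rough empirical view of the relationship the abstract describes, under these assumed score distributions.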


In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Huang, Qinhua

Thesis Advisors: Todem, David

Committee Members: Li, Chenxi
Lu, Qing

Date Published: 2015

Subjects: Discriminant analysis
Receiver operating characteristic curves
Classification
Methodology

Program of Study: Biostatistics - Master of Science

Degree Level: Masters

Language: English

Pages: viii, 43 pages

ISBN: 9781339250434
1339250438
