A simulation study for evaluating the area under the ROC curve and the error rate in binary classifications
The area under the ROC curve (AUC) and the error rate are two important criteria for measuring the performance of classifiers. The maximum AUC and the minimum error rate indicate the best classification. However, one cannot attain the minimum error rate and the maximum AUC simultaneously with the same classifier, so it is of interest to investigate the relationship between the two. Cortes and Mehryar (2004) provided an expression for the expected value of the AUC at a given error rate. In this thesis, I first study the validity of the expression given by Cortes and Mehryar (2004), and then investigate the distribution of the error rate when the AUC is restricted to a fixed range. My results show that Cortes and Mehryar's expression is not valid in some specific situations, and that the expected average value of the AUC is always smaller than the estimate of the AUC from Monte Carlo samples. When the proportion of positive samples is not close to 0.5, the expected average value of the AUC calculated from Cortes and Mehryar's expression deviates substantially from the Monte Carlo samples of the AUC. This indicates that the expression for the expected average value of the AUC at a given error rate may not be accurate and should be used with caution. I also provide quantiles of the error rate for given fixed ranges of the AUC, with the proportion of positive samples varying in [0.1, 0.5].
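The two criteria compared in the abstract can be illustrated with a minimal Monte Carlo sketch. It is not the simulation design used in the thesis: the Gaussian score model, the fixed threshold, the sample sizes, and all function names below are assumptions chosen only for illustration. The AUC is estimated with the standard Mann-Whitney pairwise count, and the error rate is the fraction of samples misclassified at one threshold.

```python
import random


def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the AUC: the fraction of (positive, negative)
    pairs in which the positive sample scores higher; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def error_rate(pos_scores, neg_scores, threshold):
    """Fraction of samples misclassified when a sample is predicted
    positive if and only if its score exceeds the threshold."""
    false_neg = sum(1 for p in pos_scores if p <= threshold)
    false_pos = sum(1 for n in neg_scores if n > threshold)
    return (false_neg + false_pos) / (len(pos_scores) + len(neg_scores))


def simulate(n_total=200, pos_fraction=0.3, shift=1.0, threshold=0.5,
             n_reps=100, seed=0):
    """Draw n_reps Monte Carlo samples and return (AUC, error rate) pairs.

    Assumed score model (hypothetical, for illustration only):
    positives ~ N(shift, 1), negatives ~ N(0, 1).
    """
    rng = random.Random(seed)
    n_pos = max(1, round(n_total * pos_fraction))
    n_neg = n_total - n_pos
    results = []
    for _ in range(n_reps):
        pos = [rng.gauss(shift, 1.0) for _ in range(n_pos)]
        neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        results.append((empirical_auc(pos, neg),
                        error_rate(pos, neg, threshold)))
    return results
```

Calling `simulate(pos_fraction=0.1)` and `simulate(pos_fraction=0.5)` and comparing the resulting pairs gives a rough sense of how the spread of the AUC at similar error rates can change with the proportion of positive samples.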
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Huang, Qinhua
- Thesis Advisors: Todem, David
- Committee Members: Li, Chenxi; Lu, Qing
- Date: 2015
- Program of Study: Biostatistics - Master of Science
- Degree Level: Masters
- Language: English
- Pages: viii, 43 pages
- ISBN: 9781339250434; 1339250438