In practice, the underlying probability model is unknown, and thus the CoD is not known. The need thus arises to find estimators of the CoD from i.i.d. sample data $S_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ drawn from the unknown distribution. All CoD estimators considered here will be of the form

$$\widehat{\mathrm{CoD}} = 1 - \frac{\hat{\varepsilon}}{\hat{\varepsilon}_0} \qquad (7)$$

where $\hat{\varepsilon}$ is one of the usual error estimators for a selected discrete prediction rule, and $\hat{\varepsilon}_0$ is the empirical frequency estimator for the prediction error with no variables,

$$\hat{\varepsilon}_0 = \frac{\min(N_0, N_1)}{n} \qquad (8)$$

where $N_0$ and $N_1$ are random variables corresponding to the number of sample points belonging to classes $Y = 0$ and $Y = 1$, respectively. We assume throughout that $N_0, N_1 > 0$, that is, each class is represented by at least one sample. Note that $\hat{\varepsilon}_0$ has the desirable property of being a universally consistent estimator of $\varepsilon_0$ in (5), that is, $\hat{\varepsilon}_0 \to \varepsilon_0$ in probability (in fact, almost surely) as $n \to \infty$, regardless of the probability model.
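For concreteness, a minimal Python sketch of (7) and (8) follows; the function names `empirical_eps0` and `cod_estimate` are ours, introduced only for illustration.

```python
import numpy as np

def empirical_eps0(y):
    """Empirical frequency estimator (8): min(N0, N1) / n,
    where N0, N1 count the sample points in classes 0 and 1."""
    n0 = int(np.sum(y == 0))
    n1 = int(np.sum(y == 1))
    assert n0 > 0 and n1 > 0, "each class must contain at least one sample"
    return min(n0, n1) / len(y)

def cod_estimate(eps_hat, y):
    """Generic CoD estimator of the form (7): 1 - eps_hat / eps0_hat."""
    return 1.0 - eps_hat / empirical_eps0(y)
```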

The discrete prediction rule to be used with the error estimator $\hat{\varepsilon}$ is the discrete histogram rule, which is the "plug-in" rule for approximating the minimum-error Bayes predictor [9]. Even though we make this choice, we remark that the methods described here can be applied to any discrete prediction rule. Given the sample data $S_n$, the discrete histogram classifier is given by

$$\psi_n(i) = \begin{cases} 1, & V_i > U_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (9)$$

where $U_i$ is the number of samples with $Y = 0$ in bin $i$, and $V_i$ is the number of samples with $Y = 1$ in bin $i$, for $i = 1, \ldots, b$.
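The rule (9) amounts to majority voting within each bin, as the following sketch illustrates (ties broken in favor of class 0, as in (9); helper names are ours):

```python
import numpy as np

def histogram_counts(x, y, b):
    """Bin counts U_i (label 0) and V_i (label 1); x holds integer bin
    indices in {0, ..., b-1} and y holds labels in {0, 1}."""
    U = np.bincount(x[y == 0], minlength=b)
    V = np.bincount(x[y == 1], minlength=b)
    return U, V

def histogram_classifier(U, V):
    """Discrete histogram rule (9): predict 1 in bin i iff V_i > U_i,
    with ties broken in favor of class 0."""
    return (V > U).astype(int)
```

With these counts in hand, the error estimators introduced below reduce to simple array operations on $U$ and $V$.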

We review next some facts about the distribution of the random vectors $U = (U_1, \ldots, U_b)$ and $V = (V_1, \ldots, V_b)$, which will be needed in the sequel. The variables $N_0$, $N_1$, $U_i$, and $V_i$, for $i = 1, \ldots, b$, are random variables due to the randomness of the sample data (this is the case referred to as "full sampling" in [9]). More specifically, $N_0$ is a random variable binomially distributed with parameters $(n, c)$, where $c = P(Y = 0)$, that is, $P(N_0 = k) = \binom{n}{k} c^k (1-c)^{n-k}$, for $k = 0, 1, \ldots, n$, while the vector-valued random variable $(U_i, V_i)$ is trinomially distributed with the parameter set $(n, c\,p_i, (1-c)\,q_i)$, where $p_i = P(X = i \mid Y = 0)$ and $q_i = P(X = i \mid Y = 1)$, that is,

$$P(U_i = u, V_i = v) = \frac{n!}{u!\,v!\,(n-u-v)!}\,(c\,p_i)^u\,\big((1-c)\,q_i\big)^v\,\big(1 - c\,p_i - (1-c)\,q_i\big)^{n-u-v} \qquad (10)$$

for $0 \le u + v \le n$. In addition, the vector $(U_1, V_1, \ldots, U_b, V_b)$ follows a multinomial distribution with parameters $(n, c\,p_1, (1-c)\,q_1, \ldots, c\,p_b, (1-c)\,q_b)$, so that

$$P(U_1 = u_1, V_1 = v_1, \ldots, U_b = u_b, V_b = v_b) = \frac{n!}{\prod_{i=1}^{b} u_i!\,v_i!}\,\prod_{i=1}^{b} (c\,p_i)^{u_i}\,\big((1-c)\,q_i\big)^{v_i} \qquad (11)$$

whenever $\sum_{i=1}^{b} (u_i + v_i) = n$.
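The following short simulation sketches how a full sample from this model may be drawn directly via the multinomial law (11); the parameter values shown are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
b, n, c = 4, 20, 0.4                    # bins, sample size, c = P(Y = 0)
p = np.array([0.1, 0.2, 0.3, 0.4])      # p_i = P(X = i | Y = 0)
q = np.array([0.4, 0.3, 0.2, 0.1])      # q_i = P(X = i | Y = 1)

# Cell probabilities (c p_1, (1-c) q_1, ..., c p_b, (1-c) q_b) of (11).
cells = np.stack([c * p, (1 - c) * q], axis=1).ravel()

# One draw of (U_1, V_1, ..., U_b, V_b); since sum_i p_i = 1,
# U.sum() is then a draw of N_0 ~ Binomial(n, c).
counts = rng.multinomial(n, cells)
U, V = counts[0::2], counts[1::2]
```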
We introduce next each of the CoD estimators considered in this paper.

### 3.1. Resubstitution CoD Estimator

This corresponds to the choice of resubstitution [11] as the prediction error estimator,

$$\hat{\varepsilon}_r = \frac{1}{n} \sum_{j=1}^{n} \left|Y_j - \psi_n(X_j)\right| \qquad (12)$$

where, for the discrete histogram predictor,

$$\hat{\varepsilon}_r = \frac{1}{n} \sum_{i=1}^{b} \min(U_i, V_i). \qquad (13)$$

The resubstitution CoD estimator can be written equivalently as

$$\widehat{\mathrm{CoD}}_r = 1 - \frac{\sum_{i=1}^{b} \min(U_i, V_i)}{\min(N_0, N_1)} \qquad (14)$$

which reveals that $\widehat{\mathrm{CoD}}_r$ has the desirable property of being a universally consistent estimator of the CoD in (6), that is, $\widehat{\mathrm{CoD}}_r \to \mathrm{CoD}$ in probability (in fact, almost surely) as $n \to \infty$, regardless of the probability model.
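Equations (13) and (14) admit a direct implementation from the bin counts alone; a minimal sketch (function names ours):

```python
import numpy as np

def resub_error(U, V):
    """Resubstitution error of the histogram rule (13):
    (1/n) * sum_i min(U_i, V_i)."""
    n = U.sum() + V.sum()
    return np.minimum(U, V).sum() / n

def resub_cod(U, V):
    """Resubstitution CoD (14): 1 - sum_i min(U_i, V_i) / min(N0, N1)."""
    n0, n1 = U.sum(), V.sum()
    return 1.0 - np.minimum(U, V).sum() / min(n0, n1)
```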

### 3.2. Leave-One-Out CoD Estimator

This corresponds to the choice of the leave-one-out error estimator [12] as the prediction error estimator,

$$\hat{\varepsilon}_l = \frac{1}{n} \sum_{j=1}^{n} \left|Y_j - \psi_{n-1}^{(j)}(X_j)\right| \qquad (15)$$

where $\psi_{n-1}^{(j)}$ denotes the classifier designed on the sample data with the point $(X_j, Y_j)$ deleted, and where, for the discrete histogram predictor (as can be readily checked),

$$\hat{\varepsilon}_l = \frac{1}{n} \sum_{i=1}^{b} \left[ U_i\, I_{\{U_i \le V_i\}} + V_i\, I_{\{V_i \le U_i + 1\}} \right]. \qquad (16)$$

The leave-one-out CoD estimator provides an opportunity to reflect on the uniform choice of the empirical frequency estimator $\hat{\varepsilon}_0$ in (8) as an estimator of $\varepsilon_0$, including here. Clearly, the empirical frequency corresponds to the resubstitution estimator of $\varepsilon_0$. The question arises as to whether, for the leave-one-out CoD estimator, the leave-one-out error estimator of $\varepsilon_0$ should be used instead. For $N_0 = N_1 = n/2$, we get $\hat{\varepsilon}_0 = 1/2$ with the choice of the resubstitution estimator (empirical frequency), but $\hat{\varepsilon}_0 = 1$ with the choice of the leave-one-out estimator, which is a useless result: deleting a point from either class makes that class the minority in the reduced data, so every held-out point is misclassified by the majority-vote rule. Similar problems beset other estimators of $\varepsilon_0$. Hence, the empirical frequency estimator is employed here as the estimator of $\varepsilon_0$ for all CoD estimators.
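A sketch of the closed form (16), under the same tie-breaking convention as in (9) (function name ours):

```python
import numpy as np

def loo_error(U, V):
    """Closed-form leave-one-out error of the histogram rule (16),
    with ties broken toward class 0: a held-out class-0 point errs
    iff V_i >= U_i, and a held-out class-1 point errs iff V_i <= U_i + 1."""
    n = U.sum() + V.sum()
    return (U * (V >= U) + V * (V <= U + 1)).sum() / n
```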

### 3.3. Cross-Validation CoD Estimator

This corresponds to the choice of the cross-validation error estimator [12, 13] as the prediction error estimator. In $k$-fold cross-validation, the sample data $S_n$ is partitioned into $k$ folds $S_{(i)}$, for $i = 1, \ldots, k$. For simplicity, we assume that $k$ divides $n$, so that each fold contains $n/k$ sample points. A classifier $\psi^{(i)}$ is designed on the training set $S_n \setminus S_{(i)}$ and tested on $S_{(i)}$, for $i = 1, \ldots, k$. Since there are many different partitions of the data into $k$ folds, one can repeat the $k$-fold cross-validation $r$ times and then average the results. Such a process leads to the $r$-repeated $k$-fold cross-validation error estimator $\hat{\varepsilon}_{cv}$, given by

$$\hat{\varepsilon}_{cv} = \frac{1}{r\,n} \sum_{s=1}^{r} \sum_{i=1}^{k} \sum_{j=1}^{n/k} \left| Y_j^{(i,s)} - \psi^{(i,s)}\!\left(X_j^{(i,s)}\right) \right| \qquad (17)$$

where $\left(X_j^{(i,s)}, Y_j^{(i,s)}\right)$ represents the $j$th sample point in the $i$th fold for the $s$th repetition of the cross-validation, for $j = 1, \ldots, n/k$, $i = 1, \ldots, k$, and $s = 1, \ldots, r$.

Based upon (17), the $r$-repeated $k$-fold cross-validation CoD estimator is defined by

$$\widehat{\mathrm{CoD}}_{cv} = 1 - \frac{\hat{\varepsilon}_{cv}}{\hat{\varepsilon}_0}. \qquad (18)$$
In order to get reasonable variance properties, a large number of repetitions may be required, which can make the cross-validation CoD estimator slow to compute.
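A brute-force sketch of (17)-(18) follows; fold assignment is re-randomized in each repetition, $k$ is assumed to divide $n$ as above, and the function name and default values of $k$ and $r$ are ours:

```python
import numpy as np

def cv_cod(x, y, b, k=5, r=10, seed=0):
    """r-repeated k-fold cross-validation CoD estimator, per (17)-(18).
    x holds integer bin indices in {0, ..., b-1}; assumes k divides n."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = 0
    for _ in range(r):
        idx = rng.permutation(n)
        for fold in np.split(idx, k):
            train = np.setdiff1d(idx, fold)
            U = np.bincount(x[train][y[train] == 0], minlength=b)
            V = np.bincount(x[train][y[train] == 1], minlength=b)
            psi = (V > U).astype(int)       # histogram rule (9) on the training folds
            errors += np.sum(psi[x[fold]] != y[fold])
    eps_cv = errors / (r * n)               # the estimator (17)
    eps0 = min(np.sum(y == 0), np.sum(y == 1)) / n   # empirical frequency (8)
    return 1.0 - eps_cv / eps0              # the CoD estimator (18)
```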

### 3.4. Bootstrap CoD Estimator

This corresponds to the use of the bootstrap [14, 15] for the prediction error estimator. A bootstrap sample $S_n^*$ consists of $n$ equally likely draws with replacement from the original data $S_n$. Some sample points from the original data may appear multiple times in the bootstrap sample, whereas other sample points may not appear at all. The actual proportion of times a sample point $(X_j, Y_j)$ appears in $S_n^*$ can be written as $P_j^*$, for $j = 1, \ldots, n$. A predictor $\psi_n^{*m}$ may be designed on a bootstrap sample $S_n^{*m}$ and tested on the sample points of $S_n$ that do not appear in $S_n^{*m}$, for $m = 1, \ldots, M$, where $M$ is a sufficiently large number of repetitions. Then, the basic bootstrap zero estimator is given by

$$\hat{\varepsilon}_{zero} = \frac{\sum_{m=1}^{M} \sum_{j=1}^{n} \left|Y_j - \psi_n^{*m}(X_j)\right| I_{\{P_j^{*m} = 0\}}}{\sum_{m=1}^{M} \sum_{j=1}^{n} I_{\{P_j^{*m} = 0\}}}. \qquad (19)$$

The .632 bootstrap estimator $\hat{\varepsilon}_{b632}$ then performs a weighted average of the bootstrap zero and resubstitution estimators,

$$\hat{\varepsilon}_{b632} = 0.368\,\hat{\varepsilon}_r + 0.632\,\hat{\varepsilon}_{zero}. \qquad (20)$$

Based on (19) and (20), the bootstrap CoD estimator is then defined as

$$\widehat{\mathrm{CoD}}_{b632} = 1 - \frac{\hat{\varepsilon}_{b632}}{\hat{\varepsilon}_0}. \qquad (21)$$

The bootstrap CoD estimator can be very slow to compute due to the complexity of $\hat{\varepsilon}_{zero}$.
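A sketch of (19)-(21) closes the section; the function name and the default choice of $M$ are ours, while the 0.368/0.632 weights are those of (20):

```python
import numpy as np

def b632_cod(x, y, b, M=100, seed=0):
    """Bootstrap .632 CoD estimator, per (19)-(21); M bootstrap samples
    approximate the zero estimator. x holds bin indices in {0, ..., b-1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    err_sum, count_sum = 0, 0
    for _ in range(M):
        draw = rng.integers(0, n, size=n)          # indices of one bootstrap sample
        out = np.setdiff1d(np.arange(n), draw)     # points with P_j* = 0
        U = np.bincount(x[draw][y[draw] == 0], minlength=b)
        V = np.bincount(x[draw][y[draw] == 1], minlength=b)
        psi = (V > U).astype(int)                  # histogram rule (9) on the bootstrap sample
        err_sum += np.sum(psi[x[out]] != y[out])
        count_sum += len(out)
    eps_zero = err_sum / count_sum                 # bootstrap zero estimator (19)
    U = np.bincount(x[y == 0], minlength=b)
    V = np.bincount(x[y == 1], minlength=b)
    eps_r = np.minimum(U, V).sum() / n             # resubstitution error (13)
    eps_b632 = 0.368 * eps_r + 0.632 * eps_zero    # weighted average (20)
    eps0 = min(np.sum(y == 0), np.sum(y == 1)) / n # empirical frequency (8)
    return 1.0 - eps_b632 / eps0                   # the CoD estimator (21)
```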