In practice, the underlying probability model is unknown, and thus the CoD is not known. The need thus arises to find estimators of the CoD from i.i.d. sample data $S_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ drawn from the unknown distribution. All CoD estimators considered here will be of the form

$$\widehat{\mathrm{CoD}} = 1 - \frac{\hat{\varepsilon}}{\hat{\varepsilon}_0} \qquad (7)$$

where $\hat{\varepsilon}$ is one of the usual error estimators for a selected discrete prediction rule, and $\hat{\varepsilon}_0$ is the empirical frequency estimator for the prediction error with no variables,

$$\hat{\varepsilon}_0 = \frac{\min(N_0, N_1)}{n} \qquad (8)$$

where $N_0$ and $N_1$ are random variables corresponding to the number of sample points belonging to classes $Y = 0$ and $Y = 1$, respectively. We assume throughout that $N_0, N_1 > 0$, that is, each class is represented by at least one sample. Note that $\hat{\varepsilon}_0$ has the desirable property of being a universally consistent estimator of $\varepsilon_0$ in (5), that is, $\hat{\varepsilon}_0 \to \varepsilon_0$ in probability (in fact, almost surely) as $n \to \infty$, regardless of the probability model.
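For concreteness, a minimal Python sketch of (7) and (8) follows; the function names `empirical_eps0` and `cod_estimate` are ours, introduced only for illustration.

```python
import numpy as np

def empirical_eps0(y):
    """Empirical frequency estimator (8): min(N0, N1) / n,
    where N0, N1 count the sample points in classes 0 and 1."""
    n0 = int(np.sum(y == 0))
    n1 = int(np.sum(y == 1))
    assert n0 > 0 and n1 > 0, "each class must contain at least one sample"
    return min(n0, n1) / len(y)

def cod_estimate(eps_hat, y):
    """Generic CoD estimator of the form (7): 1 - eps_hat / eps0_hat."""
    return 1.0 - eps_hat / empirical_eps0(y)
```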

The discrete prediction rule to be used with the error estimator $\hat{\varepsilon}$ is the discrete histogram rule, which is the "plug-in" rule for approximating the minimum-error Bayes predictor [9]. Even though we make this choice, we remark that the methods described here can be applied to any discrete prediction rule. Given the sample data $S_n$, the discrete histogram classifier is given by

$$\psi_n(i) = \begin{cases} 1, & V_i > U_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (9)$$

where $U_i$ is the number of samples with $Y = 0$ in bin $i$, and $V_i$ is the number of samples with $Y = 1$ in bin $i$, for $i = 1, \ldots, b$.
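The rule (9) amounts to majority voting within each bin, as the following sketch illustrates (ties broken in favor of class 0, as in (9); helper names are ours):

```python
import numpy as np

def histogram_counts(x, y, b):
    """Bin counts U_i (label 0) and V_i (label 1); x holds integer bin
    indices in {0, ..., b-1} and y holds labels in {0, 1}."""
    U = np.bincount(x[y == 0], minlength=b)
    V = np.bincount(x[y == 1], minlength=b)
    return U, V

def histogram_classifier(U, V):
    """Discrete histogram rule (9): predict 1 in bin i iff V_i > U_i,
    with ties broken in favor of class 0."""
    return (V > U).astype(int)
```

With these counts in hand, the error estimators introduced below reduce to simple array operations on $U$ and $V$.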

We review next some facts about the distribution of the random vectors $U = (U_1, \ldots, U_b)$ and $V = (V_1, \ldots, V_b)$, which will be needed in the sequel. The variables $N_0$, $N_1$, $U_i$, and $V_i$, for $i = 1, \ldots, b$, are random variables due to the randomness of the sample data (this is the case referred to as "full sampling" in [9]). More specifically, $N_0$ is a random variable binomially distributed with parameters $(n, c)$, where $c = P(Y = 0)$, that is, $P(N_0 = k) = \binom{n}{k} c^k (1-c)^{n-k}$, for $k = 0, 1, \ldots, n$, while the vector-valued random variable $(U_i, V_i)$ is trinomially distributed with the parameter set $(n, c\,p_i, (1-c)\,q_i)$, where $p_i = P(X = i \mid Y = 0)$ and $q_i = P(X = i \mid Y = 1)$, that is,

$$P(U_i = u, V_i = v) = \frac{n!}{u!\,v!\,(n-u-v)!}\,(c\,p_i)^u\,\big((1-c)\,q_i\big)^v\,\big(1 - c\,p_i - (1-c)\,q_i\big)^{n-u-v} \qquad (10)$$

for $0 \le u + v \le n$. In addition, the vector $(U_1, V_1, \ldots, U_b, V_b)$ follows a multinomial distribution with parameters $(n, c\,p_1, (1-c)\,q_1, \ldots, c\,p_b, (1-c)\,q_b)$, so that

$$P(U_1 = u_1, V_1 = v_1, \ldots, U_b = u_b, V_b = v_b) = \frac{n!}{\prod_{i=1}^{b} u_i!\,v_i!}\,\prod_{i=1}^{b} (c\,p_i)^{u_i}\,\big((1-c)\,q_i\big)^{v_i} \qquad (11)$$

whenever $\sum_{i=1}^{b} (u_i + v_i) = n$.
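The following short simulation sketches how a full sample from this model may be drawn directly via the multinomial law (11); the parameter values shown are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
b, n, c = 4, 20, 0.4                    # bins, sample size, c = P(Y = 0)
p = np.array([0.1, 0.2, 0.3, 0.4])      # p_i = P(X = i | Y = 0)
q = np.array([0.4, 0.3, 0.2, 0.1])      # q_i = P(X = i | Y = 1)

# Cell probabilities (c p_1, (1-c) q_1, ..., c p_b, (1-c) q_b) of (11).
cells = np.stack([c * p, (1 - c) * q], axis=1).ravel()

# One draw of (U_1, V_1, ..., U_b, V_b); since sum_i p_i = 1,
# U.sum() is then a draw of N_0 ~ Binomial(n, c).
counts = rng.multinomial(n, cells)
U, V = counts[0::2], counts[1::2]
```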
We introduce next each of the CoD estimators considered in this paper.

### 3.1. Resubstitution CoD Estimator

This corresponds to the choice of resubstitution [11] as the prediction error estimator,

$$\hat{\varepsilon}_r = \frac{1}{n} \sum_{j=1}^{n} \left|Y_j - \psi_n(X_j)\right| \qquad (12)$$

where, for the discrete histogram predictor,

$$\hat{\varepsilon}_r = \frac{1}{n} \sum_{i=1}^{b} \min(U_i, V_i). \qquad (13)$$

The resubstitution CoD estimator can be written equivalently as

$$\widehat{\mathrm{CoD}}_r = 1 - \frac{\sum_{i=1}^{b} \min(U_i, V_i)}{\min(N_0, N_1)} \qquad (14)$$

which reveals that $\widehat{\mathrm{CoD}}_r$ has the desirable property of being a universally consistent estimator of the CoD in (6), that is, $\widehat{\mathrm{CoD}}_r \to \mathrm{CoD}$ in probability (in fact, almost surely) as $n \to \infty$, regardless of the probability model.
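Equations (13) and (14) admit a direct implementation from the bin counts alone; a minimal sketch (function names ours):

```python
import numpy as np

def resub_error(U, V):
    """Resubstitution error of the histogram rule (13):
    (1/n) * sum_i min(U_i, V_i)."""
    n = U.sum() + V.sum()
    return np.minimum(U, V).sum() / n

def resub_cod(U, V):
    """Resubstitution CoD (14): 1 - sum_i min(U_i, V_i) / min(N0, N1)."""
    n0, n1 = U.sum(), V.sum()
    return 1.0 - np.minimum(U, V).sum() / min(n0, n1)
```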

### 3.2. Leave-One-Out CoD Estimator

This corresponds to the choice of the leave-one-out error estimator [12] as the prediction error estimator,

$$\hat{\varepsilon}_l = \frac{1}{n} \sum_{j=1}^{n} \left|Y_j - \psi_{n-1}^{(j)}(X_j)\right| \qquad (15)$$

where $\psi_{n-1}^{(j)}$ denotes the classifier designed on the sample data with the point $(X_j, Y_j)$ deleted, and where, for the discrete histogram predictor (as can be readily checked),

$$\hat{\varepsilon}_l = \frac{1}{n} \sum_{i=1}^{b} \left[ U_i\, I_{\{U_i \le V_i\}} + V_i\, I_{\{V_i \le U_i + 1\}} \right]. \qquad (16)$$

The leave-one-out CoD estimator provides an opportunity to reflect on the uniform choice of the empirical frequency estimator $\hat{\varepsilon}_0$ in (8) as an estimator of $\varepsilon_0$, including here. Clearly, the empirical frequency corresponds to the resubstitution estimator of $\varepsilon_0$. The question arises as to whether, for the leave-one-out CoD estimator, the leave-one-out error estimator of $\varepsilon_0$ should be used instead. For $N_0 = N_1 = n/2$, we get $\hat{\varepsilon}_0 = 1/2$ with the choice of the resubstitution estimator (empirical frequency), but $\hat{\varepsilon}_0 = 1$ with the choice of the leave-one-out estimator, which is a useless result: deleting a point from either class makes that class the minority in the reduced data, so every held-out point is misclassified by the majority-vote rule. Similar problems beset other estimators of $\varepsilon_0$. Hence, the empirical frequency estimator is employed here as the estimator of $\varepsilon_0$ for all CoD estimators.
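A sketch of the closed form (16), under the same tie-breaking convention as in (9) (function name ours):

```python
import numpy as np

def loo_error(U, V):
    """Closed-form leave-one-out error of the histogram rule (16),
    with ties broken toward class 0: a held-out class-0 point errs
    iff V_i >= U_i, and a held-out class-1 point errs iff V_i <= U_i + 1."""
    n = U.sum() + V.sum()
    return (U * (V >= U) + V * (V <= U + 1)).sum() / n
```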

### 3.3. Cross-Validation CoD Estimator

This corresponds to the choice of the cross-validation error estimator [12, 13] as the prediction error estimator. In $k$-fold cross-validation, the sample data $S_n$ is partitioned into $k$ folds $S_{(i)}$, for $i = 1, \ldots, k$. For simplicity, we assume that $k$ divides $n$, so that each fold contains $n/k$ sample points. A classifier $\psi^{(i)}$ is designed on the training set $S_n \setminus S_{(i)}$ and tested on $S_{(i)}$, for $i = 1, \ldots, k$. Since there are many different partitions of the data into $k$ folds, one can repeat the $k$-fold cross-validation $r$ times and then average the results. Such a process leads to the $r$-repeated $k$-fold cross-validation error estimator $\hat{\varepsilon}_{cv}$, given by

$$\hat{\varepsilon}_{cv} = \frac{1}{r\,n} \sum_{s=1}^{r} \sum_{i=1}^{k} \sum_{j=1}^{n/k} \left| Y_j^{(i,s)} - \psi^{(i,s)}\!\left(X_j^{(i,s)}\right) \right| \qquad (17)$$

where $\left(X_j^{(i,s)}, Y_j^{(i,s)}\right)$ represents the $j$th sample point in the $i$th fold for the $s$th repetition of the cross-validation, for $j = 1, \ldots, n/k$, $i = 1, \ldots, k$, and $s = 1, \ldots, r$.

Based upon (17), the $r$-repeated $k$-fold cross-validation CoD estimator is defined by

$$\widehat{\mathrm{CoD}}_{cv} = 1 - \frac{\hat{\varepsilon}_{cv}}{\hat{\varepsilon}_0}. \qquad (18)$$
In order to get reasonable variance properties, a large number of repetitions may be required, which can make the cross-validation CoD estimator slow to compute.
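A brute-force sketch of (17)-(18) follows; fold assignment is re-randomized in each repetition, $k$ is assumed to divide $n$ as above, and the function name and default values of $k$ and $r$ are ours:

```python
import numpy as np

def cv_cod(x, y, b, k=5, r=10, seed=0):
    """r-repeated k-fold cross-validation CoD estimator, per (17)-(18).
    x holds integer bin indices in {0, ..., b-1}; assumes k divides n."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = 0
    for _ in range(r):
        idx = rng.permutation(n)
        for fold in np.split(idx, k):
            train = np.setdiff1d(idx, fold)
            U = np.bincount(x[train][y[train] == 0], minlength=b)
            V = np.bincount(x[train][y[train] == 1], minlength=b)
            psi = (V > U).astype(int)       # histogram rule (9) on the training folds
            errors += np.sum(psi[x[fold]] != y[fold])
    eps_cv = errors / (r * n)               # the estimator (17)
    eps0 = min(np.sum(y == 0), np.sum(y == 1)) / n   # empirical frequency (8)
    return 1.0 - eps_cv / eps0              # the CoD estimator (18)
```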

### 3.4. Bootstrap CoD Estimator

This corresponds to the use of the bootstrap [14, 15] for the prediction error estimator. A bootstrap sample $S_n^*$ consists of $n$ equally likely draws with replacement from the original data $S_n$. Some sample points from the original data may appear multiple times in the bootstrap sample, whereas other sample points may not appear at all. The actual proportion of times a sample point $(X_j, Y_j)$ appears in $S_n^*$ can be written as $P_j^*$, for $j = 1, \ldots, n$. A predictor $\psi_n^{*m}$ may be designed on a bootstrap sample $S_n^{*m}$ and tested on the sample points of $S_n$ that do not appear in $S_n^{*m}$, for $m = 1, \ldots, M$, where $M$ is a sufficiently large number of repetitions. Then, the basic bootstrap zero estimator is given by

$$\hat{\varepsilon}_{zero} = \frac{\sum_{m=1}^{M} \sum_{j=1}^{n} \left|Y_j - \psi_n^{*m}(X_j)\right| I_{\{P_j^{*m} = 0\}}}{\sum_{m=1}^{M} \sum_{j=1}^{n} I_{\{P_j^{*m} = 0\}}}. \qquad (19)$$

The .632 bootstrap estimator $\hat{\varepsilon}_{b632}$ then performs a weighted average of the bootstrap zero and resubstitution estimators,

$$\hat{\varepsilon}_{b632} = 0.368\,\hat{\varepsilon}_r + 0.632\,\hat{\varepsilon}_{zero}. \qquad (20)$$

Based on (19) and (20), the bootstrap CoD estimator is then defined as

$$\widehat{\mathrm{CoD}}_{b632} = 1 - \frac{\hat{\varepsilon}_{b632}}{\hat{\varepsilon}_0}. \qquad (21)$$

The bootstrap CoD estimator can be very slow to compute due to the complexity of $\hat{\varepsilon}_{zero}$.
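A sketch of (19)-(21) closes the section; the function name and the default choice of $M$ are ours, while the 0.368/0.632 weights are those of (20):

```python
import numpy as np

def b632_cod(x, y, b, M=100, seed=0):
    """Bootstrap .632 CoD estimator, per (19)-(21); M bootstrap samples
    approximate the zero estimator. x holds bin indices in {0, ..., b-1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    err_sum, count_sum = 0, 0
    for _ in range(M):
        draw = rng.integers(0, n, size=n)          # indices of one bootstrap sample
        out = np.setdiff1d(np.arange(n), draw)     # points with P_j* = 0
        U = np.bincount(x[draw][y[draw] == 0], minlength=b)
        V = np.bincount(x[draw][y[draw] == 1], minlength=b)
        psi = (V > U).astype(int)                  # histogram rule (9) on the bootstrap sample
        err_sum += np.sum(psi[x[out]] != y[out])
        count_sum += len(out)
    eps_zero = err_sum / count_sum                 # bootstrap zero estimator (19)
    U = np.bincount(x[y == 0], minlength=b)
    V = np.bincount(x[y == 1], minlength=b)
    eps_r = np.minimum(U, V).sum() / n             # resubstitution error (13)
    eps_b632 = 0.368 * eps_r + 0.632 * eps_zero    # weighted average (20)
    eps0 = min(np.sum(y == 0), np.sum(y == 1)) / n # empirical frequency (8)
    return 1.0 - eps_b632 / eps0                   # the CoD estimator (21)
```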