doi:10.1155/2008/170497 Research Article Evaluation of Robust Estimators Applied to

We evaluated standard robust methods for estimating the fluorescence signal in novel assays used for determining biomolecule concentrations. The objective was to obtain an accurate and reliable estimate using as few observations as possible by decreasing the influence of outliers. We assumed the true signals to have a Gaussian distribution, while no assumptions about the outliers were made. The experimental results showed that the arithmetic mean performs poorly even with modest deviations. Further, the robust methods, especially the M-estimators, performed extremely well. The results showed that the use of robust methods is advantageous in estimation problems where noise and deviations are significant, such as in biological and medical applications.


INTRODUCTION
Bioaffinity assays are used for determining the concentrations of biomolecules of interest (analytes or antigens) in several fields, such as clinical diagnostics and drug discovery. The method is based on using biological molecules of specific affinity towards the analyte for binding the analyte molecules on a surface and for labelling the analytes. In the fluorescence assays, the fluorophore label yields a measurable signal in the range of visible light proportional to the analyte concentration. In this work, the measurements have been carried out by applying the single-step ArcDia TPX assay technology [1,2]. Figure 1 illustrates the solid phase assay, where microparticles are used as a binding surface to condense the analyte molecules.
The ArcDia TPX technology has been used for the measurement of different assay types: microparticle-based assays with molecular labels (molecular measurement), assays of microparticle and nanoparticle complexes where nanoparticles are used as a labelling reagent (nanoparticle measurement), and liquid assays where the fluorochrome concentration in liquid is defined (liquid measurement) [2]. Recently, this technology has also been used for monitoring bacterial growth. In that application, the bacterial cells are captured by microparticles and labeled with a specific fluorescent-labeled antibody.
In the TPX technology, the fundamental concept is two-photon excitation, which allows excitation of fluorochromes to take place only in a limited focal volume, providing three-dimensional resolution for the measurement. The measurement setup for particles is illustrated in Figure 2, which shows how a laser beam traps the particle and pushes it through the focal volume [1,2].
In typical assay measurements, the signals from several tens of microparticles are integrated and averaged to reduce the variance. Similarly, the fluorescence signal from the liquid measurements is sampled approximately ten times per second and integrated for several seconds. Despite this, some measurements show fairly large variance. This is due to the variability inherent in the measurements, fluorescing dust particles in the assay solution, and so on. Different bioaffinity assays introduce different types of deviations, for example, asymmetric deviation, in which case the arithmetic mean or the traditional robust method, the median, gives a biased signal estimate. Recognizing the outliers is not a trivial task; the ad hoc method of choosing suitable thresholds is troublesome since signal magnitudes vary. The approach of detecting outliers using the standard deviation as the measure of distance from the arithmetic mean or median, for example, as utilized by Koskinen et al. in [3] for similar TPX measurements, has the disadvantage of nonrobustness. In other words, the measure of distance and the point of comparison are strongly affected by the outliers. Earlier, a new method called the DER algorithm was developed and applied to similar types of measurements, but with multiple fluorescent labels [4]. It was shown to give good results in estimating the values of the standard particle measurements. In this study, we use the same Parzen windowing-based method for the calculation of the probability density functions of the measurements as in the previous study, but only for the reference case with a large number of observations. Our aim was to evaluate the standard robust estimation methods for assays with a single fluorescent label. Using robust methods, we could avoid calculating the individual probability density functions for each measurement set, which, although it gives good results, is computationally more complex than the standard robust methods.
In our approach, attention is paid particularly to sample size. The ultimate goal is to decrease the number of required particle observations, that is, the length of the total integration time, while maintaining sufficient accuracy of the measurement. Since it is not possible to choose the optimal method to cover every instance as the conditions change, for example, type of contamination (outliers), some prior information and preprocessing are used. The idea is to find a link between the type of measurement and the type of outliers. In the preprocessing, some of the observations are discarded in advance as potential outliers based on their time in focus value, that is, time spent in transition through the focal volume. However, all outliers cannot be recognized by this transition time. Thus, the remaining contamination justifies the use of robust methods.
The applied robust estimates of location comprise the median, the modified trimmed mean (MTM), and M-estimators. These estimators treat outliers in three principal ways: bounding outliers' influence, smooth rejection, and hard rejection. The MTM lies in the first category and can be thought of as a robust version of the above-mentioned standard deviation approach for detecting outliers. The complete rejection of outliers is attained with redescending M-estimators. The properties of each estimator are measured through the influence function and the breakdown point, offering guidelines for choosing a suitable method or the parameters for a given problem, and explaining the estimator performance in the experimental part. The experiments include evaluation of the data through probability density estimates, estimation considering sample size, and demonstrations with repeated measurements. Since the correct parameter values are unknown, the reference points are derived from the distributions and used only in the evaluation of the estimation results. The main purpose of this study is to estimate the measurement data accurately, paying attention to sample size. The experiments comprise different types of measurements: solid phase assays with molecular labels and nanoparticle labels and liquid phase assays. Practical examples are included to further illustrate the effectiveness of the robust estimation in repeated measurements, applied also to the dynamic bacterial data.

METHODS
To be able to select suitable estimators and tune their parameters, we need to evaluate the estimator properties. On the basis of the evaluation of the characteristics, we chose to apply as the robust estimates of location the median, the modified trimmed mean (MTM), and a generalization of the maximum likelihood estimator (MLE), known as the M-estimator. In addition, a scale estimate is needed to evaluate the scale or spread of the sample. In the following, we define the estimators and explain their properties in detail relying on the influence function (IF) and the breakdown point. The influence function is defined as follows [5]:
$$\mathrm{IF}(x; T, F) = \lim_{t \to 0^{+}} \frac{T\big((1 - t)F + t\Delta_x\big) - T(F)}{t},$$
where Δ_x denotes the point mass at x. The influence function describes the effect of infinitesimal contamination at the point x on the estimate T standardized by the mass t of the contamination [5]. IF is an asymptotic concept, where the statistic T is defined as a functional of the assumed sample distribution F. Here, the standard normal distribution is used as F, that is to say, the measurement data are assumed to be composed of Gaussian distributed true signals and of a contamination part without any specific distribution. To study the robustness properties of the estimators, the influence function is quantified, providing measures such as the gross-error sensitivity (γ*), the rejection point (ρ*), and the asymptotic variance V(T, F). Due to severe asymmetric deviations in part of the data, attention is paid particularly to the rejection point, the point at which IF becomes zero and contamination further away does not have any influence on the estimate. The gross-error sensitivity gives the upper bound for the bias, and the asymptotic variance defines the efficiency of the estimator. Due to the local nature of the IF, it is necessary to use an additional global measure of robustness, the breakdown point (ε*). The breakdown point is the smallest proportion of outliers which can carry the statistic over all bounds and make the estimate totally uninformative [5,6].
In the case of the translation equivariant estimator, the value of the breakdown point is between 0 and 1/2 [7].

Modified trimmed mean
The modified trimmed mean (MTM) is based on the rejection of observations lying too far away from the sample median [8]:
$$\mathrm{MTM}(X_1, \dots, X_N; q) = \frac{\sum_{i=1}^{N} a_i X_i}{\sum_{i=1}^{N} a_i}, \qquad a_i = \begin{cases} 1, & |X_i - \mathrm{med}\{X_1, \dots, X_N\}| \le q, \\ 0, & \text{otherwise.} \end{cases}$$
The MTM is represented in the form of a weighted mean, an observation (X_i) having the weight (a_i) equal to one when its distance to the median is within q and otherwise having the weight zero. Here a fixed value of q is used along with scale estimation. When q is large, the estimator will resemble the arithmetic mean; when q is close to zero, the median type of behavior will be dominant. Due to the use of the median, the MTM possesses the highest possible breakdown point of 1/2.
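The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mad_scale` and `modified_trimmed_mean` are hypothetical, the distance to the median is standardized by the MAD (with the factor 1.483) before it is compared with q, and the fallback to the median for degenerate samples is a choice made for this sketch.

```python
from statistics import median


def mad_scale(xs):
    """Median absolute deviation from the median, scaled by 1.483."""
    m = median(xs)
    return 1.483 * median([abs(x - m) for x in xs])


def modified_trimmed_mean(xs, q=2.0):
    """Mean of the observations lying within q robust scale units of the median."""
    m = median(xs)
    s = mad_scale(xs)
    if s == 0:
        # Degenerate sample (at least half of the values coincide): assumption, return the median.
        return m
    kept = [x for x in xs if abs(x - m) <= q * s]
    if not kept:
        # Possible only for very small q: assumption, fall back to the median.
        return m
    return sum(kept) / len(kept)


# Hypothetical signal values with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
print(modified_trimmed_mean(sample, q=2.0))
```

With q = 2 the outlier receives weight zero and the remaining observations are averaged; a very large q would reproduce the arithmetic mean.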

M-estimators
The M-estimator is a generalization of MLE, and it is formed by replacing the negative log likelihood function with an even function [8][9][10]. Since MLE may be solved through minimization, M-estimators are usually defined through derivative functions
$$\sum_{i=1}^{N} \psi(X_i - \theta) = 0,$$
where θ is the estimate and ψ is the derivative function identifying the M-estimator. The M-estimators applied here are Andrews' sine function (ψ_sin), the skipped median (ψ_sk), and the Welsch estimator (ψ_wel). The derivative function of Andrews' sine consists of one period of a sinusoidal function, where the width of the period, and thus also the rejection point, is adjusted by the parameter a. Similarly, the derivative function of the skipped median is equal to zero beyond its rejection point r. Both estimators are of redescending type, that is, they have finite rejection points. The third estimator, Welsch, does not have a finite rejection point, but its IF approaches zero as shown in Figure 3. All the M-estimators have breakdown points equal to 1/2 due to an iterative solving method, where the iteration is started from the sample median [9]. The median is utilized to obtain a robust starting value and to avoid the problem of nonuniqueness in solving the redescending M-estimate [8,11]. Although estimator properties are set by fixed parameters, the required scale estimation in solving the location estimate decides how the observations are treated, for example, which observations are rejected.
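One common way to solve a location M-estimate iteratively from the median is a reweighted mean, where each observation gets the weight ψ(u)/u for its standardized residual u. The sketch below illustrates that idea; it is not necessarily the exact procedure used in the study. The functional forms are assumptions: Andrews' sine is taken as ψ(u) = sin(u/a) for |u| ≤ πa and zero elsewhere, and the Welsch function as ψ(u) = u·exp(−(u/c)²). All function names and the stopping rule are hypothetical.

```python
import math
from statistics import median


def mad_scale(xs):
    """Median absolute deviation from the median, scaled by 1.483."""
    m = median(xs)
    return 1.483 * median([abs(x - m) for x in xs])


def weight_andrews(u, a=0.5):
    """psi(u)/u for the assumed form psi(u) = sin(u/a) on |u| <= pi*a, zero outside."""
    if abs(u) > math.pi * a:
        return 0.0
    if u == 0.0:
        return 1.0 / a  # limit of sin(u/a)/u as u -> 0
    return math.sin(u / a) / u


def weight_welsch(u, c=0.9):
    """psi(u)/u for the assumed form psi(u) = u * exp(-(u/c)**2)."""
    return math.exp(-((u / c) ** 2))


def m_estimate(xs, weight, tol=1e-9, max_iter=100):
    """Location M-estimate by iteratively reweighted means, started from the median."""
    scale = mad_scale(xs)
    theta = median(xs)
    if scale == 0:
        return theta
    for _ in range(max_iter):
        ws = [weight((x - theta) / scale) for x in xs]
        total = sum(ws)
        if total == 0:
            break  # every observation rejected: keep the current estimate
        updated = sum(w * x for w, x in zip(ws, xs)) / total
        if abs(updated - theta) <= tol * scale:
            return updated
        theta = updated
    return theta


# Hypothetical signal values with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
print(m_estimate(sample, weight_andrews), m_estimate(sample, weight_welsch))
```

Starting from the median keeps the iteration in the main cluster, so the redescending weights assign zero (Andrews) or vanishingly small (Welsch) weight to the distant observation.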

Scale estimate MAD
The robust estimate of scale MAD (median of absolute deviation from median) is based on the double median [5,10]:
$$\mathrm{MAD} = 1.483 \cdot \mathrm{med}_i \big\{ \, |X_i - \mathrm{med}_j \{X_j\}| \, \big\}.$$
The MAD gives the median of distances between observations and the median. The factor 1.483 is used due to the assumed normal distribution of the true signals, and again the use of the median gives a high breakdown point of 1/2. Concerning the Gaussian distribution, the MAD also has the lowest possible gross-error sensitivity among all the scale estimates [12].
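A small sketch can make the robustness of this scale estimate concrete. The data values below are invented for illustration and `mad_scale` is a hypothetical helper name; the comparison with the sample standard deviation is added here only to show the contrast.

```python
from statistics import median, stdev


def mad_scale(xs):
    """1.483 times the median of absolute deviations from the sample median."""
    m = median(xs)
    return 1.483 * median([abs(x - m) for x in xs])


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
contaminated = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # one value replaced by a gross outlier

# The MAD is unchanged by the single outlier, while the standard deviation is not.
print(mad_scale(clean), mad_scale(contaminated))
print(stdev(clean), stdev(contaminated))
```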

Estimator properties
In the selection of the estimator parameters, the idea is to keep the influence of outliers low, considering the rather large deviations in part of the data. In Figure 3(a), the influence functions of the arithmetic mean, the median, and the modified trimmed mean [13] are shown. The IF of the MTM indicates the tradeoff between the mean and the median: the central part behaves linearly, while the influence outside the distance q is bounded to a constant. It should be pointed out that the observations outside the distance q have an influence on the estimate despite the rejection. Selecting q equal to two yields very low influence outside distance q, while q itself has a reasonably low value. Figure 3(b) shows the influence functions of the applied M-estimators. The skipped median and Andrews' sine, representing the redescending type of M-estimators, are able to reject observations completely, that is, they have finite rejection points. The Welsch estimator does not have a finite rejection point, although its influence function approaches zero. In the case of the M-estimators, the parameters have been chosen to give a low rejection point at the expense of the asymptotic variance, considering the larger and asymmetric deviations present in the data. The chosen parameter values and the quantified estimator properties are summarized in Table 1. The MAD is utilized as the estimate of the scale to standardize the data when applying the MTM and the M-estimators.
Figure 3: The influence functions of the mean, the median, and the MTM (q = 2) in (a), and the influence functions of the skipped median (r = π/2), Andrews' sine (a = 1/2), and Welsch (c = 0.9) in (b). Standard normal distribution is assumed.
EURASIP Journal on Advances in Signal Processing
The asymptotic breakdown point (ε*) is a rough measure of robustness defining the minimum proportion of outliers that makes the estimate totally uninformative. The gross-error sensitivity (γ*) quantifies the worst influence an outlier can have, and the rejection point (ρ*) designates the estimator's ability to totally nullify the influence of an outlier outside the given distance. Asymptotic variance (V(T, F)) describes the efficiency of the estimator, that is, low variance indicates high efficiency. In the selection of the parameters for Andrews' sine and the skipped median, the low rejection point has been emphasized to avoid the inclusion of outliers, although this results in higher gross-error sensitivity and reduction of asymptotic efficiency. Further decreasing the parameter leads to exponential deterioration of the gross-error sensitivity and the asymptotic variance. The Welsch estimator approximately coincides with Andrews' sine according to measures other than the rejection point. The parameter value applied with the MTM corresponds to the distance of two standard deviations, estimated robustly using the MAD.
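For orientation, the textbook expressions behind these measures can be written out for a location M-estimator with an odd derivative function ψ at the standard normal distribution Φ. The formulas and the two worked cases below are standard results stated here as a reminder; they are not values taken from Table 1.

```latex
\mathrm{IF}(x;\psi,\Phi) = \frac{\psi(x)}{\int \psi'(u)\,d\Phi(u)}, \qquad
\gamma^{*} = \sup_{x}\,\bigl|\mathrm{IF}(x;\psi,\Phi)\bigr|, \qquad
V(\psi,\Phi) = \int \mathrm{IF}(x;\psi,\Phi)^{2}\,d\Phi(x)
             = \frac{\int \psi(u)^{2}\,d\Phi(u)}{\bigl(\int \psi'(u)\,d\Phi(u)\bigr)^{2}}

% Arithmetic mean, \psi(x) = x:
\mathrm{IF}(x) = x, \qquad \gamma^{*} = \infty, \qquad \rho^{*} = \infty, \qquad V = 1

% Median, with \varphi the standard normal density:
\mathrm{IF}(x) = \frac{\operatorname{sign}(x)}{2\varphi(0)} = \sqrt{\pi/2}\,\operatorname{sign}(x), \qquad
\gamma^{*} = \sqrt{\pi/2} \approx 1.253, \qquad \rho^{*} = \infty, \qquad V = \pi/2 \approx 1.571
```

The two cases mark the ends of the tradeoff discussed above: the mean is fully efficient but has unbounded influence, while the median bounds the influence at the cost of a higher asymptotic variance.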
To complement the evaluation of asymptotic estimator properties, the finite sample estimator behavior was studied by forming the output distributional influence functions (ODIF) for expectation, which is closely related to the sensitivity curve [14,15]. Mainly the ODIFs for expectation were similar to the influence functions; only smoothing at discontinuities was observed. The exception was the skipped median, for which the ODIF did not vanish outside the rejection point. This indicates that the finite sample skipped median has quite strong median type properties as it only bounds the influence of outliers, instead of rejecting them.
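The sensitivity curve mentioned above is straightforward to compute, and a short sketch may help in reading the finite-sample remarks. It uses the usual definition SC_n(x) = n·[T(x_1, ..., x_{n-1}, x) − T(x_1, ..., x_{n-1})]; this is the sensitivity curve, not the ODIF used in the study. The base sample of standard normal quantiles and the function name `sensitivity_curve` are assumptions made for this illustration.

```python
from statistics import NormalDist, mean, median


def sensitivity_curve(estimator, base, x):
    """n times the change in the estimate caused by adding one observation x."""
    n = len(base) + 1
    return n * (estimator(base + [x]) - estimator(base))


# Idealized base sample: n - 1 quantiles of the standard normal distribution.
n = 20
base = [NormalDist().inv_cdf(i / n) for i in range(1, n)]

for x in (0.5, 2.0, 10.0):
    # The curve of the mean grows linearly in x, the curve of the median is bounded.
    print(x, sensitivity_curve(mean, base, x), sensitivity_curve(median, base, x))
```

Passing any other location estimator as `estimator` gives its finite-sample counterpart of the influence function on the same base sample.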

EXPERIMENTS
Data from different types of TPX assays (molecular, nanoparticle, and liquid) were analyzed. Here, molecular and nanoparticle refer to the use of molecular and nanoparticle labels in a solid phase assay, respectively. Both assay types employ 3 μm microparticles as a solid phase. In the liquid assays, measurements are performed in the absence of microparticles. The molecular label assay data consisted of 8 datasets containing 198 to 552 particle observations. The sample consisted of BF560.7-BSA coated standard particles (Arctic Diagnostics Ltd., Turku, Finland). The data from the nanoparticle label assay of Influenza B virus consisted of 13 datasets with the number of observations ranging from 309 to 493. The liquid phase assay data consisted of 7 datasets where the number of observations recorded at 100-millisecond intervals varied between 198 and 990. The sample was a BF560.7 fluorochrome standard solution (Arctic Diagnostics Ltd.). Additionally, bacterial growth of Staphylococcus aureus was observed by using a novel type of assay, where microparticles were used as a solid phase for binding the fluorescent-labeled bacteria. The fluorescence signal from the particles was recorded over 11.5 hours, resulting in 24 datasets containing 53 to 96 particle observations each. The objective with the bacterial data was to observe the effect of robust estimation on this kind of dynamic data containing many outliers.

Calculation of reference values
Since the correct parameter value to be estimated was not known, we used the probability density estimates of the data to define the correct value as the location of the highest peak in the PDF. In addition, the distributions gave information on the nature of the measurement data in general, for example, the type of contamination. The idea of employing density estimation can be found in the DER algorithm as well, but here the approach was based on large sample size, at least 198 observations for molecular, 309 for nanoparticle, and 198 for liquid type of data. Using Parzen's method, the density estimate f_N is defined as [16]
$$f_N(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\, k\!\left(\frac{x - X_i}{h}\right), \tag{6}$$
where X_i is the observation, N is the number of observations, h is a smoothing parameter, and k is a Gaussian kernel function
$$k(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^{2}}{2}\right). \tag{7}$$
Equations (6) and (7) give the density estimate at location x as the mean of Gaussian distributions, where X_i and h are the expectations and the deviation, respectively. Regardless of the Gaussian kernel, the method does not contain any assumption about the underlying distribution [17]. The smoothing parameter h was chosen subjectively. The densities were utilized only for evaluation purposes relying on large sample size and densely computed PDF. Figure 4 shows the density estimates of typical molecular data, nanoparticle data, and liquid data.
In the case of the particle measurements, a part of the outliers has been discarded, based on the time in focus. Despite this, outliers still remain, introducing asymmetric contamination as seen in Figures 4(a) and 4(b), whereas liquid measurements typically contain fewer deviating observations. The location of the highest peak is assumed to give the correct value, that is, the parameter to be estimated.
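The reference value computation described in this section can be sketched as a Gaussian-kernel Parzen estimate evaluated on a dense grid, with the grid point of maximum density taken as the reference. The grid size, the smoothing parameter, the function names, and the synthetic data below are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def parzen_density(x, data, h):
    """Parzen estimate at x: mean of Gaussian densities centred at the observations."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def density_mode(data, h, grid_size=1000):
    """Grid location of the highest peak of the density estimate."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    return max(grid, key=lambda x: parzen_density(x, data, h))


# Synthetic example: a Gaussian main cluster around 10 plus one-sided contamination.
main = [NormalDist(10.0, 0.5).inv_cdf((i + 0.5) / 300) for i in range(300)]
outliers = [12.0 + 0.45 * i for i in range(40)]
data = main + outliers

print(density_mode(data, h=0.3), sum(data) / len(data))
```

In this synthetic case the peak location stays with the main cluster, whereas the arithmetic mean is pulled towards the contamination. A smaller h follows the data more closely but produces spurious peaks, which is one reason this route relies on a large number of observations.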

Evaluation of sample size
The estimation was repeated multiple times (n = 1000) for each sample size (N = 10, 20, . . . , 150) by applying the bootstrap method [18] to the original data. In total, eight molecular, thirteen nanoparticle, and seven liquid measurements were resampled with replacement to obtain a large number of pseudosamples. Naturally, each dataset was resampled separately. Bias and root mean squared error (RMSE) were considered as measures of performance. Both measures were calculated with respect to the correct parameter given by the density estimation. To make results from measurements with different magnitudes comparable, normalization using the correct parameter was applied. This was done by dividing the bias and the error term of the RMSE by the correct parameter. After randomly selecting N observations and before performing the estimation, part of the potential outliers was discarded in advance by setting a minimum of 20 milliseconds for the time in focus. The procedure corresponds to the real measurement situation since N gives the number of measured observations, though the actual number of observations used in the estimation is usually less than N. In the experimental data, the proportion of discarded observations was approximately one third. This discarding by the time in focus is applicable only with the particle measurements, not with the liquid phase. Hence, all the measured observations of the liquid assays were used in the estimation. Figure 5 shows the bias and RMSE of the molecular measurement estimation; the results are a combination of eight datasets. Bias was formed by averaging the normalized absolute bias values given by distinct measurements.
Similarly, RMSE is the root mean square of normalized errors with respect to the correct parameter. The arithmetic mean, shown for comparison, differs clearly from the robust methods in performance and has the largest bias and RMSE. The M-estimators have the lowest bias and RMSE, while the MTM and the median show slightly poorer performance. The Welsch estimator and Andrews' sine are the best among the M-estimators, undershooting the bias and RMSE levels of 0.02 and 0.04, respectively. Though the margins between robust methods are rather small, applying Andrews' sine or the Welsch estimator makes it possible to reach an RMSE of 0.05 using 70 observations, while the conventional and simplest robust method, the median, requires 110 observations. Additionally, the differences between the methods become more distinct as the sample size increases.
The results of the bootstrap analysis for the nanoparticle measurements (13 datasets) are displayed in Figure 6. The bias and RMSE are larger, approximately double, compared to the molecular data. However, the tendency of the results is similar: the Welsch estimator and Andrews' sine have the lowest bias and RMSE. Moreover, the differences between the methods are apparent, especially as sample size increases. The best performance is again obtained with the Welsch estimator and Andrews' sine, with an RMSE of less than 0.08 at the sample size of 150.
In Figure 7, the results are shown for the liquid measurements (7 datasets). In general, the performance of the methods is clearly better than in the previous cases; even the arithmetic mean undershoots the RMSE level of 0.025, while the robust methods reach an RMSE of 0.015 with the sample size of 90, except for the Welsch and Andrews' sine. [...] contamination. This was also predicted by the high asymptotic variances in Table 1. The preceding points out the tradeoff between powerful bounding or accurate exclusion of outliers and the efficiency of the estimator [19,20]. The somewhat different behavior of the skipped median, compared to other M-estimators, can be explained by its finite sample properties given by the ODIF. The outcome indicated that the finite sample skipped median had quite strong median-type properties, as the ODIF for the expectation was only bounded but did not vanish outside the rejection point. The observed behavior was due to the iterative weighted mean solution of the estimate which, in the case of the skipped median, puts a lot of weight on the previous solution, making the estimate converge to near the starting point, the median.
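The resampling procedure of this section can be sketched for a single dataset as follows. This is a simplified illustration: the time-in-focus discarding step is omitted, the data are synthetic, the reference value is simply set to the centre of the simulated main cluster, and the function name `bootstrap_bias_rmse` is hypothetical.

```python
import math
import random
from statistics import mean, median


def bootstrap_bias_rmse(data, estimator, reference, sample_size, n_rep=1000, rng=None):
    """Normalized absolute bias and RMSE of an estimator over bootstrap pseudosamples."""
    rng = rng or random.Random(0)
    errors = []
    for _ in range(n_rep):
        pseudo = rng.choices(data, k=sample_size)  # resampling with replacement
        errors.append((estimator(pseudo) - reference) / reference)
    bias = abs(sum(errors) / n_rep)
    rmse = math.sqrt(sum(e * e for e in errors) / n_rep)
    return bias, rmse


# Synthetic dataset: Gaussian signal around 100 with one-sided contamination.
gen = random.Random(1)
data = [gen.gauss(100.0, 5.0) for _ in range(300)]
data += [gen.uniform(150.0, 400.0) for _ in range(45)]

for name, est in (("mean", mean), ("median", median)):
    for size in (10, 50, 150):
        print(name, size, bootstrap_bias_rmse(data, est, 100.0, size, n_rep=200))
```

Averaging the per-dataset values over all datasets of one assay type would then give curves of the kind plotted against sample size.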

Time series evaluation
The advantage of the robust estimation is demonstrated further with the time series of measurements, that is, repeated measurements. This is a more practical example since the sample size is not predetermined, only the measurement time. The measurement time was 10 seconds for the liquid assay and 60 seconds for the other assays. The data in Figure 8 are organized according to the type of the assay: molecular, nanoparticle, liquid assay, and bacterial application, where the sample sizes were 81-128, 35-64, 99, and 53-96, respectively. In the panels, the fluorescence signals of single observations and the estimated values are shown for each repetition. In contrast to the other data where a stable estimate is desired, bacterial growth is a dynamic process. In the beginning, the signal was proportional to the number of bacteria in the assay. However, the excess of bacteria compared to the fluorescent label resulted in signal reduction after some hours (hook effect). In the panel, the horizontal axis represents a time span of about 11.5 hours. The robust estimation methods in Figure 8 were chosen on the basis of the results in the previous section (Figures 5-7): the median for the liquid assay and Andrews' sine for the others. Estimates given by the arithmetic mean are shown for comparison. To visualize the performance of the methods, estimates are represented as curves.
In the case of molecular and nanoparticle measurements, Andrews' sine gives more stable estimates, particularly in the latter case since the data contain some severe deviations. With the liquid measurements, the median provides steady estimates; only mild fluctuation is observed. Andrews' sine was also applied to the dynamic bacterial data, yielding a smooth curve following the dense clusters and clearly indicating the expected increase and decrease in the signal over time. The arithmetic mean performs much more poorly.

CONCLUSION
The goal of this study was to improve the accuracy and the repeatability of measurements based on the new TPX assay technology by decreasing the influence of the outliers in the estimation of the true signal value from measured observations. Since the true values were unknown, they were defined using the density estimates of the data having an abundant number of observations for experimental purposes. True signals, that is, the proper part of the measurement data, were assumed to be normally distributed, which is a typical assumption considering biological data. In the experimental data (molecular-labeled microparticles, nanoparticle-labeled microparticles, and liquid assay), somewhat different types of contamination were noticed. The aim was twofold: to study whether we could estimate the true signal with a smaller number of observations, leading to a shorter measurement time, and to investigate the parameters of the robust methods using the influence function (IF). When applied to the solid phase measurements, which introduce large and asymmetric contamination, the M-estimators showed the best performance. With the liquid data, having only mild deviations, good results were achieved using simpler robust estimators, such as the median. The robustness of the median against small proportions of contamination was also pointed out by Bickel and Frühwirth in [21]. Therefore, we propose to use Andrews' sine or the Welsch estimator in estimation with the TPX particle data, and the median with the liquid data, in the future.
Besides the IF, assessing estimator properties relied on the breakdown point. However, there are some drawbacks. First of all, the IF is an asymptotic concept and may not correspond to the finite case, as noticed with the skipped median. Secondly, the IF considers infinitesimal contamination and the breakdown point the smallest proportion of outliers making the estimate totally uninformative, that is, minimum and maximum number of outliers, respectively. Clearly, in real life, deviations in measurements lie somewhere between these two situations. Other means of assessing estimator properties are different types of approximations of the IF for an arbitrary estimator, for example, the sensitivity curve used in [21] to evaluate the effect of a single contamination point. In [22], a sensitivity curve with more outliers was introduced, but the approach is obviously computationally problematic when applied to large sample sizes.
We have shown in this study that the application of robust estimation methods complements the two-photon excited fluorescence-based assay measurement. The experiments indicated that the feasibility of the estimator depends on the characteristics of the contamination. This illustrates the problem of having different types of data; the estimator can be optimal only under certain conditions. This leads to the selection of the methods and the parameters according to the nature of the deviations, for which the influence function offers a suggestive tool. Often it is difficult or impossible to exactly characterize deviations, but we have shown that even a crude division of contamination, for example, into asymmetric or mild, can help to achieve better results using robust estimation methods. Further, application of robust methods ensures more precise results when sample size is undetermined due to restricted measurement time, as was shown in Figure 8. Therefore, the use of the robust estimators is beneficial in biological and medical applications, which are inherently noisy due to the sensitivity of the measurement and the complexity of the problem.