A framework is presented for unifying statistical and structural information in pattern retrieval based on local feature sets. We use local features constructed from coefficients of quantized block transforms borrowed from video compression, which robustly preserve perceptual information under quantization. We then describe the statistical information of patterns by histograms of the local features, treated as vectors and compared with a similarity measure. We show how a pattern retrieval system based on the feature histograms can be optimized for best performance in a training process. Next, we incorporate structural information by decomposing patterns into subareas and using the feature histograms of the subareas and their combinations, again treated as vectors and compared with a similarity measure, for retrieval. This description allows the amount of statistical and structural information to be varied flexibly, and it can also be used with the training process to optimize retrieval performance. The novelty of the presented method lies in integrating the information contributed by local features, by the statistics of the feature distribution, and by the controlled inclusion of structural information. These are combined into a retrieval system whose parameters at all levels can be adjusted by training, which selects the contribution of each type of information that is best for overall retrieval performance. The proposed framework is investigated in experiments on face databases for which standardized test sets and evaluation procedures exist. The results are compared with those of other methods and are shown to be better than those of most other approaches.
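As an illustration only, the pipeline outlined above can be sketched in a few lines of Python. Everything specific in this sketch is an assumption made for the example and is not taken from the method itself: a floating-point 4x4 DCT stands in for the video-codec block transform, three low-frequency AC coefficients are quantized to three levels with a hypothetical `threshold`, the subareas form a regular `grid`, and the L1 distance stands in for the similarity measure. The function names are likewise hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed stand-in for a codec block transform)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def block_features(image, block=4, threshold=8.0):
    """Map each non-overlapping block to one integer feature code in 0..26."""
    t = dct_matrix(block)
    h, w = image.shape
    h, w = h - h % block, w - w % block  # ignore incomplete border blocks
    codes = np.zeros((h // block, w // block), dtype=np.int64)
    for r in range(0, h, block):
        for c in range(0, w, block):
            patch = image[r:r + block, c:c + block].astype(float)
            coeffs = t @ patch @ t.T
            # Assumed feature: three low-frequency AC coefficients,
            # each quantized to one of the levels -1, 0, +1.
            ac = np.array([coeffs[0, 1], coeffs[1, 0], coeffs[1, 1]])
            ternary = np.sign(ac) * (np.abs(ac) > threshold)
            codes[r // block, c // block] = int(np.dot(ternary + 1, [1, 3, 9]))
    return codes


def descriptor(image, grid=(1, 1), n_codes=27):
    """Concatenate the normalized feature histograms of the grid subareas.

    grid=(1, 1) keeps only statistical information; a finer grid adds
    structural information about where the features occur.
    """
    codes = block_features(image)
    parts = []
    for rows in np.array_split(codes, grid[0], axis=0):
        for sub in np.array_split(rows, grid[1], axis=1):
            hist = np.bincount(sub.ravel(), minlength=n_codes).astype(float)
            parts.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(parts)


def l1_distance(a, b):
    """City-block distance between two descriptors (assumed similarity measure)."""
    return float(np.abs(a - b).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32))
    swapped = np.vstack([img[16:], img[:16]])  # exchange top and bottom halves
    for grid in [(1, 1), (2, 2)]:
        d = l1_distance(descriptor(img, grid), descriptor(swapped, grid))
        print(grid, d)
```

In this sketch, exchanging the top and bottom halves of an image leaves the single global histogram unchanged, while the 2x2 subarea descriptor generally changes. The `grid` argument therefore plays the role of the control over how much structural information is included; in a trained system it, the quantization `threshold`, and the choice of coefficients would be the kind of parameters selected by the training process.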