 Research
 Open Access
DL-CHI: a dictionary learning-based contemporaneous health index for degenerative disease monitoring
 Aven Samareh^{1} and
 Shuai Huang^{1}
https://doi.org/10.1186/s13634-018-0538-8
© The Author(s) 2018
 Received: 7 August 2017
 Accepted: 13 February 2018
 Published: 5 March 2018
Abstract
Effective monitoring of degenerative patient conditions is crucial for many clinical decision-making problems. Leveraging the data-rich environments now common in many clinical settings, we propose a novel clinical data fusion framework that builds a contemporaneous health index (CHI) for degenerative disease monitoring, quantifying the severity of the deterioration process over time. Our framework specifically exploits the monotonic progression patterns of target degenerative conditions such as Alzheimer’s disease (AD) and articulates these patterns in a systematic optimization formulation. Further, to address patient heterogeneity, we integrate CHI with dictionary learning to build sets of overcomplete bases that represent the personalized models efficiently. Numerical results on two real-world applications show the promising capability of the proposed DL-CHI model.
Keywords
 Dictionary learning
 Patient monitoring
 Convex optimization
 Personalized healthcare
1 Introduction
In this paper, we address the problem of patient risk monitoring, that is, characterizing the trajectory of a patient’s condition over the course of its progression. Although there is no universal definition of “patient condition,” the concept is crucial in communication between clinicians and is frequently referenced by healthcare providers. Developing a precise contemporaneous health index (CHI) that faithfully reflects the underlying patient condition across the course of the condition’s progression holds great value for a range of clinical decision-making tasks. For instance, it would support early detection of patient deterioration and so help reduce the number of serious incidents; it is reported that 11% of serious incidents result from deterioration that was not acted upon, mainly because the signs of deterioration were not recognized [1, 2]. It would also enhance the continuity of care, since clinicians and healthcare providers would have a longitudinal perspective of the patient condition. Finally, with a global representation of the evolving condition, it may ultimately lead to control systems engineering approaches that implement adaptive interventions for better healthcare management [3–6].
Towards this goal, technological innovations are emerging in many healthcare applications and have given rise to data-rich environments in which an abundance of longitudinal clinical measurements reflecting the degeneration of the health condition can be collected continuously. For example, to monitor surgical site infection (SSI), daily wound measurements such as the temperature, granularity, and distance of the wound can be acquired to assess the condition of the wound, together with other non-wound-related but important clinical signals such as heart rate, morning body temperature, and NG tube presence. However, particular data characteristics present challenges that call for specialized data fusion models to predict patient conditions from multivariate longitudinal data. First, because these multivariate longitudinal data are temporal realizations of an underlying disease progression in different dimensions, it is a challenge to leverage our knowledge of the disease progression process when fusing the data. Second, the data are usually sampled at irregular time points, which adds another layer of complexity. Third, even if the data could be fused properly, patient heterogeneity compounds the problem and calls for a generic framework that personalizes the model based on individual characteristics implicitly embedded in the data.
To tackle these challenges, we propose a novel framework, named DL-CHI, that focuses on a particular category of disease conditions: those that follow a monotonic progression process. In our previous work [7], we developed a contemporaneous health index (CHI) that fuses irregular multivariate longitudinal time series data to quantify the severity of degenerative disease conditions in a way that fits the monotonic degradation process of the condition. However, CHI is designed for the average patient and ignores patient heterogeneity, which limits its applicability in real-world applications. For example, it is known that patients with Alzheimer’s disease (AD) exhibit very diverse and heterogeneous progression processes [8–10]. A possible remedy is to build a personalized model for each individual. However, this demands a large number of labeled training samples, which is very likely infeasible in many clinical settings.
This motivates us to develop the DL-CHI framework by integrating CHI with dictionary learning (DL) [8, 11]. The basic idea shared by dictionary learning algorithms is that the input signal is approximated by a sparse linear combination of a few dictionary elements, or basis vectors [12]. DL has been used in many signal processing applications, such as signal reconstruction [13], face recognition [14], and healthcare [15, 16]. The dictionary basis provides a succinct representation that can span the space of the personalized models, capture patient heterogeneity, and reveal hidden structures in the data (in a similar spirit to principal component analysis). It has been shown that the performance of a classification task can be improved by learning a sparsifying dictionary from the data set [17, 18]. The reason is that the sparsifying dictionary acts as a regularizer in model learning, as the dictionary basis vectors are numerical representations of patient heterogeneity. Translating this insight into DL-CHI, our basic idea is to first learn individual models through the CHI formulation and then reconstruct the parameters of the learned individual models via supervised dictionary learning. Each column of the dictionary represents a basis vector, so each individual model is represented as a sparse linear combination of the basis vectors.
The paper is organized as follows. In Section 2, related work in the literature is reviewed and discussed. In Section 3, the proposed analytic framework is presented and the corresponding computational algorithm is derived. In Section 4, the proposed method is implemented and validated on two real-world applications: monitoring of brain health in AD and monitoring of SSI. We conclude the study in Section 5. Note that, in this paper, we use lowercase letters, e.g., x, to represent scalars, boldface lowercase letters, e.g., v, to represent vectors, and boldface uppercase letters, e.g., W, to represent matrices.
2 Related works
2.1 The CHI model
The CHI model was developed in [7]; it specifically utilizes the monotonicity of disease progression to enhance the data fusion of multivariate clinical measurements taken at irregular time points. In this section, we first briefly present the basic formulation of the CHI model and then present the DL-CHI model, which integrates CHI with dictionary learning to obtain personalized models.
The CHI model was motivated by a characteristic shared by many degenerative conditions such as AD: a monotonic progression trajectory. For example, for AD, a number of biomarkers have been developed to measure the degeneration of the neural systems, including neuroimaging modalities such as PET and MRI scans [19, 20]. It is typical to see that, as the disease progresses, the brain volumes shown in the MRI scans continue to shrink over time. The same phenomenon can be observed in PET scans as a persistent decrease of metabolic activity. These monotonic patterns indicate that the disease, once started, tends to become progressively worse.
The task of CHI is to translate multivariate longitudinal measurements into a contemporaneous health index h_{n,t} that captures how the patient condition changes over the course of progression. Note that different individuals may be measured over different lengths of time and at different time points. As we target degenerative conditions, CHI should be monotonic, i.e., \(h_{n,t_{1}}\geq h_{n,t_{2}}\) if t_{1}≥t_{2}, assuming that a higher index represents a more severe condition. Although CHI is a latent construct that is not directly measurable, clinical variables associated with it can be measured over time, which provides the data to learn it.
Denote the training set by \(\mathbf {x}_{n, t} = \left [x_{n,1,t},\ldots, x_{n,d,t}\right ]^{T}\in \mathbb {R}^{d}\), collected from N patients. Here, each measurement x_{n,i,t} is the value of the ith variable for the nth subject at time t, where t∈{1,…,T_{n}} is the time index. Converting the measurements x_{n,t} into h_{n,t} requires a mathematical model h_{n,t}=f(x_{n,t}). For simplicity and interpretability, we start with linear models, i.e., h_{n,t}=x_{n,t}·w, where \(\mathbf{w}\in \mathbb{R}^{d}\) is a vector of weight coefficients that combines the d variables. Denote the total numbers of positive and negative samples by N^{+} and N^{−}, respectively, i.e., \(N^{+}:=|\{n \mid y_{n}=1\}|\) and \(N^{-}:=|\{n \mid y_{n}=-1\}|\).

The first term (1a) and second term (1b) are the SVM formulation that aims to utilize the label information to enhance the discriminatory power of CHI. Here, y_{ n }∈{1,−1} is the label of the nth sample that indicates if the nth subject is diseased or not.

The term (1c) is introduced to enforce the monotonicity of the learned health index, i.e., \(h_{n,t_{1}}\geq h_{n,t_{2}}\) if t_{1}≥t_{2}. Here, z_{n,t} is the difference of two successive data vectors, z_{n,t}:=x_{n,t+1}−x_{n,t}.
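To make the monotonicity requirement concrete, the following minimal sketch computes the linear index h_{n,t}=x_{n,t}·w for one patient and counts the visits at which the index decreases, i.e., where z_{n,t}·w<0. The function names and the toy measurements are illustrative assumptions, not part of the original CHI implementation, and the sketch only checks the constraint rather than reproducing the penalty term (1c).

```python
import numpy as np

def health_index(X, w):
    """Linear health index h_t = x_t . w for one patient; X has shape (T_n, d)."""
    return X @ w

def successive_differences(X):
    """Rows are z_t = x_{t+1} - x_t, giving shape (T_n - 1, d)."""
    return np.diff(X, axis=0)

def monotonicity_violations(X, w):
    """Count the visits at which the index decreases, i.e. z_t . w < 0."""
    return int(np.sum(successive_differences(X) @ w < 0))

# Toy patient: d = 2 variables observed at T_n = 4 visits (made-up numbers).
X = np.array([[1.0, 0.5],
              [1.2, 0.4],
              [1.1, 0.9],
              [1.6, 1.0]])
w = np.array([1.0, 1.0])

h = health_index(X, w)
violations = monotonicity_violations(X, w)
```

With these toy values the index is non-decreasing under w=(1,1), whereas weighting only the first variable, w=(1,0), produces one decrease between the second and third visits.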

Terms (1d) and (1e) are introduced to encourage the homogeneity of CHI within each group that has the same health status. Here, \(\bar {\mathbf {x}}^{+}_{T_{n}}\) and \(\bar {\mathbf {x}}^{-}_{T_{n}}\) represent the centers of the data vectors at time T_{n} for all positive and negative samples, respectively, that is:$$\begin{array}{*{20}l} \bar{\mathbf{x}}^{+}_{T_{n}}:=&{\frac{1}{N^{+}}}\sum_{n\in \{n \mid y_{n}=1\}}\mathbf{x}_{n, T_{n}}\\ \bar{\mathbf{x}}^{-}_{T_{n}}:=&{\frac{1}{N^{-}}}\sum_{n\in \{n \mid y_{n}=-1\}}\mathbf{x}_{n, T_{n}}. \end{array} $$
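As an illustration of the two class centers, the sketch below averages each patient's final observation x_{n,T_n} separately over the positive and negative groups. The function name, the list-of-arrays data layout, and the toy values are assumptions made for this example only.

```python
import numpy as np

def last_visit_centers(series, labels):
    """Per-class mean of each patient's final observation x_{n,T_n}.

    series: list of arrays, one per patient, each of shape (T_n, d);
            T_n may differ between patients.
    labels: sequence of +1 / -1 labels y_n.
    """
    last = np.stack([X[-1] for X in series])   # shape (N, d)
    y = np.asarray(labels)
    center_pos = last[y == 1].mean(axis=0)     # center of the positive group
    center_neg = last[y == -1].mean(axis=0)    # center of the negative group
    return center_pos, center_neg

# Three toy patients with different numbers of visits (made-up numbers).
series = [np.array([[0.0, 1.0], [2.0, 3.0]]),
          np.array([[1.0, 1.0], [1.0, 2.0], [4.0, 5.0]]),
          np.array([[0.0, 0.0], [1.0, 1.0]])]
labels = [1, 1, -1]

center_pos, center_neg = last_visit_centers(series, labels)
```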

The last term, (1f), is the L_{1}-norm penalty, which is used to encourage sparsity of the features.
Note that the proposed formulation generalizes many existing models, such as SVM, sparse SVM, and LASSO. The CHI model can be solved efficiently with the block coordinate descent algorithm described in the Appendix (section “CHI model formulation”).
2.2 Dictionary learning
Models like CHI help us capture changes in various aspects of the disease trajectory. But because CHI assumes the same model for the whole population, it ignores the heterogeneity of degenerative diseases, which limits its applicability in real-world applications that exhibit great patient heterogeneity [21, 22]. Recently, it has been shown that learning a dictionary can overcome this limitation [14, 23, 24]. The basic idea of dictionary learning algorithms is to approximate each training sample as a sparse linear combination of a few dictionary elements. Hence, dictionary learning can be viewed as a way to represent the low-dimensional structure of high-dimensional data.
DL has been applied in many settings and has achieved state-of-the-art performance, for example in image denoising [13], inpainting [25], clustering [26, 27], and classification [28, 29]. The conventional DL framework, however, was designed for reconstruction rather than classification. It is believed that classification performance can be further improved by carefully learning a classification-oriented dictionary. For instance, in [12] a sparse representation-based classification (SRC) method was proposed for robust face recognition and achieved very impressive results. SRC treats the original data set as a dictionary, wherein the class-specific training sets are sub-dictionaries that contribute to discrimination. Inspired by SRC, Yang et al. proposed metaface learning [14] to learn an adaptive dictionary for each class, and Ramirez et al. [17] added another term to derive more refined classification-oriented dictionaries.
Dictionary learning has also been used to personalize prediction models through novel transfer learning approaches. For example, in [6] the personalization task was performed in two phases: learning user-specific source classifiers and then learning a distribution-to-classifier mapping by means of dictionary learning. Another approach is a multimodal task-driven dictionary learning algorithm under a joint sparsity constraint, which enforces collaboration among multiple homogeneous or heterogeneous sources of information. In the task-driven formulation, the multimodal dictionaries are learned simultaneously with their corresponding classifiers. The resulting multimodal dictionaries can generate discriminative sparse codes from the data that are optimized for a given task such as binary or multiclass classification [30].
There are various dictionary learning algorithms that are effective for classification tasks [31–34]. Zhang and Li proposed discriminative K-SVD to obtain a dictionary that has good representation power while supporting optimal discrimination of the classes [33]. The name K-SVD refers to updating a dictionary of K vectors. The algorithm starts with an initial dictionary and initial sparse code coefficients, and then updates one dictionary vector at each iteration. For a given dictionary vector, the training vectors that use it in their approximation are collected, and the vector is updated by minimizing the Frobenius norm of the approximation error over that collection. The corresponding sparse coefficients are changed before proceeding to update the next dictionary vector. The minimization is done through singular value decomposition (SVD). Another example is the family of iterative least squares dictionary learning algorithms (ILS-DLA) presented in [31, 32], which assumes known sparse code coefficients at each iteration, computed with either orthogonal matching pursuit (OMP) or the Focal Underdetermined System Solver (FOCUSS), and derives the best possible dictionary for those coefficients. ILS-DLA uses a second-order update, whose matrix inversion step makes it nearly impractical beyond moderate dimensions. A further example is the recursive least squares dictionary learning algorithm (RLS-DLA), which is an online variant of ILS-DLA: the training signals are processed one at a time to improve the dictionary. One of the larger challenges with ILS-DLA and K-SVD is finding a good initial dictionary. In contrast to K-SVD and ILS-DLA, the online nature of RLS-DLA helps it avoid getting stuck in a local minimum close to the initial dictionary. RLS-DLA uses a forgetting factor to improve the convergence properties of the algorithm, which makes it less dependent on the initial dictionary. However, RLS-DLA requires permuting the order of the training vectors and adapting the forgetting factor to satisfy the randomness and convergence requirements of an online algorithm.
Several properties should be considered in the search for a successful dictionary training algorithm. Flexibility: the algorithm should be flexible enough to run with various sparse approximation algorithms, such as pursuit algorithms, which find the best projections of the input signal onto the span of an overcomplete dictionary D. This flexibility allows different choices to be made according to runtime constraints. Methods that are flexible in this sense usually separate the dictionary update from the sparse coding stage. Adaptivity: an overcomplete dictionary D can either be chosen as a predetermined set of functions or be designed to be updated iteratively to better fit the data. Choosing a prespecified dictionary is appealing because it is simpler and may lead to a fast algorithm. However, what is needed is a dictionary that leads to the best representation of each member of the training set under strict sparsity constraints; such learned dictionaries have the potential to outperform the commonly used prespecified dictionaries. Efficiency: a dictionary learning algorithm should be numerically efficient and converge quickly. For example, ILS-DLA has a second-order update, which makes it nearly impractical beyond moderate dimensions.
The K-SVD algorithm is flexible and works with any pursuit algorithm. In addition, it seeks the best representation of each training vector under the sparsity constraint. Given the merits of DL in handling model heterogeneity and its classification performance, we adopt the idea of DL and develop the DL-CHI framework using the K-SVD dictionary learning algorithm. Accordingly, we reconstruct the model parameter vector of each individual as a linear combination of dictionary elements. We further compare our methodology with CHI and with the dictionary learning models K-SVD, ILS-DLA, and RLS-DLA. Note that, unlike those methods, the DL-CHI formulation is personalized rather than designed for the average user.
3 The proposed DL-CHI model
3.1 Rationale and formulation
From this point of view, dictionary learning can be seen as a tradeoff between two extremes. At one extreme, there is a single model for all individuals, i.e., the “one-size-fits-all” model. At the other extreme, there is a distinct model for each individual, and these models are all independent of each other. As a tradeoff, dictionary learning exploits the commonalities and the differences among individuals simultaneously.
To fulfill this idea, we denote the set of model parameter vectors of all the individuals as \(\mathbf {W}^{\ast }=\left [ \mathbf {w}_{1}^{\ast },\ldots,\mathbf {w}_{i}^{\ast },\ldots,\mathbf {w}_{N}^{\ast }\right ]\), where \(\mathbf {w}^{*}_{i}\) represents the weight coefficient vector of the ith patient learned from the CHI model. Using dictionary learning, we aim to find an overcomplete dictionary \(\mathbf {D}\in \mathbb {R}^{d\times k }\) that contains k columns, referred to as the basis vectors, \(\left \{ \mathbf{d}_{i}\right \}^{k}_{i=1}\). A model parameter vector w^{∗} can then be represented as a linear combination of these basis vectors satisfying the approximation condition w^{∗}≈Da, where a is the coefficient vector, which can be considered the representation of w^{∗} over the dictionary D.
Here, ∥·∥_{0} is the \(\ell_{0}\) norm, which counts the nonzero entries of a vector, and A=[a_{1},…,a_{N}] is the coefficient matrix of the sparse decomposition. To achieve sparse representations for a given set of training vectors, we adapt the dictionary so that it leads to the best representation of each vector in the training set under strict sparsity constraints.
3.2 Computational algorithm
In DL-CHI, we use the K-SVD dictionary learning algorithm [39, 40], which poses sparse representation as an optimization problem that can be solved efficiently via orthogonal matching pursuit (OMP) and singular value decomposition (SVD). K-SVD is an iterative procedure that alternates between two steps, both of which work towards minimizing the overall objective function.
In (3), D is first fixed so that we can focus on learning the coefficient matrix A with the orthogonal matching pursuit method, which supplies a solution with a fixed, predetermined number of nonzero entries T_{0}. OMP is an iterative greedy algorithm that, at each step, selects the column best correlated with the residual part of the signal; it yields a suboptimal solution to the sparse signal representation problem. The major advantages of OMP are its simplicity and fast implementation. The problem in (3) decomposes into N distinct problems, one for each \(\mathbf{w}^{*}_{i}\).
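The sparse coding step can be sketched as follows. This is a minimal, generic OMP written for illustration, assuming that the columns of D have unit norm; the function names `omp` and `sparse_code`, the stopping tolerance, and the toy dictionary are assumptions and not the authors' implementation.

```python
import numpy as np

def omp(D, w, T0):
    """Greedy sparse coding: approximate w by D @ a with at most T0 nonzeros.

    Assumes the columns (atoms) of D have unit l2 norm.
    """
    a = np.zeros(D.shape[1])
    residual = np.asarray(w, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T0):
        # Select the atom best correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0          # never reselect an atom
        j = int(np.argmax(corr))
        if corr[j] <= 1e-12:             # residual already explained
            break
        support.append(j)
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        residual = w - D[:, support] @ coef
    if support:
        a[support] = coef
    return a

def sparse_code(D, W, T0):
    """Solve the N independent problems, one per column w_i of W."""
    return np.column_stack([omp(D, W[:, i], T0) for i in range(W.shape[1])])

# Toy overcomplete dictionary: d = 2, k = 3 unit-norm atoms (made-up numbers).
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
W = np.array([[3.0, 2.0],
              [0.0, 2.0]])
A = sparse_code(D, W, T0=1)
```

In this toy case each column of W is a multiple of a single atom, so one nonzero coefficient per column reproduces W exactly.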
Hence, we minimize \(\left\| \mathbf {E}_{k}-\mathbf {d}_{k}\mathbf {a}_{T}^{k}\right\|_{F}^{2}\), where \(\mathbf {a}_{T}^{k}\) denotes the kth row of A, assuming that the remaining coefficients in A, and hence the error E_{k}, are fixed. The constraint is over the jth orthonormal basis D_{j}. By decomposing the product DA into the sum of K rank-1 matrices, we can assume that the other K−1 terms are fixed and that the kth remains unknown. Then, the singular value decomposition finds the closest rank-1 matrix that approximates E_{k}, and this effectively minimizes the error in Eq. (5).
The above solution for the vector \(\mathbf {a}_{T}^{k}\) is very likely to be dense, because the sparsity constraint is not enforced. To enforce the sparsity constraint, we define ω_{k} as the group of indices pointing to the examples \(\mathbf {w}^{*}_{i}\) that use the basis vector d_{k}, i.e., those for which the entry \(\mathbf {a}_{T}^{k}\left (i\right)\) is nonzero. Thus, \(\mathbf {\omega }_{k}=\left \{ i \mid 1\leq i\leq N, \mathbf {a}_{T}^{k}\left (i\right) \neq 0 \right \} \). Then, we compute \(\mathbf {E}_{k}= \mathbf {W}^{*} - \sum _{j\neq k}\mathbf {d}_{j}\mathbf {a}_{T}^{j}\) and keep only the columns corresponding to ω_{k}, which gives the restricted error matrix \(\mathbf {E}_{k}^{R}\). We then apply the SVD decomposition \(\mathbf {E}_{k}^{R}=\mathbf {U}\Lambda \mathbf {V}^{T}\). The solution for d_{k} is the first column of U, and the updated coefficient vector is the first column of V multiplied by Λ(1,1).
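The update of a single dictionary column can be sketched as below. This is an illustrative implementation of the standard K-SVD atom update under the notation of this section; the function name `ksvd_update_atom`, the in-place update, and the random toy data are assumptions, not the authors' code.

```python
import numpy as np

def ksvd_update_atom(W, D, A, k):
    """Update atom d_k and the nonzero entries of row k of A (in place)."""
    omega = np.flatnonzero(A[k, :])      # examples whose code uses atom k
    if omega.size == 0:
        return D, A                      # unused atom: leave it unchanged
    # Error without the contribution of atom k, restricted to columns in omega.
    E_k = W[:, omega] - D @ A[:, omega] + np.outer(D[:, k], A[k, omega])
    U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                    # new unit-norm atom
    A[k, omega] = S[0] * Vt[0, :]        # new coefficients on the same support
    return D, A

# Random toy problem (made-up data): d = 5, N = 8, K = 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))
D = rng.normal(size=(5, 3))
D /= np.linalg.norm(D, axis=0)
A = rng.normal(size=(3, 8)) * (rng.random((3, 8)) < 0.5)

support_before = A != 0
err_before = np.linalg.norm(W - D @ A)
D, A = ksvd_update_atom(W, D, A, k=0)
err_after = np.linalg.norm(W - D @ A)
```

Because the best rank-1 approximation of the restricted error matrix is at least as good as the previous rank-1 term on the same support, the reconstruction error does not increase, and no new nonzero entries are introduced into A.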
3.3 Summary of DL-CHI
4 Numerical studies
4.1 Real-world applications
We implement the DL-CHI model on two real-world datasets collected in Alzheimer’s disease (AD) and surgical site infection (SSI) research. Both diseases exhibit monotonic disease progression and significant patient heterogeneity. For the Alzheimer’s disease data, we use the FDG-PET images of 162 subjects (Alzheimer’s disease: 74, normal aging: 88) downloaded from ADNI (www.loni.usc.edu/ADNI). Each subject has at least three and at most seven time points. The data have been preprocessed, and Automated Anatomical Labeling has been used to segment each image into 116 anatomical volumes of interest (AVOIs). We select the 90 AVOIs that lie in the cerebral cortex, and each AVOI becomes a variable in our study. According to the mechanism of FDG-PET, the measurement for each region is the regional average FDG binding count, which represents the degree of glucose metabolism. Extensive evidence in the literature has shown that glucose metabolism declines with aging, while the pathology of neurodegenerative diseases such as AD further accelerates the decline, which makes this a well-suited application example for implementing and testing the proposed DL-CHI method.
4.2 Parameter tuning and validation
For each experiment, we randomly split the data into two equal parts, one for training and one for testing. For training, we use 10-fold cross-validation to tune the parameters. As CHI is a complex data fusion mechanism that synthesizes the monotonicity of the disease progression, label information, and statistical homogeneity, we use a comprehensive scheme to compare DL-CHI with CHI. Specifically, we compare the two models (1) when only monotonicity is used for model training (i.e., by setting β=0 and optimizing for α), (2) when only the label information is used for model training (i.e., by setting α=0 and optimizing for β), and (3) when the full model is used (i.e., by optimizing for both α and β). In addition, in each of these settings we randomly downsample the training data, i.e., we use only a proportion of the data, ranging from 15 to 75%, to train both models. A model that maintains good performance with less training data is clearly more promising in healthcare applications, where data collection is relatively more costly than in other real-world applications.
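The splitting and downsampling scheme can be sketched as follows. The sketch assumes that the stated proportion is applied to the training half and that subjects are sampled without stratification, neither of which the text specifies; the function name `split_and_downsample` and the random seed are illustrative.

```python
import numpy as np

def split_and_downsample(n_subjects, proportion, rng):
    """Random 50/50 split, then keep only `proportion` of the training half."""
    perm = rng.permutation(n_subjects)
    half = n_subjects // 2
    train, test = perm[:half], perm[half:]
    n_keep = max(1, int(round(proportion * train.size)))
    train_sub = rng.choice(train, size=n_keep, replace=False)
    return train_sub, test

rng = np.random.default_rng(0)
sizes = {}
for proportion in (0.15, 0.20, 0.35, 0.50, 0.75):
    # 162 is the number of ADNI subjects reported in Section 4.1.
    train_sub, test = split_and_downsample(162, proportion, rng)
    sizes[proportion] = (train_sub.size, test.size)
```

Each entry of `sizes` records how many subjects remain for training and testing at that proportion; parameter tuning by cross-validation would then run on `train_sub` only.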
4.3 Results
AUC performance of CHI and DL-CHI on the ADNI and SSI data across different ratios of training and testing data, obtained by 10-fold cross-validation
Setting  Data  Ratio (%)  CHI  DL-CHI
α=0, β^{∗}  ADNI  15  0.870 ± 0.024  0.887 ± 0.021
α=0, β^{∗}  ADNI  20  0.883 ± 0.021  0.890 ± 0.016
α=0, β^{∗}  ADNI  35  0.889 ± 0.014  0.936 ± 0.051
α=0, β^{∗}  ADNI  50  0.890 ± 0.031  0.940 ± 0.047
α=0, β^{∗}  ADNI  75  0.927 ± 0.012  0.959 ± 0.036
α=0, β^{∗}  SSI  15  0.850 ± 0.055  0.867 ± 0.039
α=0, β^{∗}  SSI  20  0.861 ± 0.036  0.877 ± 0.020
α=0, β^{∗}  SSI  35  0.871 ± 0.012  0.886 ± 0.020
α=0, β^{∗}  SSI  50  0.862 ± 0.015  0.892 ± 0.041
α=0, β^{∗}  SSI  75  0.889 ± 0.024  0.914 ± 0.027
α^{∗}, β=0  ADNI  15  0.780 ± 0.016  0.863 ± 0.034
α^{∗}, β=0  ADNI  20  0.799 ± 0.054  0.873 ± 0.024
α^{∗}, β=0  ADNI  35  0.804 ± 0.012  0.844 ± 0.034
α^{∗}, β=0  ADNI  50  0.818 ± 0.019  0.869 ± 0.064
α^{∗}, β=0  ADNI  75  0.855 ± 0.064  0.905 ± 0.024
α^{∗}, β=0  SSI  15  0.829 ± 0.064  0.860 ± 0.023
α^{∗}, β=0  SSI  20  0.860 ± 0.021  0.879 ± 0.016
α^{∗}, β=0  SSI  35  0.870 ± 0.034  0.883 ± 0.034
α^{∗}, β=0  SSI  50  0.880 ± 0.042  0.892 ± 0.036
α^{∗}, β=0  SSI  75  0.883 ± 0.026  0.895 ± 0.016
α^{∗}, β^{∗}  ADNI  15  0.865 ± 0.021  0.872 ± 0.025
α^{∗}, β^{∗}  ADNI  20  0.871 ± 0.023  0.881 ± 0.014
α^{∗}, β^{∗}  ADNI  35  0.874 ± 0.032  0.890 ± 0.026
α^{∗}, β^{∗}  ADNI  50  0.891 ± 0.021  0.910 ± 0.041
α^{∗}, β^{∗}  ADNI  75  0.901 ± 0.020  0.919 ± 0.036
α^{∗}, β^{∗}  SSI  15  0.741 ± 0.032  0.814 ± 0.041
α^{∗}, β^{∗}  SSI  20  0.758 ± 0.034  0.820 ± 0.030
α^{∗}, β^{∗}  SSI  35  0.770 ± 0.013  0.831 ± 0.036
α^{∗}, β^{∗}  SSI  50  0.791 ± 0.026  0.887 ± 0.015
α^{∗}, β^{∗}  SSI  75  0.806 ± 0.010  0.862 ± 0.036
AUC performance comparison on the ADNI and SSI data for the DL-CHI, CHI, RLS-DLA, K-SVD, and ILS-DLA models, obtained by 10-fold cross-validation
Model  ADNI  SSI
DL-CHI  0.951 ± 0.025  0.902 ± 0.032
CHI  0.920 ± 0.021  0.880 ± 0.010
RLS-DLA  0.903 ± 0.030  0.873 ± 0.065
K-SVD  0.850 ± 0.043  0.803 ± 0.014
ILS-DLA  0.723 ± 0.012  0.653 ± 0.063
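For readers unfamiliar with the metric, the AUC used in this section can be computed from health index scores and labels as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half. The sketch below is a generic implementation of that definition; the function name `auc` and the toy scores are assumptions and do not reproduce any number in the tables.

```python
import numpy as np

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels)
    pos, neg = s[y == 1], s[y == -1]
    diff = pos[:, None] - neg[None, :]   # all pairwise score differences
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pos.size * neg.size))

# Toy scores for three diseased (+1) and two healthy (-1) subjects.
scores = [0.9, 0.8, 0.35, 0.4, 0.1]
labels = [1, 1, 1, -1, -1]
value = auc(scores, labels)
```

Here five of the six positive-negative pairs are ordered correctly, so `value` equals 5/6.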
4.4 Representation capacity of dictionary learning
5 Conclusion
In this paper, we presented the DL-CHI formulation, which helps build a personalized contemporaneous health index (CHI) to monitor patient condition over time. In applications to two real-world datasets on AD and SSI, the DL-CHI model outperforms the CHI model in patient prediction and achieves robust results with small sample sizes. In the future, we may further enhance the DL-CHI method in the following directions. First, in the current DL-CHI formulation, the individual models have to be learned via the CHI formulation without information from the dictionary. Only after the dictionary has been learned are the representations of the individual models identified and used as the final individual models. It is possible that jointly learning both steps, by incorporating the dictionary into the CHI formulation, could further enhance the performance of DL-CHI. Second, transfer learning is vital when the supply of training data is limited. One way to tackle this problem is model-based transfer, where prior knowledge from a generic recognizer enters through a modified regularization term in the CHI model. Last but not least, we can also consider integrating data-based and model-based transfer learning: by reweighting the input source data, we can minimize the discrepancy between the source and target distributions and then allow CHI to be biased toward the parameters of another model.
6 Appendix
6.1 CHI model formulation
Then the solution w^{∗} to Eq. (9) can be obtained by: \(\mathbf {w}^{*} = Q^{-1} \left (\mathbf {s}^{*} +Z\mathbf {u}^{*} + \widehat {X}\mathbf {v}^{*}\right)\).
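Numerically, an expression of this form is usually evaluated by solving a linear system rather than forming the inverse explicitly. The sketch below assumes that Q is a nonsingular d×d matrix, that s^{∗} has length d, and that Z and \(\widehat{X}\) have d rows with column counts matching u^{∗} and v^{∗}; these shapes, the function name `recover_w`, and the toy values are assumptions, since the definitions are not given in this section.

```python
import numpy as np

def recover_w(Q, s, Z, u, X_hat, v):
    """Evaluate w = Q^{-1} (s + Z u + X_hat v) via a linear solve."""
    rhs = s + Z @ u + X_hat @ v
    return np.linalg.solve(Q, rhs)       # avoids computing Q^{-1} explicitly

# Toy values with d = 2 (made-up numbers, shapes assumed as stated above).
Q = 2.0 * np.eye(2)
s = np.array([2.0, 4.0])
Z = np.eye(2)
u = np.array([0.0, 0.0])
X_hat = np.array([[1.0], [1.0]])
v = np.array([2.0])

w_star = recover_w(Q, s, Z, u, X_hat, v)
```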
6.2 Algorithm
Declarations
Acknowledgements
The authors acknowledge funding support from the National Science Foundation under Grants CMMI-1536398 and CCF-1715027. The authors are also grateful to ADNI, Heather Evans, and Bill Lober for providing the data used to demonstrate our method.
Authors’ contributions
SH and AS conceived the project. AS and SH completed the algorithm development, data analysis, and interpretation. Both authors contributed to the manuscript writing and approved the final manuscript.
Competing interests
Both authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 R Thomson, D Luettel, F Healey, S Scobie, Safer care for the acutely ill patient: learning from serious incidents. Natl. Patient Saf. Agency (2007).Google Scholar
 RP Gaynes, DH Culver, TC Horan, JR Edwards, C Richards, JS Tolson, National Nosocomial Infections Surveillance System, Surgical site infection (SSI) rates in the United States, 1992–1998: the national nosocomial infections surveillance system basic SSI risk index. Clin. Infect. Dis. 33(Supplement_2), S69–S77 (2001).View ArticleGoogle Scholar
 B Spring, M Gotsis, A Paiva, D SpruijtMetz, Healthy apps: mobile devices for continuous monitoring and intervention. IEEE Pulse. 4(6), 34–40 (2013).View ArticleGoogle Scholar
 DE Rivera, Optimized behavioral interventions: What does system identification and control engineering have to offer?IFAC Proc. Vol.45(16), 882–893 (2012).View ArticleGoogle Scholar
 S Deshpande, DE Rivera, JW Younger, NN Nandola, A control systems engineering approach for adaptive behavioral interventions: illustration with a fibromyalgia intervention. Transl. Behav. Med.4(3), 275–289 (2014).View ArticleGoogle Scholar
 G Zen, L Porzi, E Sangineto, E Ricci, N Sebe, Learning personalized models for facial expression analysis and gesture recognition. IEEE Trans. Multimedia. 18(4), 775–788 (2016).View ArticleGoogle Scholar
 Y Huang, Q Meng, H Evans, W Lober, Y Cheng, X Qian, J Liu, S Huang, CHI: A contemporaneous health index for degenerative disease monitoring using longitudinal measurements. J. Biomed. Inform. 73:, 115–124 (2017).View ArticleGoogle Scholar
 JL Cummings, Cognitive and behavioral heterogeneity in Alzheimer’s disease: seeking the neurobiological basis. Neurobiol. Aging. 21(6), 845–861 (2000).View ArticleGoogle Scholar
 MF Folstein, Heterogeneity in Alzheimer’s disease. Neurobiol. Aging. 10(5), 434–435 (1989).View ArticleGoogle Scholar
 E Friedland, JV Koss, RP Haxby, CL Grady, J Luxenberg, J Schapiro, MB Kaye, Annals Intern. Med. 109(4), 298–311 (1988).Google Scholar
 BA Olshausen, DJ Field, Emergence of simplecell receptive field properties by learning a sparse code for natural images. Nature. 381(6583), 607 (1996).View ArticleGoogle Scholar
 J Wright, A Yang, AY Ganesh, SS Sastry, Y Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009).View ArticleGoogle Scholar
 M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process.15(12), 3736–3745 (2006).MathSciNetView ArticleGoogle Scholar
 M Yang, L Zhang, J Yang, D Zhang, in Image Processing (ICIP), 2010 17th IEEE International Conference On. Metaface learning for sparse representation based face recognition (IEEE, 2010), pp. 1601–1604.Google Scholar
 Q Xu, H Yu, X Mou, L Zhang, J Hsieh, G Wang, Lowdose Xray CT reconstruction via dictionary learning. IEEE Trans. Med. Imaging.31(9), 1682–1697 (2012).View ArticleGoogle Scholar
 Y Chen, X Yin, L Shi, H Shu, L Luo, C Coatrieux, JL Toumoulin, Phys. Med. Biol.58(16), 5803 (2013).Google Scholar
 I Ramirez, P Sprechmann, G Sapiro, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. Classification and clustering via dictionary learning with structured incoherence and shared features (IEEE, 2010), pp. 3501–3508.Google Scholar
 R Raina, A Battle, H Lee, B Packer, AY Ng, in Proceedings of the 24th International Conference on Machine Learning. Selftaught learning: transfer learning from unlabeled data (ACM, 2007), pp. 759–766.Google Scholar
 SG Mueller, MW Weiner, LJ Thal, RC Petersen, C Jack, W Jagust, JQ Trojanowski, AW Toga, L Beckett, The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clin. N. Am. 15(4), 869–877 (2005).
 JR Petrella, RE Coleman, PM Doraiswamy, Neuroimaging and early diagnosis of Alzheimer disease: a look to the future. Radiology. 226(2), 315–336 (2003).
 J Zhou, J Liu, VA Narayan, J Ye, in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Modeling disease progression via fused sparse group lasso (ACM, 2012), pp. 1095–1103.
 J Zhou, J Liu, VA Narayan, J Ye, Alzheimer’s Disease Neuroimaging Initiative, et al., Modeling disease progression via multi-task learning. NeuroImage. 78, 233–248 (2013).
 J Mairal, M Elad, G Sapiro, Sparse representation for color image restoration. IEEE Trans. Image Process. 17(1), 53–69 (2008).
 Z Jiang, Z Lin, LS Davis, Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2651–2664 (2013).
 M Elad, MAT Figueiredo, Y Ma, On the role of sparse and redundant representations in image processing. Proc. IEEE. 98(6), 972–982 (2010).
 B Cheng, J Yang, S Yan, Y Fu, TS Huang, Learning with ℓ1-graph for image analysis. IEEE Trans. Image Process. 19(4), 858–866 (2010).
 J Wright, Y Ma, J Mairal, G Sapiro, TS Huang, S Yan, Sparse representation for computer vision and pattern recognition. Proc. IEEE. 98(6), 1031–1044 (2010).
 JA Bagnell, DM Bradley, in Advances in Neural Information Processing Systems. Differentiable sparse coding (Curran Associates, Inc., 2009), pp. 113–120.
 J Mairal, J Ponce, G Sapiro, A Zisserman, FR Bach, in Advances in Neural Information Processing Systems. Supervised dictionary learning (Curran Associates, Inc., 2009), pp. 1033–1040. http://papers.nips.cc/paper/3448superviseddictionarylearning.pdf.
 S Bahrampour, NM Nasrabadi, A Ray, WK Jenkins, Multimodal task-driven dictionary learning for image classification. IEEE Trans. Image Process. 25(1), 24–38 (2016).
 K Engan, SO Aase, JH Husoy, in Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference On, 5. Method of optimal directions for frame design (IEEE, 1999), pp. 2443–2446.
 K Engan, SO Aase, JH Husøy, Multi-frame compression: theory and design. Signal Process. 80(10), 2121–2140 (2000).
 Q Zhang, B Li, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. Discriminative K-SVD for dictionary learning in face recognition (IEEE, 2010), pp. 2691–2698.
 K Engan, K Skretting, JH Husøy, Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digit. Signal Process. 17(1), 32–49 (2007).
 J Mairal, G Sapiro, M Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008).
 K Kreutz-Delgado, JF Murray, BD Rao, K Engan, TW Lee, TJ Sejnowski, Dictionary learning algorithms for sparse representation. Neural Comput. 15(2), 349–396 (2003).
 DL Donoho, M Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003).
 SG Mallat, Z Zhang, Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993).
 Z Jiang, Z Lin, LS Davis, in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference On. Learning a discriminative dictionary for sparse coding via label consistent K-SVD (IEEE, 2011), pp. 1697–1704.
 M Aharon, M Elad, A Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006).
 JT DiPiro, RG Martindale, A Bakst, PF Vacani, P Watson, MT Miller, Infection in surgical patients: effects on mortality, hospitalization, and postdischarge care. Am. J. Health-Syst. Pharm. 55(8), 777–781 (1998).
 E Lawson, BL Hall, CY Ko, Risk factors for superficial vs deep/organ-space surgical site infections: implications for quality improvement initiatives. JAMA Surg. 148(9), 849–858 (2013).
 L Saunders, M Perennec-Olivier, P Jarno, F L’Hériteau, AG Venier, L Simon, M Giard, JM Thiolet, JF Viel, et al., Improving prediction of surgical site infection risk with multilevel modeling. PloS ONE. 9(5), e95295 (2014).
 P Tseng, S Yun, A coordinate gradient descent method for nonsmooth separable minimization. Math. Prog. 117(1–2), 387–423 (2009).