DLCHI: a dictionary learning-based contemporaneous health index for degenerative disease monitoring
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 17 (2018)
Abstract
Effective monitoring of degenerative patient conditions is crucial for many clinical decision-making problems. Leveraging the data-rich environments now found in many clinical settings, we propose a novel clinical data fusion framework that builds a contemporaneous health index (CHI) for degenerative disease monitoring, quantifying the severity of the deterioration process over time. Our framework specifically exploits the monotonic progression patterns of the target degenerative conditions, such as Alzheimer’s disease (AD), and articulates these patterns with a systematic optimization formulation. Further, to address patient heterogeneity, we integrate CHI with dictionary learning to build sets of overcomplete bases that represent the personalized models efficiently. Numerical results on two real-world applications show the promising capability of the proposed DLCHI model.
Introduction
In this paper, we address the problem of patient risk monitoring, that is, characterizing the trajectory of a patient’s condition over the course of progression. Although there is no universal definition of the concept “patient condition,” it is crucial in communications between clinicians and is frequently referenced by healthcare providers. Developing a precise contemporaneous health index (CHI) that faithfully reflects the underlying patient condition across the course of the condition’s progression holds great value for facilitating a range of clinical decisions. For instance, it will help early detection of patient deterioration and thereby reduce the number of serious incidents: it is reported that 11% of serious incidents result from deterioration that was not acted upon, mainly because the signs of deterioration were not recognized [1, 2]. It will also help enhance the continuity of care, since a longitudinal perspective of the patient condition can be provided to clinicians and healthcare providers. Ultimately, with a global representation of the evolving condition, it may lead to control systems that implement adaptive interventions for better healthcare management [3–6].
Towards this goal, technological innovations are emerging in many healthcare applications, giving rise to a data-rich environment in which an abundance of longitudinal clinical measurements that reflect the degeneration of the health condition can be continuously collected. For example, to monitor surgical site infection (SSI), daily wound measurements, such as the temperature, granularity, and distance of the wound, could be acquired to assess the condition of the wound, together with other non-wound-related but important clinical signals such as heart rate, morning body temperature, and NG tube presence. However, particular data characteristics present challenges that call for specialized data fusion models to predict patient conditions from the multivariate longitudinal data. For instance, as these multivariate longitudinal data are temporal realizations of an underlying disease progression in different dimensions, how to leverage our knowledge of the disease progression process to fuse the data is a challenge. The fact that these data are usually sampled at irregular time points adds another layer of complexity. And even if we could fuse the data properly, patient heterogeneity compounds the problem and calls for a generic framework that personalizes the model based on the individual characteristics implicitly embedded in the data.
To tackle those challenges, we propose a novel framework, named DLCHI, that focuses on a particular category of disease conditions that follow a monotonic disease progression process. In our previous work [7], we developed a contemporaneous health index (CHI) that fuses irregular multivariate longitudinal time series data to quantify the severity of degenerative disease conditions in a way that fits the monotonic degradation process of the disease. However, CHI is designed for an average user and ignores patient heterogeneity, which limits its applicability in real-world applications. For example, it is known that patients with Alzheimer’s disease (AD) show very diverse and heterogeneous progression processes [8–10]. A possible remedy is to build a personalized model for each individual. However, this demands a large number of labeled training samples, which is very likely not feasible in many clinical settings.
This motivates us to develop the DLCHI framework by integrating CHI with dictionary learning (DL) [8, 11]. The basic idea shared by dictionary learning algorithms is that the input signal is approximated by a sparse linear combination of a few dictionary elements, or basis vectors [12]. DL has been used in many signal processing applications, such as signal reconstruction [13], face recognition [14], and healthcare [15, 16]. The dictionary basis provides a succinct representation that can span the space of the personalized models to capture the patient heterogeneity and reveal the hidden structures in the data (in a similar spirit to principal component analysis). It has been shown that the performance of a classification task can be improved by learning a sparsifying dictionary from the data set [17, 18]. The reason is that the sparsifying dictionary plays a role in the regularization of the model learning, as the dictionary basis vectors are numerical representations of patient heterogeneity. Translating this wisdom into DLCHI, our basic idea is to first learn individual models through the CHI formulation and then reconstruct the model parameters of the learned individual models via supervised dictionary learning. Each column of the dictionary represents a basis vector. As such, each individual model is represented as a sparse linear combination of the basis vectors.
The paper is organized as follows. In Section 2, related work in the literature will be reviewed and discussed. In Section 3, the proposed analytic framework will be presented, and the corresponding computational algorithm will be derived. In Section 4, the proposed method will be implemented and validated using two real-world applications; one is the monitoring of brain health in AD and the other is the monitoring of SSI. We will conclude the study in Section 5. Note that, in this paper, we use lowercase letters, e.g., x, to represent scalars, boldface lowercase letters, e.g., v, to represent vectors, and boldface uppercase letters, e.g., W, to represent matrices.
Related works
The CHI model
The CHI model developed in [7] specifically utilizes the monotonicity of disease progression to enhance the data fusion of multivariate clinical measurements taken at irregular time points. In this section, we will first briefly present the basic formulation of the CHI model and then present the DLCHI model that integrates CHI with dictionary learning for personalized models.
The CHI model was motivated by a common characteristic of many degenerative conditions, such as AD, which show a monotonic progression trajectory. For example, for AD, a number of biomarkers have been developed to measure the degeneration of the neural systems, including neuroimaging modalities such as PET and MRI scans [19, 20]. It is typical to see that, as the disease progresses, the brain volumes shown in the MRI scans continue to shrink over time. The same phenomenon can be observed on the PET scans, with a persistent decrease of metabolic activity. Those monotonic patterns indicate that the disease, once started, tends to worsen steadily.
The task of CHI is to translate multivariate longitudinal measurements into a contemporaneous health index h_{n,t} that captures how the patient condition changes over the course of progression. Note that different individuals could be measured over different lengths of time and at different time points. As we target degenerative conditions, CHI should be monotonic, i.e., \(h_{n,t_{1}}\geq h_{n,t_{2}}\) if t_{1}≥t_{2}, assuming that a higher index represents a more severe condition. Although CHI is a latent construct that is not directly measurable, clinical variables associated with it can be measured over time, which provides the data needed to learn it.
Denote the training set by \(\mathbf {x}_{n, t} = \left [x_{n,1,t},..., x_{n,d,t}\right ]^ T\in \mathbb {R}^{d}\) collected from N patients. Here, each measurement x_{n,i,t} is the value of the ith variable for the nth subject at a given time t, where t∈{1,…,T_{ n }} is the time index. Converting the measurements x_{n,t} into h_{n,t} requires a mathematical model h_{n,t}=f(x_{n,t}). Here, for simplicity and interpretability, we start with linear models, i.e., \(h_{n,t}=\mathbf {x}_{n,t}^{T}\mathbf {w}\), where \(\mathbf {w}\in \mathbb {R}^{d}\) is a vector of weight coefficients that combines the d variables. Denote the total number of positive and negative samples by N^{+} and N^{−}, respectively, i.e., \(N^{+}:=\left |\{n \mid y_{n}=1\}\right |\) and \(N^{-}:=\left |\{n \mid y_{n}=-1\}\right |\).
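To make the linear index concrete, the following minimal sketch computes h_{n,t} for one patient and checks the monotonicity property. The helper names, the measurement values, and the weights are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
def health_index(measurements, w):
    """Return h_t = x_t . w for each visit of one patient."""
    return [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in measurements]


def is_monotonic(h, tol=0.0):
    """True if the index never decreases between successive visits."""
    return all(later >= earlier - tol for earlier, later in zip(h, h[1:]))


# Hypothetical patient: d = 3 variables observed at T_n = 4 visits.
x_n = [
    [0.2, 1.0, 0.5],
    [0.4, 1.1, 0.5],
    [0.7, 1.3, 0.6],
    [0.9, 1.6, 0.8],
]
w = [1.0, 0.5, 0.0]  # illustrative weights, not learned from data

h_n = health_index(x_n, w)
print(h_n, is_monotonic(h_n))
```

Because the index depends only on the order of the visits, patients with different numbers of visits or irregular visit times can be handled by the same two functions.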
The formulation of the CHI learning framework is shown below:
Items in (1) can be explained as follows:

The first term (1a) and second term (1b) are the SVM formulation that aims to utilize the label information to enhance the discriminatory power of CHI. Here, y_{ n }∈{1,−1} is the label of the nth sample that indicates if the nth subject is diseased or not.

The term (1c) is introduced to enforce the monotonicity of the learned health index, i.e., \(h_{n,t_{1}}\geq h_{n,t_{2}}\) if t_{1}≥t_{2}. Here, z_{n,t} is the difference of two successive data vectors, z_{n,t}:=x_{n,t+1}−x_{n,t}.

Items (1d) and (1e) are introduced to encourage the homogeneity of CHI within the group that has the same health status. Here, \(\bar {\mathbf {x}}^{+}_{T_{n}}\) and \(\bar {\mathbf {x}}^{-}_{T_{n}}\) represent the centers of the data vectors at time T_{ n } for all positive and negative samples, respectively, that is:
$$\begin{array}{*{20}l} \bar{\mathbf{x}}^{+}_{T_{n}}:=&{\frac{1}{N^{+}}}\sum_{n\in \{n \mid y_{n}=1\}}\mathbf{x}_{n, T_{n}}\\ \bar{\mathbf{x}}^{-}_{T_{n}}:=&{\frac{1}{N^{-}}}\sum_{n\in \{n \mid y_{n}=-1\}}\mathbf{x}_{n, T_{n}}. \end{array} $$ 
The last term, (1f), is the L_{1}-norm penalty that is used to encourage sparsity of the features.
Note that the proposed formulation generalizes many existing models, such as SVM, sparse SVM, and LASSO. The CHI model can be solved efficiently using the block coordinate descent algorithm described in the Appendix (“CHI model formulation” section).
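The roles of the individual terms can be illustrated by evaluating an objective of this kind for a given weight vector. The sketch below is one plausible instantiation assembled from the verbal description of items (1a) to (1f); the exact functional forms (a hinge penalty on decreases of the index, squared distances to the class centers), the function name `chi_objective`, and the parameter names C, alpha, beta, and gamma are assumptions and may differ from the formulation in [7].

```python
import numpy as np


def chi_objective(w, b, patients, labels, C=1.0, alpha=1.0, beta=1.0, gamma=0.1):
    """Evaluate a CHI-style objective for a shared weight vector w.

    patients: list of arrays of shape (T_n, d); labels: +1 or -1 per patient.
    The functional form of each term is an assumption, not the paper's Eq. (1).
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(labels, dtype=float)
    last = np.array([p[-1] for p in patients], dtype=float)  # x_{n,T_n}

    # SVM part: hinge loss on the index at the last visit.
    margin_loss = np.maximum(0.0, 1.0 - y * (last @ w + b)).sum()
    # Monotonicity: penalize any decrease of the index between visits,
    # using the successive differences z_{n,t} = x_{n,t+1} - x_{n,t}.
    mono_loss = sum(np.maximum(0.0, -(np.diff(p, axis=0) @ w)).sum()
                    for p in patients)
    # Homogeneity: keep the final index of each class close to its center.
    homo_loss = 0.0
    for cls in (1.0, -1.0):
        group = last[y == cls]
        if len(group):
            homo_loss += (((group - group.mean(axis=0)) @ w) ** 2).sum()
    # Ridge term, weighted losses, and the L1 sparsity penalty.
    return (0.5 * w @ w + C * margin_loss + alpha * mono_loss
            + beta * homo_loss + gamma * np.abs(w).sum())


# Two hypothetical patients with d = 2 variables and two visits each.
patients = [np.array([[0.0, 0.0], [1.0, 1.0]]),
            np.array([[1.0, 1.0], [0.0, 0.0]])]
value = chi_objective([1.0, 0.0], 0.0, patients, [1, -1])
print(value)
```

Setting beta to zero or alpha to zero in such a function corresponds to the monotonicity-only and label-only settings compared in the numerical studies.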
Dictionary learning
Developing models like CHI helps us to capture changes in various aspects of the disease trajectory. But as CHI assumes the same model for the whole population, it ignores the heterogeneity of degenerative diseases, which limits its applicability in real-world applications that have shown great patient heterogeneity [21, 22]. Recently, it has been shown that learning a dictionary can overcome the above limitations [14, 23, 24]. The basic idea of dictionary learning algorithms is to approximate training samples as a sparse linear combination of a few dictionary elements. Hence, a dictionary learning algorithm can be considered a way to represent the low-dimensional structure of high-dimensional data.
DL has been applied in many areas and has achieved state-of-the-art performance, for example in image denoising [13] and inpainting [25], clustering [26, 27], and classification [28, 29]. The conventional DL framework was designed for reconstruction rather than classification, and it is believed that classification performance can be further improved by carefully learning a classification-oriented dictionary. For instance, in [12] a sparse representation-based classification (SRC) method was proposed for robust face recognition and achieved very impressive results. SRC treats the original data set as a dictionary, wherein the class-specific training sets are sub-dictionaries contributing to discrimination. Inspired by SRC, Yang et al. proposed metaface learning [14] to learn an adaptive dictionary for each class, and Ramirez et al. [17] added another term to derive more refined classification-oriented dictionaries.
Dictionary learning has also been used to personalize prediction models through novel transfer learning approaches. For example, in [6] the personalization task was performed in two phases: learning user-specific source classifiers and learning a distribution-to-classifier mapping via dictionary learning. Another approach is to perform a multimodal task-driven dictionary learning algorithm under a joint sparsity constraint to enforce collaboration among multiple homogeneous or heterogeneous sources of information. In the task-driven formulation, the multimodal dictionaries are learned simultaneously with their corresponding classifiers. The resulting multimodal dictionaries can generate discriminative sparse codes from the data that are optimized for a given task such as binary or multi-class classification [30].
There are various dictionary learning algorithms that are effective for classification tasks [31–34]. Zhang and Li proposed discriminative KSVD to obtain a dictionary that has good representation power while supporting optimal discrimination of the classes [33]. The name KSVD refers to updating a dictionary with K vectors by means of the singular value decomposition (SVD). The algorithm starts with an initial dictionary and initial sparse code coefficients, and then updates one dictionary vector at each iteration. For the vector being updated, the training vectors that use it in their approximation are collected, and the vector is re-estimated by minimizing the Frobenius norm of the approximation error; this minimization is done through the SVD. The corresponding sparse coefficients are changed as well before proceeding to update the next dictionary vector. Another example is the iterative least squares dictionary learning algorithm (ILSDLA) presented in [31, 32], which assumes known sparse code coefficients at each iteration, computed using either orthogonal matching pursuit (OMP) or the Focal Underdetermined System Solver (FOCUSS), and derives the best possible dictionary for them. ILSDLA deploys a second-order update whose matrix inversion step makes it nearly impractical in reasonable dimensions. A further example is the recursive least squares dictionary learning algorithm (RLSDLA), which is an online alternative to ILSDLA: each training signal is processed one at a time to improve the dictionary. One of the major challenges with ILSDLA and KSVD is finding a good initial dictionary. In contrast to KSVD and ILSDLA, the online nature of RLSDLA helps it avoid getting stuck in a local minimum close to the initial dictionary. RLSDLA uses a forgetting factor to improve the convergence properties of the algorithm, which makes it less dependent on the initial dictionary. However, RLSDLA requires permuting the order of the training vectors and adapting the forgetting factor to satisfy the randomness and convergence requirements of the online algorithm.
There are several properties that should be considered in the search for a successful dictionary training algorithm:

Flexibility: The algorithm should be flexible enough to run with any sparse approximation algorithm, such as a pursuit algorithm, which finds the best projections of the input signal onto the span of an overcomplete dictionary D. This flexibility allows different choices to be made according to runtime constraints. Methods that are flexible in this sense usually separate the dictionary update from the sparse coding stage.

Adaptivity: An overcomplete dictionary D can either be chosen as a predetermined set of functions or be adapted iteratively to better fit the data. Choosing a prespecified dictionary is appealing because it is simpler and may lead to a fast algorithm. However, what is needed is the dictionary that leads to the best representation for each member of the training set under strict sparsity constraints. Such adapted dictionaries have the potential to outperform the commonly used prespecified dictionaries.

Efficiency: A dictionary learning algorithm should be numerically efficient and converge quickly. For example, ILSDLA has a second-order update, which makes it nearly impractical in reasonable dimensions.
The KSVD algorithm is flexible and works with any pursuit algorithm. In addition, it seeks the dictionary that best represents each training vector under the sparsity constraint. Given the merits of DL in handling model heterogeneity and in classification performance, we developed the DLCHI framework using the KSVD dictionary learning algorithm. That is, we reconstructed the model parameter vector of each individual as a linear combination of dictionary elements. We further compared our methodology with CHI and with the dictionary models KSVD, ILSDLA, and RLSDLA. Note that, unlike the above methods, the DLCHI formulation is personalized and not designed for an average user.
The proposed DLCHI model
Rationale and formulation
To extend CHI for personalized models, our approach is built on the dictionary learning framework [35]. As we have mentioned, dictionary learning aims to identify a set of representative vectors that can characterize the low-dimensional structure embedded in a high-dimensional vector space [36–38]. Here, taking the model parameter vectors of all the individuals as the high-dimensional vector space, we seek a dictionary to represent these model parameter vectors. The dictionary is learned from data, and it helps regularize the learning of the models, since it requires the model parameter vectors to be (sparse) linear combinations of the dictionary bases. The whole pipeline of the DLCHI model is shown in Fig. 1.
From this point of view, dictionary learning can be viewed as a trade-off between two extremes. At one extreme, there is only one model for all the individuals, i.e., the “one-size-fits-all” model. At the other extreme, there is a distinct model for each individual, and these models are all independent of each other. As a trade-off, dictionary learning exploits the dependencies and the differences among the individuals simultaneously.
To fulfill this idea, we denote the set of model parameter vectors of all the individuals as \(\mathbf {W}^{\ast }=\left [ \mathbf {w}_{1}^{\ast },\ldots,\mathbf {w}_{i}^{\ast },\ldots,\mathbf {w}_{N}^{\ast }\right ]\), where \(\mathbf {w}^{*}_{i}\) represents the weight coefficient vector of the i^{th} patient learned from the CHI model. Using dictionary learning, we aim to find an overcomplete dictionary \(\mathbf {D}\in \mathbb {R}^{d\times k }\) that contains k columns, referred to as the basis vectors, \(\left \{ \mathbf {d}_{i}\right \}^{k}_{i=1}\). A model parameter vector w^{∗} can be represented as a linear combination of these basis vectors, satisfying the approximation condition w^{∗}≈Da, where a is the coefficient vector, which can be considered the representation of w^{∗} over the dictionary D.
In order for D to be flexible and robust to noise, we set the dictionary to be overcomplete (k>d). On the other hand, given any w^{∗} and an overcomplete dictionary, we need to find the smallest set of basis vectors from the dictionary that represents it. When the dictionary is overcomplete, an infinite number of solutions are available for the representation; hence, constraints on the solution must be set. The solution with the fewest nonzero coefficients in a to represent w^{∗} is certainly an appealing representation. This strategy, called sparse coding, is often used in dictionary learning representations. In this setting, sparse coding amounts to computing the following:
Here, ∥·∥_{0} is the \(\ell _{0}\) norm, which counts the nonzero entries of a vector, and A=[a_{1},…,a_{ N }] is the coefficient matrix of the sparse decomposition. In order to achieve sparse representations given a set of training vectors, we adapt a dictionary that leads to the best representation for each vector in this training set, under strict sparsity constraints.
Computational algorithm
In DLCHI, we used the KSVD dictionary learning algorithm [39, 40], which treats sparse representation as an optimization problem that can be efficiently solved via orthogonal matching pursuit (OMP) and singular value decomposition (SVD). The KSVD approach is an iterative procedure that alternates between two steps, sparse coding and dictionary update, both of which work towards minimizing the overall objective function.
First, we considered the sparse coding stage, where we assumed that D was fixed and treated the optimization problem in (2) as a search for sparse representations with coefficients summarized in the matrix A. The sparsity constraint was relaxed so that the number of nonzero entries of each column a_{ i } could be more than 1, up to a bound T_{0}. In doing so, the relaxed objective function becomes:
In (3), D was first fixed so that we could focus on learning the coefficient matrix A using the orthogonal matching pursuit method; any pursuit method could be used, as long as it supplies a solution with a fixed and predetermined number of nonzero entries T_{0}. OMP is an iterative greedy algorithm that, at each step, selects the column best correlated with the residual part of the signal; it yields a suboptimal (approximate) solution to the problem of sparse signal representation. The major advantage of OMP is its simplicity and fast implementation. The problem in (3) decouples into N distinct problems, one for each column of W^{∗}.
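The sparse coding step for a single column can be sketched as follows. This is a minimal OMP written for illustration, assuming that the per-column problem is to minimize the squared error between w^{∗} and Da subject to at most T_{0} nonzero entries in a, and that the columns of D have unit norm. The function name `omp` and the toy dictionary are hypothetical.

```python
import numpy as np


def omp(D, w, T0):
    """Greedy sparse coding: approximate w with at most T0 columns of D.

    Assumes the columns of D have unit norm.
    """
    k = D.shape[1]
    a = np.zeros(k)
    w = np.asarray(w, dtype=float)
    residual = w.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T0):
        # Select the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        j = int(np.argmax(correlations))
        if correlations[j] < 1e-12:
            break  # residual is already (numerically) explained
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], w, rcond=None)
        residual = w - D[:, support] @ coef
    if support:
        a[support] = coef
    return a


# Toy overcomplete dictionary (d = 3, k = 4) with unit-norm columns.
D = np.array([[1.0, 0.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 0.0, 2 ** -0.5],
              [0.0, 0.0, 1.0, 0.0]])
w_star = np.array([2.0, 0.0, 3.0])
a = omp(D, w_star, T0=2)
print(a)
```

Applying the same function to every column of W^{∗} gives the coefficient matrix A, which is why the N problems can be solved independently.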
With a learned A, we searched for the best dictionary D. The search updates only one column of the dictionary, d_{ k }, at a time, together with the corresponding kth row of A, denoted \(\mathbf {a}_{T}^{k}\) (this is not the vector a_{ k }, which is the kth column of A). Updating only one column of D at a time has a straightforward solution based on the singular value decomposition (SVD). The problem reduces to looking only at the training vectors that use the column d_{ k } in their approximation and minimizing the approximation error with respect to E_{ k }. The matrix \(\mathbf {E}_{k} = \mathbf {W}^{*} - \sum _{j\neq k}\mathbf {d}_{j}\mathbf {a}_{T}^{j}\) stands for the error over all the training samples when the kth basis vector is removed. The SVD finds the closest rank-1 matrix (in Frobenius norm) that approximates E_{ k }. Hence, we rewrote the penalty term in (3) as:
The notation ∥A∥_{ F } stands for the Frobenius norm, defined as \(\left \| \mathbf {A}\right \|_{F}= \sqrt {\sum _{ij}A^{2}_{ij}}\). Then, the penalty term in (2) can be rewritten as:
Hence, we minimized \(\left \| \mathbf {E}_{k}-\mathbf {d}_{k}\mathbf {a}_{T}^{k}\right \|_{F}^{2}\) over d_{ k } and \(\mathbf {a}_{T}^{k}\), keeping the remaining coefficients in A and the error E_{ k } fixed. The constraint is over the jth orthonormal basis D_{ j }. By decomposing the product DA into the sum of K rank-1 matrices, we can assume that K−1 terms are fixed and that the kth remains unknown. The singular value decomposition then finds the closest rank-1 matrix that approximates E_{ k }, which effectively minimizes the error in Eq. (5).
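For reference, the decomposition described in this passage can be written out in the notation defined above. The display below is the standard identity behind a single-column dictionary update, reconstructed from the surrounding definitions; it is a sketch and not a copy of the displayed equations of the original formulation.

$$\begin{array}{*{20}l} \left\| \mathbf{W}^{*}-\mathbf{D}\mathbf{A}\right\|_{F}^{2} &= \left\| \mathbf{W}^{*}-\sum_{j=1}^{K}\mathbf{d}_{j}\mathbf{a}_{T}^{j}\right\|_{F}^{2}\\ &= \left\| \left(\mathbf{W}^{*}-\sum_{j\neq k}\mathbf{d}_{j}\mathbf{a}_{T}^{j}\right)-\mathbf{d}_{k}\mathbf{a}_{T}^{k}\right\|_{F}^{2}\\ &= \left\| \mathbf{E}_{k}-\mathbf{d}_{k}\mathbf{a}_{T}^{k}\right\|_{F}^{2}. \end{array} $$

Since \(\mathbf {d}_{k}\mathbf {a}_{T}^{k}\) has rank 1, the best choice in Frobenius norm is given by the leading singular triplet of E_{ k }.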
Applied to the full E_{ k }, the above solution for \(\mathbf {a}_{T}^{k}\) is very likely to be dense, because the sparsity constraint is not enforced. To enforce the sparsity constraint, we define ω_{ k } as the set of indices of the examples \(\mathbf {w}^{*}_{i}\) that use the basis vector d_{ k }, i.e., those for which the entry \(\mathbf {a}_{T}^{k}\left (i\right)\) is nonzero. Thus, \(\mathbf {\omega }_{k}=\left \{ i \mid 1\leq i\leq N, \mathbf {a}_{T}^{k}\left (i\right) \neq 0 \right \} \). Then, we compute \(\mathbf {E}_{k}= \mathbf {W}^{*} - \sum _{j\neq k}\mathbf {d}_{j}\mathbf {a}_{T}^{j}\) and keep only the columns corresponding to ω_{ k }, which gives the restricted matrix \(\mathbf {E}_{k}^{R}\). We then apply the SVD \(\mathbf {E}_{k}^{R}=\mathbf {U}\Lambda \mathbf {V}^{T}\). The solution for d_{ k } is the first column of U, and the updated (restricted) coefficient vector is the first column of V multiplied by Λ(1,1).
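The restricted update of one dictionary column can be sketched in a few lines. The function name `ksvd_update_atom` and the random test matrices are hypothetical, and the code follows the standard KSVD column update rather than any implementation released with the paper.

```python
import numpy as np


def ksvd_update_atom(W, D, A, k):
    """Update column k of D and row k of A in place via a rank-1 SVD."""
    omega = np.flatnonzero(A[k, :])  # samples that currently use atom k
    if omega.size == 0:
        return D, A  # unused atom: leave it unchanged
    # Error matrix with the contribution of atom k removed,
    # restricted to the columns indexed by omega.
    E_k = W - D @ A + np.outer(D[:, k], A[k, :])
    E_R = E_k[:, omega]
    U, S, Vt = np.linalg.svd(E_R, full_matrices=False)
    D[:, k] = U[:, 0]              # new unit-norm atom
    A[k, omega] = S[0] * Vt[0, :]  # new coefficients on the same support
    return D, A


# Hypothetical data: d = 4, K = 6 atoms, N = 10 model vectors.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))
D = rng.normal(size=(4, 6))
D /= np.linalg.norm(D, axis=0)
A = rng.normal(size=(6, 10)) * (rng.random((6, 10)) < 0.3)

support_before = A[2] != 0
error_before = np.linalg.norm(W - D @ A)
D, A = ksvd_update_atom(W, D, A, k=2)
error_after = np.linalg.norm(W - D @ A)
print(error_before, error_after)
```

Restricting the SVD to the columns in ω_{ k } keeps the zero pattern of the kth row of A, so the update cannot increase the number of nonzero coefficients.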
Summary of DLCHI
Putting it all together, an overview of the DLCHI method is given in Fig. 2, and a full description of the DLCHI algorithm is given in Algorithm 1. As Algorithm 1 shows, we have to learn W, A, and D. We split the algorithm into two phases: learning personalized CHI models and dictionary learning. In phase I, we solve for w^{∗} via CHI using Algorithm 2, described in the Appendix (“Algorithm” section). In this phase, we learn the model parameter vectors of all individuals, which leads to the construction of the matrix W^{∗}. In phase II, we use the KSVD method to learn the dictionary by first computing the best representation matrix A via (3) using the orthogonal matching pursuit algorithm and then searching for the best dictionary. With a learned dictionary, the representations of the individuals’ models can be identified and used as the final individual models. Specifically, the dictionary learning algorithm finds the low-dimensional structure of the model parameter matrix, W^{∗}≈DA, where each column of DA is the reconstructed model parameter vector of one individual, expressed as a linear combination of dictionary elements.
Numerical studies
Real-world applications
We implement the DLCHI model on two real-world datasets that were collected in Alzheimer’s disease (AD) and surgical site infection (SSI) research. Both diseases exhibit monotonic disease progression and significant patient heterogeneity. For the Alzheimer’s disease data, we use the FDG-PET images of 162 subjects (Alzheimer’s disease: 74, normal aging: 88) downloaded from ADNI (www.loni.usc.edu/ADNI). For each subject, there are at least three and at most seven time points. The data have been preprocessed, and Automated Anatomical Labeling has been used to segment each image into 116 anatomical volumes of interest (AVOIs). We select the 90 AVOIs that are in the cerebral cortex for our study. Each AVOI becomes a variable here. The measurement of each region, according to the mechanism of FDG-PET, is the regional average FDG binding count, representing the degree of glucose metabolism. Extensive evidence in the literature has shown that glucose metabolism declines with aging, while the pathology of neurodegenerative diseases such as AD further accelerates the decline, which makes this a well-suited application for implementing and testing the proposed DLCHI method.
The SSI data exhibit characteristics similar to those of the AD data. Many models have been developed to monitor individuals who are at risk of developing SSI [41–43], based on daily wound measurements, such as the temperature, granularity, and distance of the wound, together with other non-wound-related but important clinical signals such as heart rate, morning body temperature, and NG tube presence. Figure 3 shows the longitudinal trend of a wound-related variable collected in our data, which clearly shows the monotonic degradation process of the SSI patients. The SSI data include longitudinal wound measurements from 857 patients, among which 169 are SSI patients and 539 are normal controls. The data include wound measurement variables, for example, wound edge distance, temperature, and exudate amount. Some other physiological variables such as heart rate are also provided in the data. Subjects were measured over periods ranging from 3 days to 21 days.
Parameter tuning and validation
For each experiment, we randomly split the data into two equal parts, one for training and one for testing. For training, we used 10-fold cross validation to tune the parameters. As CHI is a complex data fusion mechanism that synthesizes monotonicity of the disease progression, label information, and statistical homogeneity, we use a comprehensive scheme to compare DLCHI with CHI. Specifically, we compared the two models (1) when only monotonicity is used for model training (i.e., by setting β=0 and optimizing for α), (2) when only the label information is used for model training (i.e., by setting α=0 and optimizing for β), and (3) when the full model is used (i.e., by optimizing for both α and β). In addition, we repeated each of these settings with randomly downsampled training data, i.e., using only a proportion of the data, ranging from 15 to 75%, to train both models. A model that can maintain good performance with less training data is obviously more promising in healthcare applications, where data collection is relatively more costly than in other real-world applications.
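The splitting and downsampling protocol can be sketched as below. The function name and its parameters are hypothetical, and the sketch assumes that the stated proportion is applied to the training half; the text leaves open whether it refers to the training half or to the full data set.

```python
import random


def split_and_downsample(patient_ids, train_fraction=0.5, keep_ratio=0.15, seed=0):
    """Split patients into train/test parts, then keep a share of the training part."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    train, test = ids[:n_train], ids[n_train:]
    # Assumption: the ratio is taken relative to the training half.
    n_keep = max(1, round(len(train) * keep_ratio))
    return rng.sample(train, n_keep), test


# Example with 162 subject identifiers, as in the AD data set.
train_ids, test_ids = split_and_downsample(range(162), keep_ratio=0.15)
print(len(train_ids), len(test_ids))
```

Splitting by patient rather than by visit keeps all time points of one subject on the same side of the split, which avoids leaking a patient’s trajectory into the test set.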
Results
The comparison between CHI and DLCHI across the wide range of scenarios described above is reported in Table 1. In general, the DLCHI model significantly improves on the CHI model by accounting for patient heterogeneity. This makes sense, since enforcing the constraint that each individual CHI model be represented by a dictionary regularizes the model learning, as the dictionary basis vectors are numerical representations of patient heterogeneity. In all three scenarios, namely using only monotonicity (β=0), using only the label information (α=0), and using the full model, DLCHI achieves satisfactory results. Another observation is that enforcing the monotonicity constraint alone already leads to satisfactory performance for the DLCHI model. As shown in Table 1, the DLCHI method is also robust to small sample sizes. We investigated this by selecting only 15% of the data as the training data, with 10-fold cross validation used to identify the optimal parameters of the model. The results show that our method achieves better prediction performance than the CHI model trained on the same proportion of the data. Overall, the results show that DLCHI has great potential for clinical applications by overcoming the limitation of the CHI method in handling patient heterogeneity.
Table 2 shows the performance comparison of the personalized DLCHI method with the CHI model and three dictionary methods: KSVD, ILSDLA, and RLSDLA. For each model, 10-fold cross validation is used on the training data, and the AUC is evaluated on the testing data. The results in Table 2 show that integrating dictionary learning with the CHI model improves the performance of the algorithm. The performance of RLSDLA is in general considerably better than that of ILSDLA and KSVD. Interestingly, however, the DLCHI model outperforms RLSDLA, despite the latter’s favorable convergence as an online algorithm and its strength in reconstruction.
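For readers who wish to reproduce this kind of evaluation, the AUC of a health index on test data can be computed directly from its definition as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function below is a generic sketch with hypothetical scores; it is not the evaluation code used for Table 2.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical final-visit index values for four test subjects.
test_scores = [0.9, 0.2, 0.3, 0.1]
test_labels = [1, 1, -1, -1]
print(auc(test_scores, test_labels))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred subjects.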
Representation capacity of dictionary learning
Figure 4 provides the results regarding the number of basis vectors needed for a sufficient representation of patient heterogeneity in AD. As expected, the larger the dictionary size, the lower the representation error. We can also observe that the representation error drops quickly as the number of basis vectors in the dictionary increases. As the optimal dictionary size is not known in advance, we first started from an initial dictionary D_{0} of large size K. The initial dictionary \(\mathbf {D}_{0}\in \mathbb {R}^{d\times K }\) is obtained by selecting K samples randomly from the input signals. The dictionary D_{0} helps reduce the reconstruction error but is not yet optimal. For our experiment, we selected the number of basis vectors based on the minimum representation error over various dictionary sizes. To satisfy overcompleteness, we choose the size of D_{0} to be sufficiently larger than the dimension of an input signal.
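The initialization and the error measure described here can be sketched as follows. The function names, the unit-norm scaling of the selected columns, and the use of a relative Frobenius error are assumptions made for illustration; the text does not specify how the representation error in Fig. 4 is normalized.

```python
import numpy as np


def init_dictionary(W, K, seed=0):
    """Build D_0 from K randomly chosen columns of W, scaled to unit norm."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(W.shape[1], size=K, replace=False)  # requires K <= N
    D0 = W[:, cols].astype(float)
    norms = np.linalg.norm(D0, axis=0)
    norms[norms == 0] = 1.0  # guard against all-zero columns
    return D0 / norms


def representation_error(W, D, A):
    """Relative Frobenius error of approximating W by D @ A."""
    return np.linalg.norm(W - D @ A) / np.linalg.norm(W)


# Hypothetical model matrix: d = 5 features, N = 40 individuals.
W_star = np.random.default_rng(1).normal(size=(5, 40))
D0 = init_dictionary(W_star, K=12)  # K > d, so D_0 is overcomplete
print(D0.shape)
```

Repeating the dictionary learning for several values of K and recording `representation_error` for each gives a curve of the kind discussed above.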
Conclusion
In this paper, we presented a DLCHI formulation to help build a personalized contemporaneous health index (CHI) for monitoring patient condition over time. Through applications on two real-world datasets of AD and SSI, the DLCHI model is shown to outperform the CHI model in patient prediction and to achieve robust results with small sample sizes. In the future, we may further enhance the DLCHI method in the following directions. First, in the current DLCHI formulation, the individual models have to be learned via the CHI formulation without information from the dictionary. Only once a dictionary has been learned are the representations of the individual models identified and used as the final individual models. It is possible that jointly learning both steps, by incorporating the dictionary into the CHI formulation, could further enhance the performance of DLCHI. Second, transfer learning becomes vital when the supply of training data is limited. One way to tackle this problem is model-based transfer, where prior knowledge from a generic recognizer enters through a modified regularization term in the CHI model. Last but not least, we can also consider integrating data-based and model-based transfer learning: by reweighting the input source data, we can minimize the discrepancy between the source and target distributions, and then allow CHI to be biased toward the parameters of another model.
Appendix
CHI model formulation
For completeness of DLCHI, here we present more details of the CHI formulation (1). The CHI formulation is convex but contains multiple nonsmooth terms such as (1b), (1c), and (1f). To solve this formulation, we could merge the smooth terms and derive the dual optimization problem, and finally train it via the block coordinate descent algorithm. Specifically, we can simplify Eq. (1) in a quadratic form by defining:
where Q is defined as
With that, Eq. (6) is simplified to Eq. (7) as follows:
By introducing two relaxation variables ξ and ε, Eq. (7) is equivalent to Eq. (8) as follows:
where
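We can then derive the dual formulation of (8) by replacing the ℓ_{1}-norm penalty in (8) with its conjugate (dual-norm) representation \(\|\mathbf {w}\|_{1} = \max _{\|\mathbf {s}\|_{\infty }\leq 1}\langle \mathbf {s},\mathbf {w}\rangle = \max _{\|\mathbf {s}\|_{\infty }\leq 1} -\langle \mathbf {s},\mathbf {w}\rangle \), and then introducing two new dual variables u and v, which leads to the following formulation: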
This can be rewritten as the following constrained smooth convex optimization problem, which can be solved efficiently:
Then the solution w^{∗} to Eq. (9) can be obtained by \(\mathbf {w}^{*} = Q^{-1} \left (\mathbf {s}^{*} +Z\mathbf {u}^{*} + \widehat {X}\mathbf {v}^{*}\right)\).
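The dual-norm identity used in the derivation, \(\|\mathbf {w}\|_{1} = \max _{\|\mathbf {s}\|_{\infty }\leq 1}\langle \mathbf {s},\mathbf {w}\rangle \), can be checked numerically on a small example. The sketch below is only an illustration with an arbitrary vector w; the helper names are hypothetical. It relies on the fact that a linear function over a box attains its maximum at a corner, so enumerating the corners with entries ±1 suffices.

```python
import itertools

def l1_norm(w):
    return sum(abs(x) for x in w)

def inner(s, w):
    return sum(a * b for a, b in zip(s, w))

def maximizer(w):
    """s = sign(w) attains the maximum of <s, w> over ||s||_inf <= 1."""
    return [1.0 if x > 0 else -1.0 if x < 0 else 0.0 for x in w]

w = [0.5, -2.0, 0.0, 1.25]          # arbitrary example vector
s_star = maximizer(w)

# Brute-force maximum over all corners of the box [-1, 1]^4.
corners = list(itertools.product([-1.0, 1.0], repeat=len(w)))
best_corner = max(inner(s, w) for s in corners)

# The box is symmetric, so maximizing -<s, w> gives the same value.
best_corner_negated = max(-inner(s, w) for s in corners)
```

Here `inner(s_star, w)`, `best_corner`, and `best_corner_negated` all equal `l1_norm(w)`, which is why the nonsmooth ℓ_{1} penalty can be traded for a bounded auxiliary variable s.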
Algorithm
The block coordinate descent algorithm [44] to solve the dual problem in Eq. (9) is an iterative procedure as follows:
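As a generic illustration of the idea, and not the authors' algorithm or the coordinate gradient method of [44], the following sketch applies cyclic block coordinate descent to a box-constrained convex quadratic. It assumes, without confirmation from the text shown here, that the dual problem has this form, with one block of variables per dual variable (for example s, u, and v) and simple bounds on each block. The names `A`, `c`, `blocks`, and `bounds` are hypothetical.

```python
def block_coordinate_descent(A, c, blocks, bounds, n_sweeps=200):
    """Minimize 0.5 * z'Az - c'z subject to box bounds on each block.

    A      : symmetric positive definite matrix as a list of lists
    c      : list of linear coefficients
    blocks : list of index lists, one per block of variables
    bounds : list of (lo, hi) pairs, one per block
    """
    n = len(c)
    z = [0.0] * n
    for blk, (lo, hi) in zip(blocks, bounds):
        for i in blk:
            z[i] = min(max(0.0, lo), hi)          # feasible starting point
    for _ in range(n_sweeps):
        for blk, (lo, hi) in zip(blocks, bounds):  # cycle over the blocks
            for i in blk:
                # Exact minimizer along coordinate i with the others fixed,
                # then clipped to the bounds of the block.
                rest = sum(A[i][j] * z[j] for j in range(n) if j != i)
                z[i] = min(max((c[i] - rest) / A[i][i], lo), hi)
    return z

# Small example (assumed data): block 0 is bounded in [-1, 1],
# block 1 is constrained to be nonnegative.
A_demo = [[2.0, 1.0], [1.0, 2.0]]
c_demo = [3.0, -3.0]
z_demo = block_coordinate_descent(
    A_demo, c_demo, blocks=[[0], [1]], bounds=[(-1.0, 1.0), (0.0, float("inf"))]
)
```

Each one-dimensional update is exact because the objective restricted to a single coordinate is a convex parabola, so clipping its unconstrained minimizer to the interval gives the constrained minimizer. A fixed number of sweeps is used for simplicity; a practical implementation would stop once the change between sweeps falls below a tolerance.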
References
 1
R Thomson, D Luettel, F Healey, S Scobie, Safer care for the acutely ill patient: learning from serious incidents. Natl. Patient Saf. Agency (2007).
 2
RP Gaynes, DH Culver, TC Horan, JR Edwards, C Richards, JS Tolson, National Nosocomial Infections Surveillance System, Surgical site infection (SSI) rates in the United States, 1992–1998: the national nosocomial infections surveillance system basic SSI risk index. Clin. Infect. Dis. 33(Supplement_2), S69–S77 (2001).
 3
B Spring, M Gotsis, A Paiva, D Spruijt-Metz, Healthy apps: mobile devices for continuous monitoring and intervention. IEEE Pulse. 4(6), 34–40 (2013).
 4
DE Rivera, Optimized behavioral interventions: What does system identification and control engineering have to offer? IFAC Proc. Vol. 45(16), 882–893 (2012).
 5
S Deshpande, DE Rivera, JW Younger, NN Nandola, A control systems engineering approach for adaptive behavioral interventions: illustration with a fibromyalgia intervention. Transl. Behav. Med. 4(3), 275–289 (2014).
 6
G Zen, L Porzi, E Sangineto, E Ricci, N Sebe, Learning personalized models for facial expression analysis and gesture recognition. IEEE Trans. Multimedia. 18(4), 775–788 (2016).
 7
Y Huang, Q Meng, H Evans, W Lober, Y Cheng, X Qian, J Liu, S Huang, CHI: A contemporaneous health index for degenerative disease monitoring using longitudinal measurements. J. Biomed. Inform. 73, 115–124 (2017).
 8
JL Cummings, Cognitive and behavioral heterogeneity in Alzheimer’s disease: seeking the neurobiological basis. Neurobiol. Aging. 21(6), 845–861 (2000).
 9
MF Folstein, Heterogeneity in Alzheimer’s disease. Neurobiol. Aging. 10(5), 434–435 (1989).
 10
E Friedland, JV Koss, RP Haxby, CL Grady, J Luxenberg, J Schapiro, MB Kaye, Annals Intern. Med. 109(4), 298–311 (1988).
 11
BA Olshausen, DJ Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 381(6583), 607 (1996).
 12
J Wright, AY Yang, A Ganesh, SS Sastry, Y Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009).
 13
M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006).
 14
M Yang, L Zhang, J Yang, D Zhang, in Image Processing (ICIP), 2010 17th IEEE International Conference On. Metaface learning for sparse representation based face recognition (IEEE, 2010), pp. 1601–1604.
 15
Q Xu, H Yu, X Mou, L Zhang, J Hsieh, G Wang, Low-dose X-ray CT reconstruction via dictionary learning. IEEE Trans. Med. Imaging. 31(9), 1682–1697 (2012).
 16
Y Chen, X Yin, L Shi, H Shu, L Luo, C Coatrieux, JL Toumoulin, Phys. Med. Biol. 58(16), 5803 (2013).
 17
I Ramirez, P Sprechmann, G Sapiro, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. Classification and clustering via dictionary learning with structured incoherence and shared features (IEEE, 2010), pp. 3501–3508.
 18
R Raina, A Battle, H Lee, B Packer, AY Ng, in Proceedings of the 24th International Conference on Machine Learning. Self-taught learning: transfer learning from unlabeled data (ACM, 2007), pp. 759–766.
 19
SG Mueller, MW Weiner, LJ Thal, RC Petersen, C Jack, W Jagust, JQ Trojanowski, AW Toga, L Beckett, The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clin. N. Am. 15(4), 869–877 (2005).
 20
JR Petrella, RE Coleman, PM Doraiswamy, Neuroimaging and early diagnosis of Alzheimer disease: a look to the future. Radiology. 226(2), 315–336 (2003).
 21
J Zhou, J Liu, VA Narayan, J Ye, in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Modeling disease progression via fused sparse group lasso (ACM, 2012), pp. 1095–1103.
 22
J Zhou, J Liu, VA Narayan, J Ye, Alzheimer’s Disease Neuroimaging Initiative, Modeling disease progression via multi-task learning. NeuroImage. 78, 233–248 (2013).
 23
J Mairal, M Elad, G Sapiro, Sparse representation for color image restoration. IEEE Trans. Image Process. 17(1), 53–69 (2008).
 24
Z Jiang, Z Lin, LS Davis, Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2651–2664 (2013).
 25
M Elad, MAT Figueiredo, Y Ma, On the role of sparse and redundant representations in image processing. Proc. IEEE. 98(6), 972–982 (2010).
 26
B Cheng, J Yang, S Yan, Y Fu, TS Huang, Learning with l1-graph for image analysis. IEEE Trans. Image Process. 19(4), 858–866 (2010).
 27
J Wright, Y Ma, J Mairal, G Sapiro, TS Huang, S Yan, Sparse representation for computer vision and pattern recognition. Proc. IEEE. 98(6), 1031–1044 (2010).
 28
JA Bagnell, DM Bradley, in Advances in Neural Information Processing Systems. Differentiable sparse coding (Curran Associates, Inc., 2009), pp. 113–120.
 29
J Mairal, J Ponce, G Sapiro, A Zisserman, FR Bach, in Advances in Neural Information Processing Systems. Supervised dictionary learning (Curran Associates, Inc., 2009), pp. 1033–1040. http://papers.nips.cc/paper/3448superviseddictionarylearning.pdf.
 30
S Bahrampour, NM Nasrabadi, A Ray, WK Jenkins, Multimodal task-driven dictionary learning for image classification. IEEE Trans. Image Process. 25(1), 24–38 (2016).
 31
K Engan, SO Aase, JH Husoy, in Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference On, 5. Method of optimal directions for frame design (IEEE, 1999), pp. 2443–2446.
 32
K Engan, SO Aase, JH Husøy, Multiframe compression: theory and design. Signal Process. 80(10), 2121–2140 (2000).
 33
Q Zhang, B Li, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. Discriminative K-SVD for dictionary learning in face recognition (IEEE, 2010), pp. 2691–2698.
 34
K Engan, K Skretting, JH Husøy, Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digit. Signal Process. 17(1), 32–49 (2007).
 35
J Mairal, G Sapiro, M Elad, Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008).
 36
K Kreutz-Delgado, JF Murray, BD Rao, K Engan, TW Lee, TJ Sejnowski, Dictionary learning algorithms for sparse representation. Neural Comput. 15(2), 349–396 (2003).
 37
DL Donoho, M Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc. Natl. Acad. Sci. 100(5), 2197–2202 (2003).
 38
SG Mallat, Z Zhang, Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993).
 39
Z Jiang, Z Lin, LS Davis, in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference On. Learning a discriminative dictionary for sparse coding via label consistent K-SVD (IEEE, 2011), pp. 1697–1704.
 40
M Aharon, M Elad, A Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006).
 41
A Dipiro, RG Martindale, JT Bakst, PF Vacani, P Watson, MT Miller, Infection in surgical patients: effects on mortality, hospitalization, and postdischarge care. Am. J. Health-Syst. Pharmacy. 55(8), 777–781 (1998).
 42
E Lawson, BL Hall, CY Ko, Risk factors for superficial vs deep/organ-space surgical site infections: implications for quality improvement initiatives. JAMA Surg. 148(9), 849–858 (2013).
 43
L Saunders, M Perennec-Olivier, P Jarno, F L’Hériteau, AG Venier, L Simon, M Giard, JM Thiolet, JF Viel, et al, Improving prediction of surgical site infection risk with multilevel modeling. PloS ONE. 9(5), e95295 (2014).
 44
P Tseng, S Yun, A coordinate gradient descent method for nonsmooth separable minimization. Math. Prog. 117(1–2), 387–423 (2009).
Acknowledgements
The authors acknowledge funding support from the National Science Foundation under Grants CMMI1536398 and CCF1715027. The authors are also grateful to ADNI, Heather Evans, and Bill Lober for the data used to demonstrate our method.
Author information
Contributions
SH and AS conceived the project. AS and SH completed the algorithm development, data analysis, and interpretation. Both authors contributed to the manuscript writing and approved the final manuscript.
Corresponding author
Correspondence to Aven Samareh.
Ethics declarations
Competing interests
Both authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Samareh, A., Huang, S. DLCHI: a dictionary learningbased contemporaneous health index for degenerative disease monitoring. EURASIP J. Adv. Signal Process. 2018, 17 (2018) doi:10.1186/s1363401805388
Keywords
 Dictionary learning
 Patient monitoring
 Convex optimization
 Personalized healthcare