
DeConFuse: a deep convolutional transform-based unsupervised fusion framework


This work proposes an unsupervised fusion framework based on deep convolutional transform learning. The great learning ability of convolutional filters for data analysis is well acknowledged. The success of convolutive features owes to the convolutional neural network (CNN). However, CNN cannot perform learning tasks in an unsupervised fashion. In a recent work, we showed that this shortcoming can be addressed by adopting a convolutional transform learning (CTL) approach, where convolutional filters are learnt in an unsupervised fashion. The present paper aims at (i) proposing a deep version of CTL, (ii) proposing an unsupervised fusion formulation taking advantage of the proposed deep CTL representation, and (iii) developing a mathematically sound optimization strategy for performing the learning task. We apply the proposed technique, named DeConFuse, to the problem of stock forecasting and trading. A comparison with state-of-the-art methods (based on CNN and long short-term memory network) shows the superiority of our method for performing a reliable feature extraction.


In the last decade, convolutional neural network (CNN) has enjoyed tremendous success in different types of data analysis. It was initially applied for images in computer vision tasks. The operations within the CNN were believed to mimic the human visual system. Although such a link between human vision and CNN may be present, it has been observed that deep CNNs are not exact models for human vision [1]. For instance, biologists consider that the human visual system would consist of 6 layers [2, 3] and not 20+ layers used in GoogleNet [4].

Neural network models have also been used for analyzing time series data. Until recently, long short-term memory (LSTM) networks were almost the only neural network models used for time series analysis, as they were supposed to mimic memory and hence were deemed suitable for such tasks. However, LSTMs are not able to model very long sequences, and their training is hardware intensive. Owing to these shortcomings, LSTMs are being replaced by CNNs. The reason for the great results of CNN methods for time series analysis (1D data processing in general) is not well understood. One possibility may lie in the universal function approximation capacity of deep neural networks [5, 6] rather than their biological semblance. The research in this area is primarily led by its success rather than its understanding.

An important point to mention is that the performance of CNN is largely driven by the availability of very large labeled datasets. This probably explains their tremendous success in facial recognition tasks. Google’s FaceNet [7] and Facebook’s DeepFace [8] architectures are trained on 400 million facial images, a significant proportion of world’s population. These companies are easily equipped with gigantic labeled facial images data as these are “tagged” by their respective users. In the said problem, deep networks reach almost 100% accuracy, even surpassing human capabilities. However, when it comes to tasks that require expert labeling, such as facial recognition from sketches (requiring forensic expertise) [8] or ischemic attack detection from EEG (requiring medical expertise) [9], the accuracies become modest. Indeed, such tasks require expert labeling that is difficult to acquire, thus limiting the size of available labeled dataset.

The same is believed by a number of machine learning researchers, including Hinton himself, who are wary of supervised learning. In an interview with Axios (Footnote 1), Hinton mentioned his “deep suspicion” of backpropagation, the workhorse behind all supervised deep neural networks. He even added that “I don’t think it’s how the brain works,” and “We clearly don’t need all the labeled data.” It seems that Hinton is hinting towards unsupervised learning frameworks. Unsupervised learning techniques do not require targets/labels to learn from data. This approach typically benefits from the fact that data is inherently very rich in its structure, unlike targets that are sparse in nature. Thus, it does not take into account the task to be performed while learning about the data, removing the need for the human expertise that is required in supervised learning. More on the topic of unsupervised versus supervised learning can be found in a blog by DeepMind (Footnote 2).

In this work, we would like to keep the best of both worlds, i.e., the success of convolutive models from CNN and the promises of unsupervised learning formulations. With this goal in mind, we developed convolutional transform learning (CTL) [10]. This is a representation learning technique that learns a set of convolutional filters from the data without label information. Instead of learning the filters (by backpropagating) from data labels, CTL learns them by minimizing a data fidelity loss, thus making the technique unsupervised. CTL has been shown to outperform several supervised and unsupervised learning schemes in the context of image classification. In the present work, we propose to extend the shallow CTL version to deeper layers, with the aim to generate a feature extraction strategy that is well suited for 1D time series analysis. This is the first major contribution of this work—deep convolutional transform learning.

In most applications, time series signals are multivariate, as they arise from multiple sources/sensors. For example, biomedical signals like ECG and EEG come from multiple leads; financial data from stocks are recorded with different inputs (open, close, low, high, and net asset value); and demand forecasting problems in smart grids come with multiple types of data (power consumption, temperature, humidity, occupancy, etc.). In all such cases, the final goal is to perform a prediction/classification task from such multivariate time series. We propose to address this problem as one of feature fusion. The information from each of the sources will be processed by the proposed deep CTL pipeline, and the generated deep features will be finally fused by an unsupervised fully connected layer. This is the second major contribution of this work: an unsupervised fusion framework with deep CTL.

The resulting features can be used for different applicative tasks. In this paper, we will focus on the applicative problem of financial stock analysis. The ultimate goal may be either to forecast the stock price (regression problem) or to decide whether to buy or sell (classification problem). Depending on the considered task, we can pass the generated features into a suitable machine learning tool that may not be as data hungry as deep neural networks. Therefore, by adopting such a processing architecture, we expect to yield better results than traditional deep learning, especially in cases where access to labeled data is limited.

Literature review

CNN for time series analysis

Let us briefly review and discuss CNN-based methods for time series analysis. For a more detailed review, the interested reader can peruse [11]. We mainly focus on studies on stock forecasting as it will be our use case for experimental validation.

The traditional choice for processing time series with neural networks is to adopt a recurrent neural network (RNN) architecture. Variants of RNN like long short-term memory (LSTM) [12] and gated recurrent unit (GRU) [13] have been proposed. However, due to the complexity of training such networks via backpropagation through time, they have been progressively replaced with 1D CNN [14]. For example, in [15], a generic time series analysis framework was built based on LSTM, with performance assessed on the UCR time series classification datasets. The later study from the same group [17], based on 1D CNN, showed considerable improvement over the prior model on the same datasets.

There are also several studies that convert 1D time series data into a matrix form so as to be able to use 2D CNNs [16, 18, 19]. Each column of the matrix corresponds to a subset of the 1D series within a given time window, and the resulting matrix is processed as an image. The 2D CNN model has been especially popular in stock forecasting. In [19], the said techniques have been used on stock prices for forecasting. A slightly different input is used in [20]: instead of using the standard stock variables (open, close, high, low, and NAV), it uses high frequency data for forecasting major points of inflection in the financial market. In another work [21], a similar approach is used for modeling exchange-traded funds (ETF). It has been seen that the 2D CNN model performs the same as LSTM or the standard multi-layer perceptron [22, 23]. The apparent lack of performance improvement in the aforementioned studies may be due to an incorrect choice of CNN model, since an inherently 1D time series is modeled as an image.
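As an illustration of the matrix construction described above, the following minimal sketch stacks sliding windows of a 1D series as the columns of a matrix. The function name `series_to_matrix`, the window length, and the stride are hypothetical choices made for this example; they are not taken from [16, 18, 19].

```python
import numpy as np

def series_to_matrix(series, window, stride=1):
    """Stack sliding windows of a 1D series as the columns of a 2D matrix."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window + 1, stride)
    # Each column holds the samples of one time window.
    return np.stack([series[s:s + window] for s in starts], axis=1)

prices = np.arange(10.0)                      # toy series, not real stock data
image = series_to_matrix(prices, window=4, stride=2)
print(image.shape)                            # (4, 4)
```

The resulting array can then be handed to any 2D model, which is precisely the step questioned above, since neighboring columns are overlapping copies of the same 1D signal rather than genuinely two-dimensional data.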

Deep learning and fusion

We now review existing works for processing multivariate data inputs, within the deep learning framework. Since the present work aims at being applied to stock price forecasting/trading, we will mostly focus our review on the multi-channel/multi-sensor fusion framework. Multimodal data and fusion for image processing, less related to our work, will be mentioned at the end of this subsection for the sake of completeness.

Deep learning has been widely used recently for analyzing multi-channel/multi-sensor signals. In several of such studies, all the sensors are stacked one after the other to form a matrix and 2D CNN is used for analyzing these signals. For example, [24] uses this strategy for analyzing human activity recognition from multiple body sensors. It is important to distinguish such an approach from the aforementioned studies [19–23]. Here, the images are not formed by stacking windows of the same signal one after the other, but by stacking signals from different sensors. The said study [24] does not account for any temporal modeling; this is rectified in [25]. There, 2D CNN is used on a time series window, but the different windows are finally processed by GRU, thus explicitly incorporating time series modeling. There is however no explicit fusion framework in [24, 25]. The information from raw multivariate signals is simply fused to form matrices and treated by 2D convolutions. A true fusion framework was proposed in [26]. Each signal channel is processed by a deep 1D CNN, and the outputs from the different signal processing pipelines are then fused by a fully connected layer. Thus, the fusion happens at the feature level and not at the raw signal level as it did in [24, 25].

Another area that routinely uses deep learning based fusion is multi-modal data processing. This area is not as well defined as multi-channel data processing; nevertheless, we will briefly discuss some studies on this topic. In [27], a fusion scheme based on a deep belief network (DBN) and a stacked autoencoder (SAE) is presented for audio-visual analysis, fusing the audio and video channels. Each channel is processed separately and connected by a fully connected layer to produce fused features. These fused features are further processed for inference. We can also mention the work on video-based action recognition addressed in [28], which proposes a fusion scheme for incorporating temporal information (processed by CNN) and spatial information (also processed by CNN).

There are several other such works on image analysis [29–31]. In [29], a fusion scheme is proposed for processing color and depth information (via 3D and 2D convolutions, respectively) with the objective of action recognition. In [30], it was shown that by fusing hyperspectral data (high spatial resolution) with Lidar (depth information), better classification results can be achieved. In [31], it was shown that fusing deeply learnt features (from CNN) with handcrafted features via a fully connected layer can improve analysis tasks. In this work, our interest lies in the first problem, that of inference from 1D/time-series multi-channel signals. To the best of our knowledge, all prior deep learning-based studies on this topic are supervised. In keeping with the vision of Hinton and others, our goal is to develop an unsupervised fusion framework using deeply learnt convolutive filters.

Convolutional transform learning

Convolutional transform learning (CTL) was introduced in our seminal paper [10]. Since it is a recent work, we present it in detail in the current paper, to make it self-contained. CTL learns a set of filters \(\left (t_{m}\right)_{1 \leq m \leq M}\) operated on observed samples \(\left (s^{(k)}\right)_{1 \leq k \leq K}\) to generate a set of features \(\left (x_{m}^{(k)}\right)_{1 \leq m \leq M,1 \leq k \leq K}\). Formally, the inherent learning model is expressed through convolution operations defined as

$$ (\forall m \in \{ 1, \ldots,M\}\;, \forall k \in \{ 1, \ldots, K\})\qquad {t_{m}} * {s^{(k)}} = x_{m}^{(k)}. $$

Following the original study on transform learning [32], a sparsity penalty is imposed on the features to improve the representation ability and limit overfitting issues. Moreover, along the same lines as CNN models, a non-negativity constraint is imposed on the features. Training then consists of learning the convolutional filters and the representation coefficients from the data. This is expressed as the following optimization problem

$$\begin{array}{*{20}l} \underset{{(t_{m})_{m}},(x_{m}^{(k)})_{m,k}}{\text{minimize}}\ \ \ \frac{1}{2}\sum\limits_{k = 1}^{K} \sum_{m = 1}^{M} \left(\left\| {{t_{m}} * {s^{(k)}} - x_{m}^{(k)}} \right\|_{2}^{2} + \psi(x_{m}^{(k)}) \right) \\ + \mu \sum_{m = 1}^{M} \left\| t_{m} \right\|_{2}^{2} - \lambda \log \det \left([ {{t_{1}}|\ldots|{t_{M}}} ] \right), \end{array} $$

where ψ is a suitable penalization function. Note that the regularization term “\(\mu \left \| \cdot \right \|_{F}^{2} - \lambda \log \det \)” ensures that the learnt filters are unique, something that is not guaranteed in CNN. Let us introduce the matrix notation

$$ T*S-X = \left[\begin{array}{ccc} t_{1} * s^{(1)} - x_{1}^{(1)} & \dots & t_{M} * s^{(1)} - x_{M}^{(1)}\\ \vdots & \ddots&\vdots\\ t_{1} * s^{(K)} - x_{1}^{(K)} & \dots & t_{M} * s^{(K)} - x_{M}^{(K)}\\ \end{array}\right] $$

where \(T=\left [\begin {array}{ccc}t_{1} & \dots & t_{M}\end {array}\right ]\), \(S=\left [\begin {array}{ccc}s^{(1)} & \dots & s^{(K)}\end {array}\right ]^{\top }\), and \(X=\left [\begin {array}{ccc} x_{1}^{(k)} & \dots & x_{M}^{(k)} \end {array}\right ]_{1\le k\le K}\). The cost function in problem (2) can be compactly rewritten as (see Footnote 3)

$$ F(T,X) = \frac{1}{2}\left\| T*S - X \right\|_{F}^{2} + \Psi(X) + \mu \left\| T \right\|_{F}^{2} - \lambda \log \det \left(T \right), $$

where Ψ applies the penalty term ψ column-wise on X.
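To make the cost in (4) concrete, here is a minimal NumPy sketch that evaluates it for one-layer CTL. It assumes that Ψ is the indicator function of the non-negative orthant, that convolutions use the `same` boundary mode, and that the log-det of a rectangular matrix is the sum of the logarithms of its singular values, as stated in Footnote 3. The function names, array layouts, and the default values of `mu` and `lam` are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def log_det_rect(T):
    """Sum of the logarithms of the singular values of T (Footnote 3 convention)."""
    return float(np.sum(np.log(np.linalg.svd(T, compute_uv=False))))

def conv_features(T, S):
    """Convolve every sample (row of S) with every filter (column of T)."""
    return np.array([[np.convolve(s, T[:, m], mode="same")
                      for m in range(T.shape[1])] for s in S])

def ctl_cost(T, S, X, mu=0.1, lam=0.1):
    """Evaluate F(T, X) with Psi taken as the non-negativity indicator (assumption)."""
    if np.any(X < 0):
        return np.inf                       # indicator function: infeasible features
    fidelity = 0.5 * np.sum((conv_features(T, S) - X) ** 2)
    return fidelity + mu * np.sum(T ** 2) - lam * log_det_rect(T)

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 16))            # K = 3 toy samples of length 16
T = rng.standard_normal((5, 4))             # M = 4 filters of length 5
X = np.maximum(conv_features(T, S), 0.0)    # feasible (non-negative) features
print(ctl_cost(T, S, X))
```

With this choice of X, the fidelity term only accounts for the negative part of the convolution outputs, which is the quantity the filters are trained to keep small.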

A local minimizer to (4) can be reached efficiently using the alternating proximal algorithm [33–35], which alternates between proximal updates on variables T and X. More precisely, set a Hilbert space \((\mathcal {H},\|\cdot \|)\) and define the proximity operator [23] at \(\tilde x \in \mathcal {H}\) of a proper lower-semi-continuous convex function \(\varphi : \mathcal {H} \to ] - \infty, + \infty ]\) as

$$ \operatorname{prox}_{\varphi}(\tilde x) = \mathop {\arg \min }\limits_{x \in \mathcal{H}} \varphi (x) + \frac{1}{2}\left\| {x - \tilde x} \right\|^{2}. $$

Then, the alternating proximal algorithm reads

$$ \begin{array}{l} {\rm{For\ }}{n} = 0,1,...\\ \left\lfloor \begin{array}{rl} T^{[n + 1]} &= \operatorname{prox}_{\gamma_{1} F(\cdot,X^{[n]})} \left(T^{[n]}\right)\\ X^{[n + 1]} &= \operatorname{prox}_{\gamma_{2} F(T^{[n + 1]},\cdot)}\left(X^{[n]} \right) \end{array} \right. \end{array} $$

with initializations \(T^{[0]}\), \(X^{[0]}\) and \(\gamma _{1}, \gamma _{2}\) positive constants. For more details on the derivations and the convergence guarantees, the readers can refer to [10].
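As a worked example of the second update, consider the special case where Ψ is the indicator function of the non-negative orthant, in line with the non-negativity constraint imposed on the features (this choice is an assumption of the example; other penalties lead to other expressions). Writing \(A = T^{[n+1]} * S\), the terms of F that depend only on T are constant in X, and the proximal step has the closed form

$$ X^{[n+1]} = \underset{X \geq 0}{\operatorname{argmin}} \ \frac{\gamma_{2}}{2}\left\| A - X \right\|_{F}^{2} + \frac{1}{2}\left\| X - X^{[n]} \right\|_{F}^{2} = \max\left\{ \frac{\gamma_{2} A + X^{[n]}}{1+\gamma_{2}},\, 0 \right\}. $$

The problem is separable across the entries of X, and each scalar problem is a strongly convex quadratic restricted to the non-negative half-line, so its solution is the unconstrained minimizer \((\gamma_{2} A + X^{[n]})/(1+\gamma_{2})\) clipped at zero. For large \(\gamma_{2}\), the update approaches \(\max\{A, 0\}\).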

Fusion based on deep convolutional transform learning

In this section, we discuss our proposed formulation. First, we extend the aforementioned CTL formulation to a deeper version. Next, we develop the fusion framework based on transform learning, leading to our DeConFuse (Footnote 4) strategy.

Deep convolutional transform learning

Deep CTL consists of stacking multiple convolutional layers on top of each other to generate the features, as shown in Fig. 1. To learn all the variables in an end-to-end fashion, deep CTL relies on the key property that the solution \(\widehat {X}\) to the CTL problem, assuming fixed filters T, can be reformulated as the simple application of an element-wise activation function, that is

$$ \operatorname*{argmin}_{X} F(T,X) = \phi(T * S), $$

with ϕ the proximity operator of Ψ [36]. For example, if Ψ is the indicator function of the non-negative orthant, then ϕ identifies with the famous rectified linear unit (ReLU) activation function. Many other examples are provided in [36]. Consequently, deep features can be computed by stacking many such layers

$$ (\forall \ell\in\{1,\dots,L-1\})\qquad X_{\ell} = \phi_{\ell}(T_{\ell} *X_{\ell-1}), $$

where \(X_{0}=S\) and \(\phi _{\ell }\) is a given activation function for layer \(\ell \).

Fig. 1

Deep CTL architecture. The illustration is given for L=2 layers, with the first layer \(T_{1}\) composed of \(M_{1}=4\) filters of size 5×1, and the second layer \(T_{2}\) composed of \(M_{2}=8\) filters of size 3×1
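The layer recursion above can be sketched in a few lines of NumPy. The example reuses the sizes quoted in the caption of Fig. 1 (4 filters of length 5, then 8 filters of length 3) and uses ReLU at every layer. Summing over the input feature maps in the second layer, as a standard CNN layer does, and the `same` boundary mode are assumptions of this sketch; the text does not specify them, and all function names are hypothetical.

```python
import numpy as np

def relu(x):
    """Proximity operator of the indicator of the non-negative orthant."""
    return np.maximum(x, 0.0)

def conv_layer(X, T):
    """Apply a filter bank T of shape (M_out, M_in, k) to features X of shape (M_in, N)."""
    return np.array([
        sum(np.convolve(X[i], T[m, i], mode="same") for i in range(X.shape[0]))
        for m in range(T.shape[0])
    ])

def deep_features(s, filter_banks, activations):
    """Apply X_l = phi_l(T_l * X_{l-1}) for each supplied layer, starting from X_0 = s."""
    X = s[np.newaxis, :]
    for T, phi in zip(filter_banks, activations):
        X = phi(conv_layer(X, T))
    return X

rng = np.random.default_rng(0)
s = rng.standard_normal(32)                 # one toy 1D sample
T1 = rng.standard_normal((4, 1, 5))         # first layer: 4 filters of length 5
T2 = rng.standard_normal((8, 4, 3))         # second layer: 8 filters of length 3
X2 = deep_features(s, [T1, T2], [relu, relu])
print(X2.shape)                             # (8, 32)
```

In the formulation itself, the features of the last layer are optimization variables; the sketch simply evaluates the closed-form minimizer that they take for fixed filters.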

Putting all together, deep CTL amounts to

$$\begin{array}{*{20}l} \underset{T_{1},\dots,T_{L},X}{\text{minimize}}\ \ \ F_{\text{conv}}(T_{1},\dots,T_{L},X\,|\,S) \end{array} $$


$$\begin{array}{*{20}l} F_{\text{conv}}(T_{1},\dots,T_{L},X\,|\,S) &= \frac{1}{2} \| T_{L}*\phi_{L-1}(T_{L-1}*\dots \phi_{1}(T_{1}*S)) - X\|_{F}^{2} \\ &+ \Psi(X) + \sum_{\ell=1}^{L}\left(\mu||T_{\ell}||^{2}_{F} - \lambda\log\det(T_{\ell})\right). \end{array} $$

This is a direct extension of the one-layer formulation in (4).

Multi-channel fusion framework

We now propose a fusion framework to learn in an unsupervised fashion a suitable representation of multi-channel data that can then be utilized for a multitude of tasks. This framework takes the channels of input data samples to separate branches of convolutional layers, leading to multiple sets of channel-wise features. These decoupled features are then concatenated and passed to a fully connected layer, which yields a unique set of coupled features. The complete architecture, called DeConFuse, is shown in Fig. 2.

Fig. 2

DeConFuse architecture

Since we have multi-channel data, for each channel \(c \in \{1,\dots,C\}\), we learn a different set of convolutional filters \(T^{(c)}_{1},\dots,T^{(c)}_{L}\) and features \(X^{(c)}\). At the same time, we learn the (not convolutional) linear transform \(\widetilde {T}=(\widetilde {T}_{c})_{1\le c\le C}\) to fuse the channel-wise features \(X=(X^{(c)})_{1\le c\le C}\), along with the corresponding fused features Z, which constitute the final output of the proposed DeConFuse model, as shown in Fig. 2. This leads to the joint optimization problem

$$ \underset{T, X, \widetilde{T}, Z}{\text{minimize}}\ \ \ \underbrace{F_{\text{fusion}}(\widetilde{T}, Z, X) +\sum_{c=1}^{C} F_{\text{conv}}(T_{1}^{(c)},\dots,T_{L}^{(c)},X^{(c)}\,|\,S^{(c)})}_{J(T, X, \widetilde{T}, Z)} $$


$$ F_{\text{fusion}}(\widetilde{T}, Z, X) = \frac{1}{2} \left\|Z - \sum_{c=1}^{C} \text{flat}(X^{(c)}) \widetilde{T}_{c} \right\|^{2}_{F} + \iota_{+}(Z) + \sum_{c=1}^{C}\left(\mu\|\widetilde{T}_{c}\|^{2}_{F} - \lambda\log\det(\widetilde{T}_{c})\right), $$

where the operator “flat” transforms \(X^{(c)}\) into a matrix where each row contains the features of a sample flattened as a vector.
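The fusion step can be illustrated with a short NumPy sketch. It flattens the channel-wise features, applies one dense transform per channel, sums the results, and clips at zero, which is the minimizer of the fidelity term over non-negative Z when everything else is fixed. The array sizes and the names `flat`, `fuse`, `X_channels`, and `T_tilde` are hypothetical choices for this example.

```python
import numpy as np

def flat(Xc):
    """Flatten per-sample features: (K, M, N) -> (K, M*N), one row per sample."""
    return Xc.reshape(Xc.shape[0], -1)

def fuse(X_channels, T_tilde):
    """Return Z = max(sum_c flat(X^(c)) @ T~_c, 0) for fixed features and transforms."""
    pre = sum(flat(Xc) @ Tc for Xc, Tc in zip(X_channels, T_tilde))
    return np.maximum(pre, 0.0)

rng = np.random.default_rng(0)
C, K, M, N, P = 5, 3, 8, 10, 20             # channels, samples, maps, length, fused size
X_channels = [rng.random((K, M, N)) for _ in range(C)]
T_tilde = [rng.standard_normal((M * N, P)) for _ in range(C)]
Z = fuse(X_channels, T_tilde)
print(Z.shape)                              # (3, 20)
```

Summing the per-channel products is the same as concatenating the flattened features of all channels and multiplying by the vertically stacked transforms, which is why the fusion layer behaves like a fully connected layer on the concatenated features.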

To summarize, our formulation aims to jointly train the channel-wise convolutional filters \(T_{\ell }^{(c)}\) and the fusion coefficients \(\widetilde {T}\) in an end-to-end fashion. We explicitly learn the features X and Z subject to non-negativity constraints so as to avoid trivial solutions and make our approach completely unsupervised. Moreover, the “log-det” regularization on both \(T_{\ell }^{(c)}\) and \(\widetilde {T}\) breaks symmetry and forces diversity in the learnt transforms, whereas the Frobenius regularization ensures that the transform coefficients are bounded.

Optimization algorithm

As for the solution of problem (11), we remark that all terms of the cost function are differentiable, except the indicator function of the non-negativity constraint. We can, therefore, find a local minimizer to (11) by employing the projected gradient descent, whose iterations read

$$ \begin{array}{l} {\rm{For\ }}{n} = 0,1,...\\ \;\left\lfloor \begin{array}{rl} T^{[n + 1]} &= T^{[n]} - \gamma \nabla_{T} J(T^{[n]}, X^{[n]}, \widetilde{T}^{[n]}, Z^{[n]})\\ X^{[n + 1]} &= \mathcal{P}_{+}\left(X^{[n]} - \gamma\nabla_{X} J(T^{[n]}, X^{[n]}, \widetilde{T}^{[n]}, Z^{[n]})\right)\\ \widetilde{T}^{[n + 1]} &= \widetilde{T}^{[n]} - \gamma\nabla_{\widetilde{T}} J(T^{[n]}, X^{[n]}, \widetilde{T}^{[n]}, Z^{[n]})\\ Z^{[n + 1]} &= \mathcal{P}_{+}\left(Z^{[n]} - \gamma\nabla_{Z} J(T^{[n]}, X^{[n]}, \widetilde{T}^{[n]}, Z^{[n]})\right)\\ \end{array} \right. \end{array} $$

with initialization \(T^{[0]}, X^{[0]}, \widetilde {T}^{[0]}, Z^{[0]}\), γ>0, and \(\mathcal {P}_{+} = \max \{\cdot,0\}\). In practice, we make use of accelerated strategies [37] within each step of this algorithm to speed up learning.
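The iteration above can be tried on a deliberately simplified problem. The sketch below keeps only a single dense transform T with fixed input features X, so the cost reduces to a fidelity term, a Frobenius penalty, the log-det term, and the non-negativity constraint on Z. It uses plain full-gradient steps, with no acceleration, no stochastic approximation, and hand-written gradients instead of automatic differentiation. The gradient of the log-det term is taken as the transposed pseudo-inverse, which holds for full-rank T. The problem sizes, step size, and parameter values are arbitrary assumptions.

```python
import numpy as np

def loss(T, Z, X, mu, lam):
    """0.5*||Z - X T||_F^2 + mu*||T||_F^2 - lam * sum(log singular values of T)."""
    sv = np.linalg.svd(T, compute_uv=False)
    return 0.5 * np.sum((Z - X @ T) ** 2) + mu * np.sum(T ** 2) - lam * np.sum(np.log(sv))

def projected_gradient(X, T, Z, mu=0.1, lam=0.1, gamma=0.05, n_iter=200):
    history = [loss(T, Z, X, mu, lam)]
    for _ in range(n_iter):
        R = Z - X @ T                                    # residual at iterate n
        grad_T = -X.T @ R + 2 * mu * T - lam * np.linalg.pinv(T).T
        grad_Z = R
        T = T - gamma * grad_T                           # unconstrained gradient step
        Z = np.maximum(Z - gamma * grad_Z, 0.0)          # gradient step, then projection P_+
        history.append(loss(T, Z, X, mu, lam))
    return T, Z, history

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6)) / np.sqrt(20)           # toy fixed features
T0 = np.eye(6)
Z0 = np.maximum(X @ T0, 0.0)
T, Z, history = projected_gradient(X, T0, Z0)
print(history[0], history[-1])
```

Both gradients are evaluated at the current iterate before either variable is updated, which matches the simultaneous form of the updates written above.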

There are two notable advantages with the proposed optimization approach. Firstly, we rely on automatic differentiation [38] and stochastic gradient approximations to efficiently solve problem (11). Secondly, we are not limited to ReLU activation in (8), but rather we can use more advanced ones, such as SELU [39]. This is beneficial for the performance, as shown by our numerical results.

Computational complexity of proposed framework—DeConFuse

Table 1 summarizes the computational complexity of the DeConFuse architecture, for both the training and test phases. Specifically, it reports the cost incurred for every input sample at each iteration of gradient descent in the training phase and for the output computation in the testing phase. The computational complexity of the DeConFuse architecture is comparable to that of a regular CNN. The only addition is the log-det regularization, which requires computing the truncated singular value decomposition of \(T_{\ell }^{(c)}\) and \(\widetilde {T}_{c}\). However, as the size of these matrices is determined by the filter size, the number of filters, and the number of output features per sample, the training complexity is not worse than that of a CNN.

Table 1 Time complexity in training and test phases (for one input sample)

Experimental evaluation

We carry out experiments on the real-world problem of stock forecasting and trading. The problem of stock forecasting is a regression problem aiming at estimating the price of a stock at a future date (next day for our problem) given inputs till the current date. Stock trading is a classification problem, where the decision whether to buy or sell a stock has to be taken at each time. The two problems are related by the fact that simple logic dictates that if the price of a stock at a later date is expected to increase, the stock must be bought; and if the stock price is expected to go down, the stock must be sold.
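The link between the two tasks can be written as a one-line rule. The sketch below derives buy/sell signals from a price sequence by comparing each price with the next one. Treating an unchanged price as a sell signal is an arbitrary assumption of this example, and the prices are toy values rather than data from the study.

```python
def trading_signals(prices):
    """Return 'buy' when the next price is higher than the current one, else 'sell'."""
    return ["buy" if nxt > cur else "sell" for cur, nxt in zip(prices, prices[1:])]

close = [100.0, 102.5, 101.0, 101.0, 104.0]   # toy prices, not NSE data
print(trading_signals(close))                 # ['buy', 'sell', 'sell', 'buy']
```

Applied to true future prices, this rule gives reference labels; applied to forecast prices, it turns a regressor into a trading strategy.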

We will use the five raw inputs for both tasks, namely open price, close price, high, low, and net asset value (NAV). One could compute technical indicators based on the raw inputs [19], but in keeping with the essence of true representation learning, we chose to stay with those raw values. Each of the five inputs is processed by a separate 1D processing pipeline. Each of the pipelines produces a flattened output (Fig. 1). The flattened outputs are then concatenated and fed into the transform learning layer acting as the fully connected layer (Fig. 2) for fusion. While our processing pipeline ends here (being unsupervised), the benchmark techniques are supervised and have an output node. The node is binary (buy/sell) for classification and real valued for regression. More precisely, we will compare with two state-of-the-art time series analysis models, namely TimeNet [15] and ConvTimeNet [17]. In the former, the individual processing pipelines are based on LSTM, and in the latter they use 1D CNN.

We make use of a real dataset from the National Stock Exchange (NSE) of India. The dataset contains information on 150 symbols between 2014 and 2018; these stocks were chosen after filtering out stocks that had less than 3 years of data. The companies available in the dataset are from various sectors such as IT (e.g., TCS, INFY), automobile (e.g., HEROMOTOCO, TATAMOTORS), bank (e.g., HDFCBANK, ICICIBANK), coal and petroleum (e.g., OIL, ONGC), steel (e.g., JSWSTEEL, TATASTEEL), construction (e.g., ABIRLANUVO, ACC), and public sector units (e.g., POWERGRID, GAIL). The detailed architectures for each of the tested techniques, namely DeConFuse, ConvTimeNet, and TimeNet, are presented in Table 2. For DeConFuse, TimeNet, and ConvTimeNet, we have tuned the architectures to yield the best performance and have randomly initialized the weights for each stock’s training.

Table 2 Description of compared models

Stock forecasting—regression

Let us start with the stock forecasting problem. We feed the generated unsupervised features from the proposed architecture into an external regressor, namely ridge regression. Evaluation is carried out in terms of mean absolute error (MAE) between the predicted and actual stock prices for all 150 stocks. The stock forecasting results are shown in Table 5 in Appendix 1. The MAE for individual stocks is presented for each of close price, open price, high price, low price, and net asset value.
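The evaluation protocol described above (fixed features, ridge regression, MAE) can be sketched with NumPy alone. The features and targets below are synthetic stand-ins, the closed-form solver omits an intercept, and the regularization weight `alpha` is an arbitrary assumption; none of these reflect the settings used in the experiments.

```python
import numpy as np

def ridge_fit(Z, y, alpha=1.0):
    """Closed-form ridge regression without intercept: w = (Z^T Z + alpha I)^{-1} Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(d), Z.T @ y)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

rng = np.random.default_rng(0)
Z_train = rng.random((50, 8))               # stand-in for the fused features Z
w_true = rng.standard_normal(8)
y_train = Z_train @ w_true                  # synthetic targets, not stock prices
w = ridge_fit(Z_train, y_train, alpha=1e-3)
mae = mean_absolute_error(y_train, Z_train @ w)
print(mae)
```

Because the features are learnt without labels, the same feature matrix can be reused with a different target (for example, another price variable) by calling `ridge_fit` again.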

From Table 5 in Appendix 1, it can be seen that the MAE values reached by the proposed DeConFuse solution for the first four prices (open, close, high, low) are exceptionally good for all of the 150 stocks. Regarding NAV prediction, the proposed method performs extremely well for 128 stocks. Among the remaining 22 stocks, there are 13, highlighted in red, for which DeConFuse does not give the lowest MAE but is still very close to the best results, given by the TimeNet approach.

For a concise summary of the results, the average values over all stocks are shown in Table 3.

Table 3 Summary of forecasting results

From the summary in Table 3, it can be observed that our error is more than an order of magnitude lower than that of the state of the art. The plots of one of the regressed prices (close price) for some example stocks in Fig. 3 show that the predicted close prices from DeConFuse are closer to the true close prices than the benchmark predictions.

Fig. 3

Stock forecasting performance

Stock trading—classification

We now focus on the stock trading task. In this case, the generated unsupervised features from DeConFuse are inputs to an external classifier based on random decision forest (RDF) with 5 decision tree classifiers and depth 3. Even though we used this architecture, we found that the results from RDF are robust to changes in architecture. This is a well-known property of RDFs [40]. We evaluate the results in terms of precision, recall, F1 score, and area under the ROC curve (AUC). From the financial viewpoint, we also calculate annualized returns (AR) using the predicted trading signals/labels as well as the true trading signals/labels, named predicted AR and true AR, respectively. The starting capital used for calculating AR values for every stock is Rs. 100,000 and the transaction charges are Rs. 10. The stock trading results are shown in Table 6 in Appendix 2.
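For reference, three of the classification metrics mentioned above can be computed from label sequences as follows. This is a plain-Python sketch using the standard definitions; encoding buy as 1 and sell as 0, and returning 0 when a denominator is empty, are assumptions of the example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1]                  # toy labels: 1 = buy, 0 = sell
y_pred = [1, 1, 1, 0, 0, 1]
print(precision_recall_f1(y_true, y_pred))   # (0.75, 0.75, 0.75)
```

Here three of the four predicted buys are correct and three of the four true buys are found, so both precision and recall equal 0.75.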

Certain results from Table 6 in Appendix 2 are highlighted in bold or red. The first set of results, marked in bold, indicates, for each stock and each metric, the technique that gives the best performance. The proposed solution DeConFuse gives the best results for 89 stocks for precision score, 85 stocks for recall score, 125 stocks for F1 score, 91 stocks for AUC measure, and 56 stocks in the case of the AR metric. The other set, marked in red, highlights the cases where DeConFuse has not performed the best but performs nearly equal (here, a maximum difference of 0.05 in the metric is considered) to the best performance given by one of the benchmarks, i.e., DeConFuse gives the next best performance. We noticed that there are 24 stocks for which DeConFuse gives the next best precision value. Likewise, there are 18 stocks in the case of recall, 22 stocks for F1 score, 26 stocks for AUC values, and 1 stock in the case of AR. Overall, DeConFuse reaches a very satisfying performance compared with the benchmark techniques. This is also corroborated by the summary of trading results in Table 4.

Table 4 Summary of trading results

We also display empirical convergence plots for a few stocks, namely RELIANCE, ONGC, HINDUNILVR, and ICICIBANK, in Fig. 4. We can see that the training loss decreases to a point of stability for each example.

Fig. 4

Empirical convergence plots


In this work, we propose DeConFuse, a deep fusion end-to-end framework for the processing of 1D multi-channel data. Unlike other deep learning models, our framework is unsupervised. It is based on a novel deep version of our recently proposed convolutional transform learning model. We have applied the proposed model for stock forecasting/trading leading to very good performance. The framework is generic enough to handle other multi-channel fusion problems as well.

The advantage of our framework is its ability to learn in an unsupervised fashion. For example, consider the problem we address. For traditional deep learning-based models, we need to train two deep networks, one for regression and one for classification. In contrast, we can reuse our features for both tasks, without re-training for each specific task. This has advantages in other areas as well. For example, one can either do ischemia detection, i.e., detect whether one is having a stroke at the current time instant (from EEG), or one can do ischemia prediction, i.e., forecast if a stroke is going to happen. In standard deep learning, two networks need to be trained and tuned to tackle these two problems. With our proposed method, there is no need for this double effort.

In the future, we plan to extend the framework to supervised/semi-supervised formulations. We believe that the semi-supervised formulation will be of immense practical importance. We would also like to extend it to 2D convolutions in order to handle image data.

Appendix 1: Detailed stock forecasting results

Table 5 Stock-wise forecasting results

Appendix 2: Detailed stock trading results

Table 6 Stock-wise trading results

Availability of data and materials

The dataset used is a real dataset of the Indian National Stock Exchange (NSE) of past 4 years and is publicly available. We have shared the data with our implementation available at


  1. 1. artificial-intelligence-pioneer-says-we-need-to-start-over-1513305524-f619efbd-9db0-4947-a9b2-7a4c310a28fe. html

  2. 2.

  3. 3.

    Note that T is not necessarily a square matrix. By an abuse of notation, we define the “log-det” of a rectangular matrix as the sum of logarithms of its singular values.

  4. 4.

    Code available at:



TL: Transform learning

CTL: Convolutional transform learning

CNN: Convolutional neural network

LSTM: Long short-term memory

GRU: Gated recurrent unit

ReLU: Rectified linear unit

SELU: Scaled exponential linear units

NSE: National Stock Exchange

AUC: Area under curve

ROC: Receiver operating characteristics

NAV: Net asset value

RDF: Random decision forest

AR: Annualized returns

MAE: Mean absolute error


  1. 1

    N. Kriegeskorte, Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Rev. Vis. Sci. 1, 417–446 (2015).

  2. 2

    R. W. Guillery, S. M. Sherman, Thalamic relay functions and their role in corticocortical communication: generalizations from the visual system. Neuron. 33(2), 163–175 (2002).

    Google Scholar 

  3. 3

    J. Cudeiro, A. M. Sillito, Looking back: corticothalamic feedback and early visual processing. Trends Neurosci.29(6), 298–306 (2006).

    Google Scholar 

  4. 4

    C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, in Proceedings of the IEEE conference on computer vision and pattern recognition. Going deeper with convolutions, (2015), pp. 1–9.

  5. 5

    I. Daubechies, R. DeVore, S. Foucart, B. Hanin, G. Petrova, Nonlinear approximation and (deep) ReLU networks. arXiv preprint arXiv:1905.02199 (2019).

  6. 6

    P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw.108:, 296–330 (2018).

    MATH  Google Scholar 

  7. 7

    F. Schroff, D. Kalenichenko, J. Philbin, in Proceedings of the IEEE conference on computer vision and pattern recognition. FaceNet: a unified embedding for face recognition and clustering, (2015), pp. 815–823.

  8. 8

    Y. Taigman, M. Yang, M. A. Ranzato, L. Wolf, in Proceedings of the IEEE conference on computer vision and pattern recognition. DeepFace: close the gap to human-level performance in face verification, (2014), pp. 1701–1708.

  9. 9

    S. Nagpal, M. Singh, R. Singh, M. Vatsa, A. Noore, A. Majumdar, in Proceedings of the IEEE International Conference on Computer Vision. Face sketch matching via coupled deep transform learning, (2017), pp. 5419–5428.

  10. 10

    J. Maggu, E. Chouzenoux, G. Chierchia, A. Majumdar, in International Conference on Neural Information Processing. Convolutional transform learning (SpringerCham, 2018), pp. 162–174.

    Google Scholar 

  11. 11

    H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P. A. Muller, Deep learning for time series classification: a review. Data Min. Knowl. Discov.33(4), 917–963 (2019).

    MathSciNet  Google Scholar 

  12. 12

    S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput.9(8), 1735–1780 (1997).

    Google Scholar 

  13. 13

    J. Chung, C. Gulcehre, K. Cho, Y. Bengio, in International Conference on Machine Learning. Gated feedback recurrent neural networks, (2015), pp. 2067–2075.

  14. 14

    Z. Wang, W. Yan, T. Oates, in 2017 international joint conference on neural networks (IJCNN). Time series classification from scratch with deep neural networks: a strong baseline (IEEE, 2017), pp. 1578–1585.

  15. 15

    P. Malhotra, V. TV, L. Vig, P. Agarwal, G. Shroff, TimeNet: pre-trained deep recurrent neural network for time series classification. arXiv preprint arXiv:1706.08838 (2017).

  16. 16

    N. Hatami, Y. Gavet, J Debayle, in Tenth International Conference on Machine Vision (ICMV 2017), vol 10696. Classification of time-series images using deep convolutional neural networks (International Society for Optics and Photonics, 2018), p. 106960Y.

  17. 17

    K. Kashiparekh, J. Narwariya, P. Malhotra, L. Vig, G. Shroff, ConvTimeNet: a pre-trained deep convolutional neural network for time series classification. arXiv preprint arXiv:1904.12546 (2019).

  18. 18

    Z. Wang, T. Oates, in Twenty-Fourth International Joint Conference on Artificial Intelligence. Imaging time-series to improve classification and imputation, (2015).

  19. 19

    O. B. Sezer, A. M. Ozbayoglu, Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach. Appl. Soft Comput.70:, 525–538 (2018).

    Google Scholar 

  20. 20

    A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, A. Iosifidis, in 2017 IEEE 19th Conference on Business Informatics (CBI) vol. 1. Forecasting stock prices from the limit order book using convolutional neural networks (IEEE, 2017), pp. 7–12.

  21. 21

    M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, in 2017 IEEE Symposium Series on Computational Intelligence (SSCI). A deep learning based stock trading model with 2-D CNN trend detection (IEEE, 2017), pp. 1–8.

  22. 22

    S. Ravishankar, Y. Bresler, Sparsifying transform learning with efficient optimal updates and convergence guarantees. IEEE Trans. Sig. Process.63(9), 2389–2404 (2015).

    MathSciNet  MATH  Google Scholar 

  23. 23

    P. L. Combettes, JC. Pesquet, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, vol 49, ed. by H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, and H. Wolkowicz. Proximal splitting methods in signal processing (SpringerNew York, 2011).

    Google Scholar 

  24. 24

    J. Yang, M. N. Nguyen, P. P. San, X. L. Li, S. Krishnaswamy, in Twenty-Fourth International Joint Conference on Artificial Intelligence. Deep convolutional neural networks on multichannel time series for human activity recognition, (2015).

  25. 25

    S. Yao, S. Hu, Y. Zhao, A. Zhang, T. Abdelzaher, in Proceedings of the 26th International Conference on World Wide Web. DeepSense: a unified deep learning framework for time-series mobile sensing data processing, (2017), pp. 351–360.

  26. 26

    Y. Zheng, Q. Liu, E. Chen, Y. Ge, J. L. Zhao, in International Conference on Web-Age Information Management. Time series classification using multi-channels deep convolutional neural networks (SpringerCham, 2014), pp. 298–310.

    Google Scholar 

  27. 27

    J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A. Y. Ng, in Proceedings of the 28th international conference on machine learning (ICML-11). Multimodal deep learning, (2011), pp. 689–696.

  28. 28

    C. Feichtenhofer, A. Pinz, A. Zisserman, in Proceedings of the IEEE conference on computer vision and pattern recognition. Convolutional two-stream network fusion for video action recognition, (2016), pp. 1933–1941.

  29. 29

    A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, W. Burgard, in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Multimodal deep learning for robust RGB-D object recognition (IEEE, 2015), pp. 681–687.

  30. 30

    Y. Chen, C. Li, P. Ghamisi, X. Jia, Y. Gu, Deep fusion of remote sensing data for accurate classification. IEEE Geosci. Remote Sens. Lett.14(8), 1253–1257 (2017).

    Google Scholar 

  31. 31

    N. Antropova, B. Q. Huynh, M. L. Giger, A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med. Phys.44(10), 5162–5171 (2017).

    Google Scholar 

  32. 32

    S. Ravishankar, Y. Bresler, Learning sparsifying transforms. IEEE Trans. Sig. Process.61(5), 1072–1086 (2012).

    MathSciNet  MATH  Google Scholar 

  33. 33

    H. Attouch, J. Bolte, B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program.137:, 91–129 (2011).

    MathSciNet  MATH  Google Scholar 

  34. 34

    E. Chouzenoux, J. C. Pesquet, A. Repetti, A block coordinate variable metric forward-backward algorithm. J. Glob. Optim.66(3), 457–485 (2016).

    MathSciNet  MATH  Google Scholar 

  35. 35

    J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and non-smooth problems. Math. Program. 146(1-2), 459–494 (2014).

    MathSciNet  MATH  Google Scholar 

  36. 36

    P. L. Combettes, J. -C. Pesquet, Deep neural network structures solving variational inequalities. Set-valued variational anal. (2018).

  37. 37

    S. J. Reddi, S. Kale, S. Kumar, On the convergence of adam and beyond. Proc. ICLR (2018).

  38. 38

    A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch. NIPS Autodiff Workshop (2017).

  39. 39

    G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, Self-normalizing neural networks. Adv. Neural Inf. Process. Syst.30:, 971–980 (2017).

    Google Scholar 

  40. 40

    A. Criminisi, J. Shotton, E. Konukoglu, Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found. Trends Comput. Graph. Vis.7(2-3), 81–227 (2012).

    MATH  Google Scholar 



This work was supported by the CNRS-CEFIPRA project under grant NextGenBP PRC2017.

Author information




Ms. Pooja Gupta introduced the CTL within the fusion framework and performed all the numerical experiments. Ms. Jyoti Maggu originally formulated the transform learning model and its deep version. Dr. Angshul Majumdar helped with the model formulation and the assessment of the experimental part. Dr. Emilie Chouzenoux and Dr. Giovanni Chierchia contributed to the formulation of the model and the optimization algorithms. All the authors contributed to the writing and proofreading of the paper. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Pooja Gupta.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Gupta, P., Maggu, J., Majumdar, A. et al. DeConFuse: a deep convolutional transform-based unsupervised fusion framework. EURASIP J. Adv. Signal Process. 2020, 26 (2020).



  • Information fusion
  • Deep learning
  • Convolution
  • Stock trading
  • Financial forecasting