DeConFuse: a deep convolutional transform-based unsupervised fusion framework
EURASIP Journal on Advances in Signal Processing volume 2020, Article number: 26 (2020)
Abstract
This work proposes an unsupervised fusion framework based on deep convolutional transform learning. The great learning ability of convolutional filters for data analysis is well acknowledged. The success of convolutive features owes to the convolutional neural network (CNN). However, CNN cannot perform learning tasks in an unsupervised fashion. In a recent work, we showed that this shortcoming can be addressed by adopting a convolutional transform learning (CTL) approach, where convolutional filters are learnt in an unsupervised fashion. The present paper aims at (i) proposing a deep version of CTL, (ii) proposing an unsupervised fusion formulation taking advantage of the proposed deep CTL representation, and (iii) developing a mathematically sound optimization strategy for performing the learning task. We apply the proposed technique, named DeConFuse, to the problem of stock forecasting and trading. A comparison with state-of-the-art methods (based on CNN and long short-term memory network) shows the superiority of our method for performing a reliable feature extraction.
1 Introduction
In the last decade, convolutional neural network (CNN) has enjoyed tremendous success in different types of data analysis. It was initially applied for images in computer vision tasks. The operations within the CNN were believed to mimic the human visual system. Although such a link between human vision and CNN may be present, it has been observed that deep CNNs are not exact models for human vision [1]. For instance, biologists consider that the human visual system would consist of 6 layers [2, 3] and not 20+ layers used in GoogleNet [4].
Neural network models have also been used for analyzing time series data. Until recently, long short-term memory (LSTM) networks were almost the only neural network models used for time series analysis, as they were supposed to mimic memory and hence were deemed suitable for such tasks. However, LSTMs are not able to model very long sequences, and their training is hardware intensive. Owing to these shortcomings, LSTMs are being replaced by CNNs. The reason for the great results of CNN methods for time series analysis (1D data processing in general) is not well understood. One possibility may lie in the universal function approximation capacity of deep neural networks [5, 6] rather than their biological semblance. The research in this area is primarily led by its success rather than its understanding.
An important point to mention is that the performance of CNN is largely driven by the availability of very large labeled datasets. This probably explains their tremendous success in facial recognition tasks. Google’s FaceNet [7] and Facebook’s DeepFace [8] architectures are trained on 400 million facial images, a significant proportion of the world’s population. These companies have ready access to huge labeled facial image datasets, as the images are “tagged” by their respective users. In the said problem, deep networks reach almost 100% accuracy, even surpassing human capabilities. However, when it comes to tasks that require expert labeling, such as facial recognition from sketches (requiring forensic expertise) [8] or ischemic attack detection from EEG (requiring medical expertise) [9], the accuracies become modest. Indeed, such expert labels are difficult to acquire, which limits the size of the available labeled datasets.
The same is believed by a number of machine learning researchers, including Hinton himself, who are wary of supervised learning. In an interview with Axios^{Footnote 1}, Hinton mentioned his “deep suspicion” of backpropagation, the workhorse behind all supervised deep neural networks. He even added that “I don’t think it’s how the brain works,” and “We clearly don’t need all the labeled data.” It seems that Hinton is hinting towards unsupervised learning frameworks. Unsupervised learning techniques do not require targets/labels to learn from data. This approach typically benefits from the fact that data is inherently very rich in its structure, unlike targets, which are sparse in nature. Thus, it does not take into account the task to be performed while learning about the data, removing the need for the human expertise that is required in supervised learning. More on the topic of unsupervised versus supervised learning can be found in a blog by DeepMind^{Footnote 2}.
In this work, we would like to keep the best of both worlds, i.e., the success of convolutive models from CNN and the promises of unsupervised learning formulations. With this goal in mind, we developed convolutional transform learning (CTL) [10]. This is a representation learning technique that learns a set of convolutional filters from the data without label information. Instead of learning the filters (by backpropagating) from data labels, CTL learns them by minimizing a data fidelity loss, thus making the technique unsupervised. CTL has been shown to outperform several supervised and unsupervised learning schemes in the context of image classification. In the present work, we propose to extend the shallow CTL version to deeper layers, with the aim to generate a feature extraction strategy that is well suited for 1D time series analysis. This is the first major contribution of this work—deep convolutional transform learning.
In most applications, time series signals are multivariate, as they arise from multiple sources/sensors. For example, biomedical signals like ECG and EEG come from multiple leads; financial data from stocks are recorded with different inputs (open, close, low, high, and net asset value); and demand forecasting problems in smart grids come with multiple types of data (power consumption, temperature, humidity, occupancy, etc.). In all such cases, the final goal is to perform a prediction/classification task from such multivariate time series. We propose to address this problem as one of feature fusion. The information from each of the sources will be processed by the proposed deep CTL pipeline, and the generated deep features will be finally fused by an unsupervised fully connected layer. This is the second major contribution of this work: an unsupervised fusion framework with deep CTL.
The resulting features can be used for different applicative tasks. In this paper, we will focus on the applicative problem of financial stock analysis. The ultimate goal may be either to forecast the stock price (regression problem) or to decide whether to buy or sell (classification problem). Depending on the considered task, we can pass the generated features into a suitable machine learning tool that may not be as data hungry as deep neural networks. Therefore, by adopting such a processing architecture, we expect to yield better results than traditional deep learning, especially in cases where access to labeled data is limited.
2 Literature review
2.1 CNN for time series analysis
Let us briefly review and discuss CNN-based methods for time series analysis. For a more detailed review, the interested reader can peruse [11]. We mainly focus on studies on stock forecasting as it will be our use case for experimental validation.
The traditional choice for processing time series with neural networks is to adopt a recurrent neural network (RNN) architecture. Variants of RNN like long short-term memory (LSTM) [12] and gated recurrent unit (GRU) [13] have been proposed. However, due to the complexity of training such networks via backpropagation through time, they have been progressively replaced with 1D CNN [14]. For example, in [15], a generic time series analysis framework was built based on LSTM, and its performance was assessed on the UCR time series classification datasets (https://www.cs.ucr.edu/~eamonn/time_series_data/). A later study from the same group [17], based on 1D CNN, showed considerable improvement over the prior model on the same datasets.
There are also several studies that convert 1D time series data into a matrix form so as to be able to use 2D CNNs [16, 18, 19]. Each column of the matrix corresponds to a subset of the 1D series within a given time window, and the resulting matrix is processed as an image. The 2D CNN model has been especially popular in stock forecasting. In [19], the said techniques have been used on stock prices for forecasting. A slightly different input is used in [20]: instead of using the standard stock variables (open, close, high, low, and NAV), it uses high frequency data for forecasting major points of inflection in the financial market. In another work [21], a similar approach is used for modeling exchange traded fund (ETF). It has been seen that the 2D CNN model performs the same as LSTM or the standard multilayer perceptron [22, 23]. The apparent lack of performance improvement in the aforementioned studies may be due to an incorrect choice of CNN model, since an inherently 1D time series is modeled as an image.
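The time-series-to-image conversion described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: the function name `series_to_matrix`, the `stride` parameter and the toy series are assumptions.

```python
import numpy as np

def series_to_matrix(series, window, stride=1):
    """Stack sliding windows of a 1D series as the columns of a matrix."""
    series = np.asarray(series, dtype=float)
    if window > series.size:
        raise ValueError("window is longer than the series")
    starts = range(0, series.size - window + 1, stride)
    # Each column holds the samples that fall inside one time window.
    return np.stack([series[s:s + window] for s in starts], axis=1)

# Hypothetical toy series of six values, windows of length three.
prices = np.arange(6.0)
image = series_to_matrix(prices, window=3)
```

The resulting array can then be passed to a 2D convolutional model as if it were a single-channel image.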
2.2 Deep learning and fusion
We now review existing works for processing multivariate data inputs, within the deep learning framework. Since the present work aims at being applied to stock price forecasting/trading, we will mostly focus our review on the multichannel/multisensor fusion framework. Multimodal data and fusion for image processing, less related to our work, will be mentioned at the end of this subsection for the sake of completeness.
Deep learning has been widely used recently for analyzing multichannel/multisensor signals. In several such studies, all the sensors are stacked one after the other to form a matrix, and a 2D CNN is used for analyzing these signals. For example, [24] uses this strategy for human activity recognition from multiple body sensors. It is important to distinguish such an approach from the aforementioned studies [19–23]. Here, the images are not formed by stacking windowed segments of the same signal one after the other, but by stacking signals from different sensors. The said study [24] does not account for any temporal modeling; this is rectified in [25]. There, a 2D CNN is used on a time series window, but the different windows are finally processed by a GRU, thus explicitly incorporating time series modeling. There is however no explicit fusion framework in [24, 25]. The information from raw multivariate signals is simply fused to form matrices and treated by 2D convolutions. A true fusion framework was proposed in [26]. Each signal channel is processed by a deep 1D CNN, and the outputs from the different signal processing pipelines are then fused by a fully connected layer. Thus, the fusion happens at the feature level and not at the raw signal level as it did in [24, 25].
Another area that routinely uses deep learning-based fusion is multimodal data processing. This area is not as well defined as multichannel data processing; nevertheless, we will briefly discuss some studies on this topic. In [27], a fusion scheme based on deep belief networks (DBN) and stacked autoencoders (SAE) is proposed for fusing audio and video channels in audiovisual analysis. Each channel is processed separately and connected by a fully connected layer to produce fused features. These fused features are further processed for inference. We can also mention the work on video-based action recognition addressed in [28], which proposes a fusion scheme for incorporating temporal information (processed by CNN) and spatial information (also processed by CNN).
There are several other such works on image analysis [29–31]. In [29], a fusion scheme is proposed for processing color and depth information (via 3D and 2D convolutions, respectively) with the objective of action recognition. In [30], it was shown that by fusing hyperspectral data (high spatial resolution) with Lidar (depth information), better classification results can be achieved. In [31], it was shown that fusing deeply learnt features (from CNN) with handcrafted features via a fully connected layer can improve analysis tasks. In this work, our interest lies in the first problem, that of inference from 1D/time-series multichannel signals. To the best of our knowledge, all prior deep learning-based studies on this topic are supervised. In keeping with the vision of Hinton and others, our goal is to develop an unsupervised fusion framework using deeply learnt convolutive filters.
2.3 Convolutional transform learning
Convolutional transform learning (CTL) was introduced in our seminal paper [10]. Since it is a recent work, we present it in detail in the current paper, to make it self-contained. CTL learns a set of filters \(\left(t_{m}\right)_{1 \leq m \leq M}\) operated on observed samples \(\left(s^{(k)}\right)_{1 \leq k \leq K}\) to generate a set of features \(\left (x_{m}^{(k)}\right)_{1 \leq m \leq M,1 \leq k \leq K}\). Formally, the inherent learning model is expressed through convolution operations defined as
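Assuming \(*\) denotes the discrete 1D convolution (a symbol chosen here for illustration), one way to write this model in the notation just introduced, with each filter acting on each sample, is:

```latex
\[
t_{m} * s^{(k)} = x_{m}^{(k)}, \qquad 1 \leq m \leq M, \quad 1 \leq k \leq K .
\]
```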
Following the original study on transform learning [32], a sparsity penalty is imposed on the features to improve representation ability and limit overfitting. Moreover, in the same line as CNN models, a nonnegativity constraint is imposed on the features. Training then consists of learning the convolutional filters and the representation coefficients from the data. This is expressed as the following optimization problem
where ψ is a suitable penalization function. Note that the regularization term “\(\mu \left\| \cdot \right\|_{F}^{2} - \lambda \log \det \)” ensures that the learnt filters are unique, something that is not guaranteed in CNN. Let us introduce the matrix notation
where \(T=\left [\begin {array}{ccc}t_{1} & \dots & t_{M}\end {array}\right ]\), \(S=\left [\begin {array}{ccc}s^{(1)} & \dots & s^{(K)}\end {array}\right ]^{\top }\), and \(X=\left [\begin {array}{ccc} x_{1}^{(k)} & \dots & x_{M}^{(k)} \end {array}\right ]_{1\le k\le K}\). The cost function in problem (2) can be compactly rewritten as^{Footnote 3}
where Ψ applies the penalty term ψ column-wise on X.
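A hedged numerical sketch of such a cost is given below. It assumes a squared-error data fidelity term with a factor 1/2, takes Ψ to be the indicator of the nonnegativity constraint, and evaluates the “log-det” of a rectangular T as the sum of the logarithms of its singular values, as in the note on rectangular matrices. The function names, array shapes and default weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv_features(T, S):
    """Convolve every filter (column of T) with every sample (row of S).

    T has shape (P, M): M filters of length P. S has shape (K, N).
    Returns an array of shape (K, M, N), using 'same' padding (an assumption).
    """
    return np.stack([
        np.stack([np.convolve(s, T[:, m], mode="same") for m in range(T.shape[1])])
        for s in S
    ])

def logdet(T):
    """Sum of log singular values of a possibly rectangular matrix."""
    return np.sum(np.log(np.linalg.svd(T, compute_uv=False)))

def ctl_cost(T, S, X, mu=0.1, lam=0.1):
    """Assumed cost: fidelity + nonnegativity indicator + mu*||T||_F^2 - lam*logdet(T)."""
    if np.any(X < 0):
        return np.inf  # the indicator of the nonnegativity constraint is violated
    fidelity = 0.5 * np.sum((conv_features(T, S) - X) ** 2)
    return fidelity + mu * np.sum(T ** 2) - lam * logdet(T)
```

The log-det term goes to minus infinity in the logarithm as a singular value of T approaches zero, which is how it discourages degenerate or duplicated filters, while the Frobenius term keeps the filter coefficients bounded.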
A local minimizer to (4) can be reached efficiently using the alternating proximal algorithm [33–35], which alternates between proximal updates on variables T and X. More precisely, set a Hilbert space \((\mathcal {H},\|\cdot \|)\) and define the proximity operator [23] at \(\tilde x \in \mathcal {H}\) of a proper lower-semicontinuous convex function \(\varphi : \mathcal {H} \to ] - \infty, + \infty ]\) as
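For reference, the standard definition of the proximity operator in the proximal splitting literature, followed by a worked instance for the indicator function \(\iota_{+}\) of the nonnegative orthant (the symbol \(\iota_{+}\) is introduced here for illustration), reads:

```latex
\[
\operatorname{prox}_{\varphi}(\tilde{x})
  = \operatorname*{arg\,min}_{x \in \mathcal{H}} \;
    \varphi(x) + \frac{1}{2}\left\| x - \tilde{x} \right\|^{2},
\]
\[
\operatorname{prox}_{\iota_{+}}(\tilde{x}) = \max\{\tilde{x}, 0\}
\quad \text{(applied elementwise)}.
\]
```

The second line is the projection onto the nonnegative orthant, since minimizing the squared distance to \(\tilde{x}\) over nonnegative vectors clips each negative entry to zero.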
Then, the alternating proximal algorithm reads
with initializations T^{[0]}, X^{[0]} and γ_{1},γ_{2} positive constants. For more details on the derivations and the convergence guarantees, the readers can refer to [10].
3 Fusion based on deep convolutional transform learning
In this section, we discuss our proposed formulation. First, we extend the aforementioned CTL formulation to a deeper version. Next, we develop the fusion framework based on transform learning, leading to our DeConFuse^{Footnote 4} strategy.
3.1 Deep convolutional transform learning
Deep CTL consists of stacking multiple convolutional layers on top of each other to generate the features, as shown in Fig. 1. To learn all the variables in an end-to-end fashion, deep CTL relies on the key property that the solution \(\widehat {X}\) to the CTL problem, assuming fixed filters T, can be reformulated as the simple application of an elementwise activation function, that is
with ϕ the proximity operator of Ψ [36]. For example, if Ψ is the indicator function of the positive orthant, then ϕ identifies with the famous rectified linear unit (ReLU) activation function. Many other examples are provided in [36]. Consequently, deep features can be computed by stacking many such layers
where X_{0}=S and ϕ_{ℓ} a given activation function for layer ℓ.
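The layer-wise construction can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: each layer sums its convolutions over the input feature maps, as in a standard CNN layer, and uses 'same' padding; the function names and the filter-bank layout `(C_out, C_in, P)` are hypothetical choices.

```python
import numpy as np

def relu(x):
    # Projection onto the nonnegative orthant, i.e. the prox of its indicator.
    return np.maximum(x, 0.0)

def conv_layer(x, filters):
    """x: (C_in, N) feature maps; filters: (C_out, C_in, P). Returns (C_out, N)."""
    c_out, c_in, _ = filters.shape
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            # Assumed channel handling: sum over input feature maps.
            out[o] += np.convolve(x[i], filters[o, i], mode="same")
    return out

def deep_features(s, layers, activation=relu):
    """Apply X_l = activation(T_l * X_{l-1}) layer by layer, starting from X_0 = s."""
    x = np.atleast_2d(np.asarray(s, dtype=float))  # one input channel
    for filters in layers:
        x = activation(conv_layer(x, filters))
    return x
```

Passing a different `activation` callable corresponds to choosing a different penalty Ψ for that layer.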
Putting all together, deep CTL amounts to
where
This is a direct extension of the one-layer formulation in (4).
3.2 Multichannel fusion framework
We now propose a fusion framework to learn in an unsupervised fashion a suitable representation of multichannel data that can then be utilized for a multitude of tasks. This framework takes the channels of input data samples to separate branches of convolutional layers, leading to multiple sets of channel-wise features. These decoupled features are then concatenated and passed to a fully connected layer, which yields a unique set of coupled features. The complete architecture, called DeConFuse, is shown in Fig. 2.
Since we have multichannel data, for each channel c∈{1,…,C}, we learn a different set of convolutional filters \(T^{(c)}_{1},\dots,T^{(c)}_{L}\) and features X^{(c)}. At the same time, we learn the (not convolutional) linear transform \(\widetilde {T}=(\widetilde {T}_{c})_{1\le c\le C}\) to fuse the channel-wise features X=(X^{(c)})_{1≤c≤C}, along with the corresponding fused features Z, which constitute the final output of the proposed DeConFuse model, as shown in Fig. 2. This leads to the joint optimization problem
where
where the operator “flat” transforms X^{(c)} into a matrix where each row contains the features of a sample flattened as a vector.
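The flattening and fusion steps can be illustrated with the short sketch below. It assumes that the fused features are obtained by applying one block of the linear fusion transform to each flattened channel, summing the results, and projecting onto the nonnegative orthant; the function names and array shapes are hypothetical.

```python
import numpy as np

def flat(X):
    """(K, M, N) channel features -> (K, M*N), one flattened row per sample."""
    return X.reshape(X.shape[0], -1)

def fuse(channel_features, fusion_blocks):
    """Sum over channels of flat(X_c) @ T_c, followed by projection onto Z >= 0."""
    z = sum(flat(X) @ Tc for X, Tc in zip(channel_features, fusion_blocks))
    return np.maximum(z, 0.0)
```

Summing the per-channel products is equivalent to concatenating the flattened channel features side by side and multiplying by the vertically stacked fusion blocks, which matches the description of concatenation followed by a fully connected layer.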
To summarize, our formulation aims to jointly train the channel-wise convolutional filters \(T_{\ell }^{(c)}\) and the fusion coefficients \(\widetilde {T}\) in an end-to-end fashion. We explicitly learn the features X and Z subject to nonnegativity constraints so as to avoid trivial solutions and make our approach completely unsupervised. Moreover, the “log-det” regularization on both \(T_{\ell }^{(c)}\) and \(\widetilde {T}\) breaks symmetry and forces diversity in the learnt transforms, whereas the Frobenius regularization ensures that the transform coefficients are bounded.
3.3 Optimization algorithm
As for the solution of problem (11), we remark that all terms of the cost function are differentiable, except the indicator function of the nonnegativity constraint. We can, therefore, find a local minimizer to (11) by employing the projected gradient descent, whose iterations read
with initialization \(T^{[0]}, X^{[0]}, \widetilde {T}^{[0]}, Z^{[0]}\), γ>0, and \(\mathcal {P}_{+} = \max \{\cdot,0\}\). In practice, we make use of accelerated strategies [37] within each step of this algorithm to speed up learning.
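As a minimal illustration of the projected gradient scheme, the sketch below applies it to a toy nonnegative least-squares problem rather than to problem (11). The function name, the toy objective and the constant step size are assumptions made for illustration only, and no acceleration is included.

```python
import numpy as np

def projected_gradient(grad, x0, step, n_iter=200):
    """Iterate x <- P_+(x - step * grad(x)), with P_+ = max(., 0)."""
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    for _ in range(n_iter):
        x = np.maximum(x - step * grad(x), 0.0)
    return x

# Toy problem (hypothetical): minimize 0.5 * ||A x - b||^2 subject to x >= 0.
A = np.eye(3)
b = np.array([1.0, -2.0, 3.0])
grad = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # inverse Lipschitz constant of the gradient
x_hat = projected_gradient(grad, np.zeros(3), step)
```

In a deep model, the gradient would come from automatic differentiation instead of a hand-written expression, and only the variables that carry the nonnegativity constraint would be projected.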
There are two notable advantages with the proposed optimization approach. Firstly, we rely on automatic differentiation [38] and stochastic gradient approximations to efficiently solve problem (11). Secondly, we are not limited to ReLU activation in (8), but rather we can use more advanced ones, such as SELU [39]. This is beneficial for the performance, as shown by our numerical results.
3.4 Computational complexity of proposed framework—DeConFuse
Table 1 summarizes the computational complexity of the DeConFuse architecture, for both the training and test phases. Specifically, it reports the cost incurred for every input sample at each iteration of gradient descent in the training phase and for the output computation in the testing phase. The computational complexity of the DeConFuse architecture is comparable to that of a regular CNN. The only addition is the log-det regularization, which requires computing the truncated singular value decomposition of \(T_{\ell }^{(c)}\) and \(\widetilde {T}_{c}\). However, as the size of these matrices is determined by the filter size, the number of filters, and the number of output features per sample, the training complexity is not worse than that of a CNN.
4 Experimental evaluation
We carry out experiments on the real-world problem of stock forecasting and trading. The problem of stock forecasting is a regression problem aiming at estimating the price of a stock at a future date (next day for our problem) given inputs till the current date. Stock trading is a classification problem, where the decision whether to buy or sell a stock has to be taken at each time. The two problems are related by the fact that simple logic dictates that if the price of a stock at a later date is expected to increase, the stock must be bought; and if the stock price is expected to go down, the stock must be sold.
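The link between the two tasks can be written as a one-line rule. In the sketch below, the label encoding (1 for buy, 0 for sell), the function name and the treatment of ties as "sell" are assumptions; the text does not specify them.

```python
import numpy as np

def trading_signal(current_price, predicted_next_price):
    """Return 1 (buy) where the forecast exceeds the current price, else 0 (sell)."""
    current = np.asarray(current_price, dtype=float)
    predicted = np.asarray(predicted_next_price, dtype=float)
    return (predicted > current).astype(int)
```

For example, a stock priced at 100 with a next-day forecast of 101 receives a buy label, while a stock priced at 50 with a forecast of 49 receives a sell label.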
We will use the five raw inputs for both tasks, namely open price, close price, high, low, and net asset value (NAV). One could compute technical indicators based on the raw inputs [19], but in keeping with the essence of true representation learning, we chose to stay with those raw values. Each of the five inputs is processed by a separate 1D processing pipeline. Each of the pipelines produces a flattened output (Fig. 1). The flattened outputs are then concatenated and fed into the transform learning layer acting as the fully connected layer (Fig. 2) for fusion. While our processing pipeline ends here (being unsupervised), the benchmark techniques are supervised and have an output node. The node is binary (buy/sell) for classification and real valued for regression. More precisely, we will compare with two state-of-the-art time series analysis models, namely TimeNet [15] and ConvTimeNet [17]. In the former, the individual processing pipelines are based on LSTM, and in the latter, they use 1D CNN.
We make use of a real dataset from the National Stock Exchange (NSE) of India. The dataset contains information on 150 symbols between 2014 and 2018; these stocks were chosen after filtering out stocks that had less than 3 years of data. The companies available in the dataset are from various sectors such as IT (e.g., TCS, INFY), automobile (e.g., HEROMOTOCO, TATAMOTORS), bank (e.g., HDFCBANK, ICICIBANK), coal and petroleum (e.g., OIL, ONGC), steel (e.g., JSWSTEEL, TATASTEEL), construction (e.g., ABIRLANUVO, ACC), and public sector units (e.g., POWERGRID, GAIL). The detailed architectures for each tested technique, namely DeConFuse, ConvTimeNet, and TimeNet, are presented in Table 2. For DeConFuse, TimeNet, and ConvTimeNet, we have tuned the architectures to yield the best performance and have randomly initialized the weights for each stock’s training.
4.1 Stock forecasting—regression
Let us start with the stock forecasting problem. We feed the generated unsupervised features from the proposed architecture into an external regressor, namely ridge regression. Evaluation is carried out in terms of mean absolute error (MAE) between the predicted and actual stock prices for all 150 stocks. The stock forecasting results are shown in Table 5 in Appendix 1 section. The MAE for individual stocks are presented for each of close price, open price, high price, low price, and net asset value.
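The regression and evaluation steps can be sketched with a closed-form ridge solution and the MAE. This is an illustrative sketch only: the function names are hypothetical, the model has no intercept, the regularization weight is arbitrary, and the identity matrix stands in for the learnt features.

```python
import numpy as np

def ridge_fit(features, targets, alpha=1.0):
    """Closed-form ridge weights: (F^T F + alpha I)^{-1} F^T y (no intercept)."""
    F = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    gram = F.T @ F + alpha * np.eye(F.shape[1])
    return np.linalg.solve(gram, F.T @ y)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical toy data standing in for fused features and next-day prices.
features = np.eye(3)
prices = np.array([2.0, 4.0, 6.0])
weights = ridge_fit(features, prices, alpha=1.0)
mae = mean_absolute_error(prices, features @ weights)
```

Because the features are learnt without labels, the same feature matrix can be reused for each of the five regressed prices by changing only the target vector.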
From Table 5 in Appendix 1 section, it can be seen that the MAE values reached by the proposed DeConFuse solution for the first four prices (open, close, high, low) are exceptionally good for all of the 150 stocks. Regarding NAV prediction, the proposed method performs extremely well for 128 stocks. Among the remaining 22 stocks, there are 13 stocks, highlighted in red, for which DeConFuse does not give the lowest MAE but is still very close to the best results given by the TimeNet approach.
For a concise summary of the results, the average values over all stocks are shown in Table 3.
From the summary in Table 3, it can be observed that our error is more than an order of magnitude lower than that of the state-of-the-art methods. The plots of one of the regressed prices (close price) for some example stocks in Fig. 3 show that the predicted close prices from DeConFuse are closer to the true close prices than the benchmark predictions.
4.2 Stock trading—classification
We now focus on the stock trading task. In this case, the generated unsupervised features from DeConFuse are inputs to an external classifier based on random decision forest (RDF) with 5 decision tree classifiers and depth 3. Even though we used this architecture, we found that the results from RDF are robust to changes in architecture. This is a well known phenomenon about RDFs [40]. We evaluate the results in terms of precision, recall, F1 score, and area under the ROC curve (AUC). From the financial viewpoint, we also calculate annualized returns (AR) using the predicted trading signals/labels as well as using true trading signals/labels named as predicted AR and true AR, respectively. The starting capital used for calculating AR values for every stock is Rs. 100,000 and the transaction charges are Rs 10. The stock trading results are shown in Table 6 in Appendix 2 section.
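For reference, precision, recall and F1 score follow from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; the function name, the choice of 1 as the positive (buy) label and the convention of returning 0 when a denominator is zero are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, there are two true positives, one false positive and one false negative, so all three metrics equal 2/3.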
Certain results from Table 6 in Appendix 2 section are highlighted in bold or red. Results marked in bold indicate, for each stock and each metric, the technique that gives the best performance. The proposed solution DeConFuse gives the best results for 89 stocks for precision score, 85 stocks for recall score, 125 stocks for F1 score, 91 stocks for AUC measure, and 56 stocks in case of the AR metric. Results marked in red highlight the cases where DeConFuse has not performed the best but performs nearly equal (here, a maximum difference of 0.05 in the metric is considered) to the best performance given by one of the benchmarks, i.e., DeConFuse gives the next best performance. We noticed that there are 24 stocks for which DeConFuse gives the next best precision value. Likewise, there are 18 such stocks in case of recall, 22 stocks for F1 score, 26 stocks for AUC values, and 1 stock in case of AR. Overall, DeConFuse reaches a very satisfying performance compared with the benchmark techniques. This is also corroborated by the summary of trading results in Table 4.
We also display empirical convergence plots for a few stocks, namely RELIANCE, ONGC, HINDUNILVR, and ICICIBANK, in Fig. 4. We can see that the training loss decreases to a point of stability for each example.
5 Conclusion
In this work, we propose DeConFuse, a deep fusion end-to-end framework for the processing of 1D multichannel data. Unlike other deep learning models, our framework is unsupervised. It is based on a novel deep version of our recently proposed convolutional transform learning model. We have applied the proposed model for stock forecasting/trading leading to very good performance. The framework is generic enough to handle other multichannel fusion problems as well.
The advantage of our framework is its ability to learn in an unsupervised fashion. For example, consider the problem we address. For traditional deep learning-based models, we need to train two deep networks, one for regression and one for classification. In contrast, we can reuse our features for both tasks without retraining for each specific task. This has advantages in other areas as well. For example, one can either do ischemia detection, i.e., detect whether one is having a stroke at the current time instant (from EEG); or one can do ischemia prediction, i.e., forecast if a stroke is going to happen. In standard deep learning, two networks need to be trained and tuned to tackle these two problems. With our proposed method, there is no need for this double effort.
In the future, we would work on extending the framework for supervised/semi-supervised formulations. We believe that the semi-supervised formulation will be of immense practical importance. We would also like to extend it to 2D convolutions in order to handle image data.
6 Appendix 1: Detailed stock forecasting results
7 Appendix 2: Detailed stock trading results
Availability of data and materials
The dataset used is a real dataset of the Indian National Stock Exchange (NSE) of past 4 years and is publicly available. We have shared the data with our implementation available at https://github.com/pooja290992/DeConFuse.git.
Notes
Note that T is not necessarily a square matrix. By an abuse of notation, we define the “log-det” of a rectangular matrix as the sum of logarithms of its singular values.
Code available at: https://github.com/pooja290992/DeConFuse.git
Abbreviations
TL: Transform learning
CTL: Convolutional transform learning
CNN: Convolutional neural network
LSTM: Long short-term memory
GRU: Gated recurrent unit
ReLU: Rectified linear unit
SELU: Scaled exponential linear units
NSE: National Stock Exchange
AUC: Area under curve
ROC: Receiver operating characteristics
NAV: Net asset value
RDF: Random decision forest
EEG: Electroencephalogram
ECG: Electrocardiogram
AR: Annualized returns
MAE: Mean absolute error
References
N. Kriegeskorte, Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Rev. Vis. Sci.1:, 417–446 (2015).
R. W. Guillery, S. M. Sherman, Thalamic relay functions and their role in corticocortical communication: generalizations from the visual system. Neuron. 33(2), 163–175 (2002).
J. Cudeiro, A. M. Sillito, Looking back: corticothalamic feedback and early visual processing. Trends Neurosci.29(6), 298–306 (2006).
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, in Proceedings of the IEEE conference on computer vision and pattern recognition. Going deeper with convolutions, (2015), pp. 1–9. https://doi.org/10.1109/cvpr.2015.7298594.
I. Daubechies, R. DeVore, S. Foucart, B. Hanin, G. Petrova, Nonlinear approximation and (deep) ReLU networks. arXiv preprint arXiv:1905.02199 (2019).
P. Petersen, F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw.108:, 296–330 (2018).
F. Schroff, D. Kalenichenko, J. Philbin, in Proceedings of the IEEE conference on computer vision and pattern recognition. FaceNet: a unified embedding for face recognition and clustering, (2015), pp. 815–823. https://doi.org/10.1109/cvpr.2015.7298682.
Y. Taigman, M. Yang, M. A. Ranzato, L. Wolf, in Proceedings of the IEEE conference on computer vision and pattern recognition. DeepFace: close the gap to humanlevel performance in face verification, (2014), pp. 1701–1708. https://doi.org/10.1109/cvpr.2014.220.
S. Nagpal, M. Singh, R. Singh, M. Vatsa, A. Noore, A. Majumdar, in Proceedings of the IEEE International Conference on Computer Vision. Face sketch matching via coupled deep transform learning, (2017), pp. 5419–5428. https://doi.org/10.1109/iccv.2017.579.
J. Maggu, E. Chouzenoux, G. Chierchia, A. Majumdar, in International Conference on Neural Information Processing. Convolutional transform learning (SpringerCham, 2018), pp. 162–174.
H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, P. A. Muller, Deep learning for time series classification: a review. Data Min. Knowl. Discov.33(4), 917–963 (2019).
S. Hochreiter, J. Schmidhuber, Long shortterm memory. Neural Comput.9(8), 1735–1780 (1997).
J. Chung, C. Gulcehre, K. Cho, Y. Bengio, in International Conference on Machine Learning. Gated feedback recurrent neural networks, (2015), pp. 2067–2075.
Z. Wang, W. Yan, T. Oates, in 2017 international joint conference on neural networks (IJCNN). Time series classification from scratch with deep neural networks: a strong baseline (IEEE, 2017), pp. 1578–1585.
P. Malhotra, V. TV, L. Vig, P. Agarwal, G. Shroff, TimeNet: pretrained deep recurrent neural network for time series classification. arXiv preprint arXiv:1706.08838 (2017).
N. Hatami, Y. Gavet, J Debayle, in Tenth International Conference on Machine Vision (ICMV 2017), vol 10696. Classification of timeseries images using deep convolutional neural networks (International Society for Optics and Photonics, 2018), p. 106960Y.
K. Kashiparekh, J. Narwariya, P. Malhotra, L. Vig, G. Shroff, ConvTimeNet: a pretrained deep convolutional neural network for time series classification. arXiv preprint arXiv:1904.12546 (2019).
Z. Wang, T. Oates, in TwentyFourth International Joint Conference on Artificial Intelligence. Imaging timeseries to improve classification and imputation, (2015).
O. B. Sezer, A. M. Ozbayoglu, Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach. Appl. Soft Comput.70:, 525–538 (2018).
A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, A. Iosifidis, in 2017 IEEE 19th Conference on Business Informatics (CBI) vol. 1. Forecasting stock prices from the limit order book using convolutional neural networks (IEEE, 2017), pp. 7–12.
M. U. Gudelek, S. A. Boluk, A. M. Ozbayoglu, in 2017 IEEE Symposium Series on Computational Intelligence (SSCI). A deep learning based stock trading model with 2D CNN trend detection (IEEE, 2017), pp. 1–8.
S. Ravishankar, Y. Bresler, Sparsifying transform learning with efficient optimal updates and convergence guarantees. IEEE Trans. Sig. Process. 63(9), 2389–2404 (2015).
P. L. Combettes, J.-C. Pesquet, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, vol. 49, ed. by H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, and H. Wolkowicz. Proximal splitting methods in signal processing (Springer, New York, 2011).
J. Yang, M. N. Nguyen, P. P. San, X. L. Li, S. Krishnaswamy, in Twenty-Fourth International Joint Conference on Artificial Intelligence. Deep convolutional neural networks on multichannel time series for human activity recognition, (2015).
S. Yao, S. Hu, Y. Zhao, A. Zhang, T. Abdelzaher, in Proceedings of the 26th International Conference on World Wide Web. DeepSense: a unified deep learning framework for time-series mobile sensing data processing, (2017), pp. 351–360.
Y. Zheng, Q. Liu, E. Chen, Y. Ge, J. L. Zhao, in International Conference on Web-Age Information Management. Time series classification using multi-channels deep convolutional neural networks (Springer, Cham, 2014), pp. 298–310.
J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A. Y. Ng, in Proceedings of the 28th International Conference on Machine Learning (ICML-11). Multimodal deep learning, (2011), pp. 689–696.
C. Feichtenhofer, A. Pinz, A. Zisserman, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Convolutional two-stream network fusion for video action recognition, (2016), pp. 1933–1941.
A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, W. Burgard, in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Multimodal deep learning for robust RGB-D object recognition (IEEE, 2015), pp. 681–687.
Y. Chen, C. Li, P. Ghamisi, X. Jia, Y. Gu, Deep fusion of remote sensing data for accurate classification. IEEE Geosci. Remote Sens. Lett. 14(8), 1253–1257 (2017).
N. Antropova, B. Q. Huynh, M. L. Giger, A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med. Phys. 44(10), 5162–5171 (2017).
S. Ravishankar, Y. Bresler, Learning sparsifying transforms. IEEE Trans. Sig. Process. 61(5), 1072–1086 (2012).
H. Attouch, J. Bolte, B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2011).
E. Chouzenoux, J.-C. Pesquet, A. Repetti, A block coordinate variable metric forward-backward algorithm. J. Glob. Optim. 66(3), 457–485 (2016).
J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014).
P. L. Combettes, J.-C. Pesquet, Deep neural network structures solving variational inequalities. Set-Valued Variational Anal. (2018). https://arxiv.org/abs/1808.07526.
S. J. Reddi, S. Kale, S. Kumar, On the convergence of Adam and beyond. Proc. ICLR (2018).
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch. NIPS Autodiff Workshop (2017).
G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, Self-normalizing neural networks. Adv. Neural Inf. Process. Syst. 30, 971–980 (2017).
A. Criminisi, J. Shotton, E. Konukoglu, Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found. Trends Comput. Graph. Vis. 7(2–3), 81–227 (2012).
Funding
This work was supported by the CNRS-CEFIPRA project under grant NextGenBP PRC2017.
Author information
Contributions
Ms. Pooja Gupta introduced CTL within the fusion framework and performed all the numerical experiments. Ms. Jyoti Maggu originally formulated the transform learning model and its deep version. Dr. Angshul Majumdar helped with the model formulation and the assessment of the experimental part. Dr. Emilie Chouzenoux and Dr. Giovanni Chierchia contributed to the formulation of the model and the optimization algorithms. All the authors contributed to the writing and proofreading of the paper. The authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gupta, P., Maggu, J., Majumdar, A. et al. DeConFuse: a deep convolutional transformbased unsupervised fusion framework. EURASIP J. Adv. Signal Process. 2020, 26 (2020). https://doi.org/10.1186/s13634020006845
DOI: https://doi.org/10.1186/s13634020006845