A survey of machine learning for big data processing
 Junfei Qiu^{1},
 Qihui Wu^{1},
 Guoru Ding^{1},
 Yuhua Xu^{1} and
 Shuo Feng^{1}
https://doi.org/10.1186/s136340160355x
© Qiu et al. 2016
Received: 31 August 2015
Accepted: 22 April 2016
Published: 28 May 2016
The Erratum to this article has been published in EURASIP Journal on Advances in Signal Processing 2016 2016:85
Abstract
There is no doubt that big data are now rapidly expanding in all science and engineering domains. While the potential of these massive data is undoubtedly significant, fully making sense of them requires new ways of thinking and novel learning techniques to address the various challenges. In this paper, we present a literature survey of the latest advances in researches on machine learning for big data processing. First, we review the machine learning techniques and highlight some promising learning methods in recent studies, such as representation learning, deep learning, distributed and parallel learning, transfer learning, active learning, and kernelbased learning. Next, we focus on the analysis and discussions about the challenges and possible solutions of machine learning for big data. Following that, we investigate the close connections of machine learning with signal processing techniques for big data processing. Finally, we outline several open issues and research trends.
Keywords
1 Review
1.1 Introduction
We are living in an era of data deluge: enormous amounts of data are being generated continually at unprecedented and ever-increasing scales. Large-scale data sets are collected and studied in numerous domains, from engineering sciences to social networks, commerce, biomolecular research, and security [1]. In particular, digital data, generated from a variety of digital devices, are growing at astonishing rates. According to [2], as of 2011, digital information had grown nine times in volume in just 5 years, and its amount in the world is projected to reach 35 trillion gigabytes by 2020 [3]. The term “Big Data” was therefore coined to capture the profound meaning of this data explosion trend.
To clarify what big data refers to, several good surveys have been presented recently, each viewing big data from a different perspective, including challenges and opportunities [4], background and research status [5], and analytics platforms [6]. Among these surveys, a comprehensive overview of big data from three different angles, i.e., innovation, competition, and productivity, was presented by the McKinsey Global Institute (MGI) [7]. Besides describing the fundamental techniques and technologies of big data, a number of more recent studies have investigated big data in particular contexts. For example, [8, 9] gave a brief review of the features of big data from the Internet of Things (IoT). Some authors also analyzed the new characteristics of big data in wireless networks, e.g., in terms of 5G [10]. In [11, 12], the authors proposed various big data processing models and algorithms from the data mining perspective.
Over the past decade, machine learning techniques have been widely adopted in a number of massive and complex data-intensive fields such as medicine, astronomy, and biology, because these techniques provide possible solutions for mining the information hidden in the data. Nevertheless, as the era of big data arrives, data sets are becoming so large and complex that they are difficult to handle with traditional learning methods: the established process of learning from conventional datasets was not designed for, and does not work well with, high volumes of data. For instance, most traditional machine learning algorithms are designed for data that can be completely loaded into memory [13], an assumption that no longer holds in the context of big data. Therefore, although learning from these massive data is expected to bring significant science and engineering advances along with improvements in our quality of life [14], it brings tremendous challenges at the same time.

We first give a brief review of the traditional machine learning techniques, followed by several advanced learning methods from recent research that are either promising or much needed for solving big data problems.

We then present a systematic analysis of the challenges and possible solutions for learning with big data, organized around the five big data characteristics: volume, variety, velocity, veracity, and value.

We next discuss the close ties between machine learning and signal processing (SP) techniques for big data processing.

We finally provide several open issues and research trends.
1.2 Brief review of machine learning techniques
In this section, we first present some essential concepts and classification of machine learning and then highlight a list of advanced learning techniques.
1.2.1 Definition and classification of machine learning
Machine learning is a field of research that formally focuses on the theory, performance, and properties of learning systems and algorithms. It is a highly interdisciplinary field that builds on ideas from many different fields, such as artificial intelligence, optimization theory, information theory, statistics, cognitive science, optimal control, and many other disciplines of science, engineering, and mathematics [15–18]. Because it has been implemented in a wide range of applications, machine learning now touches almost every scientific domain and has had a great impact on science and society [19]. It has been used on a variety of problems, including recommendation engines, recognition systems, informatics and data mining, and autonomous control systems [20].
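Comparison of machine learning technologies
Learning types | Data processing tasks | Distinction norm | Learning algorithms | Representative references
Supervised learning | Classification/Regression/Estimation | Computational classifiers | Support vector machine | [120]
Supervised learning | Classification/Regression/Estimation | Statistical classifiers | Naïve Bayes | [15]
Supervised learning | Classification/Regression/Estimation | Statistical classifiers | Hidden Markov model | [121]
Supervised learning | Classification/Regression/Estimation | Statistical classifiers | Bayesian networks | [122]
Supervised learning | Classification/Regression/Estimation | Connectionist classifiers | Neural networks | [123]
Unsupervised learning | Clustering/Prediction | Parametric | K-means | [124]
Unsupervised learning | Clustering/Prediction | Parametric | Gaussian mixture model | [125]
Unsupervised learning | Clustering/Prediction | Nonparametric | Dirichlet process mixture model | [125]
Unsupervised learning | Clustering/Prediction | Nonparametric | X-means | [124]
Reinforcement learning | Decision-making | Model-free | Q-learning | [126]
Reinforcement learning | Decision-making | Model-free | R-learning | [127]
Reinforcement learning | Decision-making | Model-based | TD learning | [128]
Reinforcement learning | Decision-making | Model-based | Sarsa learning | [129]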
1.2.2 Advanced learning methods
 1. Representation learning: Datasets with high-dimensional features have become increasingly common, which challenges current learning algorithms to extract and organize the discriminative information in the data. Representation learning [25, 26] is a promising solution: it learns meaningful and useful representations of the data that make it easier to extract useful information when building classifiers or other predictors, and it has achieved impressive performance on many dimensionality reduction tasks [27]. Its aim is for a reasonably sized learned representation to capture a huge number of possible input configurations, which can greatly improve both computational efficiency and statistical efficiency [25].
There are three main subtopics in representation learning: feature selection, feature extraction, and distance metric learning [27]. To strengthen the multi-domain learning ability of representation learning, automatic representation learning [28], biased representation learning [26], cross-domain representation learning [27], and some other related techniques [29] have been proposed in recent years. The rapid increase in scientific activity on representation learning has been accompanied and nourished by a remarkable string of empirical successes in real-world applications, such as speech recognition, natural language processing, and intelligent vehicle systems [30–32].
 2. Deep learning: Deep learning is currently one of the hottest research trends in the machine learning field. In contrast to most traditional learning techniques, which use shallow-structured learning architectures, deep learning mainly uses supervised and/or unsupervised strategies in deep architectures to automatically learn hierarchical representations [33]. Deep architectures can often capture more complicated, hierarchically organized statistical patterns of the inputs, adapt to new areas more readily than traditional learning methods, and often outperform the state of the art achieved with hand-crafted features [34]. Deep belief networks (DBNs) [33, 35] and convolutional neural networks (CNNs) [36, 37] are two mainstream deep learning approaches and research directions proposed over the past decade; both are well established in the deep learning field and show great promise for future work [13].
Owing to its state-of-the-art performance, deep learning has attracted much attention from the academic community in recent years in areas such as speech recognition, computer vision, language processing, and information retrieval [33, 38–40]. As data keep getting bigger, deep learning is coming to play a pivotal role in providing predictive analytics solutions for large-scale data sets, particularly with increased processing power and advances in graphics processors [13]. For example, IBM’s brain-like computer [22] and Microsoft’s real-time language translation in Bing voice search [41] have used techniques like deep learning to leverage big data for competitive advantage.
 3. Distributed and parallel learning: There is often exciting information hidden in unprecedented volumes of data. Learning from these massive data is expected to bring significant science and engineering advances that can facilitate the development of more intelligent systems. However, a bottleneck preventing this is the inability of learning algorithms to use all the data to learn within a reasonable time. In this context, distributed learning is a promising research direction, since allocating the learning process among several workstations is a natural way of scaling up learning algorithms [42]. Unlike the classical learning framework, which requires the data to be collected in a database for central processing, in distributed learning the learning itself is carried out in a distributed manner [43].
In the past years, several popular distributed machine learning algorithms have been proposed, including decision rules [44], stacked generalization [45], meta-learning [46], and distributed boosting [47]. With the advantage of distributed computing for managing large volumes of data, distributed learning avoids the need to gather data into a single workstation for central processing, saving time and energy. More widespread applications of distributed learning are expected [42]. Similar to distributed learning, another popular technique for scaling up traditional learning algorithms is parallel machine learning [48]. With the power of multicore processors and cloud computing platforms, parallel and distributed computing systems have recently become widely accessible [42]. A more detailed description of distributed and parallel learning can be found in [49].
 4. Transfer learning: A major assumption in many traditional machine learning algorithms is that the training and test data are drawn from the same feature space and have the same distribution. However, with the explosion of data from a variety of sources, the great heterogeneity of the collected data violates this assumption. To tackle this issue, transfer learning has been proposed to allow the domains, tasks, and distributions to differ; it extracts knowledge from one or more source tasks and applies that knowledge to a target task [50, 51]. The advantage of transfer learning is that it can intelligently apply previously learned knowledge to solve new problems faster.
Based on the relationship between the source and target domains and tasks, transfer learning is categorized into three subsettings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning [51]. In inductive transfer learning, the source and target tasks are different, regardless of whether the source and target domains are the same. In transductive transfer learning, in contrast, the target domain is different from the source domain, while the source and target tasks are the same. Finally, in the unsupervised transfer learning setting, the target task is different from but related to the source task. Furthermore, approaches to transfer learning in these three settings can be classified into four categories based on “what to transfer”: the instance transfer approach, the feature representation transfer approach, the parameter transfer approach, and the relational knowledge transfer approach [51–54]. Recently, transfer learning techniques have been applied successfully in many real-world data processing applications, such as cross-domain text classification, constructing informative priors, and large-scale document classification [55–57].
 5. Active learning: In many real-world applications, data may be abundant but labels are scarce or expensive to obtain. Learning from massive amounts of unlabeled data is frequently difficult and time-consuming. Active learning attempts to address this issue by selecting a subset of the most critical instances for labeling [58]. In this way, the active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data [59]. Through its query strategies, it can obtain satisfactory classification performance with fewer labeled samples than conventional passive learning [60].
There are three main active learning scenarios: membership query synthesis, stream-based selective sampling, and pool-based sampling [59]. Popular active learning approaches can be found in [61]. They have been studied extensively in the field of machine learning and applied to many data processing problems such as image classification and biological DNA identification [61, 62].
 6. Kernel-based learning: Over the last decade, kernel-based learning has established itself as a very powerful technique for increasing computational capability, based on a breakthrough in the design of efficient nonlinear learning algorithms [63]. The outstanding advantage of kernel methods is their elegant property of implicitly mapping samples from the original space into a potentially infinite-dimensional feature space, in which inner products can be calculated directly via a kernel function [64]. For example, in kernel-based learning theory, data \( \mathrm{x} \) in the input space \( \mathcal{X} \) are projected onto a potentially much higher dimensional feature space \( \mathcal{F} \) via a nonlinear mapping \( \Phi \) as follows:
$$ \Phi : \mathcal{X}\to \mathcal{F},\quad \mathrm{x}\mapsto \Phi \left(\mathrm{x}\right) $$ (1)
In this context, for a given learning problem, one now works with the mapped data \( \Phi \left(\mathrm{x}\right)\in \mathcal{F} \) instead of \( \mathrm{x}\in \mathcal{X} \) [63]. The data in the input space can be projected onto different feature spaces with different mappings. The diversity of feature spaces gives more choices for gaining better performance, although in practice choosing a proper mapping for a given real-world problem is generally nontrivial. Fortunately, the kernel trick provides an elegant mathematical means to construct powerful nonlinear variants of most well-known statistical linear techniques without knowing the mapping explicitly. Indeed, one only needs to replace the inner product operator of a linear technique with an appropriate kernel function \( k \) (i.e., a positive semidefinite symmetric function), which serves as a similarity measure that can be thought of as an inner product between pairs of data in the feature space.
Here, the original nonlinear problem can be transformed into a linear formulation in a higher dimensional space \( \mathcal{F} \) with an appropriate kernel \( k \) [65]:
$$ k\left(\mathrm{x},{\mathrm{x}}^{\prime}\right)={\left\langle \Phi \left(\mathrm{x}\right),\Phi \left({\mathrm{x}}^{\prime}\right)\right\rangle}_{\mathcal{F}},\quad \forall \mathrm{x},{\mathrm{x}}^{\prime}\in \mathcal{X} $$ (2)
The most widely used kernel functions include Gaussian kernels and polynomial kernels. These kernels implicitly map the data onto high-dimensional, even infinite-dimensional, spaces [63]. Kernel functions provide a nonlinear means to infuse correlation or side information in big data, which can yield significant performance improvements over their linear counterparts at the price of generally higher computational complexity. Moreover, for a specific problem, the selection of the best kernel function is still an open issue, although ample experimental evidence in the literature indicates that popular kernel functions such as Gaussian kernels and polynomial kernels perform well in most cases.
At the root of the success of kernel-based learning is the combination of high expressive power with the possibility of performing numerous analyses, which has been exploited in many challenging applications [65], e.g., online classification [66], convexly constrained parameter/function estimation [67], beamforming problems [68], and adaptive multiregression [69]. One of the most popular surveys of kernel-based learning algorithms is [70], which gives an introduction to the field of kernel-based learning methods and their applications.
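The identity in (2) can be checked numerically on a small case. The following is a minimal sketch, assuming a two-dimensional input space and the homogeneous degree-2 polynomial kernel, for which an explicit feature map is easy to write down; the function names (`poly2_feature_map`, `poly2_kernel`, `gaussian_kernel`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def poly2_feature_map(x):
    """Explicit feature map Phi for the kernel (x . y)^2 with 2-D inputs."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly2_kernel(x, y):
    """Kernel value (x . y)^2, computed without building Phi."""
    return float(np.dot(x, y)) ** 2


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel; its feature space is infinite-dimensional,
    so only the implicit (kernel) form is practical."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Inner product in the feature space versus direct kernel evaluation.
explicit = float(np.dot(poly2_feature_map(x), poly2_feature_map(y)))
implicit = poly2_kernel(x, y)
print(explicit, implicit)
```

Both quantities agree up to floating-point rounding, which is the point of the kernel trick: a linear method only needs `poly2_kernel` (or `gaussian_kernel`) and never has to form the mapped vectors.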
1.3 The critical issues of machine learning for big data
1.3.1 Critical issue one: learning for large scale of data
Critical issue
Data volume is the primary attribute of big data, and it presents a great challenge for machine learning. Taking only digital data as an example, every day, Google alone needs to process about 24 petabytes (1 petabyte = 2^{10} × 2^{10} × 2^{10} × 2^{10} × 2^{10} = 2^{50} bytes) of data [71]. If we further take other data sources into consideration, the data scale becomes much bigger. Under current development trends, data stored and analyzed by big organizations will undoubtedly reach the petabyte to exabyte (1 exabyte = 2^{10} petabytes) magnitude soon [6].
Possible remedies
We are now swimming in an expanding sea of data that is too voluminous for a machine learning algorithm to be trained with a central processor and storage. Instead, distributed frameworks with parallel computing are preferred. The alternating direction method of multipliers (ADMM) [72, 73], a promising computing framework for developing distributed, scalable, online convex optimization algorithms, is well suited to parallel and distributed large-scale data processing. The key merit of ADMM is its ability to split or decouple multiple variables in optimization problems, which enables one to find a solution to a large-scale global optimization problem by coordinating solutions to smaller subproblems. Generally, ADMM is convergent for convex optimization, but it lacks convergence and theoretical performance guarantees for nonconvex optimization. However, vast experimental evidence in the literature supports the empirical convergence and good performance of ADMM [74]. A wide variety of applications of ADMM to machine learning problems for large-scale datasets have been discussed in [74].
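The way ADMM coordinates solutions to smaller subproblems can be sketched with global-consensus ADMM applied to a least-squares fit. This is a minimal illustration under stated assumptions, not a method taken from [72–74]: the objective (ordinary least squares), the data split into four blocks, the penalty `rho`, and the function name `consensus_admm_least_squares` are all chosen here for illustration, and the "workers" are simulated sequentially in one process.

```python
import numpy as np


def consensus_admm_least_squares(blocks, rho=1.0, n_iter=200):
    """Global-consensus ADMM (scaled dual form) for
    minimize sum_i 0.5 * ||A_i x - b_i||^2, where block i holds (A_i, b_i)."""
    dim = blocks[0][0].shape[1]
    z = np.zeros(dim)                       # shared consensus variable
    xs = [np.zeros(dim) for _ in blocks]    # local copies, one per block
    us = [np.zeros(dim) for _ in blocks]    # scaled dual variables
    # Each block only ever needs its own data.
    lhs = [A.T @ A + rho * np.eye(dim) for A, _ in blocks]
    atb = [A.T @ b for A, b in blocks]
    for _ in range(n_iter):
        # Local step: small regularised least-squares solve per block.
        for i in range(len(blocks)):
            xs[i] = np.linalg.solve(lhs[i], atb[i] + rho * (z - us[i]))
        # Coordination step: average the local estimates.
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual step: accumulate each block's disagreement with the consensus.
        for i in range(len(blocks)):
            us[i] = us[i] + xs[i] - z
    return z


rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
blocks = []
for _ in range(4):
    A = rng.normal(size=(50, 3))
    b = A @ w_true + 0.1 * rng.normal(size=50)
    blocks.append((A, b))

w_admm = consensus_admm_least_squares(blocks, rho=50.0, n_iter=300)

# Centralised reference solution that sees all data at once.
A_all = np.vstack([A for A, _ in blocks])
b_all = np.concatenate([b for _, b in blocks])
w_central = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
print(w_admm, w_central)
```

Only the vectors `xs[i] + us[i]` and `z` would need to be exchanged between machines; the raw data blocks stay where they are. The value `rho=50.0` is a tuning choice for this toy problem, picked to be of the same order as the local curvature so that the iteration converges quickly.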
In addition to distributed theoretical frameworks that mitigate the challenges of high data volumes, some practical parallel programming methods have also been proposed and applied to learning algorithms to deal with large-scale data sets. MapReduce [75, 76], a powerful programming framework, enables the automatic parallelization and distribution of computations on large clusters of commodity machines. Moreover, MapReduce provides strong fault tolerance, which is important when tackling large data sets. The core idea of MapReduce is first to divide massive data into small chunks and then to process these chunks in parallel and in a distributed manner to generate intermediate results. The final result is derived by aggregating all the intermediate results. A general means of programming machine learning algorithms on multicore machines with the advantage of MapReduce has been investigated in [77]. Cloud-computing-assisted learning is another impressive advance that helps data systems deal with the volume challenge of big data. Cloud computing [78, 79] has already demonstrated admirable elasticity, which offers hope of realizing the scalability needed by machine learning algorithms. It can enhance computing and storage capacity through cloud infrastructure. In this context, distributed GraphLab, a framework for machine learning in the cloud, has been proposed in [80].
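The divide, process, and aggregate pattern described above can be sketched in a few lines. This is a minimal single-process illustration, assuming a learning problem (linear regression via the normal equations) whose required statistics decompose into sums over data chunks; it uses Python's built-in `map` and `functools.reduce` rather than an actual MapReduce cluster, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np


def map_chunk(chunk):
    """Map step: compute partial statistics on one chunk of the data."""
    X, y = chunk
    return X.T @ X, X.T @ y


def reduce_pair(left, right):
    """Reduce step: aggregate two intermediate results by summation."""
    return left[0] + right[0], left[1] + right[1]


def fit_linear_regression_mapreduce(chunks):
    """Solve the normal equations from the aggregated chunk statistics."""
    xtx, xty = reduce(reduce_pair, map(map_chunk, chunks))
    return np.linalg.solve(xtx, xty)


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + 0.05 * rng.normal(size=1000)

# Divide the data into small chunks, as a MapReduce job would.
chunks = list(zip(np.array_split(X, 8), np.array_split(y, 8)))

w_chunked = fit_linear_regression_mapreduce(chunks)
w_direct = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_chunked, w_direct)
```

Because each call to `map_chunk` touches only its own chunk, the map step could run on separate cores or machines, and only the small matrices it returns would have to be communicated for the reduce step.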
1.3.2 Critical issue two: learning for different types of data
Critical issue
The enormous variety of data is the second dimension that makes big data both interesting and challenging. It results from the fact that data generally come from various sources and are of different types. Structured, semi-structured, and even entirely unstructured data sources produce heterogeneous, high-dimensional, and nonlinear data with different representation forms. Learning from such datasets is clearly a great challenge, and its full complexity is hard to appreciate until one works with the data in depth.
Possible remedies
For heterogeneous data, data integration [81, 82], which aims to combine data residing at different sources and provide the user with a unified view of these data, is a key method. An effective solution to the data integration problem is to learn good data representations from each individual data source and then to integrate the learned features at different levels [13]. Representation learning is therefore preferred for this issue. In [83], the authors proposed a data fusion theory based on statistical learning for two-dimensional spectrum heterogeneous data. In addition, deep learning methods have also been shown to be very effective in integrating data from different sources. For example, Srivastava and Salakhutdinov [84] developed a novel application of deep learning algorithms to learn a unified representation by integrating real-valued dense image data and text data.
where ε is a noise-related parameter, and ‖ ⋅ ‖_{*} and ‖ ⋅ ‖_{F} denote the nuclear norm and the Frobenius norm of a matrix, respectively. The problem formulated in (3) shows the fundamental task of research on matrix recovery for high-dimensional data processing, which can be efficiently solved by existing algorithms including the augmented Lagrange multipliers (ALM) algorithm and the accelerated proximal gradient (APG) algorithm [90]. As for the nonlinear properties of data related to high variety, kernel-based learning methods can provide good solutions; these have been discussed in Section 1.2.2, so the details are not repeated here. Finally, for the challenges brought by different data types, transfer learning is also a very good choice owing to its powerful knowledge transfer ability, which makes multi-domain learning possible.
1.3.3 Critical issue three: learning for high speed of streaming data
Critical issue
For big data, speed or velocity really matters, and it is another emerging challenge for learning. In many real-world applications, such as earthquake prediction, stock market prediction, and agent-based autonomous exchange (buying/selling) systems, a task has to be finished within a certain period of time; otherwise, the processing results become less valuable or even worthless. In these time-sensitive cases, the potential value of data depends on data freshness, so the data need to be processed in real time.
Possible remedies
One promising solution for learning from such high-speed data is online learning. Online learning [91–94] is a well-established learning paradigm whose strategy is to learn from one instance at a time, instead of in an offline or batch fashion, which requires the full training data to be collected first. This sequential learning mechanism works well for big data, as current machines cannot hold the entire dataset in memory. To speed up learning, a novel learning algorithm for single-hidden-layer feedforward neural networks (SLFNs) named extreme learning machine (ELM) [95] was recently proposed. Compared with some other traditional learning algorithms, ELM provides extremely fast learning speed and better generalization performance with minimal human intervention [96]. Thus, ELM has strong advantages in dealing with high-velocity data.
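The one-instance-at-a-time strategy can be made concrete with a short example. The sketch below assumes a linear model trained with the least-mean-squares (stochastic gradient) update, which is one simple online learner among many and is not claimed to be the algorithm of [91–94]; the synthetic noiseless stream, the learning rate, and the names `online_lms` and `sample_stream` are assumptions made for illustration.

```python
import numpy as np


def online_lms(stream, dim, lr=0.01):
    """Update a linear model from a stream, one (x, y) pair at a time.
    No instance is stored after it has been used."""
    w = np.zeros(dim)
    for x, y in stream:
        error = y - w @ x        # prediction error on the current instance
        w += lr * error * x      # gradient step on the squared error
    return w


rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])


def sample_stream(n):
    """Hypothetical data source that yields instances sequentially."""
    for _ in range(n):
        x = rng.normal(size=3)
        yield x, float(w_true @ x)


w_hat = online_lms(sample_stream(5000), dim=3)
print(w_hat)
```

Memory use is constant in the length of the stream, since only the current instance and the weight vector are held at any time, which is what makes this style of learning attractive when the full dataset cannot be loaded.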
Another challenging issue associated with high velocity is that data are often nonstationary [13], i.e., the data distribution changes over time, which requires learning algorithms to treat the data as a stream. To tackle this problem, stream processing theory and technology [97] have shown potential advantages over the batch-processing paradigm, as they aim to analyze data as soon as possible to derive results. Representative stream processing systems include Borealis [98], S4 [99], Kafka [100], and many other recent architectures proposed to provide real-time analytics over big data [101, 102]. A scalable online machine learning service that uses stream processing for real-time big data analysis is introduced in [103]. In addition, G. B. Giannakis has devoted considerable attention in recent studies to the real-time processing of streaming data using machine learning techniques; more details can be found in [87, 104].
1.3.4 Critical issue four: learning for uncertain and incomplete data
Critical issue
In the past, machine learning algorithms were typically fed with relatively accurate data from well-known and quite limited sources, so the learning results tended to be accurate as well; veracity was therefore never a serious concern. However, with the sheer size of data available today, the precision and trustworthiness of the source data quickly become an issue, because the data often come from many different origins and their quality is not always verifiable. Therefore, we include veracity as the fourth critical issue for learning with big data to emphasize the importance of addressing and managing uncertainty and incompleteness in data quality.
Possible remedies
Uncertain data are a special type of data in which readings and collections are no longer deterministic but are subject to some random or probability distributions. Data uncertainty is common in many applications. For example, in wireless networks, some spectrum data are inherently uncertain as a result of ubiquitous noise, fading, and shadowing, and the technological limitations of GPS sensor equipment also restrict the accuracy of the data to certain levels. For uncertain data, the major challenge is that a data feature or attribute is captured not by a single point value but by a sample distribution [11]. A simple way to handle data uncertainty is to apply summary statistics such as means and variances to abstract the sample distributions. Another approach is to utilize the complete information carried by the probability distributions to construct a decision tree, which is called the distribution-based approach in [105]. In [105], the authors first discussed the sources of data uncertainty with some examples and then devised an algorithm for building decision trees from uncertain data using the distribution-based approach. Finally, they established a theoretical foundation from which pruning techniques were derived that can significantly improve the computational efficiency of distribution-based algorithms for uncertain data.
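The simpler of the two approaches, abstracting each sample distribution by summary statistics, can be illustrated as follows. This is a minimal sketch with made-up readings; the sensor names, the values, and the helper `summarize_uncertain_attribute` are hypothetical, and the example does not implement the distribution-based decision tree of [105].

```python
import numpy as np


def summarize_uncertain_attribute(samples):
    """Replace a sample distribution by its mean and (population) variance."""
    samples = np.asarray(samples, dtype=float)
    return float(samples.mean()), float(samples.var())


# Each attribute is observed as several noisy readings, not one point value.
readings = {
    "sensor_a": [10.1, 9.8, 10.3, 9.9],
    "sensor_b": [20.0, 25.0, 15.0, 20.0],
}

summary = {
    name: summarize_uncertain_attribute(values)
    for name, values in readings.items()
}
for name, (mean, variance) in summary.items():
    print(name, mean, variance)
```

The mean can then be fed to a conventional learner as a point value, while the variance indicates how much that value should be trusted; the cost of this simplicity is that everything else about the shape of the distribution is discarded.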
Existing algorithms for efficiently solving problem (5) are explained in detail in [90]. Furthermore, for abnormal data, the authors of [107] investigated the use of the statistical learning theory of sparse matrices, together with data cleansing, for robust spectrum sensing.
1.3.5 Critical issue five: learning for data with low value density and meaning diversity
Critical issue
The ultimate purpose of applying a variety of learning methods to big datasets is to extract valuable information from massive amounts of data in the form of deep insight or commercial benefit. Value is therefore also characterized as a salient feature of big data [2, 6]. However, deriving significant value from high volumes of data with a low value density is not straightforward. For example, the police often need to look through surveillance videos to handle criminal cases. Unfortunately, the few valuable frames are frequently hidden in a large volume of video.
Possible remedies
To handle this challenge, knowledge discovery in databases (KDD) and data mining technologies [9, 11, 108] come into play, because these technologies provide possible solutions for finding the required information hidden in massive data. In [9], the authors reviewed studies on applying data mining and KDD technologies to the IoT. In particular, they discussed in detail the use of clustering, classification, and frequent pattern technologies to mine value from massive IoT data, from the perspectives of both infrastructures and services. In [11], Wu et al. characterized the features of the big data revolution and proposed big data processing methods with machine learning and data mining algorithms.
Another challenging problem associated with the value of big data is the diversity of data meaning, i.e., the economic value of different data varies significantly, and even the same data can have different value when considered from different perspectives or in different contexts. Therefore, new cognition-assisted learning technologies should be developed to make current learning systems more flexible and intelligent. The most dramatic example of such a system is IBM’s “Watson” [109], constructed from several subsystems that use different machine learning strategies together with the power of cognitive technologies to analyze questions and arrive at the most likely answer. Thanks to the scientists’ ingenuity, this system can excel at a game that requires both encyclopedic knowledge and lightning-quick recall. Human-like characteristics (learning, adapting, interacting, and understanding) enable Watson to be smarter and to gain more computing power to deal with complexity and big data. It is expected that the era of cognitive computing will come [109].
1.3.6 Discussions
For big data processing, most machine learning techniques are not universal; that is, specific learning methods often have to be chosen according to the data at hand. For example, for high-dimensional datasets, representation learning is a promising solution, as it can learn meaningful representations of the data that make it easier to extract useful information and has achieved impressive performance on many dimensionality reduction tasks. For large volumes of data, distributed and parallel learning methods have stronger advantages. If the data to be processed are drawn from different feature spaces and have different distributions, transfer learning is a good choice, as it can intelligently apply previously learned knowledge to solve new problems faster. In the context of big data, data are frequently abundant while labels are scarce or expensive to obtain; to tackle this issue, active learning can achieve high accuracy using as few labeled instances as possible. Nonlinear data processing is another thorny problem, for which kernel-based learning offers powerful computational capability. Finally, if data must be handled in a timely or (nearly) real-time manner, online learning and the extreme learning machine are more helpful.
Therefore, the context needs to be made clear. In other words, what are the data tasks (data analysis or decision making)? What are the data types (video data or text data)? What are the data characteristics (high volume or high velocity)? And so on. Different data tasks, types, and characteristics require different learning techniques, so a base of machine learning methods is needed for big data processing, which learning systems can quickly consult to handle the data. Moreover, to improve the effectiveness of data processing, combinations of machine learning with other techniques have been proposed in recent years. For example, in [80], the authors presented a cloud-assisted learning framework to enhance storage and computing abilities. A general means of programming machine learning algorithms on multicore machines using MapReduce was investigated to enable parallel and distributed processing [77]. IBM’s brain-like computer, Watson, applied cognitive techniques to machine learning to make learning systems more intelligent [109]. Such enabling technologies have brought great benefits to machine learning, especially for large-scale data processing, and deserve further study.
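As an illustration of the idea of a methods base, the selection guidance in this subsection can be written as a simple lookup table. The following Python sketch is only an assumption about how such a base might be organized; the dictionary `METHODS_BASE`, its key names, and the function `suggest_methods` are hypothetical names introduced here and do not come from any of the cited systems.

```python
# Hypothetical "methods base": maps a data characteristic to the learning
# family suggested in the discussion above. All names are illustrative.
METHODS_BASE = {
    "high_dimensional": "representation learning",
    "large_volume": "distributed and parallel learning",
    "different_feature_spaces_or_distributions": "transfer learning",
    "scarce_labels": "active learning",
    "nonlinear": "kernel-based learning",
    "real_time": "online learning / extreme learning machine",
}


def suggest_methods(characteristics):
    """Return the suggested learning families for the given characteristics.

    Unknown characteristics are skipped rather than guessed, and each
    suggestion is listed once, in the order it was first requested.
    """
    suggestions = []
    for name in characteristics:
        method = METHODS_BASE.get(name)
        if method is not None and method not in suggestions:
            suggestions.append(method)
    return suggestions


if __name__ == "__main__":
    # A large, sparsely labeled data set calls for two families at once.
    print(suggest_methods(["large_volume", "scarce_labels"]))
```

A practical methods base would also need to account for the data task and data type, which this sketch leaves out for brevity.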
1.4 Connection of machine learning with SP techniques for big data
There is no doubt that SP is of utmost relevance to timely big data applications such as real-time medical imaging, sentiment analysis from online social media, smart cities, and so on [110]. The interest in big-data-related research from the SP community is evident from the increasing number of papers submitted on this topic to SP-oriented journals, workshops, and conferences. In this section, we mainly discuss the close connections of machine learning with SP techniques for big data processing. Specifically, in Section 1.4.1, we analyze the existing studies on SP for big data from four different perspectives and present several representative works. In Section 1.4.2, we review the latest research progress that builds on these typical works.
1.4.1 An overview of representative work

Statistical learning for big data analysis: There is no doubt that this is an era of data deluge in which learning from these large volumes of data by central processors and storage units seems infeasible. Therefore, SP and statistical learning tools have to be re-examined. With the advent of streaming data sources, it is preferable to perform learning in real time, typically without a chance to revisit past entries. In [14], the authors mainly focused on modeling and optimization for big data analysis using statistical learning tools. We can conclude from [14] that, from the SP and learning perspective, big data themes can be organized in terms of challenges, tasks, models, and optimization as follows. SP-relevant big data challenges mainly comprise massive scale, outliers and missing values, real-time constraints, and cloud storage. The big data tasks to be accomplished include prediction and forecasting, cleansing and imputation, dimensionality reduction, regression, classification, and clustering. To meet these challenges and tasks, outstanding models and optimization approaches built on SP and learning techniques for big data include parallel and decentralized, time- or data-adaptive, robust, succinct, and sparse technologies.

Convex optimization for big data analytics: The importance of convex formulations and optimization has increased dramatically in the last decade, and these formulations have been employed in a wide variety of signal processing applications. In the context of big data, however, the optimization problems are often too large to process locally, so convex optimization needs to reinvent itself. Cevher et al. [111] reviewed recent advances in convex optimization algorithms tailored for big data, with the ultimate goal of markedly reducing the computational, storage, and communication bottlenecks. For example, consider a big data optimization problem formulated as $$ F^{*}=\underset{x}{\min}\left\{F(x)=f(x)+g(x);\ x\in \mathbb{R}^{p}\right\} $$(6)
where f and g are convex functions. To obtain an optimal solution x^{*} of (6) under the required assumptions on f and g, the authors of [111] presented three efficient big data approximation techniques: first-order methods, randomization, and parallel and distributed computation. They mainly covered scalable, randomized, and parallel algorithms for big data analytics. In addition, for the optimization problem in (6), ADMM can provide a simple distributed algorithm to solve its composite form by leveraging powerful augmented Lagrangian and dual decomposition techniques. ADMM has two caveats: closed-form solutions do not always exist, and there are no convergence guarantees when there are more than two optimization objective terms. Nevertheless, several recent solutions, such as proximal gradient methods and parallel computing, address these two drawbacks [111]. From the machine learning perspective, such scalable, parallel, and distributed mechanisms are also necessary, and applications of recent convex optimization algorithms to learning methods such as support vector machines and graph learning have appeared in recent years.
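To make the composite structure of (6) concrete, the following minimal Python sketch applies a proximal gradient iteration to one common instance, assumed here purely for illustration: a smooth least-squares term f(x) = 0.5 ||Ax - b||^2 and a nonsmooth term g(x) = lam ||x||_1. The choice of f and g, the function names, and the conservative step-size rule are our assumptions and are not taken from [111].

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def proximal_gradient_lasso(A, b, lam, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps.

    A is a list of rows (assumed to contain at least one nonzero entry),
    b is a list of targets, and lam >= 0 weights the l1 term.
    """
    n, p = len(A), len(A[0])
    # The squared Frobenius norm of A upper-bounds the Lipschitz constant of
    # the gradient of f, so 1 / ||A||_F^2 is a safe (if conservative) step.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * p
    for _ in range(iters):
        # Gradient step on the smooth part f: grad f(x) = A^T (A x - b).
        residual = [sum(A[i][j] * x[j] for j in range(p)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Proximal step on the nonsmooth part g.
        x = soft_threshold([x[j] - step * grad[j] for j in range(p)], step * lam)
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(proximal_gradient_lasso(identity, [3.0, 0.5], lam=1.0))
```

When A is the identity, the minimizer is simply the soft-thresholded vector b, which gives a quick sanity check of the iteration. Each iteration touches the data only through products with A and its transpose, which is what makes first-order methods of this kind attractive at large scale.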

Stochastic approximation for big data analytics: Although many online learning approaches were developed within the machine learning discipline, they have strong connections with workhorse SP techniques. Reference [110] is a lecture note that presented recent advances in online learning for big data analytics, in which the authors highlighted the relations and differences between online learning methods and some prominent statistical SP tools such as stochastic approximation (SA) and stochastic gradient (SG) algorithms. From [110], we learn that, on the one hand, the seminal works on SA, such as the Robbins–Monro and Widrow algorithms, and the workhorses behind several classical SP tools, such as the LMS and RLS algorithms, carry rich potential for modern learning tasks in big data analytics. On the other hand, it was also demonstrated that online learning schemes together with random sampling or data sketching methods are expected to play instrumental roles in solving large-scale optimization tasks. In summary, the recent advances in online learning methods and the SP techniques discussed in this lecture note have unique and complementary strengths.
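The link between classical SP tools and online learning can be seen in the LMS rule, which is a stochastic gradient step on the instantaneous squared error. The following Python sketch is a minimal illustration under our own assumptions (a noiseless linear model, inputs drawn uniformly from [-1, 1], and a fixed step size); the names `lms` and `make_stream` are hypothetical and the code is not taken from [110].

```python
import random


def lms(samples, dim, step=0.05):
    """Run one pass of the LMS rule w <- w + step * error * x over a stream.

    Each sample is visited once and then discarded, as in streaming settings.
    """
    w = [0.0] * dim
    for x, y in samples:
        error = y - sum(wi * xi for wi, xi in zip(w, x))
        # Stochastic gradient step on the instantaneous loss 0.5 * error^2.
        w = [wi + step * error * xi for wi, xi in zip(w, x)]
    return w


def make_stream(true_w, n, seed=0):
    """Yield n synthetic (x, y) pairs from a noiseless linear model (assumed)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        yield x, sum(wi * xi for wi, xi in zip(true_w, x))


if __name__ == "__main__":
    estimate = lms(make_stream([2.0, -1.0], 5000), dim=2)
    print(estimate)  # should be close to the assumed true weights [2.0, -1.0]
```

The memory footprint is independent of the stream length, which is the property that makes such recursions appealing for big data; with noisy observations, a fixed step size would leave a residual fluctuation around the solution.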

Outlying sequence detection for big data: As the data scale grows, so does the chance of encountering outlying observations, which in turn motivates the demand for outlier-resilient learning algorithms that scale to large application settings. In this context, data-driven outlying sequence detection algorithms have been proposed by some researchers. In [112], the authors investigated robust sequential detection schemes for big data. In contrast to the aforementioned three articles [14, 110, 111], which mostly focus on big data analysis, this article paid more attention to decision mechanisms. Outlier detection has immediate application in a broad range of contexts. For machine learning techniques in particular, effectively deciding whether observations are normal or outlying is important for improving learning performance. As mentioned in [112], supervised outlier detection has been studied extensively with neural networks, naïve Bayes, and support vector machines.
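To illustrate what categorizing observations as normal or outlying can look like before learning, the following Python sketch flags outliers with a modified z-score based on the median and the median absolute deviation (MAD). This is a generic robust rule that we substitute for illustration; it is not the sequential detection scheme of [112], and the function name `flag_outliers` and the threshold of 3.5 are assumptions.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value: True if it is flagged as outlying.

    The modified z-score uses the median and the MAD, which are much less
    affected by the outliers themselves than the mean and the variance.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return [v != med for v in values]
    # 0.6745 rescales the MAD so that the score is comparable to a z-score
    # under a normal distribution.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


if __name__ == "__main__":
    readings = [10.0, 10.2, 9.9, 10.1, 9.8, 50.0]
    print(flag_outliers(readings))
```

Flagged observations can then be removed or down-weighted before training, although the appropriate treatment depends on the application.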
1.4.2 The latest research progress

The latest progress based on [14]: Building on the statistical learning tools for big data analysis proposed by Slavakis et al. in [14], much new work has emerged. For example, in [113], two distributed learning algorithms for training random vector functional-link (RVFL) networks through interconnected nodes were presented, where the training data were distributed under a decentralized information structure. To tackle huge-scale convex and nonconvex big data optimization problems, a novel parallel, hybrid random/deterministic decomposition scheme with the power of dictionary learning was investigated in [114]. In [87], the authors developed a low-complexity, real-time online algorithm for decomposing low-rank tensors with missing entries to deal with incomplete streaming data, and the performance of the proposed subspace learning was also validated. All of these new works demonstrate the application of machine learning and SP technologies to big data processing.

The latest progress based on [111]: A broad class of machine learning and SP problems can be formally stated as optimization problems. Based on the idea of convex optimization for big data analytics in [111], a randomized primal-dual algorithm was proposed in [115] for composite optimization, which can be used in large-scale machine learning applications. In addition, a consensus-based decentralized algorithm for a class of nonconvex optimization problems was investigated in [116], with application to dictionary learning.

The latest progress based on [110]: Several classical SP tools, such as stochastic approximation methods, carry rich potential for solving large-scale learning tasks at low computational expense. The SP and online learning techniques for big data analytics described in [110] provide a good research direction for future work. Based on this, in [117], the authors developed online algorithms for large-scale regressions with application to streaming big data. In addition, Slavakis and Giannakis further used an accelerated stochastic approximation method with online and modular learning algorithms to deal with a large class of nonconvex data models [118].

The latest progress based on [112]: The outlying sequence detection approach proposed in [112] provides a desirable solution to some big data application problems. In [119], the authors mainly investigated big data analytics over communication systems, with discussions of statistical analysis and machine learning techniques. They pointed out that one of the critical challenges ahead is how to detect outliers in the context of big data. The theoretical methodology described in [112] offers an answer to this challenge.
To sum up, the articles presented in Sections 1.4.1 and 1.4.2 show that the connection of machine learning with modern SP techniques is very strong. SP techniques were originally developed to analyze and handle discrete and continuous signals using a set of methods from electrical engineering and applied mathematics. In contrast, machine learning research mainly focuses on the design and development of algorithms that allow computers to evolve behavior based on empirical data; its major concern is to recognize complex patterns and make intelligent decisions by learning automatically from data. Machine learning and SP techniques have unique and complementary strengths for big data processing, and combining them to explore the emerging field of big data is expected to have a bright future. To quote [110], “Consequently, ample opportunities arise for the SP community to contribute in this growing and inherently cross-disciplinary field, spanning multiple areas across science and engineering”.
1.5 Research trends and open issues
 1.
Data meaning perspective: Because most data today are dispersed across different regions, systems, or applications, the “meaning” of data collected from various sources may not be exactly the same, which may significantly affect the quality of machine learning results. Although previously mentioned techniques, such as transfer learning with its power of knowledge transfer and cognition-assisted learning methods, provide some possible solutions to this problem, they are clearly not panaceas because of their limitations in achieving context awareness. Ontology, the semantic web, and other related technologies seem to be better suited to this issue. Based on ontology modeling and semantic derivation, valuable patterns or rules can be discovered as knowledge, which is necessary for learning systems to be, or to appear to be, intelligent. However, although ontology and semantic web technologies can benefit big data analysis, they are not yet mature enough; thus, how to employ them in machine learning methods to process big data is a meaningful research topic.
 2.
Pattern training perspective: In general, for most machine learning techniques, the more training patterns there are, the higher the accuracy of the learning results. However, we face a dilemma: on the one hand, labeled patterns play a pivotal role for learning algorithms; on the other hand, labeling patterns is often expensive in terms of computation time or cost, and is intractable for large-scale streaming data in particular. How many patterns are needed to train a classifier depends to a large extent on the desired balance between cost and accuracy. Therefore, so-called overfitting is another critical open issue.
 3.
Technique integration perspective: When big data processing is mentioned, data mining, KDD, SP, cloud computing, and machine learning techniques are often grouped together, partly because these techniques and their products may play principal roles in extracting valuable information from massive data, and partly because they have strong ties with each other. It is important to note that each approach has its own merits and drawbacks. That is to say, to get more value out of big data, a composite model is needed. As a result, how to integrate several related techniques with machine learning will also become a further research trend.
 4.
Privacy and security perspective: Concern about data privacy has become extremely serious as data mining and machine learning technologies are used to analyze personal information in order to produce relevant or accurate results. For example, to increase sales volume and revenue, some companies today try to collect as much personal data about consumers as possible from various sources or devices and then use data mining and machine learning methods to find highly interconnected information that is useful for devising marketing tactics. However, if all pieces of information about a person were dug out through mining and learning technologies and put together, that individual’s privacy would instantly disappear, which would make most people uncomfortable and even frightened. Thus, efficient and effective methods are needed that preserve the performance of mining and learning while protecting personal information. Hence, how to use data mining and machine learning techniques for big data processing with guarantees of privacy and security is well worth studying.
 5.
Realization and application perspective: The ultimate goal of exploring various learning methods for handling big data is to provide a better environment for people; thus, more attention should be focused on building the bridge from theory to practice. For instance, how and where might the theoretical studies in big data machine learning research actually be applied?
2 Conclusions
Big data are now rapidly expanding in all science and engineering domains. Learning from these massive data is expected to bring significant opportunities and transformative potential for various sectors. However, most traditional machine learning techniques are not inherently efficient or scalable enough to handle data characterized by large volume, different types, high speed, uncertainty and incompleteness, and low value density. In response, machine learning needs to reinvent itself for big data processing. This paper began with a brief review of conventional machine learning algorithms, followed by several current advanced learning methods. Then, the challenges of learning with big data and the corresponding possible solutions in recent research were discussed. In addition, the connection of machine learning with modern signal processing technologies was analyzed by studying several recent representative research papers. Finally, to stimulate readers’ interest, open issues and research trends were presented.
Notes
Declarations
Acknowledgements
We gratefully acknowledge the financial support from the National Natural Science Foundation of China (Grant No. 61301160 and No. 61172062).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. A Sandryhaila, JMF Moura, Big data analysis with signal processing on graphs: representation and processing of massive data sets with irregular structure. IEEE Signal Proc Mag 31(5), 80–90 (2014)
2. J Gantz, D Reinsel, Extracting value from chaos (EMC, Hopkinton, 2011)
3. J Gantz, D Reinsel, The digital universe decade: are you ready (EMC, Hopkinton, 2010)
4. D Che, M Safran, Z Peng, From big data to big data mining: challenges, issues, and opportunities, in Proceedings of the 18th International Conference on DASFAA (Wuhan, 2013), pp. 1–15
5. M Chen, S Mao, Y Liu, Big data: a survey. Mobile Netw Appl 19(2), 171–209 (2014)
6. H Hu, Y Wen, T Chua, X Li, Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2, 652–687 (2014)
7. J Manyika, M Chui, B Brown, J Bughin, R Dobbs, C Roxburgh, AH Byers, Big data: the next frontier for innovation, competition, and productivity (McKinsey Global Institute, USA, 2011)
8. Q Wu, G Ding, Y Xu, S Feng, Z Du, J Wang, K Long, Cognitive internet of things: a new paradigm beyond connection. IEEE Internet Things J 1(2), 129–143 (2014)
9. CW Tsai, CF Lai, MC Chiang, LT Yang, Data mining for internet of things: a survey. IEEE Commun Surv Tut 16(1), 77–97 (2014)
10. A Imran, A Zoha, Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE Netw 28(6), 27–33 (2014)
11. X Wu, X Zhu, G Wu, W Ding, Data mining with big data. IEEE Trans Knowl Data Eng 26(1), 97–107 (2014)
12. A Rajaraman, JD Ullman, Mining of massive data sets (Cambridge University Press, Oxford, 2011)
13. XW Chen, X Lin, Big data deep learning: challenges and perspectives. IEEE Access 2, 514–525 (2014)
14. K Slavakis, GB Giannakis, G Mateos, Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Proc Mag 31(5), 18–31 (2014)
15. TM Mitchell, Machine learning (McGraw-Hill, New York, 1997)
16. S Russell, P Norvig, Artificial intelligence: a modern approach (Prentice-Hall, Englewood Cliffs, 1995)
17. V Cherkassky, FM Mulier, Learning from data: concepts, theory, and methods (John Wiley & Sons, New Jersey, 2007)
18. TM Mitchell, The discipline of machine learning (Carnegie Mellon University, School of Computer Science, Machine Learning Department, 2006)
19. C Rudin, KL Wagstaff, Machine learning for science and society. Mach Learn 95(1), 1–9 (2014)
20. CM Bishop, Pattern recognition and machine learning (Springer, New York, 2006)
21. B Adam, IFC Smith, F Asce, Reinforcement learning for structural control. J Comput Civil Eng 22(2), 133–139 (2008)
22. N Jones, Computer science: the learning machines. Nature 505(7482), 146–148 (2014)
23. J Langford, Tutorial on practical prediction theory for classification. J Mach Learn Res 6(3), 273–306 (2005)
24. R Bekkerman, EY Ran, N Tishby, Y Winter, Distributional word clusters vs. words for text categorization. J Mach Learn Res 3, 1183–1208 (2003)
25. Y Bengio, A Courville, P Vincent, Representation learning: a review and new perspectives. IEEE Trans Pattern Anal 35(8), 1798–1828 (2012)
26. F Huang, E Yates, Biased representation learning for domain adaptation, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Jeju Island, 2012), pp. 1313–1323
27. W Tu, S Sun, Cross-domain representation-learning framework with combination of class-separate and domain-merge objectives, in Proceedings of the 1st International Workshop on Cross Domain Knowledge Discovery in Web and Social Network Mining (Beijing, 2012), pp. 18–25
28. S Li, C Huang, C Zong, Multi-domain sentiment classification with classifier combination. J Comput Sci Technol 26(1), 25–33 (2011)
29. F Huang, E Yates, Exploring representation-learning approaches to domain adaptation, in Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing (Uppsala, 2010), pp. 23–30
30. A Bordes, X Glorot, J Weston, Y Bengio, Joint learning of words and meaning representations for open-text semantic parsing, in Proceedings of 15th International Conference on Artificial Intelligence and Statistics (La Palma, 2012), pp. 127–135
31. N Boulanger-Lewandowski, Y Bengio, P Vincent, Modeling temporal dependencies in high-dimensional sequences: application to polyphonic music generation and transcription. arXiv preprint (2012). arXiv:1206.6392
32. K Dwivedi, K Biswaranjan, A Sethi, Drowsy driver detection using representation learning, in Proceedings of the IEEE International Advance Computing Conference (Gurgaon, 2014), pp. 995–999
33. D Yu, L Deng, Deep learning and its applications to signal and information processing. IEEE Signal Proc Mag 28(1), 145–154 (2011)
34. I Arel, DC Rose, TP Karnowski, Deep machine learning: a new frontier in artificial intelligence research. IEEE Comput Intell Mag 5(4), 13–18 (2010)
35. Y Bengio, Learning deep architectures for AI. Foundations Trends Mach Learn 2(1), 1–127 (2009)
36. R Collobert, J Weston, L Bottou, M Karlen, K Kavukcuoglu, P Kuksa, Natural language processing (almost) from scratch. J Mach Learn Res 12, 2493–2537 (2011)
37. P Le Callet, C Viard-Gaudin, D Barba, A convolutional neural network approach for objective video quality assessment. IEEE Trans Neural Networ 17(5), 1316–1327 (2006)
38. GE Dahl, D Yu, L Deng, A Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Proc 20(1), 30–42 (2012)
39. G Hinton, L Deng, Y Dong, GE Dahl, A Mohamed, N Jaitly, A Senior, V Vanhoucke, P Nguyen, TN Sainath, B Kingsbury, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Proc Mag 29(6), 82–97 (2012)
40. DC Ciresan, U Meier, LM Gambardella, J Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition. Neural Comput 22(12), 3207–3220 (2010)
41. Y Wang, D Yu, Y Ju, A Acero, Voice search, in Language understanding: systems for extracting semantic information from speech (Wiley, New York, 2011)
42. D Peteiro-Barral, B Guijarro-Berdiñas, A survey of methods for distributed machine learning. Progress in Artificial Intelligence 2(1), 1–11 (2012)
43. H Zheng, SR Kulkarni, HV Poor, Attribute-distributed learning: models, limits, and algorithms. IEEE Trans Signal Process 59(1), 386–398 (2011)
44. H Chen, T Li, C Luo, SJ Horng, G Wang, A rough set-based method for updating decision rules on attribute values’ coarsening and refining. IEEE Trans Knowl Data Eng 26(12), 2886–2899 (2014)
45. J Chen, C Wang, R Wang, Using stacked generalization to combine SVMs in magnitude and shape feature spaces for classification of hyperspectral data. IEEE Trans Geosci Remote 47(7), 2193–2205 (2009)
46. E Leyva, A González, R Pérez, A set of complexity measures designed for applying meta-learning to instance selection. IEEE Trans Knowl Data Eng 27(2), 354–367 (2014)
47. M Sarnovsky, M Vronc, Distributed boosting algorithm for classification of text documents, in Proceedings of the 12th IEEE International Symposium on Applied Machine Intelligence and Informatics (SAMI) (Herl'any, 2014), pp. 217–220
48. SR Upadhyaya, Parallel approaches to machine learning: a comprehensive survey. J Parallel Distr Com 73(3), 284–292 (2013)
49. R Bekkerman, M Bilenko, J Langford, Scaling up machine learning: parallel and distributed approaches (Cambridge University Press, Oxford, 2011)
50. EW Xiang, B Cao, DH Hu, Q Yang, Bridging domains using world wide knowledge for transfer learning. IEEE Trans Knowl Data Eng 22(6), 770–783 (2010)
51. SJ Pan, Q Yang, A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10), 1345–1359 (2010)
52. W Fan, I Davidson, B Zadrozny, PS Yu, An improved categorization of classifier’s sensitivity on sample selection bias, in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM) (Brussels, 2012), pp. 605–608
53. J Gao, W Fan, J Jiang, J Han, Knowledge transfer via multiple model local structure mapping, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Las Vegas, 2008), pp. 283–291
54. C Wang, S Mahadevan, Manifold alignment using procrustes analysis, in Proceedings of the 25th International Conference on Machine Learning (ICML) (Helsinki, 2008), pp. 1120–1127
55. X Ling, W Dai, GR Xue, Q Yang, Y Yu, Spectral domain-transfer learning, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Las Vegas, 2008), pp. 488–496
56. R Raina, AY Ng, D Koller, Constructing informative priors using transfer learning, in Proceedings of the 23rd International Conference on Machine Learning (ICML) (Pittsburgh, 2006), pp. 713–720
57. J Zhang, Deep transfer learning via restricted Boltzmann machine for document classification, in Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops (ICMLA) (Honolulu, 2011), pp. 323–326
58. Y Fu, B Li, X Zhu, C Zhang, Active learning without knowing individual instance labels: a pairwise label homogeneity query approach. IEEE Trans Knowl Data Eng 26(4), 808–822 (2014)
59. B Settles, Active learning literature survey (University of Wisconsin, Madison, 2010)
60. MM Crawford, D Tuia, HL Yang, Active learning: any value for classification of remotely sensed data? P IEEE 101(3), 593–608 (2013)
61. MM Haque, LB Holder, MK Skinner, DJ Cook, Generalized query-based active learning to identify differentially methylated regions in DNA. IEEE ACM Trans Comput Bi 10(3), 632–644 (2013)
62. D Tuia, M Volpi, L Copa, M Kanevski, J Munoz-Mari, A survey of active learning algorithms for supervised remote sensing image classification. IEEE J Sel Top Sign Proces 5(3), 606–617 (2011)
 G Ding, Q Wu, YD Yao, J Wang, Y Chen, Kernelbased learning for statistical signal processing in cognitive radio networks. IEEE Signal Proc Mag 30(4), 126–136 (2013)View ArticleGoogle Scholar
 C Li, M Georgiopoulos, GC Anagnostopoulos, A unifying framework for typical multitask multiple kernel learning problems. IEEE Trans Neur Net Lear Syst 25(7), 1287–1297 (2014)MathSciNetView ArticleGoogle Scholar
 G Montavon, M Braun, T Krueger, KR Muller, Analyzing local structure in kernelbased learning: explanation, complexity, and reliability assessment. IEEE Signal Proc Mag 30(4), 62–74 (2013)View ArticleGoogle Scholar
 K Slavakis, S Theodoridis, I Yamada, Online kernelbased classification using adaptive projection algorithms. IEEE Trans Signal Process 56(7), 2781–2796 (2008)MathSciNetView ArticleGoogle Scholar
 S Theodoridis, K Slavakis, I Yamada, Adaptive learning in a world of projections. IEEE Signal Proc Mag 28(1), 97–123 (2011)View ArticleGoogle Scholar
 K Slavakis, S Theodoridis, I Yamada, Adaptive constrained learning in reproducing kernel Hilbert spaces: the robust beamforming case. IEEE Trans Signal Process 57(12), 4744–4764 (2009)MathSciNetView ArticleGoogle Scholar
 K Slavakis, P Bouboulis, S Theodoridis, Adaptive multiregression in reproducing kernel Hilbert spaces: the multiaccess MIMO channel case. IEEE Trans Neural Netw Learn Syst 23(2), 260–276 (2012)View ArticleGoogle Scholar
 KR Müller, S Mika, G Rätsch, K Tsuda, B Schölkopf, An introduction to kernelbased learning algorithms. IEEE Trans Neural Networ 12(2), 181–201 (2001)View ArticleGoogle Scholar
 TH Davenport, P Barth, R Bean, How “big data” is different. MIT Sloan Manage Rev 54(1), 22–24 (2012)Google Scholar
 F Andersson, M Carlsson, JY Tourneret, H Wendt, A new frequency estimation method for equally and unequally spaced data. IEEE Trans Signal Process 62(21), 5761–5774 (2014)MathSciNetView ArticleGoogle Scholar
 F Lin, M Fardad, MR Jovanovic, Design of optimal sparse feedback gains via the alternating direction method of multipliers. IEEE Trans Automat Contr 58(9), 2426–2431 (2013)MathSciNetView ArticleGoogle Scholar
 S Boyd, N Parikh, E Chu, B Peleato, J Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Mach Learn 3(1), 1–122 (2011)View ArticleMATHGoogle Scholar
 J Dean, S Ghemawat, MapReduce: simplified data processing on large clusters. Commun ACM 51(1), 107–113 (2008)View ArticleGoogle Scholar
 J Dean, S Ghemawat, MapReduce: a flexible data processing tool. Commun ACM 53(1), 72–77 (2010)View ArticleGoogle Scholar
 C Chu, SK Kim, YA Lin, Y Yu, G Bradski, AY Ng, K Olukotun, Mapreduce for machine learning on multicore, in Proceedings of 20th Annual Conference on Neural Information Processing Systems (NIPS) (Vancouver, 2006), pp. 281–288Google Scholar
 M Armbrust, A Fox, R Griffith, AD Joseph, R Katz, A Konwinski, G Lee, D Patterson, A Rabkin, I Stoica, M Zaharia, A view of cloud computing. Commun ACM 53(4), 50–58 (2010)View ArticleGoogle Scholar
 MD Dikaiakos, D Katsaros, P Mehra, G Pallis, A Vakali, Cloud computing: distributed internet computing for IT and scientific research. IEEE Internet Comput 13(5), 10–13 (2009)View ArticleGoogle Scholar
 Y Low, D Bickson, J Gonzalez, C Guestrin, A Kyrola, JM Hellerstein, Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8), 716–727 (2012)View ArticleGoogle Scholar
 M Lenzerini, Data integration: a theoretical perspective, in Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Madison, 2002), pp. 233–246
 A Halevy, A Rajaraman, J Ordille, Data integration: the teenage years, in Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB) (Seoul, 2006), pp. 9–16
 Q Wu, G Ding, J Wang, YD Yao, Spatial-temporal opportunity detection for spectrum-heterogeneous cognitive radio networks: two-dimensional sensing. IEEE Trans Wirel Commun 12(2), 516–526 (2013)
 N Srivastava, RR Salakhutdinov, Multimodal learning with deep Boltzmann machines, in Proceedings of Neural Information Processing Systems Conference (NIPS) (Nevada, 2012), pp. 2222–2230
 Y Sun, S Todorovic, S Goodison, Local-learning-based feature selection for high-dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32(9), 1610–1626 (2010)
 LJP van der Maaten, EO Postma, HJ van den Herik, Dimensionality reduction: a comparative review. J Mach Learn Res 10(141), 66–71 (2009)
 M Mardani, G Mateos, GB Giannakis, Subspace learning and imputation for streaming big data matrices and tensors. IEEE Trans Signal Process 63(10), 2663–2677 (2015)
 K Mohan, M Fazel, New restricted isometry results for noisy low-rank recovery, in Proceedings of IEEE International Symposium on Information Theory Proceedings (ISIT) (Texas, 2010), pp. 1573–1577
 EJ Candès, X Li, Y Ma, J Wright, Robust principal component analysis? J ACM 58(3), 1–37 (2011)
 Z Lin, R Liu, Z Su, Linearized alternating direction method with adaptive penalty for low-rank representation, in Proceedings of Neural Information Processing Systems Conference (NIPS) (Granada, 2011), pp. 612–620
 S Shalev-Shwartz, Online learning and online convex optimization. Foundations Trends Mach Learn 4, 107–194 (2011)
 J Wang, P Zhao, SC Hoi, R Jin, Online feature selection and its applications. IEEE Trans Knowl Data Eng 26(3), 698–710 (2014)
 J Kivinen, AJ Smola, RC Williamson, Online learning with kernels. IEEE Trans Signal Process 52(8), 2165–2176 (2004)
 M Bilenko, S Basil, M Sahami, Adaptive product normalization: using online learning for record linkage in comparison shopping, in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM) (Texas, 2005), p. 8
 GB Huang, QY Zhu, CK Siew, Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)
 S Ding, X Xu, R Nie, Extreme learning machine and its applications. Neural Comput Appl 25(3–4), 549–556 (2014)
 N Tatbul, Streaming data integration: challenges and opportunities, in Proceedings of the 26th IEEE International Conference on Data Engineering Workshops (ICDEW) (Long Beach, 2010), pp. 155–158
 DJ Abadi, Y Ahmad, M Balazinska, U Cetintemel, M Cherniack, JH Hwang, W Lindner, A Maskey, A Rasin, E Ryvkina, N Tatbul, Y Xing, SB Zdonik, The design of the borealis stream processing engine, in Proceedings of the Second Biennial Conference on Innovative Data Systems Research (CIDR) (Asilomar, 2005), pp. 277–289
 L Neumeyer, B Robbins, A Nair, A Kesari, S4: Distributed stream computing platform, in Proceedings of IEEE International Conference on Data Mining Workshops (ICDMW) (Sydney, 2010), pp. 170–177
 K Goodhope, J Koshy, J Kreps, N Narkhede, R Park, J Rao, VY Ye, Building LinkedIn’s real-time activity data pipeline. IEEE Data Eng Bull 35(2), 33–45 (2012)
 W Yang, X Liu, L Zhang, LT Yang, Big data real-time processing based on Storm, in Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (Melbourne, 2013), pp. 1784–1787
 B SkieS, Streaming big data processing in datacenter clouds. IEEE Cloud Comput 1, 78–83 (2014)
 A Baldominos, E Albacete, Y Saez, P Isasi, A scalable machine learning online service for big data real-time analysis, in Proceedings of IEEE Symposium on Computational Intelligence in Big Data (CIBD) (Orlando, 2014), pp. 1–8
 NY Soltani, SJ Kim, GB Giannakis, Real-time load elasticity tracking and pricing for electric vehicle charging. IEEE Trans Smart Grid 6(3), 1303–1313 (2014)
 S Tsang, B Kao, KY Yip, WS Ho, SD Lee, Decision trees for uncertain data. IEEE Trans Knowl Data Eng 23(1), 64–78 (2011)
 F Nie, H Wang, X Cai, H Huang, C Ding, Robust matrix completion via joint Schatten p-norm and lp-norm minimization, in Proceedings of the 12th IEEE International Conference on Data Mining (ICDM) (Brussels, 2012), p. 566
 G Ding, J Wang, Q Wu, L Zhang, Y Zou, YD Yao, Y Chen, Robust spectrum sensing with crowd sensors. IEEE Trans Commun 62(9), 3129–3143 (2014)
 U Fayyad, G Piatetsky-Shapiro, P Smyth, From data mining to knowledge discovery in databases. AI Mag 17(3), 37–54 (1996)
 J Kelly III, S Hamm, Smart machines: IBM’s Watson and the era of cognitive computing (Columbia University Press, New York, 2013)
 K Slavakis, SJ Kim, G Mateos, GB Giannakis, Stochastic approximation vis-a-vis online learning for big data analytics. IEEE Signal Proc Mag 31(6), 124–129 (2014)
 V Cevher, S Becker, M Schmidt, Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Proc Mag 31(5), 32–43 (2014)
 A Tajer, VV Veeravalli, HV Poor, Outlying sequence detection in large data sets: a data-driven approach. IEEE Signal Proc Mag 31(5), 44–56 (2014)
 S Scardapane, D Wang, M Panella, A Uncini, Distributed learning for random vector functional-link networks. Inf Sci 301, 271–284 (2015)
 A Daneshmand, F Facchinei, V Kungurtsev, G Scutari, Hybrid random/deterministic parallel algorithms for nonconvex big data optimization. IEEE Trans Signal Process 63(15), 3914–3929 (2015)
 P Bianchi, W Hachem, F Iutzeler, A stochastic coordinate descent primal-dual algorithm and applications to large-scale composite optimization. arXiv preprint (2014). arXiv:1407.0898
 HT Wai, TH Chang, A Scaglione, A consensus-based decentralized algorithm for nonconvex optimization with application to dictionary learning, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (South Brisbane, 2015), pp. 3546–3550
 D Berberidis, V Kekatos, GB Giannakis, Online censoring for large-scale regressions with application to streaming big data. arXiv preprint (2015). arXiv:1507.07536
 K Slavakis, GB Giannakis, Per-block-convex data modeling by accelerated stochastic approximation. arXiv preprint (2015). arXiv:1501.07315
 KC Chen, SL Huang, L Zheng, HV Poor, Communication theoretic data analytics. IEEE J Sel Areas Commun 33(4), 663–675 (2015)
 J Zheng, F Shen, H Fan, J Zhao, An online incremental learning support vector machine for large-scale data. Neural Comput Appl 22(5), 1023–1035 (2013)
 C Ghosh, C Cordeiro, DP Agrawal, M Bhaskara Rao, Markov chain existence and hidden Markov models in spectrum sensing, in Proceedings of the IEEE International Conference on Pervasive Computing & Communications (PERCOM) (Galveston, 2009), pp. 1–6
 K Yue, Q Fang, X Wang, J Li, W Weiy, A parallel and incremental approach for data-intensive learning of Bayesian networks. IEEE Trans Cybern 99, 1–15 (2015)
 X Dong, Y Li, C Wu, Y Cai, A learner based on neural network for cognitive radio, in Proceedings of the 12th IEEE International Conference on Communication Technology (ICCT) (Nanjing, 2010), pp. 893–896
 A El-Hajj, L Safatly, M Bkassiny, M Husseini, Cognitive radio transceivers: RF, spectrum sensing, and learning algorithms review. Int J Antenn Propag 11(5), 479–482 (2014)
 M Bkassiny, SK Jayaweera, Y Li, Multidimensional Dirichlet process-based nonparametric signal classification for autonomous self-learning cognitive radios. IEEE Trans Wirel Commun 12(11), 5413–5423 (2013)
 A Galindo-Serrano, L Giupponi, Distributed Q-learning for aggregated interference control in cognitive radio networks. IEEE Trans Veh Technol 59(4), 1823–1834 (2010)
 TK Das, A Gosavi, S Mahadevan, N Marchalleck, Solving semi-Markov decision problems using average reward reinforcement learning. Manage Sci 45(4), 560–574 (1999)
 RS Sutton, Learning to predict by the methods of temporal differences. Mach Learn 3(1), 9–44 (1988)
 S Singh, T Jaakkola, ML Littman, C Szepesvári, Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38, 287–308 (2000)