Robust Recognition of Specific Human Behaviors in Crowded Surveillance Video Sequences
© Masaki Takahashi et al. 2010
Received: 12 November 2009
Accepted: 16 March 2010
Published: 26 April 2010
We describe a method that can detect specific human behaviors even in crowded surveillance video scenes. The system recognizes specific behaviors based on the trajectories created by detecting and tracking people in a video. It detects people using a HOG descriptor and an SVM classifier, and it tracks the detected regions by calculating two-dimensional color histograms. The system identifies several specific human behaviors, such as running and meeting, by analyzing the similarity of each trajectory to the reference trajectory of each behavior. Verification techniques such as backward tracking and optical flow calculation contribute to robust recognition. Comparative experiments showed that our system tracked people more robustly than a baseline tracking algorithm, even in crowded scenes. Our system precisely identified specific behaviors and achieved first place for detecting running people in the TRECVID 2009 Surveillance Event Detection Task.
1. Introduction
There have been many studies of human motion recognition through video content analysis. This work has gained widespread interest for both academic and industrial purposes. These techniques can be applied, for example, to motion-based video searches or to man-machine interfaces that use human gestures.
To achieve these goals, some fundamental technologies have to be established. For recognizing human motions, the capability to detect and track humans in video is essential. Human detectors and trackers based on feature points, as represented by the Kanade-Lucas-Tomasi Feature Tracker (KLT tracker), have been studied [2, 3]. Moreover, gradient-based features, as represented by histograms of oriented gradients (HOG), are currently being used for human detection [4, 5].
In addition, the rapid spread of surveillance cameras has increased demand for cameras that can not only track people but also automatically identify their specific motions. A number of technologies identify specific motions by detecting a particular image feature value that is unlike the other major features. For example, Shiraki et al. have developed a technology to detect a specific motion from a video sequence using cubic higher-order local auto correlation (CHLAC) image features. However, many of these technologies assume relatively simple videos in which it is easy to detect and track people. A great deal of work has been done on analyzing human behavior in simpler datasets (KTH, Weizmann) where the motions are performed in controlled situations [9–12]. The features used in these algorithms are corner points, optical flows, and shapes. In real footage, these features are often not sufficiently available because of occlusion, different lighting conditions, or varying object sizes. Practical algorithms that can be applied to complicated sequences, such as surveillance video from train stations or airports, are required [13, 14].
Although some studies have targeted crowded surveillance video sequences, they have been limited to tracking human objects or detecting the overall motion of a great number of people [15–17]. No technology has been established that can robustly detect specific behaviors within crowded sequences in real videos.
We describe a method that can detect specific human behaviors, such as running and meeting, even within crowded sequences. Although tracking all human objects in a crowded scene is a difficult problem, specific human behaviors can still be detected by searching for feature values that stand out from the feature values of the majority of normal behaviors. The trajectory of a moving person contains rich information about the person's behavior, such as velocity and travel distance, so we used this trajectory for recognizing human behaviors. No previous technology has recognized human behaviors based on their trajectories in crowded scenes.
For tracking people in complicated sequences, we used a human detection algorithm based on HOG and a support vector machine (SVM) [18, 19] that is known to be relatively robust. In addition, we used a Kalman filter-based tracking algorithm that contributes to robust tracking, even with occlusion, by predicting the position of the person.
Our system recognizes specific behaviors by analyzing the similarities to the reference trajectory of each specific behavior in the trajectory feature space. The feature space was generated from ten-dimensional features that were extracted from a trajectory based on principal component analysis (PCA). Using the feature space, our system can sensitively detect particular behaviors from minimal evidence. Though it sometimes incorrectly identifies nontargeted behaviors as the targeted behaviors, most of these false detections are rejected by the verification process. This verification technique is one of the unique features of our system.
We compared our method with a baseline tracking algorithm using two different datasets: the KTH dataset and the TRECVID dataset. The results of these comparative experiments showed that our method tracked people more robustly, even in crowded scenes. In addition, the TRECVID 2009 surveillance event detection task showed that our system recognized several specific behaviors precisely and that it was highly effective.
We introduce conventional techniques in Section 2, describe our motion recognition method in Section 3, show results of several experiments in Section 4, and conclude in Section 5.
2. Conventional Techniques
Many researchers have studied human appearance and motion recognition in the field of computer vision. In particular, studies on detecting specific motions using surveillance cameras have increased with the number of crimes and instances of terrorism.
Most conventional techniques that recognize particular objects or human motions follow a two-step process: (1) cut out objects or human shapes precisely, or calculate low-level image features from a video, and (2) apply the cutout objects to detailed shape or motion models prepared beforehand [22–25]. These methods can track people or detect human motions with a low error rate and can precisely recognize even small motions, such as hand waving and jumping, by considering kinematic models or spatio-temporal images. To cut out human shapes, advanced background subtraction methods and contour definition methods have also been developed for precise segmentation of the human shape [26, 27].
The KTH motion sequences have been frequently used in motion recognition papers. The dataset consists of 2391 low resolution videos ( fps) showing six types of human motions each performed 4 times by 25 persons. The motions are walking, jogging, running, boxing, hand waving, and hand clapping. Only one person appears in each sequence, which is shot by an almost-fixed camera in front of a smooth background, as shown in Figure 1.
The Weizmann dataset is 90 low-resolution video sequences ( fps) showing nine different people, each performing 10 natural motions such as running, walking, skipping, jumping, and hand waving. Though this set contains somewhat more complicated situations such as occlusions, the sequences are controlled, with only one person per sequence, and they were shot by a fixed camera as shown in Figure 2.
TRECVID is a workshop for evaluating information retrieval technologies using a common video corpus. It is organized by the US National Institute of Standards and Technology (NIST). Once a year, participants, such as research groups from universities and companies, are invited to the evaluation. The participants compete at tasks specified by TRECVID. The submitted results are evaluated and compared by the TRECVID organizers. The principal tasks are high-level feature extraction, copy detection, and surveillance event detection.
Surveillance event detection is a task to detect sequences of specific human motions. The ten required specific motions are called PersonRuns, CellToEar, ObjectPut, PeopleMeet, PeopleSplitUp, Embrace, Pointing, ElevatorNoEntry, OpposingFlow, and TakePicture. Participants should output detection results for any three events from the required set. NIST annotated the videos with the correct human motion data.
We used the complicated videos of the TRECVID dataset for detecting specific human behaviors. We devised a sensitive method for detecting human behaviors by evaluating the trajectory of a person. Though this method also misidentifies many nontargeted behaviors, most of these are then rejected during the verification process. This approach has not been studied in the past.
3. Proposed System
In the training phase, the functions of human region detection and human behavior recognition are created from human region images and video sequences from fixed cameras. We describe human region detection in Section 3.2 and human behavior recognition in Section 3.4. In the operation phase, the system automatically detects specific behaviors from a video sequence shot by the same camera as the training phase. As input, we assume digitized video files that can be played repeatedly. The system outputs a file that has a time sequence of detected specific behaviors. The beginning time and ending time of the specific behaviors are written in the file. Though our system needed to be trained in advance, it could be applied to any surveillance video sequences without any camera calibration.
3.2. Human Region Detection
3.2.1. Processing Flow
3.2.2. Changed Area Detection
3.2.3. Human Detection Processing
Our human detector searches for human regions by calculating image features around the changed region. We used a human detector that combines histograms of oriented gradients (HOG) feature descriptors and a support vector machine (SVM).
HOG descriptors are used for object detection. Many studies have reported that HOG is suitable for detecting human regions because it is robust to a wide range of pose variations [4, 5]. The technique counts occurrences of gradient orientations in localized portions of an image. It is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved performance.
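As an illustration of the orientation counting described above, the following minimal Python sketch builds the gradient-orientation histogram of a single cell and contrast-normalizes a block of such histograms. The nine unsigned orientation bins, the centered-difference gradients, and the function names are assumptions made for the example, not details given in this paper; the SVM classification step is omitted.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell.

    `cell` is a 2D list of grayscale intensities. Each interior pixel votes
    for the bin of its gradient orientation, weighted by gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centered horizontal difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # centered vertical difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat pixels cast no vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(block, eps=1e-6):
    """Contrast-normalize a block, i.e. the concatenated histograms of its cells."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]
```

In a full descriptor, the normalized blocks of overlapping cell groups would be concatenated into one feature vector per detection window and passed to the trained classifier.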
3.2.4. Clustering Human Regions
More than one candidate human region is usually detected around one person. The system should detect one human region for each person in order to track robustly and recognize human behaviors accurately. Therefore, the system clusters the candidate human regions and determines a representative human region for one person from each of the candidate human regions in each cluster. This function stabilizes the tracking of human regions.
We used the HSV color space for creating color histograms. HSV stands for hue, saturation, and value. The hue and saturation are intimately related to the way the human eye perceives color, so they are suitable as components of the histogram. The value is the brightness of the color. A histogram that is influenced by luminance should not be used, because the luminance changes depending on the positions of people. We therefore used two-dimensional (2D) color histograms of the HSV color space (H, S). The sum of each histogram is normalized to compensate for differences in the size of the target regions.
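The histogram construction above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the 8 x 8 bin layout and the function names are hypothetical, and the Bhattacharyya distance is written in the form sqrt(1 - BC), which is only one of several common definitions.

```python
import colorsys
import math


def hs_histogram(pixels, h_bins=8, s_bins=8):
    """Normalized 2D hue-saturation histogram of (r, g, b) pixels in 0-255."""
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    count = 0
    for r, g, b in pixels:
        # The value (brightness) channel is discarded on purpose.
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        hist[hi][si] += 1.0
        count += 1
    if count:
        # Normalize so that regions of different sizes are comparable.
        hist = [[v / count for v in row] for row in hist]
    return hist


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - BC) between two normalized 2D histograms."""
    bc = sum(math.sqrt(a * b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))
    return math.sqrt(max(0.0, 1.0 - bc))
```

Because the value channel is dropped, a brightly lit and a shadowed region of the same color fall into the same bins, which is the property the paragraph above relies on.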
A candidate human region located in the center of its cluster is selected as the representative human region of the cluster.
3.3. Human Region Tracking
3.3.1. Processing Flow
It is difficult for a motion vector-based tracker to track all human regions in a crowded area because large numbers of the same motion vectors are detected. In our system, each human region can be tracked relatively robustly, even in a crowded scene, because of the 2D color histograms.
3.3.2. Prediction Processing
That approach, however, does not necessarily give the position of the human region in every frame. Detections can fail when an occlusion occurs in a crowded scene. Therefore, we supplemented the system with a prediction-based retrack function. The prediction continues even after a detection failure, so it is possible to pick up the human region again after it passes through an area where extraction is difficult.
Here, the state estimation vector consists of the position coordinates and speed, and it is accompanied by the covariance matrix of the prediction error. The state propagator matrix has elements that express uniform straight-line motion for the linear prediction. The process noise and the observation noise are each described by a covariance matrix.
In the update step, the measurement state is the detected position coordinates, and it is used to update the state estimation vector and the covariance matrix. A measurement matrix converts a state vector into a measurement vector, and a further covariance matrix describes the measurement error. When the tracking goes wrong, the diagonal elements of this covariance matrix become large, and when it succeeds, they become small. Therefore, the process controls the size of the image area searched for people on the basis of these values.
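The prediction and update steps can be illustrated with a textbook constant-velocity Kalman filter. The sketch below handles one image axis only, with state (position, velocity), a one-frame time step, and a measurement of position alone. The noise variances, the initial covariance, and the `search_radius` rule are assumptions chosen for the example, not values from this paper.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one image axis with state (position, velocity)."""

    def __init__(self, position, q=1.0, r=4.0):
        self.p, self.v = float(position), 0.0  # state estimate
        self.P = [[r, 0.0], [0.0, 100.0]]      # error covariance (velocity unknown at start)
        self.q, self.r = q, r                  # process / observation noise variances (assumed)

    def predict(self):
        """Propagate one frame: x <- F x, P <- F P F^T + Q with F = [[1, 1], [0, 1]]."""
        self.p += self.v
        (a, b), (c, d) = self.P
        self.P = [[a + b + c + d + self.q, b + d],
                  [c + d, d + self.q]]
        return self.p

    def update(self, z):
        """Correct the prediction with a detected position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation variance
        k0, k1 = a / s, c / s         # Kalman gain
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

    def search_radius(self, base=8.0, scale=2.0):
        """Half-width of the search area; grows with positional uncertainty."""
        return base + scale * self.P[0][0] ** 0.5
```

Calling `predict()` without `update()` for the frames in which detection fails keeps the position moving along the estimated velocity while the covariance, and with it the search radius, grows. This mirrors the retrack behavior described above.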
3.4. Human Behavior Recognition
3.4.1. Normalization of the Length of Motion Vector
A trajectory of detected human regions contains rich information about human behavior, such as moving speed and travel distance. Our system recognizes specific human behaviors from their trajectories. However, the length of the motion vector differs depending on the detected position in the image coordinates. For example, the motion vectors of people who are detected at positions near the camera are large, and the motion vectors of people who are detected at a distance are small. To address this, we devised an average velocity map to normalize the motion vectors.
By referring to this average velocity map, the system can normalize an input motion vector, so it can treat all trajectories equally regardless of their detected position in the image. This technique does not need any calibration or positional information of the camera, so it can be applied to every video sequence shot by a fixed camera.
The system can normalize the motion vectors more accurately when the blocks are small. However, the suitable block size depends on the resolution of the video, the distance from the camera to people, and the number of motion vectors used for training. If the resolution is low, the camera is close to people, or only a few training vectors are available, the block size should be large, because the average velocity of a small block might otherwise be dominated by unusual motion vectors.
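A minimal sketch of the average velocity map follows, assuming that the image is divided into square blocks and that each block stores the mean length of the training motion vectors observed inside it. The data layout `(x, y, dx, dy)`, the function names, and the handling of blocks without training data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_average_velocity_map(training_vectors, block_size):
    """Mean motion-vector length per image block.

    `training_vectors` is an iterable of (x, y, dx, dy): an image position and
    the motion vector observed there. Returns {(block_col, block_row): mean}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, dx, dy in training_vectors:
        key = (int(x // block_size), int(y // block_size))
        sums[key] += math.hypot(dx, dy)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def normalize_vector(x, y, dx, dy, velocity_map, block_size):
    """Scale (dx, dy) by the average speed of the block that contains (x, y)."""
    mean_speed = velocity_map.get((int(x // block_size), int(y // block_size)))
    if not mean_speed:
        return dx, dy  # unseen block or zero average: leave the vector unchanged
    return dx / mean_speed, dy / mean_speed
```

With such a map, a person walking at a typical pace yields a normalized vector of length close to 1 whether detected near the camera or far from it.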
3.4.2. Feature Extraction from a Trajectory
The total vector is calculated by accumulating the motion vectors of each frame in a trajectory. This feature indicates the direction of the trajectory. The system calculates its horizontal and vertical components separately. The travel distance is the distance from the first detected position to the last detected position. The distance is normalized in the same way as the motion vectors. The average velocity is the average length of the motion vectors in a trajectory. The acceleration is a shift in velocity, which we calculate by differentiating the velocities. With this feature, the system can detect a person who stopped or started suddenly. The linearity is the average distance from each detected position to a regression line. If a person has moved straight ahead, the linearity is close to zero.
Generally, it is difficult to precisely estimate moving speed and direction. However, the system can calculate these features well enough to recognize the specific behaviors, because they come from a human region trajectory that has been normalized with the average velocity map.
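The trajectory features described above can be sketched as follows. The input is assumed to be one already-normalized (x, y) position per frame. Two choices are assumptions of this example rather than details from the paper: the acceleration is summarized as the mean absolute change in speed between consecutive frames, and the regression line is fitted by orthogonal least squares so that vertical trajectories are handled as well.

```python
import math


def trajectory_features(points):
    """Features of a trajectory given as a list of (x, y) positions, one per frame."""
    if len(points) < 3:
        raise ValueError("need at least three positions")
    vectors = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    speeds = [math.hypot(dx, dy) for dx, dy in vectors]
    # Linearity: mean distance to the line through the centroid along the
    # principal axis of the positions.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    linearity = sum(abs(-(x - mx) * math.sin(theta) + (y - my) * math.cos(theta))
                    for x, y in points) / n
    return {
        "total_x": sum(dx for dx, _ in vectors),
        "total_y": sum(dy for _, dy in vectors),
        "travel_distance": math.hypot(points[-1][0] - points[0][0],
                                      points[-1][1] - points[0][1]),
        "average_velocity": sum(speeds) / len(speeds),
        "acceleration": sum(abs(b - a) for a, b in zip(speeds, speeds[1:]))
                        / (len(speeds) - 1),
        "linearity": linearity,
    }
```

For a person walking straight at constant speed, the acceleration and linearity entries are both close to zero, while a sudden stop raises the acceleration and a curved path raises the linearity.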
3.4.3. Detecting Behaviors from the Trajectory Feature Space
To cluster trajectories, we projected each trajectory onto a trajectory feature space that was generated using principal component analysis (PCA). We calculated eigenvalues and eigenvectors from the ten-dimensional features of each trajectory and reduced the number of dimensions from ten to five. The first principal component was strongly influenced by the average velocity and the travel distance, the second one was influenced by the vertical position and the total vector, and the third one was influenced by the horizontal position, the total vector, and the acceleration.
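The projection onto a reduced feature space can be sketched with a small PCA routine. The paper does not state how the eigenvectors were computed, so power iteration with deflation is used here purely as a simple, dependency-free stand-in. All function names are hypothetical, and the fixed start vector is an assumption that can fail if it happens to be orthogonal to a leading eigenvector.

```python
import math


def covariance(rows):
    """Return (means, covariance matrix) of feature rows, one row per trajectory."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov


def _mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(cov, k, iterations=200):
    """Leading k eigenvectors of a symmetric matrix by power iteration and deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, asymmetric start vector
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iterations):
            w = _mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(work, v)))
        components.append(v)
        for i in range(d):  # deflate: remove the component just found
            for j in range(d):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return components


def project(row, means, components):
    """Coordinates of one feature row in the reduced feature space."""
    return [sum((x - m) * c for x, m, c in zip(row, means, comp))
            for comp in components]
```

Calling `top_components(cov, 5)` on the covariance of ten-dimensional feature rows would give a five-dimensional space of the kind described above.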
In addition, trajectories of the same behavior tended to be plotted near each other. We therefore created classes called PersonRuns, PeopleMeet, and ObjectPut by calculating their average positions and standard deviations in the feature space. Ellipses denote classes of specific behaviors, and their radii denote standard deviations. A large bias is seen in the trajectories of the PersonRuns behavior. This showed that the PersonRuns behavior could be reliably detected from this feature space. In contrast, the ObjectPut class was positioned near the origin and its variance was large. This showed that it is difficult to detect the ObjectPut behavior from only the features of a trajectory.
The system sets a decision score for each behavior class according to how close the trajectory is to that class. If a decision score exceeds the threshold for the behavior, the system recognizes that the specific behavior might have occurred for the tracked ID that contains the trajectory. The threshold is decided experimentally for each camera during the learning process. However, the system does not identify the behavior immediately. It first checks whether the behavior truly occurred by means of a verification process that we describe in the next section.
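One possible form of this decision step is sketched below. The score function, which maps the standard-deviation-normalized distance to a class center into the range (0, 1], is an assumption made for the example; the paper only states that the score reflects closeness and is compared with a per-behavior threshold.

```python
import math


def decision_score(point, class_mean, class_std):
    """Closeness in (0, 1]: 1 at the class center, smaller farther away."""
    d = math.sqrt(sum(((p - m) / s) ** 2
                      for p, m, s in zip(point, class_mean, class_std)))
    return 1.0 / (1.0 + d)


def candidate_behaviors(point, classes, thresholds):
    """Behaviors whose score exceeds their threshold; these still need verification.

    `classes` maps a behavior name to (mean, std) in the feature space, and
    `thresholds` maps the same names to the experimentally chosen thresholds.
    """
    return [name for name, (mean, std) in classes.items()
            if decision_score(point, mean, std) > thresholds[name]]
```

The returned names are only candidates: in the flow described above, each one is passed on to the verification process before a behavior is reported.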
3.5. Human Behavior Verification
3.5.1. Verification of Fast Motion by Backward Tracking
People move in various ways in the surveillance video sequences. The movements can be roughly classified into three types: (1) fast motions that have large motion vectors, such as PersonRuns, (2) big motions associated with traveling, such as PeopleMeet and PeopleSplitUp, and (3) small motions not associated with traveling, such as ObjectPut, TakePicture, and CellToEar. Our proposed system uses several methods to verify behavior depending on the motion type.
The system sometimes mistracks, especially when the motion is fast, because of the large motion vectors. In addition, false detections of PersonRuns frequently occurred when the system mistakenly extracted a distant region, because the motion vectors also tended to be large at greater distances. Thus, the system verifies the trajectory by searching backward after the PersonRuns behavior is identified in forward tracking. In a crowded scene, the results of forward tracking and backward tracking differ when a mistracking has occurred, because the predicted positions of persons in each frame are different for forward and backward tracking, so the system tends to extract different regions.
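The comparison of forward and backward tracking results could be implemented in many ways. The sketch below shows one hypothetical criterion, not the rule used in this paper: a candidate is kept only if the mean distance between the forward track and the frame-aligned backward track stays below a tolerance.

```python
import math


def tracks_agree(forward, backward, tolerance):
    """Compare a forward track with a backward track over the same frames.

    `forward` lists (x, y) positions for frames t0..t1. `backward` lists the
    positions found when tracking from frame t1 back to frame t0.
    """
    if not forward or len(forward) != len(backward):
        return False
    aligned = list(reversed(backward))  # put both tracks in frame order
    mean_gap = sum(math.hypot(fx - bx, fy - by)
                   for (fx, fy), (bx, by) in zip(forward, aligned)) / len(forward)
    return mean_gap <= tolerance
```

When the forward tracker has jumped to another person, the backward track usually follows a different region, the mean gap becomes large, and the candidate is rejected.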
3.5.2. Verification of Big Motion by Following Trajectories
3.5.3. Verification of Small Motion by Optical Flows
It is difficult to detect small motions from only a trajectory, because a trajectory contains little information about small motions. Thus, it is hard to distinguish the trajectory from other normal trajectories in the trajectory feature space. Therefore, the system also uses optical flows in the human region as a local feature for detecting small motions.
4. Experiments
4.1. Comparison of Tracking Accuracy
We compared the tracking accuracy of our system with a baseline method that was created based on the KLT tracker. The KLT tracker is an algorithm that selects and keeps track of feature points that are optimal for tracking. It is widely used in visual feature tracking, and the method can be used with the OpenCV video library. The KLT tracker detects motion vectors around moving objects; motion vectors that have the same length and direction tend to be detected around one person. Thus, the baseline method uses motion vectors to cluster human regions.
We used two different datasets for the experiment: the KTH dataset and the TRECVID event detection dataset. The KTH motion sequences have been used in many motion recognition studies. The dataset contains sequences of a person walking in front of a smooth background, and only one person appears in each sequence. We used 100 sequences of walking people for the experiment. The TRECVID event detection dataset is a series of video sequences from five surveillance cameras set at different angles in an airport. A large number of people appear in each sequence and many occlusions occur. We used 14 sequences of the TRECVID dataset in which 120 people appeared.
Success rate of two methods with different datasets.
The success rate is the ratio of the number of trajectories that were successfully tracked to the total number of trajectories that the system detected. People in both datasets appeared for 3–5 seconds on average. Thus, we considered tracking a success when the system tracked a person correctly for more than 75 frames (3 seconds).
We can see the differences in the two methods in the results of the TRECVID dataset. Our method was 13.72% better than the baseline method. This indicates that our method can track human regions relatively robustly even in crowded scenes.
Mistracking: tracking a human region for less than 75 frames (includes tracks jumping from one person to another).
False alarm: tracking a noise region that does not contain a human for more than 75 frames.
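The evaluation criteria above can be written as a small labeling routine. This is a sketch under assumptions: the label names, the treatment of a track of exactly 75 frames as a mistracking, and the "discarded" label for short noise tracks are choices made for the example, since the text does not define those cases.

```python
def classify_track(num_frames, contains_human, min_frames=75):
    """Label one detected trajectory using the 75-frame (3-second) criterion."""
    if contains_human:
        return "success" if num_frames > min_frames else "mistracking"
    return "false alarm" if num_frames > min_frames else "discarded"


def success_rate(labels):
    """Percentage of detected trajectories that were tracked successfully."""
    return 100.0 * labels.count("success") / len(labels) if labels else 0.0
```

For example, four detected trajectories labeled success, mistracking, false alarm, and success give a success rate of 50%.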
Comparison of two methods.
The baseline method also had many false alarms. It was very sensitive not only to human regions but also to other objects, such as shadows or baggage, because it does not distinguish people from other objects. In contrast, our classifier can identify whether a region contains people, because it was trained using supervised machine learning on examples of regions containing people (positive) and examples of regions not containing people (negative). This is one reason that our false alarm rate was 3.51% better than that of the baseline method.
At the same time, the processing time of our method could be a disadvantage. Table 2 shows that the proposed method takes about twice as long as the baseline method. The figures were obtained by averaging the processing time over about 2 hours of video from the TRECVID dataset. Although the proposed method is slower because of the complexity of the processing, the performance improvement outweighs this disadvantage.
In addition, the baseline method is easier to apply to a new video sequence than our method because it does not need to learn human regions. The proposed method needs much prior knowledge such as background information and an average velocity map. However, most of the data is calculated automatically in the training phase, so the proposed system does not require much preparation.
4.2. Effectiveness of Verification Process
Occurrence of false detections (for each behavior: the number of false detections without the verification process, the number of false detections with the verification process, and the reduction rate (%)).
The verification process for the PersonRuns behavior reduced false detections to less than one third, from 1572 to 440. This indicates that backward tracking avoided more than two out of every three false detections. The process was effective for the PeopleMeet behavior as well, where the false detections were reduced from 2198 to 1124.
These results show the effectiveness of the verification process. Although the process prevents our system from being applied in real time, we consider accuracy more important than processing speed.
For the ObjectPut behavior, false detections were reduced to only about 2% of those in the first decision, but the detection accuracy for this behavior was still low. We should also search for effective features for identifying small motions.
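The reduction rates implied by these counts can be checked with a short calculation. The helper name is hypothetical, and the formula assumes that the reduction rate is the removed share of the false detections found without verification.

```python
def reduction_rate(before, after):
    """Percentage of false detections removed by the verification process."""
    return 100.0 * (before - after) / before


print(round(reduction_rate(1572, 440), 1))   # PersonRuns: 72.0
print(round(reduction_rate(2198, 1124), 1))  # PeopleMeet: 48.9
```

Under this assumption, verification removed roughly 72% of the PersonRuns false detections and roughly 49% of the PeopleMeet false detections.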
4.3. Recognition Accuracy of Specific Behaviors
We evaluated the recognition accuracy of our system for specific human behaviors in the TRECVID 2009 surveillance event detection task. We trained our detecting, tracking, and recognition algorithm using 100 hours of a development dataset. NIST evaluated our algorithm from submission data that was made using 44 hours of evaluation dataset.
The results indicate that many false detections and missed detections occurred. However, our system performed better than other systems developed by participants for the TRECVID 2009 surveillance event detection task.
Comparison with other systems (including the average and the standard deviation of the DCR of all systems).
An important technique for detecting the PersonRuns behavior is robust tracking of fast moving objects. The Kalman filter based prediction process accomplished this. The process set the search area for a tracked person depending on the speed of the person and the tracking situation so that our system could track running people robustly even if an extraction failure occurred in a few frames. The recognition process also contributed to good results. We could confirm a large bias on the axis of the first principal component in the trajectory feature space. The first principal component weighed heavily on the features of the average velocity and the travel distance. These features could be sufficiently extracted from a trajectory. Thus, our system was able to detect the PersonRuns behavior. In addition to the above two processes, our verification process was also effective in helping reduce false detections.
Our system could not detect the PersonRuns behavior when a person ran only briefly (for less than one second), because the averaged velocity tended to be low in that case. It was particularly difficult to detect brief dashes by children. However, the time length of trajectories can be configured freely, so we can set an appropriate length depending on the target person.
For the PeopleMeet behavior, our result was fourth place out of eight systems. The average DCR of all systems was 2.349. Our DCR was 1.174, so our system detected the PeopleMeet behavior relatively accurately compared to the other systems.
The places where one person can meet another are limited, because people do not stop where traffic density is high. Before meeting someone, a person tends to slow down before stopping. A slight bias in the third principal component was confirmed in the trajectory feature space. The third principal component weighed heavily on the features of position and acceleration. We could therefore separate the trajectories of suspected PeopleMeet behavior from those of people who move without stopping. However, the distance from the center of the major normal trajectories to this behavior class was relatively small in the trajectory feature space, which caused many false detections. Even though the verification process reduced their number, the recognition accuracy was worse than for the PersonRuns behavior.
For the ObjectPut behavior, our result was sixth place out of eight systems. Our DCR was worse than the average DCR of all the systems. Our system is not reliable for detecting the ObjectPut behavior because that behavior is small compared to the other two behaviors, and it was difficult to recognize it from only a trajectory. In addition, there are many potential actions that fall under the category of ObjectPut. The behavior includes putting heavy baggage on the floor, picking up money at a register, and leaning on a luggage cart. Though the system accounts for optical flows in the human region, it considered only downward motion. There is still room for improvement in the detection of small motions. In future work, we should set assumptions more strictly or extract more effective features for detecting small motions.
5. Conclusion
We proposed a method that can detect specific human behaviors even in crowded surveillance video sequences, and we developed a system that detects the NIST-defined behaviors PersonRuns, PeopleMeet, and ObjectPut. The system recognizes these behaviors by identifying a human region trajectory, which is created by detecting and tracking areas that contain people, which we call human regions, in video sequences. The system detects these human regions using HOG descriptors and an SVM classifier and uses a Kalman filter to track them robustly. The similarity of two human regions is evaluated by the simple Euclidean distance between them and the Bhattacharyya distance of 2D color histograms.
The system determines an occurrence of a specific behavior based on the distance from a trajectory to each class of specific behaviors in the trajectory feature space created using PCA. The system also has functions to verify detected behaviors. For example, it uses backward tracking to verify fast motions, and it calculates optical flows to verify small motions. These functions contribute to robust recognition of specific types of behavior.
We evaluated the tracking accuracy of our system by comparing it with a baseline tracking method. Our system was able to track human regions more robustly than the baseline method even for crowded scenes. We also showed that the verification process could reduce false detections effectively. In addition, the results of the TRECVID 2009 surveillance event detection task showed that our system could recognize human behaviors robustly; our recognition capability won first place for detecting the PersonRuns behavior.
The system is suitable for detecting fast motions and big motions because it recognizes behaviors based on motion trajectories, which contain rich information about these motions. We plan to analyze local features, such as optical flows, in detail to expand the range of human behaviors that can be recognized.
- Bradski G, Davis J: Modeling people: vision-based understanding of a person's shape, appearance, movement, and behavior. Computer Vision and Image Understanding 2006, 104: 87-89. 10.1016/j.cviu.2006.09.002View ArticleGoogle Scholar
- Tomasi C, Kanade T: Detection and tracking of point features. Carnegie Mellon University; 1991:91-132.Google Scholar
- Moore D: A real-world system for human motion detection and tracking, Undergraduate Thesis. California Institute of Technology; May 2003.Google Scholar
- Dalal N, Triggs B: Histograms of oriented gradients for human detection. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005, San Diego, Calif, USA 886-893.Google Scholar
- Han F, Shan Y, Cekander R: A two-stage approach to people and vehicle detection with HOG-based SVM. PerMIS 2006, 133-140.Google Scholar
- Shiraki T, Saito H, Kamoshida Y, et al.: Real-time motion recognition using CHLAC features and cluster. Proceedings of IFIP International Conference on Network and Parallel Computing (NPC '06), 2006 50-56.Google Scholar
- Schüldt C, Laptev I, Caputo B: Recognizing human actions: a local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), August 2004, Cambridge, UK 3: 32-36.View ArticleGoogle Scholar
- Blank M, Gorelick L, Shechtman E, Irani M, Basri R: Actions as space-time shapes. Proceedings of the IEEE International Conference on Computer Vision, October 2005 2: 1395-1402.Google Scholar
- Fathi A, Mori G: Action recognition by learning mid-level motion features. Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008, Anchorage, Alaska, USA 1-8.Google Scholar
- Sun X, Chen M, Hauptmann A: Action recognition via local descriptors and holistic features. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), June 2009, Miami, Fla, USA 58-65.Google Scholar
- Li Z, Fu Y, Huang TS, Yan S: Real-time human motion recognition by luminance field trajectory analysis. Proceedings of ACM International Conference on Multimedia, October 2008, Vancouver, Canada 671-676.View ArticleGoogle Scholar
- Mikolajczyk K, Uemura H: Action recognition with motion-appearance vocabulary forest. Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), June 2008, Anchorage, Alaska, USAGoogle Scholar
- Efros AA, Berg AC, Mori G, Malik J: Recognizing action at a distance. Proceedings of the 9th IEEE International Conference on Computer Vision, October 2003, Nice, France 2: 726-733.
- Niebles JC, Fei-Fei L: A hierarchical model of shape and appearance for human motion classification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), June 2007 1-8.
- Lien K-C, Huang C-L: Multiview-based cooperative tracking of multiple human objects. Eurasip Journal on Image and Video Processing 2008, 2008.
- Tsai Y-T, Shih H-C, Huang C-L: Multiple human objects tracking in crowded scenes. Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), August 2006, Hong Kong 3: 51-54.
- Hu M, Ali S, Shah M: Learning motion patterns in crowded scenes using motion flow field. Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), December 2008, Tampa, Fla, USA 1-5.
- Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC: Estimating the support of a high-dimensional distribution. Neural Computation 2001, 13(7):1443-1471. 10.1162/089976601750264965
- Chen P-H, Lin C-J, Schölkopf B: A tutorial on ν-support vector machines. Applied Stochastic Models in Business and Industry 2005, 21(2):111-136. 10.1002/asmb.537
- Besse P, Ramsay JO: Principal components analysis of sampled functions. Psychometrika 1986, 51(2):285-311. 10.1007/BF02293986
- TREC Video Retrieval Evaluation http://www-nlpir.nist.gov/projects/trecvid/
- Bradski GR, Davis JW: Motion segmentation and pose recognition with motion history gradients. Machine Vision and Applications 2002, 13(3):174-184. 10.1007/s001380100064
- Dedeoğlu Y, Uğur Töreyin B, Güdükbay U, Enis Çetin A: Silhouette-based method for object classification and human motion recognition in video. Proceedings of the Computer Vision in Human-Computer Interaction, May 2006, Graz, Austria 3979: 64-77.
- Ke Y, Sukthankar R, Hebert M: Event detection in crowded videos. Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), October 2007, Rio de Janeiro, Brazil.
- Pavlovic V, Rehg JM, Cham T-J, Murphy KP: A dynamic Bayesian network approach to figure tracking using learned dynamic models. Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV '99), September 1999, Kerkyra, Greece 1: 94-101.
- Stauffer C, Grimson WEL: Adaptive background mixture models for real-time tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '99), June 1999, Fort Collins, Colo, USA 2: 246-252.
- Garrett Z, Saito H: Live video object tracking and segmentation using graph cuts. Proceedings of International Conference on Image Processing (ICIP '08), October 2008, San Diego, Calif, USA 1576-1579.
- National Institute of Standards and Technology http://www.nist.gov/index.html
- Xuan G, Chai P, Wu M: Bhattacharyya distance feature selection. Proceedings of the International Conference on Pattern Recognition (ICPR '96), 1996 2: 195-199.
- Grimble MJ: Robust Industrial Control: Optimal Design Approach for Polynomial Systems. Prentice-Hall, Upper Saddle River, NJ, USA; 1994.
- Yu X, Xu C, Tian Q, Leong HW: A ball tracking framework for broadcast soccer video. Proceedings of IEEE International Conference on Multimedia & Expo, 2003 2: 273-276.
- Mahalanobis P: On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India 1936, 2: 49-55.
- Horn BKP, Schunck BG: Determining optical flow. Artificial Intelligence 1981, 17(1–3):185-203.
- Shi J, Tomasi C: Good features to track. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), June 1994, Seattle, Wash, USA 593-600.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.