Basketball shooting technology based on acceleration sensor fusion motion capture technology

Computer vision recognition refers to the use of cameras and computers in place of the human eye to perform tasks such as target recognition, tracking, and measurement, together with further graphics processing that makes images more suitable for human viewing. Aiming at the problem of combining basketball shooting technique with visual recognition motion capture technology, this article introduces research on basketball shooting technique based on computer vision recognition fused with motion capture. The proposed approach first applies preprocessing operations such as background removal and filtering-based denoising to the acquired shooting video images to obtain the action characteristics of the people in the video sequence, and then uses a support vector machine (SVM) and a Gaussian mixture model to obtain the characteristics of the objects. Part of the data samples are extracted from the sample set to train the model; after training is completed, the remaining samples are classified and recognized. Simulation tests on the action database and on real shooting video show that the SVM identifies the actions that appear in the shooting video more quickly and effectively, with an average recognition accuracy of 95.9%. This verifies the feasibility of applying the technology to the recognition of shooting actions, which in turn helps to follow up on and improve shooting technique.


Introduction
In recent years, with the development of the big data era, motion capture technology built on computer vision recognition has become a hot research topic. In basketball training and competition, coaches should develop training programs tailored to individual athletes to improve their basketball skills. In traditional training, coaches develop these programs from their own theoretical training and experience, combined with the technical level of the players. This training mode takes a long time and may waste coaching resources. Modern sports training should be precise and efficient. Computer vision recognition motion capture ...
The innovations of this paper are (1) proposing background denoising for the images used in computer vision recognition, (2) establishing a support vector machine model for motion capture recognition of basketball shooting techniques, and (3) establishing a Gaussian mixture model to perform motion capture processing of basketball shooting techniques.

Method of basketball shooting technique based on computer vision recognition fusion motion capture technology

Method of recognition of shooting action
Basketball action recognition is a kind of human posture recognition, which has long been a research hotspot in various fields. Domestic research on basketball motion recognition has also been conducted; it extends the analysis to complex motions, such as turning to catch the ball, running, and doing a layup, and broadens the recognition of basketball shooting motions. At this stage, there are two main human posture recognition methods: recognition based on inertial sensors and recognition based on image acquisition [4]. Visual recognition based on image acquisition can be divided into single-video recognition and multi-video recognition [5]. The general idea of image-based recognition is to use sensors such as cameras to collect images or videos of athletes, then extract the features hidden in those images and videos, and finally design a classifier to recognize the athlete's motion posture [6]. The basic idea of inertial sensor recognition is to attach data acquisition sensors to the athlete's body, send the collected data to a terminal in real time, and identify the athlete's posture from the various data [7].

Background difference
The background difference method is suitable when the camera is installed in a stationary position, and it offers accurate detection, a simple algorithm, and easy implementation [8]. With further processing, the important data and the motion characteristics of the moving target can be extracted quickly and accurately [9]. In practical application scenarios, however, the background reference model is very sensitive to changes in the external scene, such as weather changes, lighting changes, and unexpected events. A good background reference model should therefore be created before the background difference method is used to extract the motion area [10].
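
As a minimal sketch of the background difference idea, the following Python code thresholds the absolute difference between a frame and a background reference image. NumPy, the function names, the threshold of 25 gray levels, and the running-average update with its learning rate are all illustrative assumptions; the article does not specify an implementation.

```python
import numpy as np


def background_difference(frame, background, threshold=25):
    """Return a boolean mask of pixels that differ from the background
    reference image by more than `threshold` gray levels (assumed value)."""
    # Cast to a signed type so the subtraction cannot wrap around on uint8 input.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold


def update_background(background, frame, learning_rate=0.05):
    """Blend the new frame into the reference image (running average).
    This update rule is an assumption used to follow slow lighting changes."""
    background = np.asarray(background, dtype=float)
    frame = np.asarray(frame, dtype=float)
    return (1.0 - learning_rate) * background + learning_rate * frame


# Synthetic example: a flat background and a bright 2 x 2 "target".
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
mask = background_difference(frame, background)
print(int(mask.sum()), "foreground pixels")
```

The running average is one simple way to keep the reference model from going stale when the scene changes gradually; both parameters would need tuning on real footage.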

Optical flow method
The advantage of the optical flow method is that it can extract the position information of the moving target in the video sequence relatively completely, and it also works when the camera itself is moving [11]. The optical flow method can therefore detect moving targets from a moving camera. It is well suited to accurate analysis and processing, and it addresses the problem of overlapping objects and occlusion in traditional moving target detection [12,13].
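
The article does not say which optical flow estimator is used. As one hedged illustration, the sketch below applies the Lucas-Kanade least-squares formulation to a single window, assuming small motion and constant brightness. The function name and the synthetic test pattern are hypothetical.

```python
import numpy as np


def lucas_kanade_flow(prev_frame, next_frame):
    """Estimate one displacement (u, v), in pixels, for a whole window.
    u is the shift along x (columns) and v the shift along y (rows)."""
    prev_frame = np.asarray(prev_frame, dtype=float)
    next_frame = np.asarray(next_frame, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    grad_y, grad_x = np.gradient(prev_frame)
    grad_t = next_frame - prev_frame
    # Brightness constancy: grad_x * u + grad_y * v = -grad_t at every pixel.
    A = np.stack([grad_x.ravel(), grad_y.ravel()], axis=1)
    b = -grad_t.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v


# Synthetic smooth pattern shifted by (0.5, 0.3) pixels between frames.
ys, xs = np.mgrid[0:64, 0:64]


def pattern(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)


u, v = lucas_kanade_flow(pattern(xs, ys), pattern(xs - 0.5, ys - 0.3))
print(round(float(u), 2), round(float(v), 2))
```

A single global estimate like this only suits a small patch with uniform motion; practical trackers solve the same system per window and usually add image pyramids for larger displacements.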

Frame difference
The frame difference method can be used to detect moving targets in dynamic scenes captured by a fixed camera [14]. It may fail to extract all the features of the moving target, so the detection results can be incomplete or slightly wrong. A further processing step is then generally required, which complicates subsequent analysis and processing of the image [15]. In addition, the moving target may not move at a constant speed. When its speed varies, the frame difference method may miss the target or detect only relatively small and shallow boundaries [16]. Although the frame difference method cannot extract the moving target accurately, it is commonly used as an initial algorithm to determine quickly whether a target has entered the scene [17].
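
The behavior described above, including the tendency to capture only boundaries, can be seen in a small sketch. The function names, the threshold, and the `min_pixels` entry test are assumptions chosen for illustration.

```python
import numpy as np


def frame_difference(frames, threshold=25):
    """Return one boolean motion mask per pair of consecutive frames."""
    masks = []
    for previous, current in zip(frames[:-1], frames[1:]):
        diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
        masks.append(diff > threshold)
    return masks


def target_entered(mask, min_pixels=3):
    """Cheap test for whether anything moved (min_pixels is an assumed value)."""
    return int(mask.sum()) >= min_pixels


# A 2 x 2 bright block appears, then slides one pixel to the right.
empty = np.zeros((8, 8), dtype=np.uint8)
first = empty.copy()
first[2:4, 2:4] = 255
second = empty.copy()
second[2:4, 3:5] = 255

masks = frame_difference([empty, first, second])
print(target_entered(masks[0]))   # the block entering the scene is detected
print(masks[1][2])                # only the trailing and leading edges change
```

In the second mask the column where the block overlaps itself is unchanged between frames, so only its edges are flagged, which is the incompleteness the text refers to.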

Image acquisition
The process of sampling and digitizing continuous image signals and then sending the resulting digital signals to frame memory or computer memory is called image acquisition [18]. Image acquisition can generally be divided into two categories: static image capture, which obtains an image at a given moment [19], and dynamic image capture, which obtains continuous images over a specific time period [20]. Static images are mainly taken by a camera, and the captured image is stored in the camera as a digital signal or transmitted directly to a computer for subsequent processing [21]. Dynamic images are mainly collected by storing the footage taken by the camera digitally on the camera's hard disk or by transmitting it directly to a computer for processing [22].

Image denoising
The intermediate process of removing and suppressing noise in an image is called image denoising, and it is generally part of image preprocessing [23]. With the rapid development of digital image processing technology, image denoising methods can generally be divided into two categories: mean filtering and median filtering [24]. Mean filtering operates directly on the original image to be processed. Depending on how it operates, mean filtering can act directly on each pixel in the image or on the neighborhood of the pixel being processed [25].

Mean filter
In image processing, the neighborhood averaging method is the most intuitive, simple, and easily applied denoising method, and it is widely used for image noise [26]. Mean filtering replaces the gray value of each pixel with the average of the pixels in its neighborhood, which suppresses pixels that do not represent their surroundings and makes the image smoother [27]. Assume that the image to be processed is m(a, b), that T is the kernel (the set of pixel coordinates in the neighborhood of (a, b)), that S is the total number of pixels in the kernel, and that the mean-filtered image is n(a, b). The filter can then be expressed as:

$$ n(a, b) = \frac{1}{S} \sum_{(i, j) \in T} m(i, j) $$
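
The neighborhood averaging described above can be sketched in a few lines of Python. The square kernel, the odd `size` parameter, and the edge-replication padding are assumptions made for this example; the article does not state how borders are handled.

```python
import numpy as np


def mean_filter(image, size=3):
    """Replace each pixel by the average of the size x size kernel around it.
    `size` is assumed to be odd; borders are padded by repeating edge pixels."""
    image = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    total = np.zeros((rows, cols), dtype=float)
    # Sum the S = size * size shifted copies of the image, then divide by S.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / (size * size)


# A single bright pixel is spread evenly over its 3 x 3 neighborhood.
noisy = np.zeros((5, 5))
noisy[2, 2] = 9.0
print(mean_filter(noisy))
```

The example also shows the known cost of mean filtering: the isolated spike is reduced, but it is smeared into its neighbors rather than removed.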

Median filter
Median filtering sorts the pixels in a certain neighborhood of the image and replaces the pixel at the center of the neighborhood with the middle value, instead of with the average of those pixels [28]. Assume that the k pixel values in the neighborhood, sorted in ascending order, are x_(1) ≤ x_(2) ≤ ... ≤ x_(k), and that x is their median. Then:

$$ x = \begin{cases} x_{((k+1)/2)}, & k \text{ odd} \\ \frac{1}{2}\left( x_{(k/2)} + x_{(k/2+1)} \right), & k \text{ even} \end{cases} $$

When k is an odd number, x is the middle value of the sorted sequence; when k is an even number, x is half the sum of the two middle values.
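
A minimal sketch of median filtering follows, again assuming a square kernel of odd size and edge-replication padding (neither is specified in the article). `np.median` handles both the odd and the even case of the median definition.

```python
import numpy as np


def median_filter(image, size=3):
    """Replace each pixel by the median of the size x size kernel around it.
    `size` is assumed to be odd; borders are padded by repeating edge pixels."""
    image = np.asarray(image)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    # Stack the shifted copies so each pixel's neighborhood lies along axis 0.
    windows = np.stack([
        padded[dy:dy + rows, dx:dx + cols]
        for dy in range(size)
        for dx in range(size)
    ])
    return np.median(windows, axis=0)


# One "salt" pixel on a flat background is removed without blurring.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
print(median_filter(noisy))
```

Unlike the mean filter, the isolated outlier never becomes the middle value of any neighborhood, so it disappears completely in this example.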

Template method
The core idea of the template-based shooting action classification method is to convert the action sequence into a static pattern or a set of static patterns and match it against known templates. Through similarity calculation, the category of the best-matching template is taken as the classification result. Depending on whether the object being matched is a static pattern or a time series, the method is further divided into template matching and dynamic time warping [29]. Template matching directly compares static templates with existing examples. The features available in the process include spatial features such as contours, gradients, and optical flow, as well as temporal features that carry timing information, such as trajectories.
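
To make the time-series branch concrete, here is a hedged sketch of dynamic time warping used as a nearest-template classifier. It works on one-dimensional feature sequences with an absolute-difference cost; the template values and the labels "shooting" and "dribbling" are invented for illustration and are not data from the article.

```python
import numpy as np


def dtw_distance(sequence_a, sequence_b):
    """Dynamic time warping distance between two 1-D sequences."""
    a = np.asarray(sequence_a, dtype=float)
    b = np.asarray(sequence_b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # a advances
                                     cost[i, j - 1],      # b advances
                                     cost[i - 1, j - 1])  # both advance
    return cost[n, m]


def classify_by_template(sequence, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))


# Hypothetical templates; the query is a slower version of the first one.
templates = {"shooting": [0, 1, 3, 1, 0], "dribbling": [0, 1, 0, 1, 0]}
print(classify_by_template([0, 1, 1, 3, 3, 1, 0], templates))
```

Because the warping path may repeat samples, an action performed more slowly or more quickly than its template can still match with a small distance, which plain frame-by-frame comparison cannot do.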

Statistical modeling
Statistical models can generally be divided into two categories: generative models and discriminative models. For a generative model, the training stage extracts the model parameters of each action category from the training sample set. The observation features to be classified are then input into each trained model, and the degree of correspondence with the model, that is, the generation likelihood, is calculated. The final classification result is the action category whose model has the highest matching degree. A discriminative model instead directly models the conditional probability of the action category given the observations. The most commonly used discriminative models are support vector machines and random fields [30].
Dynamic postures refer to the situation where the limbs of athletes are in motion; static postures refer to the situation where the limbs of athletes are completely still. The focus of basketball posture recognition is to recognize the various sports postures. To recognize the different sports postures in basketball effectively, the postures are divided in two stages. First, according to whether the motion state is periodic, the posture of the human body is divided into two categories: continuous action and instantaneous action. Second, according to the motion state of the upper or lower limbs, the posture is divided into seven postures: walking, running, dribbling, jumping, shooting, passing, and catching. The basketball posture recognition model automatically recognizes these seven postures of basketball players.

Sensor signal collection
In the data collection phase, sensor devices such as accelerometers, gyroscopes, angular velocity meters, and pressure sensors collect body posture information while the subject performs different actions. The basic method of human movement posture recognition is to attach sensors to key parts of the body to detect limb movement information. In basketball shooting action recognition, the movement information of the legs and arms is mainly collected. A sensor node formed by combining multiple sensor devices converts the action information into electrical signals for uploading and supports the subsequent logic operations, data storage, and communication. In practical applications, a single sensor module can rarely meet the requirements: the information needed for human posture recognition is complex and diverse, including physical and physiological quantities such as acceleration, angular velocity, and heart rate, and it must be analyzed and processed inside the node. The node design therefore includes multiple sensor modules that work together to meet the requirements of the system. Generally, a sensor node consists of four modules: a processor module, a power module, a sensor module, and a communication module. The processor module controls the normal operation of each functional module of the node and processes each signal; the sensor module detects the movement information of the object and converts it into electrical signals; the communication module is responsible for signal transmission, through which nodes send data wirelessly to other devices; and the power module provides the energy for the normal operation of the entire node.
At present, mobile devices such as mobile phones have also begun to integrate various sensor modules and provide wireless communication, so they can replace the sensor nodes worn on key parts of the body for signal collection. Unlike sensor nodes, however, mobile devices are not worn at a fixed location, which affects the recognition result of the system. Placing the device in a fixed position while the sensor detects motion information avoids this effect.

Shot recognition
The essence of the basketball posture recognition stage is to construct a classification model that fits the division of the basketball action data. For each specific basketball action, data collection, data preprocessing, data division, and feature extraction yield an attribute set that describes the action, namely the feature vector set. These feature vector sets are abstract representations of basketball actions, and their classifications can be obtained by passing them through the classifier model. The attributes contained in a feature vector are complex. To eliminate irrelevant and redundant attribute values, feature selection must be performed on the feature vector; the best-first search algorithm and principal component analysis are used for attribute selection. Feature selection reduces the dimensionality of the feature vector, lowers the complexity of the classification calculation, and improves the efficiency of the system. In this experiment, sensor nodes are fixed on the lower leg and the forearm of the subject to detect the movement information of the different limbs. According to the placement of the nodes, the data set of each movement is divided into an upper limb movement data set and a lower limb movement data set. Classifiers are constructed for the different sample sets to distinguish the specific actions of the upper and lower limbs, and combining the upper and lower limb results gives the basketball movement posture of the current subject. In this paper, a support vector machine (SVM) model is used to identify basketball shooting techniques.
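
The dimensionality reduction step mentioned above can be illustrated with a short principal component analysis sketch based on the singular value decomposition. The function name and the toy feature matrix are assumptions; the article does not report which features or how many components were kept.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors (one per row) onto their top principal components.
    Returns the reduced features and the variance fraction each component explains."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]


# Toy data: two perfectly correlated attributes collapse to one component.
samples = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
reduced, explained = pca_reduce(samples, n_components=1)
print(reduced.shape, explained)
```

The reduced vectors would then be passed to the classifier in place of the original attributes, which is how redundant attributes stop adding to the classification cost.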
When an SVM solves a two-class classification problem, it looks for an (h − 1)-dimensional hyperplane in the h-dimensional sample feature space to separate the two classes of samples. This plane is usually called a linear classifier, and when it separates the samples correctly, they are said to be linearly separable. To handle the linearly inseparable case, the SVM maps the sample points nonlinearly into a higher-dimensional or even infinite-dimensional space, in which the sample points can become linearly separable. Using a function k(x, y) that satisfies the Mercer condition as the inner product of two sample features is equivalent to mapping the samples from the original feature space to a new feature space. Suppose the sample feature is x_i, the sample category label is y_i, and the Lagrangian coefficient is a_i, and that b can be obtained from any support vector. The corresponding optimal classification function is then defined as:

$$ f(x) = \operatorname{sgn}\left( \sum_{i} a_i y_i \, k(x_i, x) + b \right) $$

This article also tries to use the Gaussian mixture model to eliminate interference from the background image in basketball shooting action recognition. In this model, assume that the pixel value of the recognized video at a certain moment t is Y_t, k is the number of Gaussian distributions (generally 3 to 5), ϖ_i is the weight of the i-th Gaussian distribution, μ_{i,t} and σ_{i,t} represent its mean and variance, respectively, and g is the Gaussian distribution function. The probability corresponding to Y_t is then:

$$ P(Y_t) = \sum_{i=1}^{k} \varpi_i \, g(Y_t, \mu_{i,t}, \sigma_{i,t}) $$

Zhao and Liu EURASIP Journal on Advances in Signal Processing (2021) 2021:21

Results and discussion

Results
The basketball shooting recognition model is used to capture shooting performance. Twenty basketball players are selected for analysis, sensors collect the signals, and the model generates the related action images for seven postures: walking, running, dribbling, jumping, shooting, passing, and receiving. A total of 140 sets of data are collected in this experiment, and the shooting results are summarized in Table 1. Table 1 shows the goals of each of the 20 basketball players; the actual goals and the judged goals differ from player to player. According to the model, the shooting accuracy ranges from 40 to 95%.
To display the experimental results more intuitively, the data in the table are drawn as a graph, as shown in Fig. 1. The 20 athletes are then divided into four groups of five, and their basketball goal percentages are plotted as a line chart, as shown in Fig. 2. According to the data collected by the basketball shot recognition model established in this article, only one of the twenty basketball players has a shooting accuracy higher than 90%. Four athletes have a shooting accuracy between 70 and 80%, six between 60 and 70%, and seven below 60%, which leaves two athletes between 80 and 90%.

Discussion
During the collection of sports posture data, testers should complete the required basketball actions in the default body posture and in their normal manner of exercise. Every basketball action includes upper limb movement and lower limb movement, so tracking basketball movements requires analyzing the upper and lower limb movements of athletes separately. For this reason, a classifier is constructed according to the characteristics of the athlete's upper and lower limbs to recognize the posture of basketball players. In this paper, two models, the support vector machine and the Gaussian mixture model, are used to recognize basketball players' shooting actions.
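
As a rough illustration of the two models named above, the sketch below evaluates a kernel SVM decision function and a Gaussian mixture probability of the kind described in the Shot recognition section. The RBF kernel, its `gamma` value, and every numeric value in the example (support vectors, coefficients, weights, means, variances) are hypothetical; the article does not report its kernel or trained parameters.

```python
import numpy as np


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, one common choice that satisfies the Mercer condition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))


def svm_decision(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Sign of sum_i a_i * y_i * k(x_i, x) + b, returned as +1 or -1."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1


def mixture_probability(pixel, weights, means, variances):
    """Weighted sum of 1-D Gaussian densities evaluated at one pixel value."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    densities = (np.exp(-(pixel - means) ** 2 / (2.0 * variances))
                 / np.sqrt(2.0 * np.pi * variances))
    return float(np.sum(weights * densities))


# Hypothetical two-support-vector classifier on a 1-D feature.
print(svm_decision([0.9], [[1.0], [-1.0]], [1, -1], [1.0, 1.0], bias=0.0))

# Hypothetical 3-component background model for one pixel.
weights, means, variances = [0.6, 0.3, 0.1], [100.0, 120.0, 200.0], [16.0, 25.0, 400.0]
print(mixture_probability(101.0, weights, means, variances) >
      mixture_probability(160.0, weights, means, variances))
```

In a background model of this kind, a pixel value with low mixture probability is treated as foreground, while the SVM decision function assigns the extracted features to one of two classes; multi-class recognition of several postures would need a scheme such as one-versus-rest on top of it.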
Two parameter values are set. The first is the distance threshold, which determines the number of typical postures (representatives); the second is the number of hypotheses for each action. The distance threshold decides whether two histograms are different, which in turn affects the number of hypotheses for an action. The "shooting" experiment is used to study the influence of this parameter, because the shooting action is relatively short and variable and is likely to be segmented into multiple hypotheses. When the distance threshold is lower than 0.4, the number of hypotheses stays high and unchanged; when it exceeds 0.4, the number of hypotheses decreases rapidly. The distance threshold should therefore lie between 0.4 and 0.8, depending on the required level of granularity. For all tests in this article, it is set to 0.4 to retain most of the intraclass variation. The data are tested with the support vector machine model and the Gaussian mixture model using the methods described above, and the results are plotted in Figs. 3 and 4.
To obtain the size of an object and its position from the video image, the relationship between a point on the object and the corresponding point in the image must be determined. The commonly used method is image calibration. Depending on whether a calibration object is required, image calibration is divided into traditional camera calibration and self-calibration. Traditional camera calibration places requirements on the camera model, and the size and shape of the calibration object must also meet certain requirements. Under these known conditions, image processing with mathematical transformation and calculation yields the internal and external parameters of the camera model. The camera self-calibration method does not require a specific calibration object, but is based on the positional relationship between the calibration of the camera's circular image and the corresponding image taken
Taking the experimental results of 100 tests as an example, the average accuracy of motion capture for computer vision recognition is 95.9% with the support vector machine model and 82.9% with the Gaussian mixture model. The support vector machine model therefore achieves the higher accuracy for visual recognition and capture of basketball shooting movements. It can be used in the teaching work of basketball coaches and in athlete training, where it helps capture shooting-related actions more accurately and generate specific images, allowing coaches and athletes to observe the defects of a movement clearly and correct them to improve training efficiency.

Conclusions
A computer vision recognition motion capture system is a technical device that measures the movement of objects in space. Its principle is based on computer graphics: sensors or trackers observe and record the trajectory of objects in three-dimensional space. Under current technical conditions, computer vision recognition fused with motion capture is used in this research on basketball shooting technique. With the rapid development of computer technology and the microelectronics industry, this technology will be used in sports work. We believe that in the near future, neural networks and deep learning will be applied to professional sports work, and that computer vision recognition systems will bring huge changes to traditional basketball teaching and training. The innovation of this article is to use a variety of methods, such as the data analysis method, the background difference method, the optical flow method, and the frame difference method, and to design two ways of classifying shooting actions, the template method and the statistical model method, which fully integrate computer vision recognition motion capture technology. The technology is applied to the teaching of basketball, thereby improving the quality of teaching and promoting the development of basketball.
In the early stage of the research, this paper puts forward methods of shooting action recognition. Basketball action recognition is a kind of human posture recognition, and the detection methods considered are the background difference method, the optical flow method, and the frame difference method. The background difference method is suitable when the camera is installed in a static state and offers accurate detection, a simple algorithm, and easy implementation. The advantage of the optical flow method is that, through calculation and analysis, it can extract the position information of the moving target in the video sequence more fully and can cope with a moving camera. The frame difference method is mainly suitable for detecting moving targets in dynamic scenes captured by fixed cameras; its main disadvantage is that it cannot fully output all the features and detection results related to the moving target. This article also describes a shooting action image processing method, which is divided into image acquisition and image denoising. Image acquisition covers static image acquisition, that is, taking photos to obtain an image at a certain moment, and dynamic image acquisition, that is, video shooting to obtain continuous images over a certain period of time. Image denoising is an intermediate process of removing and suppressing noise in the image, for which two algorithms, mean filtering and median filtering, are discussed. In addition, the article considers two methods for classifying shooting actions: the template method and the statistical model method.
In the experimental stage, this paper first builds a basketball shooting recognition model, then uses sensors to collect signals, and finally establishes a support vector machine (SVM) model and a Gaussian mixture model for shooting motion capture recognition, both to recognize the actions and to eliminate background image interference in basketball shooting motion recognition. Based on the analysis of the experiments, the article concludes that the average accuracy of motion capture using the support vector machine model for computer vision recognition is 95.9%, which is high. The model can be used in the teaching work of basketball coaches and athletes, where it helps improve teaching and training efficiency and supports the development of basketball.

Abbreviations
SVM: Support vector machine; FMC: Full matrix capture