### 3.1 Improved convolutional neural network model

The traditional CNN model obtains the probability distribution of the input through gradient descent and uses multiple convolution and pooling layers to classify the extracted features [24]. The improved CNN in this paper is realized through dimension reduction. Specifically, this paper decomposes the source image by the fast finite shear wave transform, improves the convolutional neural network, determines the number of iterations and the optimal weights of the network through experiments, and then fuses the image through the inverse fast finite shear wave transform.

First, the training set \( {\left\{{x}_i,{y}_i\right\}}_{i=1}^M \) is used as input, and the output estimation function is:

$$ f\left(C,W,{x}_i\right)={W}^T\psi \left({x}_i\right)+b $$

(2)

where *W* is the weight, *b* is the offset, *C* is the number of convolution operations, and *ψ* is the hidden-layer feature mapping applied to the convolved input *x*_{i}. Taking *L*[*C*, *W*, *x*_{i}] = [*y*_{i} − (*W*^{T}*ψ*(*x*_{i}) + *b*)]^{2} as the loss, the objective function is defined as:

$$ P\left(C,W\right)=\sum \limits_{i=1}^ML\left[{y}_i,f\left(C,W,{x}_i\right)\right]+\gamma \frac{{\left\Vert W\right\Vert}^2}{2} $$

(3)
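As a concrete illustration, the regularized objective in Eq. (3) can be sketched as follows. This is a minimal toy example, assuming a linear feature map *ψ*(*x*) = *x* and a squared loss; the paper's *ψ* is actually induced by the convolutional layers.

```python
import numpy as np

def psi(x):
    # Placeholder feature map; in the paper, psi is induced by the conv layers.
    return x

def f(W, x, b):
    # Output estimate of Eq. (2): f = W^T psi(x) + b
    return W @ psi(x) + b

def objective(W, b, X, y, gamma):
    # Eq. (3): sum of squared losses plus an L2 penalty on the weights.
    losses = sum((yi - f(W, xi, b)) ** 2 for xi, yi in zip(X, y))
    return losses + gamma * np.dot(W, W) / 2

X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
y = [1.0, -1.0]
W = np.array([1.0, -1.0])
print(objective(W, 0.0, X, y, gamma=0.1))  # 0 loss + 0.1 * 2 / 2 = 0.1
```

The penalty term γ‖*W*‖²/2 discourages large weights, which is what allows the kernel reformulation in Eq. (4).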

Assuming that the input equals the output, the estimation function is as follows:

$$ f(x)=\sum \limits_{i=1}^MK\left( Cx,{Cx}_i\right)+b $$

(4)

where the kernel function is *K*(*Cx*, *Cx*_{i}) = *ψ*(*Cx*)^{T}*ψ*(*Cx*_{i}), *i* = 1, …, *M*. The formula for CNN's maximum pooling layer is as follows:

$$ {S}_{i,j,v}\left(\varphi \right)={\left(\sum \limits_{h=-\left\lfloor k/2\right\rfloor}^{\left\lfloor k/2\right\rfloor}\sum \limits_{w=-\left\lfloor k/2\right\rfloor}^{\left\lfloor k/2\right\rfloor }{\left|{\varphi}_{g\left(h,w,i,j,v\right)}\right|}^p\right)}^{1/p} $$

(5)
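The pooling window of Eq. (5) can be sketched for a single output location as below. This is a minimal illustration with a *k* × *k* neighborhood (symbols follow the text); note that as *p* → ∞ the *L*^{p} pooling of Eq. (5) approaches ordinary max pooling.

```python
import numpy as np

def lp_pool_at(phi, i, j, v, k=3, p=2):
    # phi: H x W x G feature array. Sum |.|^p over a k x k window centered
    # at (i, j) in channel v, then take the 1/p root, as in Eq. (5).
    r = k // 2
    window = phi[i - r:i + r + 1, j - r:j + r + 1, v]
    return (np.abs(window) ** p).sum() ** (1.0 / p)

phi = np.zeros((5, 5, 1))
phi[2, 2, 0] = 3.0
phi[2, 3, 0] = 4.0
print(lp_pool_at(phi, 2, 2, 0, k=3, p=2))  # sqrt(3^2 + 4^2) = 5.0
```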

where *φ* is the *W×H×G* three-dimensional array of the input image, i.e., its feature map. *H* is the height, *W* is the width, *G* is the number of channels, and *k* is the pooling window size. The index function *g*(*h*, *w*, *i*, *j*, *v*) maps an output coordinate to the corresponding input coordinate at stride *R*. After continuous iteration, when *k* = 3 and *R* = 2, the result maintains the invariance of the feature scale to a great extent; the formula is:

$$ {u}_{i,j,l}\left(\varphi \right)=f\left(\sum \limits_{h=-\left\lfloor k/2\right\rfloor}^{\left\lfloor k/2\right\rfloor}\sum \limits_{w=-\left\lfloor k/2\right\rfloor}^{\left\lfloor k/2\right\rfloor}\sum \limits_{v=1}^M{\theta}_{h,w,v,l}\cdot \kern0.5em {\varphi}_{g\left(h,w,i,j,v\right)}\right) $$

(6)

where *θ* is the kernel weight, *l* indexes the output feature maps, *u*_{i,j,l} is the output at coordinates (*i*, *j*) of the *l*-th feature map, and the activation function *f* is ReLU.
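The convolution unit of Eq. (6) can be sketched for one output position as follows: a *k* × *k* kernel *θ* summed over all *M* input channels and passed through ReLU. This is a simplified illustration restricted to valid (interior) positions; padding and stride handling are omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_at(phi, theta, i, j):
    # phi: H x W x M input; theta: k x k x M kernel for one output map l.
    # Eq. (6): sum theta * phi over the k x k window and all channels,
    # then apply the ReLU activation f.
    k = theta.shape[0]
    r = k // 2
    window = phi[i - r:i + r + 1, j - r:j + r + 1, :]
    return relu((theta * window).sum())

phi = np.ones((5, 5, 2))
theta = np.full((3, 3, 2), 0.5)
print(conv_at(phi, theta, 2, 2))  # 3 * 3 * 2 * 0.5 = 9.0
```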

The improved CNN greatly reduces the loss of information and effectively extracts the features and details of each image.

### 3.2 Integration process

The medical image fusion algorithm combining the fast finite shear wave transform and the convolutional neural network consists of four steps.

First, source images A and B are decomposed by the fast finite shear wave transform to obtain *L*{*A*}^{l} and *L*{*B*}^{l}, which denote the decompositions of source image A and source image B at layer *l*, respectively.

Second, the weights of source image *A* and source image *B* are generated by the improved convolutional neural network, yielding the weight map *W*.

Third, calculate the regional energy of *L*{*A*}^{l} and *L*{*B*}^{l}, as shown in Eq. (7).

$$ {\displaystyle \begin{array}{c}{E}_A^l\left(x,y\right)=\sum \limits_m\sum \limits_nL{\left\{A\right\}}^l{\left(x+m,y+n\right)}^2\\ {}{E}_B^l\left(x,y\right)=\sum \limits_m\sum \limits_nL{\left\{B\right\}}^l{\left(x+m,y+n\right)}^2\end{array}} $$

(7)
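The regional energy of Eq. (7) is the sum of squared subband coefficients over a window around (*x*, *y*). A minimal sketch follows; a 3 × 3 window is assumed here, since the text does not fix the ranges of *m* and *n*.

```python
import numpy as np

def regional_energy(L, x, y, radius=1):
    # L: one decomposition layer (e.g. L{A}^l) as a 2D array.
    # Eq. (7): sum of squared coefficients over the window around (x, y).
    window = L[x - radius:x + radius + 1, y - radius:y + radius + 1]
    return (window ** 2).sum()

L_A = np.ones((5, 5))
print(regional_energy(L_A, 2, 2))  # nine coefficients of 1 -> 9.0
```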

The weight map *W* is fed into the *l*-layer fast finite shear wave transform to obtain its decomposition *L*{*W*}^{l}. For an *H×W* source image, the floor function gives the maximum number of decomposition layers, ⌊log_{2} min(*H*, *W*)⌋. Then, formula (8) is used for fusion.

$$ L{\left\{C\right\}}^l\left(x,y\right)=G{\left\{W\right\}}^l\left(x,y\right)\cdot L{\left\{A\right\}}^l\left(x,y\right)+\left(1-G{\left\{W\right\}}^l\left(x,y\right)\right)\cdot L{\left\{B\right\}}^l\left(x,y\right) $$

(8)
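Under the reading that Eq. (8) is a per-pixel weighted average, with *G*{*W*}^{l} acting as a soft weight in [0, 1] that blends the two decomposition layers, the fusion rule can be sketched as below. The depth cap ⌊log_{2} min(*H*, *W*)⌋ from the text is also checked; this is an illustrative sketch, not the paper's implementation.

```python
import math
import numpy as np

def fuse_layer(G_W, L_A, L_B):
    # Eq. (8): L{C}^l = G{W}^l * L{A}^l + (1 - G{W}^l) * L{B}^l, elementwise.
    return G_W * L_A + (1.0 - G_W) * L_B

def max_layers(H, W):
    # Maximum decomposition depth for an H x W image: floor(log2(min(H, W))).
    return math.floor(math.log2(min(H, W)))

G_W = np.array([[0.0, 0.5, 1.0]])
L_A = np.array([[2.0, 2.0, 2.0]])
L_B = np.array([[4.0, 4.0, 4.0]])
print(fuse_layer(G_W, L_A, L_B))  # [[4.0, 3.0, 2.0]]
print(max_layers(512, 512))       # floor(log2(512)) = 9
```

A weight of 0 keeps only *L*{*B*}^{l}, a weight of 1 keeps only *L*{*A*}^{l}, and intermediate values blend the two subbands.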

Finally, the fused image C is reconstructed from *L*{*C*}^{l} by the inverse fast finite shear wave transform. The flow chart of the fusion is shown in Fig. 4.

The above algorithm can reduce the impact of weight allocation on the experiment, and overcome the problems of incomplete and unclear information in the image.