Time analysis of the proposed convolution algorithm. The charts show the time (y-axis) spent in each phase of the algorithm as a function of the decomposition parameter (x-axis). The single-GPU implementation has two parallel timelines, one for the CPU and one for the GPU. The multi-GPU implementation was tested with two GPUs, so it has three parallel timelines: one for the CPU and one for each GPU. The overall running time of the algorithm is given by the top of the CPU timeline.