Timeline: single-GPU versus multi-GPU (a model situation). Dark boxes depict data transfers between the CPU and a GPU; light boxes represent convolution computations. The first row shows the single-GPU implementation. The second row shows two GPUs used in parallel: the data transfers run concurrently but share a common bus, so each takes twice as long. In the third row, the data transfers are synchronized so that only one is in progress at a time. In the last row, the data transfers are overlapped with convolution execution.