Fig. 4
From: Level-wise aligned dual networks for text–video retrieval

Text-to-video retrieval results of our LADN and Dual Encoding on the MSR-VTT B partition [18]. The top 3 ranked videos and the ground truth are shown for each query. The ground truth is marked with a red box, and the other videos are marked with green boxes. The last column shows the predicted concepts corresponding to the second column.