Skip to main content
Fig. 2 | EURASIP Journal on Advances in Signal Processing

Fig. 2

From: Level-wise aligned dual networks for text–video retrieval

Fig. 2

The framework of our proposed LADN method. LADN first utilizes multi-level encoders to extract global, temporal, local, and spatial–temporal information in videos and text. Then, they are mapped into four different latent spaces and a semantic space. Finally, LADN aligns different levels of information in various spaces

Back to article page