Fig. 4
From: CenterTransFuser: radar point cloud and visual information fusion for 3D object detection

The cross-transformer model consists of two main parts: the encoder and the decoder. The query matrix of the radar branch guides image information, and the query matrix of the image branch guides radar information, into multihead cross-attention for cross-modal interaction. The outputs then pass into multihead joint cross-attention for deeper contextual interaction.
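
The cross-modal step in the caption can be illustrated with a minimal NumPy sketch of multihead cross-attention, in which one branch supplies the queries and the other supplies the keys and values. This is not the authors' implementation: the token counts, feature width, random weights, and all names (`multihead_cross_attention`, `radar_tokens`, `image_tokens`, `random_weights`) are assumptions chosen for illustration. The later joint cross-attention stage is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_cross_attention(query_src, kv_src, w_q, w_k, w_v, w_o, num_heads):
    """Queries come from one modality, keys and values from the other.

    query_src: (n_q, d) tokens of the branch that asks the question.
    kv_src:    (n_kv, d) tokens of the branch being attended over.
    Returns the fused (n_q, d) tokens and the (heads, n_q, n_kv) weights.
    """
    n_q, d = query_src.shape
    n_kv = kv_src.shape[0]
    d_head = d // num_heads

    # Project, then split the feature axis into heads: (heads, tokens, d_head).
    q = (query_src @ w_q).reshape(n_q, num_heads, d_head).transpose(1, 0, 2)
    k = (kv_src @ w_k).reshape(n_kv, num_heads, d_head).transpose(1, 0, 2)
    v = (kv_src @ w_v).reshape(n_kv, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v

    # Merge heads back to (n_q, d) and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)
    return merged @ w_o, weights


# Hypothetical sizes: 5 radar tokens, 12 image tokens, 8-dim features, 2 heads.
rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
radar_tokens = rng.normal(size=(5, d_model))
image_tokens = rng.normal(size=(12, d_model))


def random_weights():
    # Four illustrative projection matrices: W_q, W_k, W_v, W_o.
    return [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
            for _ in range(4)]


radar_params = random_weights()
image_params = random_weights()

# Radar queries attend over image keys/values, and vice versa.
radar_out, radar_attn = multihead_cross_attention(
    radar_tokens, image_tokens, *radar_params, num_heads=num_heads)
image_out, image_attn = multihead_cross_attention(
    image_tokens, radar_tokens, *image_params, num_heads=num_heads)
```

Each branch keeps its own token count in the output (`radar_out` has one row per radar token, `image_out` one row per image token), while every output row is a weighted mixture of the other modality's value vectors. That is the sense in which one branch's queries "guide" the other branch's information.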