This repository is the official implementation of CMFS, a unified framework that leverages CLIP-guided modality interaction to mitigate noise in multi-modal image fusion and segmentation.
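To make the idea of CLIP-guided modality interaction concrete, below is a minimal, illustrative sketch (assumed PyTorch) of how a guidance embedding from CLIP could re-weight features from two imaging modalities before fusion. The module name, gating design, and tensor shapes are assumptions for illustration only and do not reproduce this repository's actual implementation.

```python
# Illustrative sketch only: a tiny CLIP-guided interaction block that
# re-weights features from two image modalities using a shared guidance
# embedding (e.g., a CLIP text/image feature). Names are hypothetical,
# not taken from this repository.
import torch
import torch.nn as nn


class CLIPGuidedInteraction(nn.Module):
    def __init__(self, feat_dim: int, clip_dim: int = 512):
        super().__init__()
        # Project the CLIP guidance vector into the image-feature space.
        self.guide_proj = nn.Linear(clip_dim, feat_dim)
        # Per-modality gates predict channel-wise weights from the guidance,
        # which can down-weight noisy channels in each modality.
        self.gate_a = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, feat_a, feat_b, clip_embed):
        # feat_a, feat_b: (B, C, H, W) features from two imaging modalities.
        # clip_embed:     (B, clip_dim) guidance embedding from CLIP.
        guide = self.guide_proj(clip_embed)              # (B, C)
        w_a = self.gate_a(guide)[:, :, None, None]       # (B, C, 1, 1)
        w_b = self.gate_b(guide)[:, :, None, None]
        # Gate each modality, then fuse with a 1x1 convolution.
        fused = self.fuse(torch.cat([feat_a * w_a, feat_b * w_b], dim=1))
        return fused


if __name__ == "__main__":
    block = CLIPGuidedInteraction(feat_dim=64)
    a = torch.randn(2, 64, 32, 32)   # e.g., visible-image features
    b = torch.randn(2, 64, 32, 32)   # e.g., infrared-image features
    text = torch.randn(2, 512)       # stand-in for a CLIP embedding
    print(block(a, b, text).shape)   # torch.Size([2, 64, 32, 32])
```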