This repository is the official implementation of CMFS, a unified framework that leverages CLIP-guided modality interaction to mitigate noise in multi-modal image fusion and segmentation.
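To make the idea of CLIP-guided modality interaction concrete, below is a minimal, hypothetical PyTorch sketch of how a CLIP semantic prior could gate and mix two modality feature maps before the fusion and segmentation heads. This is not the repository's actual module: the class name `CLIPGuidedInteraction`, the channel-gating design, and the dimensions `feat_dim`/`clip_dim` are illustrative assumptions only.

```python
import torch
import torch.nn as nn


class CLIPGuidedInteraction(nn.Module):
    """Hypothetical sketch: a CLIP embedding gates per-channel features of two
    modalities (e.g. infrared / visible) and a 1x1 conv mixes them, so that
    channels the semantic prior marks as noisy are suppressed before fusion."""

    def __init__(self, feat_dim=256, clip_dim=512):
        super().__init__()
        # Project the CLIP semantic prior to per-channel gates for each modality.
        self.gate_ir = nn.Linear(clip_dim, feat_dim)
        self.gate_vis = nn.Linear(clip_dim, feat_dim)
        # Lightweight cross-modal mixing after gating.
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, feat_ir, feat_vis, clip_emb):
        # feat_ir, feat_vis: (B, C, H, W) modality features; clip_emb: (B, clip_dim)
        g_ir = torch.sigmoid(self.gate_ir(clip_emb))[:, :, None, None]
        g_vis = torch.sigmoid(self.gate_vis(clip_emb))[:, :, None, None]
        # Down-weight channels the semantic prior deems unreliable (noisy).
        feat_ir = feat_ir * g_ir
        feat_vis = feat_vis * g_vis
        return self.fuse(torch.cat([feat_ir, feat_vis], dim=1))


if __name__ == "__main__":
    m = CLIPGuidedInteraction()
    ir = torch.randn(2, 256, 32, 32)
    vis = torch.randn(2, 256, 32, 32)
    emb = torch.randn(2, 512)  # stands in for a frozen CLIP text/image embedding
    print(m(ir, vis, emb).shape)  # torch.Size([2, 256, 32, 32])
```

The gated output would then feed both the fusion decoder and the segmentation head in a shared-backbone setup; the actual CMFS interaction mechanism may differ.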