Abstract: In the field of multimodal learning, controlling the training of unimodal encoders from different perspectives is a primary approach to addressing Training Imbalance. However, the inherent ...