A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection