Multimodal Fusion Roadmap
Main Objective
Reduce false alarms by combining radar, vision, and audio into a single fused decision system.
Best Practical Algorithms
Radar
Recommended baseline:
- OFDM range-Doppler processing,
- CA-CFAR,
- M/N temporal confirmation,
- simple Kalman or alpha-beta tracking.
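The CA-CFAR and M/N confirmation stages above can be sketched as follows. This is a minimal 1-D illustration, assuming one range profile per frame; the guard/train window sizes, scale factor, and M/N values are placeholder choices, not tuned parameters.

```python
import numpy as np

def ca_cfar(x, guard=2, train=8, scale=3.0):
    """Cell-averaging CFAR: flag cells exceeding scale * local noise estimate."""
    n = len(x)
    hits = np.zeros(n, dtype=bool)
    for i in range(train + guard, n - train - guard):
        left = x[i - guard - train : i - guard]          # training cells, left
        right = x[i + guard + 1 : i + guard + 1 + train]  # training cells, right
        noise = np.concatenate([left, right]).mean()
        hits[i] = x[i] > scale * noise
    return hits

def m_of_n(frame_hits, m=3, n=5):
    """Confirm a detection only if it appears in >= m of the last n frames."""
    recent = frame_hits[-n:]
    return sum(recent) >= m
```

A real pipeline would run CA-CFAR on 2-D range-Doppler maps, but the thresholding logic is the same.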
Vision
Recommended practical path:
- YOLO11 detector,
- ByteTrack or BoT-SORT style tracking.
Recommended research-heavy alternative:
- RT-DETR.
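At the core of ByteTrack/BoT-SORT-style tracking is IoU-based association between existing tracks and new detections. The sketch below shows only that greedy matching step, under assumed box format (x1, y1, x2, y2) and an arbitrary IoU threshold; real trackers add Kalman motion prediction, a second low-score matching pass, and (for BoT-SORT) re-ID features.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.3):
    """Greedily match tracks to detections in descending IoU order."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)),
                   reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < thresh or ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    return matches
```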
Audio
Recommended baseline:
- MFCC + spectral features,
- LightGBM or similar classifier.
Recommended upgrade path:
- pretrained embeddings from PANNs,
- or AST/SSAST-style audio spectrogram models.
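To make the spectral-feature part of the baseline concrete, here is a numpy-only sketch of two common frame-level features (log-energy and spectral centroid). In practice a library such as librosa would supply MFCCs and the rest; the frame length and sample rate below are arbitrary assumptions.

```python
import numpy as np

def spectral_centroid(frame, sr=16000):
    """Magnitude-weighted mean frequency (Hz) of one audio frame."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float((freqs * mag).sum() / mag.sum()) if mag.sum() else 0.0

def frame_features(signal, frame_len=512, sr=16000):
    """Per-frame [log-energy, spectral centroid] rows for a classifier."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        f = signal[start:start + frame_len]
        rows.append([np.log(np.sum(f ** 2) + 1e-12),
                     spectral_centroid(f, sr)])
    return np.array(rows)
```

Feature rows like these (plus MFCCs) would then be fed to LightGBM or a similar classifier.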
Best Fusion Strategy
Start with late fusion using calibrated per-modality scores.
Example idea:
P_fused = w_r * P_radar + w_v * P_vision + w_a * P_audio, with w_r + w_v + w_a = 1
Then require persistence over multiple windows before raising an alarm.
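The weighted-sum fusion plus persistence gating can be sketched in a few lines. The weights and the 3-of-4 persistence window below are placeholder values, assuming each per-modality score is already calibrated to a probability.

```python
def fuse(p_radar, p_vision, p_audio, w=(0.5, 0.3, 0.2)):
    """Weighted sum of calibrated per-modality probabilities (weights sum to 1)."""
    return w[0] * p_radar + w[1] * p_vision + w[2] * p_audio

def persistent_alarm(fused_history, thresh=0.6, m=3, n=4):
    """Alarm only if the fused score clears thresh in >= m of the last n windows."""
    recent = fused_history[-n:]
    return sum(p > thresh for p in recent) >= m
```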
Required Experiments
You must compare:
- radar only,
- radar + vision,
- radar + audio,
- radar + vision + audio.
Running these ablations is the only way to substantiate the claim that fusion reduces false alarms.
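The ablation comparisons above can be driven by one loop that fuses each modality subset, renormalizing the weights over the modalities present. The base weights and scores here are toy assumptions for illustration.

```python
import itertools

BASE_W = {"radar": 0.5, "vision": 0.3, "audio": 0.2}  # assumed fusion weights

def fuse_subset(scores, subset):
    """Fuse only the modalities in `subset`, renormalizing their weights."""
    total = sum(BASE_W[m] for m in subset)
    return sum(BASE_W[m] / total * scores[m] for m in subset)

def ablation_table(scores):
    """Fused score for radar-only and every radar+X modality combination."""
    out = {}
    for k in range(0, 3):
        for extra in itertools.combinations(("vision", "audio"), k):
            subset = ("radar",) + extra
            out[subset] = fuse_subset(scores, subset)
    return out
```

In a real experiment the same loop would wrap the full evaluation (false-alarm rate, detection rate) rather than a single fused score.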
Execution Order
- stabilize radar-only baseline,
- add temporal tracking,
- integrate vision,
- integrate audio,
- align timestamps,
- add late fusion,
- run ablations,
- only then try more advanced meta-classifiers.