Multimodal Fusion Roadmap

Main Objective

Reduce false alarms by combining radar, vision, and audio evidence into a single final decision system.

Best Practical Algorithms

Radar

Recommended baseline:

  1. OFDM range-Doppler processing,
  2. CA-CFAR,
  3. M/N temporal confirmation,
  4. simple Kalman or alpha-beta tracking.
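The CFAR detection and M/N confirmation steps above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function names, window sizes, and false-alarm rate are assumptions.

```python
import numpy as np

def ca_cfar_1d(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR along one range profile.

    For each cell under test, average the training cells on both sides
    (skipping guard cells) and compare the cell against a scaled noise
    estimate derived from the desired false-alarm probability.
    """
    n = len(power)
    num_cells = 2 * num_train
    # Classic CA-CFAR threshold factor for exponential noise.
    alpha = num_cells * (pfa ** (-1.0 / num_cells) - 1.0)
    detections = np.zeros(n, dtype=bool)
    for i in range(num_train + num_guard, n - num_train - num_guard):
        lead = power[i - num_guard - num_train : i - num_guard]
        lag = power[i + num_guard + 1 : i + num_guard + num_train + 1]
        noise = (lead.sum() + lag.sum()) / num_cells
        detections[i] = power[i] > alpha * noise
    return detections

def m_of_n_confirm(det_history, m=2):
    """Confirm a cell only if it fired in at least m of the stacked frames."""
    return np.asarray(det_history).sum(axis=0) >= m
```

The M/N stage trades a little latency for a large reduction in isolated single-frame false alarms, which is exactly the failure mode CFAR alone cannot fix.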

Vision

Recommended practical path:

  1. YOLO11 detector,
  2. ByteTrack or BoT-SORT style tracking.
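The core association idea behind ByteTrack-style tracking can be sketched as a greedy two-stage matcher: confident detections are matched to tracks first, then low-score leftovers are used to recover tracks that would otherwise be dropped. This is a simplified illustration (real trackers add Kalman motion prediction and track lifecycle management), and all names and thresholds here are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, score_thresh=0.5, iou_thresh=0.3):
    """Greedy two-stage association in the spirit of ByteTrack:
    match high-score detections first, then the low-score remainder."""
    high = [d for d in detections if d["score"] >= score_thresh]
    low = [d for d in detections if d["score"] < score_thresh]
    matches, unmatched = [], list(range(len(tracks)))
    for pool in (high, low):
        for det in pool:
            best, best_iou = None, iou_thresh
            for ti in unmatched:
                ov = iou(tracks[ti]["box"], det["box"])
                if ov > best_iou:
                    best, best_iou = ti, ov
            if best is not None:
                matches.append((best, det))
                unmatched.remove(best)
    return matches, unmatched
```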

Recommended research-heavy alternative:

  1. RT-DETR.

Audio

Recommended baseline:

  1. MFCC + spectral features,
  2. LightGBM or similar classifier.
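As a stand-in for a full MFCC front end, here is a minimal sketch of per-frame spectral features in NumPy. The feature choice and names are illustrative assumptions; in practice a library such as librosa would supply MFCCs, and the resulting feature vectors would feed the LightGBM-style classifier.

```python
import numpy as np

def spectral_features(frame, sr=16000):
    """Toy per-frame features (spectral centroid and 85% rolloff) as
    lightweight stand-ins for a fuller MFCC front end."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Centroid: magnitude-weighted mean frequency of the frame.
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-9)
    # Rolloff: frequency below which 85% of spectral energy lies.
    cum = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cum, 0.85 * cum[-1])]
    return np.array([centroid, rolloff])
```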

Recommended upgrade path:

  1. pretrained embeddings from PANNs,
  2. or AST/SSAST-style audio spectrogram models.

Best Fusion Strategy

Start with late fusion of calibrated per-modality scores (e.g., Platt scaling or isotonic regression applied to each modality's raw outputs).

Example idea:

P_fused = w_r * P_radar + w_v * P_vision + w_a * P_audio

Then require persistence over multiple consecutive windows before raising an alarm.
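A minimal sketch of the weighted late fusion and persistence rule above, assuming each modality already outputs a calibrated probability; the weights, threshold, and function names are illustrative:

```python
def fuse(p_radar, p_vision, p_audio, w=(0.5, 0.3, 0.2)):
    """Weighted late fusion of calibrated per-modality scores.
    The weights are illustrative and should sum to 1."""
    return w[0] * p_radar + w[1] * p_vision + w[2] * p_audio

def persistent_alarm(fused_history, threshold=0.6, m=3):
    """Raise an alarm only if the fused score exceeded the threshold
    in at least m of the buffered windows."""
    return sum(p > threshold for p in fused_history) >= m
```

In practice the weights would be tuned on validation data, and the persistence parameters (buffer length, m, threshold) set against the target false-alarm budget.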

Required Experiments

You must compare:

  1. radar only,
  2. radar + vision,
  3. radar + audio,
  4. radar + vision + audio.
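The four-way comparison can be organized as a simple loop over modality configurations. The names here (`run_ablation`, `false_alarm_rate`) are hypothetical, assuming each configuration produces a vector of per-window alarm decisions aligned with ground-truth labels:

```python
import numpy as np

def false_alarm_rate(preds, labels):
    """Fraction of negative (no-target) windows that triggered an alarm."""
    neg = labels == 0
    return preds[neg].mean()

def run_ablation(decisions_by_config, labels):
    """Report the false-alarm rate for each modality configuration."""
    return {name: false_alarm_rate(preds, labels)
            for name, preds in decisions_by_config.items()}
```

Reporting detection rate alongside false-alarm rate for each configuration makes the trade-off explicit rather than letting fusion hide missed detections.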

Running these ablations is the only way to substantiate the claim that fusion reduces false alarms.

Execution Order

  1. stabilize radar-only baseline,
  2. add temporal tracking,
  3. integrate vision,
  4. integrate audio,
  5. align timestamps,
  6. add late fusion,
  7. run ablations,
  8. only then try more advanced meta-classifiers.
