Why Fusion Works: Cross-Modal Learning
- DetectED

- Apr 6
Updated: Apr 22
The Clinical Insight
Doctors don't rely on a single test. They combine:
Patient history (risk factors)
Physical exam (palpation, auscultation)
Imaging (X-ray, CT, ultrasound)
Lab tests (biopsy, blood work)
Each test provides different information. Together, they form a complete picture.
THORACIS AI does the same thing with two modalities:
Modality | What It Measures | Clinical Analogy |
Microwave | Dielectric contrast (tumor vs. healthy) | Imaging (CT/X-ray) |
Acoustic | Respiratory sound patterns | Auscultation (stethoscope) |
The Limitation of Single Modalities
Microwave alone (35.6% accuracy):
Detects structural anomalies (tumor vs. no tumor)
Cannot distinguish disease types (asthma vs. pneumonia)
Why? Different diseases can create similar dielectric changes
Acoustic alone (86.4% accuracy):
Classifies functional patterns (wheezes, crackles)
Cannot localize structural anomalies
Why? Different diseases can produce similar sounds
The Fusion Advantage
When we combine them (845 features → XGBoost → 99.3% accuracy), the model learns cross-modal correlations:
If microwave detects... | And acoustic detects... | Fusion learns... |
Tumor | Pneumonia pattern | Tumor causes pneumonia-like obstruction |
Tumor | Asthma pattern | Tumor causes airway narrowing (wheezing) |
No tumor | Pneumonia pattern | Likely infectious pneumonia (no structural mass) |
The Biological Reality
Why would a tumor produce pneumonia-like sounds?
When a tumor grows in the lung, it:
Occupies space (mass effect)
Obstructs airways (partial blockage)
Causes inflammation (immune response)
Accumulates fluid (edema)
These effects create crackles and wheezes—the same acoustic signatures as pneumonia.
So a patient with a tumor might sound like they have pneumonia. A clinician hearing only the sound might misdiagnose the patient, but combining the sound with microwave imaging reveals the underlying mass.
That's exactly what THORACIS AI does—automatically.
How Fusion Is Implemented
We use feature-level fusion (concatenation before classification):
Microwave Scan → Feature Extraction → [840 features]
↓
Audio Recording → YAMNet → Classifier → [5 probabilities]
↓
Concatenate → [845 features]
↓
XGBoost → Diagnosis
Alternative fusion methods we considered:
Method | Description | Why We Did or Didn't Use It |
Early fusion | Raw data fusion (before feature extraction) | Not used: microwave and audio have different sampling rates/formats |
Late fusion | Combine predictions after separate models | Not used: misses cross-modal correlations |
Feature fusion | Combine extracted features | Used: captures correlations, easy to implement |
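The concatenation step in the pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the THORACIS AI implementation: the function name `fuse_features` and the constant names are hypothetical, and the input values are placeholders rather than real measurements. Only the dimensions (840 microwave features, 5 acoustic probabilities, 845 fused features) come from the pipeline itself.

```python
from typing import List, Sequence

N_MICROWAVE = 840  # microwave feature count, taken from the pipeline above
N_ACOUSTIC = 5     # acoustic class probabilities, taken from the pipeline above


def fuse_features(microwave: Sequence[float],
                  acoustic_probs: Sequence[float]) -> List[float]:
    """Concatenate one sample's microwave features and acoustic probabilities.

    Hypothetical helper: it only checks the two lengths and joins the vectors.
    """
    if len(microwave) != N_MICROWAVE:
        raise ValueError(f"expected {N_MICROWAVE} microwave features, got {len(microwave)}")
    if len(acoustic_probs) != N_ACOUSTIC:
        raise ValueError(f"expected {N_ACOUSTIC} acoustic probabilities, got {len(acoustic_probs)}")
    return list(microwave) + list(acoustic_probs)


# Placeholder inputs (not real data): zeros for the microwave features and a
# uniform distribution over the five acoustic classes.
microwave_example = [0.0] * N_MICROWAVE
acoustic_example = [0.2] * N_ACOUSTIC

fused = fuse_features(microwave_example, acoustic_example)
print(len(fused))  # 845
```

Because a single classifier (XGBoost in the pipeline above) is trained on the fused vector, it sees both modalities at once and can split on combinations of them, which is what late fusion gives up.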
The Results Speak for Themselves
Model | Accuracy | Improvement over Baseline |
Microwave-only | 35.6% | Baseline |
Acoustic-only | 86.4% | +50.8 percentage points |
Fusion | 99.3% | +63.7 percentage points |
Accuracies are not additive: 35.6% + 86.4% would give 122%, which is impossible. The fusion model gains its advantage by learning something neither modality captures alone: the relationship between structure and function.
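The improvement figures in the table are differences in accuracy measured against the microwave-only baseline. The short Python sketch below recomputes them from the three reported accuracies and also expresses each model as an error rate, which is another way to read the same numbers. The helper names `gain_over_baseline` and `error_rate` are illustrative choices; the accuracies are the only inputs and come from the table above.

```python
# Reported accuracies (percent), copied from the results table.
accuracy = {
    "Microwave-only": 35.6,
    "Acoustic-only": 86.4,
    "Fusion": 99.3,
}
baseline = accuracy["Microwave-only"]


def gain_over_baseline(model: str) -> float:
    """Accuracy difference versus the microwave-only baseline, in percentage points."""
    return round(accuracy[model] - baseline, 1)


def error_rate(model: str) -> float:
    """Share of misclassified cases, in percent."""
    return round(100.0 - accuracy[model], 1)


for model in accuracy:
    print(f"{model}: gain {gain_over_baseline(model)} points, error rate {error_rate(model)}%")
```

Read as error rates, the step from acoustic-only to fusion is a drop from 13.6% to 0.7% of cases misclassified.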
Clinical Implications
Finding | Interpretation | Recommended Action |
Microwave negative + Acoustic normal | Healthy | Continue routine screening |
Microwave negative + Acoustic abnormal | Functional disease (asthma, COPD, pneumonia) | Treat underlying condition |
Microwave positive + Acoustic abnormal | Tumor causing functional symptoms | Urgent imaging referral |
Microwave positive + Acoustic normal | Early tumor (no symptoms yet) | Monitor, consider biopsy |
This decision support helps clinicians prioritize cases and avoid misdiagnosis.
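The four rows of the table above amount to a lookup keyed on two yes/no findings. A minimal Python sketch of that lookup follows. The names `DECISION_TABLE` and `recommend` are hypothetical, and reducing each modality to a single boolean is a simplifying assumption made for illustration; the interpretations and actions are copied from the table.

```python
from typing import Dict, Tuple

# Key: (microwave_positive, acoustic_abnormal)
# Value: (interpretation, recommended action), copied from the table above.
DECISION_TABLE: Dict[Tuple[bool, bool], Tuple[str, str]] = {
    (False, False): ("Healthy", "Continue routine screening"),
    (False, True): ("Functional disease (asthma, COPD, pneumonia)",
                    "Treat underlying condition"),
    (True, True): ("Tumor causing functional symptoms",
                   "Urgent imaging referral"),
    (True, False): ("Early tumor (no symptoms yet)",
                    "Monitor, consider biopsy"),
}


def recommend(microwave_positive: bool, acoustic_abnormal: bool) -> Tuple[str, str]:
    """Return the (interpretation, action) pair for one combination of findings."""
    return DECISION_TABLE[(microwave_positive, acoustic_abnormal)]


interpretation, action = recommend(microwave_positive=True, acoustic_abnormal=False)
print(f"{interpretation} -> {action}")
```

Keeping the mapping in a plain table rather than nested conditionals makes it easy to check that all four combinations are covered.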
The Bigger Picture
THORACIS AI demonstrates that multi-modal fusion is more than adding sensors. It's about designing systems that capture complementary information and learning how those signals relate.
The same approach could work for:
Breast cancer (microwave + ultrasound)
Brain tumors (microwave + EEG)
Cardiac disease (ECG + phonocardiogram)
What's Next?
Our next step is 3D image reconstruction—moving from "is there a tumor?" to "where exactly is it located?" Using the 4-antenna array, we can triangulate tumor position and create a 2D/3D heatmap overlay.


