
Why Fusion Works: Cross-Modal Learning

  • Writer: DetectED
  • Apr 6

Updated: Apr 22

The Clinical Insight

Doctors don't rely on a single test. They combine:

  • Patient history (risk factors)

  • Physical exam (palpation, auscultation)

  • Imaging (X-ray, CT, ultrasound)

  • Lab tests (biopsy, blood work)


Each test provides different information. Together, they form a complete picture.


THORACIS AI does the same thing with two modalities:

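| Modality  | What It Measures                        | Clinical Analogy           |
|-----------|-----------------------------------------|----------------------------|
| Microwave | Dielectric contrast (tumor vs. healthy) | Imaging (CT/X-ray)         |
| Acoustic  | Respiratory sound patterns              | Auscultation (stethoscope) |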


The Limitation of Single Modalities

Microwave alone (35.6% accuracy):

  • Detects structural anomalies (tumor vs. no tumor)

  • Cannot distinguish disease types (asthma vs. pneumonia)

  • Why? Different diseases can create similar dielectric changes


Acoustic alone (86.4% accuracy):

  • Classifies functional patterns (wheezes, crackles)

  • Cannot localize structural anomalies

  • Why? Different diseases can produce similar sounds


The Fusion Advantage

When we combine them (845 features → XGBoost → 99.3% accuracy), the model learns cross-modal correlations:

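| If microwave detects... | And acoustic detects... | Fusion learns...                                 |
|-------------------------|-------------------------|--------------------------------------------------|
| Tumor                   | Pneumonia pattern       | Tumor causes pneumonia-like obstruction          |
| Tumor                   | Asthma pattern          | Tumor causes airway narrowing (wheezing)         |
| No tumor                | Pneumonia pattern       | Likely infectious pneumonia (no structural mass) |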


The Biological Reality

Why would a tumor produce pneumonia-like sounds?

When a tumor grows in the lung, it:

  1. Occupies space (mass effect)

  2. Obstructs airways (partial blockage)

  3. Causes inflammation (immune response)

  4. Accumulates fluid (edema)


These effects create crackles and wheezes—the same acoustic signatures as pneumonia.

So a patient with a tumor can sound like a patient with pneumonia, and a clinician relying on sound alone might misdiagnose. Combining the sound with microwave sensing, which plays the role of imaging, reveals the underlying mass.

That's exactly what THORACIS AI does—automatically.


How Fusion Is Implemented

We use feature-level fusion (concatenation before classification):


Microwave Scan → Feature Extraction → [840 features]
                                         ↓
Audio Recording → YAMNet → Classifier → [5 probabilities]
                                         ↓
                              Concatenate → [845 features]
                                         ↓
                              XGBoost → Diagnosis

Fusion methods we compared:

| Method                   | Description                                 | Why We Did or Didn't Use It                               |
|--------------------------|---------------------------------------------|-----------------------------------------------------------|
| Early fusion             | Raw data fusion (before feature extraction) | Microwave and audio have different sampling rates/formats |
| Late fusion              | Combine predictions after separate models   | Misses cross-modal correlations                           |
| Feature fusion (chosen)  | Combine extracted features                  | Captures correlations, easy to implement                  |
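
The concatenation step in the pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the helper name fuse_features and the placeholder inputs are hypothetical, and only the two vector sizes (840 and 5) come from the diagram.

```python
from typing import List, Sequence

N_MICROWAVE = 840  # microwave feature count, taken from the pipeline diagram
N_ACOUSTIC = 5     # acoustic class probabilities, taken from the pipeline diagram


def fuse_features(microwave: Sequence[float],
                  acoustic_probs: Sequence[float]) -> List[float]:
    """Concatenate both modalities into one 845-element vector.

    Hypothetical helper: it only checks the vector sizes and joins them.
    """
    if len(microwave) != N_MICROWAVE:
        raise ValueError(
            f"expected {N_MICROWAVE} microwave features, got {len(microwave)}")
    if len(acoustic_probs) != N_ACOUSTIC:
        raise ValueError(
            f"expected {N_ACOUSTIC} acoustic probabilities, got {len(acoustic_probs)}")
    return list(microwave) + list(acoustic_probs)


# Placeholder inputs; real values would come from the two extraction branches.
microwave_features = [0.0] * N_MICROWAVE
acoustic_probabilities = [0.1, 0.6, 0.1, 0.1, 0.1]

fused = fuse_features(microwave_features, acoustic_probabilities)
print(len(fused))  # 845
```

The fused vector is what the final classifier (XGBoost in our pipeline) would receive as one input row. Because both modalities sit in the same row, the classifier can split on a microwave feature and an acoustic probability in the same tree, which is where the cross-modal correlations come from.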


The Results Speak for Themselves

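| Model          | Accuracy | Improvement over microwave-only (percentage points) |
|----------------|----------|------------------------------------------------------|
| Microwave-only | 35.6%    | Baseline                                             |
| Acoustic-only  | 86.4%    | +50.8                                                |
| Fusion         | 99.3%    | +63.7                                                |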

Fusion is not a sum of the single-modality accuracies (35.6% + 86.4% = 122%, which is impossible). The gain comes from something neither modality captures alone: the relationship between structure and function.
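
The improvement figures are differences in percentage points against the microwave-only baseline. The short check below recomputes them from the three reported accuracies and also shows the matching error rates, which are derived here and not separately measured. The variable names are illustrative.

```python
# Accuracies (in percent) as reported in the results table.
accuracy = {"microwave_only": 35.6, "acoustic_only": 86.4, "fusion": 99.3}
baseline = accuracy["microwave_only"]

# Gain over the baseline, in percentage points.
gain_pp = {name: round(acc - baseline, 1) for name, acc in accuracy.items()}

# Share of cases each model gets wrong, in percent.
error_rate = {name: round(100.0 - acc, 1) for name, acc in accuracy.items()}

print(gain_pp)     # {'microwave_only': 0.0, 'acoustic_only': 50.8, 'fusion': 63.7}
print(error_rate)  # {'microwave_only': 64.4, 'acoustic_only': 13.6, 'fusion': 0.7}
```

Seen as error rates, the step from acoustic-only to fusion is a drop from 13.6% to 0.7% of cases misclassified.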


Clinical Implications

| Finding                                | Interpretation                               | Recommended Action         |
|----------------------------------------|----------------------------------------------|----------------------------|
| Microwave negative + Acoustic normal   | Healthy                                      | Continue routine screening |
| Microwave negative + Acoustic abnormal | Functional disease (asthma, COPD, pneumonia) | Treat underlying condition |
| Microwave positive + Acoustic abnormal | Tumor causing functional symptoms            | Urgent imaging referral    |
| Microwave positive + Acoustic normal   | Early tumor (no symptoms yet)                | Monitor, consider biopsy   |

This decision support helps clinicians prioritize cases and avoid misdiagnosis.
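
The four combinations in the table above map naturally onto a small lookup. The sketch below is one way to encode it, assuming each modality has already been reduced to a yes/no finding. The names Recommendation, DECISION_TABLE and recommend are hypothetical and do not describe how THORACIS AI is implemented.

```python
from typing import Dict, NamedTuple, Tuple


class Recommendation(NamedTuple):
    interpretation: str
    action: str


# Keys are (microwave_positive, acoustic_abnormal); values copy the table rows.
DECISION_TABLE: Dict[Tuple[bool, bool], Recommendation] = {
    (False, False): Recommendation(
        "Healthy", "Continue routine screening"),
    (False, True): Recommendation(
        "Functional disease (asthma, COPD, pneumonia)", "Treat underlying condition"),
    (True, True): Recommendation(
        "Tumor causing functional symptoms", "Urgent imaging referral"),
    (True, False): Recommendation(
        "Early tumor (no symptoms yet)", "Monitor, consider biopsy"),
}


def recommend(microwave_positive: bool, acoustic_abnormal: bool) -> Recommendation:
    """Return the table row for one pair of findings."""
    return DECISION_TABLE[(microwave_positive, acoustic_abnormal)]


result = recommend(microwave_positive=True, acoustic_abnormal=False)
print(result.interpretation, "->", result.action)
# Early tumor (no symptoms yet) -> Monitor, consider biopsy
```

Keeping the rows in a single dictionary makes it easy to confirm that all four combinations are covered and that the wording stays identical to the table.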


The Bigger Picture

THORACIS AI demonstrates that multi-modal fusion is more than adding sensors. It's about designing systems that capture complementary information and learning how those signals relate.

The same approach could work for:

  • Breast cancer (microwave + ultrasound)

  • Brain tumors (microwave + EEG)

  • Cardiac disease (ECG + phonocardiogram)


What's Next?

Our next step is 3D image reconstruction—moving from "is there a tumor?" to "where exactly is it located?" Using the 4-antenna array, we can triangulate tumor position and create a 2D/3D heatmap overlay.
