Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability

Shuai Jiang       Yunfeng Ma      Jingyu Zhou      Yuan Bian      Yaonan Wang      Min Liu     
Hunan University
† Corresponding Author
IEEE TMech 2025
Modality-missing scenarios caused by the uncertain availability of multiple sensors.
Task differentiation diagram. Our proposed MISDD-MM task demands multimodal learning under dynamic missing of the RGB and 3D modalities, which differs from the static modality-incomplete setting in MIISDD.

This article introduces a new and challenging task that better reflects real-world data acquisition: multimodal industrial surface defect detection with missing modalities (MISDD-MM) caused by uncertain sensor availability. The proposed resilient MISDD-MM framework is characterized by:

  • Cross-modal prompt learning, which adapts to the learning-mode transformation and compensates for the information vacancy caused by missing modalities
  • Symmetric contrastive learning, which performs triple-modal (RGB/3D/Text) contrastive pre-training
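To make the prompt mechanism concrete, the following is a minimal sketch (not the authors' implementation) of how the three prompt types could be prepended to a token sequence depending on sensor availability. All function and variable names here are hypothetical, and plain NumPy arrays stand in for learnable transformer prompt tokens.

```python
import numpy as np

def inject_prompts(tokens, has_rgb, has_3d,
                   consistency_prompt, modality_prompts, missing_prompts):
    """Prepend the three prompt types to a token sequence (hypothetical sketch).

    tokens:             (seq_len, dim) patch embeddings of the available input
    consistency_prompt: (k, dim) shared prompt enforcing RGB/3D consistency
    modality_prompts:   dict of (k, dim) prompts keyed by input pattern
    missing_prompts:    dict of (k, dim) prompts compensating for the absent
                        modality ("none" when both modalities are present)
    """
    # Identify the current input pattern from sensor availability.
    if has_rgb and has_3d:
        pattern, missing = "rgb+3d", "none"
    elif has_rgb:
        pattern, missing = "rgb", "3d"
    else:
        pattern, missing = "3d", "rgb"
    # Prepend: [consistency | modality-specific | missing-aware | tokens].
    return np.concatenate([consistency_prompt,
                           modality_prompts[pattern],
                           missing_prompts[missing],
                           tokens], axis=0)

# Toy usage with 2-token prompts and a 4-token, 8-dim input sequence.
dim, k = 8, 2
rng = np.random.default_rng(0)
prompts = {p: rng.normal(size=(k, dim)) for p in ("rgb+3d", "rgb", "3d")}
gaps = {m: rng.normal(size=(k, dim)) for m in ("none", "rgb", "3d")}
seq = inject_prompts(rng.normal(size=(4, dim)), True, False,
                     rng.normal(size=(k, dim)), prompts, gaps)
print(seq.shape)  # (10, 8): 3 * 2 prompt tokens + 4 patch tokens
```

In the actual framework the prompts are trainable parameters injected at early transformer layers; this sketch only illustrates the pattern-dependent selection and prepending.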

Abstract

Multimodal industrial surface defect detection (MISDD) aims to identify and locate defects in industrial products by fusing RGB and 3D modalities. This article focuses on modality-missing problems caused by uncertain sensor availability in MISDD. In this context, the fusion of multiple modalities encounters several challenges, including learning-mode transformation and information vacancy. To this end, we first propose cross-modal prompt learning, which includes: i) a cross-modal consistency prompt that establishes information consistency between the dual visual modalities; ii) a modality-specific prompt that is inserted to adapt to different input patterns; iii) a missing-aware prompt that is attached to compensate for the information vacancy caused by dynamically missing modalities. In addition, we propose symmetric contrastive learning, which utilizes the text modality as a bridge for the fusion of the dual vision modalities. Specifically, a paired antithetical text prompt is designed to generate binary text semantics, and triple-modal contrastive pre-training is offered to accomplish multimodal learning. Experimental results show that our proposed method achieves 73.83% I-AUROC and 93.05% P-AUROC at a total missing rate of 0.7 for the RGB and 3D modalities (exceeding state-of-the-art methods by 3.84% and 5.58%, respectively), and outperforms existing approaches to varying degrees under different missing types and rates. The source code will be available at https://github.com/SvyJ/MISDD-MM.
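As a rough illustration of the symmetric contrastive idea, the sketch below computes a bidirectional InfoNCE-style loss in which both vision embeddings align with text embeddings, so the text modality bridges RGB and 3D. This is a simplified stand-in under assumed batch-matched embeddings, not the paper's exact objective; all names and the temperature value are hypothetical.

```python
import numpy as np

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between row-matched embedding batches a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                           # cosine-similarity logits
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_ab = -np.mean(np.diag(log_p))               # a -> b direction
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_ba = -np.mean(np.diag(log_p_t))             # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def symmetric_contrastive_loss(rgb_emb, pc_emb, text_emb):
    """Text embeddings act as the bridge: both vision modalities align to them."""
    return 0.5 * (info_nce(rgb_emb, text_emb) + info_nce(pc_emb, text_emb))

# Toy usage: a batch of 4 samples with 16-dim embeddings per modality.
rng = np.random.default_rng(1)
rgb, pc, txt = (rng.normal(size=(4, 16)) for _ in range(3))
loss = symmetric_contrastive_loss(rgb, pc, txt)
print(float(loss))  # a non-negative scalar
```

In the paper, the text side comes from the paired antithetical prompts that encode binary (normal vs. defective) semantics; the row-matched batch here is only a simplification for the sketch.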

Framework

The overall flowchart of our proposed framework for MISDD-MM. It consists of three serial phases: (I) Missing-modalities configuration, which produces three input patterns through three modality-missing settings. (II) Cross-modal prompt learning, which includes three specially designed prompts (colored solid lines): the cross-modal consistency prompt, the modality-specific prompt, and the missing-aware prompt. (III) Symmetric contrastive learning, which performs triple-modal contrastive pre-training to generate defect detection results. Prompt injection occurs at the early transformer layers, where the input tokens are prepended with the three prompts according to the current modality availability.


Results

I-AUROC (%), P-AUROC (%), and AUPRO (%) scores on the MVTec 3D-AD dataset under different missing modalities and rates (η). Missing modality = "both" means that RGB images and 3D data are each missing at a rate of η/2.
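The "both" setting above can be sketched as follows: each modality is dropped for a disjoint η/2 fraction of the samples, so the total missing rate is η and no sample loses both inputs. This is a minimal illustrative sketch (hypothetical function name, not the benchmark's official configuration code).

```python
import numpy as np

def configure_missing(n_samples, eta, missing="both", seed=0):
    """Return boolean availability masks (has_rgb, has_3d) for n samples.

    With missing="both", RGB and 3D are each dropped for a disjoint
    eta/2 fraction of the samples, so no sample loses both modalities.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    if missing == "both":
        k = int(round(n_samples * eta / 2))
    else:
        k = int(round(n_samples * eta))
    has_rgb = np.ones(n_samples, dtype=bool)
    has_3d = np.ones(n_samples, dtype=bool)
    if missing in ("rgb", "both"):
        has_rgb[order[:k]] = False        # drop RGB for the first k samples
    if missing == "3d":
        has_3d[order[:k]] = False         # drop 3D only
    elif missing == "both":
        has_3d[order[k:2 * k]] = False    # drop 3D for the next (disjoint) k

    return has_rgb, has_3d

# Toy usage: 100 samples at total missing rate 0.7 -> 35 lack RGB, 35 lack 3D.
has_rgb, has_3d = configure_missing(100, eta=0.7, missing="both")
print((~has_rgb).sum(), (~has_3d).sum())  # 35 35
```

Because the two dropped subsets are disjoint, every sample retains at least one vision modality, matching the dynamic-missing setting the table evaluates.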


Category-level P-AUROC


Visualization


Variation Trend of I-AUROC with Different Missing Modalities Rates


Few-shot Surface Defect Detection

Few-shot detection performance


Citation

@article{jiang2025resilient,
  title={Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability},
  author={Jiang, Shuai and Ma, Yunfeng and Zhou, Jingyu and Bian, Yuan and Wang, Yaonan and Liu, Min},
  journal={IEEE/ASME Transactions on Mechatronics},
  year={2025},
  publisher={IEEE}
}