A Survey on Foundation-Model-Based Industrial Defect Detection

Recently, the emergence of foundation models has brought visual and textual semantic prior knowledge to industrial defect detection.
Visual defect detection: to enhance the model's ability to identify anomalous patterns, some methods further explore contrastive learning mechanisms between normal and abnormal features.
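A minimal sketch of such a contrastive mechanism, assuming patch or image embeddings have already been extracted by some backbone; the prototype-based loss and margin value are illustrative choices, not a specific method from the survey:

```python
import torch
import torch.nn.functional as F

def contrastive_anomaly_loss(normal_feats, abnormal_feats, margin=0.5):
    """Pull normal embeddings toward a 'normal' prototype, push abnormal ones away."""
    normal_feats = F.normalize(normal_feats, dim=-1)
    abnormal_feats = F.normalize(abnormal_feats, dim=-1)
    # Prototype of the normal class (illustrative; many methods use memory banks instead).
    center = F.normalize(normal_feats.mean(dim=0, keepdim=True), dim=-1)

    pos_dist = 1 - (normal_feats @ center.T).squeeze(-1)    # normal: small distance
    neg_dist = 1 - (abnormal_feats @ center.T).squeeze(-1)  # abnormal: pushed past the margin
    return pos_dist.mean() + F.relu(margin - neg_dist).mean()

# Toy usage with random tensors standing in for backbone features (N x D).
loss = contrastive_anomaly_loss(torch.randn(32, 512), torch.randn(8, 512))
print(loss.item())
```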
But there is a problem:
These methods typically rely on a large amount of high-quality training data to establish reliable feature distributions and contrastive relationships. -> This is often infeasible in the real world. -> Solution: CLIP-, GPT-, and SAM-based methods (SAM being a foundation model for vision segmentation).
The foundation models themselves possess strong capabilities in understanding general vision and language, making it an important issue to explore how to effectively apply their foundational knowledge to industrial detection problems without additional training samples and annotations.
-> Apparently the current trend is not about training models better; you can tell from the context above. Since the foundation models themselves are already strong, the field seems to have moved toward zero-shot and few-shot settings.
FM-based methods are also divided into two categories: 2D and 3D.
[2D]
SAM-based 2D IAD: SAM provides semantic prior information acquired through extensive pre-training on vast amounts of data; object matching based on the masks generated by SAM is used to identify defect regions.
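A rough sketch of that mask-matching idea, assuming the official segment_anything package; the checkpoint path and the matching criterion (comparing region area and mean color against a defect-free reference image) are simplified illustrative choices, not the exact strategy of any surveyed method:

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Checkpoint path / model size are placeholders; use whatever SAM weights are available.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def region_stats(image, masks):
    """Mean color and relative area per SAM mask (a deliberately simple descriptor)."""
    stats = []
    for m in masks:
        seg = m["segmentation"]                        # HxW boolean mask
        stats.append((image[seg].mean(axis=0), seg.mean()))
    return stats

def suspicious_regions(ref_image, test_image, color_tol=20.0, area_tol=0.02):
    """Flag test-image masks that match no region of a defect-free reference image."""
    # Both images: HxWx3 uint8 RGB numpy arrays of the same product.
    ref_stats = region_stats(ref_image, mask_generator.generate(ref_image))
    test_masks = mask_generator.generate(test_image)
    flagged = []
    for m, (color, area) in zip(test_masks, region_stats(test_image, test_masks)):
        matched = any(
            np.linalg.norm(color - ref_color) < color_tol and abs(area - ref_area) < area_tol
            for ref_color, ref_area in ref_stats
        )
        if not matched:
            flagged.append(m["segmentation"])          # candidate defect region
    return flagged
```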
CLIP-based 2D IAD: semantic matching between short text prompts and images.
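The core of this text-image matching can be shown with a short sketch, roughly the image-level scoring idea behind WinCLIP-style methods but without their multi-scale window aggregation; the model tag, prompts, and file name are illustrative assumptions:

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

prompts = ["a photo of a flawless metal part", "a photo of a metal part with a defect"]
text = tokenizer(prompts)
image = preprocess(Image.open("test_part.png")).unsqueeze(0)   # path is a placeholder

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

anomaly_score = probs[0, 1].item()   # softmax mass assigned to the "defect" prompt
print(anomaly_score)
```

WinCLIP additionally aggregates this kind of text-image similarity over sliding windows and prompt ensembles to localize defects, but the image-level score above is the basic mechanism.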
GPT-based 2D IAD: IAD through detailed prompts about the image.
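A minimal sketch of prompting a vision-capable GPT model through the OpenAI Python SDK; the model name, prompt wording, and file name are assumptions, and real systems typically wrap this in more structured prompting:

```python
import base64
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

with open("test_part.png", "rb") as f:                # path is a placeholder
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",                                    # any vision-capable model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": ("You are an industrial quality inspector. Describe any scratches, "
                      "dents, stains, or missing components on this product, give each a "
                      "rough location, and state whether the part should be rejected.")},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```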
[3D]
CLIP performs multi-modal feature extraction and alignment on image and point cloud data, SAM carries out fine-grained segmentation to isolate potential anomaly regions, and GPT provides semantic understanding and description of the detection results, assisting users in quickly obtaining analytical conclusions.
OK, I've got my bearings. From here on, let's take a close look at the CLIP-based and GPT-based methods.
CLIP first.
WinCLIP -> AnomalyCLIP -> SimCLIP -> VCP-CLIP -> AdaCLIP -> PromptAD