Remote Sensing Object Detection Method Combining Text Features of Visual Foundation Models

许 越; 吴朝明; 昝露洋; 陈正超

doi:10.20278/j.jc2.2096-0204.2024.0099

Abstract

Deep learning has greatly improved object detection in remote sensing imagery. However, high-resolution remote sensing images cover complex geographic environments in which ground features are mixed and dynamic. The phenomena of "same object, different spectra" and "different objects, same spectrum" cause similar ground objects to be misdetected and small objects to be missed. This paper proposes YOLO-CLIP, a multimodal object detection model for high-resolution remote sensing images that combines CLIP and YOLOv8 and uses text information from a visual foundation model to strengthen its understanding of the image scene. Experimental results show that, compared with YOLOv8 and other models, YOLO-CLIP is better at distinguishing similar ground objects, substantially reduces missed detections of small objects, and generalizes better.

Key words: remote sensing; object detection; deep learning; visual foundation models
