Abstract: Crack detection is crucial for assessing structural safety, but its performance degrades on thin or irregular cracks, particularly against complex backgrounds under poor lighting. This paper presents the adaptive learning filters vision transformer (ALF-ViT), a method for pixel-level segmentation of concrete cracks under low-light conditions. ALF-ViT augments a vision transformer with two adaptive learning image filter modules: a convolutional neural network-based digital image processing (DIP) parameter predictor (C-DIP) and a dilated convolutional guided image filter (DCGIF), which adaptively enhance the input image and guide the refinement of the segmentation mask, respectively, to improve segmentation performance. In experiments on two public data sets and one self-made mixed-lighting data set, ALF-ViT demonstrates superior adaptability and performance under both normal and low-light conditions, achieving a mean intersection over union (mIoU) of 74.5%, a precision of 85.7%, and an F1 score of 80.3% on the public Crack500 data set, and an mIoU of 73.3% on the self-made mixed-lighting data set. Compared with traditional methods such as U-Net and SegNet, which reach mIoUs of 62.9% and 41.3%, respectively, on similar tasks, ALF-ViT shows significant improvements, and it also surpasses advanced models such as DeepLabv3+ in both detection accuracy and robustness under variable lighting. These results indicate that ALF-ViT outperforms recent segmentation networks on both low-light and well-lit crack databases, demonstrating strong generalization and considerable potential for crack detection under low-light conditions.
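The DCGIF module described above builds on the classic guided image filter, in which a guidance image steers edge-preserving smoothing of a target signal (here, a segmentation mask guided by the enhanced image). The paper's variant adds dilated convolutions, which are not reproduced here; the sketch below shows only the underlying guided-filter idea in plain NumPy, with illustrative function names and parameters that are assumptions, not the authors' implementation:

```python
import numpy as np


def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window, edges handled by replicate padding."""
    xp = np.pad(x, r, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(xp, (2 * r + 1, 2 * r + 1))
    return windows.mean(axis=(-2, -1))


def guided_filter(guide, target, r=2, eps=1e-3):
    """Basic guided image filter (He et al.-style local linear model).

    Fits target ~= a * guide + b in each local window, then averages the
    coefficients so the output inherits the guide's edges.
    """
    m_g = box_mean(guide, r)
    m_t = box_mean(target, r)
    cov_gt = box_mean(guide * target, r) - m_g * m_t
    var_g = box_mean(guide * guide, r) - m_g * m_g
    a = cov_gt / (var_g + eps)          # eps regularizes flat regions
    b = m_t - a * m_g
    return box_mean(a, r) * guide + box_mean(b, r)


# Toy demo: smooth a noisy mask while keeping the guide's edge structure.
rng = np.random.default_rng(0)
guide = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))   # smooth synthetic guide
noisy = guide + 0.1 * rng.standard_normal((32, 32))   # noisy target signal
smoothed = guided_filter(guide, noisy, r=3, eps=1e-2)
```

A module like DCGIF would learn such filtering behavior end-to-end rather than apply this fixed closed-form filter; the sketch is only meant to make the "guidance" concept concrete.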