Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks
Source: Journal of Performance of Constructed Facilities, 2025, Volume 039, Issue 003, page 04025007-1
DOI: 10.1061/JPCFEV.CFENG-4952
Publisher: American Society of Civil Engineers
Abstract: Crack detection is crucial for assessing structural safety. However, its performance faces challenges when dealing with thin or irregular cracks, especially in complex backgrounds under poor lighting conditions. This paper presents the adaptive learning filters vision transformer (ALF-ViT), a method for pixel-level segmentation of concrete cracks under low-light conditions. This method incorporates two adaptive learning image filter modules based on the vision transformer: a convolutional neural network-based digital image processing (DIP) parameter predictor (C-DIP) and a dilated convolutional guided image filter (DCGIF), aimed at adaptively enhancing images and guiding enhanced segmentation masks to improve the effectiveness of segmentation detection. In experiments conducted on two public data sets and one self-made mixed-lighting data set, ALF-ViT demonstrated superior adaptability and performance under both normal and low-light conditions, achieving a mean intersection over union (mIoU) of 74.5%, a precision of 85.7%, and an F1 score of 80.3% on the publicly available Crack500 data set. On the self-made mixed-lighting data set, ALF-ViT achieves an mIoU of 73.3%. Compared to traditional methods such as U-Net and SegNet, which reach mIoUs of 62.9% and 41.3%, respectively, on similar tasks, ALF-ViT showed significant improvements. It also surpasses other advanced models like DeepLabv3+ and SegNet in both detection accuracy and robustness under variable lighting conditions. These results indicate that the proposed ALF-ViT outperforms recent segmentation networks on both low-light and well-lit crack databases, demonstrating its excellent generalization capability and immense potential for crack detection tasks under low-light conditions.
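The abstract reports precision, F1 score, and mIoU for binary crack masks. As a rough illustration of how such figures are commonly computed, the following is a minimal sketch, not the authors' evaluation code. It assumes masks are nested lists of 0/1 values and that mIoU is the unweighted average of the crack-class and background-class IoU; the record does not state the paper's exact definition. The function names `confusion_counts`, `safe_div`, and `segmentation_metrics` are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # crack pixel predicted as crack
            elif p:
                fp += 1      # background pixel predicted as crack
            elif t:
                fn += 1      # crack pixel missed
            else:
                tn += 1      # background pixel predicted as background
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a class is absent from the masks."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Precision, recall, F1, and two-class mIoU (assumed definition) for one mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    iou_crack = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    # Assumption: mIoU averages the crack and background IoU with equal weight.
    miou = (iou_crack + iou_background) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 0, 0]]
    ground_truth = [[1, 0, 0], [1, 0, 0]]
    print(segmentation_metrics(predicted, ground_truth))
```

In practice these counts would usually be accumulated over a whole data set before dividing, which can give different values from averaging per-image scores.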
contributor author | Qi Shen | |
contributor author | Binggang Xiao | |
contributor author | Hongmei Mi | |
contributor author | Jiabin Yu | |
contributor author | Lihua Xiao | |
date accessioned | 2025-08-17T23:03:05Z | |
date available | 2025-08-17T23:03:05Z | |
date copyright | 6/1/2025 12:00:00 AM | |
date issued | 2025 | |
identifier other | JPCFEV.CFENG-4952.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4307832 | |
publisher | American Society of Civil Engineers | |
title | Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks | |
type | Journal Article | |
journal volume | 39 | |
journal issue | 3 | |
journal title | Journal of Performance of Constructed Facilities | |
identifier doi | 10.1061/JPCFEV.CFENG-4952 | |
journal firstpage | 04025007-1 | |
journal lastpage | 04025007-11 | |
page | 11 | |
tree | Journal of Performance of Constructed Facilities; 2025; Volume 039; Issue 003 | |
contenttype | Fulltext |