Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks

Qi Shen; Binggang Xiao; Hongmei Mi; Jiabin Yu; Lihua Xiao

Source: Journal of Performance of Constructed Facilities, 2025, Volume 39, Issue 3, page 04025007-1

DOI: 10.1061/JPCFEV.CFENG-4952

Publisher: American Society of Civil Engineers

Abstract: Crack detection is crucial for assessing structural safety, but its performance degrades on thin or irregular cracks, especially against complex backgrounds under poor lighting. This paper presents the adaptive learning filters vision transformer (ALF-ViT), a method for pixel-level segmentation of concrete cracks under low-light conditions. The method incorporates two adaptive learning image filter modules based on the vision transformer: a convolutional neural network-based digital image processing (DIP) parameter predictor (C-DIP) and a dilated convolutional guided image filter (DCGIF). Together they adaptively enhance the images and guide the enhanced segmentation masks to improve segmentation quality. In experiments on two public data sets and one self-made mixed-lighting data set, ALF-ViT adapted well to both normal and low-light conditions, achieving a mean intersection over union (mIoU) of 74.5%, a precision of 85.7%, and an F1 score of 80.3% on the publicly available Crack500 data set. On the self-made mixed-lighting data set, ALF-ViT achieved an mIoU of 73.3%, compared with 62.9% and 41.3% for the traditional methods U-Net and SegNet, respectively, on similar tasks. It also surpassed other advanced models such as DeepLabv3+ and SegNet in both detection accuracy and robustness under variable lighting conditions. These results indicate that ALF-ViT outperforms recent segmentation networks on both low-light and well-lit crack databases, demonstrating its generalization capability and its potential for crack detection under low-light conditions.
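The abstract reports mIoU, precision, and F1 score for pixel-level crack masks. The sketch below shows one common way to compute these metrics from a predicted and a ground-truth binary mask. It is an illustration only, not the authors' evaluation code: the function name `segmentation_metrics` is hypothetical, and averaging the IoU over the two classes (crack and background) is an assumption, since the abstract does not state how mIoU is averaged.

```python
def segmentation_metrics(pred, target):
    """Compute precision, recall, F1, and mIoU for binary masks.

    pred, target: equally sized 2D lists of 0/1 values (1 = crack pixel).
    Assumption: mIoU is the mean of the crack IoU and the background IoU.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # crack pixel correctly detected
            elif p and not t:
                fp += 1      # background labeled as crack
            elif t:
                fn += 1      # crack pixel missed
            else:
                tn += 1      # background correctly rejected

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on degenerate masks.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    miou = (iou_crack + iou_background) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy 2x4 masks, made up for illustration (not data from the paper).
pred_mask = [[1, 1, 0, 0],
             [0, 1, 0, 0]]
true_mask = [[1, 0, 0, 0],
             [0, 1, 1, 0]]
metrics = segmentation_metrics(pred_mask, true_mask)
```

Because crack pixels are usually a small fraction of an image, the background IoU tends to be high, so a two-class mIoU can look better than the crack-only IoU; this is worth keeping in mind when comparing figures reported under different averaging conventions.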

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4307832

Collections

Journal of Performance of Constructed Facilities

contributor author	Qi Shen
contributor author	Binggang Xiao
contributor author	Hongmei Mi
contributor author	Jiabin Yu
contributor author	Lihua Xiao
date accessioned	2025-08-17T23:03:05Z
date available	2025-08-17T23:03:05Z
date copyright	6/1/2025 12:00:00 AM
date issued	2025
identifier other	JPCFEV.CFENG-4952.pdf
identifier uri	http://yetl.yabesh.ir/yetl1/handle/yetl/4307832
publisher	American Society of Civil Engineers
title	Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks
type	Journal Article
journal volume	39
journal issue	3
journal title	Journal of Performance of Constructed Facilities
identifier doi	10.1061/JPCFEV.CFENG-4952
journal firstpage	04025007-1
journal lastpage	04025007-11
page	11
tree	Journal of Performance of Constructed Facilities:;2025:;Volume ( 039 ):;issue: 003
contenttype	Fulltext
