Chinese Named Entity Recognition for Bridge Damage and Defects Based on Text Mining and Natural Language Pretraining Models
Source: Journal of Construction Engineering and Management, 2025, Volume 151, Issue 006, page 04025060-1
DOI: 10.1061/JCEMD4.COENG-16019
Publisher: American Society of Civil Engineers
Abstract: Bridge inspection reports are a vital data source for bridge management and maintenance, containing structural information essential for damage evaluation and decision-making. However, general named entity recognition (NER) methods have limited effectiveness when automatically extracting damage entities from this unstructured text, because the same type of bridge damage entity often corresponds to multiple structural components, and the entities are strongly correlated and prominently nested. To address these issues, this study introduces a method for NER of damage and defects in bridge inspection that leverages text mining and pretrained natural language models. First, the study constructs a specialized corpus of bridge damage and defects from a large number of bridge inspection reports and annotates fine-grained entities in the sentences describing damage and defects. Next, it proposes a bridge damage entity recognition model that integrates a pretrained natural language model with deep learning models. The model uses the Bidirectional Encoder Representations from Transformers (BERT) pretrained model to extract vector features from the Chinese characters in damage-related sentences, then a bidirectional long short-term memory (BiLSTM) network to capture sequential patterns of multitype entity labels, and finally a conditional random field (CRF) layer to enforce label constraints and generate the optimal label sequence. The model is validated experimentally on the constructed Chinese bridge inspection damage and defect named entity corpus. The results show that the proposed model surpasses other mainstream NER models, achieving an F1 score of 98.31% and identifying seven categories of fine-grained bridge damage entities. This study enhances the automation of information extraction from damage-related bridge inspection sentences and lays a foundation for building knowledge graphs in the bridge domain, advancing the development of intelligent bridge management.
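The final step of the pipeline described in the abstract, where the CRF enforces label constraints to produce the optimal label sequence, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a BIO tagging scheme, a hypothetical two-category label set (`COMPONENT`, `DAMAGE`; the paper's seven categories are not listed in the abstract), hand-picked per-character emission scores standing in for the BERT-BiLSTM output, and hard 0 or forbidden transition scores in place of the learned transition matrix of a trained CRF.

```python
from typing import Dict, List

NEG_INF = -1e9  # large negative score that effectively forbids a transition

# Hypothetical label set; the paper's seven entity categories are not given in the abstract.
LABELS = ["O", "B-COMPONENT", "I-COMPONENT", "B-DAMAGE", "I-DAMAGE"]


def transition(prev: str, curr: str) -> float:
    """Hard BIO constraint: I-X may only follow B-X or I-X.

    A trained CRF learns soft transition scores; 0 / NEG_INF is a simplification.
    """
    if curr.startswith("I-") and prev not in ("B-" + curr[2:], "I-" + curr[2:]):
        return NEG_INF
    return 0.0


def viterbi(emissions: List[Dict[str, float]], labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence that respects the BIO constraint."""
    if not emissions:
        return []
    # A sequence cannot start with an I- label.
    score = {l: (NEG_INF if l.startswith("I-") else emissions[0][l]) for l in labels}
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for curr in labels:
            # Best previous label for reaching `curr` at this position.
            best_prev = max(labels, key=lambda p: score[p] + transition(p, curr))
            new_score[curr] = score[best_prev] + transition(best_prev, curr) + emit[curr]
            pointers[curr] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda l: score[l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative sentence "主梁裂缝" (main girder crack), one score dict per character.
# The scores are made up for this example, not model output.
EMISSIONS = [
    {"O": 0.1, "B-COMPONENT": 2.0, "I-COMPONENT": 0.3, "B-DAMAGE": 0.2, "I-DAMAGE": 0.1},  # 主
    {"O": 0.1, "B-COMPONENT": 0.4, "I-COMPONENT": 2.0, "B-DAMAGE": 0.2, "I-DAMAGE": 0.3},  # 梁
    {"O": 0.1, "B-COMPONENT": 0.2, "I-COMPONENT": 0.3, "B-DAMAGE": 1.4, "I-DAMAGE": 1.5},  # 裂
    {"O": 0.2, "B-COMPONENT": 0.1, "I-COMPONENT": 0.2, "B-DAMAGE": 0.5, "I-DAMAGE": 2.0},  # 缝
]

# Per-character argmax ignores the constraint; Viterbi decoding applies it.
greedy = [max(LABELS, key=lambda l: e[l]) for e in EMISSIONS]
decoded = viterbi(EMISSIONS, LABELS)
print("greedy :", greedy)
print("decoded:", decoded)
```

With these made-up scores, the per-character argmax labels 裂 as `I-DAMAGE` directly after `I-COMPONENT`, which is not a valid BIO sequence, while the constrained decoding starts the damage entity with `B-DAMAGE`. That is the kind of correction a label-constraint layer contributes on top of per-character scores.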
contributor author | Jiaqi Liu | |
contributor author | Weijie Li | |
contributor author | Fangchang Li | |
contributor author | Xuefeng Zhao | |
date accessioned | 2025-08-17T22:41:04Z | |
date available | 2025-08-17T22:41:04Z | |
date copyright | 2025-06-01 | |
date issued | 2025 | |
identifier other | JCEMD4.COENG-16019.pdf | |
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4307293 | |
publisher | American Society of Civil Engineers | |
title | Chinese Named Entity Recognition for Bridge Damage and Defects Based on Text Mining and Natural Language Pretraining Models | |
type | Journal Article | |
journal volume | 151 | |
journal issue | 6 | |
journal title | Journal of Construction Engineering and Management | |
identifier doi | 10.1061/JCEMD4.COENG-16019 | |
journal firstpage | 04025060-1 | |
journal lastpage | 04025060-12 | |
page | 12 | |
tree | Journal of Construction Engineering and Management; 2025; Volume 151; Issue 006 | |
contenttype | Fulltext |