
contributor author: Jungyeon Kim
contributor author: Sehwan Chung
contributor author: Seokho Chi
date accessioned: 2024-04-27T22:46:30Z
date available: 2024-04-27T22:46:30Z
date issued: 2024/06/01
identifier other: 10.1061-JCEMD4.COENG-14273.pdf
identifier uri: http://yetl.yabesh.ir/yetl1/handle/yetl/4297465
description abstract: The growth of the global construction market has attracted international companies to participate in overseas projects. Overseas projects are extremely dynamic with numerous uncertainties, raising the need to collect information about construction in host countries. Due to the vast amounts of text data in the construction industry, an automated method, specifically information retrieval, is required to find the necessary information. Previous studies have suggested automated methods to review various construction documents. However, these studies required substantial manual effort and mainly focused on only one language, resulting in loss of vital information because it is buried in documents written in the host country’s language. To address these limitations, this study proposes a cross-lingual information retrieval (CLIR) framework using pretrained Bidirectional Encoder Representations from Transformers (BERT) models to retrieve information from multilingual construction documents. The proposed framework employs language models (i.e., monolingual, multilingual, and cross-lingual) and trains these models on a construction data set to enhance their ability in construction-specific text. The framework achieved reliable performance of retrieval, even with minimal additional training using domain-specific data. The results indicate that training on the domain data set raises the level of retrieval, increasing the mean reciprocal rank of a specific task by up to 0.2128. With the employment of a monolingual model with machine translation, CLIR in a specific domain could be performed effectively without the need for a labeled data set. The suggested CLIR framework offers a practical alternative for dealing with construction documents in overseas projects, reducing time and cost while improving risk identification and mitigation.
publisher: ASCE
title: Cross-Lingual Information Retrieval from Multilingual Construction Documents Using Pretrained Language Models
type: Journal Article
journal volume: 150
journal issue: 6
journal title: Journal of Construction Engineering and Management
identifier doi: 10.1061/JCEMD4.COENG-14273
journal firstpage: 04024041-1
journal lastpage: 04024041-15
page: 15
tree: Journal of Construction Engineering and Management; 2024; Volume 150; Issue 006
contenttype: Fulltext