contributor author: Tuyen Le
contributor author: H. David Jeong
date accessioned: 2017-12-16T09:17:23Z
date available: 2017-12-16T09:17:23Z
date issued: 2017
identifier other: %28ASCE%29CP.1943-5487.0000701.pdf
identifier uri: http://138.201.223.254:8080/yetl1/handle/yetl/4241012
description abstract: Inconsistent data terminology poses major challenges for integrating transportation project data from distinct sources. Differences in the meaning of data elements may lead to miscommunication between data senders and receivers. Semantic relations between terms in digital dictionaries, such as ontologies, can make the semantics of a data element transparent and unambiguous to computer systems. However, because effective automated methods are lacking, identifying these relations is labor intensive and time consuming. This paper presents a novel integrated methodology that leverages multiple computational techniques to extract heterogeneous American-English data terms used in different highway agencies, together with their semantic relations, from design manuals and other technical specifications. The proposed method applies natural language processing (NLP) to detect data elements in text documents and uses machine learning to determine the semantic relatedness among terms from their occurrence statistics in a corpus. The study also develops an algorithm that classifies semantically related terms into three lexical groups: synonymy, hyponymy, and meronymy. The key merit of this technique is that the detection of semantic relations uses only linguistic information in texts and does not depend on existing hand-coded semantic resources. In a case study, the proposed method was applied to a 16-million-word corpus of roadway design manuals to extract and classify roadway data items. The developed classifier was evaluated against a human-encoded test set, and the results show an overall precision of 92.76% and a recall of 81.02%.
publisher: American Society of Civil Engineers
title: NLP-Based Approach to Semantic Classification of Heterogeneous Transportation Asset Data Terminology
type: Journal Paper
journal volume: 31
journal issue: 6
journal title: Journal of Computing in Civil Engineering
identifier doi: 10.1061/(ASCE)CP.1943-5487.0000701
tree: Journal of Computing in Civil Engineering; 2017; Volume 031; Issue 006
contenttype: Fulltext
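
The abstract says that semantic relatedness among terms is determined from their occurrence statistics in a corpus, but it does not name the model used. As a rough illustration of that general idea only, the sketch below builds raw co-occurrence count vectors and compares them with cosine similarity. The function names, the window size, and the three-sentence toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus invented for illustration; the paper used a 16-million-word corpus.
corpus = [
    ["the", "roadway", "width", "is", "measured"],
    ["the", "highway", "width", "is", "measured"],
    ["a", "culvert", "carries", "water", "under", "fill"],
]

vecs = cooccurrence_vectors(corpus)
related = cosine(vecs["roadway"], vecs["highway"])    # terms sharing all their contexts
unrelated = cosine(vecs["roadway"], vecs["culvert"])  # terms sharing no contexts
```

Terms that appear in similar contexts score close to 1 and terms with disjoint contexts score 0. Note that a relatedness score of this kind does not by itself separate synonymy from hyponymy or meronymy; the abstract attributes that step to a separate classification algorithm.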

