Semantic Text Classification for Supporting Automated Compliance Checking in Construction
Source: Journal of Computing in Civil Engineering, 2016, Volume 30, Issue 1
DOI: 10.1061/(ASCE)CP.1943-5487.0000301
Publisher: American Society of Civil Engineers
Abstract: Automated regulatory and contractual compliance checking requires automated rule extraction from regulatory and contractual textual documents (e.g., contract specifications). Automated rule extraction is a challenging task that requires complex processing of text. In the proposed automated compliance checking (ACC) approach, the first step in automating the rule extraction process is automatically classifying the different documents and parts of documents (e.g., contract clauses) into predefined categories (environmental, safety, health, etc.) to prepare them for further text analysis and rule extraction. These categories are defined in a semantic model for normative reasoning. This paper presents a semantic, machine learning-based text classification algorithm for classifying clauses and subclauses of general conditions to support ACC in construction. The multilabel classification problem was transformed into a set of binary classification problems. Different machine learning algorithms, text preprocessing techniques, methods of text feature scoring, methods of feature weighting, and feature sizes were implemented and evaluated at different thresholds. The developed classifier achieved 100% recall and 96% precision on the testing data.
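The transformation of a multilabel problem into one binary problem per category (often called binary relevance) can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes TF-IDF feature weighting and a simple nearest-centroid (Rocchio-style) scorer, neither of which the abstract names, and the clauses, labels, function names, and threshold value are all hypothetical.

```python
import math
import re

# Hypothetical toy clauses and labels for illustration only; NOT data from the paper.
TRAIN = [
    ("The contractor shall dispose of hazardous waste per environmental regulations.", {"environmental"}),
    ("Workers shall wear hard hats and fall protection on scaffolds.", {"safety"}),
    ("The contractor shall provide first aid and monitor worker exposure to dust.", {"health", "safety"}),
    ("Stormwater runoff shall be controlled to prevent pollution of waterways.", {"environmental"}),
]
LABELS = ["environmental", "safety", "health"]


def tokenize(text):
    # Minimal preprocessing: lowercase and keep alphabetic runs (no stemming or stop-word removal).
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_lists):
    # Smoothed inverse document frequency: log(N / df) + 1.
    n = len(token_lists)
    df = {}
    for toks in token_lists:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / c) + 1.0 for t, c in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit L2 length (an all-zero vector becomes empty).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(tokens, idf):
    # Term frequency times idf (out-of-vocabulary tokens ignored), then L2-normalized.
    vec = {}
    for t in tokens:
        if t in idf:
            vec[t] = vec.get(t, 0.0) + idf[t]
    return normalize(vec)


def dot(a, b):
    # For unit-length vectors this is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroid(vectors):
    # Unit-length mean direction of a set of unit vectors.
    total = {}
    for vec in vectors:
        for t, v in vec.items():
            total[t] = total.get(t, 0.0) + v
    return normalize(total)


def train(docs, labels):
    # Binary relevance: one independent positive-vs-negative model per label.
    idf = fit_idf([tokenize(text) for text, _ in docs])
    vecs = [tfidf(tokenize(text), idf) for text, _ in docs]
    model = {}
    for label in labels:
        pos = [v for v, (_, gold) in zip(vecs, docs) if label in gold]
        neg = [v for v, (_, gold) in zip(vecs, docs) if label not in gold]
        model[label] = (centroid(pos), centroid(neg))
    return idf, model


def predict(text, idf, model, threshold=0.0):
    # Assign a label when the clause is closer (cosine) to that label's positive
    # centroid than to its negative centroid by more than `threshold`.
    vec = tfidf(tokenize(text), idf)
    return {label for label, (pos, neg) in model.items()
            if dot(vec, pos) - dot(vec, neg) > threshold}


def micro_precision_recall(predicted, gold):
    # Micro-averaged over all (clause, label) decisions.
    tp = fp = fn = 0
    for p, g in zip(predicted, gold):
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


idf, model = train(TRAIN, LABELS)
print(predict("Hazardous waste shall not pollute waterways.", idf, model))  # {'environmental'}
```

Because each label has its own yes/no decision, a clause can receive zero, one, or several categories. Raising the hypothetical `threshold` generally trades recall for precision, which is the kind of trade-off examined when a classifier is evaluated at different thresholds.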
contributor author | Dareen M. Salama
contributor author | Nora M. El-Gohary
date accessioned | 2017-05-08T21:40:56Z
date available | 2017-05-08T21:40:56Z
date copyright | January 2016
date issued | 2016
identifier other | %28asce%29cp%2E1943-5487%2E0000309.pdf
identifier uri | http://yetl.yabesh.ir/yetl/handle/yetl/59283
publisher | American Society of Civil Engineers
title | Semantic Text Classification for Supporting Automated Compliance Checking in Construction
type | Journal Paper
journal volume | 30
journal issue | 1
journal title | Journal of Computing in Civil Engineering
identifier doi | 10.1061/(ASCE)CP.1943-5487.0000301
tree | Journal of Computing in Civil Engineering:;2016:;Volume ( 030 ):;issue: 001
contenttype | Fulltext