YaBeSH Engineering and Technology Library


    Improving the Predictive Analytics of Machine-Learning Pipelines for Bridge Infrastructure Asset Management Applications: An Upstream Data Workflow to Address Data Quality Issues in the National Bridge Inventory Database

    Source: Journal of Bridge Engineering, 2024, Volume 29, Issue 1, page 04023103-1
    Author: Xi Hu, Rayan H. Assaad
    DOI: 10.1061/JBENF2.BEENG-6012
    Publisher: ASCE
    Abstract: The increasing availability of bridge data from the National Bridge Inventory (NBI) offers a great opportunity to perform predictive analytics (such as bridge deterioration prediction) using machine learning (ML) pipelines to support bridge asset management. However, data quality issues (e.g., outliers and missing values) can significantly affect ML pipelines, so upstream tasks must be performed to ensure the validity, applicability, and generalizability of the pipelines. Among these tasks, outlier removal and missing value imputation are the most challenging because of a highly laborious process, a lack of data governance, and a mixture of heterogeneous data quality issues and data types. To address this challenge, this paper proposes an upstream workflow for enhancing the downstream predictive analytics of bridge-related ML pipelines. The proposed upstream workflow was developed based on the NBI data collected for all states in the United States, which include a total of 617,084 observations (bridges). Existing bridge domain knowledge from multiple sources (such as the bridge design manual and regulations) was leveraged to remove outliers. Then, this study applied and compared 10 statistical and ML-based data imputation techniques to impute missing values. Statistical analysis and imputation evaluation of NBI data indicated that: (1) 19 and 15 of the 38 frequently used features (variables) had outliers and missing values, respectively; (2) categorical features are generally more prone to data dropping due to inapplicable values, while numeric features are more subject to outliers; and (3) ML-based data imputation is more suitable than statistical imputation for both numeric and categorical features, especially for features with a high missing rate. The proposed workflow was validated on its ability to improve downstream predictive analytics for bridge deck condition prediction, increasing the balanced accuracy by 6.85%–9.76%. This paper contributes to the body of knowledge by offering a novel upstream workflow that can be used as a benchmark to guide researchers and bridge engineering practitioners in handling NBI data quality issues so that predictive analytics using ML pipelines perform better.
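    The upstream steps named in the abstract (rule-based outlier removal, statistical versus ML-based imputation, and evaluation by balanced accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature names (year_built, span_m), the plausibility ranges, the choice to treat outliers as missing values, and the use of a median fill and a one-predictor nearest-neighbor fill as stand-ins for the 10 techniques the authors compared.

```python
from statistics import median

# Hypothetical plausibility rules (NOT the paper's rules): each feature maps
# to an inclusive (low, high) range assumed to be physically sensible.
RULES = {"year_built": (1800, 2024), "span_m": (1.0, 2000.0)}


def remove_outliers(rows, rules=RULES):
    """Set values outside their plausible range to None (treated as missing)."""
    cleaned = []
    for row in rows:
        new = dict(row)
        for name, (low, high) in rules.items():
            value = new.get(name)
            if value is not None and not (low <= value <= high):
                new[name] = None
        cleaned.append(new)
    return cleaned


def impute_median(rows, target):
    """Statistical baseline: fill missing target values with the observed median."""
    fill = median(r[target] for r in rows if r[target] is not None)
    return [dict(r, **{target: fill}) if r[target] is None else dict(r) for r in rows]


def impute_knn(rows, target, predictor, k=2):
    """Nearest-neighbor fill: average the target over the k rows closest in predictor."""
    donors = [r for r in rows if r[target] is not None and r[predictor] is not None]
    out = []
    for r in rows:
        new = dict(r)
        if r[target] is None and r[predictor] is not None:
            nearest = sorted(donors, key=lambda d: abs(d[predictor] - r[predictor]))[:k]
            new[target] = sum(d[target] for d in nearest) / len(nearest)
        out.append(new)
    return out


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare condition ratings count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy records, invented for this sketch.
bridges = [
    {"year_built": 1965, "span_m": 30.0},
    {"year_built": 1970, "span_m": 32.0},
    {"year_built": 2005, "span_m": 60.0},
    {"year_built": 2010, "span_m": None},  # missing span
    {"year_built": 1066, "span_m": 28.0},  # implausible year, flagged by RULES
]
cleaned = remove_outliers(bridges)
by_median = impute_median(cleaned, "span_m")
by_knn = impute_knn(cleaned, "span_m", "year_built", k=1)
```

    In this toy data the median fill ignores construction year, while the nearest-neighbor fill borrows the span of the bridge built closest in time. That is only meant to suggest why a model-based imputer can differ from a statistical one; it does not reproduce the paper's comparison or its reported accuracy gains.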

    URI: http://yetl.yabesh.ir/yetl1/handle/yetl/4297255
    Collections: Journal of Bridge Engineering


    contributor author: Xi Hu
    contributor author: Rayan H. Assaad
    date accessioned: 2024-04-27T22:41:09Z
    date available: 2024-04-27T22:41:09Z
    date issued: 2024/01/01
    identifier other: 10.1061-JBENF2.BEENG-6012.pdf
    identifier uri: http://yetl.yabesh.ir/yetl1/handle/yetl/4297255
    publisher: ASCE
    title: Improving the Predictive Analytics of Machine-Learning Pipelines for Bridge Infrastructure Asset Management Applications: An Upstream Data Workflow to Address Data Quality Issues in the National Bridge Inventory Database
    type: Journal Article
    journal volume: 29
    journal issue: 1
    journal title: Journal of Bridge Engineering
    identifier doi: 10.1061/JBENF2.BEENG-6012
    journal first page: 04023103-1
    journal last page: 04023103-21
    page: 21
    tree: Journal of Bridge Engineering, 2024, Volume 29, Issue 1
    content type: Fulltext
    DSpace software copyright © 2002-2015  DuraSpace
    "DSpace" digital library software, localized into Persian by Yabesh for Iranian libraries
     