YaBeSH Engineering and Technology Library

    • Journals
    • PaperQuest
    • YSE Standards
    • YaBeSH
    • Login
    View Item 
    •   YE&T Library
    • ASME
    • Journal of Mechanical Design
    • View Item
    •   YE&T Library
    • ASME
    • Journal of Mechanical Design
    • View Item
    • All Fields
    • Source Title
    • Year
    • Publisher
    • Title
    • Subject
    • Author
    • DOI
    • ISBN
    Advanced Search
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Archive

    A Method for Synthesizing Ontology-Based Textual Design Datasets: Evaluating the Potential of Large Language Model in Domain-Specific Dataset Generation

    Source: Journal of Mechanical Design:;2025:;volume( 147 ):;issue: 004::page 41707-1
    Author:
    Qiu, Yunjian
    ,
    Jin, Yan
    DOI: 10.1115/1.4067478
    Publisher: The American Society of Mechanical Engineers (ASME)
    Abstract: In engineering disciplines, leveraging generative language models requires using specialized datasets for training or fine-tuning the preexisting models. Compiling these domain-specific datasets is a complex endeavor, demanding significant human effort and resources. To address the problem of domain-specific dataset scarcity, this study investigates the potential of generative large language models (LLMs) in creating synthetic domain-specific textual datasets for engineering design domains. By harnessing the advanced capabilities of LLMs, such as GPT-4, a systematic methodology was developed to create high-fidelity datasets using designed prompts, evaluated against a manually labeled benchmark dataset through various computational measurements without human intervention. Findings suggest that well-designed prompts can significantly enhance the quality of domain-specific synthetic datasets with reduced manual effort. The research highlights the importance of prompt design in eliciting precise, domain-relevant information and discusses the balance between dataset robustness and richness. It is demonstrated that a language model trained on synthetic datasets can achieve a level of performance comparable to that of human-labeled, domain-specific datasets in terms of quality, offering a strategic solution to the limitations imposed by dataset shortages in engineering domains. The implications for design thinking processes are particularly noteworthy, with the potential to assist designers through GPT-4's structured reasoning capabilities. This work presents a complete guide for domain-specific dataset generation, automated evaluation metrics, and insights into the interplay between data robustness and comprehensiveness.
    • Download: (1.563Mb)
    • Show Full MetaData Hide Full MetaData
    • Get RIS
    • Item Order
    • Go To Publisher
    • Price: 5000 Rial
    • Statistics

      A Method for Synthesizing Ontology-Based Textual Design Datasets: Evaluating the Potential of Large Language Model in Domain-Specific Dataset Generation

    URI
    http://yetl.yabesh.ir/yetl1/handle/yetl/4305404
    Collections
    • Journal of Mechanical Design

    Show full item record

    contributor authorQiu, Yunjian
    contributor authorJin, Yan
    date accessioned2025-04-21T10:03:35Z
    date available2025-04-21T10:03:35Z
    date copyright1/29/2025 12:00:00 AM
    date issued2025
    identifier issn1050-0472
    identifier othermd_147_4_041707.pdf
    identifier urihttp://yetl.yabesh.ir/yetl1/handle/yetl/4305404
    description abstractIn engineering disciplines, leveraging generative language models requires using specialized datasets for training or fine-tuning the preexisting models. Compiling these domain-specific datasets is a complex endeavor, demanding significant human effort and resources. To address the problem of domain-specific dataset scarcity, this study investigates the potential of generative large language models (LLMs) in creating synthetic domain-specific textual datasets for engineering design domains. By harnessing the advanced capabilities of LLMs, such as GPT-4, a systematic methodology was developed to create high-fidelity datasets using designed prompts, evaluated against a manually labeled benchmark dataset through various computational measurements without human intervention. Findings suggest that well-designed prompts can significantly enhance the quality of domain-specific synthetic datasets with reduced manual effort. The research highlights the importance of prompt design in eliciting precise, domain-relevant information and discusses the balance between dataset robustness and richness. It is demonstrated that a language model trained on synthetic datasets can achieve a level of performance comparable to that of human-labeled, domain-specific datasets in terms of quality, offering a strategic solution to the limitations imposed by dataset shortages in engineering domains. The implications for design thinking processes are particularly noteworthy, with the potential to assist designers through GPT-4's structured reasoning capabilities. This work presents a complete guide for domain-specific dataset generation, automated evaluation metrics, and insights into the interplay between data robustness and comprehensiveness.
    publisherThe American Society of Mechanical Engineers (ASME)
    titleA Method for Synthesizing Ontology-Based Textual Design Datasets: Evaluating the Potential of Large Language Model in Domain-Specific Dataset Generation
    typeJournal Paper
    journal volume147
    journal issue4
    journal titleJournal of Mechanical Design
    identifier doi10.1115/1.4067478
    journal fristpage41707-1
    journal lastpage41707-14
    page14
    treeJournal of Mechanical Design:;2025:;volume( 147 ):;issue: 004
    contenttypeFulltext
    DSpace software copyright © 2002-2015  DuraSpace
    نرم افزار کتابخانه دیجیتال "دی اسپیس" فارسی شده توسط یابش برای کتابخانه های ایرانی | تماس با یابش
    yabeshDSpacePersian
     
    DSpace software copyright © 2002-2015  DuraSpace
    نرم افزار کتابخانه دیجیتال "دی اسپیس" فارسی شده توسط یابش برای کتابخانه های ایرانی | تماس با یابش
    yabeshDSpacePersian