contributor author: Qiu, Yunjian
contributor author: Jin, Yan
date accessioned: 2025-04-21T10:03:35Z
date available: 2025-04-21T10:03:35Z
date copyright: 1/29/2025
date issued: 2025
identifier issn: 1050-0472
identifier other: md_147_4_041707.pdf
identifier uri: http://yetl.yabesh.ir/yetl1/handle/yetl/4305404
description abstract: In engineering disciplines, leveraging generative language models requires specialized datasets for training or fine-tuning preexisting models. Compiling these domain-specific datasets is a complex endeavor that demands significant human effort and resources. To address the scarcity of domain-specific datasets, this study investigates the potential of generative large language models (LLMs) for creating synthetic domain-specific textual datasets for engineering design domains. Harnessing the advanced capabilities of LLMs such as GPT-4, a systematic methodology was developed to create high-fidelity datasets from designed prompts; the resulting datasets were evaluated against a manually labeled benchmark dataset through various computational measurements, without human intervention. Findings suggest that well-designed prompts can significantly enhance the quality of domain-specific synthetic datasets while reducing manual effort. The research highlights the importance of prompt design in eliciting precise, domain-relevant information and discusses the balance between dataset robustness and richness. It is demonstrated that a language model trained on the synthetic datasets can achieve quality comparable to that obtained with human-labeled, domain-specific datasets, offering a strategic solution to the limitations imposed by dataset shortages in engineering domains. The implications for design thinking processes are particularly noteworthy, with the potential to assist designers through GPT-4's structured reasoning capabilities. This work presents a complete guide to domain-specific dataset generation, automated evaluation metrics, and insights into the interplay between data robustness and comprehensiveness.
publisher: The American Society of Mechanical Engineers (ASME)
title: A Method for Synthesizing Ontology-Based Textual Design Datasets: Evaluating the Potential of Large Language Model in Domain-Specific Dataset Generation
type: Journal Paper
journal volume: 147
journal issue: 4
journal title: Journal of Mechanical Design
identifier doi: 10.1115/1.4067478
journal firstpage: 41707-1
journal lastpage: 41707-14
page: 14
tree: Journal of Mechanical Design; 2025; volume 147; issue 4
content type: Fulltext
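
The abstract above describes a two-stage workflow: prompt-driven generation of labeled domain text with an LLM, followed by automatic comparison against a human-labeled benchmark. The following is a minimal sketch of that loop, not the authors' released implementation; the label set, prompt wording, the generate() hook, and the word-overlap metric are illustrative assumptions standing in for the paper's actual prompts and computational measurements.

from typing import Callable, Dict, List

# Hypothetical ontology labels for an engineering-design domain.
LABELS = ["function", "behavior", "structure"]

# Illustrative prompt template; the paper's designed prompts are more elaborate.
PROMPT_TEMPLATE = (
    "You are an engineering-design assistant. Write one sentence describing "
    "the {label} of a mechanical system, phrased as it would appear in a "
    "design document. Return only the sentence."
)

def synthesize(generate: Callable[[str], str], per_label: int) -> Dict[str, List[str]]:
    """Build a labeled synthetic dataset by repeatedly prompting an LLM.

    `generate` wraps whatever LLM endpoint is available (e.g., a GPT-4 call);
    it is passed in so the sketch stays independent of any particular API.
    """
    dataset: Dict[str, List[str]] = {label: [] for label in LABELS}
    for label in LABELS:
        prompt = PROMPT_TEMPLATE.format(label=label)
        for _ in range(per_label):
            dataset[label].append(generate(prompt).strip())
    return dataset

def vocabulary_overlap(synthetic: List[str], benchmark: List[str]) -> float:
    """One crude automatic measure: Jaccard overlap of word sets, standing in
    for the richer metrics (embedding similarity, downstream classifier
    accuracy, etc.) a real evaluation would use."""
    syn_vocab = {w.lower() for s in synthetic for w in s.split()}
    ref_vocab = {w.lower() for s in benchmark for w in s.split()}
    if not syn_vocab and not ref_vocab:
        return 0.0
    return len(syn_vocab & ref_vocab) / len(syn_vocab | ref_vocab)

# Example wiring (my_gpt4_wrapper and benchmark are placeholders):
# data = synthesize(generate=my_gpt4_wrapper, per_label=50)
# score = vocabulary_overlap(data["function"], benchmark["function"])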

