LLM4CAD: Multimodal Large Language Models for Three-Dimensional Computer-Aided Design Generation

Li, Xingang; Sun, Yuewan; Sha, Zhenghui

Source: Journal of Computing and Information Science in Engineering, 2024, Volume 25, Issue 2, page 21005-1

Authors:

Li, Xingang

Sun, Yuewan

Sha, Zhenghui

DOI: 10.1115/1.4067085

Publisher: The American Society of Mechanical Engineers (ASME)

Abstract: The evolution of multimodal large language models (LLMs) capable of processing diverse input modalities (e.g., text and images) opens new prospects for their application in engineering design, such as the generation of 3D computer-aided design (CAD) models. However, little is known about the ability of multimodal LLMs to generate 3D design objects, and quantitative assessments are lacking. In this study, we develop an approach that enables LLMs to generate 3D CAD models (i.e., LLM4CAD) and perform experiments to evaluate its efficacy, using GPT-4 and GPT-4V as examples. To address the challenge of data scarcity for multimodal LLM studies, we created a data synthesis pipeline to generate CAD models, sketches, and image data of typical mechanical components (e.g., gears and springs) and collected their natural language descriptions, including dimensional information, using Amazon Mechanical Turk. We positioned the CAD program (a programming script for CAD design) as a bridge that converts the LLMs’ textual output into tangible CAD design objects. We focus on two critical capabilities: the generation of syntactically correct CAD programs (Cap1) and the accuracy of the parsed 3D shapes, quantified by intersection over union (Cap2). The results show that both GPT-4 and GPT-4V demonstrate great potential for 3D CAD generation using only their zero-shot learning ability. Specifically, on average, GPT-4V performs better on both Cap1 and Cap2 when given text-only input than when given multimodal inputs, such as text with an image. However, category-specific results for the mechanical components show that the advantage of multimodal inputs becomes increasingly evident for more complex geometries (e.g., springs and gears) in both Cap1 and Cap2.
The potential of multimodal LLMs to improve 3D CAD generation is clear, but their application must be carefully calibrated to the complexity of the target CAD models.
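
The two capabilities named in the abstract reduce to simple ratios, sketched below in Python under stated assumptions. Cap1 is approximated as the fraction of generated programs that compile, assuming the CAD programs are Python scripts (the abstract does not name the CAD language, and a stricter check would also execute each script in a CAD kernel). Cap2 is approximated as intersection over union between two shapes that have already been voxelized into sets of integer (i, j, k) cells. The voxelization step, the grid resolution, and the names `box`, `voxel_iou`, `is_syntactically_valid`, and `cap1_rate` are hypothetical and not taken from the paper.

```python
# Minimal sketch of Cap1-style and Cap2-style scores (assumptions noted inline).
# All helper names here are hypothetical illustrations, not the paper's code.

Voxel = tuple[int, int, int]  # integer (i, j, k) cell index in an assumed voxel grid


def box(nx: int, ny: int, nz: int, offset: Voxel = (0, 0, 0)) -> set[Voxel]:
    """Hypothetical helper: occupied cells of an axis-aligned nx*ny*nz block."""
    ox, oy, oz = offset
    return {
        (ox + i, oy + j, oz + k)
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    }


def voxel_iou(pred: set[Voxel], target: set[Voxel]) -> float:
    """Intersection over union of two voxelized shapes (Cap2-style score)."""
    union = pred | target
    if not union:
        # Assumed convention: two empty shapes count as a perfect match.
        return 1.0
    return len(pred & target) / len(union)


def is_syntactically_valid(source: str) -> bool:
    """Assumes generated CAD programs are Python; checks syntax only, not execution."""
    try:
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def cap1_rate(programs: list[str]) -> float:
    """Fraction of generated programs that pass the syntax check (Cap1-style score)."""
    if not programs:
        return 0.0
    return sum(is_syntactically_valid(p) for p in programs) / len(programs)


if __name__ == "__main__":
    a = box(2, 2, 2)
    b = box(2, 2, 2, offset=(1, 0, 0))  # same cube shifted one cell along x
    print(voxel_iou(a, b))  # 4 shared cells / 12 cells in the union
    print(cap1_rate(["width = 10.0\nheight = 2 * width", "def gear(:"]))  # 0.5
```

Representing occupancy as Python sets keeps the sketch dependency-free; at realistic grid resolutions, boolean arrays (e.g., NumPy) would be the more practical representation.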

URI

http://yetl.yabesh.ir/yetl1/handle/yetl/4305364


contributor author	Li, Xingang
contributor author	Sun, Yuewan
contributor author	Sha, Zhenghui
date accessioned	2025-04-21T10:02:13Z
date available	2025-04-21T10:02:13Z
date copyright	12/12/2024
date issued	2024
identifier issn	1530-9827
identifier other	jcise_25_2_021005.pdf
identifier uri	http://yetl.yabesh.ir/yetl1/handle/yetl/4305364
publisher	The American Society of Mechanical Engineers (ASME)
title	LLM4CAD: Multimodal Large Language Models for Three-Dimensional Computer-Aided Design Generation
type	Journal Paper
journal volume	25
journal issue	2
journal title	Journal of Computing and Information Science in Engineering
identifier doi	10.1115/1.4067085
journal firstpage	21005-1
journal lastpage	21005-14
page	14
tree	Journal of Computing and Information Science in Engineering > 2024 > Volume 25 > Issue 2
contenttype	Fulltext