Vision-Language Models for Design Concept Generation: An Actor–Critic Framework
Source: Journal of Mechanical Design, 2025, Volume 147, Issue 9, page 91402-1
DOI: 10.1115/1.4067619
Publisher: The American Society of Mechanical Engineers (ASME)
Abstract: We introduce a novel actor–critic framework that utilizes vision-language models (VLMs) and large language models (LLMs) for design concept generation, particularly for producing a diverse array of innovative solutions to a given design problem. By leveraging the extensive data repositories and pattern recognition capabilities of these models, our framework achieves this goal by enabling iterative interactions between two VLM agents: an actor (i.e., concept generator) and a critic. The actor, a custom VLM (e.g., GPT-4) created using few-shot learning and fine-tuning techniques, generates initial design concepts that are improved iteratively based on guided feedback from the critic, which is either a prompt-engineered LLM or a set of design-specific quantitative metrics. This process aims to optimize the generated concepts with respect to four metrics: novelty, feasibility, problem–solution relevancy, and variety. The framework incorporates both long-term and short-term memory models to examine how the history of interactions affects decision-making and concept generation outcomes. We explored the efficacy of incorporating images alongside text in conveying design ideas within our actor–critic framework by experimenting with two communication mediums for the agents: vision-language and language-only. We extensively evaluated the framework through a case study using the AskNature dataset, comparing its performance against benchmarks such as GPT-4 and real-world biomimetic designs across various industrial examples. Our findings underscore the framework's capability to iteratively refine and enhance initial design concepts, achieving significant improvements across all metrics. We conclude by discussing the implications of the proposed framework for various design domains, along with its limitations and several directions for future research.
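For illustration only, the sketch below shows the kind of actor–critic iteration the abstract describes: an actor model proposes a concept, a critic model evaluates it against the four metrics and suggests improvements, and the actor revises the concept with the accumulated history acting as short-term memory. The helper call_llm and all prompt wording are hypothetical placeholders for whatever LLM/VLM backend is used; this is not the authors' implementation.

# Minimal sketch of one possible actor-critic refinement loop (assumptions noted above).
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat-style LLM/VLM endpoint."""
    raise NotImplementedError("plug in a model client here")

METRICS = ["novelty", "feasibility", "problem-solution relevancy", "variety"]

def refine_concept(design_problem: str, n_iterations: int = 3) -> str:
    history = []  # short-term memory: prior concepts and critiques

    # Actor proposes an initial concept for the design problem.
    concept = call_llm(f"Propose a design concept for: {design_problem}")

    for _ in range(n_iterations):
        # Critic evaluates the concept against the four metrics and suggests fixes.
        critique = call_llm(
            "Critique this concept on " + ", ".join(METRICS)
            + " and suggest concrete improvements:\n" + concept
        )
        history.append((concept, critique))

        # Actor revises the concept, conditioned on the accumulated history.
        context = "\n\n".join(f"Concept: {c}\nCritique: {k}" for c, k in history)
        concept = call_llm(
            f"Design problem: {design_problem}\n"
            f"Previous iterations:\n{context}\n"
            "Produce an improved concept that addresses the latest critique."
        )
    return concept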
contributor author | Ghasemi, Parisa
contributor author | Moghaddam, Mohsen
date accessioned | 2025-08-20T09:47:03Z
date available | 2025-08-20T09:47:03Z
date copyright | 2025-04-02
date issued | 2025
identifier issn | 1050-0472
identifier other | md-24-1386.pdf
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4308845
publisher | The American Society of Mechanical Engineers (ASME)
title | Vision-Language Models for Design Concept Generation: An Actor–Critic Framework
type | Journal Paper
journal volume | 147
journal issue | 9
journal title | Journal of Mechanical Design
identifier doi | 10.1115/1.4067619
journal firstpage | 91402-1
journal lastpage | 91402-20
page | 20
tree | Journal of Mechanical Design; 2025; Volume 147; Issue 9
contenttype | Fulltext