Vision-Language Models for Design Concept Generation: An Actor–Critic Framework
Source: Journal of Mechanical Design, 2025, Volume 147, Issue 9, page 91402-1
DOI: 10.1115/1.4067619
Publisher: The American Society of Mechanical Engineers (ASME)
Abstract: We introduce a novel actor–critic framework that utilizes vision-language models (VLMs) and large language models (LLMs) for design concept generation, particularly for producing a diverse array of innovative solutions to a given design problem. By leveraging the extensive data repositories and pattern recognition capabilities of these models, our framework achieves this goal by enabling iterative interactions between two VLM agents: an actor (i.e., concept generator) and a critic. The actor, a custom VLM (e.g., GPT-4) created using few-shot learning and fine-tuning techniques, generates initial design concepts that are improved iteratively based on guided feedback from the critic, which is either a prompt-engineered LLM or a set of design-specific quantitative metrics. This process aims to optimize the generated concepts with respect to four metrics: novelty, feasibility, problem–solution relevancy, and variety. The framework incorporates both long-term and short-term memory models to examine how the history of interactions affects decision-making and concept generation outcomes. We explored the efficacy of incorporating images alongside text in conveying design ideas within our actor–critic framework by experimenting with two communication mediums for the agents: vision-language and language-only. We extensively evaluated the framework through a case study using the AskNature dataset, comparing its performance against benchmarks such as GPT-4 and real-world biomimetic designs across various industrial examples. Our findings underscore the framework's capability to iteratively refine and enhance initial design concepts, achieving significant improvements across all metrics. We conclude by discussing the implications of the proposed framework for various design domains, along with its limitations and several directions for future research.
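For illustration only, the sketch below shows the kind of actor–critic iteration the abstract describes: an actor model proposes a concept, a critic model evaluates it against the four metrics and suggests improvements, and the actor revises the concept with the accumulated history acting as short-term memory. The helper call_llm and all prompt wording are hypothetical placeholders for whatever LLM/VLM backend is used; this is not the authors' implementation.

# Minimal sketch of one possible actor-critic refinement loop (assumptions noted above).
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat-style LLM/VLM endpoint."""
    raise NotImplementedError("plug in a model client here")

METRICS = ["novelty", "feasibility", "problem-solution relevancy", "variety"]

def refine_concept(design_problem: str, n_iterations: int = 3) -> str:
    history = []  # short-term memory: prior concepts and critiques

    # Actor proposes an initial concept for the design problem.
    concept = call_llm(f"Propose a design concept for: {design_problem}")

    for _ in range(n_iterations):
        # Critic evaluates the concept against the four metrics and suggests fixes.
        critique = call_llm(
            "Critique this concept on " + ", ".join(METRICS)
            + " and suggest concrete improvements:\n" + concept
        )
        history.append((concept, critique))

        # Actor revises the concept, conditioned on the accumulated history.
        context = "\n\n".join(f"Concept: {c}\nCritique: {k}" for c, k in history)
        concept = call_llm(
            f"Design problem: {design_problem}\n"
            f"Previous iterations:\n{context}\n"
            "Produce an improved concept that addresses the latest critique."
        )
    return concept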
contributor author | Ghasemi, Parisa
contributor author | Moghaddam, Mohsen
date accessioned | 2025-08-20T09:47:03Z
date available | 2025-08-20T09:47:03Z
date copyright | 2025-04-02
date issued | 2025
identifier issn | 1050-0472
identifier other | md-24-1386.pdf
identifier uri | http://yetl.yabesh.ir/yetl1/handle/yetl/4308845
publisher | The American Society of Mechanical Engineers (ASME)
title | Vision-Language Models for Design Concept Generation: An Actor–Critic Framework
type | Journal Paper
journal volume | 147
journal issue | 9
journal title | Journal of Mechanical Design
identifier doi | 10.1115/1.4067619
journal firstpage | 91402-1
journal lastpage | 91402-20
page | 20
tree | Journal of Mechanical Design; 2025; Volume 147; Issue 9
contenttype | Fulltext