YaBeSH Engineering and Technology Library


    DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation

    Source: Journal of Computing and Information Science in Engineering, 2024, Volume 25, Issue 2, Page 21009-1
    Authors: Doris, Anna C.; Grandi, Daniele; Tomich, Ryan; Alam, Md Ferdous; Ataei, Mohammadmehdi; Cheong, Hyunmin; Ahmed, Faez
    DOI: 10.1115/1.4067333
    Publisher: The American Society of Mechanical Engineers (ASME)
    Abstract: This research introduces DesignQA, a novel benchmark aimed at evaluating the proficiency of multimodal large language models (MLLMs) in comprehending and applying engineering requirements in technical documentation. Developed with a focus on real-world engineering challenges, DesignQA uniquely combines multimodal data—including textual design requirements, CAD images, and engineering drawings—derived from the Formula SAE student competition. Unlike many existing MLLM benchmarks, DesignQA contains document-grounded visual questions where the input image and the input document come from different sources. The benchmark features automatic evaluation metrics and is divided into segments—Rule Comprehension, Rule Compliance, and Rule Extraction—based on tasks that engineers perform when designing according to requirements. We evaluate state-of-the-art models (at the time of writing) like GPT-4o, GPT-4, Claude-Opus, Gemini-1.0, and LLaVA-1.5 against the benchmark, and our study uncovers the existing gaps in MLLMs’ abilities to interpret complex engineering documentation. The MLLMs tested, while promising, struggle to reliably retrieve relevant rules from the Formula SAE documentation, face challenges in recognizing technical components in CAD images, and encounter difficulty in analyzing engineering drawings. These findings underscore the need for multimodal models that can better handle the multifaceted questions characteristic of design according to technical documentation. This benchmark sets a foundation for future advancements in AI-supported engineering design processes. DesignQA is publicly available online.
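    The abstract describes document-grounded visual QA items whose image and document come from different sources, scored automatically across three task segments. As a minimal sketch only, the Python below shows one way such an item and a simple exact-match metric could be represented; the field names, rule number, file path, and scoring rule are illustrative assumptions, not the actual DesignQA schema or metrics.

    # Minimal sketch of a document-grounded visual QA item and one plausible
    # automatic metric. All names here are hypothetical, not DesignQA's schema.
    from dataclasses import dataclass

    @dataclass
    class BenchmarkItem:
        segment: str       # e.g. "rule_comprehension", "rule_compliance", "rule_extraction"
        rule_id: str       # rule number in the Formula SAE documentation
        rule_text: str     # requirement text drawn from the rules document
        image_path: str    # CAD image or engineering drawing (separate source)
        question: str
        ground_truth: str

    def exact_match(prediction: str, item: BenchmarkItem) -> bool:
        """Score a model answer by normalized exact match."""
        return prediction.strip().lower() == item.ground_truth.strip().lower()

    # Example usage with a made-up rule reference:
    item = BenchmarkItem(
        segment="rule_compliance",
        rule_id="T.1.1",                   # hypothetical rule number
        rule_text="The chassis must include a main hoop ...",
        image_path="cad/main_hoop.png",    # hypothetical path
        question="Does the pictured chassis satisfy rule T.1.1?",
        ground_truth="yes",
    )
    print(exact_match("Yes", item))  # True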

    URI: http://yetl.yabesh.ir/yetl1/handle/yetl/4305596
    Collections
    • Journal of Computing and Information Science in Engineering

    Full item record

    contributor author: Doris, Anna C.
    contributor author: Grandi, Daniele
    contributor author: Tomich, Ryan
    contributor author: Alam, Md Ferdous
    contributor author: Ataei, Mohammadmehdi
    contributor author: Cheong, Hyunmin
    contributor author: Ahmed, Faez
    date accessioned: 2025-04-21T10:08:52Z
    date available: 2025-04-21T10:08:52Z
    date copyright: 12/23/2024
    date issued: 2024
    identifier issn: 1530-9827
    identifier other: jcise_25_2_021009.pdf
    identifier uri: http://yetl.yabesh.ir/yetl1/handle/yetl/4305596
    publisher: The American Society of Mechanical Engineers (ASME)
    title: DesignQA: A Multimodal Benchmark for Evaluating Large Language Models’ Understanding of Engineering Documentation
    type: Journal Paper
    journal volume: 25
    journal issue: 2
    journal title: Journal of Computing and Information Science in Engineering
    identifier doi: 10.1115/1.4067333
    journal first page: 21009-1
    journal last page: 21009-17
    pages: 17
    tree: Journal of Computing and Information Science in Engineering; 2024; Volume 25; Issue 2
    content type: Fulltext
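    The record above can be assembled into a standard RIS citation. As a minimal sketch, the Python below maps the page's metadata fields to common RIS tags; the tag mapping reflects typical RIS usage and is not an export produced by this repository.

    # Sketch: build a RIS citation from the item record on this page.
    # Values are copied from the metadata above; tag mapping is conventional RIS.
    fields = {
        "TY": "JOUR",
        "AU": ["Doris, Anna C.", "Grandi, Daniele", "Tomich, Ryan",
               "Alam, Md Ferdous", "Ataei, Mohammadmehdi",
               "Cheong, Hyunmin", "Ahmed, Faez"],
        "TI": ("DesignQA: A Multimodal Benchmark for Evaluating Large Language "
               "Models' Understanding of Engineering Documentation"),
        "JO": "Journal of Computing and Information Science in Engineering",
        "PY": "2024",
        "VL": "25",
        "IS": "2",
        "SP": "21009-1",
        "EP": "21009-17",
        "SN": "1530-9827",
        "DO": "10.1115/1.4067333",
        "PB": "The American Society of Mechanical Engineers (ASME)",
        "UR": "http://yetl.yabesh.ir/yetl1/handle/yetl/4305596",
    }

    lines = []
    for tag, value in fields.items():
        for v in (value if isinstance(value, list) else [value]):
            lines.append(f"{tag}  - {v}")
    lines.append("ER  - ")  # RIS end-of-record marker
    print("\n".join(lines))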
    DSpace software copyright © 2002-2015 DuraSpace
    "DSpace" digital library software localized into Persian by Yabesh for Iranian libraries | Contact Yabesh