[2024 AI R&D Digital Summit (AiDD), Beijing]: Junjie Wang - Multimodal Large Model-Based User Interface Interaction and Testing



Junjie Wang, Institute of Software, Chinese Academy of Sciences

Speaker
Junjie Wang is a Research Professor and PhD supervisor at the Institute of Software, Chinese Academy of Sciences (ISCAS). Wang holds a CAS distinguished research position and is a member of the CAS Youth Innovation Promotion Association. Wang's research covers intelligent software engineering and software quality, and in recent years has focused on intelligent software testing and LLM-driven software testing. Wang has published more than 60 papers in leading international journals and conferences and has received four ACM/IEEE Distinguished Paper Awards. Wang has led or participated in multiple projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology's National Key R&D Program, the CCF-Huawei Populus Grove Fund, and others. Wang serves as Associate Editor of TSE (a CCF Class-A journal), as a PC member of ICSE, FSE, ISSRE and other conferences, and as a reviewer for journals including TOSEM, EMSE, AUSE and the Journal of Software.

Outline
1. Status and challenges of user interface testing
2. Test input generation
3. Automated GUI testing oriented to test path planning
4. Automated GUI testing based on multimodal large models
5. Fuzz testing for text inputs
6. Interaction enhancement for text input components
7. Summary and outlook

Status and challenges of user interface testing
Related results
◆ Multiple papers published at flagship venues in software engineering and human-computer interaction, including ICSE, TSE and CHI
◆ Applied in, or being integrated with, the Beike Zhaofang (Ke.com) app, the Douyin app, the Huawei HarmonyOS ecosystem, and in-vehicle systems of new-energy vehicles
Related papers: ICSE 2024, ICSE 2024, ICSE 2023, under submission, TSE 2024, FSE2024-SE2030

Text Input Generation
◆ Ask the LLM to fill in the blank according to the generated prompts
• Set up linguistic patterns to generate prompts based on the current page
Text input generation for mobile app testing
• Examples
Evaluation
• Passing rate: 0.87
• Significant boost in activity coverage and 122% more bugs (51 vs. 23) when added to GUI testing tools

GPTDroid: Function-aware Automatic GUI testing
◆ Automated GUI testing with an LLM
• Formulate the automated GUI testing problem as an interactive question-and-answering task
• Let the LLM conduct the whole app test by understanding the semantic information of the GUI and automatically inferring possible operation steps
◆ GUI context extraction
◆ GUI prompting and executive command generation
◆ Functionality-aware memory prompting
• A testing sequence memorizer records all the detailed interactive testing information, e.g., the explored activities and widgets
◆ GUI context extraction
• Accurately depict the GUI page currently under test, together with information on its widgets from a more micro perspective and on the app from a more macro perspective
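The GUI context extraction step can be illustrated with a minimal Python sketch. It turns app-level (macro), page-level and widget-level (micro) information into a single natural-language prompt for the LLM. All class names, fields, function names and the prompt wording are hypothetical assumptions for illustration. They are not GPTDroid's actual prompt templates. The sketch also assumes the widget list has already been parsed from the app's view hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Widget:
    """Hypothetical widget record, e.g. parsed from an Android view hierarchy."""
    widget_id: str
    widget_type: str  # e.g. "Button", "EditText"
    text: str = ""    # visible text or content description, if any


@dataclass
class Page:
    """Hypothetical record of the GUI page currently under test."""
    activity: str
    widgets: List[Widget] = field(default_factory=list)


@dataclass
class AppInfo:
    """Hypothetical app-level (macro) information."""
    name: str
    activities: List[str] = field(default_factory=list)


def describe_app(app: AppInfo) -> str:
    # Macro perspective: which app is under test and which activities it declares.
    return (f"We are testing the app '{app.name}', which has "
            f"{len(app.activities)} activities: {', '.join(app.activities)}.")


def describe_page(page: Page) -> str:
    # Page perspective: the GUI page currently under test.
    return f"The current page is '{page.activity}'."


def describe_widgets(page: Page) -> str:
    # Micro perspective: each widget's type, visible text and identifier.
    if not page.widgets:
        return "It contains no widgets."
    parts = []
    for w in page.widgets:
        label = f' "{w.text}"' if w.text else ""
        parts.append(f"{w.widget_type}{label} (id: {w.widget_id})")
    return "It contains: " + "; ".join(parts) + "."


def build_context_prompt(app: AppInfo, page: Page) -> str:
    # Join the three perspectives and end with the question posed to the LLM.
    return " ".join([
        describe_app(app),
        describe_page(page),
        describe_widgets(page),
        "What operation should be performed next?",
    ])


if __name__ == "__main__":
    app = AppInfo("NoteApp", ["MainActivity", "EditActivity"])
    page = Page("MainActivity", [Widget("btn_add", "Button", "Add note"),
                                 Widget("search", "EditText")])
    print(build_context_prompt(app, page))
```

For the hypothetical "NoteApp" above, the script prints one prompt. The prompt names the app's two activities and the current page, lists `Button "Add note" (id: btn_add); EditText (id: search)`, and ends with the question to the LLM. Each perspective has its own function, so any one of them can be refined without touching the others. One example would be adding widget positions or app descriptions.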
◆ GUI prompting and executive command generation
• Feedback prompt: inform the LLM that an error occurred and retry the query for the next operation
• Command generation: provide the LLM with an output template, including the available operations and operation primitives
◆ Functionality-aware memory prompting
• Build a testing sequence memorizer to record all detailed testing information
• Query the LLM about the function-level progress of the testing
• Functionality-aware memory prompt
Example
Evaluation
◆ 75% average activity coverage, 32% higher than the best baseline
◆ Detects 95 bugs in the 93 apps, 31% more than the best baseline
Evaluation
◆ Capabilities of GPTDroid
• Function-aware exploration through long, meaningful testing traces
• Prioritization
• Valid text inputs and compound operations

VisionDroid: Vision-driven Automated Mobile GUI Testing
◆ Crash bugs vs. non-crash functional bugs
◆ A vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs
◆ Explorer Agent: navigates through the app, captures view hierarchies and screenshots, and guides the exploration towards diverse GUI pages while focusing on the app's functionalities.
◆ Monitor Agent: supervises the testing process, records the exploration history, and triggers the Detector Agent at the appropriate time.
◆ Detector Agent: identifies potential functional bugs by examining whether there are any issues in the logical transitions that occur during GUI page changes.
◆ Challenge 1: Aligning visual and textual information for MLLM input
• An alignment method that integrates text properties with visual context
• A screenshot annotation method that attends to different types of actionable widgets and resolves the issue of overlapping annotations
◆ Challenge 2: Functionality-oriented exploration
• Infer and abstract the current functionality from detailed exploration sequences. This avoids exceeding token limits when interacting with the LLM and keeps the exploration focused on functionality
◆ Challenge 3: Inferring the test oracle
• Let the Monitor Agent trigger the Detector Agent at the end of each functionality exploration
• Functionality-aware Chain-of-Thought (CoT) enables the MLLM to first explicitly infer oracles and then detect functional bugs based on these inferences
[Figure: Explorer Agent, Monitor Agent, Detector Agent]
◆ Enriching the Detector prompt with an example
• A bug description, a bug screenshot and a natural-language bug reproduction path, which help the MLLM understand what non-crash functional bugs are
Evaluation
◆ 50%-72% precision and 42%-65% recall
◆ 14%-112% and 108%-147% boosts in average recall and precision compared with the best baseline

Test case migration for natural-language-described tests
◆ Test requirement: My Home - add-asset flow